#foswiki 2017-02-02,Thu


***gac410 has left [04:34]
.............................. (idle for 2h26mn)
ChanServ sets mode: +o cdot [07:00]
.... (idle for 15mn)
GithubBot: [distro] MichaelDaum pushed 1 new commit to Item14288: https://git.io/vDOrP
distro/Item14288 c1f7654 MichaelDaum: Merge branch 'master' into Item14288
[07:15]
***GithubBot has left [07:15]
FoswikiBot: https://foswiki.org/Tasks/Item14288 [ Item14288: rewrite to support pluggable edit engines ] [07:15]
....... (idle for 32mn)
***ChanServ sets mode: +o MichaelDaum [07:47]
.................................................................... (idle for 5h36mn)
ChanServ sets mode: +o okrueger [13:23]
ChanServ sets mode: +o Lynnwood [13:29]
........ (idle for 37mn)
ChanServ sets mode: +o Lynnwood [14:06]
......... (idle for 43mn)
ChanServ sets mode: +o gac410 [14:49]
..................... (idle for 1h41mn)
Lynnwood: Greetings all. I'm revisiting PublishPlugin as it relates to Unicode. After publishing some topics, I see that it's not dealing gracefully with some characters.
This reminded me of a comment I saw in Tasks.Item14198 from FlorianSchlichting: "...this is what I've got so far, which doesn't account for "unicode core" yet"
[16:30]
FoswikiBot: https://foswiki.org/Tasks/Item14198 [ Item14198: PublishPlugin fails in Foswiki 2.1.2 while trying to render zones ] [16:32]
Lynnwood: Any pointers as to what would be involved in addressing "unicode core" issues in a plugin? [16:32]
.......... (idle for 46mn)
gac410: Hi Lynnwood ... there are some rules of thumb / guidelines somewhere in the Development/Support web. [17:18]
Lynnwood: Thanks! I'll look for them. [17:18]
gac410: But the big thing is that any external file I/O needs to be UTF-8 encoded.
And if it uses the workarea file API - can't recall the Func::.. name - there is a flag to cause it to use UTF-8. Applicable if the file is text.
[17:18]
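A minimal sketch of the rule gac410 describes, written in Python rather than Perl purely for illustration: text stays Unicode inside the program and is encoded to UTF-8 only at the moment it crosses to the file system. The helper names `write_text_utf8` and `read_text_utf8` are hypothetical and are not part of Foswiki; in a plugin, the Func file API gac410 mentions would play this role.

```python
import os
import tempfile


def write_text_utf8(path, text):
    # Encode explicitly at the boundary instead of relying on the
    # platform default encoding.
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(text)


def read_text_utf8(path):
    # Decode explicitly; invalid bytes raise UnicodeDecodeError rather
    # than silently turning into garbage.
    with open(path, "r", encoding="utf-8", newline="") as fh:
        return fh.read()


if __name__ == "__main__":
    sample = "dash \u2013 and \u201cquotes\u201d"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "topic.html")
        write_text_utf8(path, sample)
        # The round trip must return exactly the text that was written.
        assert read_text_utf8(path) == sample
        with open(path, "rb") as fh:
            print(fh.read())  # the raw UTF-8 bytes on disk
```

Naming the encoding on both the read and the write side is the design choice here: a mismatch then fails loudly instead of producing corrupted output.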
Lynnwood: I found this: Tasks.Item13483, but it doesn't have any info about what kind of changes are required. [17:26]
FoswikiBot: https://foswiki.org/Tasks/Item13483 [ Item13483: Incompatible extensions with the unicode core ] [17:26]
gac410: Lynnwood: See https://foswiki.org/Support/Utf8MigrationConsiderations#Perl for one
And https://foswiki.org/System/PerlDoc?module=Foswiki::Func#readFile_40_36filename_44_36unicode_41_45_62_36text
[17:30]
FoswikiBot: https://trunk.foswiki.org/System/PerlDoc?module=Foswiki::Func [17:31]
gac410: Remember that all data accessed via the Foswiki APIs should be Unicode by default. Encoding is only required when the plugin has to interact with the external world.
So one thing that might be needed is to actually *remove* UTF-8 encoding/decoding if it's used against anything provided by Foswiki.
But things like reading/writing from the file system, including reading the directory, or converting topic names (Unicode) to file system names (UTF-8), are where the work is.
[17:32]
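The two points above (encode names only at the file system boundary, and never encode data that is already encoded) can be sketched as follows. This is Python for illustration only, not PublishPlugin code; `to_fs_name`, `from_fs_name`, `list_topic_names` and `double_encode` are hypothetical helpers, and the assumption that the file system stores names as UTF-8 bytes follows gac410's description.

```python
import os
import tempfile


def to_fs_name(name: str) -> bytes:
    """Hypothetical helper: Unicode topic name -> UTF-8 bytes for the file system."""
    return name.encode("utf-8")


def from_fs_name(raw: bytes) -> str:
    """Hypothetical helper: bytes from a directory listing -> Unicode name."""
    return raw.decode("utf-8")


def list_topic_names(directory: str) -> list:
    # Passing bytes to os.listdir returns raw byte names, so the decoding
    # step is explicit instead of being left to the locale.
    return sorted(from_fs_name(raw) for raw in os.listdir(os.fsencode(directory)))


def double_encode(name: str) -> bytes:
    # The bug to avoid: the already-encoded bytes are misread as Latin-1
    # text and then encoded a second time.
    return name.encode("utf-8").decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    topic = "Stra\u00dfenPlan"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(os.fsencode(tmp), to_fs_name(topic) + b".txt")
        with open(path, "wb"):
            pass  # create an empty file with a non-ASCII name
        print(ascii(list_topic_names(tmp)))
    # One character has become two after the second encoding pass.
    print(ascii(double_encode(topic).decode("utf-8")))
```

If a plugin written before the "unicode core" still encodes strings it receives from the API, the `double_encode` pattern is the kind of damage that results, which is why removing such calls can be part of the fix.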
Lynnwood: OK, thanks. I'm going to have to wrap my head around this later... I've got some pressing matters today. I'm going to save these notes. It gives me somewhere to start. [17:34]
gac410: I thought we had a topic on how to go about this. It's probably there somewhere, but I can't find it :( [17:34]
............. (idle for 1h0mn)
cdot: Lynnwood: the most likely source of problems is when the plugin talks to third-party programs, e.g. to generate PDF. If it's having problems generating HTML, then there may be a bug. [18:34]
Lynnwood: hey there cdot - this is just generating HTML. [18:35]
cdot: IIRC there's nothing in PublishPlugin that would make it antithetical to Unicode. [18:35]
gac410: cdot, I did a quick scan of the plugin, and it does do readdir and file processing without any encoding. [18:35]
cdot: Lynnwood: are the problems with file names? [18:35]
gac410: So it depends on the installation paths if nothing else. [18:35]
Lynnwood: No problem with file names. [18:35]
cdot: gac410: y, fair point - I forgot about that. [18:35]
gac410: Install your foswiki site into /var/www/data/aŽuŽu/ ... for some interesting results :D
(having data twice in the path caused issues at one point too)
[18:36]
Lynnwood: I'm not sure exactly what the issue is. It's pretty simple. There are some funky characters in the content body (MS characters for dashes, quotes, etc.) which I thought I had cleared out. They DON'T display at all when viewing the Foswiki topic. However, after publishing to HTML, they show up as this kind of char: � [18:38]
cdot: ok. Sounds like you have mushed the encoding. [18:39]
gac410: (MS characters for dashes, quotes, etc) ... they should have been correctly converted to unicode/utf-8 if you used cp-1252 as the "from" character set in CharsetConverter [18:39]
cdot: is the site UTF8? [18:39]
gac410: I don't recall the ubuntu package name, but the "isutf8" utility is handy to check if a file contains correctly encoded utf-8 data, and perl -nlE 'say if /\P{Ascii}/' < /path/to/file/to/check ... to list the lines containing non-ASCII characters [18:41]
Lynnwood: cdot - yes, that's what the meta charset is defined as. [18:41]
cdot: how did you arrive there? Via CharsetConverter? [18:41]
Lynnwood: I did use CharsetConverter when I updated the site a while ago... and that's why I believe I had cleared this stuff out...
...yet it comes back to haunt me... the characters that will not die!
[18:42]
cdot: if the source characters were encoded in something other than the declared source encoding, they might have hung on
e.g. you converted from iso8859 but it was really cp1252
[18:42]
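cdot's example can be illustrated with a short sketch, in Python for illustration only. The sample bytes are made up, and the helpers `convert` and `control_chars` are hypothetical; whether this is what actually happened on Lynnwood's site is not established in the log. The bytes that cp1252 uses for curly quotes and dashes map to invisible C1 control characters when read as ISO-8859-1, so a conversion with the wrong "from" charset completes without error but leaves control characters behind.

```python
import unicodedata


def convert(raw: bytes, source_charset: str) -> str:
    """Decode legacy bytes using the charset the converter was told to assume."""
    return raw.decode(source_charset)


def control_chars(text: str) -> list:
    """Return control characters in text, ignoring tab, newline and carriage return."""
    return [c for c in text
            if unicodedata.category(c) == "Cc" and c not in "\t\n\r"]


if __name__ == "__main__":
    # Made-up sample: curly quotes and an en dash as cp1252 stores them.
    raw = b"\x93quoted\x94 \x96 dash"

    right = convert(raw, "cp1252")      # real punctuation characters
    wrong = convert(raw, "iso-8859-1")  # C1 control characters instead

    print(ascii(right), control_chars(right))
    print(ascii(wrong), [hex(ord(c)) for c in control_chars(wrong)])

    # The wrongly converted text still encodes to valid UTF-8, so a
    # validity check alone would not flag it.
    print(wrong.encode("utf-8"))
```

A scan for control characters like `control_chars` is one way to look for leftovers of such a conversion in content that is otherwise valid UTF-8.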
Lynnwood: perhaps that is the case... [18:43]
gac410: that perl command, and isutf8, should tell you right away if the file is valid [18:43]
Lynnwood: I'll try that out [18:44]
cdot: could have happened by accident, if content was pasted in for example [18:44]
Lynnwood: yes, lots of that in the past.
habits die hard. some of these folks still use IE by choice...
[18:44]
gac410: well, now that the wiki is converted to UTF-8, cut/paste should be safe.
i.e. they should not be able to add more corruption. The browser will send in UTF-8.
[18:45]
https://foswiki.org/Support/Utf8MigrationConsiderations#Finding_invalid_utf_458_data shows an example of how to find invalid encodings. [18:51]
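The two checks suggested in this conversation (is the file valid UTF-8, and which lines contain non-ASCII data) can be approximated in a few lines. This is a Python sketch in the spirit of `isutf8` and the perl one-liner, not a reproduction of either tool or of the method on the linked page; the function names `find_invalid_utf8` and `find_non_ascii` are hypothetical.

```python
def find_invalid_utf8(data: bytes) -> list:
    """Return (line_number, raw_line) pairs for lines that are not valid UTF-8."""
    bad = []
    # Splitting on the newline byte is safe: 0x0A never occurs inside a
    # multi-byte UTF-8 sequence.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


def find_non_ascii(data: bytes) -> list:
    """Return the numbers of lines containing any byte outside the ASCII range."""
    return [number
            for number, line in enumerate(data.split(b"\n"), start=1)
            if any(byte > 0x7F for byte in line)]


if __name__ == "__main__":
    # Made-up sample: line 2 is valid UTF-8, line 3 holds a stray cp1252 byte.
    sample = b"plain\n" + "ok \u2013\n".encode("utf-8") + b"bad \x96 dash\n"
    print(find_non_ascii(sample))
    print(find_invalid_utf8(sample))
```

To check a real topic file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to both functions. Lines reported by `find_non_ascii` but not by `find_invalid_utf8` are correctly encoded non-ASCII text.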
