#foswiki 2017-02-04,Sat


***ChanServ sets mode: +o cdot [06:49]
........................................................ (idle for 4h37mn)
ChanServ sets mode: +o MichaelDaum [11:26]
............... (idle for 1h10mn)
uebera||MichaelDaum: Either someone modified sth on f.o or the problem went away by itself... memory allocation looks good, no need to restart apache2/mysql anymore. This should mean, however, that we just missed those spikes before and they were always present, or some requests we have not identified from the logs were worse than thought in terms of resource allocation. [12:36]
MichaelDaumhm [12:43]
....... (idle for 30mn)
uebera||another hypothesis: We used to see zombie processes from time to time; maybe the current apache2/mysql combination "does not give up easily" and instead of zombie processes we now see memory hogging processes. I.e., we need to modify the watchdog to also monitor memory usage. [13:13]
...................... (idle for 1h48mn)
***ChanServ sets mode: +o gac410 [15:01]
....... (idle for 31mn)
Lynnwoodgac410 - I think I may have found an unfortunate side effect of adding the encoding of output files in PublishPlugin. Output images are now broken. [15:32]
gac410Y, this is an ugly issue. The whole way it's written is going to have problems. it "scrapes" links out of the rendered HTML pages, and then copies them. I added some notes to the task.
When "Attach" inserts a link, in old foswiki, it utf-8 encodes and URL-encodes the link. Newer code doesn't do this ... as much anyway
Links like /pub/Litterbox/%c3%9a%c5%88%c3%ad%c4%8d%c3%b4%c4%8f%c4%9b/A vs /pub/Litterbox/Úňíčôďě/ ... and we seem to be inconsistent (Those two links are the same!)
[15:34]
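The two links compared above are the same name at different encoding stages. A minimal sketch of how the percent-encoded form is derived from the unicode one, assuming Perl's core Encode module and using a hand-rolled percent-escape purely for illustration (not necessarily what foswiki's code actually calls):

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $topic = "\x{da}\x{148}\x{ed}\x{10d}\x{f4}\x{10f}\x{11b}";    # the characters of "Úňíčôďě"

# Step 1: UTF-8 encode the character string into bytes.
my $bytes = encode_utf8($topic);

# Step 2: percent-encode each non-URL-safe byte (hand-rolled here for illustration).
( my $escaped = $bytes ) =~ s{([^A-Za-z0-9._~/-])}{sprintf '%%%02x', ord $1}ge;

print "/pub/Litterbox/$escaped/\n";    # /pub/Litterbox/%c3%9a%c5%88%c3%ad%c4%8d%c3%b4%c4%8f%c4%9b/
```

Run either step zero, one, or two times on a link that is already in the escaped form and you get three different strings, which is the inconsistency being described.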
LynnwoodI'll take a look at your notes. It makes sense to me how it re-works the urls. The urls aren't the issue.
It's the actual output of the file.
[15:40]
gac410Encoding issues are deep. If something is already encoded then it will get broken if it's encoded again. [15:41]
LynnwoodThat line you changed yesterday in file.pm - print $fh Foswiki::encode_utf8($string); [15:42]
gac410So if a link get scraped, and is encoded and then encoded a 2nd time, it gets broken. [15:42]
LynnwoodIt's not the links that are broken, its the file itself.
image files
[15:42]
gac410Ah.... damn. Okay, so it uses the same code to write out html and images. Drats. [15:43]
LynnwoodThe link is correct (points to correct file) but browser says it can't open the file. [15:43]
gac410And we definitely do NOT want to encode non-html. [15:43]
LynnwoodI took a look at the files and they are there and are not empty (they have size). But if I try to open one, it's just broken.
right
[15:44]
gac410Pass binary data to encode_utf8 and it totally corrupts it. exactly. [15:44]
Lynnwoodboth types of output are using the variable $file
so i'm wondering if we could add a conditional at about the same spot you made the change yesterday based on the kind of file it is.
[15:44]
gac410right, so if $file =~ m/\.html/ ... then encode it. [15:45]
LynnwoodLynnwood suggests simplistically...
right
[15:45]
gac410that might be simple enough. though if it's attaching a previously attached html, that would probably not be correct. It really is more complex than that. But that should cover 99% of the cases I bet. [15:46]
LynnwoodI'm not sure if this sheds some light on the situation. There are apparently two different paths for outputting image files.
In one case, it retains the original pub/image structure as original.
[15:47]
gac410Don't know, I was not looking in that area. I was just trying to add my findings to the task, and was trying to create a unicode testcase I could attach to the task. [15:48]
LynnwoodHowever, for some reason, in other cases, it lumps all the image files together and just calls them "__extraXX". [15:49]
gac410I have some "unicode topics from hell" that were created with jomo's help during 2.0 development. But some have attachments that are private, so I need to scrub them into some form I could upload. [15:49]
LynnwoodThe first approach is working correctly (the images are ok). However the second one where it creates these files called "extra" is where it's broken.
I found the code that defines the "extra" files.
[15:49]
gac410I think if links are to "external" images it uses a different path than pub files, but not sure. [15:50]
LynnwoodIt's in Publisher.pm, line 1185
no i don't think so. All of these images are from pub... but for some reason it treats them differently.
let me go back and double check... it may be that it treats all images in this way but puts other attachments (pdfs, etc) into topic folders.
no never mind... some jpgs are in the folder.
....hmmm perhaps you are correct about code considering these files as "external" although i don't understand why it would.
I'm looking at Publisher.pm at around line 1153: "sub _handleURL {"
it calls Foswiki::Func::getExternalResource
[15:51]
FoswikiBothttps://trunk.foswiki.org/System/PerlDoc?module=Foswiki::Func [15:57]
LynnwoodAhhh... a light turns on. I think i know why it's thinking these images are external... [15:59]
gac410That actually uses Foswiki::Net to fetch using http [15:59]
FoswikiBothttps://trunk.foswiki.org/System/PerlDoc?module=Foswiki::Net [15:59]
.... (idle for 16mn)
gac410In some ways, I'm surprised it's working as well as it is. I still don't understand how the filename itself for non-ascii web/topic names becomes utf8 encoded. It *is* ... but I don't know why. [16:15]
Lynnwoodcould this possibly work at the same place you made change yesterday (~line 87):
if ( $file =~ m#([^/\.]*)\.html?$# ) {
    print $fh Foswiki::encode_utf8($string);
}
else {
    print $fh $string;
}
i just noticed a conditional a little further down with regex ending in .html...
[16:20]
gac410I'd probably just look for \.html$ ... not sure what the other match is. [16:22]
Lynnwoodok
yea... i don't know what all the $file string contains...
ok. i'm beginning to get that regex i borrowed from line 96... it's getting the topic name
[16:22]
gac410yeah. when you see ( ) ... then it's capturing something. [16:27]
Lynnwoodi wonder what the m# part is about..
right
the m# is unfamiliar to me
[16:27]
gac410You can use any delimiter. m is match. # is the delimiter in this case. that way you can use / in the regex without escaping them.
Just easier than writing \/ for each slash in a path
You could also use m{ } matching braces as the delim.
[16:28]
Lynnwoodso could it be as simple as $file =~ .m{ .*?\.html.*$} [16:30]
gac410if ( $file =~ m/\.html$/ )
I use / unless there is a need for matching slashes in the regex.
and just anchoring $ at the end means it will match any string ending in .html
No need for the .*?
[16:31]
Lynnwoodunless it has some params...
in this case, i actually know that they do...
would this basically check if ".html" is anywhere in the string? if ( $file =~ m/\.html/ )
I'm ready to throw some code at it and see if it blows up. :-)
[16:32]
gac410yes. And since $file is really a filename, no need to skip stuff after the .html
If you are interested in efficiency, (substr($file, -5) eq '.html') would do it as well
[16:35]
Lynnwoodexcellent [16:36]
gac410Substring is a bit faster than regex, especially with unicode. But tbh, unless there are 10,000 hits in a loop, it is probably inconsequential [16:36]
Lynnwoodi'll give it a try [16:36]
gac410perldoc -f substr is your friend
we tend to use regexes too casually, esp. deep in some of the render code. Some of the performance gains we made in 2.x patch releases were to take critical ones and change to substr or index.
[16:37]
Lynnwoodby golly... it appears to have worked [16:49]
gac410excellent.
I think the real fix is to differentiate between files copied vs. generated using foswiki APIs
[16:49]
Lynnwoodyea... [16:50]
gac410But as a quick hack that seems fine
eventually handle something like
a href="http://..../pub/Litterbox/Úňíčôďě/AttachmentsWithSpaces/AśčÁ%20ŠŤ%20śěž.dat"
[16:51]
..................................................................................... (idle for 7h4mn)
***ChanServ sets mode: +o Lynnwood [23:57]
