Filter: img-reorder breaks CSMonitor.com's photos-of-the-day.
Privoxy version 3.0.21 (Gentoo package privoxy-3.0.21-r2).
Libpcre version 8.36 (libpcre-3.36).
SourceForge user: duncan-sf (I thought that was obvious from the ticket-creator tag, but the instructions above ask for it in bold, so just in case...)
Example affected link:
http://www.csmonitor.com/Photo-Galleries/Photos-of-the-Day/2014/Photos-of-the-day-10-16
Example affected img-tag from that link:
Filter-bypassed:
<img data-id="861889" data-index="0" class="lazy" src="" data-original="/var/ezflow_site/storage/images/media/content/csm-photo-galleries-images/photos-of-the-day-images/2014/1016/01/19164705-1-eng-US/01_full_900x600.jpg" title="" />
With img-reorder filtering (and whatever else applies) doing its thing:
<img src="" data-original=" data-id="861889" data-index="0" class="lazy"/var/ezflow_site/storage/images/media/content/csm-photo-galleries-images/photos-of-the-day-images/2014/1016/01/19164705-1-eng-US/01_full_900x600.jpg" title="" />
Scrambled tags for breakfast! It's a hopeless mess.
In default.filter, under the img-reorder filter, commenting out all but the first substitution line gives me the same result, so the problem is in that first substitution line:
s|<img\s+?([^>]*)\ssrc\s*=\s*(['"])([^>\\\2]+)\2|<img src=$2$3$2 $1|siUg
Based on our unfiltered original and the results, the captures are:
$1: data-id="861889" data-index="0" class="lazy"
$2: "
$3: " data-original=
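The captures and the scrambled output can be reproduced outside Privoxy. The sketch below uses Python's re module instead of Privoxy's PCRE-based pcrs engine, which is an assumption of convenience. Python's re has no U (ungreedy) modifier, so each quantifier's greediness is flipped by hand to mimic siUg. Like PCRE, Python reads \2 inside a character class as an octal escape for the character 0x02, not as a backreference. The tag is the unfiltered one quoted above.

```python
import re

# Privoxy's pattern, translated by hand. The U modifier is emulated by
# inverting each quantifier: \s+? -> \s+, [^>]* -> [^>]*?, \s* -> \s*?, + -> +?
BROKEN = re.compile(
    r"""<img\s+([^>]*?)\ssrc\s*?=\s*?(['"])([^>\\\2]+?)\2""",
    re.S | re.I,  # s and i modifiers; re.sub replaces every match, like g
)

TAG = (
    '<img data-id="861889" data-index="0" class="lazy" src="" '
    'data-original="/var/ezflow_site/storage/images/media/content/'
    'csm-photo-galleries-images/photos-of-the-day-images/2014/1016/01/'
    '19164705-1-eng-US/01_full_900x600.jpg" title="" />'
)

m = BROKEN.search(TAG)
for i in (1, 2, 3):
    print(f"${i}: {m.group(i)}")

# Same replacement as default.filter: <img src=$2$3$2 $1
scrambled = BROKEN.sub(r"<img src=\2\3\2 \1", TAG)
print(scrambled)
```

Because $3 is allowed to start with the closing quote of src="", the lazy match simply runs on to the next double quote, which is the opening quote of data-original.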
Apparently the problem is in $3: ([^>\\\2]+)
Clearly, the intent is to exclude whatever $2 matched by putting a backreference inside the negated character class. $2 is a single double-quote character, yet the captures above show it is not being excluded: $3 starts with exactly that character. If the double quote were excluded, $3 could not match at all here, since the + requires at least one character and the very next character is the closing quote of src="". The whole substitution would then be a no-match, and the problem would disappear.
Question: Are backrefs /supposed/ to work in character classes? I'm not a PCRE expert by a long shot, but I didn't think so, and they're clearly not working here. Additionally, the section on back references in the pcrepattern(3) manpage starts with "Outside a character class", and there are no further hits on "character class" within that section, so it would appear backrefs aren't available in character classes.
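A quick empirical check, again sketched with Python's re as a stand-in for PCRE: inside [...], a backslash followed by digits is read as an octal character escape (PCRE documents the same rule in pcrepattern(3)). So [^>\\\2] excludes only >, the backslash, and the control character 0x02, and both quote characters slip through.

```python
import re

# Inside [...], "\2" is the octal escape for chr(2), not a backreference.
CLASS = re.compile(r"[^>\\\2]")

for ch in ['"', "'", ">", "\\", "\x02"]:
    allowed = CLASS.fullmatch(ch) is not None
    print(f"{ch!r}: {'allowed' if allowed else 'excluded'}")
```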
That being the case, that entire substitution line appears to be broken, thus our bug. =:^(
Thanks a lot for the detailed report.
This should be fixed in CVS, where I changed group 3 to ([^>'" ]+).
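The new group 3 can be checked with the same Python translation. As before, this is an assumption-laden sketch that emulates the U modifier by flipping quantifiers rather than running Privoxy itself. The lazy-loading tag is now left untouched, because its src="" is empty and ([^>'" ]+) cannot match even one character there. A hypothetical ordinary tag, <img alt="x" src="a.jpg">, is still reordered.

```python
import re

# Group 3 from the CVS fix, ([^>'" ]+), with the U modifier emulated by hand.
FIXED = re.compile(
    r"""<img\s+([^>]*?)\ssrc\s*?=\s*?(['"])([^>'" ]+?)\2""",
    re.S | re.I,
)
REPLACEMENT = r"<img src=\2\3\2 \1"

LAZY_TAG = (
    '<img data-id="861889" data-index="0" class="lazy" src="" '
    'data-original="/var/ezflow_site/storage/images/media/content/'
    'csm-photo-galleries-images/photos-of-the-day-images/2014/1016/01/'
    '19164705-1-eng-US/01_full_900x600.jpg" title="" />'
)
PLAIN_TAG = '<img alt="x" src="a.jpg">'  # hypothetical ordinary tag

# Empty src: the substitution no longer matches, so the tag is unchanged.
print(FIXED.sub(REPLACEMENT, LAZY_TAG) == LAZY_TAG)
# Non-empty src: the tag is still reordered as the filter intends.
print(FIXED.sub(REPLACEMENT, PLAIN_TAG))
```

This is exactly the no-match outcome predicted in the report: once the quote is excluded from group 3, an empty src attribute can no longer anchor the substitution.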
Back references are indeed not expected to work in character classes.