The main page of www.cnn.com has several anchor links
that look like this:
<a href="javascript:openWindow('/interactive/health/0110/anthrax/frameset.exclude.html','620x430','toolbar=no,location=no,directories=no,status=no,menubar=no,scrollbars=no,resizable=no,width=620,height=430')">
This is a needless use of JavaScript to create a popup window.
I would like to be able to filter such an anchor link into
the following:
<a
href="/interactive/health/0110/anthrax/frameset.exclude.html">
possibly with/without a target="_blank" attribute.
If FilterProxy supported regex backreferences,
then something like:
regex /href="javascript:openWindow\('(.*?)','.*?'\)">/
as href="\1">
should be able to un-javascript the links.
Or, if I'm missing an obvious way to do this
with the existing matcher/substitution system, please
post the solution.
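For illustration, here is roughly what the requested rewrite would do, sketched with Python's re module rather than FilterProxy syntax. The function name unjavascript and its new_window flag are hypothetical, and the pattern assumes the openWindow('url','name','features') shape shown above.

```python
import re

# Capture the first argument of openWindow(...) inside an href attribute.
# \s* tolerates stray whitespace (such as a wrapped line) around the URL.
# [^"]* skips the remaining arguments, which contain no double quotes.
OPEN_WINDOW = re.compile(
    r'''href="javascript:openWindow\('\s*([^']*?)\s*',[^"]*\)"'''
)

def unjavascript(html, new_window=False):
    """Replace javascript:openWindow(...) hrefs with the plain URL.

    \\1 in the replacement is the backreference to the captured URL.
    """
    target = ' target="_blank"' if new_window else ""
    return OPEN_WINDOW.sub(r'href="\1"' + target, html)
```

Calling unjavascript(page, new_window=True) adds the target="_blank" attribute mentioned above; the default leaves it off.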
Logged In: YES
user_id=10643
I too would like backreferences, but this turns out to be
very difficult to do in a general way. Doing it for the
regex matcher should be easy, but consider:
rewrite regex /<!-- (blah) -->/ add regex /<!-- (junk|stuff) -->/ as $1
What should this do?
Also, I want to be able to do things like this:
rewrite tag <a name=([a-zA-Z0-9_]+)> as <a name=$1><b>"$1" Name anchor here</b>
i.e. have it work for tags. The tag (and similar) matchers
use backreferences heavily internally, so some work would
have to be done to figure out where the user wants a
backreference. And what about things like this?
rewrite (tagblock <table>) containing regex /(funk)/ as $2$1
i.e. reorder the table and the funk. Or how about this:
rewrite tagblock <blink>(.*)</blink> as <b>$1</b>
i.e. remove matching <blink> tags and replace them with <b>
tags. (which seems a reasonable thing to do!)
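To make the intent of the <blink> example concrete, here is a minimal sketch of the same substitution using Python's re module. It is not FilterProxy code: the name blink_to_bold is hypothetical, and the sketch assumes <blink> elements are not nested.

```python
import re

# Group 1 is the content between a <blink> tag and its closing tag.
# DOTALL lets the content span lines; IGNORECASE accepts <BLINK> too.
BLINK = re.compile(r"<blink\b[^>]*>(.*?)</blink\s*>",
                   re.IGNORECASE | re.DOTALL)

def blink_to_bold(html):
    """Replace each <blink>...</blink> pair with <b>...</b>."""
    return BLINK.sub(r"<b>\1</b>", html)
```

The lazy (.*?) stops at the first closing tag, which is why nested <blink> elements would need a real parser rather than a single regex.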
These examples obviously don't conform to current
FilterProxy syntax, but I would like them to work. See this
discussion, which I started on the subject a while back:
http://www.perlmonks.org/index.pl?node_id=39379&lastnode_id=6364
The "right" way to do this is to define a BNF-form grammar
that allows backreferences in the regex sense by defining a
() operator, as well as other things to make matching
expressions unambiguous, group expressions, allow OR and AND
operators, etc.
If you're interested in this, contact me
(mcelrath+filterproxy@draal.physics.wisc.edu), and I can
point you in the right direction. I won't have time to work
on this anytime soon, but I did muck around with
Parse::RecDescent a little bit, trying to define a
rudimentary grammar.
Logged In: YES
user_id=10643
This feature will be in 0.32, which will be released soon.
Currently I have these two rules:
0_FIXJS1: rewrite regex #(<script(?:(?!
src$whitespace*=)[^>])*>)(?:$whitespace|<!--(?:(?!-->).)*?(?<!//)$whitespace*-->)*((?:(?<!<!--).)+?)(?<!//-->)(?:$whitespace|<!--(?:(?!-->).)*?(?<!//)$whitespace*-->)*(</script>)#
as $1
<!--
$2
//-->
$3
0_FIXJS2: strip regex /(['"])\+\1/
Together, these rules fix JavaScript that isn't properly
wrapped in HTML comments. Such scripts typically contain
'...<scr'+'ipt ...', which totally foils other filters.
Page authors have to break up the <script> tag when the
script isn't comment-escaped; otherwise the HTML parser would think
document.write('</script>') ends the script block...
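As a rough illustration of what 0_FIXJS2 does, the same pattern can be tried with Python's re module. The function name join_split_literals is hypothetical; only the regex itself comes from the rule above.

```python
import re

# Same pattern as 0_FIXJS2: a quote, a plus sign, and the same quote
# again (\1), i.e. the seam in a literal written as 'abc'+'def'.
SPLIT_LITERAL = re.compile(r"""(['"])\+\1""")

def join_split_literals(script):
    """Strip quote-plus-quote seams so split string literals are rejoined."""
    return SPLIT_LITERAL.sub("", script)
```

For example, join_split_literals("document.write('<scr'+'ipt>')") gives "document.write('<script>')". Like the rule, the pattern does not allow whitespace around the plus sign, and it leaves seams with mismatched quote characters alone.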
-- Bob