[Filterproxy-devel] Re: New module for FilterProxy
From: Bob M. <mce...@us...> - 2001-08-05 04:28:01
John F Waymouth [way...@wp...] wrote:
>
> > > > SCRIPTADS: strip regex #(ads\\.freecity\\.de|flycast\\.com|/RealMedia/ads/)# inside tagblock <script> add encloser <script> alternate add balanced
> >
> > No, "add" will add stuff to the match if it can, but won't fail if it
> > can't. If you want it to fail when stuff isn't found, use predicates
> > "inside" and "containing". (I'll probably also add predicates "before"
> > and "after" someday) I added a note clarifying this in the docs.
> > Thanks.
>
> Ok, so in order not to scan for <script> twice needlessly, the above could
> be made more efficient by doing strip tagblock <script> containing regex
> etc etc etc.
Well...the time it takes for a matcher is proportional to the number of
times the *first* finder matches. So by doing regex first, it will fail
on most pages, and be very fast. If you do tagblock <script> first, it
will match on most pages, and slow things down.
Of course, the trade-off is that if it's found, you have to scan for
<script> twice.
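To make the trade-off concrete, here is a rough Python sketch of the two orderings. It is not FilterProxy's Perl code and does not use its rule syntax; the function names, the simplified `<script>` regex, and the two sample pages are all assumptions made up for illustration. It only counts how often the *first* finder matches, which is the quantity the cost scales with.

```python
import re

# Simplified stand-in for "tagblock <script>" (assumption: no nested scripts).
SCRIPT_BLOCK = re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL)


def _remove(html, spans):
    """Cut the given (start, end) spans out of html, last span first."""
    for start, end in sorted(spans, reverse=True):
        html = html[:start] + html[end:]
    return html


def strip_regex_first(html, pattern):
    """Find the ad pattern first, then look for the enclosing <script> block."""
    first_hits = 0
    spans = []
    for hit in re.finditer(pattern, html):
        first_hits += 1
        # Second scan for <script>: only paid when the pattern was found.
        for block in SCRIPT_BLOCK.finditer(html):
            if block.start() <= hit.start() and hit.end() <= block.end():
                if block.span() not in spans:
                    spans.append(block.span())
                break
    return _remove(html, spans), first_hits


def strip_tagblock_first(html, pattern):
    """Find every <script> block first, then test each one for the ad pattern."""
    first_hits = 0
    spans = []
    for block in SCRIPT_BLOCK.finditer(html):
        first_hits += 1
        if re.search(pattern, block.group(0)):
            spans.append(block.span())
    return _remove(html, spans), first_hits


# Hypothetical sample pages, not the adlib/ test files.
AD_PAGE = ('<p>news</p><script src="http://ad.flycast.com/x.js"></script>'
           '<script>var a = 1;</script>')
CLEAN_PAGE = ('<script>var a = 1;</script><p>news</p>'
              '<script>var b = 2;</script><script>var c = 3;</script>')
PATTERN = r"flycast\.com"

for name, page in (("ad page", AD_PAGE), ("clean page", CLEAN_PAGE)):
    _, regex_hits = strip_regex_first(page, PATTERN)
    _, block_hits = strip_tagblock_first(page, PATTERN)
    print(name, "regex-first hits:", regex_hits, "tagblock-first hits:", block_hits)
```

On the page without the ad, the regex-first version does no per-block work at all, while the tagblock-first version still visits every `<script>` block. Both versions produce the same output page.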
Page with the ad (both ways):
perl FilterProxy/Rewrite.pm adlib/marsnews.html 'strip regex /flycast\.com/ inside tagblock <script> add encloser <script> add alternate add balanced'
Rewrite: UNNAMED_0 took 0.05983 seconds, 0 failed, 5 successful
perl FilterProxy/Rewrite.pm adlib/marsnews.html 'strip tagblock <script> containing regex /flycast.com/ add alternate add balanced'
Rewrite: UNNAMED_0 took 0.05032 seconds, 8 failed, 2 successful
Page without the ad, but several <script> blocks:
perl FilterProxy/Rewrite.pm adlib/sunday-times.html 'strip regex /flycast\.com/ inside tagblock <script> add encloser <script> add alternate add balanced'
Rewrite: UNNAMED_0 took 0.00523 seconds, 1 failed, 0 successful
perl FilterProxy/Rewrite.pm adlib/sunday-times.html 'strip tagblock <script> containing regex /flycast.com/ add alternate add balanced'
Rewrite: UNNAMED_0 took 0.02345 seconds, 24 failed, 0 successful
As you can see, tagblock <script> first is more than a factor of 4 slower
(0.02345 vs. 0.00523 seconds) on pages without the ad (which will be the
majority)...
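For reference, the ratios can be worked out directly from the four timings quoted above. This is just arithmetic in Python; the dictionary keys are labels chosen for this example, and single runs like these are only a rough guide.

```python
# Timings in seconds, copied from the Rewrite.pm output quoted above.
timings = {
    "marsnews (ad present)": {"regex_first": 0.05983, "tagblock_first": 0.05032},
    "sunday-times (no ad)": {"regex_first": 0.00523, "tagblock_first": 0.02345},
}


def slowdown(t):
    """How many times slower tagblock-first is compared to regex-first."""
    return t["tagblock_first"] / t["regex_first"]


for page, t in timings.items():
    print(f"{page}: tagblock-first / regex-first = {slowdown(t):.2f}")
```

A ratio below 1 on the page with the ad means tagblock-first was slightly faster there, which matches the cost of scanning for <script> twice when the regex does match.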
If you can suggest a syntax for inside-and-add, I'll add it. It's
pretty trivial to code.
> Something odd i've seen in Mozilla, maybe it's a bug in Mozilla, maybe
> it's a bug in the proxy... occasionally, for reasons I can't discover,
> Mozilla will stop being able to use the proxy. You try to go to a site,
> it says it's resolving the host like normal, but it's not firing packets
> at my filter box (so say my hub lights). Seen this? Ideas?
Yup, bug 92915. It's a Mozilla bug (since it's not sending things to
the proxy...). Highly annoying. Easy to trigger by hitting reload...
Cheers,
-- Bob
Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics