[Filterproxy-devel] Re: filterproxy 0.29 suggestions
From: Bob M. <mce...@dr...> - 2002-08-27 16:29:42
Pilaszy Istvan [pi...@hs...] wrote:
>
> Hi!
>
> I'm using FilterProxy 0.29, and I'm very satisfied with it, and I think
> it has very good source code, and it is very well developed.
> I use it with wwwoffle, and I have some suggestions, which You might
> apply to filterproxy, if You like it.

Sure!  BTW the latest version is 0.30, released in January.  0.31 is
coming Real Soon Now...

BTW I just looked at the wwwoffle page...it does some interesting things.
It would be interesting to incorporate wwwoffle's functionality into
FilterProxy.  ;)

> 1.
> I want to redirect some URL to another, and I want to use some alias too.
> (wwwoffle can solve redirection, but only with fixed strings, I
> can't use regexps)

I have thought about this too...from my TODO file:

    Mapper module: works at orders 10,-10 and re-writes url's.  for instance:
        Get "printer-friendly" version of articles at various news sites.
        Block requests to known advertiser's domains that may have slipped
        through Rewrite/BlockBanner.  (variables useful here)
        Example:
            http://www.byte.com/column/(\w+) -> http://www.byte.com/printableArticle?doc_id=$1

> For example:
> I want to redirect http://i/ to http://info.sch.bme.hu/
> I want to alias http://*/robots.txt to http://127.0.0.1:8888/robots.txt.

Are you using a robots.txt file with FilterProxy?  Why?  Have spiders
found your FilterProxy?  A robots.txt file might be something good to
include with the FilterProxy distribution, for those that don't run with
'localhostonly'.

> redirect: the proxy responds with 302 and the right location, and
> then netscape tries to get the new url.
> alias: no redirection, netscape sees as if http://*/robots.txt existed,
> and does not know, that it receives a local file.

I think it is best to always use redirection, so that netscape always sees
the actual location of the file.  http://*/ is not a valid URL as specified
in the RFC, and might get you into trouble.
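A minimal sketch of the URL mapping discussed above, in plain Perl with no FilterProxy or LWP dependencies. The names `@MAP_RULES` and `map_url` are hypothetical and not part of FilterProxy. The sketch assumes that a redirect from http://i/ should carry the rest of the path along, and it writes the `http://*/robots.txt` wildcard as an ordinary regex over the host part. A real module would turn the returned URL into a 302 response with a Location header.

```perl
use strict;
use warnings;

# Hypothetical rule table: each entry pairs a compiled pattern with a code
# ref that builds the target URL from the captured groups.
my @MAP_RULES = (
    # Assumption: keep whatever follows http://i/ when redirecting.
    [ qr{^http://i/(.*)$}, sub { "http://info.sch.bme.hu/$_[0]" } ],
    # The printer-friendly example from the TODO file.
    [ qr{^http://www\.byte\.com/column/(\w+)$},
      sub { "http://www.byte.com/printableArticle?doc_id=$_[0]" } ],
    # "http://*/robots.txt" expressed as a regex over any host.
    [ qr{^http://[^/]+/robots\.txt$},
      sub { "http://127.0.0.1:8888/robots.txt" } ],
);

# Return the redirect target for $url, or undef if no rule applies.
sub map_url {
    my ($url) = @_;
    for my $rule (@MAP_RULES) {
        my ($re, $build) = @$rule;
        if (my @groups = $url =~ $re) {
            my $target = $build->(@groups);
            # Never redirect a URL to itself, or the browser would loop.
            next if $target eq $url;
            return $target;
        }
    }
    return;
}

for my $url ('http://i/', 'http://www.byte.com/column/abc123',
             'http://example.com/robots.txt') {
    my $target = map_url($url);
    print "$url -> ", (defined $target ? $target : '(unchanged)'), "\n";
}
```

The self-redirect check matters for the robots.txt rule: without it, the local copy at 127.0.0.1:8888 would match its own pattern.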
> I made a solution for redirect, but it needed to make some modification on
> FilterProxy.pl.
>
> I introduced $CONFIG->{order} = [ 0 ]; and I wrote a new module: Alias.
>
> Alias::filter compares the requested URL to some strings, and if it is needed,
> it rewrites the URL, and if it modified the URL, then it returns(!)
> a new HTTP::Response object. The handle_filtering method realizes
> that it (the handle_filtering method) gave a HTTP::Request object,
> and it (the handle_filtering method) received a HTTP::Response object,
> and it (the handle_filtering method) returns this object.
> The caller compares this to the original $req object, ...
> This call is before
>     $res = $agent->request($req, \&data_handler);
>
> I made here a branch:
>
>     # Send the request.
>     my $res;
>     my($req_new)=&handle_filtering($req, 0);
>     if($req_new == $req) {
>         $res = $agent->request($req, \&data_handler);
>     } else {
>         $res=$req_new;
>     }

You've got the right idea but this will fail if a filter modified $req (the
Header module, for instance).

> I made this ugly hack, because I can't solve in any other way
> to replace a HTTP::Request object with a HTTP::Response object
> in a module. (With pointers to $req, it would be more simple).

I think redirects should be done at the -10 order, and a way to check is:

    if($req_new =~ /HTTP::Request/) {
        # ... do request stuff... (FilterProxy.pl 0.30 lines 636..696)
    } elsif($req_new =~ /HTTP::Response/) {
        # our request was changed to a response.  Send it straight to
        # the client
        #...execute the code starting after line 696
    }

> (Really I made 2 modules, because I could not implement properly the s///
> operation, and the first module was not able to make the
> http://*/robots.txt to http://127.0.0.1:8888/robots.txt.
> The second module has one parameter, and executes this as a perl-program,
> and this perl-program can substitute the strings.)
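The request-or-response check above could also be written with `blessed` and `isa` instead of matching the stringified reference. This is only a sketch: `classify_filter_result` is a hypothetical helper, not FilterProxy code, and the test objects are empty hashes blessed into the class names so the example runs without LWP installed.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Decide what to do with whatever the filters handed back.
# Returns 'forward' when it is still a request that must go upstream,
# 'reply' when a filter replaced it with a ready-made response.
sub classify_filter_result {
    my ($obj) = @_;
    return 'reply'   if blessed($obj) && $obj->isa('HTTP::Response');
    return 'forward' if blessed($obj) && $obj->isa('HTTP::Request');
    die "filter returned neither a request nor a response\n";
}

# Stand-in objects (assumption): real code would receive actual
# HTTP::Request / HTTP::Response objects from LWP.
my $req = bless {}, 'HTTP::Request';
my $res = bless {}, 'HTTP::Response';

print classify_filter_result($req), "\n";
print classify_filter_result($res), "\n";
```

Unlike the `$req_new == $req` comparison, this does not depend on the filter returning the very same object, and it also accepts subclasses of either class.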
>
> If You would like to see the modules and the .HTML files (for configuration),
> I will post it.

Yes, please.

> May $CONFIG->{order} = [ 0 ] be used for internal response ( ie. no outer
> proxy or server will be asked to give the reply) ?

I don't think I've used order 0 for anything yet...  But it seems to me that
redirects should occur first thing (order -10) and filters orders -9..-1 will
never see the request, since it got redirected anyway.  Any modifications that
filters -9..-1 would make will be lost anyway once it gets redirected.  Filters
orders 1..10 should see the redirect response generated by your module.

> 2. Rewrite.pm
> I needed the 's///' regex for rewrite, and I realized, that it is very easy to
> implement it:
> I choose the area with the matchers etc., and then I apply the regex
> to this area. I introduced a new keyword 'apply_regex' instead of the 'as'.
> Here are the modifications:
>
> (for filterproxy 0.29)
> ...
>     } elsif(($operation eq "rewrite") and ($keyword eq "as"||
>                                             $keyword eq "apply_regex")) {
>         last;
>     } else {
> ...
>
>     } elsif($operation eq 'rewrite') {
>         if($filter =~ /\G\s*(.*)$/g) {
>             my($replacement) = $1;
>             if($FilterProxy::CONFIG->{debug}) {
>                 logger(DEBUG, " Rewrite rewriting by rule $key: '",
>                     substr($$content_ref, $start, $end-$start),
>                     "' ".$keyword." '", # !!!
>                     $replacement, "'\n");
>             }
>             $nsuccess++;
>             if($keyword eq "apply_regex") { # !!!
>                 my($what)=substr($$content_ref,$start,$end-$start);
>                 eval "\$what =~ $replacement";
>                 substr($$content_ref, $start, $end-$start) = $what;
>                 pos($$content_ref) = $start+length($what);
>             } else {
>                 substr($$content_ref, $start, $end-$start) = $replacement;
>                 pos($$content_ref) = $start+length($replacement);
>             }
>         } else {
> ...
>
> It's insecure (because of the eval), I know, but I think it is very useful
> for making tricky modifications, and with some check, it can be made
> secure.

This looks good to me.
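One way the "some check" mentioned in the quoted text might look is to parse the `s///` rule and apply it by hand, so that no rule text ever reaches a string eval. This is a sketch under stated assumptions, not the patch itself: `apply_regex_safely` is a hypothetical name, only `/` is accepted as delimiter, only the flags `g` and `i` are allowed, and the replacement may use `$1` to `$9` and backslash escapes but nothing else.

```perl
use strict;
use warnings;

# Apply a rule of the form s/PATTERN/REPLACEMENT/FLAGS to $text without
# string eval. Rules outside the accepted subset are refused (undef).
sub apply_regex_safely {
    my ($text, $rule) = @_;
    my ($pat, $repl, $flags) =
        $rule =~ m{\A s/ ((?:[^/\\]|\\.)*) / ((?:[^/\\]|\\.)*) / ([gi]*) \z}xs
        or return;
    # Block eval only traps compile errors; embedded code such as (?{ ... })
    # is rejected by Perl for patterns interpolated at run time.
    my $re = eval { $flags =~ /i/ ? qr/$pat/i : qr/$pat/ };
    return unless defined $re;
    my $global = $flags =~ /g/;

    my ($out, $pos) = ('', 0);
    while ($text =~ /$re/g) {
        my ($from, $to) = ($-[0], $+[0]);
        my @groups = map {
            defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : ''
        } 1 .. $#+;
        # Expand backslash escapes and $1..$9 in the replacement template.
        (my $new = $repl) =~ s{\\(.)|\$([1-9])}
                              {defined $1 ? $1 : ($groups[$2 - 1] // '')}gse;
        $out .= substr($text, $pos, $from - $pos) . $new;
        $pos = $to;
        last unless $global;
    }
    return $out . substr($text, $pos);
}

print apply_regex_safely('<font size=-1>', 's/-\d+/0/'), "\n";
```

A rule carrying the `e` flag, or any other text that does not fit the accepted shape, is refused instead of being executed.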
Do you think you could generate a diff for the above changes?

    diff FilterProxy-0.29/Rewrite.pm FilterProxy/Rewrite.pm

I think this could also be adapted to implement a feature requested on the
sourceforge site: to have the submatch variables $1 $2, ... work inside rules.

Could you send me some example rules you wrote using apply_regex?  I wonder if
it wouldn't be better to add this functionality to the existing 'regex' finder:

    rewrite tag <font size=-1> regex s/-\d+/0/

> My goal is to make it possible, that every image has a link,
> and when I click on this link, it will disable loading that image forever.
> And when I have a long list of images to be disabled, I will be able to write
> general rules, which images to disable.

What about images that are already a link?  I wonder if you could use some
javascript magic like onMouseOver and onClick to send a message to FilterProxy,
in a way that wouldn't interfere with the web page's normal operation.

What do you have against images?  ;)

Your ideas look good.  If you want to generate diffs against 0.30, we can
further refine them and I will include them in the next release.  Also you may
want to join the filterproxy-devel list.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics

    "No nation could preserve its freedom in the midst of continual warfare."
    --James Madison, April 20, 1795