filterproxy-devel Mailing List for FilterProxy
From: Bob M. <mce...@dr...> - 2002-01-13 19:00:25

...finally. Get it here: http://filterproxy.sourceforge.net

This release marks the transition from Parse::ePerl to HTML::Mason. Hopefully Mason will be longer lived and better maintained than ePerl has been.

Also new in this release is "show filtering" and "edit filtering" functionality. Given a web page, it will mark up the sections that were stripped/rewritten by the Rewrite module, so you can see exactly what your rules are doing. The best way to explain this is to see it: http://filterproxy.sourceforge.net/showfiltering.html

The "edit filtering" functionality gives you two frames: one with the above "show filtering" output, and the other with the Rewrite config for all the rules that were applied to that page. So now with one click you can see what was filtered and change it. When you first start FilterProxy, go to http://your.host.here:8888/index.html to get a set of javascript bookmarks that make this all very easy.

Also new in this release is an XSLT module, kindly contributed by Mario Lang. XSLT (XSL Transformations) generically transforms one XML document into another XML document. Treating HTML as XML, you can use XSL to rearrange, rewrite, and strip out pieces of a document. It is very powerful. It works by examining the document's structure, and as such, is guaranteed to produce a valid (X)HTML document in the end. It does not, however, have the matching power of regular expressions, so the XSLT module is complementary to the existing Rewrite module.

I have rewritten the INSTALL file to reflect new dependencies. rpm packages for many things are just impossible to obtain, so I recommend everyone install from the tarball and use CPAN to install dependencies. If you have trouble with this, please send me suggestions on how to make the install smoother.

Cheers,
-- Bob
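
As a rough illustration of what the new XSLT module does, here is a minimal, self-contained sketch of transforming an HTML page with XML::LibXSLT, using the slashdot stylesheet and parameters discussed later in this thread. This is illustrative code, not FilterProxy's actual module; $html is assumed to hold the fetched page source:

    use XML::LibXML;
    use XML::LibXSLT;

    my $parser = XML::LibXML->new();
    my $xslt   = XML::LibXSLT->new();

    # libxml's forgiving HTML parser lets us treat the page as XML.
    my $doc   = $parser->parse_html_string($html);
    my $style = $xslt->parse_stylesheet($parser->parse_file("xsl/xpath-filter.xsl"));

    # Stylesheet parameters must be quoted as XPath string literals.
    my $result = $style->transform($doc,
        XML::LibXSLT::xpath_to_string(
            locator => '/html/body/table[2]/tr/td/table[2]/tr/td[2]/font',
            base    => 'http://slashdot.org/',
        ));
    print $style->output_string($result);
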
From: Bob M. <mce...@dr...> - 2002-01-11 20:14:22

Mario Lang [la...@zi...] wrote:
> Bob McElrath <mce...@dr...> writes:
> > So as long as all the xsl files are in one directory (or in a dir
> > relative to xsl/), we should be able to include them by specifying a
> > relative path.
> >
> > In the example you sent me (for slashdot), I'm seeing black-on-black text.
> > Do you get that too?
> I realized it after a sighted coworker pointed it out to me :)

Are you truly blind? Is FilterProxy+XSLT a useful tool for you? That would be neat! How are you able to navigate, edit, and create complex documents like XML? Do you use a braille terminal, or is speech synthesis useful? How do you handle all the wacky characters in xml like <>:, etc.? Braille doesn't have characters for those, does it?

Please forgive my curiosity, I don't mean to intrude. I don't think I've ever met a blind computer user before! I can't imagine using a computer without sight! I use very high resolution on my monitor and small fonts to get the most information on my screen; doing it without sight must be very challenging!

> It has to do with the html somehow being wrongly parsed, but I never
> really checked why. I use XSL primarily for extracting
> text content out of overdesigned webpages. Just as
> that slashdot example shows...

What is interesting to me is that you could use it to remove ads by examining the content. The only problem is that it's very site-specific.

> ... Ahh, I found it. The original page has the same
> color and bgcolor set to 000000. Seems that they used some other tricks
> to make the text readable... So after extracting that table,
> those tricks get lost, it seems.
>
> Maybe we can add sanity checks to the xsl file in some way,
> but I am fairly new to xsl too. In fact, as the comment sections of
> those now in CVS show, they are written by T.V. Raman for the Emacspeak
> audio desktop.

I'm very new to it also. Someone pointed it out to me a long time ago as the "right" way to do what I was trying to do with the Rewrite module. However, since XSLT won't let you use regexes, I think both have their place.

> > It's pretty fast too; on my machine that slashdot rule only takes 0.3s. Neato.
> > ;) (the rewrite rules on slashdot take ~3s for me)
> That's basically because we really use libxslt and libxml for
> the hard work. Although I am surprised that it is really that much faster;
> xslt is a quite heavy operation.

My machine is pretty slow, comparatively, at doing Rewrite rules (it's an alpha). Comparable x86 machines often beat me in Rewrite times by a factor of more than 3. I'm not sure why. It probably has to do with poor code optimization on the alpha.

Parsing html/xml, with its matched-nested-tag structure, is very computationally intensive. I've toyed with the idea of "treeing" the document first, to make multiple traversals much faster. I'm sure this is how the C libxml/libxslt does it. The perl module HTML::Parser does this too, but somehow ends up being slower than my regexes anyway. (Initial versions of FilterProxy used HTML::Parser, but I found I could write a faster regex, and since then I've made many speed enhancements.)

Frankly, the speed hit in parsing matched-tag documents (like XML) is why I've been totally skeptical of people using XML for everything (i.e. xml-rpc). But anyway... XSLT is still a cool idea.

> I thank you for those fixes. Saved me a lot of time..

No prob. I was excited to play with XSLT!
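
The "treeing" idea mentioned above, parsing once into a tree so that repeated traversals are cheap, is roughly what the HTML::TreeBuilder module (built on HTML::Parser) provides. A minimal sketch, assuming $html holds the page source; this is not code from FilterProxy:

    use HTML::TreeBuilder;

    # Pay the parsing cost once...
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # ...then structural queries are cheap, e.g. find every table:
    my @tables = $tree->look_down(_tag => 'table');
    print scalar(@tables), " tables\n";

    $tree->delete;   # HTML::TreeBuilder trees must be freed explicitly
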

> > Problem is that errors/warnings generated by Mason don't have accurate line
> > numbers.
>
> I already had to experience that. And it's no fun, truly.

I was using Mason 0.89; they're up to 1.0.4. I'll have to do some more checks to see if the problem still exists with newer versions. I'll bitch at them if it's still giving bad line numbers.

> > BTW, you should join the filterproxy-devel list, and we can continue
> > discussions there, so others can see what you've done! ;)
> Done :-)

I've added you to filterproxy-devel. It seems you didn't get subscribed...

> I tried to send the reply for you to the list, but
> sf.net seems to have problems. I can't send to that list,
> and there is no archive on the sf.net webpages.

It should work now.

Geocrawler was written by a stoned monkey with massive brain contusions. It's almost completely useless as an archive anyway, and flaky at best at actually archiving any messages.

For a while the mailing lists showed up under "public forums" and that was really cool, but they broke it. I think the sourceforge people are using us as guinea pigs for all their ideas; then when they get the code perfect, they remove the feature from the public site and put the feature in their (closed) commercial code.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
From: Bob M. <mce...@dr...> - 2002-01-09 17:27:13

Mario Lang [la...@zi...] wrote:
> Bob McElrath <mce...@dr...> writes:
> > The point is there was no way to decide on the stylesheet. None of the web
> > forms show up; all I get is the "dump headers to file" checkbox.
> No, not global. You have to choose a stylesheet for every siteconfig.
> Currently, there are no global options, I just left that there
> for me as an example if I need some :-)... But I think I will
> remove the global section altogether....

I was getting the headers checkbox for site config too... My fault, I forgot to translate the variables. I just checked in a bunch of changes to your html file. I also added an

    if(defined $SITECONFIG->{stylesheet})

so that it doesn't give form boxes for parameters when a stylesheet isn't defined. (It was causing some perl errors.)

So now it sorta works. I see that the problem is that /usr/share/filterproxy is hardcoded into the xsl/xpath-filter.xsl file. It looks like doing:

    <xsl:include href="identity.xsl"/>

works; it seems to assume the same directory as the parent xsl file. So as long as all the xsl files are in one directory (or in a dir relative to xsl/), we should be able to include them by specifying a relative path.

In the example you sent me (for slashdot), I'm seeing black-on-black text. Do you get that too?

It's pretty fast too; on my machine that slashdot rule only takes 0.3s. Neato. ;) (the rewrite rules on slashdot take ~3s for me)

> Hmm, but let's see. I'll do a fresh install altogether; maybe
> I am getting something wrong here.

Ok, I got it sorta working. I won't touch the xslt stuff for a while. It's all yours.

> > Sorry about that, I've been sitting on it too long... :(
> >
> > I hope the syntax is self-explanatory. HTML::Mason is far more popular, so
> > there are many good references on the web too.
> It is also somehow more pleasant to read... Although <<EOT
> does have problems :)

Yeah, I had to fiddle with that a bit. BTW, if you use vim do:

    :set syntax=mason

The problem is that errors/warnings generated by Mason don't have accurate line numbers. It has line numbers (two of 'em) but both of 'em are wrong. The <<EOF might even work; I played with it for a while thinking that was the error when the error was actually somewhere else. *sigh* I need to figure out how to get Mason to generate accurate line numbers, for my own sanity.

Ok, off to work now. ;)

BTW, you should join the filterproxy-devel list, and we can continue discussions there, so others can see what you've done! ;)

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
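
In HTML::Mason syntax, the guard described above might look something like the following hypothetical fragment of the XSLT config component ($SITECONFIG and the params layout are assumed from context, not taken from the real file):

    % if (defined $SITECONFIG->{stylesheet}) {
    %   # Render parameter form boxes only once a stylesheet is chosen.
    %   foreach my $param (sort keys %{$SITECONFIG->{params} || {}}) {
        <input type="text" name="<% $param %>"
               value="<% $SITECONFIG->{params}{$param} %>">
    %   }
    % }
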
From: Bob M. <mce...@dr...> - 2002-01-09 15:50:57

Mario Lang [la...@zi...] wrote:
> Very nice. Thank you. I am 'mlang' on sourceforge.net. Would you
> mind giving me write access?

Certainly. You should have it now. If I remember correctly, the last time I joined a project it took a little while before the changes went through and I could write to CVS. If you want any other permissions (Task Manager/Forums, etc.) just ask.

> > The html file you provided doesn't seem to allow me to add or edit any config.
> > I think it is dependent on the parameter $ENV{SITECONFIG}->{'stylesheet'},
> > which won't be there if you've just added a site.
> You can't add/edit/delete parameter settings when you haven't decided
> on which stylesheet you want to use. Because every stylesheet
> has different parameters.

The point is there was no way to decide on the stylesheet. None of the web forms show up; all I get is the "dump headers to file" checkbox.

> > I hope I haven't messed it up in translating your html file. :(
> I will check that now. Very cool anyway that you translated
> it for me; that's what I had planned for today after I got your
> mail and realized that I developed on an old version.

Sorry about that, I've been sitting on it too long... :(

I hope the syntax is self-explanatory. HTML::Mason is far more popular, so there are many good references on the web too.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
From: Bob M. <mce...@dr...> - 2002-01-09 07:12:50

Mario Lang [la...@zi...] wrote:
> Comments:
> 1. For an example, add an XSLT entry for http://(www\\.)slashdot.org/$
>    Set the stylesheet name to xpath-filter.xsl
>    Add parameter locator with value
>      /html/body/table[2]/tr/td/table[2]/tr/td[2]/font
>    Add parameter base with value
>      'http://slashdot.org/'
>    What does that do?
>    It extracts the articles table from slashdot,
>    and removes all the other tables around it.
>    Tip: If you want to create a locator option for another page,
>    use xmllint --html --shell filename.html and simply
>    use cd|ls|cat to change where you want to be and do pwd.

...how interesting...

> 2. What is lacking:
>    The xsl stylesheet path is hardcoded currently to /usr/share/filterproxy/xsl/
>    Stylesheet name has no select listbox. You have to know the names.
>    No README or anything like that...
>
> Comments are welcome.
> Especially about how to do the stylesheet path. Make it a global
> XSLT option? Where should they go...

I have modified FilterProxy.pl to export $FilterProxy::HOME, which is the variable you want.

> Do you have a CVS for filterproxy?

I've updated CVS with my latest code. (Turns out I hadn't committed some recent changes.) I've also added your code. I've modified it to use $FilterProxy::HOME to get its files in $FilterProxy::HOME/xsl. And I've translated the html file you provided so that it works with HTML::Mason. For further work on this you should check out the latest cvs and install HTML::Mason. ;)

The html file you provided doesn't seem to allow me to add or edit any config. I think it is dependent on the parameter $ENV{SITECONFIG}->{'stylesheet'}, which won't be there if you've just added a site. Also, you should remove the stuff about dumping headers to a file. ;)

I hope I haven't messed it up in translating your html file. :(

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
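
The locator parameter is just an XPath expression, so what it selects can be checked in a few lines of Perl with XML::LibXML (which the XSLT module already depends on). A sketch, assuming $html holds the fetched page:

    use XML::LibXML;

    my $doc = XML::LibXML->new()->parse_html_string($html);

    # The same XPath you would discover interactively with
    #   xmllint --html --shell filename.html   (cd/ls/cat, then pwd)
    my $locator = '/html/body/table[2]/tr/td/table[2]/tr/td[2]/font';

    # Print everything the locator selects -- here, slashdot's articles table.
    print $_->toString, "\n" for $doc->findnodes($locator);
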
From: Bob M. <mce...@dr...> - 2002-01-08 17:14:53

Mario Lang [la...@zi...] wrote:
> Bob McElrath <mce...@dr...> writes:
> > Mario Lang [la...@zi...] wrote:
> [...]
> >
> > OOOh!! Neato!
> >
> > I'd be very interested!
>
> Here is what I have currently:
>
> Comments:
> 1. For an example, add an XSLT entry for http://(www\\.)slashdot.org/$
>    Set the stylesheet name to xpath-filter.xsl
>    Add parameter locator with value
>      /html/body/table[2]/tr/td/table[2]/tr/td[2]/font
>    Add parameter base with value
>      'http://slashdot.org/'
>    What does that do?
>    It extracts the articles table from slashdot,
>    and removes all the other tables around it.
>    Tip: If you want to create a locator option for another page,
>    use xmllint --html --shell filename.html and simply
>    use cd|ls|cat to change where you want to be and do pwd.
> 2. What is lacking:
>    The xsl stylesheet path is hardcoded currently to /usr/share/filterproxy/xsl/
>    Stylesheet name has no select listbox. You have to know the names.
>    No README or anything like that...
>
> Comments are welcome.
> Especially about how to do the stylesheet path. Make it a global
> XSLT option? Where should they go...

Neato. I'll take a closer look at it when I get home from work tonight. My understanding of xslt is poor at best; I'm interested to see what it can do!

> Do you have a CVS for filterproxy?

Yes, it's on sourceforge: http://sourceforge.net/projects/filterproxy. You should check that your module works with it. (It may require translation of any html files.) I've added a number of cool features, and changed Parse::ePerl -> HTML::Mason for the config files. I *really* need to get 0.30 out. I'm sitting on a lot of changes here.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
From: Bob M. <mce...@dr...> - 2002-01-08 15:13:32

Mario Lang [la...@zi...] wrote:
> Hi..
>
> I just wrote a XSLT module for filterproxy using XML::LibXSLT.
>
> I wonder if you want my current copy. I certainly plan to extend it more,
> but it is functional now...
>
> * Allows to configure a stylesheet file (.xsl) and parameters
>   for a site.
> * Transforms the incoming html using LibXSLT and LibXML.

OOOh!! Neato!

I'd be very interested!

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
From: Bob M. <mce...@dr...> - 2001-11-05 07:13:40

Adrian Wills [on...@ec...] wrote:
> Bob,
>
> I am just using manual proxy setup with localhost and port 8888
>
> Attached is the log file

Well, that's not good... You appear to have a hosed install of LWP/libwww-perl. Use CPAN to upgrade to the latest version (perl -MCPAN -e shell, install Bundle::LWP).

What version do you have installed? (lwp-request -v) I have:

    (255)<mcelrath@draal:/home/mcelrath> lwp-request -v
    This is lwp-request version 1.39 (libwww-perl-5.53)

I hope this isn't a problem with perl 5.6.1; I haven't tried that. I use it with perl 5.6, and have also used perl 5.005. Looking at the file LWP/Protocol.pm, mine has the version 1.36.

> [#######] FilterProxy started (pid 16936). [#######]
> [16946 Mon Nov 5 17:38:38 2001] [Perl WARNING] Use of uninitialized value in pattern match (m//) at /usr/local/share/perl/5.6.1/LWP/Protocol.pm line 114.
> [16946 Mon Nov 5 17:38:38 2001] [Perl WARNING] Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.6.1/LWP/Protocol.pm line 87.
> [16950 Mon Nov 5 17:39:18 2001] [Perl WARNING] Use of uninitialized value in pattern match (m//) at ./FilterProxy.pl line 521.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
From: Bob M. <mce...@dr...> - 2001-11-05 06:59:37

Well, I haven't seen this. What url are you loading? How did you set up mozilla to access FilterProxy? Can you send relevant portions of FilterProxy.log?

Adrian Wills [on...@ec...] wrote:
> Hi,
>
> Just installed the latest version of filterproxy 0.29.2. I consistently
> get the following error:
>
>     501 Protocol scheme '' is not supported
>
> Could you offer any suggestions.
>
> Cheers
>
> Adrian Wills
>
> University of Newcastle
> Australia

-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
From: Bob M. <mce...@dr...> - 2001-10-26 02:16:06

Ah, sorry, I wasn't ignoring your message, I forgot about it! I'll look into this when I get back home, next week.

John F. Waymouth [way...@wp...] wrote:
> I did something slightly silly: I tried to connect to the proxy without having
> it set as the proxy in my settings... so it proxied the request to
> itself... icky vicious process creation loop. Brought my poor little box
> to its knees (shoulda set a process limit...). Think we can prevent this?

John F. Waymouth [st...@wa...] wrote:
> > > I ran into a couple problems though: First, if I try to view the
> > > administration page remotely, it seems to go into an infinite loop
> > > connecting and re-connecting to itself. This is unfortunate because
> > > only lynx is available on the server itself.
> >
> > I view the config pages remotely on a regular basis. Can you turn on debug and
> > send me a snippet of the log file? You're not the first to report this, but I
> > can't reproduce it.
>
> Connect to the proxy as if it were a standard HTTP server, not by asking it to
> retrieve the page using itself as a proxy.

-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
From: Bob M. <mce...@dr...> - 2001-10-25 17:23:36

What version of FilterProxy do you use?

Pete Gonzalez [go...@ra...] wrote:
> At 10:55 PM 10/24/2001, you wrote:
> >Pete Gonzalez [go...@ra...] wrote:
> > > 1. I want to configure FilterProxy to refuse connections from all
> > >    internet hosts except for a specific list of IP addresses. Is
> > >    this possible?
> >
> >Not currently, no. You can restrict incoming connections to localhost, or the
> >host specified as $HOST in FilterProxy.pl.
> >I will happily accept a patch for this if you want to implement it though.
>
> This would be a good feature to have. I ended up hacking FilterProxy.pl to
> implement this, but it would probably be pretty easy to add it to the
> config file.
>
> I ran into a couple problems though: First, if I try to view the
> administration page remotely, it seems to go into an infinite loop
> connecting and re-connecting to itself. This is unfortunate because
> only lynx is available on the server itself.

I view the config pages remotely on a regular basis. Can you turn on debug and send me a snippet of the log file? You're not the first to report this, but I can't reproduce it.

> Secondly, it seems to be forking a lot of copies of itself in memory.
> Would it be difficult to limit the number of forks?

It forks one copy for each connection from the browser, and then tries to keep those connections open as long as possible. Normally this will generate 8 connections per user. This is a fast and reasonable approach as long as the number of users is small and the browser is recent. Old browsers (Netscape 3.x), which open one connection per page/image and use HTTP/1.0, can generate a disastrous load due to forking.

A pre-fork model as used by Apache would be desirable for older browsers or HTTP/1.0 and a large number of users. This is a bit of work, though, and I don't really have the motivation to do it since I use it, and intend it to be, a "personal" filtering proxy. I would happily accept a patch if you want to implement this feature though! I think there are some notes about it in my TODO file.

> > > 2. How do you use the authentication feature?
> >Mozilla/IE will ask you for a username/password when you make a connection.
>
> Hmm... this didn't work for me.

What does it do? You enabled authentication on the config page, right? What happens when you request a page? Do you get a 407 response (Proxy Auth required) or an error?

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
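
A pre-fork accept loop of the kind mentioned above could be sketched as follows. This is a minimal illustration, not FilterProxy's actual server code; the port and pool size are arbitrary:

    use IO::Socket::INET;

    my $listen = IO::Socket::INET->new(
        LocalPort => 8888,
        Listen    => 32,
        Reuse     => 1,
    ) or die "listen: $!";

    # Fork a fixed pool up front; every worker blocks in accept(),
    # so the per-connection fork() cost disappears.
    for (1 .. 8) {
        defined(my $pid = fork) or die "fork: $!";
        next if $pid;                      # parent keeps spawning workers
        while (my $client = $listen->accept) {
            # handle_connection($client);  # proxy work would happen here
            close $client;
        }
        exit;
    }
    1 while wait() != -1;                  # parent just reaps the pool
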
From: Bob M. <mce...@dr...> - 2001-10-25 15:43:24

David Cornette [dco...@is...] wrote:
> I was trying to use FilterProxy, but it seems that it requires a later
> version of Perl. I have version 5.004_04. The qr// operator apparently
> doesn't exist in that version. Perhaps you could note what version of
> Perl FilterProxy requires somewhere.

Well, I developed it initially with perl 5.005_03, and newer versions work with perl 5.6 (with some trouble due to Parse::ePerl). Current development is happening under perl 5.6, and the CVS version works with perl 5.6 (and no longer uses Parse::ePerl).

I will make a note in the docs that perl 5.005 or better is required.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
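
For reference, qr// (new in perl 5.005) compiles a regex once into a value that can be stored and reused, which is presumably why FilterProxy's rule matching depends on it. A trivial example:

    # Compile once, match many times -- not possible in perl 5.004.
    my $ad_re = qr{\bdoubleclick\.net\b}i;

    for my $url ('http://doubleclick.net/ad', 'http://example.com/') {
        print "blocked: $url\n" if $url =~ $ad_re;
    }
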
From: Bob M. <mce...@dr...> - 2001-10-25 02:55:56

Pete Gonzalez [go...@ra...] wrote:
> 1. I want to configure FilterProxy to refuse connections from all
>    internet hosts except for a specific list of IP addresses. Is
>    this possible?

Not currently, no. You can restrict incoming connections to localhost, or the host specified as $HOST in FilterProxy.pl. I will happily accept a patch for this if you want to implement it though.

> 2. How do you use the authentication feature? I can configure
>    Mozilla and IE to use FilterProxy, but there is no place to input
>    a username/password.

Mozilla/IE will ask you for a username/password when you make a connection. You can enter usernames/passwords for FilterProxy by hitting the "Edit usernames/passwords" link on the main config page (right after the enable authentication checkbox).

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
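
The patch requested here, an allowlist of client IPs, could be sketched like this, assuming the accepted socket is an IO::Socket::INET object (as HTTP::Daemon provides); the @ALLOWED list and its placement are hypothetical:

    # Hypothetical allowlist; in a real patch this would come from the config.
    my @ALLOWED = ('127.0.0.1', '192.168.1.10');

    sub client_allowed {
        my ($client) = @_;             # an accepted IO::Socket::INET
        my $peer = $client->peerhost;  # dotted-quad of the connecting host
        return grep { $_ eq $peer } @ALLOWED;
    }

    # ...then, right after accept():
    #   unless (client_allowed($client)) { close($client); next; }
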
From: John F. W. <way...@wp...> - 2001-10-21 03:07:20

I did something slightly silly: I tried to connect to the proxy without having it set as the proxy in my settings... so it proxied the request to itself... icky vicious process creation loop. Brought my poor little box to its knees (shoulda set a process limit...). Think we can prevent this?
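
One way to prevent this self-proxying loop is for the proxy to tag its outgoing requests with a Via header (RFC 2068 defines Via for exactly this kind of loop detection) and refuse any request that already carries its own tag. A sketch; the tag value and integration points are assumptions, not FilterProxy's actual behavior:

    use HTTP::Request;

    my $VIA_TAG = "1.1 FilterProxy";     # hypothetical identifier for this instance

    sub is_own_loop {
        my ($req) = @_;                  # the client's HTTP::Request
        my $via = $req->header('Via') || '';
        return $via =~ /\Q$VIA_TAG\E/;   # we have already forwarded this one
    }

    # Before forwarding: reject loops, otherwise tag and pass the request on.
    #   return error_response(403, "proxy loop") if is_own_loop($req);
    #   $req->push_header(Via => $VIA_TAG);
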
From: Bob M. <mce...@dr...> - 2001-09-30 01:10:52

...has been released. This is an interim release containing many bug fixes. I'm about to rip out Parse::ePerl, and wanted to make sure these bug fixes got out first. (Check the ChangeLog.)

I have tried installing Parse::ePerl on a perl 5.6 machine recently. It was ugly. Very ugly. I even tried to make eperl rpms. That was uglier still. If you're using FilterProxy with perl 5.6, consider yourself supremely talented.

Also in this release is a new Source module, thanks to John Waymouth. It works two ways:

    http://source/url
or
    http://localhost:8888/Source.pm?url

where url is the url you want to load. Note this will show you the UNFILTERED source by converting the mime-type to text/plain. There are a few neato javascript bookmarks on the home page (http://draal.physics.wisc.edu/FilterProxy/) to do things like "view unfiltered source for this page", "enable filtering", "disable filtering", and bring up the config page.

Numerous rule updates, including some to block evil javascript popups/popunders. If you have written new rules for FilterProxy, please send them to me and I will include them in future distributions. I was pleased to discover that the new jump-through ads on salon.com were already filtered by existing rules! We shall see whether the newly announced "filter-detector" will actually work on FilterProxy: http://slashdot.org/article.pl?sid=01/09/29/002222 (I will postpone my musings about the future of internet advertising for another day, but if you search the above discussion you will find a comment by me on the subject.)

Also, FilterProxy now loops over the modules in the FilterProxy/ directory and tries to load them all. If a module fails to load it will print an error message, but FilterProxy WILL STILL RUN if some modules do not load. You will probably discover that there is an ImageComp module, which will recompress jpegs, among other things. But it depends on ImageMagick (something I haven't installed)... and will fail to load. This is not a problem. You can still use FilterProxy as usual.

Coming soon:
* "View how this page was filtered" -- marked-up source.
* HTML::Mason replacing Parse::ePerl.

Off I go to install perl 5.6...

-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
From: Bob M. <mce...@dr...> - 2001-08-31 23:12:00

It looks like you're using mandrake packages on a redhat system. And the packages you got want perl 5.6.1. FilterProxy itself is happy with perl 5.005 or later. Ah, and it looks like you got some pld rpms too. PLD (Polish Linux Distribution) puts things in hosed places (like "i386-pld-linux" below) that redhat won't be happy with.

Finding redhat packages is much harder now than it was a year ago. You used to be able to just use rpmfind and it would grab things for you. Now PLD and Mandrake packages are largely incompatible with redhat. Redhat also used to have all the CPAN packages in rpm format, but they haven't updated them in years and now they're horribly out of date. :(

The *easiest* way to do this is to build it yourself. Grab the source for eperl from the Debian project: http://packages.debian.org/unstable/devel/eperl.html — at the bottom there are links to the source code:

    http://ftp.debian.org/debian/pool/main/e/eperl/eperl_2.2.14.orig.tar.gz
    http://ftp.debian.org/debian/pool/main/e/eperl/eperl_2.2.14-3.diff.gz

Apply the patch, build, and install the thing yourself.

Second, use the perl CPAN tool to install the rest:

    > perl -MCPAN -e shell
    [...]
    CPAN> install Compress::Zlib
    CPAN> install Bundle::LWP
    CPAN> install Time::HiRes

That should do it. If anyone wants to send me rpms for any of these, I'd be happy to put them on my site for people to download.

Tamara Miller [mi...@dr...] wrote:
> just trying to get filterproxy working. Do I need a newer version of perl
> than 5.6.0? I downloaded all of those perl modules and filterproxy, but I'm still
> getting some failed dependencies.
>
> (0)<millert@chani:/home/millert> sudo rpm -U perl-* eperl-2.2.14-9mdk.i586.rpm FilterProxy-0.29.1-1.noarch.rpm
> error: failed dependencies:
>     /usr/lib/perl5/site_perl/i386-pld-linux/5.6.1 is needed by perl-Compress-Zlib-1.13-1
>     perl = 5.6.1 is needed by perl-Compress-Zlib-1.13-1
>     perl(IO::Handle) is needed by perl-Compress-Zlib-1.13-1
>     perl(constant) is needed by perl-Compress-Zlib-1.13-1
>     perl(strict) is needed by perl-Compress-Zlib-1.13-1
>     perl(vars) is needed by perl-Compress-Zlib-1.13-1
>     perl(warnings) is needed by perl-Compress-Zlib-1.13-1
>     perl-modules is needed by perl-Compress-Zlib-1.13-1
>     perl-HTML-Tagset >= 3.03 is needed by perl-HTML-Parser-3.25-2
>     perl >= 5.600 is needed by perl-URI-1.15-1mdk
>     perl-MIME-Base64 is needed by perl-libwww-perl-5.53-3
>     perl-libnet is needed by perl-libwww-perl-5.53-3
>     perl-Digest-MD5 is needed by perl-libwww-perl-5.53-3
>     perl-base = 5.601 is needed by eperl-2.2.14-9mdk
>     libperl.so is needed by eperl-2.2.14-9mdk
>     perl-eperl is needed by FilterProxy-0.29.1-1

-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
From: Bob M. <mce...@dr...> - 2001-08-26 15:42:43

Shlomi Yaakobovich [shl...@ya...] wrote:
> Hi,
>
> I've noticed that when I receive compressed content
> from the proxy, my browser (MSIE 5.5) sometimes
> crashes, pretty much consistently, while without
> compressed data it does not crash. Is this a bug in
> the proxy, or the browser? Is there any way around
> it? I'm really interested in 5x speed...

Hmmm... I suspect a browser bug. I have tested it with the following (linux) browsers:

    Mozilla (0.8 onward)
    Netscape 4.x
    Netscape 3.x
    StarOffice
    Galeon
    Skipstone
    Konqueror

...and none of them had a problem with compressed data, so I suspect a browser bug. (Netscape has a few bugs with compressed data too... but doesn't crash, just displays binary junk -- in <iframe>'s.)

Try this URL WITHOUT the proxy and see if it crashes your browser: http://groups.yahoo.com/group/http-wg/messages/8730?expand=1 — this particular yahoo server also spits out compressed content.

Also, turn on debug, and also turn on "Dump headers to log file" on the Header main config page. Check if your browser is sending the Accept-Encoding header. If all else fails, mail the FilterProxy.log file to me (with debug and headers dumped to log file turned on!) and I'll take a look at it.

What's the latest version of MSIE? Might be worthwhile to upgrade...

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
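
A quick way to check what the browser and server negotiated is to request the page with Accept-Encoding yourself. A sketch using LWP and Compress::Zlib (both already FilterProxy dependencies); illustrative only:

    use LWP::UserAgent;
    use HTTP::Request;
    use Compress::Zlib;

    my $ua  = LWP::UserAgent->new;
    my $req = HTTP::Request->new(GET => 'http://groups.yahoo.com/group/http-wg/messages/8730?expand=1');
    $req->header('Accept-Encoding' => 'gzip');

    my $res = $ua->request($req);
    print "Content-Encoding: ", $res->header('Content-Encoding') || '(none)', "\n";

    # If the body came back gzipped, memGunzip() recovers the plain text.
    if (($res->header('Content-Encoding') || '') =~ /gzip/) {
        my $plain = Compress::Zlib::memGunzip($res->content);
        print length($plain), " bytes uncompressed\n";
    }
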
From: Bob M. <mce...@dr...> - 2001-08-23 18:17:43

doggy [dog...@ma...] wrote:
> I can't install Parse::ePerl because I am using Perl 5.6.1
>
> Here is the error message when I make ePerl:
>
>     cc -O2 -m486 -fno-strength-reduce -Dbool=char -DHAS_BOOL -I/usr/local/include -I/usr/lib/perl5/5.00503/i386-linux/CORE -I. -c eperl_perl5.c
>     eperl_perl5.c: In function `Perl5_ForceUnbufferedStdout':
>     eperl_perl5.c:69: `defoutgv' undeclared (first use in this function)
>     eperl_perl5.c:69: (Each undeclared identifier is reported only once
>     eperl_perl5.c:69: for each function it appears in.)
>     make: *** [eperl_perl5.o] Error 1

Sorry for the late reply, I have been out of town.

Have you applied any patches to eperl? Also, it appears that configure or make has found your 5.00503 perl install, not perl 5.6.1, which could be the problem.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
From: Bob M. <mce...@dr...> - 2001-08-23 14:43:36

Edward [ed...@ca...] wrote:
> Hi there!
>
> Can I use this program w/ squid?
>
> Thank you very much.

Yes. Instructions are in the README (starting on line 70).

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
From: John F W. <way...@WP...> - 2001-08-23 01:57:46

On Wed, 22 Aug 2001, Bob McElrath wrote:
> Me too. Just got back from California.

And I'm leaving friday early to go to school, and who KNOWS when I'll be done moving in and have my email set up and all that...

> Yes, I think I fixed that by explicitly calling
> FilterProxy::handle_filtering(-10,1,2,3), which then calls Header (and
> other modules, if necessary). Note it does not call handle_filtering
> for Orders that modify the content. (See comments at the beginning of
> Skeleton.pm)

That's something on the order of what I was thinking. Except why not modify the content, if you're going to hilite Rewrite.pm changes?

> CGI is already a dependency... and is included in LWP, which is a
> dependency

Righto, so URI::Escape isn't needed.

> Well, what I've done is set a flag ($markupinstead) which tells Rewrite
> to build a @markup data structure instead of modifying the source. Then
> I call FilterProxy::handle_filtering for Rewrite's Order. I then parse
> this data structure, marking up with the name of the rule as I parse it.
> Both the flag and the data structure are variables in the
> FilterProxy::Rewrite namespace. This isn't a race condition since it is
> all executed by a single FilterProxy child process, which resets the
> flag when it's done. Ugly, but it works.
>
> It turns out that the really hard part is when there are overlapping
> modifications. (which is pretty common, actually) Marking up
> nonoverlapping ones was easy, and could be done in one pass. The two
> pass method described above is necessary in case two matches overlap.
> (Matches can grow backwards, and would grow over a previously marked up
> section!) It's complicated by the fact that Rewrite also has to parse
> the data structure to make sure the piece it's examining hasn't already
> been "stripped". Ugh!

Yeah, @markup was something on the order of what I was thinking. Good luck with the overlapping changes thing. That algorithm/concept rings familiar; I think something like that's been written before, or at least used as an example to torture CS students...

> Well, for the time being I'll keep both methods. So it will still be
> possible to do http://source/... Since the long URL is browser->proxy,
> only a browser limitation would cause a problem. HTTP::Daemon, which
> parses the headers for FilterProxy, has a limitation of 16k for the URI,
> so we should be ok. Neither rfc 2068 nor 2616 specifies how long URIs
> or headers can be, but both specify the 413 and 414 error codes for
> headers/URIs that are too long.

I think it's safe to test it with a few long uris in common browsers, and leave it at that.

> The fix has also gone into the mozilla trunk. Maybe easier to get a new
> nightly since I'm deathly slow... ;)

Ok, I'll grab the new mozilla when I have bandwidth :)
From: Bob M. <mce...@dr...> - 2001-08-23 01:01:32

John F Waymouth [way...@WP...] wrote:
> Hey, sorry I haven't responded in awhile, I've been pretty busy.

Me too. Just got back from California. I've half-written the Rewrite markup thing... I'll release a new version as soon as it works.

> On Sun, 5 Aug 2001, Bob McElrath wrote:
> > Well, it turned out to be pretty easy. Two files, maybe 15 lines extra
> > total. (Attached -- to use it, add $agent to the "use vars" list at the
> > beginning of FilterProxy.pl, and change "my $agent" to be just "$agent"
> > on line 171). And escaping in javascript turned out to be trivial, it's
> > a function called "escape"... heh.
>
> Hmm, this is decent, but there's the problem of not running the request
> through Header.pm, or anything else the user might want. Maybe you should
> hook into the handler function in FilterProxy.pl? This, of course, yet
> again brings up the question of how to hook in Source.pm, but we can throw
> in a bogus header or even make the request have a source:// instead of
> http://, cause we're dealing with the request internally. Then we get the
> best of both worlds. Sorta.

Yes, I think I fixed that by explicitly calling FilterProxy::handle_filtering(-10,1,2,3), which then calls Header (and other modules, if necessary). Note it does not call handle_filtering for Orders that modify the content. (See comments at the beginning of Skeleton.pm.)

> BTW, to avoid using URI::Escape (and having yet another dependency, unless
> that's standard?) you can use CGI::unescape for URL encoding, and
> CGI::escapeHTML to do all your > and such. Don't forget to html-ify
> tabs and newlines, though, and may as well get spaces as well.

CGI is already a dependency... and is included in LWP, which is a dependency.

> > This method also opens the door wide up to marking up or reformatting
> > the source. This bit of javascript, when bookmarked, will act like
> > "view source".
>
> It will take a fair amount of trickery and interoperation between Rewrite
> and Source to build up the diff list. Perhaps we should be really darn
> sneaky (cheat) and store more data in the $res hash. It's just a hash,
> after all. Will perl allow us to mess with someone else's blessed hash?
>
> Otherwise, I suppose we can hook Source.pm in a third time, at level 1, to
> put in an internal header that Rewrite recognizes, which tells it to build
> up differences. Or something like that.

Well, what I've done is set a flag ($markupinstead) which tells Rewrite to build a @markup data structure instead of modifying the source. Then I call FilterProxy::handle_filtering for Rewrite's Order. I then parse this data structure, marking up with the name of the rule as I parse it. Both the flag and the data structure are variables in the FilterProxy::Rewrite namespace. This isn't a race condition since it is all executed by a single FilterProxy child process, which resets the flag when it's done. Ugly, but it works.

It turns out that the really hard part is when there are overlapping modifications (which is pretty common, actually). Marking up nonoverlapping ones was easy, and could be done in one pass. The two-pass method described above is necessary in case two matches overlap. (Matches can grow backwards, and would grow over a previously marked-up section!) It's complicated by the fact that Rewrite also has to parse the data structure to make sure the piece it's examining hasn't already been "stripped". Ugh!
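
The overlap problem described above is essentially interval merging. A minimal sketch of coalescing overlapping [start, end] spans before marking up; the entry layout and sample data here are invented for illustration, since the real @markup structure in FilterProxy::Rewrite is not shown in this thread:

    # Assumed entry layout: [ start_offset, end_offset, rule_name ].
    my @markup = ([5, 20, 'rule1'], [15, 30, 'rule2'], [40, 50, 'rule3']);
    my @spans  = sort { $a->[0] <=> $b->[0] } @markup;

    my @merged;
    for my $span (@spans) {
        if (@merged && $span->[0] <= $merged[-1][1]) {
            # Overlaps the previous span: extend it and combine rule names.
            $merged[-1][1] = $span->[1] if $span->[1] > $merged[-1][1];
            $merged[-1][2] .= ",$span->[2]";
        } else {
            push @merged, [@$span];
        }
    }
    # @merged can now be walked back to front, inserting markup tags without
    # any insertion invalidating an earlier span's offsets.
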
> > There is, it is implementation dependent. 4096 bytes is most common, I
> > think, but I've seen people complain about implementations that use 1024
> > bytes.
>
> Alright. We'll have to see how well this works; remember that long query
> strings will be even more elongated because every % becomes a %25.

Well, for the time being I'll keep both methods. So it will still be possible to do http://source/... Since the long URL is browser->proxy, only a browser limitation would cause a problem. HTTP::Daemon, which parses the headers for FilterProxy, has a limitation of 16k for the URI, so we should be ok. Neither rfc 2068 nor 2616 specifies how long URIs or headers can be, but both specify the 413 and 414 error codes for headers/URIs that are too long.

> > I know this isn't exactly what you wanted John, but take a look at the
> > attached files and let me know what you think.
>
> I suppose it'll work. I hadn't thought of hooking into Config, I thought
> you were planning to write everything in the embedded perl, which wouldn't
> be too happy. It's your proxy, it's your choice. We'll see how it works.

Ack, not in embedded perl... that would be ugly. ;)

> > P.S. I added a workaround for the Mozilla reload-hang. 0.29.2 "Real
> > Soon Now".
>
> Maybe you could send a prerelease my way? ;)

The fix has also gone into the mozilla trunk. Maybe easier to get a new nightly since I'm deathly slow... ;)

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
From: John F W. <way...@WP...> - 2001-08-18 04:04:44

Hey, sorry I haven't responded in awhile, I've been pretty busy.

On Sun, 5 Aug 2001, Bob McElrath wrote:
> Well, it turned out to be pretty easy. Two files, maybe 15 lines extra
> total. (Attached -- to use it, add $agent to the "use vars" list at the
> beginning of FilterProxy.pl, and change "my $agent" to be just "$agent"
> on line 171). And escaping in javascript turned out to be trivial, it's
> a function called "escape"... heh.

Hmm, this is decent, but there's the problem of not running the request through Header.pm, or anything else the user might want. Maybe you should hook into the handler function in FilterProxy.pl? This, of course, yet again brings up the question of how to hook in Source.pm, but we can throw in a bogus header or even make the request have a source:// instead of http://, cause we're dealing with the request internally. Then we get the best of both worlds. Sorta.

BTW, to avoid using URI::Escape (and having yet another dependency, unless that's standard?) you can use CGI::unescape for URL decoding, and CGI::escapeHTML to do all your > and such. Don't forget to html-ify tabs and newlines, though, and may as well get spaces as well.

> This method also opens the door wide up to marking up or reformatting
> the source. This bit of javascript, when bookmarked, will act like
> "view source".

It will take a fair amount of trickery and interoperation between Rewrite and Source to build up the diff list. Perhaps we should be really darn sneaky (cheat) and store more data in the $res hash. It's just a hash, after all. Will perl allow us to mess with someone else's blessed hash?

Otherwise, I suppose we can hook Source.pm in a third time, at level 1, to put in an internal header that Rewrite recognizes, which tells it to build up differences. Or something like that.

> There is, it is implementation dependent, 4096 bytes is most common, I
> think, but I've seen people complain about implementations that use 1024
> bytes.

Alright. We'll have to see how well this works; remember that long query strings will be even more elongated because every % becomes a %25.

> I know this isn't exactly what you wanted John, but take a look at the
> attached files and let me know what you think.

I suppose it'll work. I hadn't thought of hooking into Config, I thought you were planning to write everything in the embedded perl, which wouldn't be too happy. It's your proxy, it's your choice. We'll see how it works.

> P.S. I added a workaround for the Mozilla reload-hang. 0.29.2 "Real
> Soon Now".

Maybe you could send a prerelease my way? ;)
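
John's escaping recipe in a few lines — CGI::unescape and CGI::escapeHTML ship with CGI.pm, so there is no extra dependency; the sample strings are made up:

    use CGI ();

    my $query  = 'http%3A%2F%2Fslashdot.org%2F';          # %-encoded query value
    my $url    = CGI::unescape($query);                   # undo the %xx escaping
    my $source = "<html>\n\t<body>1 < 2 & 2 > 1</body>\n</html>";

    my $html = CGI::escapeHTML($source);                  # &, <, >, " become entities
    $html =~ s/\t/&nbsp;&nbsp;&nbsp;&nbsp;/g;             # html-ify tabs...
    $html =~ s/ /&nbsp;/g;                                # ...spaces...
    $html =~ s/\n/<br>\n/g;                               # ...and newlines
    print $html;
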
From: Bob M. <mce...@dr...> - 2001-08-06 04:47:49

John F Waymouth [way...@WP...] wrote:
> Although it's not necessary to use diff if you build up a list of changes
> as Rewrite does its work.

Yup, that's what I intend.

> Well, I suppose this solution works. Though if you eval, as opposed to
> use, you're not going to do the import call that use does (unless you call
> it yourself)

I meant: eval "use $file"; if($@) ...

> > If the answer is no... the only *proper* way to do it is to write a whole
> > lot of code along the lines of http://localhost:8888/Source.html?get=%%%
>
> Ick. I just can't be happy with that, it doesn't settle right with me...
> Furthermore, I'm not even sure how to urlencode in javascript.

Well, it turned out to be pretty easy. Two files, maybe 15 lines extra total. (Attached -- to use it, add $agent to the "use vars" list at the beginning of FilterProxy.pl, and change "my $agent" to be just "$agent" on line 171.) And escaping in javascript turned out to be trivial: it's a function called "escape"... heh.

This method also opens the door wide up to marking up or reformatting the source. This bit of javascript, when bookmarked, will act like "view source":

    javascript:void open("http://chani:8888/Source.html?url=" + escape(document.location), "source");

> I don't think we'll hit a max URI length problem. I'm not sure there IS
> such a maximum, but plenty of sites out there use HUGE query strings.

There is; it is implementation dependent. 4096 bytes is most common, I think, but I've seen people complain about implementations that use 1024 bytes.

I know this isn't exactly what you wanted John, but take a look at the attached files and let me know what you think.

Cheers,
-- Bob

P.S. I added a workaround for the Mozilla reload-hang. 0.29.2 "Real Soon Now".

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
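
On the proxy side, a handler for that bookmarklet might look roughly like this; $agent is the shared LWP::UserAgent described above, but the parsing and response details are assumptions, since the attachment itself is not in the archive:

    use CGI ();
    use HTTP::Request;

    # Hypothetical handler for GET /Source.html?url=...
    sub serve_source {
        my ($req, $agent) = @_;               # HTTP::Request, LWP::UserAgent
        my ($q) = $req->uri =~ /\?url=(.*)/s;
        my $url = CGI::unescape($q);

        my $res = $agent->request(HTTP::Request->new(GET => $url));
        $res->content_type('text/plain');     # show the source, don't render it
        return $res;
    }
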
From: John F W. <way...@WP...> - 2001-08-05 16:17:34

On Sun, 5 Aug 2001, Bob McElrath wrote:
> That's fine (see below on eval). I want to go further than a diff
> though (for instance, also insert the name of the rule that made the
> change).

Although it's not necessary to use diff if you build up a list of changes as Rewrite does its work.

> Done. Now it loops over files in FilterProxy/, evals them, and keeps
> going if the eval fails. Looking for filter rules would be a bit
> harder, since you have to know the module exists before you load it...
> (i.e. Compress isn't used now but you want to add it for some site...)

Well, I suppose this solution works. Though if you eval, as opposed to use, you're not going to do the import call that use does (unless you call it yourself).

> Ok, so I started to outline how this would work, and I just can't get
> around the fact that we need to send *extra* data with the request.
> Whether we want to return the source or return highlighted source,
> framed or not, it's the same... the behavior isn't default, and we have
> to signal that. And it must work for ANY url.
>
> Is there any other way to get data from the browser to the proxy than in
> the URI? Can browsers handle a cookie for the proxy, or a cookie for
> all domains?

Nope. The Cookie spec specifically requires two dots or more. So it won't send to .com, or to the whole internet.

> If the answer is no... the only *proper* way to do it is to write a whole
> lot of code along the lines of http://localhost:8888/Source.html?get=%%%

Ick. I just can't be happy with that, it doesn't settle right with me... Furthermore, I'm not even sure how to urlencode in javascript.

> Checking Netscape and Mozilla, it looks like multiple ? aren't a
> problem, but with multiple # the last one is stripped off. (These
> aren't intended for servers anyway.) But adding the data at the end of
> the URI after a ? could easily run into the maximum URI-length problem.

I don't think we'll hit a max URI length problem. I'm not sure there IS such a maximum, but plenty of sites out there use HUGE query strings.

> Arg. Yeah, but editing the config will require interaction with the
> core script too...

I'm not so leery of adding new rules that call the Source module as I am of actually writing this into the proxy itself.

> This mechanism requires a whole lot of code to be written... Basically
> duplicating &handle_proxy, except storing the data, rather than feeding
> it to the client. Ick.

Right. I think we should avoid this.

Ok, I've just read the RFC for URI. This is legal:

    http://source/http://foobaretc/

I still see this as the most viable option for communicating a view-source command. To avoid masking a hostname on the local network named "source", we could instead use:

    http://view.source/http://foobaretc/
From: Bob M. <mce...@dr...> - 2001-08-05 07:04:40

John F Waymouth [way...@WP...] wrote:
> Ok, I've done some researching. I think we want to write our own HTML
> prettifier (I found a few, one that didn't use HTML::Parser even, but it
> didn't fit our purposes). I found a package, Algorithm::Diff, which does
> exactly what we want: it allows traversal of a diff sequence. If you're
> ok with that one more single dependency, we could use this.

That's fine (see below on eval). I want to go further than a diff though (for instance, also insert the name of the rule that made the change).

> A thought occurs to me, because you're cringing about dependencies. I
> don't think it should be necessary for me to have installed Compress::Zlib
> if I'm not using the Compress filterproxy module. I think modules should
> only be "use"d if they're in use. How about loading a module when you come
> across a filter rule that uses it, except if it's already loaded (MODULES
> contains it)? Just a thought.

Done. Now it loops over files in FilterProxy/, evals them, and keeps going if the eval fails. Looking for filter rules would be a bit harder, since you have to know the module exists before you load it... (i.e. Compress isn't used now but you want to add it for some site...) This also means that modules aren't hard-coded into FilterProxy.pl anymore.

> > I'm gonna try to write a Rewrite-changes-highlighter this weekend.
>
> Ok. I have a few ideas in my head, so let me know if/when you've
> completed yours, so I'll know if I should write my own or look at yours :)

Ok, so I started to outline how this would work, and I just can't get around the fact that we need to send *extra* data with the request. Whether we want to return the source or return highlighted source, framed or not, it's the same... the behavior isn't default, and we have to signal that. And it must work for ANY url.

Is there any other way to get data from the browser to the proxy than in the URI? Can browsers handle a cookie for the proxy, or a cookie for all domains?

If the answer is no... the only *proper* way to do it is to write a whole lot of code along the lines of http://localhost:8888/Source.html?get=%%%

Checking Netscape and Mozilla, it looks like multiple ? aren't a problem, but with multiple # the last one is stripped off. (These aren't intended for servers anyway.) But adding the data at the end of the URI after a ? could easily run into the maximum URI-length problem.

> Another, far less elegant solution would be to have the proxy "tell
> itself" to grab the source, i.e. load 2 urls in succession:
>     http://localhost:8888/Source.html?getnexturlsource=true
>     http://wherever.com...
> But then both are valid URIs...
> Eew. That's a race condition.

Yep. Baaad.

> or: http://localhost:8888/Source.html?getsourceof=http://.... since you
> can always encode nasties like ? in the second URL with %. The module
> then just compares each URI to its internal variable $getsourceof...
> That could work, but kind of requires new functionality written into the
> core script. I don't know about you, but I think I kind of want to avoid
> that; it's a little inelegant.

Arg. Yeah, but editing the config will require interaction with the core script too... This mechanism requires a whole lot of code to be written... basically duplicating &handle_proxy, except storing the data rather than feeding it to the client. Ick.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
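
The loop-over-FilterProxy/-and-eval scheme described above might look roughly like this; the directory layout comes from the thread, but the actual code in FilterProxy.pl is not shown, so treat this as a sketch:

    use vars qw(@MODULES);   # registry of loaded filter modules (name assumed)

    # Load every module found in FilterProxy/, surviving failures.
    opendir(my $dh, "FilterProxy") or die "FilterProxy/: $!";
    foreach my $file (sort grep { /\.pm$/ } readdir $dh) {
        (my $mod = $file) =~ s/\.pm$//;
        eval "use FilterProxy::$mod;";   # string eval so a failure is survivable
        if ($@) {
            warn "FilterProxy::$mod failed to load, skipping: $@";
        } else {
            push @MODULES, $mod;         # module is now available to filter rules
        }
    }
    closedir $dh;
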