[Filterproxy-devel] Re: New module for FilterProxy
From: Bob M. <mce...@us...> - 2001-08-04 23:44:42
John F Waymouth [way...@wp...] wrote:
>
> It can't be that hard. Just a rule to edit out javascript.open enclosed
> by <script>, unless the open call opens something in a list of exceptions.
> Also kill onload events and other such, but I think links with javascript:
> with popups should be allowed, because, in general, annoying content
> doesn't appear from those.

More rules will be happily accepted. ;) I know little about javascript at
this point.

> Well, I was tired last night, and this was a quick hack. I had written my
> filter rule as yours is above, with -() rules for everything else, but
> you're right, Header does need to be run, only Rewrite needs to be
> avoided.

I just added a big red warning if you try to disable Header. In testing
it, I saw it leaving numbers at the top (content-length in hex, BTW) and
0 at the bottom...when Header was disabled.

> > I'll put these or something similar in the docs soon. (I just recently
> > figured out how to do it). What I *really* want though is something
> > like "edit filtering rules applied to this site". And "view how this
> > site was filtered" -- a la your view source, but with removed/rewritten
> > stuff in funny colors. Someday...
>
> It's great to have these toolbar buttons with javascript, but I see a few
> problems. They don't work if javascript is disabled, and not all browsers
> will support them. Maybe it'd be possible to frame EVERYTHING that runs
> through the proxy, with a few nav buttons?

Ick, I really don't want to do that. Frames are evil... Which browsers
don't support javascript anyway? If you want to use something like
FilterProxy, you can enable javascript, and filter the bad stuff. ;)

A "control page" would be better, I think. List the last several URLs
loaded, and let you manipulate them (show source, show filtering, ...).
This one's on my TODO list.

> As far as showing the Source with Rewritten chunks in funny colors... this
> is ASKING to be diff'd.
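A rough sketch of the before/after comparison being discussed, written in Python with the standard difflib module rather than FilterProxy's Perl. The name highlight_rewrite and the red/green <del>/<ins> markup are hypothetical choices, not existing FilterProxy code; it assumes the page text has been saved before Rewrite runs and is available again afterwards:

```python
import difflib
import html
import re

# Split HTML into tags and the text runs between them. A stray "<" is grouped
# with what follows up to the next ">" (or the end), so no characters are lost.
TOKEN = re.compile(r"<[^>]*>?|[^<]+")


def highlight_rewrite(before: str, after: str) -> str:
    """Hypothetical helper: return an HTML-escaped view of the page in which
    text removed by Rewrite is wrapped in <del> and text it added in <ins>."""
    a = TOKEN.findall(before)
    b = TOKEN.findall(after)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged markup is shown escaped, as plain source text.
            out.append(html.escape("".join(a[i1:i2])))
            continue
        if tag in ("delete", "replace"):
            removed = html.escape("".join(a[i1:i2]))
            out.append(f'<del style="color:red">{removed}</del>')
        if tag in ("insert", "replace"):
            added = html.escape("".join(b[j1:j2]))
            out.append(f'<ins style="color:green">{added}</ins>')
    return "".join(out)


if __name__ == "__main__":
    original = '<p>Hi</p><script>window.open("ad.html")</script>'
    rewritten = "<p>Hi</p>"
    print(highlight_rewrite(original, rewritten))
```

Diffing on tag and text tokens instead of lines keeps the result readable on pages that put everything on one line, and it needs no full HTML parser. On the Perl side, CPAN's Algorithm::Diff provides a comparable sequence diff that the same tokenizing step could feed.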
> I think we could use the inherent abilities of
> your ordering scheme to pull this one off. All you have to do is save a
> copy of the content in a data structure in the module, or in a file,
> before Rewrite is called, then, run again after rewrite, and do some kind
> of creative diff on it (I'm sure there's a module in CPAN for diffing...
> if not, we could parse output from diff), hiliting changed parts, and
> prettifying everything.

I did a web search and ran across an HTML diff tool for "only" $149.95. I
don't see anything in CPAN that would be useful. Undoubtedly someone has
written something similar using HTML::Parser, but I don't want to add
HTML::Parser to this project: it has enough dependencies as it is, and
HTML::Parser is a big one.

> Well, we could pass it through a beautifier, if such exists, or we could
> write our own that hilites Rewrite changes, but I kind of doubt an
> existent hiliter would be flexible enough to do the hiliting and signify
> rewrite changes.

D'oh. :) I'm gonna try to write a Rewrite-changes highlighter this weekend.

> > I can think of two other ways of doing this:
> > http://your.hostname.here:8888/Source.html?http://site.for.source
> > ...or...
> > source:http://site.for.source/
>
> The first could be cool, but I'm not quite sure it will fall under URI
> specs if it looks like this:
> http://hostname:8888/Source.html?http://www.google.com/search?q=bah
>
> Because of the double ?. I'd have to look at the RFC again. The second
> won't fly, because source:http:// will confuse browsers. I originally
> tried something like source:// or wysiwyg://, but Mozilla, at least, won't
> let it fly.

Multiple ? could be OK, as long as we strip it off before sending it on.
Another option might be to append #FilterProxy:gimmedasource to the end of
the URL (which is a valid URI).

> The advantage, as I see it, to making a fake domain name, is the ability
> to seamlessly integrate the Source module just like any other module. The
The > disadvantage is that it's going to mask a host named "Source", and that > the URL might not be legit. To fix the first, we could verbosify it some > more, like using a hostname "viewsourceof" or something, and to fix the > latter, maybe this falls under spec (I think so): >=20 > http://viewsourceof/http://www.google.com/search?q=3Dbah >=20 > I'm not sure I like the idea of throwing everything into a .html file on > the server end, because that sort of breaks the elegance of making it a > standard filter module. Nod, pretty clever. I hadn't thought of that until you sent your module. What about an .html file that had two frames, one showing some FilterProxy config stuff, and the other the source or hightlighted source of the page? The frame with the source could still use your module... Another, far less elegant solution would be to have the proxy "tell itself" to grab the source. i.e. load 2 urls in succession: http://localhost:8888/Source.html?getnexturlsource=3Dtrue http://wherever.com... But then both are valid URI's... or: http://localhost:8888/Source.html?getsourceof=3Dhttp://.... since you can always encode nasties like ? in the second URL with %. The module then just compares each URI to its internal variable $getsourceof... > > until I've made the move away from eperl, a task I'm not exactly looking > > forward to. >=20 > What are you planning on moving to? Another perl based solution, or > something else entirely? I'll be very afraid if you actually make them > CGI scripts, the server end is already quite massive ;) No, CGI scripts are so ugly. I'm looking at HTML::Embperl and HTML::Mason. If you have experience with either, I'd like to hear it... > > Argh. The only hostname it makes sense to have after the http:// is the > > hostname of the proxy...but that throws FilterProxy into server mode... > > Ok, well, I'll leave it as you wrote it until one of us comes up with a > > better URL scheme. ;) >=20 > Ah, but the PORT is still changeable... 
> This'll kind of obfuscate the source viewing URL, but at least it works
> with your server scheme, sorta. Source.pm -10 would just have to snag a
> request to the hostname:8887 (or whatever), and change it over, because
> the proxy would be in proxy mode.

Hmm...I'll have to think about that...

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics