From: Edward E. <web...@ed...> - 2006-05-24 19:27:52
Bastian Kleineidam wrote:
> Thanks for the patch! I'll try to merge it into the current development
> version. One note: since there is only one thread with select I/O, calls
> to urllib should be avoided since they can block and stop the whole
> proxy until they time out. I will add the urllib call for now but we
> should aim for adapting the HttpProxyClient class to get header data in
> the background (currently it only gets the content for <script src="">
> stuff).

I tried using the Http* classes at first but couldn't get them working
properly in the time I was willing to spend. I meant to mention that in
the README; I guess it slipped my mind.

> Filters are configured through a set of rules; the folders are just to
> separate the rules more easily. So one rule can influence more than one
> filter module (for example the rating rules).

Look at the rules in the first section (stage Request, filter Blocker)
of the example output file. The last four rules are:

    rule Slashdot JS ad 2                in folder Specific adverts
    rule CGI adverts with 'banner' etc.  in folder General adverts
    rule Images with numeric IP          in folder General adverts
    rule Google pageads                  in folder Specific adverts

On the filter configuration page, the folder 'General adverts' comes
before the folder 'Specific adverts'. The ordering of the folders
implies that all the rules in General adverts are applied first, then
the rules in Specific adverts, but that doesn't happen. It's unclear
what the ordering of the folders actually does.

I already mentioned the ordering issues with removing/adding/replacing
HTTP headers. That's the confusion I'm talking about. Again, maybe it's
just my lack of understanding at this point of how things work.

>> One suggestion: disable the Activate Javascript Filtering rule by
>> default. It's incredibly slow even on my relatively recent machine.
>
> Ok, why not. The JS filtering is slow for two reasons:
> 1) it has to download all <script src> contents in the background.
> Until then the other processing filters are waiting.
> 2) the wc.js.clean() method is slow as hell for large script sources.
>
> To speed up 1), there should be caching involved. WebCleaner has no
> caching right now :/
> And for 2), that is on the todo list.

If it can be made faster, that's great. But as long as it's slow as
molasses, some people will download WebCleaner, run it, and abandon it
as too slow. They'll never find out that a simple configuration change
would speed things up enormously; they'll just assume the whole thing is
slow. There's a real danger of creating a bad impression that turns away
potential users. That's my concern.

Thanks for your feedback.

Edward
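P.S. To make the point about blocking calls in the select loop concrete, here is a minimal, self-contained sketch (hypothetical names, nothing from WebCleaner's actual HttpProxyClient): two socketpairs stand in for two proxy clients served by a single select() loop, and a time.sleep() stands in for a synchronous urllib fetch. The second client's request is ready immediately, but it cannot be served until the blocking handler returns.

```python
import select
import socket
import time

def run_loop(handlers, duration):
    """Dispatch readable sockets to their handlers until `duration` elapses."""
    deadline = time.time() + duration
    while time.time() < deadline:
        readable, _, _ = select.select(list(handlers), [], [], 0.05)
        for sock in readable:
            handlers[sock](sock)

# Two connected socket pairs stand in for two proxy clients.
a_srv, a_cli = socket.socketpair()
b_srv, b_cli = socket.socketpair()

start = time.time()
served = []  # (label, seconds after start the request was handled)

def blocking_handler(sock):
    sock.recv(4096)
    time.sleep(0.2)  # stands in for a blocking urllib call
    served.append(("slow", time.time() - start))

def fast_handler(sock):
    sock.recv(4096)
    served.append(("fast", time.time() - start))

a_cli.sendall(b"GET /a")  # dispatched first, stalls the whole loop
b_cli.sendall(b"GET /b")  # ready immediately, still waits out the sleep

run_loop({a_srv: blocking_handler, b_srv: fast_handler}, duration=0.4)

for s in (a_srv, a_cli, b_srv, b_cli):
    s.close()
```

Running this, the "fast" request is not handled until at least 0.2 s in, even though its data was waiting the whole time. That is exactly the stall the background-fetch approach in HttpProxyClient avoids.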
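P.P.S. On the caching you mention for speeding up 1): a sketch of what I had in mind, an in-memory cache of downloaded <script src> contents keyed by URL with a time-to-live. The fetch callable, the TTL, and the class name are all my own illustrative assumptions, not anything from WebCleaner.

```python
import time

class ScriptCache:
    """Cache downloaded <script src> contents so repeat hits skip the fetch."""

    def __init__(self, fetch, ttl=300.0):
        self.fetch = fetch   # callable: url -> script source text
        self.ttl = ttl       # seconds an entry stays valid
        self._entries = {}   # url -> (fetched_at, content)

    def get(self, url):
        entry = self._entries.get(url)
        if entry is not None:
            fetched_at, content = entry
            if time.time() - fetched_at < self.ttl:
                return content  # cache hit: no download, no filter stall
        content = self.fetch(url)
        self._entries[url] = (time.time(), content)
        return content

# Usage with a stub fetcher that counts real downloads:
calls = []
cache = ScriptCache(fetch=lambda url: calls.append(url) or "// js for %s" % url)
first = cache.get("http://example.com/ad.js")
second = cache.get("http://example.com/ad.js")  # served from the cache
```

Even a simple per-process cache like this would help a lot on pages that pull the same ad scripts over and over.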