Re: [Dproxy-devel] dproxy 1.x
From: Andreas H. <hof...@in...> - 2000-02-10 02:32:11
jeroen wrote:
>
> Andreas Hofmeister wrote:
> > > Maybe it is an idea to add a keyword search to it??? e.g. *xxx in the
> > > deny file would cause all domains with 'xxx' to get rejected?
> > <snip>
> ... if you have 'hundreds' of sites you want to block this might be a little
> more effective for reducing the size of this file.

You are thinking of some sort of regex. Mhhh, the only problem is that one must be extremely careful with them. You might remember that AOL drama: they tried to block newsgroups like alt.sex etc., but they also blocked some groups about breast cancer, AIDS and such ...

There are some functions in the libc for regex comparison that we could use. The only thing about regexes is that they are expensive, either in terms of memory or speed. To be efficient, a regular expression must be precompiled. We cannot re-read and recompile them for every query we get, and of course not for each of some hundred sites to block.

Maybe we could implement the regex stuff like this: every line that starts with a special marker, say '@' (not part of any domain name), will be regarded as a regex. On startup, dproxy reads the block file once, checks for regexes and compiles them. In normal operation, dproxy simply ignores every line starting with a '@'.

> (I think a firewall rule might be a better place for this kind of
> thing.)

These are different aspects of blocking: a FW rule can block the data transfer to a site, while a blocking rule in the name server can hide the existence of a site.

BTW, your refresh things are not yet in dproxy-1.x. Please wait a little before you start to implement this. I am doing some experiments with 'late forks', which make a memory cache possible. This will allow us to do a refresh implementation without the problems I mentioned.

Also note that the semantics of the time field in the cache file have changed!
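To make the '@' marker idea above a bit more concrete, here is a minimal sketch using the POSIX functions regcomp()/regexec() from the libc. It is not dproxy code: the names deny_list, deny_add_line, deny_match and deny_free, the fixed table size, and the case-insensitive matching are all assumptions made for illustration. For the sake of a self-contained example it also keeps the plain (non-'@') entries in memory, which is not part of the proposal above.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

#define MAX_RULES 64  /* hypothetical limit, chosen only for this sketch */

/* One deny-file entry, prepared once at startup. */
struct deny_rule {
    int is_regex;       /* 1 if the line started with '@' */
    regex_t re;         /* compiled pattern, valid only if is_regex == 1 */
    char literal[256];  /* plain domain name, valid only if is_regex == 0 */
};

struct deny_list {
    struct deny_rule rules[MAX_RULES];
    size_t count;
};

/*
 * Add one deny-file line (trailing newline already stripped).
 * A line starting with '@' is compiled as an extended regex;
 * anything else is stored as a literal name.
 * Returns 0 on success, -1 on a full table, bad pattern or overlong name.
 */
int deny_add_line(struct deny_list *list, const char *line)
{
    assert(list != NULL && line != NULL);
    if (list->count >= MAX_RULES)
        return -1;

    struct deny_rule *r = &list->rules[list->count];
    if (line[0] == '@') {
        /* REG_NOSUB: we only need match / no match, not the positions. */
        if (regcomp(&r->re, line + 1,
                    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
            return -1;
        r->is_regex = 1;
    } else {
        if (strlen(line) >= sizeof r->literal)
            return -1;
        strcpy(r->literal, line);
        r->is_regex = 0;
    }
    list->count++;
    return 0;
}

/* Per-query check: no file access and no recompilation happens here. */
int deny_match(const struct deny_list *list, const char *name)
{
    for (size_t i = 0; i < list->count; i++) {
        const struct deny_rule *r = &list->rules[i];
        if (r->is_regex) {
            if (regexec(&r->re, name, 0, NULL, 0) == 0)
                return 1;
        } else if (strcasecmp(r->literal, name) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Release the compiled patterns, e.g. before re-reading the file. */
void deny_free(struct deny_list *list)
{
    for (size_t i = 0; i < list->count; i++)
        if (list->rules[i].is_regex)
            regfree(&list->rules[i].re);
    list->count = 0;
}
```

The point of the split is that regcomp() runs once per '@' line at startup, while each query only pays for regexec(). An unanchored pattern such as '@xxx' matches anywhere in the name, which is exactly the over-blocking risk mentioned above, so patterns would probably need to be written with some care.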
We now get and use the real TTL from the upstream DNS. 'cache_purge_time' has a very different meaning now: it is simply the default TTL sent to the clients (the parameter should be renamed).

There are two additional parameters that would make sense with this, 'cache_min_ttl' and 'cache_max_ttl', but neither is implemented yet. The first is for sites that send unbelievably short TTLs (e.g. netscape.com says 1h), the second is for a little bit more security.

Ciao
Andreas