[Rabbit-proxy-users] ad blocking
From: Luis S. <lso...@gl...> - 2010-12-11 17:09:10
Hello All,

Does rabbit have the ability to block ads and malware sites using lists from aggregator sites such as

  http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&showintro=1&mimetype=plaintext
  http://someonewhocares.org/hosts/hosts
  http://www.montanamenagerie.org/hostsfile/hosts.txt
  hosts-file.net.update http://www.hosts-file.net/hphosts-partial.asp
  hosts.mvps http://www.mvps.org/winhelp2002/hosts.txt

I know that rabbit has the following blocking facility

---
[rabbit.filter.BlockFilter]
# This is a filter that blocks access to resources.
# return a 403 forbidden for these requests.
blockURLmatching=(\.sex\.|[-.]ad([sx]?)\.|/ad\.|adserving\.|ad101com-|pagead/imgad|as-us.falkag.net|clicktorrent.info)
---

This facility is very flexible but difficult to maintain in an ever-changing internet landscape. A better approach might be to use lists from aggregators whose mission is to keep up-to-date lists of sites that serve ads and malware.

My first naive approach at solving this problem was to augment /etc/hosts on our proxy server with lists from the above sites. I soon discovered that rabbit ignored these entries. It seems that rabbit uses dnsjava to query the DNS service directly, bypassing the system resolver, so replacing /etc/hosts does not work.

A better solution would be for rabbit to check each request against a preconfigured "block" list of sites and return a 403 error when a URL containing one of these host names is requested. It should be a pretty simple thing to query the bad host table prior to doing the DNS query. I can see two implementations of this approach.

1. rabbit reads the bad host table on startup and then keeps an internal table for lookups (our current host table has over 600K entries, so this approach should be manageable)

2. a better approach might be to query an SQL table for bad hosts prior to the lookup. This would be faster and more dynamic, since the table could be updated automatically by an external process.
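To make option 1 concrete, here is a rough Java sketch of an in-memory bad host table loaded from a hosts-format list. HostBlockList, load and isBlocked are hypothetical names of my own and not part of Rabbit; where such a check would hook into Rabbit's request handling is left open. It assumes that list entries point at 127.0.0.1 or 0.0.0.0.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not part of Rabbit): in-memory table of blocked host names. */
final class HostBlockList {
    private final Set<String> blocked = new HashSet<>();

    /** Parses hosts-file lines such as "127.0.0.1 ads.example.com # comment". */
    static HostBlockList load(Reader source) throws IOException {
        HostBlockList list = new HostBlockList();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            // Strip trailing comments, then split on whitespace.
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            String[] fields = line.trim().split("\\s+");
            // Assumption: block lists map bad hosts to 127.0.0.1 or 0.0.0.0.
            if (fields.length < 2
                    || !(fields[0].equals("127.0.0.1") || fields[0].equals("0.0.0.0"))) {
                continue;
            }
            for (int i = 1; i < fields.length; i++) {
                String host = fields[i].toLowerCase(Locale.ROOT);
                // Never block the loopback name itself.
                if (!host.equals("localhost")) {
                    list.blocked.add(host);
                }
            }
        }
        return list;
    }

    /** Case-insensitive exact match on the host name; no wildcard or subdomain matching. */
    boolean isBlocked(String host) {
        return host != null && blocked.contains(host.toLowerCase(Locale.ROOT));
    }

    int size() {
        return blocked.size();
    }
}
```

The proxy would call isBlocked(host) with the host part of the requested URL before any DNS lookup and answer 403 on a match. A hash set lookup is constant time, so a table of several hundred thousand entries should mainly cost memory.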

I think that adding this facility to Rabbit should be pretty easy and quite valuable to the community. Those of us using rabbit are mostly running in bandwidth-limited environments, and what better way to save bandwidth than to strip out ads? This approach also has the benefit of protecting users from known malware sites.

Anyone have any thoughts on this?

Thanks,
--luis

--
Luis Soltero, Ph.D., MCS
Director of Software Development, CTO
Global Marine Networks, LLC
StarPilot, LLC
Tel: 865-379-8723
Fax: 865-681-5017
E-Mail: lso...@gl...
Web: http://www.globalmarinenet.net
Web: http://www.starpilotllc.com