[Filterproxy-devel] Re: stupid questions concerning filterproxy
From: Bob M. <mce...@dr...> - 2002-07-16 23:21:09
Andreas Banze [an...@ba...] wrote:
> I'm neither a perl nor a http-crack, so I'm mainly searching for your
> help/advice:
>
> I need a proxy that allows me to check binary data for viruses. It seems
> that filterproxy is the right toy to do it.
>
> Problem: Because I'm not a http crack (and I didn't find it in the docs I
> read up to now): The mime type _should_ represent the correct description
> for the content, right? So if the content is identified by the mimeheader as
> text/html it should not be usable as binary when downloaded?

Well, remember that if you're worried about downloading malicious code from a malicious server, the mime-types can be set to anything by the server's operator. Heck, they could be "cartoon/mickeymouse".

I presume you're also worried about stupid users, who may download something, save it to disk, and manually execute it. Then it doesn't matter what the mime-type was. By the time it's saved to disk the mime-type is lost.

BTW identifying a windoze executable is easy. See the 'file' command under any good unix. A FilterProxy module could check for the signature of an executable first, and pass through everything that is not executable. (should be fast)

> Or is the mimetype only a hint that may be overridden by the extension of
> the file (probably it'll not work in the browser but with "save as" the file
> is usable)?

No browser should pay attention to the extension of the file. This is a bad, bad violation of the HTTP spec, and such browsers should be burned at the stake. There is NO INFORMATION in the URL about the file type. That's what the Content-Type and Content-Encoding headers are for. I believe IE uses extensions. Not sure about Netscape 4.x but I wouldn't be surprised. (But we're all using Mozilla anyway, right?!?!?!?! ;)

> If the first is correct then writing a filter for filterproxy to scan all
> binaries for viruses would be correct.
> If the second is correct I would need
> to add overhead by checking the filetype of every transferred file.
> (Overhead should be minimal but unfortunately I have many users).

You could write a filter to scan binaries for viruses. This would be pretty simple. Don't trust the mime-type, filter ALL data that comes through. (I think one famous IE hack is a mime-type text/html with an extension .exe -- IE stupidly executes it) But I've never really used windows, and I'm no expert on viruses.

The hard part is maintaining a database of signatures for virus binaries. Do you have a ready source for such a thing? (but see below)

> I didn't dig too much in your sources, but I think it is possible to
> exchange the mimeheader and the content while the proxy works (e.g. to
> exchange the binary header and the binary file with a webpage that states
> you've got a virus).

This should be no problem. Change the response to some HTML, or maybe send a redirect.

> So for the second question: How good is filterproxy in the means of load and
> scalability? Is it mature enough to be used in such a way or should I stop
> dreaming and get viruswall or another expensive content filtering system?

I don't know. As far as HTML filtering goes, I would not recommend FilterProxy for large deployment. Filtering HTML is a CPU intensive task (consider that it can take your browser ~seconds to load a complex page). The majority of the load comes from parsing HTML. On my computer FilterProxy takes ~0.05 seconds on most pages (about 20 pages per second). So do the math for your situation; I think 20 users would be a practical maximum on this machine. (That's on my 500MHz alpha -- recent AMD/Intel PCs are *much* faster) 99.9% of the time it takes is parsing HTML.

Now, just checking for virus signatures could be much faster than this. But you should benchmark the signature-checking code to evaluate whether FilterProxy would be fast enough. The HTTP proxy portion of FilterProxy is very fast, but I have never benchmarked it with high load.
Maybe turn off all filtering, and use Apache's ab (ApacheBench) or something similar to test it. (If you do any testing, please copy me the results ;)

> thanks in advance for any (even short) answer (sorry for not using the
> mailinglist but I already listed on too many of them and for one question
> it's a little bit too much of an effort - I promise that I'll subscribe for
> further questions.

No biggie. ;) I'm the only one that sends stuff to that list anyway.

Here's something a web search turned up:
http://www.amavis.org/
It's a virus/mail scanner in perl. Shouldn't be hard to write a FilterProxy module that uses it. From a FilterProxy module, you can get the entire document in a $scalar.

If you want to write such a module, take a look at html/Skeleton.html and FilterProxy/Skeleton.pm, which are heavily commented and should give you a starting point.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
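
As a rough illustration of the "check for the signature of an executable first" idea from the message: DOS and Windows executables start with the two bytes "MZ", and PE files additionally store, at offset 0x3C, the position of a "PE\0\0" marker. The sketch below is in Python rather than Perl, works on the raw document bytes, and does not use FilterProxy's module API; both function names are hypothetical.

```python
import struct


def looks_like_dos_executable(data: bytes) -> bool:
    """Cheap first check: DOS/Windows executables begin with b"MZ"."""
    return data[:2] == b"MZ"


def looks_like_pe_executable(data: bytes) -> bool:
    """Stricter check: follow the header pointer to the PE signature."""
    # The DOS header is 0x40 bytes; the last 4 bytes (offset 0x3C)
    # hold the little-endian file offset of the PE header.
    if not looks_like_dos_executable(data) or len(data) < 0x40:
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # Slicing past the end just yields a short string, so a bogus
    # offset fails the comparison instead of raising.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```

Only documents for which the cheap check returns True would need to be handed to the slower virus scanner. This covers only the executable formats named here, so anything else (archives, scripts, macro documents) would still pass through unscanned.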