From: <ia...@ha...> - 2001-12-28 02:18:02
On Thu, Dec 27, 2001 at 09:03:13PM -0500, Billy Harvey wrote:
> Ian, I've been using Razor about a month now, and it's been quite
> effective. I didn't think to implement any kind of hit-counter at my
> site, but I'd say it's catching about 75% of spam.

Yes, hopefully it is very effective right now, but spammers are getting
smarter every day ;-)

> > If Razor uses a hash to detect spam, couldn't this be easily
> > circumvented by adding some randomness (an extra /n here, an extra
> > whitespace there, or even just some random variations in the text
> > ("Hello Friend", "Dear Friend", "Howdy", "Hi" etc.)) to each spam?
>
> Yes it can be circumvented just that way. There is a proposal I read in
> the archives to use a hash that varies only slightly with slight changes
> in content. Then a "score" could be returned to you and you could
> choose to ignore anything that's within some value of known spam.

I have had similar ideas, although I suspect that it might be difficult
to create such a hash. One way is to use an algorithm that determines
"edit distance" (at least one exists): it calculates the number of
changes, insertions, and deletions needed to turn one string into
another. Unfortunately it is quite processor intensive, and we would
lose the benefits of hashes altogether, as entire messages would need to
be passed around.

A few years ago, while studying artificial intelligence, I wrote a
simple piece of software which could be given two sets of strings and
produce a regular expression that sorted strings into one of these two
sets. The assumption was that the regular expression would be
sufficiently general to sort other strings which were not part of the
test set. It was actually quite successful and could easily be applied
to spam, although there would be a danger of an increased likelihood of
false positives.

Still, I think that ultimately spam filters will have to get smarter
about how they detect spam.
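
To make the edit-distance idea above concrete, here is a minimal sketch
in Python. It is only an illustration: the function name edit_distance
and the two sample strings are assumptions made for this example and are
not part of Razor. It uses the classic dynamic-programming (Levenshtein)
formulation, and also shows that an ordinary cryptographic hash gives
unrelated digests for two nearly identical messages.

```python
import hashlib


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep a match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical sample messages that differ only in the greeting.
spam_a = "Hello Friend, buy now"
spam_b = "Dear Friend, buy now"

# An exact hash treats the two variants as completely different messages.
digest_a = hashlib.sha1(spam_a.encode("utf-8")).hexdigest()
digest_b = hashlib.sha1(spam_b.encode("utf-8")).hexdigest()
print(digest_a == digest_b)

# The edit distance stays small relative to the message length.
print(edit_distance(spam_a, spam_b), "edits out of", len(spam_a), "characters")
```

The cost is proportional to len(a) * len(b) for every pair of messages
compared, and both full texts have to be available, which is the
drawback noted above.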

> The server load is very low according to the original developer.
> There's not a debianized version of the server so I haven't tested this
> myself (I get probably 1000 emails a day here - far different than a
> 1000 tests per second for sure).

That is reassuring, although there may come a time when the cost of
operating a server does become significant. The easiest option (and one
that I would be happy with) would be to form a non-profit corporation
that operates the servers and to which people could send donations.

> I have thought about the delayed test too, but haven't tried to
> implement it. I don't think it would be too hard, as I keep my mail in
> an IMAP directory, where each mail is a file. A cron job could parse
> through the Inboxes of users and double check the unread mail every hour
> or so without much overhead.

Yeah, it should be easy. A better solution might be to have the job run
whenever someone logs in, or just before they start their mail client.

Kind regards,

Ian.

-- 
Ian Clarke                                        ia...@fr...
Founder & Coordinator, The Freenet Project    http://freenetproject.org/
Chief Technology Officer, Uprizer Inc.           http://www.uprizer.com/