From: Keith J. <kja...@cr...> - 2003-05-01 18:46:48
> As I stated in the previous mail, pyzor doesn't hash the entire mail, just
> a section of it after removing data that would likely contribute to
> non-uniqueness, such as whitespace, urls, and email addresses. The exact
> rules for removing suspicious elements are described in
> pyzor.client.DataDigester.

So, by browsing the source and studying the rules, I could compose a spam mail that easily defeats this system. Not to be rude or dismissive, but that makes this mostly useless. People started blocking key words like 'cock', so the spammers switched to 'c0ck'. If people start using this system, the spammers will read your source and get around it. So, rather than a real solution to spam, this sounds more like one more step in the cat-and-mouse game.

I'm sorry for wasting everyone's time, and for my inability to read Python source :)

> > The con is that it's a lot of data to transmit to the server. If you have an
> > email with lots of short words, you are actually sending more data to the
> > server than the size of the original message.
>
> Yes, it is a lot of data, and given the volume of use on the public
> server, quite impractical.

I disagree. While not as compact or quick, it will catch more spam, which is the primary goal. Google has the whole world cached; I'm sure storing spam emails is not that impractical. Besides, the server doesn't have to keep them forever. Spams that were sent out two months ago are not likely to be looked up; how many people go two months without checking their email? So, if the server keeps two months' worth, stores the hashes efficiently (for example, not duplicating them across spam mail entries), and compresses client/server traffic, I don't think it's really too impractical. I'd give up a few more bytes for better spam protection.

I just don't think pyzor is going to work for me as it stands now. And if I were writing one of those mass-mailer programs they advertise via mass mail ;) I'd design it to beat this.

I do wish you much luck with this project, though.
Anything open source for fighting spam is a Good Thing (tm).

Thanks,
Keith
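To make the trade-off in this thread concrete, here is a minimal Python sketch contrasting a single normalized whole-message digest with the per-word hashing scheme argued for above. Everything in it is an assumption for illustration: the regexes and the function names `normalize`, `whole_digest`, `word_digests`, and `similarity` are hypothetical and are not pyzor's actual `pyzor.client.DataDigester` rules. They only imitate the kinds of elements the quoted reply says are stripped (whitespace, URLs, email addresses).

```python
import hashlib
import re

# Hypothetical normalization patterns. They only imitate the kinds of
# elements the quoted reply says pyzor removes (URLs, email addresses,
# whitespace); they are NOT pyzor's actual DataDigester rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\S+@\S+")
WS_RE = re.compile(r"\s+")


def normalize(body: str) -> str:
    """Strip URLs and email addresses, the parts most likely to vary per copy."""
    return EMAIL_RE.sub("", URL_RE.sub("", body))


def whole_digest(body: str) -> str:
    """Single SHA-1 over the normalized body with all whitespace removed."""
    text = WS_RE.sub("", normalize(body))
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def word_digests(body: str) -> set:
    """One truncated SHA-1 per distinct lowercased word (the per-word scheme).

    Truncating to 8 hex characters is an arbitrary choice for compactness.
    """
    words = re.findall(r"\w+", normalize(body).lower())
    return {hashlib.sha1(w.encode("utf-8")).hexdigest()[:8] for w in words}


def similarity(a: set, b: set) -> float:
    """Jaccard overlap between two sets of word hashes (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "Buy cheap pills now at http://example.com and save big today"
    mutated = "Buy cheap pi11s now at http://example.org and save big today"

    # Changing only the URL does not affect the whole-message digest...
    print(whole_digest(original) == whole_digest(original.replace(".com", ".net")))  # True
    # ...but a one-word "c0ck"-style misspelling changes it completely.
    print(whole_digest(original) == whole_digest(mutated))  # False
    # The per-word hashes still mostly overlap: 8 of 10 distinct words match.
    print(similarity(word_digests(original), word_digests(mutated)))  # 0.8
```

The single digest is compact (one 40-character value per message) but exact-match only, so one deliberate misspelling is enough to evade it. The per-word set tolerates small edits, at the cost of one hash per distinct word. For a message made mostly of short words, that list can be larger than the message itself, which is the data-volume concern raised in the thread.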