open response to developerWorks article: Pyzor

I've just read your IBM developerWorks article, "Spam filtering
techniques."  You state that Pyzor produced no false positives for you.  
I can assure you that currently in Pyzor, there is no mechanism for
preventing false positives besides manual whitelisting.  Assuming the
maintainer of the server you are using for Pyzor will correct false
positives is, *ahem*, an assumption :) Some sort of trust mechanism like
Razor has is being thought about for an upcoming release, but no specifics
yet.

The only reason I can suggest that Pyzor gave you no false positives is
that the community is quite small and, if you will, "sophisticated".

Also, the sentence "These tools use clever statistical techniques for
creating digests, so that spams with minor or automated mutations (or just
different headers resulting from transport routes) do not prevent
recognition of message identity" flatters Pyzor :P It's probably not as
clever or 'statistical' as you might think :)
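
For illustration only, here is a minimal sketch of how a plain,
non-statistical digest can still ignore transport headers and trivial
whitespace changes.  This is NOT Pyzor's actual digest algorithm; the
function name body_digest and the normalization rules (drop headers, drop
whitespace, lowercase, SHA-1) are assumptions made up for this example.

```python
import hashlib
from email import message_from_string


def body_digest(raw_message):
    """Hash a normalized copy of the body of a single-part message.

    Hypothetical scheme for illustration, not Pyzor's real algorithm:
    headers are ignored, whitespace is removed, and case is folded.
    """
    msg = message_from_string(raw_message)
    body = msg.get_payload()
    if not isinstance(body, str):
        raise ValueError("this sketch only handles single-part messages")
    # Remove all whitespace inside each line, then drop lines left empty.
    squeezed = ["".join(line.split()) for line in body.splitlines()]
    normalized = "".join(line.lower() for line in squeezed if line)
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


# Same body, different headers and spacing.
first = "Received: from a.example\nSubject: hi\n\nBuy  now!\nClick here\n"
second = "Received: from b.example\nSubject: hi\n\nBuy now!\n\nclick here\n"
# A genuinely different body.
third = "Received: from a.example\nSubject: hi\n\nLunch on Friday?\n"

same = body_digest(first) == body_digest(second)
different = body_digest(first) != body_digest(third)
```

A digest like this survives header and spacing differences, but any real
change to the wording produces a different hash, which is the sense in which
such a scheme is not especially clever or statistical.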

Also, as to the network 'slowness' of Pyzor, I'm a little curious as to
what would be slow about it, since it is quite lightweight, using a single
small UDP packet each way.  Could it be instead the load/compile startup
time that was an issue?  If so, then using ReadyExec in conjunction with
Pyzor might be a solution.
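
To give a feel for how cheap one small UDP packet each way is, here is a
rough sketch that times a single datagram round trip against a throwaway
echo server on the loopback interface.  It does not speak the Pyzor
protocol; the payload, the helper names and the echo server are
placeholders invented for this example, and a real query would also pay
for the network distance to the server.

```python
import socket
import threading
import time


def echo_once(sock):
    """Echo a single datagram back to its sender, then return."""
    try:
        data, addr = sock.recvfrom(1024)
        sock.sendto(data, addr)
    except socket.timeout:
        pass  # nothing arrived in time; let the client report the failure


def udp_round_trip(payload=b"placeholder query", timeout=2.0):
    """Return (reply, seconds) for one request/response pair over UDP."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(timeout)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(timeout)
    try:
        start = time.perf_counter()
        client.sendto(payload, server.getsockname())
        reply, _ = client.recvfrom(1024)
        elapsed = time.perf_counter() - start
    finally:
        client.close()
        worker.join()
        server.close()
    return reply, elapsed


reply, elapsed = udp_round_trip()
print("round trip took %.6f seconds" % elapsed)
```

Comparing a figure like this with the wall-clock time of a full client
invocation is one way to see whether the time goes to the network or to
interpreter startup.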

As to an ISP using the Pyzor public server, I highly, highly discourage
such behaviour.  Pyzor, unlike Razor, comes with a server, which I
recommend that ISPs use in isolation.  The box the public server is
sitting on isn't set up for ISP-level loads.  Running in isolation, of
course, reduces the effectiveness of Pyzor, but I am planning on
implementing a batch-server-peering mechanism.

Of course, though, thank you for writing about Pyzor.

-- 
Frank Tobin			http://www.neverending.org/~ftobin/