pyzor-users Mailing List for Pyzor (Page 12)

> 1) how can the server get a wl count of 10000 at the first place for
> an obvious piece of spam? (I have a handle of similar emails that
> falls into the same case);

This digest (778941d994b5281bf5652cd293a2761421cc109d) is a special
case.  Ticket #1037314 deals with this case.  Basically, the only
content that Pyzor finds to use for digesting for some types of
message is:

# pyzor predigest < ~/message.eml
<!DOCTYPEHTMLPUBLICHTML4.0

Obviously, this isn't text that would be unique to a message.  Until
that ticket is resolved, both ham and spam can end up with the same
digest.  With a classifier like Pyzor (where the digests are meant to
be unique), it is many times worse to get a false positive than a
false negative.  For that reason, I manually set the whitelist count
to 10,000 for this one digest, so that until the ticket is resolved,
messages of this type will never be classified as spam.  That means
that a few spam messages will be missed, but no ham will be
incorrectly classified as spam, which is vastly more important.
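
To illustrate the collision, here is a minimal sketch. It assumes the digest is simply a SHA-1 hash of the predigest text (the 40-hex-digit digest above suggests SHA-1, but the real normalisation and hashing steps are more involved). The helper name toy_digest is hypothetical and is not part of the Pyzor API, and the sketch does not try to reproduce the exact digest quoted above.

```python
import hashlib


def toy_digest(predigest_text: str) -> str:
    """Hash already-normalised text (hypothetical helper, not Pyzor's API)."""
    return hashlib.sha1(predigest_text.encode("utf-8")).hexdigest()


# Assumption: normalisation leaves both messages with only this fragment,
# as in the "pyzor predigest" output shown above.
ham_predigest = "<!DOCTYPEHTMLPUBLICHTML4.0"
spam_predigest = "<!DOCTYPEHTMLPUBLICHTML4.0"

ham_digest = toy_digest(ham_predigest)
spam_digest = toy_digest(spam_predigest)

# Identical input text gives identical digests, so the server cannot tell
# the ham message from the spam message.
print(ham_digest == spam_digest)
```

Whatever the real hashing details are, the point is the same: if two messages reduce to the same predigest text, they share one digest and one pair of counts on the server.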

> 2) the client seems to override the end result with even a whitelist
> count of 1, judging from the source code.

That's correct - this was also the case in 0.4 - I believe it has been
true ever since Frank originally added the whitelist functionality.

That's a decade before my time, but my guess would be that he felt
that the whitelist functionality was necessary, but wanted to ensure
that existing tools (perhaps the SpamAssassin plugin) continued to
work.  For example, the current SpamAssassin plugin (which could well
be the same code as when 0.4 was released) ignores the whitelist count
completely.  That means that unless the hit count is adjusted, the
whitelisting would have no effect.  Since authentication is required
for the whitelist command, and a false positive is vastly worse
(especially with a hash-based classifier) than a false negative, it
seems a reasonable choice.
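
As a rough sketch of the behaviour described above (not the actual client source; the function name client_check_result is hypothetical), the override amounts to something like this:

```python
def client_check_result(hit_count: int, whitelist_count: int) -> int:
    """Return the hit count a caller would see after the whitelist override.

    Hypothetical illustration: any non-zero whitelist count wins, so the
    reported hit count drops to zero and the message is not treated as spam.
    """
    if whitelist_count > 0:
        return 0
    return hit_count


# A single whitelist report is enough to override many spam reports.
print(client_check_result(hit_count=500, whitelist_count=1))
print(client_check_result(hit_count=500, whitelist_count=0))
```

A tool that only looks at the hit count therefore still honours the whitelist, without needing to know that the whitelist count exists.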

Looking forward, my feeling (as outlined on the list previously), is
that adding a new command ("score"), which combined the hit and
whitelist counts to produce a 0-1 score, would be a useful addition.
This would allow a more refined use of the two counts.  I don't think
it's right to adjust the current behaviour of the "check" command,
since it has behaved that way for so long.  If users wish to make use
of the individual counts, they can either do a check command
without using the standard pyzor client (since it is the client that
overrules the hit count, not the server), or use the info command and
parse the result accordingly.
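
To make the "score" idea concrete, here is one possible way to combine the two counts. The formula is purely an assumption for illustration (no formula has been settled on), and the function name score is hypothetical:

```python
def score(hit_count: int, whitelist_count: int) -> float:
    """Combine hit and whitelist counts into a 0-1 spam score.

    Assumed formula: the fraction of all reports that were spam reports.
    0.0 means "no evidence of spam", 1.0 means "only spam reports".
    """
    total = hit_count + whitelist_count
    if total == 0:
        # Digest never reported either way: treat as no evidence of spam.
        return 0.0
    return hit_count / total


print(score(30, 10))     # mostly reported as spam
print(score(5, 10000))   # heavily whitelisted, score close to 0
print(score(0, 0))       # unknown digest
```

With a formula like this, a whitelist count of 10,000 pushes the score close to zero without a hard override, and a caller can choose its own threshold.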

Cheers,
Tony

pyzor-users — general discussion of Pyzor and Pyzor-related topics