Re: what should be whitelisted?

On (09-09-04 21:57), Alpana Weaver wrote:
> What types of emails should be whitelisted?  I assume that there is no 
> value in me whitelisting for example a personal email from my sister of 
> which I am the only recipient because no one else will receive that same 
> email?  If that is correct, do we only whitelist hams that have many 
> recipients e.g. the weekly emails sent from 
> Mar...@mo... where recipients have elected to 
> subscribe to the emails?
> I suppose my broader question is: how should spam and ham be defined for 
> the use of Pyzor?

From my point of view there are two reasons why someone would whitelist
a message.

The first (and maybe the more important one) is when a spam mail has been
correctly reported as spam, but its digest is not meaningful and therefore
also matches the digest of ham mails. Check this thread [1] for an example.
Whitelisting the digest then stops those ham mails from being flagged.
Here I assume that a low false positive rate is more important than a
high true positive rate.
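
To make the first case concrete, here is a minimal Python sketch of how a
digest can end up "not meaningful". It is a toy stand-in, not pyzor's real
digest algorithm: the function toy_digest and its normalization rules
(strip URLs, mail addresses and whitespace, then hash with SHA-1) are
assumptions made up for illustration.

```python
import hashlib
import re

# Hypothetical normalization rules, for illustration only.
# This is NOT pyzor's actual digest algorithm.
URL = re.compile(r"https?://\S+")
EMAIL = re.compile(r"\S+@\S+")
WHITESPACE = re.compile(r"\s+")


def toy_digest(body: str) -> str:
    """Hash the body after stripping parts that vary between messages."""
    text = URL.sub("", body)
    text = EMAIL.sub("", text)
    text = WHITESPACE.sub("", text)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# A spam mail and a ham mail that each consist of nothing but a link.
spam = "http://example.com/buy-now\n"
ham = "https://example.org/holiday-photos\n"

# Both bodies normalize to the empty string, so the digests collide.
collision = toy_digest(spam) == toy_digest(ham)
```

Under these assumptions, reporting the spam mail also marks the ham mail,
because nothing distinctive is left to hash. Whitelisting that one digest
is what protects the ham mail.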

The second case is that someone reported (by accident or not) a ham message
as spam. As it is not easy to remove reported messages from the server,
someone has to whitelist this message so that it no longer hits
pyzor. This may be the case for bulk mails, for example.
Unfortunately it is very likely that these messages have already hit pyzor
for your users by the time you whitelist them.
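
Both cases rely on a whitelist entry overriding earlier reports instead of
deleting them. A rough sketch of that idea, as an in-memory toy model: the
names ToyServer, Counts and is_hit are hypothetical, and the decision rule
(at least one report and no whitelist entry) is an assumption, since real
clients may apply their own thresholds.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    reports: int = 0
    whitelists: int = 0


class ToyServer:
    """Toy stand-in for a digest database; not the real pyzord."""

    def __init__(self) -> None:
        self._db: dict[str, Counts] = {}

    def report(self, digest: str) -> None:
        self._db.setdefault(digest, Counts()).reports += 1

    def whitelist(self, digest: str) -> None:
        self._db.setdefault(digest, Counts()).whitelists += 1

    def check(self, digest: str) -> Counts:
        # Unknown digests simply have zero reports and zero whitelists.
        return self._db.get(digest, Counts())


def is_hit(counts: Counts) -> bool:
    """Assumed rule: reported at least once and never whitelisted."""
    return counts.reports > 0 and counts.whitelists == 0


server = ToyServer()
server.report("digest-of-newsletter")       # ham reported by accident
hit_before = is_hit(server.check("digest-of-newsletter"))

server.whitelist("digest-of-newsletter")    # report stays, but is overridden
hit_after = is_hit(server.check("digest-of-newsletter"))
```

Note that the report count is still there after whitelisting; only the
verdict changes. Any copy checked between the report and the whitelisting
was still flagged.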

As the messages from your sister should be unique, there is no need to
whitelist them; you are right about that. Whitelisting them would just
generate useless entries in pyzord's database.

From my point of view, whitelisting messages from bulk senders that have not
been reported yet does not make sense either. By the time you find such a
message in your mailbox, it has very likely already been delivered to
all recipients. So a whitelist entry in the database would be useless
too. Hopefully next month's message will differ from the current
one ;-)

If you do get recurring messages with the same digest, it may be a good idea
to whitelist those, just to guard against accidental reporting (reporting
would no longer have any effect after that).
Anyhow, I can't see any use case for this.

There may be another reason for whitelisting messages. Assuming that
pyzor checks are done _before_ the mail goes into SpamAssassin, it could
save a lot of system resources if a whitelisted message bypasses
SpamAssassin. But I am not sure whether this actually makes sense in the
real world.
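
The ordering I have in mind could be sketched like this. It is only an
outline of the idea: classify, is_whitelisted and expensive_filter are
hypothetical names, and the two callables stand in for a pyzor whitelist
lookup and a full SpamAssassin scan respectively.

```python
from typing import Callable


def classify(
    message: str,
    is_whitelisted: Callable[[str], bool],
    expensive_filter: Callable[[str], bool],
) -> str:
    """Run the cheap whitelist lookup first; scan only if it misses."""
    if is_whitelisted(message):
        return "ham"  # the expensive scan is skipped entirely
    return "spam" if expensive_filter(message) else "ham"


scans = []  # records which messages reached the expensive filter


def fake_scan(message: str) -> bool:
    scans.append(message)
    return "viagra" in message.lower()


known_good = {"weekly newsletter"}

first = classify("weekly newsletter", known_good.__contains__, fake_scan)
second = classify("Cheap VIAGRA here", known_good.__contains__, fake_scan)
```

Whether this pays off depends on how many messages are actually
whitelisted and on how expensive the pyzor lookup itself is.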

Good luck with your paper. I would like to read it once it's finished.

[1]
http://sourceforge.net/mailarchive/message.php?msg_name=1248706984.24454.17.camel%40werner