2002-09-17 13:22:18 UTC
I have two mails from Paypal that I wanted to
receive, but they were classified as spam. Even
after rebuilding the rating file with these two
mails included as "good" mail, they are still
classified as spam.
I analyzed this problem and I think I found the
reason:
Most of my spam mails are HTML mails. Most of my
good mails are non-HTML, plain mails. The Paypal
mails are - unlike most good mails - HTML mails.
The text itself contains more than enough tokens
to distinguish this mail from spam - BUT Bayespam
rates several HTML tokens as "interesting", with
a high "spamminess" (because they are so common
in spam). These crowd out the text tokens with
low spamminess, and so the whole mail is rated
as spam.
Bottom line: As long as Bayespam treats the HTML
tokens of this mail the same way as the text
tokens themselves, it will classify these Paypal
mails as spam, at least until I receive a lot
more good HTML mails.
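
To illustrate the effect described above, here is a
minimal Python sketch. It is NOT Bayespam's actual
code. It assumes a Paul Graham style scheme: keep only
the N tokens whose spam probability is farthest from
0.5, then combine them. The function name, the value
of N, and all token probabilities are made up for the
example.

```python
# Hypothetical illustration only; not taken from Bayespam.
# Assumes a Graham-style filter that keeps the N "most interesting" tokens.

def combined_spam_probability(token_probs, n_interesting=5):
    """Combine the n tokens whose probability is farthest from 0.5."""
    interesting = sorted(
        token_probs.values(),
        key=lambda p: abs(p - 0.5),
        reverse=True,
    )[:n_interesting]
    spam = 1.0
    ham = 1.0
    for p in interesting:
        spam *= p          # evidence that the mail is spam
        ham *= 1.0 - p     # evidence that the mail is good
    return spam / (spam + ham)

# Made-up per-token spam probabilities.
html_tokens = {"<font": 0.99, "<table": 0.99, "<td": 0.98,
               "bgcolor": 0.99, "<img": 0.98}
text_tokens = {"payment": 0.20, "account": 0.30,
               "received": 0.15, "transaction": 0.25}

# HTML and text tokens rated together: the HTML tokens fill all 5 slots.
mixed = combined_spam_probability({**html_tokens, **text_tokens})

# Text tokens alone: the mail looks clearly good.
text_only = combined_spam_probability(text_tokens)

print(f"with HTML tokens: {mixed:.4f}")
print(f"text tokens only: {text_only:.4f}")
```

With these invented numbers the HTML tokens take every
"interesting" slot, so the low-spamminess text tokens
never enter the calculation at all.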
Does Bayespam need a mode where it strips out
everything between < and >, like it does now
with HTML comments?
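
A rough sketch of what such a mode could look like,
written in Python for illustration only. It assumes a
naive regular expression is good enough; strip_tags
and tokenize are hypothetical names, not functions in
Bayespam, and a ">" inside an attribute value or a
comment would confuse this pattern.

```python
import re

# Naive pattern: "<", then anything that is not ">", then ">".
# This also removes simple HTML comments such as <!-- text -->.
TAG_RE = re.compile(r"<[^>]*>")

def strip_tags(body):
    """Replace every <...> span with a space so words stay separated."""
    return TAG_RE.sub(" ", body)

def tokenize(body):
    """Split the tag-free text into word-like tokens."""
    return re.findall(r"[A-Za-z0-9$'-]+", strip_tags(body))

sample = '<font color="red">You received</font> a <b>payment</b>'
print(tokenize(sample))
```

Only the visible text would then reach the rating step,
so tag names and attributes could no longer be picked
as "interesting" tokens.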