Hi. Why do you lowercase words before adding them to the
databases? It can lower the spam detection probability,
because spam very often capitalizes (i.e. emphasizes) words,
which normal mail does not.
Logged In: YES
The decision to lower case words was made long ago, before I
got involved with bogofilter. I view it as a trade-off of
accuracy vs speed and wordlist size.
If bogofilter were case-sensitive, then the wordlist would
likely contain "The", "the", and "THE", which is (perhaps) a
bit much. Evaluating a message with the three
capitalizations would require three database accesses.
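
To make the trade-off concrete, here is a rough Python sketch. It is not bogofilter's actual tokenizer; the `tokenize` function, the regular expression, and the sample message are all made up for illustration. It simply counts how many distinct tokens (and therefore wordlist lookups) a message needs with and without case folding.

```python
import re

def tokenize(text, fold_case=True):
    """Hypothetical tokenizer: split text into alphabetic words,
    optionally lowercasing them (not bogofilter's real lexer)."""
    words = re.findall(r"[A-Za-z]+", text)
    return [w.lower() for w in words] if fold_case else words

# Made-up sample message containing three capitalizations of "the".
message = "The offer: THE best deal, the only deal"

# Each distinct token costs one wordlist lookup in this sketch.
folded = set(tokenize(message, fold_case=True))
preserved = set(tokenize(message, fold_case=False))

print(len(folded), sorted(folded))
print(len(preserved), sorted(preserved))
```

With case preserved, "The", "THE", and "the" are three separate entries, so both the wordlist and the number of lookups per message grow.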
Along similar efficiency lines, bogofilter ignores
repetitions of a word in a message. One could argue that a
message that says "sex, sex, sex" is spammier than one that
simply says "sex".
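
A small Python sketch of the difference, again purely illustrative (the helper names `unique_tokens` and `token_counts` are assumptions, not bogofilter functions): with a set of unique tokens the two messages look identical, while a per-token count would tell them apart.

```python
import re
from collections import Counter

def words(text):
    """Hypothetical helper: lowercase alphabetic words in the text."""
    return [w.lower() for w in re.findall(r"[A-Za-z]+", text)]

def unique_tokens(text):
    """Repetitions ignored: each word is kept only once."""
    return set(words(text))

def token_counts(text):
    """Alternative that keeps how often each word occurs."""
    return Counter(words(text))

repeated = "sex, sex, sex"
single = "sex"

# Identical when repetitions are ignored.
print(unique_tokens(repeated) == unique_tokens(single))

# Distinguishable when counts are kept.
print(token_counts(repeated)["sex"], token_counts(single)["sex"])
```

Keeping counts would mean more bookkeeping per message, which is the efficiency cost mentioned above.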
Anyhow, bogofilter is case-insensitive and is likely to stay
that way. If you are seriously interested, modify your copy
to preserve case and run some tests to see if it does better
than the released version.
Also, I suggest you subscribe to the mailing list by sending
a message to "firstname.lastname@example.org".