An idea for handling random word generators that try to
trick spam filters: count the number of unusual
(unrecognized) words in the subject line and in each
line of the message body, and include the resulting
statistics in the Bayesian score. That way, the random
words would give the spam away instead of hiding it,
since it is spam, not ham, that tends to have message
lines full of highly unusual words. To decide whether a
word is unusual, one would need a dictionary, or could
use the words already stored in the ham/spam databases.
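A minimal sketch of the idea in Python, assuming a simple regex tokenizer and a vocabulary built from previously classified ham/spam messages. All function names are hypothetical. The step of feeding the statistics into the Bayesian score is also an assumption: each per-line ratio is bucketed into a synthetic token (e.g. "line-unusual:10") that the filter can then learn like any ordinary word.

```python
import re

# Hypothetical tokenizer: lowercase runs of letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def tokenize(text):
    return WORD_RE.findall(text.lower())


def build_vocabulary(messages):
    """Collect every word seen in previously classified ham/spam messages."""
    vocab = set()
    for msg in messages:
        vocab.update(tokenize(msg))
    return vocab


def unusual_ratio(line, vocab):
    """Fraction of words in `line` not found in `vocab` (0.0 for empty lines)."""
    words = tokenize(line)
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in vocab)
    return unknown / len(words)


def unusual_word_tokens(subject, body, vocab):
    """Turn per-line unusual-word ratios into synthetic tokens.

    Each ratio is bucketed into tenths (0..10) so a Bayesian filter can
    treat it as an extra token; this bucketing is an assumed design choice.
    """
    tokens = [f"subject-unusual:{int(unusual_ratio(subject, vocab) * 10)}"]
    for line in body.splitlines():
        if tokenize(line):  # skip lines with no words at all
            tokens.append(f"line-unusual:{int(unusual_ratio(line, vocab) * 10)}")
    return tokens


if __name__ == "__main__":
    vocab = build_vocabulary(["cheap meds buy now", "meeting notes for monday"])
    body = "notes for monday\nzorblax quindle frumious"
    print(unusual_word_tokens("buy now", body, vocab))
    # ['subject-unusual:0', 'line-unusual:0', 'line-unusual:10']
```

Swapping in a real dictionary only changes how `vocab` is filled; the per-line counting stays the same.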