#32 Allow a text pre-filter

Milestone: None
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2019-01-25
Created: 2016-02-21
Creator: fluffy
Private: No

I would like Bogofilter to be able to run a pre-filter on a message, perform its analysis on the filtered result, and then add the X-Bogosity header to the original, unfiltered message.

My use case: I receive a lot of legitimate email with HTML and CSS, and I also receive a lot of spam with HTML and CSS. Bogofilter currently has a hard time telling them apart, because in both kinds of message the vast majority of the body is markup, which by itself says nothing about whether the message is spam. For example, bogofilter -vvv on a spam that was marked as 'Ham' with a spamicity of 0.000000 shows these among the dominating tokens:

  "border-collapse"                  1523  0.197200  0.003752  0.018676 +
  "collapse"                         1548  0.199867  0.004179  0.020482 +
  "ms-text-size-adjust"              1123  0.142400  0.004690  0.031892 +
  "webkit-text-size-adjust"          2001  0.252533  0.009125  0.034876 +
  "padding-right"                    2432  0.236400  0.056200  0.192072 +

All of these tokens appear in legitimate email as well, so spammers are effectively being rewarded for using a lot of fancy CSS formatting.

The ability to chain an arbitrary number of pre-filters would make it easier to experiment with additional approaches to improving bogofilter's spam filtering, without tainting bogofilter's Bayesian core. While it's technically possible to do this with bogofilter as it is, building and maintaining the procmail rules is unwieldy and error-prone, which is why I would prefer to have this pipeline functionality as part of bogofilter itself.
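To make the request concrete, here is a rough sketch of the external workaround as a single Python script instead of procmail rules. It is not how bogofilter works internally and not a proposed implementation. The function names (strip_markup, prefilter, classify, add_header, tag_message) are made up for illustration. It assumes that running bogofilter -v on a message from stdin prints one X-Bogosity line to stdout and exits with 0, 1 or 2 for spam, ham or unsure.

```python
import email
import subprocess
from email import policy
from email.message import EmailMessage
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <style> and <script> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        # Inline style="..." attributes are dropped too, since only
        # text nodes are ever collected.
        if tag in ("style", "script"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def strip_markup(html: str) -> str:
    """Hypothetical pre-filter step: reduce HTML to its visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def prefilter(raw: bytes) -> bytes:
    """Build a small text-only copy of the message for scoring."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    texts = []
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/plain":
            texts.append(part.get_content())
        elif ctype == "text/html":
            texts.append(strip_markup(part.get_content()))
    filtered = EmailMessage()
    for name in ("From", "Subject"):
        if msg[name] is not None:
            filtered[name] = str(msg[name])
    filtered.set_content("\n".join(texts))
    return filtered.as_bytes()


def classify(filtered: bytes) -> str:
    """Score the filtered copy; assumes `bogofilter -v` prints the header line."""
    proc = subprocess.run(["bogofilter", "-v"], input=filtered,
                          stdout=subprocess.PIPE)
    lines = proc.stdout.decode("ascii", "replace").splitlines()
    if proc.returncode not in (0, 1, 2) or not lines:
        raise RuntimeError("bogofilter failed or printed nothing")
    return lines[0]


def add_header(raw: bytes, header_line: str) -> bytes:
    """Append one header line to the ORIGINAL message's header block."""
    eol = b"\r\n" if raw.split(b"\n", 1)[0].endswith(b"\r") else b"\n"
    line = header_line.encode("ascii", "replace")
    head, sep, body = raw.partition(eol + eol)
    if not sep:  # headers only, no body
        return raw.rstrip(b"\r\n") + eol + line + eol
    return head + eol + line + sep + body


def tag_message(raw: bytes) -> bytes:
    """Score the filtered copy, but tag the original message."""
    return add_header(raw, classify(prefilter(raw)))
```

A delivery script could call tag_message on the raw bytes read from stdin and write the result to stdout. For this to help, training would presumably have to go through the same prefilter (for example, feeding the filtered copy to bogofilter -s or -n), otherwise the CSS tokens would still end up in the wordlist.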
