Allow a text pre-filter

Fast Bayesian spam filter along lines suggested by Paul Graham

Brought to you by: m-a

#32 Allow a text pre-filter

Milestone: None

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2019-01-25

Created: 2016-02-21

Creator: fluffy

Private: No

I would like Bogofilter to be able to run a pre-filter on a message and perform its analysis on the filtered text, while still adding the X-Bogosity header to the original, unfiltered message.

My use case: I receive a lot of legitimate email with HTML and CSS, and I also receive a lot of spam with HTML and CSS. Bogofilter currently has a hard time telling them apart, because in both kinds of message the vast majority of the body is markup, and the markup tokens alone carry nothing that distinguishes spam from ham. For example, bogofilter -vvv on a spam that was classified as 'Ham' with a spamicity of 0.000000 shows these among the dominating tokens:

  "border-collapse"                  1523  0.197200  0.003752  0.018676 +
  "collapse"                         1548  0.199867  0.004179  0.020482 +
  "ms-text-size-adjust"              1123  0.142400  0.004690  0.031892 +
  "webkit-text-size-adjust"          2001  0.252533  0.009125  0.034876 +
  "padding-right"                    2432  0.236400  0.056200  0.192072 +

All of these tokens appear in legitimate email as well. In effect, spammers are being rewarded for using a lot of fancy CSS formatting.
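To illustrate the kind of pre-filter this use case calls for, here is a minimal Python sketch that reduces an HTML body to its visible text, so that CSS property names such as "border-collapse" never reach the tokenizer. The names `VisibleTextExtractor` and `strip_markup` are hypothetical and not part of bogofilter; the sketch assumes the HTML part has already been decoded to a string.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <style> and <script>."""

    SKIPPED = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # visible text fragments, in document order

    def handle_starttag(self, tag, attrs):
        # Attributes (including inline style="...") are simply ignored.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def strip_markup(html):
    """Hypothetical pre-filter: return only the visible text of an HTML string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Whether stripping markup actually improves classification would need to be measured; this is one candidate pre-filter, not a recommendation.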

Allowing an arbitrary number of pre-filters would make it possible to experiment with additional approaches to improving bogofilter's spam filtering without tainting its Bayesian core. While this is technically possible with bogofilter as it is, building and maintaining the procmail rules is unwieldy and error-prone, which is why I would prefer to have this pipeline functionality in bogofilter itself.
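For reference, the requested pipeline can be approximated today with an external wrapper. The Python sketch below is an assumption-laden illustration, not an existing bogofilter feature: `classify_with_prefilters` and `bogofilter_verdict` are hypothetical names, the wrapper assumes `bogofilter -v` prints a single `X-Bogosity: ...` line on standard output and exits with 0, 1, or 2 for spam, ham, or unsure, and re-serializing the message through Python's `email` package may not reproduce the original byte for byte.

```python
import subprocess
from email import message_from_bytes


def bogofilter_verdict(filtered: bytes) -> str:
    """Classify the filtered text with `bogofilter -v` and return the
    text after the colon of the header line it prints (assumed format)."""
    proc = subprocess.run(["bogofilter", "-v"], input=filtered,
                          stdout=subprocess.PIPE)
    if proc.returncode not in (0, 1, 2):  # 3 signals an error
        raise RuntimeError("bogofilter failed with exit code %d" % proc.returncode)
    line = proc.stdout.decode("utf-8", "replace").strip()
    return line.partition(":")[2].strip()


def classify_with_prefilters(raw: bytes, prefilters, classify=bogofilter_verdict) -> bytes:
    """Run each pre-filter in order, classify the result, and attach the
    verdict to the ORIGINAL message rather than to the filtered copy."""
    filtered = raw
    for prefilter in prefilters:   # each pre-filter maps bytes -> bytes
        filtered = prefilter(filtered)
    verdict = classify(filtered)

    msg = message_from_bytes(raw)  # parse the unfiltered message
    del msg["X-Bogosity"]          # drop any stale header; no error if absent
    msg["X-Bogosity"] = verdict
    return msg.as_bytes()
```

A delivery agent could pipe each message through such a wrapper in place of calling bogofilter directly. Passing a different `classify` callable makes it possible to exercise the pipeline without bogofilter installed.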
