Matt Garretson assembly.state.ny.us!mattg reported to the bogofilter user's list on 2013-10-30 what might be a MIME decoder bug in 1.2.4 at first glance. In order to retain his information while the pastebin data is about to expire in two days, I am uploading his data here for later inspection.
I occasionally see binary attachments that bogofilter (1.2.4) will tokenize, resulting in tons of BASE64 string fragments in the token stream. Here's an example message (cleaned a bit):
http://pastebin.com/FEvjRp9F (line 38+)
(that is the first nonempty line below application/applefile)and the output of bogolexer on it:
http://pastebin.com/tabnXZSP (line 94+)
(that is the line starting with get_token: 1 "AAEW")Since the attachments are not of a text type, I would expect bogofilter to skip over them.
Any ideas what is happening here? Is there a debug flag that would help narrow it down?
Thanks...
Turns out that the MIME boundaries are in violation of RFC2046 which does not comprise
@
as valid character, so the lexer would not detect the MIME boundaries at all. Extending src/lexer_v3.l's definitions. Fixed by commit a88322e8.