Menu

#115 bogofilter tokenizes parts from a MIME body's base64 stream

v1.0_(example)
open-fixed
None
5
2020-05-07
2013-11-11
No

Matt Garretson assembly.state.ny.us!mattg reported to the bogofilter user's list on 2013-10-30 what might be a MIME decoder bug in 1.2.4 at first glance. In order to retain his information while the pastebin data is about to expire in two days, I am uploading his data here for later inspection.

I occasionally see binary attachments that bogofilter (1.2.4) will tokenize, resulting in tons of BASE64 string fragments in the token stream. Here's an example message (cleaned a bit):

http://pastebin.com/FEvjRp9F (line 38+)
(that is the first nonempty line below application/applefile)

and the output of bogolexer on it:

http://pastebin.com/tabnXZSP (line 94+)
(that is the line starting with get_token: 1 "AAEW")

Since the attachments are not of a text type, I would expect bogofilter to skip over them.

Any ideas what is happening here? Is there a debug flag that would help narrow it down?

Thanks...

2 Attachments

Discussion

  • Matthias Andree

    Matthias Andree - 2020-05-07
    • status: open --> open-fixed
     
  • Matthias Andree

    Matthias Andree - 2020-05-07

    Turns out that the MIME boundaries are in violation of RFC2046 which does not comprise @ as valid character, so the lexer would not detect the MIME boundaries at all. Extending src/lexer_v3.l's definitions. Fixed by commit a88322e8.

     

Log in to post a comment.