#108 Bogofilter cannot parse msg if body is base64-encoded

Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2009-08-04
Created: 2009-03-13
Creator: Roman Trunov
Private: No

If the message text is base64-encoded (or uuencoded), bogofilter cannot parse the message. The parsed tokens are undecoded garbage.

The message has the following format:

====================================
More-Header-Lines
Content-Type: text/plain; charset="windows-1251"
Content-Transfer-Encoding: base64
Content-Disposition: inline
More-Header-Lines

BASE64-ENCODED-TEXT

The message is attached.
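
For illustration, the decoding a parser has to perform on a message of this shape can be sketched in Python with the standard `email` and `base64` modules. This is not bogofilter code, and the sample body (the single word "Привет" in windows-1251) is an assumption made up for the example, not the attached message.

```python
import base64
from email import message_from_bytes

# Assumed sample body (not the attached spam): one Cyrillic word,
# encoded as windows-1251 and then as base64.
encoded_body = base64.b64encode("Привет".encode("windows-1251"))

raw_message = (
    b'Content-Type: text/plain; charset="windows-1251"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Disposition: inline\r\n"
    b"\r\n" + encoded_body + b"\r\n"
)

msg = message_from_bytes(raw_message)

# What a tokenizer sees if the transfer encoding is ignored.
undecoded = msg.get_payload().strip()

# What it should see: base64 undone, then the bytes decoded
# with the charset declared in Content-Type.
decoded = msg.get_payload(decode=True).decode(msg.get_content_charset())

print(undecoded)
print(decoded)
```

Tokenizing `undecoded` yields base64 text rather than words, which is the "undecoded garbage" described in this report.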

After some quick debugging, it looks like a significant logic problem in the lex parser, which prefetches lines in advance. When the parser detects the "end of message header" event, the next line of the message has already been fetched and buffered by lex. Since bogofilter was still in "header" mode at the time of this fetch, the line was buffered as is, without base64 decoding.
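
The failure mode described above can be reproduced with a toy model. The Python sketch below is not bogofilter's lexer; the function `buggy_tokenize`, its one-line prefetch, and the sample lines are assumptions chosen only to show how a line fetched while still in "header" mode escapes decoding.

```python
import base64


def buggy_tokenize(lines):
    """Toy model: always fetches one line ahead of the line being examined."""
    source = iter(lines)
    in_body = False

    def fetch():
        # The decoding decision is made at fetch time, using the current mode.
        line = next(source, None)
        if line is not None and in_body:
            line = base64.b64decode(line).decode("windows-1251")
        return line

    seen = []
    prefetched = fetch()
    while prefetched is not None:
        current = prefetched
        prefetched = fetch()  # read ahead before `current` is examined
        if not in_body and current == "":
            in_body = True    # end of header detected, but one line too late
            continue
        seen.append(current)
    return seen


header = "Content-Transfer-Encoding: base64"
first = base64.b64encode("Привет".encode("windows-1251")).decode("ascii")
second = base64.b64encode("Вам".encode("windows-1251")).decode("ascii")

result = buggy_tokenize([header, "", first, second])
print(result)
```

In this model only the first body line stays undecoded, because it is fetched before the blank line is examined. Later lines are fetched in body mode and decode normally, so a message whose body is a single base64 line is lost entirely.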

Discussion

  • Roman Trunov
    2009-03-13

    Example of base64-encoded spam message

     
    • status: open --> pending-works-for-me
     
  • Hi Roman,

    this bug is supposed to be fixed in bogofilter version 1.2.1. Can you please try the new bogofilter version and let us know if the problem persists?

    It appears fixed for me. If I run bogolexer -p on your message, I get the following output (in the hope that SourceForge does not mangle it), which looks like proper Cyrillic script to me (though I don't understand Slavic languages):

    ...
    head:Content-Disposition
    head:inline
    head:Message-Id
    Привет
    Вам
    Необычное
    Приглашение!
    ...
    для
    лучших
    друзей!

    Thanks for taking the time to report this.

     
  • Oh, and I've indeed made sure that the lexer does not read ahead; I adjusted some rules so that the lexer itself need not look past \n (line feed), and I made the lexer "interactive", so it does not read ahead voluntarily (i.e. unless it must).

     
  • David Relson
    2009-08-04

    Fixed in bogofilter-1.2.1 (released on 1 Aug 2009)

     
  • David Relson
    2009-08-04

    • status: pending-works-for-me --> closed-fixed
     
  • Roman responded off-tracker (comment posting was already closed) with this:
    "[...] the fix works fine. I tested Bogofilter
    1.2.1 on my collection of such one-liners and all of them
    were decoded properly. Thank you for fixing this."
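
The approach described in the discussion, a lexer that does not read ahead, can be sketched in Python as a reader that hands out exactly one line per request. This is a minimal sketch under stated assumptions, not the actual bogofilter 1.2.1 change: the names `LineReader` and `tokenize` are hypothetical, and the sample body is made up.

```python
import base64


class LineReader:
    """Hypothetical reader: returns one line per request, never fetches ahead."""

    def __init__(self, lines):
        self._source = iter(lines)
        self.in_body = False

    def next_line(self):
        line = next(self._source, None)
        # The mode is consulted only when the line is actually requested,
        # so a mode switch made by the caller always takes effect in time.
        if line is not None and self.in_body:
            line = base64.b64decode(line).decode("windows-1251")
        return line


def tokenize(lines):
    reader = LineReader(lines)
    tokens = []
    while (line := reader.next_line()) is not None:
        if not reader.in_body and line == "":
            reader.in_body = True  # blank line ends the header
            continue
        tokens.extend(line.split())
    return tokens


# Assumed one-line base64 body, similar in shape to the reported messages.
body = base64.b64encode("Привет Вам".encode("windows-1251")).decode("ascii")
tokens = tokenize(["Content-Transfer-Encoding: base64", "", body])
print(tokens)
```

Because no line is fetched before the previous one has been examined, the switch to body mode happens before the first body line is read, and a one-line body decodes correctly.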