#116 Mail crashes bogofilter

v1.0_(example)
open
crash (1)
1
2016-09-02
2014-02-23
No

Running attached mail through bogofilter crashes, see attachments. This is the one and only message in >>100.000 I have which causes this issue.

*** glibc detected *** bogofilter: realloc(): invalid next size: 0x00000000018b6ea0 ***
======= Backtrace: =========
/lib/libc.so.6(+0x71e16)[0x7fc85c6bee16]
/lib/libc.so.6(+0x77a2c)[0x7fc85c6c4a2c]
/lib/libc.so.6(realloc+0xf0)[0x7fc85c6c4d40]
bogofilter[0x40b34e]
bogofilter[0x410a29]
bogofilter[0x406305]
bogofilter[0x402cf5]
bogofilter[0x404c99]
bogofilter[0x402ebc]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7fc85c66bc8d]
bogofilter[0x402a09]
======= Memory map: ========
00400000-00443000 r-xp 00000000 09:02 3629819 /home/pi/bin/Bogo/bin/bogofilter
00643000-00647000 rw-p 00043000 09:02 3629819 /home/pi/bin/Bogo/bin/bogofilter
00647000-0064a000 rw-p 00000000 00:00 0
01889000-018d0000 rw-p 00000000 00:00 0 [heap]
7fc854000000-7fc854021000 rw-p 00000000 00:00 0
7fc854021000-7fc858000000 ---p 00000000 00:00 0
7fc85b7e0000-7fc85b7f6000 r-xp 00000000 09:02 3514370 /lib/libgcc_s.so.1
7fc85b7f6000-7fc85b9f5000 ---p 00016000 09:02 3514370 /lib/libgcc_s.so.1
7fc85b9f5000-7fc85b9f6000 rw-p 00015000 09:02 3514370 /lib/libgcc_s.so.1
7fc85b9f6000-7fc85b9f8000 r-xp 00000000 09:02 1393964 /usr/lib/gconv/ISO8859-1.so
7fc85b9f8000-7fc85bbf7000 ---p 00002000 09:02 1393964 /usr/lib/gconv/ISO8859-1.so
7fc85bbf7000-7fc85bbf8000 r--p 00001000 09:02 1393964 /usr/lib/gconv/ISO8859-1.so
7fc85bbf8000-7fc85bbf9000 rw-p 00002000 09:02 1393964 /usr/lib/gconv/ISO8859-1.so
7fc85bbf9000-7fc85bc05000 r-xp 00000000 09:02 3514422 /lib/libnss_files-2.11.3.so
7fc85bc05000-7fc85be04000 ---p 0000c000 09:02 3514422 /lib/libnss_files-2.11.3.so
7fc85be04000-7fc85be05000 r--p 0000b000 09:02 3514422 /lib/libnss_files-2.11.3.so
7fc85be05000-7fc85be06000 rw-p 0000c000 09:02 3514422 /lib/libnss_files-2.11.3.so
7fc85be06000-7fc85be10000 r-xp 00000000 09:02 3516413 /lib/libnss_nis-2.11.3.so
7fc85be10000-7fc85c00f000 ---p 0000a000 09:02 3516413 /lib/libnss_nis-2.11.3.so
7fc85c00f000-7fc85c010000 r--p 00009000 09:02 3516413 /lib/libnss_nis-2.11.3.so
7fc85c010000-7fc85c011000 rw-p 0000a000 09:02 3516413 /lib/libnss_nis-2.11.3.so
7fc85c011000-7fc85c026000 r-xp 00000000 09:02 3516403 /lib/libnsl-2.11.3.so
7fc85c026000-7fc85c225000 ---p 00015000 09:02 3516403 /lib/libnsl-2.11.3.so
7fc85c225000-7fc85c226000 r--p 00014000 09:02 3516403 /lib/libnsl-2.11.3.so
7fc85c226000-7fc85c227000 rw-p 00015000 09:02 3516403 /lib/libnsl-2.11.3.so
7fc85c227000-7fc85c229000 rw-p 00000000 00:00 0
7fc85c229000-7fc85c230000 r-xp 00000000 09:02 3516398 /lib/libnss_compat-2.11.3.so
7fc85c230000-7fc85c42f000 ---p 00007000 09:02 3516398 /lib/libnss_compat-2.11.3.so
7fc85c42f000-7fc85c430000 r--p 00006000 09:02 3516398 /lib/libnss_compat-2.11.3.so
7fc85c430000-7fc85c431000 rw-p 00007000 09:02 3516398 /lib/libnss_compat-2.11.3.so
7fc85c431000-7fc85c448000 r-xp 00000000 09:02 3516394 /lib/libpthread-2.11.3.so
7fc85c448000-7fc85c647000 ---p 00017000 09:02 3516394 /lib/libpthread-2.11.3.so
7fc85c647000-7fc85c648000 r--p 00016000 09:02 3516394 /lib/libpthread-2.11.3.so
7fc85c648000-7fc85c649000 rw-p 00017000 09:02 3516394 /lib/libpthread-2.11.3.so
7fc85c649000-7fc85c64d000 rw-p 00000000 00:00 0
7fc85c64d000-7fc85c7a6000 r-xp 00000000 09:02 3516399 /lib/libc-2.11.3.so
7fc85c7a6000-7fc85c9a5000 ---p 00159000 09:02 3516399 /lib/libc-2.11.3.so
7fc85c9a5000-7fc85c9a9000 r--p 00158000 09:02 3516399 /lib/libc-2.11.3.so
7fc85c9a9000-7fc85c9aa000 rw-p 0015c000 09:02 3516399 /lib/libc-2.11.3.so
7fc85c9aa000-7fc85c9af000 rw-p 00000000 00:00 0
7fc85c9af000-7fc85ca2f000 r-xp 00000000 09:02 3516412 /lib/libm-2.11.3.so
7fc85ca2f000-7fc85cc2f000 ---p 00080000 09:02 3516412 /lib/libm-2.11.3.so
7fc85cc2f000-7fc85cc30000 r--p 00080000 09:02 3516412 /lib/libm-2.11.3.so
7fc85cc30000-7fc85cc31000 rw-p 00081000 09:02 3516412 /lib/libm-2.11.3.so
7fc85cc31000-7fc85cda7000 r-xp 00000000 09:02 1385361 /usr/lib/libdb-4.8.so
7fc85cda7000-7fc85cfa6000 ---p 00176000 09:02 1385361 /usr/lib/libdb-4.8.so
7fc85cfa6000-7fc85cfab000 rw-p 00175000 09:02 1385361 /usr/lib/libdb-4.8.so
7fc85cfab000-7fc85cfc9000 r-xp 00000000 09:02 3516395 /lib/ld-2.11.3.so
7fc85d140000-7fc85d147000 r--s 00000000 09:02 1394001 /usr/lib/gconv/gconv-modules.cache
7fc85d147000-7fc85d1b8000 rw-p 00000000 00:00 0
7fc85d1c4000-7fc85d1c8000 rw-p 00000000 00:00 0
7fc85d1c8000-7fc85d1c9000 r--p 0001d000 09:02 3516395 /lib/ld-2.11.3.so
7fc85d1c9000-7fc85d1ca000 rw-p 0001e000 09:02 3516395 /lib/ld-2.11.3.so
7fc85d1ca000-7fc85d1cb000 rw-p 00000000 00:00 0
7fffea790000-7fffea7a5000 rw-p 00000000 00:00 0 [stack]
7fffea7ff000-7fffea800000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

Discussion

  • Matthias Andree

    Matthias Andree - 2014-02-23

    Hi Boris,

    sorry to see there is trouble. While I cannot confirm the crash at this time, I see that valgrind complains about three issues in the lexer, so there is some hope I can debug this. It may take me a couple of days to debug, though.

    Thank you for providing the relevant input!

    Cheers,
    Matthias

     
  • Matthias Andree

    Matthias Andree - 2014-02-23
    • assigned_to: Matthias Andree
     
  • Matthias Andree

    Matthias Andree - 2014-02-23

    Boris, taking a first glance, I need to know a bit more information:

    • which bogofilter version are you using?

    • did your lexer_v3.c file get regenerated on your computer by running flex, or was the file used unaltered from the tarball? If it was generated on your computer, I'd need to know the flex version ("flex --version" prints it; if flex is not installed on your computer, that is fine).

    • how have you installed it, which ./configure options did you use, what Linux variant and version are you on?

    • do you happen to have sufficient debug information in bogofilter so that addr2line can translate the addresses you got from glibc to code lines?

    • what options do you use on bogofilter's command line, if any?

     
  • Boris 'pi' Piwinger

    bogofilter version 1.2.4
    Database: Berkeley DB 4.8.30: (April 9, 2010) AUTO-XA

    I did compile bogofilter myself, though I do not see that that uses flex.

    env ./configure --prefix=$WHERE --sysconfdir=$WHERE
    $WHERE is simply the local path (user installation)

    Linux 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012 x86_64 GNU/Linux

    The error comes without any command line options, yet there is ~/.bogofilter.cf:
    ham_cutoff=0.0
    spamicity_tags=Spam, Ham
    spamicity_formats=%0.3f, %0.3f
    header_format=%h: %c, spamicity=%p, version=%v
    db_cachesize=3
    timestamp=0
    robx=0.499
    min_dev=0.3
    robs=0.1
    spam_cutoff=0.5
    min-token-len=1
    multi-token-count=1

    HTH, please let me know if there is more I can tell you.

    pi

     
  • Matthias Andree

    Matthias Andree - 2014-02-24

    Boris, I cannot reproduce the issue. All I can trigger is a lexer read of uninitialized memory.

    If you had built bogofilter from an official tarball, it should not run flex as part of the build because there was a prebuilt lexer_v3.c in the tarball.

    I would require a proper backtrace; you might also try installing an up-to-date valgrind and running bogofilter or bogolexer (if that suffices to trigger the bug) on the problematic mail. valgrind should print a backtrace. You might also try filtering your error messages from glibc through addr2line to see if that fills in the source code lines.

    Make sure you do not use "make install-strip".

    Without a proper backtrace, I will unfortunately be unable to debug this.

     
  • Matthias Andree

    Matthias Andree - 2014-02-24

    You will need an executable compiled and installed with at least line number information. That should be the default if you build yourself.

    Then addr2line -a -f -p -e /path/to/bogofilter 0x40b34e
    should print the line number for the given address. Try this with all bogofilter and libc 0x... addresses from your backtrace; if that gives you useful information, paste it and you're done. You can give multiple addresses on the command line.

    For a different issue I provoked I get this output, for example:

    $ addr2line -p -a -f -e src/bogolexer 0x40592C 0x40A7BD
    0x000000000040592c: yylex at /path/to/src/lexer_v3.c:2469
    0x000000000040a7bd: parse_new_token at /path/to/src/token.c:206

    If you get question marks, that does not help...

     
    • Boris 'pi' Piwinger

      addr2line does not know -a and -p, yet:

      $ addr2line -f -e ~/bin/bogofilter 0x40b34e 0x410a29 0x406305 0x402cf5 0x404c99 0x402ebc 0x402a09
      yy_get_next_buffer
      /home/pi/build-bogofilter/bogofilter-1.2.4/src/lexer_v3.c:3185
      parse_new_token
      /home/pi/build-bogofilter/bogofilter-1.2.4/src/token.c:206
      collect_words
      /home/pi/build-bogofilter/bogofilter-1.2.4/src/collect.c:50
      bogofilter
      /home/pi/build-bogofilter/bogofilter-1.2.4/src/bogofilter.c:99
      bogomain
      /home/pi/build-bogofilter/bogofilter-1.2.4/src/bogomain.c:69
      main
      /home/pi/build-bogofilter/bogofilter-1.2.4/src/main.c:33
      _start
      ??:0

  • Matthias Andree

    Matthias Andree - 2014-02-24

    You can also try:

    sed 's/.*0x/0x/' | addr2line -p -a -f -e /home/pi/bin/Bogo/bin/bogofilter

    Then paste your backtrace, press enter, and press Ctrl-D to signal end of input so the pipeline terminates. The output should be as in my previous comment.

     
  • Matthias Andree

    Matthias Andree - 2014-02-25

    Thanks, much better.

    Unfortunately you might be hitting one of the nasty corner cases in the current bogofilter tokenizer, and fixing those may require my rewriting major parts of the lexing (most importantly, separate concerns so we can parse input without ever pushing back nontrivial amounts of data or without rejecting rules). This isn't going to start in the next two weeks though.

     
  • Boris 'pi' Piwinger

    Thanks for your help. In case it helps I have another testcase which also crashes.

     
  • Simone Lazzaris

    Simone Lazzaris - 2015-06-25

    Hi,
    I've stumbled upon the very same bug.
    Using addr2line I've rebuilt the call stack:

    /usr/local/src/bogofilter-1.2.4/src/lexer_v3.c:3186
    /usr/local/src/bogofilter-1.2.4/src/token.c:214
    /usr/local/src/bogofilter-1.2.4/src/collect.c:50
    /usr/local/src/bogofilter-1.2.4/src/bogofilter.c:99
    /usr/local/src/bogofilter-1.2.4/src/bogomain.c:69
    /usr/local/src/bogofilter-1.2.4/src/main.c:33

    So it seems that the bug is somewhere around the reallocation of the buffer.
    I've also found that disabling unicode (./configure --disable-unicode) seems to fix the behaviour.

    Without any knowledge of the subject, I can only guess that maybe (maybe) there are some inconsistencies regarding the length of the unicode strings vs the length of the character array.

    Anyway, I've tried to increase the read buffer length (YY_BUF_SIZE) from 16384 to 65536 and the bug is not triggered anymore in our samples. Clearly, it is a horrible kludge that only raises the bar, without solving anything.

     
  • Doran

    Doran - 2016-08-29

    I have traced this issue back to a bad interaction between bogofilter and flex 2.5.36-2.6.0, related to #113. These versions of flex didn't handle yyinput() returning a larger result than expected, leading to the crash observed in this ticket.

    Building with flex-2.6.1 makes the crash go away, as does the attached patch. The patch is very rough and probably incorrect, but I'm not clear how to properly avoid (count > size) in bogofilter. Possibly there is a simple fix by adjusting the buff_shift() call ~30 lines earlier in yyinput().

    Strictly reading the flex documentation, I think bogofilter should avoid yyinput() returning a result larger than its size/max_size argument.

     
  • Matthias Andree

    Matthias Andree - 2016-09-01

    Indeed the flex documentation suggests - without any surprises here - that max_size is the limit, and so exceeding that is unspecified behaviour and can wreak havoc.

    Of course if yyinput() writes more characters than requested, that's a bug that we need to solve within bogofilter. I only wonder if Doran's patch is the right approach, in particular I'm afraid that the damage may already have happened even if we truncate the return value. I think I need to experiment with clang's instrumentation features (not sure if address sanitizer can catch that) and valgrind a bit.
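A minimal sketch of the concern above, in C: the limit has to be applied before any bytes are written, because clamping only the returned count leaves the overrun in place. `bounded_fill` and its parameters are hypothetical names for illustration and are not part of bogofilter or flex.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not bogofilter code: copy at most max_size bytes
 * of a decoded line into the scanner's buffer. The bound is applied
 * before memcpy(), so the destination is never written past max_size. */
static size_t bounded_fill(char *dst, size_t max_size,
                           const char *src, size_t src_len)
{
    size_t n = src_len < max_size ? src_len : max_size;

    assert(dst != NULL || n == 0);
    if (n > 0)
        memcpy(dst, src, n);
    return n;   /* never exceeds max_size */
}
```

Bytes that do not fit are simply not copied here; a real reader would have to keep them for the next call rather than drop them.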

     
  • Doran

    Doran - 2016-09-02

    I'm pretty sure the patch already provided is not the right approach .. :-).

    I've done some spelunking to try and find the root cause here, leading to a series of patches against 1.2.4 (attached: patches-vs-1.2.4.tgz). These will need to be forward-ported to trunk, but the result tests clean and plays more nicely with flex. Hopefully the following wall of text helps you determine if my approach is sound and worth putting on a branch.

    At the bottom of the test cases on #116, I discovered that the call to
    buff_shift() goes wrong. This is fixed in trunk (r7030), but I took a slightly
    different path that re-works yyinput() to be a bit more independent of flex.

    0001-Fix-assertion-in-buff_shift.patch
    

    After staring at traces for a while and digging through svn history, I decided
    there were a few issues in yyinput(), of which this ticket is but one symptom:

    #1: the loop attempting to truncate tokens wasn't really necessary, and made for a confusing call graph

    #2: dangerously wrapping a buff_t around the flex-provided (char*)

    #1: the loop

    for the origins of the loop, see (among others):

    2003-02-26:  r1723:  Enable fast handling of overly long tokens.
    2003-05-17:  r2276:  Add special code to deal with very long alphanumeric strings.
        /* After reading a line of text, check if it starts with lots of 
         * alphanumerics.  If so, trim some, but leave enough to match a max 
         * length token.  Then read more text.  
         * This will ensure that a really long sequence of alphanumerics,
         * which bogofilter will ignore anyway, doesn't crash the flex lexer.
         */
    2003-05-25:  r2364:  Tweak long line check.
        /* Check for lines wholly composed of printable characters as they can cause a scanner abort 
           "input buffer overflow, can't enlarge buffer because scanner uses REJECT"
        */
    

    Note this comment is clearly obsolete today: the quoted message comes from
    flex.skl, and would be included literally in lexer_v3.c if not for %option noreject noyywrap, which ensures this error can never happen. Bogofilter
    apparently (now) carefully avoids REJECT so as to not carry the performance
    penalty.

    So getting rid of the loop seemed like a good start:

    0002-make-yyinput-call-get_decoded_line-only-once.patch
    

    As a result of this, the "used" parameter (which comes from flex's write-only
    "result" parameter to YY_INPUT() - not really fit for its use in r4509) can be
    dropped:

    0003-remove-questionable-use-of-result-param-to-YY_INPUT.patch
    0004-lexer_v3.l-updated-for-previous-patch.patch
    

    About this time I was looking at r6973, suspicious of the ((uint) <= 0)
    comparison, so I added an assert to convert():

    0005-move-BOGO_ASSERT-into-common.h.patch
    0006-assert-to-catch-t.crash-invalid-base64.patch
    

    and playing with the test case, decided that convert() was misusing dst->read:
    buff_t.read is documented as pointing to the beginning of the last read, so it
    has no role here.

    0007-don-t-mis-use-dst-read-in-calculation-of-outbyteslef.patch
    

    #2: dangerously wrapping

    buff_t is a dynamically-managed type, designed to be resizeable. But
    attempting to realloc() the char* owned by flex can only lead to bad news.
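One way to picture the hazard, as a sketch only: a buffer descriptor can record whether it owns its memory and refuse to grow when it does not. `view_t` and `view_grow` are hypothetical names invented for this illustration; they are not bogofilter's buff_t API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor, not bogofilter's buff_t. */
typedef struct {
    char  *ptr;
    size_t size;
    bool   owned;   /* true only if ptr was allocated by this code */
} view_t;

/* Grow to at least `want` bytes. Returns 0 on success, -1 if the
 * memory is borrowed or the allocation fails. */
static int view_grow(view_t *v, size_t want)
{
    assert(v != NULL);
    if (want <= v->size)
        return 0;
    if (!v->owned)
        return -1;  /* borrowed memory: the real owner would be left
                       with a stale pointer if we reallocated it */
    char *p = realloc(v->ptr, want);
    if (p == NULL)
        return -1;
    v->ptr = p;
    v->size = want;
    return 0;
}
```

With a descriptor like this, wrapping the flex-provided pointer as a borrowed view would turn an attempted resize into an error return instead of heap corruption.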

    On trunk, there are a couple of locations that will try to reallocate:

        yyinput(...) {
            ..
            get_decoded_line(&buff);
        }

        static int get_decoded_line(buff_t *buff) {
            ..
            if (buff->t.leng == 0)
                memcpy(buff, linebuff, sizeof(*buff));      /* oops ! */
        }

        static int get_decoded_line(buff_t *buff) {
            ..
            count = yy_get_new_line(linebuff);
        }

        static int yy_get_new_line(buff_t *buff) {
            ..
            int count = (*reader_getline)(buff);            /* either mailbox_getline or simple_getline */
        }

        static int mailbox_getline(buff_t *buff) {
            static word_t *saved = NULL;
            ..
            if (saved != NULL) {
                buff_add(buff, saved);                      /* buff_add can xrealloc()! */
            }
        }
    

    The first was only introduced in r7016, and is a bit unclear: the comment says
    "avoid returning count = 0", but the code doesn't reflect that precisely. It
    also doesn't respect max_size (and looks like a memory leak), so it needs more
    attention. I added the test:

    0008-add-t.passthrough-truncation.patch
    

    and by dint of printf() debuffing, discovered the root cause: flex would
    sometimes call YY_INPUT with size=0, which resulted in iconv() raising errno
    E2BIG.

    This also needs a dynamic buffer to resolve: if we can't rely on flex being
    able to consume a whole utf-8 decode worth of bytes, we need to stash the
    result and copy it into flex's buffer over multiple calls.

    It took a couple of false starts, but that came out as:

    0009-use-a-dynamic-buff_t-for-reading-to-ensure-flex-neve.patch
    

    and a minor bugfix:

    0010-update-buff.t.leng-in-CRLF-translation.patch
    

    .. which passes the full test suite on 1.2.4 + t.passthrough-truncation and
    handles all the examples on this ticket without crashing or valgrind
    complaints.
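The stash-and-copy idea described above can be sketched roughly as follows. This is not the attached patch: `stash_t`, `stash_put` and `stash_take` are hypothetical names, and the sketch only assumes that the scanner asks for at most max_size bytes per call, possibly zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stash, owned by the reader and never by flex. */
typedef struct {
    char  *data;   /* heap storage, grown with realloc() */
    size_t len;    /* bytes currently stored */
    size_t pos;    /* bytes already handed to the scanner */
    size_t cap;    /* allocated size of data */
} stash_t;

/* Append a freshly decoded line; returns 0 on success, -1 on failure. */
static int stash_put(stash_t *s, const char *src, size_t n)
{
    assert(s != NULL);
    if (s->pos == s->len)           /* everything consumed: start over */
        s->pos = s->len = 0;
    if (s->len + n > s->cap) {
        size_t cap = s->cap ? s->cap : 64;
        while (cap < s->len + n)
            cap *= 2;
        char *p = realloc(s->data, cap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = cap;
    }
    if (n > 0)
        memcpy(s->data + s->len, src, n);
    s->len += n;
    return 0;
}

/* Hand out at most max_size bytes; max_size == 0 is legal and yields 0. */
static size_t stash_take(stash_t *s, char *dst, size_t max_size)
{
    size_t avail = s->len - s->pos;
    size_t n = avail < max_size ? avail : max_size;

    if (n > 0)
        memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return n;
}
```

The decoder writes into the stash at whatever size it needs, while the scanner's buffer only ever receives what it asked for, so a request with size=0 returns nothing and leaves the decoded bytes waiting for the next call.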

    Past troubles / reference

    Other past issues that might be attributed to this loop or the buff_t issue.
    These should at least be kept in mind when evaluating the new interface.

    2005-10-24:  r6272: Fix possible SIGSEGV with long html comments.
        /* 10/23/05 - fix SIGSEGV with msg.1023.6479.txt
        ** evidently caused by 09/07/05 patch for 0.96.2
        */
     .. I can't see what patch that is referring to
    
    2004-06-14:  r4509: Deal with the cause of 'Invalid buffer size, exiting.' messages.
        /* Note: some malformed messages can cause xfgetsl() to report
        ** "Invalid buffer size, exiting."  ** and then abort.  This
        ** can happen when the parser is in html mode and there's a
        ** leading '<' but no closing '>'.
        */
    
    2003-09-05:  r2988:  Prevent 'invalid buffer size' error.
    
     
