#100 Got something out now

Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2006-12-06
Created: 2006-11-29
Creator: Thomas Chmara
Private: No

My system is

Linux bermudahonk 2.6.17.7 #4 Mon Oct 2 14:44:52 CEST 2006 x86_64 GNU/Linux

Debian sid

bogofilter 1.1.1

Over the past few days I kept getting segfaults when trying to update the wordlist from my spam folder. So I took a closer look at those mails and found the reason. For every mail that has an apostrophe in the mail attribute "From:" (which I think isn't right in the first place), bogofilter crashes when used to create the wordlist like this:

find $HOME/Mail/Spam -type f | bogofilter -v -s -b

I've attached one email that it crashes on.

Discussion

  • Thomas Chmara
    2006-11-29

    Example of spam mail that crashes bogofilter

     
  • Thomas Chmara
    2006-11-29

    After further checking I found out it crashes with charset (in Content-Type) set to

    windows-1250
    windows-1252
    us-ascii

    and it may also be system related. I changed one email from "us-ascii" to "iso-8859-15" and this time it got through.
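The charset experiment described above can be repeated without editing messages by hand. The following is only a sketch: `swap_charset` is a hypothetical helper name, and it assumes the message declares its charset as a lowercase `charset=` parameter, with or without double quotes.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a message with its declared charset
# replaced. Assumes the header uses a lowercase "charset=" parameter.
swap_charset() {
    from=$1
    to=$2
    file=$3
    # Keep an optional opening double quote (\1) and replace only the name.
    sed "s/charset=\(\"\{0,1\}\)$from/charset=\1$to/" "$file"
}

# Example (file names are placeholders):
#   swap_charset us-ascii iso-8859-15 message > message.iso-8859-15
```

Feeding the original and the modified copy to bogofilter one after the other then shows whether the declared charset alone changes the outcome.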

     
  • Thomas Chmara
    2006-11-29

    • summary: segfault on apostrophe used in mail atribue "from" --> OK, it's not apostroph, the crash is charset related
     
  • Thomas Chmara
    2006-11-29

    strace output

     
    Attachments
  • David Relson
    2006-11-29

    Your sample works fine with my copy of bogofilter. Are you using a config file? What's in it?

    As a test I ran bogofilter's parser, i.e. bogolexer, as follows:

    ### bogolexer -p -C < 1164818729.570.nlkTS:2,S | tee bogolexer.out | wc -l
    265

    What happens on your machine? Can you gzip the bogolexer.out file and post it?

    I also created copies of the message replacing us-ascii by windows-1250 and windows-1252. Again, works fine for me.

    A debugger backtrace would be helpful, i.e. rebuild bogofilter/bogolexer with "-O0" flags ( oh-zero ), run in gdb, and send a backtrace when you encounter the segfault.

    Thanks.

    David

    P.S. You can email me directly at relson@users.sourceforge.net

     
  • Thomas Chmara
    2006-11-30

    • summary: OK, it's not apostroph, the crash is charset related --> Got something out now
     
  • Thomas Chmara
    2006-11-30

    OK, I've checked the example I uploaded and it worked with

    bogolexer -p -C < input

    Then I wrote a loop and checked my whole spam folder, which produced some hits. So far bogolexer crashes on random data (as used in spam) in the Received header, especially when "id" is followed by any of the characters ()@!?. See the attached examples. I will deliver a debugger backtrace soon.
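The loop itself was not posted, so the following is only a sketch of how such a check might look. `find_crashers` is a hypothetical helper name; it feeds each file to the given command on stdin and relies on the shell convention that a process killed by a signal is reported with an exit status above 128 (139 for SIGSEGV on Linux).

```shell
#!/bin/sh
# Hypothetical helper: print every file under DIR for which the given
# command is killed by a signal when fed that file on stdin.
find_crashers() {
    dir=$1
    shift
    find "$dir" -type f | while IFS= read -r msg; do
        # Run the command on one message and discard its normal output.
        "$@" < "$msg" > /dev/null 2>&1
        status=$?
        # Death by signal shows up as 128 + signal number (SIGSEGV = 11).
        if [ "$status" -gt 128 ]; then
            printf '%s (exit status %s)\n' "$msg" "$status"
        fi
    done
}

# Example:
#   find_crashers "$HOME/Mail/Spam" bogolexer -p -C
```

Running one message per invocation isolates the offending files, which the single `find ... | bogofilter -v -s -b` pipeline cannot do because it stops at the first crash.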

     
  • Thomas Chmara
    2006-11-30

    four examples that crash bogolexer

     
    Attachments
  • David Relson
    2006-11-30

    Sorry, I still can't reproduce your problem. I've run your 4 examples through bogolexer on 2 machines (a P-III running Mandrake 10.1 and an Athlon XP running Gentoo). The outputs are the same, i.e.

    ### command ###
    for N in bug-1605523.d/1164* ; do echo $N ; bogolexer -p -C < $N | wc -l ; done
    ### result ###
    bug-1605523.d/1164646036.595.f5ccV:2,S
    299
    bug-1605523.d/1164818724.570.OFh3g:2,S
    307
    bug-1605523.d/1164818726.570.CKYT7:2,S
    314
    bug-1605523.d/1164832843.2426.Ti7WR:2,S
    299

    Given access to your machine, I can likely find out what's wrong. Is an ssh session possible?

     
  • David Relson
    2006-12-06

    The problem was caused by an incorrect rule for message IDs in lexer_v3.l. Fixed in bogofilter-1.1.3.

     
  • David Relson
    2006-12-06

    • status: open --> closed