#42 Problem with big databases

Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2003-03-26
Created: 2003-03-24
Creator: Jorge Godoy
Private: No

Hi. I'm using bogofilter on a daily basis for something
like 2K messages/day. It works perfectly, but when the
database reaches 51200000 bytes it just stops working
and, from what I can see, the indices get corrupted (I get
lots of messages about them when I run
db(3|4)_verify on that file).

I'm running FreeBSD 5.

Bogofilter is run from procmail with these rules:

:0fw
| bogofilter -u -e -p

:0e
{ EXITCODE=75 HOST }

and the error messages I get in the Postfix mail queue are:

[godoy@wintermute ~]$ mailq
-Queue ID- --Size-- ----Arrival Time----
-Sender/Recipient-------
19C84413C 12228 Mon Mar 24 12:36:33
retorno@fiscosoft.com.br
(temporary failure. Command output: bogofilter: (db)
db_getvalue( '_ltimos' ), err: -30988,
DB_PAGE_NOTFOUND: Requested page not found procmail:
Program failure (2) of "bogofilter" procmail: Rescue of
unfiltered data succeeded )

godoy@godoy.homeip.net

-- 12 Kbytes in 1 Request.
[godoy@wintermute ~]$

If I remove the bogofilter instructions from
.procmailrc, the message goes through fine. I've seen
this behaviour in version 0.10.something and now in
0.11.3 (both installed from ports, i.e., compiled on my
machine).

[godoy@wintermute ~]$ bogofilter -V

bogofilter version 0.11.1.3
Copyright (C) 2002 Eric S. Raymond
(...)

I get my mail with fetchmail, which sends it to Postfix,
which uses procmail as my MDA. As above, procmail calls
bogofilter on each and every message. I update the spam
database regularly with 'bogofilter -s', using Gnus'
spam.el.

If I can help find out what is happening, just drop
me a message.

It seems that when the database reaches a certain size,
some data gets dropped, but the indices and references
to it don't get updated. This is just a guess...

TIA.
Godoy.

Discussion

  • David Relson

    David Relson - 2003-03-25

    Logged In: YES
    user_id=30510

    Godoy,

    From what you're saying, the problem seems to be in db3/4.
    Have you reported the problem to SleepyCat? What was the
    response?

    David

     
  • Jorge Godoy

    Jorge Godoy - 2003-03-25

    Logged In: YES
    user_id=100502

    No, I haven't tried contacting them. I think their website is
    quite confusing... I'll look for some contact information
    there and try submitting it as a bug report.

    Are you closing this, or should I report back with their response?

    Thanks.

     
  • David Relson

    David Relson - 2003-03-25

    Logged In: YES
    user_id=30510

    Godoy,

    I'm leaving this open as there are other members of the
    bogofilter team who are more knowledgeable about db3/4 than
    I am. They may have more information, questions, or suggestions
    for you.

    Also I recommend that you subscribe to the bogofilter
    mailing list and post your problems there. That may elicit
    some information that will help you.

    David

     
  • Wolfram Schlich

    Wolfram Schlich - 2003-03-25

    Logged In: YES
    user_id=230355

    I've had nearly the same problem (bogofilter was dying when
    writing to database files that exceeded a certain size). It
    can be solved by setting the mailbox size limit in the Postfix
    config to unlimited (the default is 51200000 bytes, about 50 MB):

    /etc/postfix/main.cf:

    --8<--

    mailbox_size_limit = 0

    --8<--

    Seems like Postfix sets the corresponding ulimits to the size
    set there.
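
The effect of such a per-process file size limit can be reproduced outside Postfix. The following is a minimal Python sketch, assuming the limit reaches the delivery command as the standard Unix RLIMIT_FSIZE resource limit (the thread only suggests this; it is not confirmed here). The function name `write_until_limit` and the small sizes are made up for illustration, and nothing in it is bogofilter or Postfix code.

```python
import errno
import os
import resource
import signal
import tempfile


def write_until_limit(limit_bytes, total_bytes, chunk_size=1024):
    """Cap this process's file size at limit_bytes, then try to write
    total_bytes to a scratch file. Returns (bytes_written, error_name)."""
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    # Make sure SIGXFSZ is ignored so the failed write surfaces as an
    # OSError instead of terminating the process.
    signal.signal(signal.SIGXFSZ, signal.SIG_IGN)
    fd, path = tempfile.mkstemp()
    written = 0
    error_name = None
    try:
        # Lower only the soft limit so it can be restored afterwards.
        resource.setrlimit(resource.RLIMIT_FSIZE, (limit_bytes, old_hard))
        chunk = b"x" * chunk_size
        while written < total_bytes:
            try:
                written += os.write(fd, chunk)
            except OSError as exc:
                error_name = errno.errorcode.get(exc.errno, str(exc.errno))
                break
    finally:
        resource.setrlimit(resource.RLIMIT_FSIZE, (old_soft, old_hard))
        os.close(fd)
        os.unlink(path)
    return written, error_name


# Hypothetical small numbers: a 4 KiB cap and an attempted 8 KiB write.
result = write_until_limit(4096, 8192)
```

A program that does not expect its writes to stop at the cap can be left with a partially updated file, which would be consistent with the DB_PAGE_NOTFOUND errors reported above once a wordlist reaches 51200000 bytes.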

     
  • Jorge Godoy

    Jorge Godoy - 2003-03-25

    Logged In: YES
    user_id=100502

    It seems to have worked, even though I still had one of these:

    Mar 25 07:57:40 wintermute postfix/local[48441]: D4F05469E:
    to=<godoy@godoy.homeip.net>, relay=local, delay=2,
    status=deferred (temporary failure. Command output:
    bogofilter: (db) db_getvalue( '____' ), err: -30988,
    DB_PAGE_NOTFOUND: Requested page not found procmail: Program
    failure (2) of "bogofilter" procmail: Rescue of unfiltered
    data succeeded )

    I believe that I need to reset my database... Or, even
    better: do you have some tool to clean up unreferenced words?
    I mean those words that link to a position where there's no
    entry or an invalid entry... Removing them from the database
    would be much better than starting over for the
    third time... :-)

     
  • David Relson

    David Relson - 2003-03-25

    Logged In: YES
    user_id=30510

    Godoy,

    bogoutil can be used to dump/load wordlists. See the man
    page for more info. Unfortunately, I can't say how well
    it'll do with a corrupt database. Its wordlist maintenance
    functions can be used to delete tokens with low reference
    counts.

    The tools that come with db3/4 are likely to do the right
    thing. Have you looked at db_dump, db_recover, etc?

    David

     
  • Jorge Godoy

    Jorge Godoy - 2003-03-25

    Logged In: YES
    user_id=100502

    bogoutil aborts at undefined references. It can dump only
    about 5000 entries from my goodlist.db. The spamlist has
    more than 150K entries.

    I've also tried using db_* before filing this bug, but the
    same happens.

    I've recreated the goodlist.db from some messages of mine
    and it is, again, giving good results (it has nearly 50K
    entries in it).

    Thanks for your help. I consider my problem solved, so feel
    free to close the bug. It was Postfix limiting the size of
    files that could be created/used by the MDA and, consequently,
    by bogofilter.

    The database corruption was a side effect and not easy to solve.

    Thanks.
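
To notice the condition before a wordlist is damaged, the database size can be compared against the limit in effect. This is a rough Python sketch under stated assumptions: the helper name `remaining_growth` is hypothetical, and the 51200000 figure is the default quoted earlier in this thread.

```python
import os
import resource


def remaining_growth(path, limit=None):
    """Bytes `path` can still grow before reaching `limit`.

    With limit=None, the current soft RLIMIT_FSIZE of this process is
    used; None is returned when that limit is unlimited.
    """
    if limit is None:
        soft, _hard = resource.getrlimit(resource.RLIMIT_FSIZE)
        if soft == resource.RLIM_INFINITY:
            return None
        limit = soft
    return limit - os.path.getsize(path)
```

For example, `remaining_growth(os.path.expanduser("~/.bogofilter/goodlist.db"), limit=51_200_000)` would report the headroom left under the default cap; the path is an assumed location for the wordlist. Calling it without `limit` only reflects the Postfix setting when it runs inside the delivery process itself, since an interactive shell usually has a different limit.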

     
  • David Relson

    David Relson - 2003-03-26
    • status: open --> closed
     
