err:22, Invalid Argument showing up during bulk load
Fast Bayesian spam filter along lines suggested by Paul Graham
I'm bulk processing 10,000 spam messages with
formail -s bogofilter -s -v < mbox
and seeing sporadic errors of the form:
bogofilter: (db) db_get_dbvalue( 'Any' ), err: 22, Invalid argument
where 'Any' is replaced sometimes with another word that
starts with "A". Is this some kind of artifact of concurrent
access? That is, if I'm also using bogofilter via procmail, are
those errors related to blocking?
Logged In: YES
user_id=532486
More information: bogofilter 0.16.3 running on MacOS X 10.3.2,
built with Berkeley DB 4.2. The procmail script runs bogofilter
with -u, so two registrations could be attempted at the same time.
Logged In: YES
user_id=30510
Paul,
If you're bulk registering messages, there's no need for
formail. Bogofilter recognizes mailboxes during
registration and you can simply use "bogofilter -s -v -M <
mbox".
Do you have separate wordlists (spamlist.db and goodlist.db)
or a combined wordlist (wordlist.db)?
David
Logged In: YES
user_id=532486
I have a combined wordlist. I ran db_verify on it and it reported
some corruption. I switched to formail after getting inconsistent
results with the -M flag (bogofilter didn't recognize message
boundaries properly).
I've started fresh with a new database but would like to
understand concurrency constraints.
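
For anyone wanting to repeat the db_verify check before a bulk run, here is a minimal sketch. It assumes the Berkeley DB db_verify utility is installed and that the wordlist path is passed in. The function name check_wordlist and the DB_VERIFY override variable are hypothetical names made up for this example; they are not part of bogofilter or Berkeley DB.

```shell
#!/bin/sh
# check_wordlist is a hypothetical helper, not part of bogofilter.
# DB_VERIFY is an assumed override in case the verifier is installed
# under a different name on your system.
check_wordlist() {
    db=$1
    verify=${DB_VERIFY:-db_verify}

    if [ ! -f "$db" ]; then
        echo "no such wordlist: $db" >&2
        return 2
    fi
    if ! command -v "$verify" >/dev/null 2>&1; then
        echo "$verify not found" >&2
        return 2
    fi
    # db_verify exits non-zero when it finds a problem in the file.
    "$verify" "$db"
}

# Example (assumed path):
# check_wordlist "$HOME/.bogofilter/wordlist.db" && echo "wordlist looks intact"
```

It is probably safest to run a check like this while nothing else is writing to the wordlist.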
Logged In: YES
user_id=30510
Paul,
You mention "bogofilter didn't recognize message
boundaries". AFAIK that's been working properly for over a
year. If you've got a mailbox that does that, please gzip
it and email it to me (relson@users.sourceforge.net).
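
One rough way to narrow down which mailbox is affected is to count the "From " separator lines in each file and compare that with the number of messages you expect it to hold. The sketch below does only that. The helper name count_mbox_messages and the file names in the example are hypothetical, and a body line that happens to start with "From " without being escaped will inflate the count, which may itself be a hint about where boundary detection goes wrong.

```shell
#!/bin/sh
# count_mbox_messages is a hypothetical helper for illustration.
# It counts lines starting with "From ", the mbox message separator.
count_mbox_messages() {
    for mbox in "$@"; do
        # grep -c prints 0 and exits 1 when nothing matches; keep going.
        n=$(grep -c '^From ' "$mbox") || true
        printf '%s\t%s\n' "$n" "$mbox"
    done
}

# Example (assumed file names):
# count_mbox_messages spam1.mbox spam2.mbox
```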
As to concurrency, the bogofilter/BerkeleyDB combination is
working well on many Unix systems (and others too). Running
"make check" after building bogofilter will test the locking
of your environment (specifically tests t.lock1 and t.lock2,
which are the last two). Other people have it running fine
on MacOS X, so there's something different in your environment.
I'd suggest subscribing to bogofilter@aotto.com and posting
your problem there. Likely one of the MacOS X users will
have ideas that can help you.
You mention procmail. Are you using lockfiles? I don't
know if they're necessary, but they might help you.
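
If you want to rule out overlapping writers while testing, one option is to run every registration through a small wrapper that holds a lock for the duration of the command. This is only a sketch under assumptions: the with_lock name and the lock path are made up for the example, it uses a plain mkdir-based lock rather than procmail's own lockfile mechanism, and it only helps if every writer (the bulk run and whatever procmail invokes) goes through the same wrapper with the same lock path.

```shell
#!/bin/sh
# with_lock is a hypothetical wrapper, not part of bogofilter or procmail.
# Usage: with_lock LOCKDIR command [args...]
with_lock() {
    lockdir=$1
    shift
    # mkdir is atomic: only one process can create the directory,
    # so everyone else waits here until it is removed.
    until mkdir "$lockdir" 2>/dev/null; do
        sleep 1
    done
    status=0
    "$@" || status=$?
    rmdir "$lockdir"
    return "$status"
}

# Example (assumed lock path): serialize a bulk registration.
# with_lock "$HOME/.bogofilter/update.lock" bogofilter -s -v -M < mbox
```

Note that a process killed while holding the lock leaves the directory behind, so it would have to be removed by hand before the next run.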
David
Logged In: YES
user_id=532486
I'll see if I can figure out which mbox was causing the problem; I
have quite a few...
As for locking, t.lock1 and t.lock2 both pass on make check. I
think I'll just be careful and see if it recurs. Thanks for your help.