On Nov 15, 2006, at 1:53 PM, Brian Burton wrote:
> Right now you have to dump|wc to get the number of terms in the
> database. I agree that's not very good.
Heh. With all this talk about database sizes, I decided to go look
at mine. Apparently, I've also been guilty of doing no regular
cleaning. I haven't noticed any performance problems (I'm only
processing my own mail, which probably averages about 10-12 messages
an hour. Oh, though that's ham. I don't know about spam. :-)), but I
wanted to pass on a "large" database. Unreasonably large. :-)
I looked at my Berkeley DB database file, and it's nearly 550 MB.
I did a dump|wc -l, which by the way took *quite* a while, and it
indicates slightly over 7.9 million terms. Heh! So, a good time to
clean out some old terms. I ran a "cleanup 100 90", which took about
20 minutes to complete. After that, another dump | wc (which took 10
minutes this time. Sadly, I forgot to time it the first time.)
showed I'd dropped to 1.33 million terms. Hmm, wonder if I should've
used a smaller number than 100?
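
For anyone who wants the count and the timing in one go next time, here
is a rough Python sketch of the same dump | wc -l idea. It assumes
"spamprobe dump" prints one term per line (which is what the wc -l count
relies on); count_lines is just a made-up helper name, not anything that
ships with spamprobe.

```python
import subprocess
import time


def count_lines(cmd):
    """Run cmd, count its stdout lines as they stream by, and time the run."""
    start = time.monotonic()
    count = 0
    # Iterate over the pipe so a multi-million-line dump is never held in memory.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        for _ in proc.stdout:
            count += 1
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return count, time.monotonic() - start


# Usage (assumes spamprobe is on PATH and one term per output line):
#   terms, seconds = count_lines(["spamprobe", "dump"])
#   print(f"{terms} terms in {seconds:.0f} s")
```

Running it once before and once after the cleanup would give both term
counts along with how long each dump took.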
By the way, what does "spamprobe counts" do? The help lists it as
"Prints some or all terms in database," but I'm not sure what that
means. And it's not documented in the man page (which is a bug, I
suppose). In my case, it yielded:
    GOOD 53132 SPAM 72415
before the cleanup, and
    GOOD 53135 SPAM 72427
after. So, I assume it doesn't have anything to do with the database
size. Number of emails scored, perhaps? If so, I'm surprised to see
that many "GOOD"s...