You're probably using Berkeley DB (the default). If you switch over to the
PBL database, DB sizes are smaller, but much more ... they don't grow
astronomically.
If you're going to have many users, that's a MUST ...
Plus you may need to mandate frequent cleanups to avoid unbounded DB growth
(10,000 users ... that could get big).
> First, I'd like to say that Spamprobe is awesome. The results I've been
> seeing in my testing are great. I particularly like the way it "auto-trains"
> itself on spam/ham and just needs nudges to retrain on mistakes. After using
> it for two weeks, the error rate is *very* low.
> My question is about the size of the databases. My personal db after two
> weeks (using "train" mode instead of "receive") is about 45MB. The suggested
> time interval for pruning is 14 days, so I figure a 2-week db should indicate
> a lower bound on the expected db size over time. Is that reasoning correct?
> I don't know the exact message count, but I've probably received about 10,000
> to 12,000 messages in that time.
> The reason I ask is that I'm looking to turn spamprobe loose on a population
> of about 10,000 users. I suspect 20-50% will want to use it. I'm sure most
> of them get less mail than I do, but even assuming dbs only 25% as large as
> mine, I'm looking at 50G+ to store databases. Has anyone else used Spamprobe
> on a similar scale? Does that sound like the right ballpark to expect?
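The estimate above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the function name and the 1 GB = 1000 MB convention are my own choices, and the inputs are the figures from the message (10,000 users, 20-50% adoption, a 45MB personal db, per-user dbs at 25% of that size).

```python
def estimate_total_gb(users, adoption, personal_db_mb, size_ratio):
    """Rough total storage in GB (taking 1 GB = 1000 MB).

    users          -- size of the whole user population
    adoption       -- fraction of users expected to enable the filter
    personal_db_mb -- size of the reference database, in MB
    size_ratio     -- typical user's db size relative to the reference db
    """
    active_users = users * adoption
    per_user_mb = personal_db_mb * size_ratio
    return active_users * per_user_mb / 1000


# Figures taken from the message; 20% and 50% are the stated adoption bounds.
for adoption in (0.20, 0.50):
    total = estimate_total_gb(10_000, adoption, 45, 0.25)
    print(f"{adoption:.0%} adoption: about {total:.1f} GB")
```

With those inputs the range works out to roughly 22.5 GB at 20% adoption and about 56 GB at 50%, so "50G+" matches the upper end of the range, before any effect from pruning or a different database backend.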
> I needed to be able to run spamprobe on a different server from my mail
> delivery machine. So I've written a client/server wrapper around it. I run
> the client wrapper on the mail delivery machine from the user's procmail or
> maildrop config, which sends the message to the server for processing. The
> server spawns spamprobe, feeds it the message (or mailbox), and returns the
> output, if any, to the client. If there's any interest in incorporating this
> in the spamprobe distribution, let me know and I'll send the code after a
> little more testing and cleanup.
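For anyone curious what such a wrapper might look like, here is a minimal sketch of the general idea. It is not Mark's code, which has not been posted. The names `make_server`, `classify`, and `DEFAULT_FILTER_CMD` are hypothetical, and it assumes the filter command reads the message on standard input and writes its verdict to standard output. There is no authentication, per-user database selection, or error handling.

```python
import socket
import socketserver
import subprocess

# Assumed invocation; adjust to however the filter is run on your server.
DEFAULT_FILTER_CMD = ["spamprobe", "receive"]


class FilterHandler(socketserver.StreamRequestHandler):
    """Reads one message from the client, runs the filter, returns its output."""

    def handle(self):
        # The client half-closes the socket after sending, so read() sees EOF.
        message = self.rfile.read()
        result = subprocess.run(
            self.server.filter_cmd,
            input=message,
            stdout=subprocess.PIPE,
        )
        self.wfile.write(result.stdout)


def make_server(host, port, filter_cmd=None):
    """Build a threaded TCP server that spawns filter_cmd once per connection."""
    server = socketserver.ThreadingTCPServer((host, port), FilterHandler)
    server.filter_cmd = filter_cmd or DEFAULT_FILTER_CMD
    return server


def classify(host, port, message):
    """Client side: send the message bytes and return the filter's output bytes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(message)
        sock.shutdown(socket.SHUT_WR)  # signal end of message to the server
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

On the filter host you would call `make_server(host, port).serve_forever()`, and the procmail or maildrop side would pipe each message through a small script that calls `classify()` and prints the result. Spawning one process per message keeps the sketch simple; a busy site would probably want to limit concurrency.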
> Mark Costlow | Southwest Cyberport | Fax: +1-505-232-7975
> cheeks@... | Web: http://www.swcp.com | Voice: +1-505-232-7992
> "Education is never a waste" - Viscount du Valmont
> Spamprobe-users mailing list