I'm releasing, for public testing, a major update/rewrite of the
rebuildspamdb script.
In addition to the script output being significantly expanded, the
code has been heavily refactored and rewritten for performance and
maintainability.
Users should see a significant performance gain over the rebuild
script released with ASSP version 1.3.3.8 or older, and a slightly
smaller gain over the latest beta versions of the rebuild script.
This version is newer than anything previously released. If you have
been using the rebuildspamdb script from the beta website, then I
thank you for testing some of my code already (Thanks!).
Now without further delay, a changelog! *gasps fill the room*
---------- Changelog ----------
New Features:
* Full MySQL back-end support for ASSP versions up to and including
1.3.5. Note: 1.3.6 is not supported.
* Messages with a Redlisted address in the header will be removed from the
corpus if redlist collection is disabled.
* Messages with a Whitelisted address in the header will be removed from
the SPAM corpus.
* Messages matching the RedRe will be removed from the
corpus if RedRe collection is disabled.
* Messages matching the WhiteRe will be removed from the SPAM
corpus if WhiteRe collection is disabled.
* Removed messages are shown in the log file with the reason for the
removal.
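
For anyone who wants to see how the removal rules above fit together, here is a minimal sketch. It is written in Python purely for illustration and is not the script's actual Perl code. The function name, the corpus labels, the config keys, and the order in which the rules are checked are all assumptions of mine; only the rules themselves come from the list above.

```python
import re

def removal_reason(header, corpus, cfg):
    """Return the reason a message should be removed, or None to keep it.

    header: raw header text of the message
    corpus: "spam" or "notspam" (hypothetical labels)
    cfg:    hypothetical settings dict; address lists are assumed lowercase
    """
    text = header.lower()
    # Redlisted address: removed from either corpus unless redlist collection is on.
    if not cfg["collect_redlist"] and any(a in text for a in cfg["redlist"]):
        return "redlisted address"
    # Whitelisted address: always removed from the spam corpus.
    if corpus == "spam" and any(a in text for a in cfg["whitelist"]):
        return "whitelisted address"
    # RedRe match: removed from either corpus unless RedRe collection is on.
    if not cfg["collect_red_re"] and re.search(cfg["red_re"], header, re.I):
        return "matches RedRe"
    # WhiteRe match: removed from the spam corpus unless WhiteRe collection is on.
    if (corpus == "spam" and not cfg["collect_white_re"]
            and re.search(cfg["white_re"], header, re.I)):
        return "matches WhiteRe"
    return None

# Example settings (made-up values, for illustration only).
cfg = {
    "redlist": ["red@example.com"],
    "whitelist": ["friend@example.com"],
    "red_re": r"out of office",
    "white_re": r"invoice #\d+",
    "collect_redlist": False,
    "collect_red_re": True,
    "collect_white_re": False,
}
reason = removal_reason("From: friend@example.com", "spam", cfg)
if reason:
    print("removed:", reason)  # the reason string is what would go to the log
```

Returning the reason as a string keeps the decision and the log message in one place, which matches the last item above about logging why a message was removed.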
Changes:
* Script will stop reading files from a directory when the number of
files imported matches MaxFiles.
e.g. if a folder has 20,000+ files but MaxFiles is set to 16,000, only
16,000 files will be read from the directory; extra files are
ignored.
* The number of bytes read from files in the corpus will equal
`MaxBytes` or 10,000, whichever is smaller. Previous functionality was
to read 10,000 bytes.
* Perl code is fully "Strict" compliant.
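
The two read limits described above can be sketched as follows. Again this is Python for illustration only, not the script's Perl; `read_corpus`, its parameter names, and `READ_CAP` are hypothetical names of mine, and the directory order in which files are visited is whatever the operating system returns.

```python
import os

READ_CAP = 10_000  # the 10,000-byte ceiling mentioned in the changelog

def read_corpus(folder, max_files, max_bytes):
    """Return {filename: bytes} for at most max_files files, each truncated."""
    limit = min(max_bytes, READ_CAP)  # whichever is smaller
    samples = {}
    with os.scandir(folder) as entries:
        for entry in entries:
            if len(samples) >= max_files:
                break  # stop reading the directory; extra files are ignored
            if not entry.is_file():
                continue
            with open(entry.path, "rb") as fh:
                samples[entry.name] = fh.read(limit)
    return samples
```

Stopping the directory scan early, instead of listing everything and trimming afterwards, is what saves time on folders that hold far more files than MaxFiles.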
Fixes:
* The Bayesian DB no longer improperly contains single words.
---------- Download ----------
http://assp.svn.sourceforge.net/viewvc/*checkout*/assp/Scripts/rebuildspamdb-2.pl
Bugs and feature requests can be sent to me personally or added to
the tracker on sf.net under the "Bayesian Rebuild Script" category.
Kevin