This release fixes a pair of bugs related to messages with no lines in their bodies. No new features were added.
This release fixes a bug in decoding malformed base64-encoded data that manifested itself on some architectures. It also corrects a typo in the online help. Some code cleanup and minor performance tweaks were added as well.
This release adds new features, contains numerous performance tweaks, and closes all open bug reports. Major new features include image attachment tokenization to detect content-free image spams, direct support for Maildir mailboxes, and an online help system. The performance enhancements are based on callgrind profiling. In my experiments 1.4 is 25-100% faster than 1.2a depending on the database used.
This release restores the -o suspicious-tags feature that had been accidentally removed in 1.2. Also fixes a text attachment parsing bug introduced in 1.3x1 (text attachments were effectively parsed dozens of times).
This release adds support for Maildir (qmail) mailbox format. A Maildir directory can be used anywhere that an mbox or MBX file name could be used.
This version also has built-in online help provided by a new "help" command. "spamprobe help" prints a list of all available commands. "spamprobe help score" prints help for the score command.
Two other new commands have been added. create-db creates a new empty database if none currently exists. create-config writes a new config file to the user's .spamprobe directory.
This release fixes a few minor problems in 1.3x1 and improves the source code structure a bit more.
This release adds a new experimental feature. If configure detects a working libungif and gif_lib.h on the system it will add support for extracting useful terms from gif images in emails. These terms can provide much needed data for emails that contain nothing but headers and an image attachment.
This release fixes a bug in the -8 command line option that caused 8 bit characters to be treated as word boundaries.
This release moves all of the improvements from the latest experimental release into the stable release. Major improvements include a newer and more robust email parser and an alternative database format for people who can't use either PBL or BDB. This version includes a brand new email parsing algorithm that substantially improves parsing speed. The new parser captures more meaningful terms than the old one and also has the advantage of holding the entire message in memory where it can be manipulated in a variety of ways.
This release replaces the old hash data file implementation with a faster, more reliable one. The hash data file implementation uses a fixed size data file and performs I/O with twice the speed of the ISAM implementations (PBL and BDB).
The hash data file does not store the text of each term. Instead it stores only a 32 bit hash code computed from the term itself. As a result the format may, rarely, confuse one term for another if the two terms have the same hash value. Also, since terms are not stored as text, users cannot use the dump command to explore the terms in the database and see what words are spammy.
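The trade-off described above can be illustrated with a small sketch. This is not SpamProbe's code: the `HashedTermStore` class, its method names, and the use of CRC-32 as the 32 bit hash are all assumptions made for illustration. A deliberately weak hash is plugged in at the end to force a collision, which would be rare with a real hash.

```python
import zlib


def crc32_hash(term):
    # Stand-in 32 bit hash (assumption: SpamProbe's real hash function differs).
    return zlib.crc32(term.encode("utf-8")) & 0xFFFFFFFF


class HashedTermStore:
    """Toy term store that keeps only a hash of each term, never its text."""

    def __init__(self, hash_fn=crc32_hash):
        self.hash_fn = hash_fn
        self.counts = {}  # hash key -> [good_count, spam_count]

    def add(self, term, good=0, spam=0):
        entry = self.counts.setdefault(self.hash_fn(term), [0, 0])
        entry[0] += good
        entry[1] += spam

    def lookup(self, term):
        return tuple(self.counts.get(self.hash_fn(term), (0, 0)))


# Normal use: counts are kept per hash key, and the term text is gone,
# so there is nothing readable for a dump-style command to print.
store = HashedTermStore()
store.add("meeting", good=5)
store.add("viagra", spam=3)

# Forced collision: a weak hash (term length) maps "spam" and "mail"
# to the same key, so their counts are merged into one entry.
weak = HashedTermStore(hash_fn=len)
weak.add("spam", spam=4)
weak.add("mail", good=2)
merged = weak.lookup("mail")
```

With the weak hash, looking up "mail" also reports the spam count recorded for "spam", which is the kind of confusion the fixed size hash format accepts in exchange for speed.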
Fast, intelligent, automatic spam detector using Paul Graham-style Bayesian analysis of word counts in spam and non-spam emails. Filtering adapts to personal tastes automatically. No manual rule creation required. This release adds the final missing pieces to the new parser code. MBX files and Content-Length headers are now supported. Database cleanup when signals are caught has also been improved. I would like to move 1.1 into the stable branch fairly soon, so if folks would test out this release and report any problems it would be a big help!
There is a new 1.1 experimental release in the unstable package. This release replaces the old parser with the new one. All of the old functionality is now in place except for MBX file support.
I just posted an experimental version of SP to the unstable release area. This version includes a brand new email parsing algorithm that substantially improves parsing speed. The new parser captures more meaningful terms than the old one and also has the advantage of holding the entire message in memory where it can be manipulated in a variety of ways.
This release does not implement all of the various parsing options of the old parser. Many of those will be added as the new code is refined.
Well it's here at last! Spamprobe 1.0 is out. Changes in this release include:
* added exec and exec-shared commands to allow running a command in a shell while SP holds a lock on the database (inspired by Graham Toal)
* added --enable-big-endian to configure to force SP to always store term counts in big-endian format (thanks to Jon Rust for sponsoring this feature!)
* added Debian compatibility changes to configure contributed by Siggy Brentrup.
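To show why a fixed byte order matters for the --enable-big-endian option listed above, here is a minimal sketch. The record layout (two unsigned 32 bit counts per term) and the function names are assumptions for illustration only, not SpamProbe's actual on-disk format.

```python
import struct

# ">" forces big-endian regardless of the host CPU; "II" is two
# unsigned 32 bit integers (hypothetical good and spam counts).
RECORD_FORMAT = ">II"


def pack_counts(good, spam):
    """Serialize term counts so the bytes are identical on any architecture."""
    return struct.pack(RECORD_FORMAT, good, spam)


def unpack_counts(raw):
    """Read counts back from the big-endian representation."""
    return struct.unpack(RECORD_FORMAT, raw)


raw = pack_counts(1, 258)  # 258 == 0x0102
```

Because the byte order is fixed rather than native, a database written on one machine can be read on another with a different endianness.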
This release includes some minor changes including performance tuning, an improved dump command that allows limiting terms printed to those matching a regular expression, and support for using an external tokenizer instead of spamprobe's own.
This release uses a modified version of the scoring equation that yields a better distribution of scores. It also reduces unnecessary I/O in train mode, removes some redundant recalculation of scores, adds support for the X-Status: header, and adds a command line option to let users set their own spam score threshold.
This release adds a tokenize command for seeing all of the tokens in an email, an option to honor the deleted status of an email (for IMAP servers), an option to use a different scoring algorithm that provides more evenly distributed scores, and some minor bug fixes.
This release changes from using PBL's ISAM file format to using just its key-file format. The change, recommended by the PBL author Peter Graf, reduces the size of PBL databases by more than half.
This release also includes some changes to the way that top terms are selected when scoring emails. The change reduces the probability of getting false positives without adversely affecting the rate of false negatives. Also the output of the -T command line option has changed to include the overall database good and spam counts for each term.
This release adds a fix for decoding of quoted printable email headers. Use of Berkeley DB's CDB feature has now been turned into a configure time option because of problems reported by several users. The big news however is that this release adds support for Peter Graf's PBL ISAM library as an optional replacement for Berkeley DB. PBL appears to work more reliably and performs better under heavy load.
This release restores the ability to let users access a shared database without write permissions. Also adds RFC 2047 header decoding and some minor bug fixes.
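For readers unfamiliar with RFC 2047, it defines the "encoded-word" syntax that lets mail headers carry non-ASCII text. The sketch below uses Python's standard `email.header` module to show what decoding such a header means; it is an illustration of the format, not SpamProbe's implementation, and the `decode_rfc2047` helper name is made up for this example.

```python
from email.header import decode_header, make_header


def decode_rfc2047(raw_header):
    """Decode RFC 2047 encoded-words in a header value to plain text."""
    return str(make_header(decode_header(raw_header)))


# "q" marks quoted-printable style encoding, "b" marks base64.
subject_q = decode_rfc2047("=?iso-8859-1?q?p=F6stal?=")
subject_b = decode_rfc2047("=?utf-8?b?SGVsbG8=?=")
```

Decoding headers before tokenizing lets a filter see the actual words in a subject line rather than their encoded form.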
This release adds months of improvements including better Berkeley DB integration, a new "train" mode to allow SpamProbe to learn with less database I/O, improved locking, improved import/export, improved MIME handling, smarter HTML tag removal, and numerous bug fixes.
A new version of spamprobe, 0.9-dev-6 is now available. This version uses Berkeley DB's environment and concurrent data store APIs for better integration with BDB's own utilities and higher concurrency.
Other improvements include smarter locking (single writer, multi reader, readers can coexist with a writer), some ctype routine workarounds, a -R command line option to base the exit code on the spamminess of the message being processed, export/import support for keeping time stamps of terms intact, and some other minor bug fixes.
This is an experimental release prior to the official stable 0.9 release. This release adds the "train" commands to reduce database consumption, improves locking for Berkeley DB databases, and reduces the amount of I/O in the cleanup commands.
I've just uploaded a new unstable release which has improved hash file support (not compatible with old hash files) and a new "train" command.
The improved hash support is mainly a better way (I think) of generating indexes into the file and picking a file size that should reduce collisions. Another major change is that each term now uses 8 bytes instead of 12 in the hash file. I used only 3 bytes instead of 4 for the counts to get the 50% reduction.
spamprobe-0.9-dev-2 is now available in the unstable package. This release has some minor bug fixes and improvements. I plan to test it over the next week or so and then release 0.9.
This release fixes a problem with crashes in the regex routines on RedHat 8 systems. SpamProbe is a fast, intelligent, automatic spam detector using Paul Graham-style Bayesian analysis of word counts in spam and non-spam emails. Filtering adapts to personal tastes automatically. No manual rule creation required.
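As background on the "Paul Graham-style" analysis mentioned above, the following is a minimal sketch of the scoring scheme from Graham's essay "A Plan for Spam". SpamProbe's own scoring equation is a modified one, so treat this as an illustration of the general idea only; the function names and the handling of rare terms are simplifications chosen for this example.

```python
from math import prod


def term_probability(good, spam, ngood, nspam):
    """Estimate how spammy a term is from its counts.

    good/spam: times the term appeared in good and spam mail.
    ngood/nspam: number of good and spam messages (assumed > 0).
    """
    g = 2 * good  # good counts are doubled to bias against false positives
    b = spam
    if g + b < 5:
        return 0.4  # simplification: treat rarely seen terms like unseen ones
    good_ratio = min(1.0, g / ngood)
    spam_ratio = min(1.0, b / nspam)
    return max(0.01, min(0.99, spam_ratio / (good_ratio + spam_ratio)))


def combined_score(probabilities, top_n=15):
    """Combine the most decisive term probabilities into one message score."""
    interesting = sorted(probabilities, key=lambda p: abs(p - 0.5), reverse=True)
    interesting = interesting[:top_n]
    spam_product = prod(interesting)
    good_product = prod(1.0 - p for p in interesting)
    return spam_product / (spam_product + good_product)


spammy_term = term_probability(good=0, spam=10, ngood=100, nspam=100)
hammy_term = term_probability(good=10, spam=0, ngood=100, nspam=100)
score = combined_score([spammy_term, spammy_term, 0.5])
```

Because the probabilities come entirely from the user's own good and spam counts, the filter adapts to each mailbox without hand-written rules.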