This file covers the major changes between each release. For more details,
the reader is referred to the changelog (CHANGELOG.TXT in the main directory
of the source archive), or for extreme details, to the check-ins archive
(see <http://mail.python.org/pipermail/spambayes-checkins>).
Changes are broken into sections, so that it's easier for you to find the
changes that are relevant to you.
Any actions necessary to move to this release from the previous release are
noted in the "Incompatible changes and Transitioning" section.
Note that this is an alpha release, intended for those willing to try out
very cutting edge software. It is likely that there are a few unresolved
bugs with this release. If you would rather wait for a more stable release
to have the new features here, please continue to use 1.0.4 until 1.1 final
is released.
New in 1.1 Alpha 5
==================
The XML-RPC plugin for core_server.py now has 'train' and 'train_mime'
methods.
The source code repository was switched from CVS to Subversion.
New in 1.1 Alpha 4
==================
--------------------------------------------
** Incompatible changes and Transitioning **
--------------------------------------------
Some options that were 'experimental' in 1.1a3 have now been upgraded to
non-experimental, meaning the option names have had their 'x-' prefix removed.
See below for details.
Otherwise, there should be no incompatible changes since 1.1a3, though users
new to the 1.1 series should pay careful attention to the database changes
introduced in 1.1a2.
-------------------
** Other changes **
-------------------
The previously experimental options 'x-crack-images', 'x-ocr-engine'
and 'x-image-size' have all had their 'x-' prefix removed. 'crack-images'
now defaults to True (meaning you don't need to change anything for it
to be enabled), and ocr-engine defaults to 'gocr'. The Windows binary ships
with the gocr engine, so this should work out-of-the-box for both for Outlook
and POP/IMAP/etc users.
Image Cracking (ie, using OCR to extract text from images) has been
implemented for the Outlook addin.
Some localization related issues have been fixed, and a German translation
contributed.
There is a new application, core_server.py. It is functionally similar to
sb_server.py but uses a plugin architecture to adapt to different
protocols. The first plugin is for XML-RPC.
New in 1.1 Alpha 3
==================
--------------------------------------------
** Incompatible changes and Transitioning **
--------------------------------------------
There should be no incompatible changes since 1.1a2, though users new to the
1.1 series should pay careful attention to the database changes introduced
in 1.1a2.
-------------------
** Other changes **
-------------------
General
-------
Reported Bugs Fixed
===================
No bugs tracked via the Sourceforge system were fixed.
Patches integrated
===================
The following patches tracked via the Sourceforge system were integrated
in this release:
824651
Feature Requests Added
======================
No feature requests tracked via the Sourceforge system were added
in this release.
Experimental Options
====================
In addition to the experimental options listed for the 1.1a2 release, four
more new experimental options were added to SpamBayes. They all need
further testing.
o x-short_runs - If true, generate tokens based on max number of short
word runs. Short words are anything of length < the skip_max_word_size
option. Normally they are skipped, but one common spam technique spells
words like 'V m I n A o G p RA' to try and avoid exposing them to
content filters.
o x-lookup_ip - If true, generate IP address tokens from hostnames. This
requires PyDNS (http://pydns.sourceforge.net/). This is included in the
Windows installer.
o x-image_size - If true, generate tokens based on the size of the largest
attached image.
o x-crack_images - A lot of recent spam contains the entire message
embedded in one or more attached images. This option, if true,
generates tokens based on the (hopefully) text content contained in any
images in each message. The current support is minimal, relies on the
installation of ocrad (http://www.gnu.org/software/ocrad/ocrad.html) and
the Python Imaging Library (a.k.a. PIL, available at
http://www.pythonware.com/products/pil/). It has not yet been tested on
Windows, but is available in the Windows installer (as is PIL).
New in 1.1 Alpha 2
==================
--------------------------------------------
** Incompatible changes and Transitioning **
--------------------------------------------
* NOTE * - this section does not apply to people running SpamBayes on
Windows using the binary installer - only source code installations are
affected.
SpamBayes has changed to use ZODB as the default database backend, rather
than dbm (usually bsddb). There are three methods for handling this
transition:
o You can start with a fresh database. It's a good idea to start
training over occasionally, and SpamBayes learns very rapidly, so this
is a feasible option. This is the default behaviour, unless you have
set SpamBayes to use a different name or backend type, in which case
your choices will remain unchanged.
o You can continue to use your existing database files. To do this, you
need to set SpamBayes to use "dbm" as the persistent storage method
in the configuration. POP3 Proxy and IMAP4 filter users can do this
via the configuration page; Outlook and command-line tool users need
to manually add these lines to their configuration file:
[Storage]
persistent_use_database:dbm
o You can convert your existing database files to the new format using
the utilities/convert_db.py script.
Note that only the token database (containing your training) is
converted; the 'messageinfo' database (containing statistics about
SpamBayes performance and a record of which messages have been
filtered and trained) will be replaced with an empty one. The only
noticeable effect of this will be that IMAP Filter users will have
all messages refiltered and the Statistics reports will be restarted.
If you have changed your configuration to specify which backend to use in
the past, note that the deprecated values of "True" (meaning "dbm") and
"False" (meaning "pickle") are no longer supported (use "dbm" or "pickle").
There should be no other incompatible changes (from 1.0.4) in this release.
If you are transitioning from a version older than 1.0.4, please
read the notes in the previous release notes (accessible from
<http://sourceforge.net/project/showfiles.php?group_id=61702>).
-------------------
** Other changes **
-------------------
General
-------
o Handling of errors when using the ZODB database backend has been
improved.
o The ZODB databases are now automatically packed, to save space.
o The round accounting in tte.py has additional information.
o Basic options are now available in sort+group.py
o fpfn.py now works with unsures, and offers interactivity.
Outlook Plugin
--------------
o The folder dialogs were broken in 1.1a1; these have been fixed.
POP3 Proxy (sb_server.py)
-------------------------
o If an error occurs while filtering, an exception header is added to
the message (instead of the SpamBayes headers). In 1.1a1, this
exception header caused the rest of the message headers to move into
the message body. This is now fixed.
Web interface (sb_server.py and sb_imapfilter.py)
-------------------------------------------------
o The default action for ham and spam in the review page is now 'discard'.
We recommend using a 'train on false positives, false negatives and
unsures' regime, and these are the most suitable defaults for that.
o The "Notate To" and "Notate Subject" options should work correctly
from the configuration page now.
o The current row of the review page is now highlighted.
IMAP Filter (sb_imapfilter.py)
------------------------------
o One-off errors in connecting to the server when using the -l option
are handled better.
o A problem with closing a folder twice was fixed.
o Invalid dates are handled correctly.
o Only selected folders are expunged.
Reported Bugs Fixed
===================
The following bugs tracked via the Sourceforge system were fixed:
1181160, 1179055, 1187208, 1182671, 1182754, 759917
A URL containing the details of these bugs can be made by appending the
bug number to this URL:
http://sourceforge.net/tracker/index.php?func=detail&group_id=61702&atid=498103&aid=
Feature Requests Added
======================
The following feature requests tracked via the Sourceforge system were added
in this release:
1182703, 618932
A URL containing the details of these bugs can be made by appending the
bug number to this URL:
http://sourceforge.net/tracker/index.php?func=detail&group_id=61702&atid=498103&aid=
Patches integrated
===================
No patches tracked via the Sourceforge system were integrated for this release.
Deprecated Options
==================
No options are currently deprecated.
Experimental Options
====================
We would like to remind users about our set of experimental options. These
are options which we believe may be of benefit to users, but have not been
tested throughly enough to warrent full inclusion. We would greatly
appreciate feedback from users willing to try these options out as to their
perceived benefit. Both source code and binary users (including Outlook)
can try these options out.
More information about the experimental options and how to enable them
can be found at:
http://spambayes.org/experimental.html
If you have any queries about the experimental options, please email
spambayes@python.org and we will try and answer them.
Experimental options that are currently available include:
o [Tokenizer] x-search_for_habeas_headers
o [Tokenizer] x-reduce_habeas_headers
These generate tokens based on the Habeas headers (see
<http://habeas.com> for more details).
o [URLRetriever] x-slurp_urls
o [URLRetriever] x-cache_expiry_days
o [URLRetriever] x-cache_directory
o [URLRetriever] x-only_slurp_base
o [URLRetriever] x-web_prefix
If these are used, if a message is scored as 'unsure', and could use
more tokens in its classification, then text from any URLs in the
message is retrieved and used, if it makes a difference to the
classification.
o [Tokenizer] x-pick_apart_urls
Pick out some semantic bits from URLs.
o [Tokenizer] x-fancy_url_recognition
Recognize 'abbreviated' URLs of the form www.xyz.com or ftp.xyz.com as
http://www.xyz.com and ftp://ftp.xyz.com, respectively. This gets rid
of some fairly common "skip:w NNN" tokens.