[cvs] bogofilter/doc bogofilter.xml,1.57,1.58


Update of /cvsroot/bogofilter/bogofilter/doc
In directory sc8-pr-cvs1:/tmp/cvs-serv18548

Modified Files:
	bogofilter.xml 
Log Message:
Clean-up cutoff explanations.  Add '-TT' info.

Index: bogofilter.xml
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/doc/bogofilter.xml,v
retrieving revision 1.57
retrieving revision 1.58
diff -u -d -r1.57 -r1.58

--- bogofilter.xml	4 Nov 2003 15:53:06 -0000	1.57
+++ bogofilter.xml	9 Dec 2003 12:50:59 -0000	1.58
@@ -143,26 +143,30 @@
 token towards robx.</para>
 
 <para>min_dev: a minimum distance from .5 for tokens to use in the
-calculation.  Tokens closer to 0.5 than this value are not used.</para>
+calculation.  Only tokens farther away from 0.5 than this value are
+used.</para>
 
 <para>spam_cutoff: messages with scores bigger than this will be
 marked as spam.</para>
 
 <para>ham_cutoff: If zero, all messages with values below spam_cutoff
-are marked as ham.  If bigger than zero, values below ham_cutoff are
-marked as ham, messages with values between ham_cutoff and spam_cutoff
-are marked as unsure.</para>
+are marked as ham.  If bigger than zero, values less than or equal to
+ham_cutoff are marked as ham.  Messages with values between ham_cutoff
+and spam_cutoff are marked as unsure.  If ham_cutoff equals
+spam_cutoff, messages with this score are marked as spam.</para>
 
 <para>While this method sounds crude compared to the more usual
 pattern-matching approach, it turns out to be extremely effective.
 Paul Graham's paper <ulink url="http://www.paulgraham.com/spam.html">
 A Plan For Spam</ulink> is recommended reading.</para>
 
-<para>This program substantially improves on Paul's proposal by doing smarter
-lexical analysis.  In particular, hostnames and IP addresses are retained
-as recognition features rather than broken up.  Various kinds of MTA
-cruft such as dates and message-IDs are ignored so as not to bloat
-the wordlists.  Lex's Swiss-army-knife nature rises again.</para>
+<para>This program substantially improves on Paul's proposal by doing
+smarter lexical analysis.  <application>Bogofilter</application> does
+proper MIME decoding and reasonable HTML parsing.  Special kinds of
+tokens like hostnames and IP addresses are retained as recognition
+features rather than broken up.  Various kinds of MTA cruft such as
+dates and message-IDs are ignored so as not to bloat the wordlists.
+Tokens found in various header fields are marked appropriately.</para>
 
 <para>Another seeming improvement is that this program offers Gary
 Robinson's suggested modifications to the calculations.  These modifications
@@ -211,6 +215,10 @@
 scripts to use.  <application>bogofilter</application> will print an
 abbreviated spamicity message containing 1 letter and the score.  Spam
 is indicated with "S", ham by "H", and unsure by "U".</para>
+
+<para>The <option>-TT</option> option provides an invariant terse mode for
+scripts to use.  <application>Bogofilter</application> prints only the
+score and displays it to 16 significant digits.</para>
 
 <para>The <option>-u</option> option tells
 <application>bogofilter</application> to register the message's text
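
The min_dev filter and the three-way cutoff rules in the revised text can be illustrated with a short Python sketch. It is not bogofilter's code: the names `significant_tokens` and `classify` are hypothetical, and the sketch assumes a score exactly equal to spam_cutoff counts as spam. That assumption agrees with the new rule for ham_cutoff equal to spam_cutoff, but the documentation does not state the boundary for other settings. `classify` returns the same "S", "H" and "U" letters that the <option>-T</option> option uses.

```python
# Hypothetical helpers illustrating the min_dev and cutoff rules described
# in bogofilter.xml 1.58. These are NOT bogofilter's own functions.


def significant_tokens(probs, min_dev):
    """Keep only token probabilities farther than min_dev from 0.5."""
    return [p for p in probs if abs(p - 0.5) > min_dev]


def classify(score, spam_cutoff, ham_cutoff=0.0):
    """Return "S" (spam), "H" (ham) or "U" (unsure) for a message score.

    Assumption: a score equal to spam_cutoff is treated as spam. This
    satisfies "bigger than spam_cutoff is spam" and the rule that, when
    ham_cutoff equals spam_cutoff, a score at that value is spam.
    """
    if score >= spam_cutoff:
        return "S"
    # ham_cutoff == 0 gives two-way classification: everything below
    # spam_cutoff is ham. Otherwise ham means score <= ham_cutoff.
    if ham_cutoff == 0 or score <= ham_cutoff:
        return "H"
    # Strictly between ham_cutoff and spam_cutoff.
    return "U"


if __name__ == "__main__":
    print(significant_tokens([0.9, 0.45, 0.55, 0.1], min_dev=0.1))  # [0.9, 0.1]
    print(classify(0.97, spam_cutoff=0.95, ham_cutoff=0.1))  # S
    print(classify(0.5, spam_cutoff=0.95, ham_cutoff=0.1))   # U
    print(classify(0.05, spam_cutoff=0.95, ham_cutoff=0.1))  # H
    print(classify(0.5, spam_cutoff=0.95))                   # H
    print(classify(0.95, spam_cutoff=0.95, ham_cutoff=0.95)) # S
```

Checking spam first is what makes the equal-cutoff case come out as spam even though the ham rule uses "less than or equal to".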






Fast Bayesian spam filter along lines suggested by Paul Graham
