[cvs] bogofilter/contrib bogominitrain.pl,1.2,1.3
Fast Bayesian spam filter along lines suggested by Paul Graham
Brought to you by:
m-a
From: <re...@us...> - 2003-07-22 15:50:51
Update of /cvsroot/bogofilter/bogofilter/contrib
In directory sc8-pr-cvs1:/tmp/cvs-serv15145

Modified Files:
 bogominitrain.pl
Log Message:
Update help text.

Index: bogominitrain.pl
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/contrib/bogominitrain.pl,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -d -r1.2 -r1.3
--- bogominitrain.pl 22 Jul 2003 12:52:06 -0000 1.2
+++ bogominitrain.pl 22 Jul 2003 15:50:38 -0000 1.3
@@ -22,8 +22,17 @@
  Run "formail -es" on your mailboxes before you start to ensure their
  correctness.
 
- It may be a good idea to run this script command several times.
- Use the '-f' option to run the script until no scoring errors occur.
+ It may be a good idea to run this script command several times. Use
+ the '-f' option to run the script until no scoring errors occur.
+
+ To increase the size of your wordlists, which will help bogofilter's
+ scoring accuracy, use bogofilter's -o option to set ham_cutoff and
+ spam_cutoff to create an "unsure" interval around your normal
+ spam_cutoff. The script will train so that the messages will avoid
+ this interval, i.e., all messages in your training mboxes will be
+ marked as ham or spam with values far from your production cutoff.
+ For example if you usually work with spam_cutoff=0.6, you might use
+ the following as bogofilter-options: '-o 0.7,0.5'
 
  Example:
  bogominitrain.pl -fv .bogofilter 'ham*' 'spam*' '-c train.cf'
@@ -132,7 +141,7 @@
  close (HAM);
  close (SPAM);
 
- print "\nDone:\n";
+ print "\nEnd of run #$runs:\n";
  print "Read $hamcount ham mails and $spamcount spam mails.\n";
  print "Added $hamadd ham mails and $spamadd spam mails to the database.\n";
  print `bogoutil -w $dir .MSG_COUNT`;
@@ -141,4 +150,4 @@
  $fp=`cat $ham | $bogofilter -vM | grep -c Spam`;
  print "False positives: $fp\n";
 } until ($fn+$fp==0 || !$force);
-print "\n$runs runs needed to close off.\n" if ($force);
+print "\n$runs run(s) needed to close off.\n" if ($force);
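The new help text describes training against an "unsure" interval set with bogofilter's -o option. The Perl sketch below illustrates only the decision rule that the text implies; it is not code from bogominitrain.pl, which decides what to train by actually running bogofilter. The helpers `classify` and `needs_training` are hypothetical, and the boundary handling (a score >= spam_cutoff counts as spam, a score < ham_cutoff counts as ham) is an assumption that may not match bogofilter's exact comparisons.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of bogominitrain.pl): map a spamicity score
# to a three-way verdict, given the cutoffs that would be set via
# '-o spam_cutoff,ham_cutoff'. Boundary handling (>= / <) is an assumption.
sub classify {
    my ($score, $spam_cutoff, $ham_cutoff) = @_;
    return 'spam' if $score >= $spam_cutoff;
    return 'ham'  if $score <  $ham_cutoff;
    return 'unsure';
}

# Hypothetical helper: a message from a training mbox is (re)trained unless
# it already lands clearly on its own side of the unsure interval.
sub needs_training {
    my ($score, $is_spam, $spam_cutoff, $ham_cutoff) = @_;
    my $verdict = classify($score, $spam_cutoff, $ham_cutoff);
    return $is_spam ? $verdict ne 'spam' : $verdict ne 'ham';
}

# Production cutoff 0.6, training with '-o 0.7,0.5' as in the help text.
my ($spam_cutoff, $ham_cutoff) = (0.7, 0.5);
for my $score (0.45, 0.55, 0.65, 0.75) {
    printf "score %.2f: %s; train if ham: %s; train if spam: %s\n",
        $score,
        classify($score, $spam_cutoff, $ham_cutoff),
        needs_training($score, 0, $spam_cutoff, $ham_cutoff) ? 'yes' : 'no',
        needs_training($score, 1, $spam_cutoff, $ham_cutoff) ? 'yes' : 'no';
}
```

Under these assumptions, a ham message scoring 0.55 would already pass as ham with the production cutoff of 0.6, yet it still falls inside the 0.5 to 0.7 interval and gets trained. That extra training is what pushes the training messages away from the production cutoff and enlarges the wordlists.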