[Bogofilter-cvs] bogofilter UPGRADE,NONE,1.1 bogoupgrade.pl,NONE,1.1
Fast Bayesian spam filter along lines suggested by Paul Graham
From: <gy...@us...> - 2002-09-24 04:36:57
Update of /cvsroot/bogofilter/bogofilter
In directory usw-pr-cvs1:/tmp/cvs-serv27135

Added Files:
    UPGRADE bogoupgrade.pl
Log Message:
UPGRADE -- Instructions and notes on upgrading databases.
bogoupgrade.pl -- script to perform upgrade.

--- NEW FILE: UPGRADE ---
What:

    There are now multiple file formats for various versions of bogofilter.
    This document explains how to upgrade any earlier format to the current one.

Assumptions:

    A recent version of the bogofilter package is installed and the programs
    /usr/bin/bogoutil and /usr/bin/bogoupgrade.pl exist.
    Adjust paths to suit your taste and system.

How:

    1. Stop all instances of bogofilter.

    The upgrade tools do not lock files or guard against multiple processes
    accessing the same files. If you have cron jobs or daemons that fetch and
    process mail and could fire off bogofilter, stop them.

    2. Back up your data.

    Let's assume that you said:

    $ mv ~/.bogofilter ~/.bogofilter.safe
    $ mkdir ~/.bogofilter

    3. If your bogofilter version is less than 0.7, say

    $ /usr/bin/bogoupgrade.pl -b /usr/bin/bogoutil -i ~/.bogofilter.safe/goodlist -o ~/.bogofilter/hamlist.db
    $ /usr/bin/bogoupgrade.pl -b /usr/bin/bogoutil -i ~/.bogofilter.safe/badlist -o ~/.bogofilter/spamlist.db

    If your bogofilter version is 0.7 or greater, say

    $ /usr/bin/bogoupgrade.pl -b /usr/bin/bogoutil -i ~/.bogofilter.safe/hamlist.count -o ~/.bogofilter/hamlist.db
    $ /usr/bin/bogoupgrade.pl -b /usr/bin/bogoutil -i ~/.bogofilter.safe/spamlist.count -o ~/.bogofilter/spamlist.db

    4. Done. Restart any stopped daemons, cron tasks, etc.

Why:

    Versions 0.1? to 0.6 use a text file for message counts and data.
    The first line contains a signature and a message count. Subsequent lines
    contain space-separated word/count pairs, one pair per line.

    Here are the first few lines of a sample file:

    # bogofilter wordlist (format version A): 798
    word 5
    otherword 4
    yetanotherword 4

    Versions 0.7 to 0.7.4 use two files for each list: a text file for the
    message count, and a Berkeley DB file for the word/count values.

    Here is a sample message count file, which holds only the signature line:

    # bogofilter email-count (format version B): 1077

    Versions 0.7.5+ use a single Berkeley DB file to hold both word and message
    counts. A record with the special key '.MSG_COUNT' is used for the message
    count. The text file is unused.

    Note that some people may have applied a patch to version 0.70 which has a
    similar effect, but uses a key value of '.count' for the message count.
    This type will also be correctly upgraded.

Who:

    Gyepi Sam <gy...@pr...>

    Any problems should first be addressed to the bogofilter lists,
    to which I am subscribed.

--- NEW FILE: bogoupgrade.pl ---
#!/usr/bin/perl

=pod

Name:
bogoupgrade.pl -- upgrade a bogofilter database to current version.

Author:
Gyepi Sam <gy...@pr...>

=cut

use strict;
use warnings;

my $VERSION = '0.1';

my ($in, $out, $help);

my $bogoutil = 'bogoutil';

# Minimal hand-rolled option parsing: each flag takes the next argument.
for (my $i = 0; $i < @ARGV; $i++){
  my $arg = $ARGV[$i];

  if ($arg eq '-i'){
    $in = $ARGV[++$i];
  } elsif ($arg eq '-o'){
    $out = $ARGV[++$i];
  } elsif ($arg eq '-b'){
    $bogoutil = $ARGV[++$i];
  } elsif ($arg eq '-h'){
    help();    # help() prints the text and exits with status 0
  } else {
    usage();
    exit(1);
  }
}

# Both the input and the output file are required.
unless (defined $in && defined $out){
  usage();
  exit(1);
}

my $msg_count_token = '.MSG_COUNT';

open(F, $in) or die "Cannot open input file [$in]. $!.\n";

# The first line is the signature; it identifies the file format.
my $sig = <F>;
$sig = '' unless defined $sig;
chomp($sig);

if ($sig =~ m/^\# bogofilter wordlist \(format version A\):\s(\d+)$/){
  # Format A: the text file holds the message count and all word/count pairs.
  my $msg_count = $1;
  my $cmd = qq[$bogoutil -l $out];
  open(OUT, "|$cmd") or die "Cannot run command [$cmd]. $!\n";
  while(<F>){
    print OUT $_;
  }
  print OUT "$msg_count_token $msg_count\n";
  close(OUT);
  close(F);
}
elsif ($sig =~ m/^\# bogofilter email-count \(format version B\):\s(\d+)/){
  # Format B: the text file holds only the count; the words live in a
  # Berkeley DB file with the same name but a 'db' suffix.
  my $msg_count = $1;
  my $in_db = $in;

  $in_db =~ s/count$/db/;

  unless (-f $in_db){
    warn("Cannot find database file [$in_db] corresponding to input file [$in]\n");
    exit(1);    # report failure instead of exiting with status 0
  }

  my $cmd = qq[$bogoutil -l $out];
  open(OUT, "|$cmd") or die "Cannot run command [$cmd]. $!\n";

  close(F);
  $cmd = qq[$bogoutil -d $in_db];
  open(F, "$cmd|") or die "Cannot run command [$cmd]. $!\n";
  while(<F>){
    if (m/^\.count\s+(\d+)$/){
      warn("Found a message count of [$1] in db. Throwing away text file count of [$msg_count]\n");
      $msg_count = $1;
      next;
    }
    # \Q...\E keeps the leading dot of the token from matching any character.
    elsif (/^\Q$msg_count_token\E\s(\d+)$/){
      warn("This database appears to have been upgraded already. But there's no harm in doing it again.\n");
      $msg_count = $1;
      next;
    }
    print OUT $_;
  }
  print OUT "$msg_count_token $msg_count\n";
  close(F);
  close(OUT);
}
else {
  print STDERR "Cannot recognize signature [$sig].\n";
  exit(2);
}

exit(0);

sub usage {
  print STDERR "usage: $0 [ -i <input text file> -o <output db file> [ -b <path to bogoutil>] ] [ -h ]\n";
}

sub help {
  print <<EOF;

$0 -- upgrades bogofilter database to current version.

Options:
    -i <input file>

        Text file containing message counts, and possibly data.
        If there is no data in the text file, there should be a Berkeley DB
        file in the same directory as the text file which contains the data.

    -o <output file>

        Output Berkeley DB file.

    -b <path to bogoutil program>

        Defaults to 'bogoutil', in the hopes that your shell will find it.

    -h help

        You are reading it.

EOF
  exit(0);
}
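
The signature lines described in the UPGRADE notes can be checked by hand before running the upgrade, which may help when it is unclear which version wrote a given file. The following is only a sketch: detect_format is a hypothetical helper that is not part of bogofilter, and it assumes the signature is the first line of the file, as in the samples above.

```shell
#!/bin/sh
# Sketch only: classify a bogofilter text file by its signature line.
# detect_format is a hypothetical helper, not part of the bogofilter package.
detect_format() {
    # Read just the first line, where the signature is expected to be.
    sig=$(head -n 1 "$1")
    case "$sig" in
        "# bogofilter wordlist (format version A): "[0-9]*)
            # Versions up to 0.6: count and word/count pairs in one text file.
            echo "A" ;;
        "# bogofilter email-count (format version B): "[0-9]*)
            # Versions 0.7 to 0.7.4: text count file plus a Berkeley DB file.
            echo "B" ;;
        *)
            # Anything else, including an empty file.
            echo "unknown" ;;
    esac
}
```

For example, `detect_format ~/.bogofilter.safe/goodlist` would print A for a pre-0.7 word list, and the same call on a hamlist.count file would print B. The check is looser than the regular expressions in bogoupgrade.pl, since it only requires a digit after the colon.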