Thread: [cvs] SF.net SVN: bogofilter:[6744] trunk/bogofilter
Fast Bayesian spam filter along lines suggested by Paul Graham
Brought to you by:
m-a
From: <m-...@us...> - 2008-10-15 12:00:30
Revision: 6744
          http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6744&view=rev
Author:   m-a
Date:     2008-10-15 12:00:20 +0000 (Wed, 15 Oct 2008)

Log Message:
-----------
Update sqlite3 adaptor to take advantage of sqlite3_prepare_v2() API function that appeared in SQLite 3.3.9. The new _v2 interface allows for more specific error messages when executing SQL statements.

Modified Paths:
--------------
    trunk/bogofilter/NEWS
    trunk/bogofilter/configure.ac
    trunk/bogofilter/doc/README.sqlite
    trunk/bogofilter/src/datastore_sqlite.c

Modified: trunk/bogofilter/NEWS
===================================================================
--- trunk/bogofilter/NEWS 2008-08-12 08:59:50 UTC (rev 6743)
+++ trunk/bogofilter/NEWS 2008-10-15 12:00:20 UTC (rev 6744)
@@ -15,12 +15,23 @@

 -------------------------------------------------------------------------------

+  2008-10-15
+
+        * Update sqlite3 adaptor to take advantage of sqlite3_prepare_v2()
+          API function that appeared in SQLite 3.3.9. The new _v2 interface
+          allows for more specific error messages when executing SQL
+          statements.
+
+  2008-07-21
+
         * Update doc/integrating-with-postfix: the script now suggests sendmail -G -i (where -G will be ignored by Postfix before 2.3) to tell Postfix it's a gateway submission, not an original injection; the filter pipe(8) magic for master.cf now suggests flags=Rq (was flags=R), as per Postfix's FILTER_README.

+  2008-07-09
+
         * Drop support for systems that reverse setvbuf arguments. The last systems to do that are reported to be shipped in 1987 by the autoconf manual, so ditch them.

Modified: trunk/bogofilter/configure.ac
===================================================================
--- trunk/bogofilter/configure.ac 2008-08-12 08:59:50 UTC (rev 6743)
+++ trunk/bogofilter/configure.ac 2008-10-15 12:00:20 UTC (rev 6744)
@@ -468,6 +468,10 @@
        AC_LIB_LINKFLAGS([sqlite3])
        LIBDB="$LIBSQLITE3"
        WITH_DB_ENGINE="sqlite3"
+       saveLIBS="$LIBS"
+       LIBS="$LIBDB $LIBS"
+       AC_CHECK_FUNC([sqlite3_prepare_v2],,AC_DEFINE(sqlite3_prepare_v2,sqlite3_prepare,[Define to sqlite3_prepare if new interface missing (for sqlite < 3.3.9)]))
+       LIBS="$saveLIBS"
        ;;
     xtokyocabinet)
        AC_DEFINE(ENABLE_TOKYOCABINET_DATASTORE,1, [Enable tokyocabinet datastore])

Modified: trunk/bogofilter/doc/README.sqlite
===================================================================
--- trunk/bogofilter/doc/README.sqlite 2008-08-12 08:59:50 UTC (rev 6743)
+++ trunk/bogofilter/doc/README.sqlite 2008-10-15 12:00:20 UTC (rev 6744)
@@ -20,13 +20,16 @@

 2.1 Compatibility - supported SQLite versions

-At this time, only SQLite 3.5.4 and newer are supported. Older versions
+At this time, only SQLite v3.5.6 and newer are supported. Older versions
 back to 3.0.8 may work, but you are advised to carefully review the
 sqlite3 ChangeLog, because there have been several important bug fixes
 since 3.0.8, including fixes for bugs that can corrupt the database.

+Note that sqlite v3.3.9 and newer sometimes generate error messages that
+are clearer when executing SQL statements (that bogofilter generates
+internally).
 Bogofilter prints a warning (but continues to run) when used with SQLite
-versions older than 3.5.4. This warning can be suppressed by defining
+versions older than v3.5.4. This warning can be suppressed by defining
 the environment variable BF_USE_OLD_SQLITE to any value, including the
 empty value.

Modified: trunk/bogofilter/src/datastore_sqlite.c
===================================================================
--- trunk/bogofilter/src/datastore_sqlite.c 2008-08-12 08:59:50 UTC (rev 6743)
+++ trunk/bogofilter/src/datastore_sqlite.c 2008-10-15 12:00:20 UTC (rev 6744)
@@ -155,7 +155,7 @@
 static sqlite3_stmt *sqlprep(dbh_t *dbh, const char *cmd, bool bailout /** exit on error? */) {
     const char *tail; /* dummy */
     sqlite3_stmt *ptr;
-    if (sqlite3_prepare(dbh->db, cmd, strlen(cmd), &ptr, &tail) != SQLITE_OK) {
+    if (sqlite3_prepare_v2(dbh->db, cmd, strlen(cmd), &ptr, &tail) != SQLITE_OK) {
         print_error(__FILE__, __LINE__, "cannot compile %s: %s\n", cmd, sqlite3_errmsg(dbh->db));
         if (bailout)
             exit(EX_ERROR);
@@ -187,7 +187,7 @@
     dbv_t key, val;

     /* sqlite3_exec doesn't allow us to retrieve BLOBs */
-    rc = sqlite3_prepare(db, cmd, strlen(cmd), &stmt, &tail);
+    rc = sqlite3_prepare_v2(db, cmd, strlen(cmd), &stmt, &tail);
     if (rc) {
         print_error(__FILE__, __LINE__,
                 "Error preparing \"%s\": %s (#%d)\n",

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
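The configure.ac hunk above uses a common autoconf idiom. `AC_CHECK_FUNC` tries to link `sqlite3_prepare_v2`. When that fails, `AC_DEFINE` places `#define sqlite3_prepare_v2 sqlite3_prepare` in config.h, so `datastore_sqlite.c` can always call the `_v2` name. The redirect is only safe because `sqlite3_prepare` and `sqlite3_prepare_v2` take the same arguments.

The following minimal sketch shows the idiom. It assumes two hypothetical stand-in functions instead of the real SQLite API:

- `stmt_prepare` plays the old entry point.
- `stmt_prepare_v2` plays the new name.

```c
#include <string.h>

/* Hypothetical stand-in for the older entry point that every library
 * version provides (NOT a real SQLite function). It "prepares" a
 * statement by reporting its length and fails on empty input. */
static int stmt_prepare(const char *sql, int nbyte, int *out_len)
{
    (void)sql;
    if (nbyte <= 0)
        return 1;              /* error */
    *out_len = nbyte;
    return 0;                  /* success */
}

/* What configure would write into config.h when the newer entry point
 * is missing from the library: calls to the new name are redirected
 * to the old one. Only valid because both share one signature. */
#define stmt_prepare_v2 stmt_prepare

/* Call sites are written against the new name only, as sqlprep() is. */
static int prepare_or_fail(const char *sql)
{
    int len = 0;
    if (stmt_prepare_v2(sql, (int)strlen(sql), &len) != 0)
        return -1;
    return len;
}
```

On a library that does provide the new function, config.h contains no such `#define`. The same call site then binds to the real `_v2` function.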
From: <m-...@us...> - 2008-10-15 12:24:35
Revision: 6746
          http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6746&view=rev
Author:   m-a
Date:     2008-10-15 12:24:24 +0000 (Wed, 15 Oct 2008)

Log Message:
-----------
sqlite3: Request extended results codes.

Modified Paths:
--------------
    trunk/bogofilter/NEWS
    trunk/bogofilter/src/datastore_sqlite.c

Modified: trunk/bogofilter/NEWS
===================================================================
--- trunk/bogofilter/NEWS 2008-10-15 12:16:05 UTC (rev 6745)
+++ trunk/bogofilter/NEWS 2008-10-15 12:24:24 UTC (rev 6746)
@@ -20,7 +20,8 @@
         * Update sqlite3 adaptor to take advantage of sqlite3_prepare_v2()
           API function that appeared in SQLite 3.3.9. The new _v2 interface
           allows for more specific error messages when executing SQL
-          statements.
+          statements. Also enable extended result codes for more precise error
+          reporting.

   2008-07-21


Modified: trunk/bogofilter/src/datastore_sqlite.c
===================================================================
--- trunk/bogofilter/src/datastore_sqlite.c 2008-10-15 12:16:05 UTC (rev 6745)
+++ trunk/bogofilter/src/datastore_sqlite.c 2008-10-15 12:24:24 UTC (rev 6746)
@@ -296,6 +296,9 @@
         goto barf;
     }

+    /* request extended result codes for improved error reporting */
+    (void)sqlite3_extended_result_codes(dbh->db, true);
+
     /* set trace mode */
     if (DEBUG_DATABASE(1) || getenv("BF_DEBUG_DB"))
         sqlite3_trace(dbh->db, db_trace, NULL);
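Once extended result codes are enabled (the patch passes `true` to `sqlite3_extended_result_codes`), SQLite may report refined codes such as `SQLITE_IOERR_READ` instead of plain `SQLITE_IOERR`. SQLite documents that the low 8 bits of an extended code hold the primary code, so existing checks can still mask the value.

This minimal sketch makes the following assumptions:

- The constant values are copied from sqlite3.h and renamed with an `EX_` prefix, so the example compiles without SQLite.
- `is_ioerr()` and `describe_ioerr()` are hypothetical helpers, not part of bogofilter.

```c
/* Result-code values from sqlite3.h, renamed with an EX_ prefix so this
 * sketch compiles without SQLite installed. */
#define EX_SQLITE_IOERR             10
#define EX_SQLITE_IOERR_READ        (EX_SQLITE_IOERR | (1 << 8))
#define EX_SQLITE_IOERR_SHORT_READ  (EX_SQLITE_IOERR | (2 << 8))
#define EX_SQLITE_IOERR_WRITE       (EX_SQLITE_IOERR | (3 << 8))

/* The low 8 bits of an extended result code hold the primary code. */
static int primary_result_code(int rc)
{
    return rc & 0xff;
}

/* True for plain SQLITE_IOERR and for any of its extended variants. */
static int is_ioerr(int rc)
{
    return primary_result_code(rc) == EX_SQLITE_IOERR;
}

/* Hypothetical helper: a more specific message when an extended code
 * is available, with a generic fallback based on the primary code. */
static const char *describe_ioerr(int rc)
{
    switch (rc) {
    case EX_SQLITE_IOERR_READ:       return "I/O error while reading";
    case EX_SQLITE_IOERR_SHORT_READ: return "short read";
    case EX_SQLITE_IOERR_WRITE:      return "I/O error while writing";
    default:
        return is_ioerr(rc) ? "unspecified I/O error" : "not an I/O error";
    }
}
```

In real code, the value passed in would be the return code of an `sqlite3_*` call or the result of `sqlite3_errcode()`.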
From: <cl...@us...> - 2008-10-15 23:08:15
Revision: 6749
          http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6749&view=rev
Author:   clint
Date:     2008-10-15 23:08:06 +0000 (Wed, 15 Oct 2008)

Log Message:
-----------
Put the transformed program names in the transformed bf_compact.

Modified Paths:
--------------
    trunk/bogofilter/configure.ac
    trunk/bogofilter/src/bf_compact.in

Modified: trunk/bogofilter/configure.ac
===================================================================
--- trunk/bogofilter/configure.ac 2008-10-15 23:07:31 UTC (rev 6748)
+++ trunk/bogofilter/configure.ac 2008-10-15 23:08:06 UTC (rev 6749)
@@ -838,6 +838,12 @@
        ;;
 esac

+bogofilter_transform=`echo "${program_transform_name}" | sed -e 's,\\\\\\\\,\\\\,g;s,\\\$\\\$,\$,g'`
+transformed_bogofilter=`echo bogofilter | sed -e "$bogofilter_transform"`
+transformed_bogoutil=`echo bogoutil | sed -e "$bogofilter_transform"`
+AC_SUBST(transformed_bogofilter)
+AC_SUBST(transformed_bogoutil)
+
 # Note the \\\\ for backslashes. Autoconf eats one layer, leaving \\

 AC_DEFINE(CURDIR_S, ".", [Define name of current directory (C string)])

Modified: trunk/bogofilter/src/bf_compact.in
===================================================================
--- trunk/bogofilter/src/bf_compact.in 2008-10-15 23:07:31 UTC (rev 6748)
+++ trunk/bogofilter/src/bf_compact.in 2008-10-15 23:08:06 UTC (rev 6749)
@@ -10,8 +10,8 @@

 set -e # die on errors

-: ${BOGOFILTER:=bogofilter}
-: ${BOGOUTIL:=bogoutil}
+: ${BOGOFILTER:=@transformed_bogofilter@}
+: ${BOGOUTIL:=@transformed_bogoutil@}

 if [ -z "$1" ] ; then
     echo 'usage: bf_compact source_dir [wordlist_name...]'
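Autoconf doubles every `\` and `$` in `program_transform_name` so the value can be used in Makefiles. The `bogofilter_transform=` line above therefore collapses `\\` back to `\` and `$$` to `$` before the sed script is applied to `bogofilter` and `bogoutil`. With `--program-prefix=bf-`, for example, `transformed_bogofilter` should come out as `bf-bogofilter`.

The following sketch shows just the un-doubling step in C. `unescape_transform()` is a hypothetical helper: configure itself does this with sed, not C.

```c
#include <stddef.h>

/* Hypothetical helper: undo Makefile-style doubling in place by turning
 * "\\" into "\" and "$$" into "$", scanning left to right without
 * overlapping matches, as the two sed s///g commands do.
 * Returns the new string length. */
static size_t unescape_transform(char *s)
{
    char *dst = s;
    const char *src = s;

    while (*src != '\0') {
        if ((src[0] == '\\' && src[1] == '\\') ||
            (src[0] == '$' && src[1] == '$')) {
            *dst++ = src[0];   /* keep one copy of the doubled char */
            src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

Backslashes and dollar signs never create new matches for each other. A single left-to-right pass therefore gives the same result as running the two sed substitutions one after the other.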
From: <m-...@us...> - 2008-10-16 22:10:38
Revision: 6752
          http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6752&view=rev
Author:   m-a
Date:     2008-10-16 22:10:32 +0000 (Thu, 16 Oct 2008)

Log Message:
-----------
Update static lib builds for SQLite 3.6.3 and use amalgamation. BDB 4.2 static builds, but shared version doesn't build on openSUSE 11.

Modified Paths:
--------------
    trunk/bogofilter/Makefile.am
    trunk/bogofilter/install-staticdblibs.sh

Modified: trunk/bogofilter/Makefile.am
===================================================================
--- trunk/bogofilter/Makefile.am 2008-10-16 21:49:36 UTC (rev 6751)
+++ trunk/bogofilter/Makefile.am 2008-10-16 22:10:32 UTC (rev 6752)
@@ -90,7 +90,7 @@
 	    echo >&2 "Please run the install-staticdblibs.sh script first." ; exit 1 ; fi
 	BF_ZAP_LIBDB=zap CPPFLAGS="-D__NO_CTYPE -I$(DBPFX)/include" LIBS=$(DBLIB) \
 	    $(RPMBUILD) $(DEF_STATIC) $(DEF_DB42) $(SIGN) -tb $(distdir).tar.gz
-	BF_ZAP_LIBDB=zap CPPFLAGS="-D__NO_CTYPE -I$(SQPFX)/include" LIBS=$(SQLIB) \
+	BF_ZAP_LIBDB=zap CPPFLAGS="-D__NO_CTYPE -I$(SQPFX)/include" LIBS="$(SQLIB) -ldl" \
 	    $(RPMBUILD) $(DEF_STATIC) $(DEF_SQLITE) $(SIGN) -tb $(distdir).tar.gz
 	@echo
 	@echo "Now building the shared database library RPMs - this may fail"
@@ -99,7 +99,7 @@
 	@echo "been built are ready to go however."
 	@echo
 	@sleep 5
-	- CPPFLAGS="-D__NO_CTYPE -I$(DBPFX)" LIBS=-ldb-4.2 \
+	- CPPFLAGS="-D__NO_CTYPE -I$(DBPFX)" LIBS="-ldb-4.2 -pthread" \
 	    $(RPMBUILD) $(DEF_DB42) $(SIGN) -tb $(distdir).tar.gz
 	- CPPFLAGS="-D__NO_CTYPE -I$(SQPFX)" \
 	    $(RPMBUILD) $(DEF_SQLITE) $(SIGN) -tb $(distdir).tar.gz

Modified: trunk/bogofilter/install-staticdblibs.sh
===================================================================
--- trunk/bogofilter/install-staticdblibs.sh 2008-10-16 21:49:36 UTC (rev 6751)
+++ trunk/bogofilter/install-staticdblibs.sh 2008-10-16 22:10:32 UTC (rev 6752)
@@ -104,7 +104,8 @@
 dbdir=db-4.2.52
 dbpfx=/opt/db-4.2-lean

-sqdir=sqlite-3.5.6
+sqfil=sqlite-amalgamation-3.6.3.tar.gz
+sqdir=sqlite-3.6.3
 sqpfx=/opt/sqlite-3-lean

 ### download SleepyCat DB 4.2.52 and patches
@@ -130,7 +131,7 @@
     echo "$checklib already exists, not building Berkeley DB."
 fi

-### download SQLite 3.5.6
+### download SQLite 3.6.3
 # Info: the objdump test fixes up the effects of a bug
 # in an earlier version of this script, which built
 # a sqlite 3.2.8 version that required GLIBC_2.3.
@@ -146,7 +147,7 @@
         bogofilter.org) URL=ftp://ftp.bogofilter.org/pub/outgoing/tools/SQLite ;;
     esac

-    want $URL/sqlite-3.5.6.tar.gz 903c9e935c538af392364a9172a3d98d
+    want $URL/sqlite-amalgamation-3.6.3.tar.gz d6e2df754e2619c4b5a06c66ae20632c
     build_sqlite=1
 else
     echo "$checklib already exists, not building SQLite3."
@@ -184,7 +185,7 @@
 # build SQLite 3
 if test $build_sqlite = 1 ; then
     rm -rf build-$sqdir $sqdir
-    gunzip -cd $sqdir.tar.gz | tar xf -
+    gunzip -cd $sqfil | tar xf -
     set -e
     echo "installing $sqdir"
     mkdir -p build-$sqdir
From: <m-...@us...> - 2008-10-20 13:26:38
Revision: 6754
          http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6754&view=rev
Author:   m-a
Date:     2008-10-20 13:26:34 +0000 (Mon, 20 Oct 2008)

Log Message:
-----------
update bf_compact documentation by removing explicit Berkeley DB references, as it has been fixed to work with other database drivers in March 2008. Make the nebulous "directory renaming issues" clearer.

Modified Paths:
--------------
    trunk/bogofilter/NEWS
    trunk/bogofilter/doc/bf_compact.xml

Modified: trunk/bogofilter/NEWS
===================================================================
--- trunk/bogofilter/NEWS 2008-10-20 13:23:38 UTC (rev 6753)
+++ trunk/bogofilter/NEWS 2008-10-20 13:26:34 UTC (rev 6754)
@@ -15,6 +15,12 @@

 -------------------------------------------------------------------------------

+  2008-10-20
+
+        * update bf_compact documentation by removing explicit Berkeley DB
+          references, as it has been fixed to work with other database drivers
+          in March 2008.
+
   2008-10-15

         * bf_compact, bf_copy and bf_tar now support transformed program names

Modified: trunk/bogofilter/doc/bf_compact.xml
===================================================================
--- trunk/bogofilter/doc/bf_compact.xml 2008-10-20 13:23:38 UTC (rev 6753)
+++ trunk/bogofilter/doc/bf_compact.xml 2008-10-20 13:26:34 UTC (rev 6754)
@@ -25,7 +25,8 @@
     <replaceable>bogofilter_directory</replaceable> to
     <replaceable>bogofilter_directory</replaceable><filename>.old</filename>.</para>
     <para>Note: <command>bf_compact</command> cannot be used to process the
-    current working directory, ".", because of directory renaming issues.</para>
+    current working directory, ".", because that cannot be
+    renamed.</para>
     <para>If no <replaceable>wordlist_file</replaceable> arguments are given,
     then <command>bf_compact</command> will use the configured set of
     wordlists, if the given
@@ -40,8 +41,6 @@
     <para>This script will delete
     <replaceable>bogofilter_directory</replaceable><filename>.old</filename>
     and all of its contents!</para>
-    <para>This script is meant for use with Berkeley DB based
-    <application>bogofilter</application> versions.</para>
     <para>This script expects a SUSv2 compliant shell. Solaris systems
     should have the <systemitem>SUNWxcu4</systemitem> package installed
     (when <application>bogofilter</application> is configured) so that
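The new wording reflects a POSIX rule: `rename()` must fail when the final component of either path is `.` or `..`. A script that moves its wordlist directory aside to `DIR.old` therefore cannot do so when `DIR` is given as `.`.

This small sketch demonstrates the failure. The target name `bf_compact_demo.old` is arbitrary, and the exact `errno` value (for example `EINVAL` or `EBUSY`) may differ between systems.

```c
#include <errno.h>
#include <stdio.h>

/* Try to move the current directory aside, the way bf_compact moves a
 * wordlist directory DIR to DIR.old. Returns 0 on success (not expected
 * on a POSIX system) or the errno value set by the failed rename(). */
static int try_rename_dot(const char *newname)
{
    if (rename(".", newname) == 0)
        return 0;
    return errno;
}
```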
From: <re...@us...> - 2009-01-12 04:27:39
Revision: 6766
          http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6766&view=rev
Author:   relson
Date:     2009-01-12 04:27:36 +0000 (Mon, 12 Jan 2009)

Log Message:
-----------
Sun Studio 12 compatibility changes.

Modified Paths:
--------------
    trunk/bogofilter/NEWS
    trunk/bogofilter/src/base64.c
    trunk/bogofilter/src/bogohist.c
    trunk/bogofilter/src/bogolexer.c
    trunk/bogofilter/src/bogoreader.c
    trunk/bogofilter/src/bogotune.c
    trunk/bogofilter/src/bogoutil.c
    trunk/bogofilter/src/buff.c
    trunk/bogofilter/src/collect.c
    trunk/bogofilter/src/datastore.c
    trunk/bogofilter/src/format.c
    trunk/bogofilter/src/iconvert.c
    trunk/bogofilter/src/lexer.c
    trunk/bogofilter/src/lexer_v3.l
    trunk/bogofilter/src/maint.c
    trunk/bogofilter/src/mime.c
    trunk/bogofilter/src/qp.c
    trunk/bogofilter/src/robx.c
    trunk/bogofilter/src/tests/deb64.c
    trunk/bogofilter/src/tests/deqp.c
    trunk/bogofilter/src/token.c
    trunk/bogofilter/src/uudecode.c
    trunk/bogofilter/src/word.c
    trunk/bogofilter/src/word.h
    trunk/bogofilter/src/wordhash.c

Modified: trunk/bogofilter/NEWS
===================================================================
--- trunk/bogofilter/NEWS 2009-01-12 04:12:41 UTC (rev 6765)
+++ trunk/bogofilter/NEWS 2009-01-12 04:27:36 UTC (rev 6766)
@@ -15,6 +15,12 @@

 -------------------------------------------------------------------------------

+  2009-01-11
+
+        * For compatibility with Sun's Sun Studio 12 compiler, provide
+          a name for the anonymous union in typedef word_t.
+          Patch provided by Jack Bailey.
+ 2008-10-20 * update bf_compact documentation by removing explicit Berkeley DB Modified: trunk/bogofilter/src/base64.c =================================================================== --- trunk/bogofilter/src/base64.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/base64.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -26,8 +26,8 @@ { uint count = 0; uint size = word->leng; - byte *s = word->text; /* src */ - byte *d = word->text; /* dst */ + byte *s = word->u.text; /* src */ + byte *d = word->u.text; /* dst */ if (!base64_validate(word)) return size; @@ -96,7 +96,7 @@ base64_init(); for (i = 0; i < word->leng; i += 1) { - byte b = word->text[i]; + byte b = word->u.text[i]; byte v = base64_xlate[b]; if (v == 0 && b != 'A' && b != '\n' && b != '\r') return false; Modified: trunk/bogofilter/src/bogohist.c =================================================================== --- trunk/bogofilter/src/bogohist.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/bogohist.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -53,7 +53,7 @@ uint idx = min(fw * INTERVALS, INTERVALS-1); /* ignore meta-tokens */ - if (*key->text == (byte) '.') + if (*key->u.text == (byte) '.') return 0; hist->count[idx] += 1; Modified: trunk/bogofilter/src/bogolexer.c =================================================================== --- trunk/bogofilter/src/bogolexer.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/bogolexer.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -323,10 +323,10 @@ { count += 1; if (passthrough) { - fprintf(fpo, "%s\n", token.text); + fprintf(fpo, "%s\n", token.u.text); } else if (!quiet) - fprintf(fpo, "get_token: %d \"%s\"\n", (int)t, token.text); + fprintf(fpo, "get_token: %d \"%s\"\n", (int)t, token.u.text); } } Modified: trunk/bogofilter/src/bogoreader.c =================================================================== --- trunk/bogofilter/src/bogoreader.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/bogoreader.c 2009-01-12 04:27:36 
UTC (rev 6766) @@ -437,7 +437,7 @@ static int mailbox_getline(buff_t *buff) { uint used = buff->t.leng; - byte *buf = buff->t.text + used; + byte *buf = buff->t.u.text + used; int count; static word_t *saved = NULL; static bool emptyline = false; /* for mailbox /^From / match */ @@ -472,7 +472,7 @@ } else { if (buff->t.leng < buff->size) /* for easier debugging - removable */ - Z(buff->t.text[buff->t.leng]); /* for easier debugging - removable */ + Z(buff->t.u.text[buff->t.leng]); /* for easier debugging - removable */ } emptyline = is_eol((char *)buf, count); @@ -485,7 +485,7 @@ { int count; uint used = buff->t.leng; - byte *buf = buff->t.text + used; + byte *buf = buff->t.u.text + used; static word_t *saved = NULL; static unsigned long bytesleft = 0; @@ -525,7 +525,7 @@ } } else { if (buff->t.leng < buff->size) /* for easier debugging - removable */ - Z(buff->t.text[buff->t.leng]); /* for easier debugging - removable */ + Z(buff->t.u.text[buff->t.leng]); /* for easier debugging - removable */ } return count; @@ -536,7 +536,7 @@ { int count; uint used = buff->t.leng; - byte *buf = buff->t.text + used; + byte *buf = buff->t.u.text + used; static word_t *saved = NULL; static bool dot_found = true; @@ -568,7 +568,7 @@ (buf[1] == '\r' || buf[1] == '\n')) dot_found = true; /* dot found. 
look for separator */ if (buff->t.leng < buff->size) /* for easier debugging - removable */ - Z(buff->t.text[buff->t.leng]); /* for easier debugging - removable */ + Z(buff->t.u.text[buff->t.leng]); /* for easier debugging - removable */ } return count; @@ -580,7 +580,7 @@ int count = buff_fgetsl(buff, fpin); if (buff->t.leng < buff->size) /* for easier debugging - removable */ - Z(buff->t.text[buff->t.leng]); /* for easier debugging - removable */ + Z(buff->t.u.text[buff->t.leng]);/* for easier debugging - removable */ return count; } Modified: trunk/bogofilter/src/bogotune.c =================================================================== --- trunk/bogofilter/src/bogotune.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/bogotune.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -743,7 +743,7 @@ } } - print_msgcount_entry((char *)token->text, cnts->bad, cnts->good); + print_msgcount_entry((char *)token->u.text, cnts->bad, cnts->good); } return; Modified: trunk/bogofilter/src/bogoutil.c =================================================================== --- trunk/bogofilter/src/bogoutil.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/bogoutil.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -86,10 +86,10 @@ return 0; if (replace_nonascii_characters) - do_replace_nonascii_characters(key->text, key->leng); + do_replace_nonascii_characters(key->u.text, key->leng); fprintf(fpo, "%.*s %lu %lu", - CLAMP_INT_MAX(key->leng), key->text, + CLAMP_INT_MAX(key->leng), key->u.text, (unsigned long)data->spamcount, (unsigned long)data->goodcount); if (data->date) @@ -285,7 +285,7 @@ { int rv = 0; - if (fgets((char *)buff->t.text, buff->size, fp) == NULL) { + if (fgets((char *)buff->t.u.text, buff->size, fp) == NULL) { if (ferror(fp)) { perror(progname); rv = 2; @@ -293,17 +293,17 @@ rv = 1; } } else { - buff->t.leng = (uint) strlen((const char *)buff->t.text); - if (buff->t.text[buff->t.leng - 1] == '\n' ) { + buff->t.leng = (uint) strlen((const char *)buff->t.u.text); 
+ if (buff->t.u.text[buff->t.leng - 1] == '\n' ) { buff->t.leng -= 1; - buff->t.text[buff->t.leng] = (byte) '\0'; + buff->t.u.text[buff->t.leng] = (byte) '\0'; } else { fprintf(stderr, "%s: Unexpected input [%s]. Does not end with newline " "or line too long.\n", - progname, buff->t.text); + progname, buff->t.u.text); rv = 1; } } @@ -384,11 +384,11 @@ good_count = val.goodcount; if (!show_probability) - fprintf(fpo, data_format, token->text, spam_count, good_count); + fprintf(fpo, data_format, token->u.text, spam_count, good_count); else { rob_prob = calc_prob(good_count, spam_count, msgcnts.goodcount, msgcnts.spamcount); - fprintf(fpo, data_format, token->text, spam_count, good_count, rob_prob); + fprintf(fpo, data_format, token->u.text, spam_count, good_count, rob_prob); } break; case 1: Modified: trunk/bogofilter/src/buff.c =================================================================== --- trunk/bogofilter/src/buff.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/buff.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -21,7 +21,7 @@ /* Function Definitions */ buff_t *buff_init(buff_t *self, byte *buff, uint used, uint size) { - self->t.text = buff; + self->t.u.text = buff; self->t.leng = used; self->read = 0; self->size = size; @@ -43,7 +43,7 @@ int buff_fgetsln(buff_t *self, FILE *in, uint maxlen) { uint readpos = self->t.leng; - int readcnt = xfgetsl((char *)self->t.text + readpos, + int readcnt = xfgetsl((char *)self->t.u.text + readpos, min(self->size - readpos, maxlen), in, true); /* WARNING: do not add NUL termination, the size must be exact! 
*/ self->read = readpos; @@ -58,13 +58,13 @@ int readcnt = in->leng; uint new_size = self->t.leng + in->leng; if (new_size > self->size) { - self->t.text = xrealloc(self->t.text, new_size); + self->t.u.text = xrealloc(self->t.u.text, new_size); self->size = new_size; } self->read = readpos; self->t.leng += readcnt; - memcpy(self->t.text + readpos, in->text, readcnt); - Z(self->t.text[self->t.leng]); /* for easier debugging - removable */ + memcpy(self->t.u.text + readpos, in->u.text, readcnt); + Z(self->t.u.text[self->t.leng]); /* for easier debugging - removable */ return readcnt; } @@ -73,7 +73,7 @@ { word_t word; word.leng = self->t.leng - self->read; - word.text = self->t.text + self->read; + word.u.text = self->t.u.text + self->read; word_puts(&word, width, fp); } @@ -85,8 +85,8 @@ BOGO_ASSERT(start + length <= self->t.leng, "Invalid buff_shift() parameters."); - memmove(self->t.text + start, self->t.text + start + length, self->t.leng - length); + memmove(self->t.u.text + start, self->t.u.text + start + length, self->t.leng - length); self->t.leng -= length; - Z(self->t.text[self->t.leng]); /* for easier debugging - removable */ + Z(self->t.u.text[self->t.leng]); /* for easier debugging - removable */ return; } Modified: trunk/bogofilter/src/collect.c =================================================================== --- trunk/bogofilter/src/collect.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/collect.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -52,12 +52,12 @@ if (cls == BOGO_LEX_LINE) { - char *beg = (char *)token.text+1; /* skip leading quote mark */ + char *beg = (char *)token.u.text+1; /* skip leading quote mark */ char *end = strchr(beg, '"'); assert(end); token.leng = end - beg; - memmove(token.text, token.text + 1, token.leng + 1); - token.text[token.leng] = '\0'; /* ensure nul termination */ + memmove(token.u.text, token.u.text + 1, token.leng + 1); + token.u.text[token.leng] = '\0'; /* ensure nul termination */ } wp = 
wordhash_insert(wh, &token, sizeof(wordprop_t), &wordprop_init); @@ -96,7 +96,7 @@ if (cls == BOGO_LEX_LINE) { - char *s = (char *)token.text; + char *s = (char *)token.u.text; s += token.leng + 2; wp->cnts.bad = atoi(s); s = strchr(s+1, ' ') + 1; Modified: trunk/bogofilter/src/datastore.c =================================================================== --- trunk/bogofilter/src/datastore.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/datastore.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -203,7 +203,7 @@ struct_init(ex_key); struct_init(ex_data); - ex_key.data = word->text; + ex_key.data = word->u.text; ex_key.leng = word->leng; memset(val, 0, sizeof(*val)); @@ -224,7 +224,7 @@ if (DEBUG_DATABASE(3)) { fprintf(dbgout, "ds_read: [%.*s] -- %lu,%lu\n", - CLAMP_INT_MAX(word->leng), (const char *)word->text, + CLAMP_INT_MAX(word->leng), (const char *)word->u.text, (unsigned long)val->spamcount, (unsigned long)val->goodcount); } @@ -233,21 +233,21 @@ case DS_NOTFOUND: if (DEBUG_DATABASE(3)) { fprintf(dbgout, "ds_read: [%.*s] not found\n", - CLAMP_INT_MAX(word->leng), (char *) word->text); + CLAMP_INT_MAX(word->leng), (char *) word->u.text); } return 1; case DS_ABORT_RETRY: if (DEBUG_DATABASE(1)) { print_error(__FILE__, __LINE__, "ds_read('%.*s') was aborted to recover from a deadlock.", - CLAMP_INT_MAX(word->leng), (char *) word->text); + CLAMP_INT_MAX(word->leng), (char *) word->u.text); } break; default: fprintf(dbgout, "ret=%d, DS_NOTFOUND=%d\n", ret, DS_NOTFOUND); print_error(__FILE__, __LINE__, "ds_read( '%.*s' ), err: %d, %s", - CLAMP_INT_MAX(word->leng), (char *) word->text, ret, db_str_err(ret)); + CLAMP_INT_MAX(word->leng), (char *) word->u.text, ret, db_str_err(ret)); exit(EX_ERROR); } @@ -265,7 +265,7 @@ struct_init(ex_key); struct_init(ex_data); - ex_key.data = word->text; + ex_key.data = word->u.text; ex_key.leng = word->leng; ex_data.data = cv; @@ -280,7 +280,7 @@ if (DEBUG_DATABASE(3)) { fprintf(dbgout, "ds_write: [%.*s] -- %lu,%lu,%lu\n", - 
CLAMP_INT_MAX(word->leng), (const char *)word->text, + CLAMP_INT_MAX(word->leng), (const char *)word->u.text, (unsigned long)val->spamcount, (unsigned long)val->goodcount, (unsigned long)val->date); @@ -296,7 +296,7 @@ dbv_t ex_key; struct_init(ex_key); - ex_key.data = word->text; + ex_key.data = word->u.text; ex_key.leng = word->leng; ret = db_delete(dsh->dbh, &ex_key); @@ -344,7 +344,7 @@ ds_userdata_t *ds_data = userdata; dsh_t *dsh = ds_data->dsh; - w_key.text = ex_key->data; + w_key.u.text = ex_key->data; w_key.leng = ex_key->leng; memset(&in_data, 0, sizeof(in_data)); Modified: trunk/bogofilter/src/format.c =================================================================== --- trunk/bogofilter/src/format.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/format.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -325,7 +325,7 @@ *buff++ = '%'; break; case 'A': /* A - Message Address */ - buff += format_string(buff, (*msg_addr->text != '\0') ? (const char *)msg_addr->text : "UNKNOWN", 0, prec, flags, end); + buff += format_string(buff, (*msg_addr->u.text != '\0') ? (const char *)msg_addr->u.text : "UNKNOWN", 0, prec, flags, end); break; case 'c': /* c - classification, e.g. Yes/No, Spam/Ham/Unsure, or YN, SHU, +-? */ { @@ -347,10 +347,10 @@ break; } case 'I': /* M - Message ID */ - buff += format_string(buff, (*msg_id->text != '\0') ? (const char *)msg_id->text : "UNKNOWN", 0, prec, flags, end); + buff += format_string(buff, (*msg_id->u.text != '\0') ? (const char *)msg_id->u.text : "UNKNOWN", 0, prec, flags, end); break; case 'Q': /* Q - Queue ID */ - buff += format_string(buff, (*queue_id->text != '\0') ? (const char *)queue_id->text : "UNKNOWN", 0, prec, flags, end); + buff += format_string(buff, (*queue_id->u.text != '\0') ? 
(const char *)queue_id->u.text : "UNKNOWN", 0, prec, flags, end); break; case 'p': /* p - spamicity as a probability */ { Modified: trunk/bogofilter/src/iconvert.c =================================================================== --- trunk/bogofilter/src/iconvert.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/iconvert.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -54,7 +54,7 @@ } if (msg != NULL) fprintf(dbgout, "err: %s (%d), tx: %p, rd: %d, ln: %d, sz: %d\n", - msg, err, src->t.text, src->read, src->t.leng, src->size); + msg, err, src->t.u.text, src->read, src->t.leng, src->size); } } @@ -70,10 +70,10 @@ size_t outbytesleft; size_t count; - inbuf = (char *)src->t.text + src->read; + inbuf = (char *)src->t.u.text + src->read; inbytesleft = src->t.leng - src->read; - outbuf = (char *)dst->t.text + dst->t.leng; + outbuf = (char *)dst->t.u.text + dst->t.leng; outbytesleft = dst->size - dst->read - dst->t.leng; if (outbytesleft == 0) @@ -169,19 +169,19 @@ done = true; } - Z(dst->t.text[dst->t.leng]); /* for easier debugging - removable */ + Z(dst->t.u.text[dst->t.leng]); /* for easier debugging - removable */ if (DEBUG_ICONV(1) && src->t.leng != src->read) fprintf(dbgout, "tx: %p, rd: %d, ln: %d, sz: %d\n", - src->t.text, src->read, src->t.leng, src->size); + src->t.u.text, src->read, src->t.leng, src->size); } static void copy(buff_t *src, buff_t *dst) { /* if conversion not available, use memcpy */ dst->t.leng = min(dst->size, src->t.leng); - memcpy(dst->t.text, src->t.text, dst->t.leng+D); + memcpy(dst->t.u.text, src->t.u.text, dst->t.leng+D); } void iconvert(buff_t *src, buff_t *dst) Modified: trunk/bogofilter/src/lexer.c =================================================================== --- trunk/bogofilter/src/lexer.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/lexer.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -75,7 +75,7 @@ yylineno-1, msg_header ? 
'h' : 'b', yy_get_state(), (long)(buff->t.leng - buff->read)); buff_puts(buff, 0, dbgout); - if (buff->t.leng > 0 && buff->t.text[buff->t.leng-1] != '\n') + if (buff->t.leng > 0 && buff->t.u.text[buff->t.leng-1] != '\n') fputc('\n', dbgout); } @@ -105,7 +105,7 @@ static int yy_get_new_line(buff_t *buff) { int count = (*reader_getline)(buff); - const byte *buf = buff->t.text; + const byte *buf = buff->t.u.text; static size_t hdrlen = 0; if (hdrlen==0) @@ -142,7 +142,7 @@ && count != EOF /* don't skip if inside message/rfc822 */ && msg_state->parent == NULL - && memcmp(buff->t.text,spam_header_name,hdrlen) == 0) { + && memcmp(buff->t.u.text,spam_header_name,hdrlen) == 0) { count = skip_folded_line(buff); } @@ -171,9 +171,9 @@ * sufficiently small that the UTF-8 text can fit in the output * buffer */ if (tempbuff->size < buff->size / 6) { - xfree(tempbuff->t.text); + xfree(tempbuff->t.u.text); tempbuff->size = buff->size / 6; - tempbuff->t.text = (byte *) xmalloc(tempbuff->size+D); + tempbuff->t.u.text = (byte *) xmalloc(tempbuff->size+D); } tempbuff->t.leng = tempbuff->read = 0; @@ -198,7 +198,7 @@ * than one of these. 
*/ if (passthrough && passmode == PASS_MEM && count > 0) - textblock_add(linebuff->t.text+linebuff->read, (size_t) count); + textblock_add(linebuff->t.u.text+linebuff->read, (size_t) count); if ( !msg_header && !msg_state->mime_dont_decode && @@ -208,7 +208,7 @@ uint decoded_count; temp.leng = (uint) count; - temp.text = linebuff->t.text+linebuff->read; + temp.u.text = linebuff->t.u.text+linebuff->read; decoded_count = mime_decode(&temp); /*change buffer size only if the decoding worked */ @@ -243,7 +243,7 @@ /* CRLF -> NL */ if (count >= 2) { - byte *buf = buff->t.text; + byte *buf = buff->t.u.text; if (memcmp(buf + count - 2, CRLF, 2) == 0) { count --; *(buf + count - 1) = (byte) '\n'; @@ -251,7 +251,7 @@ } if (buff->t.leng < buff->size) /* for easier debugging - removable */ - Z(buff->t.text[buff->t.leng]); /* for easier debugging - removable */ + Z(buff->t.u.text[buff->t.leng]); /* for easier debugging - removable */ return count; } @@ -265,11 +265,11 @@ yylineno += 1; /* only check for LWSP-char (RFC-822) aka. 
WSP (RFC-2822), * these only include SP and HTAB */ - if (buff->t.text[0] != ' ' && - buff->t.text[0] != '\t') + if (buff->t.u.text[0] != ' ' && + buff->t.u.text[0] != '\t') return count; /* Check for empty line which terminates message header */ - if (is_eol((char *)buff->t.text, count)) + if (is_eol((char *)buff->t.u.text, count)) return count; } } @@ -334,7 +334,7 @@ break; if (count >= MAX_TOKEN_LEN * 2 && - long_token(buff.t.text, (uint) count)) { + long_token(buff.t.u.text, (uint) count)) { uint start = buff.t.leng - count; uint length = count - max_token_len; buff_shift(&buff, start, length); @@ -391,10 +391,10 @@ word_t *text_decode(word_t *w) { word_t *r = w; - byte *const beg = w->text; /* base pointer, fixed */ + byte *const beg = w->u.text; /* base pointer, fixed */ byte *const fin = beg + w->leng; /* end+1 position */ - byte *txt = (byte *) memstr(w->text, w->leng, "=?"); /* input position */ + byte *txt = (byte *) memstr(w->u.text, w->leng, "=?"); /* input position */ uint size = (uint) (txt - beg); /* output offset */ #ifndef DISABLE_UNICODE @@ -414,12 +414,12 @@ buf->t.leng = 0; if (buf->size < max) { buf->size = max; - buf->t.text = (byte *) xrealloc(buf->t.text, buf->size+D); + buf->t.u.text = (byte *) xrealloc(buf->t.u.text, buf->size+D); } buf->t.leng = size; - memcpy(buf->t.text, beg, size ); - Z(buf->t.text[buf->t.leng]); /* for easier debugging - removable */ + memcpy(buf->t.u.text, beg, size ); + Z(buf->t.u.text[buf->t.leng]); /* for easier debugging - removable */ } #endif @@ -446,9 +446,9 @@ end = (byte *) memstr((char *)tmp, fin-tmp, "?="); /* last byte of encoded word */ len = end - tmp; - w->text = tmp; /* Start of encoded word */ + w->u.text = tmp; /* Start of encoded word */ w->leng = len; /* Length of encoded word */ - Z(w->text[w->leng]); /* for easier debugging - removable */ + Z(w->u.text[w->leng]); /* for easier debugging - removable */ if (DEBUG_LEXER(2)) { fputs("**2** ", dbgout); @@ -469,7 +469,7 @@ /* move decoded word to 
where the encoded used to be */ if (encoding == E_RAW) { - memmove(beg+size, w->text, len); + memmove(beg+size, w->u.text, len); size += len; /* bump output pointer */ Z(beg[size]); /* for easier debugging - removable */ @@ -485,7 +485,7 @@ /* convert 'word_t *w' to 'buff_t src' because ** iconvert_cd() needs buff_t pointers */ - src.t.text = w->text; + src.t.u.text = w->u.text; src.t.leng = len; src.read = 0; src.size = len; @@ -534,13 +534,13 @@ beg[size++] = *txt++; #ifndef DISABLE_UNICODE if (encoding == E_UNICODE) - buf->t.text[buf->t.leng++] = *txt++; + buf->t.u.text[buf->t.leng++] = *txt++; #endif } } if (encoding == E_RAW) { - r->text = beg; + r->u.text = beg; r->leng = size; } Modified: trunk/bogofilter/src/lexer_v3.l =================================================================== --- trunk/bogofilter/src/lexer_v3.l 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/lexer_v3.l 2009-01-12 04:27:36 UTC (rev 6766) @@ -117,7 +117,7 @@ static word_t *yy_text(void) { static word_t yyt; - yyt.text = (byte *)yytext; + yyt.u.text = (byte *)yytext; yyt.leng = yyleng; return &yyt; } @@ -225,7 +225,7 @@ <INITIAL>{ENCODED_TOKEN} { word_t *raw = yy_text(); word_t *txt = text_decode(raw); - yy_unput(txt->text, txt->leng); + yy_unput(txt->u.text, txt->leng); } <INITIAL>^(To|CC|From|Return-Path|Subject|Received): { set_tag(yytext); } Modified: trunk/bogofilter/src/maint.c =================================================================== --- trunk/bogofilter/src/maint.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/maint.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -117,10 +117,10 @@ { bool discard; - if (token->text[0] == '.') { /* keep .MSG_COUNT and .ROBX */ - if (strcmp((const char *)token->text, MSG_COUNT) == 0) + if (token->u.text[0] == '.') { /* keep .MSG_COUNT and .ROBX */ + if (strcmp((const char *)token->u.text, MSG_COUNT) == 0) return false; - if (strcmp((const char *)token->text, ROBX_W) == 0) + if (strcmp((const char *)token->u.text, 
ROBX_W) == 0) return false; } @@ -167,31 +167,31 @@ void *vhandle = ((struct userdata_t *) userdata)->vhandle; ta_t *transaction = ((struct userdata_t *) userdata)->transaction; - token.text = w_key->text; + token.u.text = w_key->u.text; token.leng = w_key->leng; len = strlen(MSG_COUNT); if (len == token.leng && - strncmp((char *)token.text, MSG_COUNT, token.leng) == 0) + strncmp((char *)token.u.text, MSG_COUNT, token.leng) == 0) return EX_OK; if (discard_token(&token, in_val)) { int ret = ta_delete(transaction, vhandle, &token); if (DEBUG_DATABASE(0)) - fprintf(dbgout, "deleting '%.*s'\n", (int)min(INT_MAX, token.leng), (char *)token.text); + fprintf(dbgout, "deleting '%.*s'\n", (int)min(INT_MAX, token.leng), (char *)token.u.text); return ret; } if (replace_nonascii_characters) { word_t new_token; - new_token.text = (byte *)xmalloc(token.leng + 1); - memcpy(new_token.text, token.text, token.leng); + new_token.u.text = (byte *)xmalloc(token.leng + 1); + memcpy(new_token.u.text, token.u.text, token.leng); new_token.leng = token.leng; - new_token.text[new_token.leng] = '\0'; - if (do_replace_nonascii_characters(new_token.text, new_token.leng)) + new_token.u.text[new_token.leng] = '\0'; + if (do_replace_nonascii_characters(new_token.u.text, new_token.leng)) merge_tokens(&token, &new_token, in_val, transaction, vhandle); - xfree(new_token.text); + xfree(new_token.u.text); } #ifndef DISABLE_UNICODE @@ -202,18 +202,18 @@ old_buff.read = 0; old_buff.size = token.leng; - old_buff.t.text = token.text; + old_buff.t.u.text = token.u.text; old_buff.t.leng = token.leng; new_buff.read = 0; new_buff.size = token.leng * 6; new_buff.t.leng = 0; - new_buff.t.text = (byte *)xmalloc(new_buff.size); + new_buff.t.u.text = (byte *)xmalloc(new_buff.size); iconvert(&old_buff, &new_buff); if (old_buff.t.leng != new_buff.t.leng || - memcmp(old_buff.t.text, new_buff.t.text, new_buff.t.leng) != 0) { + memcmp(old_buff.t.u.text, new_buff.t.u.text, new_buff.t.leng) != 0) { if (DEBUG_ICONV(2)) { 
fputs("*** ", dbgout); word_puts(&old_buff.t, 0, dbgout); fputs( "\n", dbgout); fputs("*** ", dbgout); word_puts(&new_buff.t, 0, dbgout); fputs( "\n", dbgout); @@ -222,7 +222,7 @@ merge_tokens(&old_buff.t, &new_buff.t, in_val, transaction, vhandle); } - xfree(new_buff.t.text); + xfree(new_buff.t.u.text); } #endif @@ -244,16 +244,16 @@ const char *ip_hdr = "ip:"; size_t ip_len = strlen(ip_hdr); - if (token.leng > url_len && memcmp(token.text, url_hdr, url_len) == 0) + if (token.leng > url_len && memcmp(token.u.text, url_hdr, url_len) == 0) { word_t new_token; new_token.leng = token.leng + ip_len - url_len; - new_token.text = (byte *)xmalloc(new_token.leng + 1); - memcpy(new_token.text, ip_hdr, ip_len); - memcpy(new_token.text+ip_len, token.text+url_len, token.leng - url_len); - new_token.text[new_token.leng] = '\0'; + new_token.u.text = (byte *)xmalloc(new_token.leng + 1); + memcpy(new_token.u.text, ip_hdr, ip_len); + memcpy(new_token.u.text+ip_len, token.u.text+url_len, token.leng - url_len); + new_token.u.text[new_token.leng] = '\0'; replace_token(&token, &new_token, in_val, transaction, vhandle); - xfree(new_token.text); + xfree(new_token.u.text); } break; } @@ -323,14 +323,14 @@ dsv_t val; word_t enco; - enco.text = (byte *)xstrdup(WORDLIST_ENCODING); + enco.u.text = (byte *)xstrdup(WORDLIST_ENCODING); enco.leng = strlen(WORDLIST_ENCODING); val.count[0] = new_encoding; val.count[1] = 0; val.date = 0; ds_write(database, &enco, &val); - xfree(enco.text); + xfree(enco.u.text); } #endif Modified: trunk/bogofilter/src/mime.c =================================================================== --- trunk/bogofilter/src/mime.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/mime.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -288,7 +288,7 @@ boundary_t * b /*@out@*/ /**< output properties, must be pre-allocated by caller */) { mime_t *ptr; - const byte *buf = boundary->text; + const byte *buf = boundary->u.text; size_t blen = boundary->leng; b->is_valid = false; @@ 
-350,7 +350,7 @@ if (DEBUG_MIME(0)) fprintf(dbgout, "*** got_mime_boundary: stackp: %d, boundary: '%s'\n", - mime_stack_top->depth, boundary->text); + mime_stack_top->depth, boundary->u.text); if (msg_state != NULL) { @@ -420,7 +420,7 @@ void mime_content(word_t * text) { - char *key = (char *) text->text; + char *key = (char *) text->u.text; switch (tolower(key[9])) { case 'r': /* Content-Transfer-Encoding: */ mime_encoding(text); @@ -438,7 +438,7 @@ { size_t i; const size_t l = sizeof("Content-Disposition:") - 1; - byte *w = getword(text->text + l, text->text + text->leng); + byte *w = getword(text->u.text + l, text->u.text + text->leng); if (!w) return; @@ -449,7 +449,7 @@ if (strcasecmp((const char *)w, dis->name) == 0) { msg_state->mime_disposition = dis->disposition; if (DEBUG_MIME(1)) - fprintf(dbgout, "*** mime_disposition: %s\n", text->text); + fprintf(dbgout, "*** mime_disposition: %s\n", text->u.text); break; } } @@ -479,7 +479,7 @@ { size_t i; const size_t l = sizeof("Content-Transfer-Encoding:") - 1; - byte *w = getword(text->text + l, text->text + text->leng); + byte *w = getword(text->u.text + l, text->u.text + text->leng); if (!w) return; @@ -490,7 +490,7 @@ if (strcasecmp((const char *)w, enc->name) == 0) { msg_state->mime_encoding = enc->encoding; if (DEBUG_MIME(1)) - fprintf(dbgout, "*** mime_encoding: %s\n", text->text); + fprintf(dbgout, "*** mime_encoding: %s\n", text->u.text); break; } } @@ -508,7 +508,7 @@ { const struct type_s *typ; const size_t l = sizeof("Content-Type:") - 1; - byte *w = getword(text->text + l, text->text + text->leng); + byte *w = getword(text->u.text + l, text->u.text + text->leng); if (!w) return; @@ -519,7 +519,7 @@ if (strncasecmp((const char *)w, typ->name, strlen(typ->name)) == 0) { msg_state->mime_type = typ->type; if (DEBUG_MIME(1) || DEBUG_LEXER(1)) - fprintf(dbgout, "*** mime_type: %s\n", text->text); + fprintf(dbgout, "*** mime_type: %s\n", text->u.text); break; } } @@ -545,7 +545,7 @@ void 
mime_boundary_set(word_t * text) { - byte *boundary = text->text; + byte *boundary = text->u.text; size_t blen = text->leng; if (DEBUG_MIME(1)) { @@ -582,7 +582,7 @@ if (DEBUG_MIME(3)) fprintf(dbgout, "*** mime_decode %lu \"%-.*s\"\n", (unsigned long) count, - count > INT_MAX ? INT_MAX : (int) (count - 1), text->text); + count > INT_MAX ? INT_MAX : (int) (count - 1), text->u.text); /* Do not decode "real" boundary lines */ if (mime_is_boundary(text) == true) Modified: trunk/bogofilter/src/qp.c =================================================================== --- trunk/bogofilter/src/qp.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/qp.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -46,8 +46,8 @@ uint qp_decode(word_t *word, qp_mode mode) { uint size = word->leng; - byte *s = word->text; /* src */ - byte *d = word->text; /* dst */ + byte *s = word->u.text; /* src */ + byte *d = word->u.text; /* dst */ byte *e = s + size; /* end */ while (s < e) @@ -77,7 +77,7 @@ } /* do not stuff NUL byte here: * if there was one, it has been copied! 
*/ - return d - word->text; + return d - word->u.text; } /* rfc2047 - QP [!->@-~]+ @@ -115,7 +115,7 @@ qp_init(); for (i = 0; i < word->leng; i += 1) { - byte b = word->text[i]; + byte b = word->u.text[i]; byte v = qp_xlate[b]; if (v == 0) switch (b) { Modified: trunk/bogofilter/src/robx.c =================================================================== --- trunk/bogofilter/src/robx.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/robx.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -51,7 +51,7 @@ " sp: %3lu, gd: %3lu, p: %9.6f, t: %.*s\n", (unsigned long)rh->count, rh->sum, rh->sum / rh->count, (unsigned long)spamness, (unsigned long)goodness, prob, - CLAMP_INT_MAX(key->leng), key->text); + CLAMP_INT_MAX(key->leng), key->u.text); } } @@ -61,7 +61,7 @@ struct robhook_data *rh = userdata; /* ignore system meta-data */ - if (*key->text != '.') + if (*key->u.text != '.') robx_accum(rh, key, data); return 0; Modified: trunk/bogofilter/src/tests/deb64.c =================================================================== --- trunk/bogofilter/src/tests/deb64.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/tests/deb64.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -22,9 +22,9 @@ size = ftell(stdin); if (fseek(stdin, 0, SEEK_SET)) die(); w = word_new(NULL, size); - if (fread(w->text, 1, w->leng, stdin) != w->leng) die(); + if (fread(w->u.text, 1, w->leng, stdin) != w->leng) die(); size = base64_decode(w); - if (fwrite(w->text, 1, size, stdout) != size) die(); + if (fwrite(w->u.text, 1, size, stdout) != size) die(); word_free(w); if (fflush(stdout)) die(); if (fclose(stdout)) die(); Modified: trunk/bogofilter/src/tests/deqp.c =================================================================== --- trunk/bogofilter/src/tests/deqp.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/tests/deqp.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -26,9 +26,9 @@ size = ftell(stdin); if (fseek(stdin, 0, SEEK_SET)) die(); w = word_new(NULL, size); - if (fread(w->text, 1, 
w->leng, stdin) != w->leng) die(); + if (fread(w->u.text, 1, w->leng, stdin) != w->leng) die(); size = qp_decode(w, mode); - if (fwrite(w->text, 1, size, stdout) != size) die(); + if (fwrite(w->u.text, 1, size, stdout) != size) die(); word_free(w); if (fflush(stdout)) die(); if (fclose(stdout)) die(); Modified: trunk/bogofilter/src/token.c =================================================================== --- trunk/bogofilter/src/token.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/token.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -95,7 +95,7 @@ for (i = 0; i < multi_token_count; i += 1) { words->leng = 0; - words->text = text; + words->u.text = text; w_token_array[i] = words; words += 1; text += max_token_len+1+D; @@ -112,13 +112,13 @@ static void token_set( word_t *token, byte *text, uint leng ) { token->leng = leng; - memcpy(token->text, text, leng); /* include nul terminator */ - token->text[leng] = '\0'; /* ensure nul termination */ + memcpy(token->u.text, text, leng); /* include nul terminator */ + token->u.text[leng] = '\0'; /* ensure nul termination */ } static inline void token_copy( word_t *dst, word_t *src ) { - token_set(dst, src->text, src->leng); + token_set(dst, src->u.text, src->leng); } static void build_prefixed_token( word_t *prefix, word_t *token, @@ -130,12 +130,12 @@ len = temp_size - prefix->leng - 1; temp->leng = len; - memmove(temp->text+prefix->leng, token->text, len-prefix->leng); - memcpy(temp->text, prefix->text, prefix->leng); - Z(temp->text[temp->leng]); + memmove(temp->u.text+prefix->leng, token->u.text, len-prefix->leng); + memcpy(temp->u.text, prefix->u.text, prefix->leng); + Z(temp->u.text[temp->leng]); token->leng = temp->leng; - token->text = temp->text; + token->u.text = temp->u.text; } #define WRAP(n) ((n) % multi_token_count) @@ -186,13 +186,13 @@ /* If saved IPADDR, truncate last octet */ if ( block_on_subnets && save_class == IPADDR ) { - byte *t = xmemrchr(ipsave->text, '.', ipsave->leng); + byte *t = 
xmemrchr(ipsave->u.text, '.', ipsave->leng); if (t == NULL) save_class = NONE; else { - ipsave->leng = (uint) (t - ipsave->text); - token_set( token, ipsave->text, ipsave->leng); + ipsave->leng = (uint) (t - ipsave->u.text); + token_set( token, ipsave->u.text, ipsave->leng); cls = save_class; done = true; } @@ -205,11 +205,11 @@ cls = (*lexer->yylex)(); token->leng = (uint) *lexer->yyleng; - token->text = (byte *) *lexer->yytext; - Z(token->text[token->leng]); /* for easier debugging - removable */ + token->u.text = (byte *) *lexer->yytext; + Z(token->u.text[token->leng]); /* for easier debugging - removable */ leng = token->leng; - text = token->text; + text = token->u.text; if (DEBUG_TEXT(2)) { word_puts(token, 0, dbgout); @@ -261,7 +261,7 @@ *ot = *in; token->leng = leng = (uint) (ot - st); } - Z(token->text[token->leng]); /* for easier debugging - removable */ + Z(token->u.text[token->leng]); /* for easier debugging - removable */ } break; @@ -287,7 +287,7 @@ if (leng > max_token_len) continue; - token->text = text; + token->u.text = text; token->leng = leng; if (token_prefix == NULL) { @@ -327,7 +327,7 @@ case QUEUE_ID: /* special token; saved for formatted output, but not returned to bogofilter */ /** \bug: the parser MUST be aligned with lexer_v3.l! */ - if (*queue_id->text == '\0' && + if (*queue_id->u.text == '\0' && leng < max_token_len ) { while (isspace(text[0])) { @@ -350,8 +350,8 @@ leng -= 1; } leng = min(queue_id->leng, leng); - memcpy( queue_id->text, text, leng ); - Z(queue_id->text[leng]); + memcpy( queue_id->u.text, text, leng ); + Z(queue_id->u.text[leng]); } continue; @@ -365,12 +365,12 @@ /* if top level, no address, not localhost, .... */ if (token_prefix == w_recv && msg_state->parent == NULL && - *msg_addr->text == '\0' && + *msg_addr->u.text == '\0' && strcmp((char *)text, "127.0.0.1") != 0) { /* Not guaranteed to be the originating address of the message. 
*/ - memcpy( msg_addr->text, yylval.text, min(msg_addr->leng, yylval.leng)+D ); - Z(msg_addr->text[yylval.leng]); + memcpy( msg_addr->u.text, yylval.u.text, min(msg_addr->leng, yylval.leng)+D ); + Z(msg_addr->u.text[yylval.leng]); } } @@ -398,7 +398,7 @@ q1 & 0xff, q2 & 0xff, q3 & 0xff, q4 & 0xff); leng = strlen((const char *)text); - token->text = text; + token->u.text = text; token->leng = leng; token_copy( ipsave, token ); @@ -408,7 +408,7 @@ return (cls); } - token->text = text; + token->u.text = text; token->leng = leng; break; @@ -448,20 +448,20 @@ if (!msg_count_file) { /* Remove trailing blanks */ /* From "From ", for example */ - while (token->leng > 1 && token->text[token->leng-1] == ' ') { + while (token->leng > 1 && token->u.text[token->leng-1] == ' ') { token->leng -= 1; - token->text[token->leng] = (byte) '\0'; + token->u.text[token->leng] = (byte) '\0'; } /* Remove trailing colon */ - if (token->leng > 1 && token->text[token->leng-1] == ':') { + if (token->leng > 1 && token->u.text[token->leng-1] == ':') { token->leng -= 1; - token->text[token->leng] = (byte) '\0'; + token->u.text[token->leng] = (byte) '\0'; } if (replace_nonascii_characters) { /* replace nonascii characters by '?'s */ - for (cp = token->text; cp < token->text+token->leng; cp += 1) + for (cp = token->u.text; cp < token->u.text+token->leng; cp += 1) *cp = casefold_table[*cp]; } } @@ -476,12 +476,12 @@ word_t *w = w_token_array[WRAP(tok_count)]; w->leng = token->leng; - memcpy(w->text, token->text, w->leng); - Z(w->text[w->leng]); /* for easier debugging - removable */ + memcpy(w->u.text, token->u.text, w->leng); + Z(w->u.text[w->leng]); /* for easier debugging - removable */ if (DEBUG_MULTI(1)) fprintf(stderr, "%s:%d %2s %2d %2d %p %s\n", __FILE__, __LINE__, - "", tok_count, w->leng, w->text, w->text); + "", tok_count, w->leng, w->u.text, w->u.text); tok_count += 1; init_token = 1; @@ -500,19 +500,19 @@ leng = init_token; for ( tok = init_token; tok >= 0; tok -= 1 ) { uint idx = 
tok_count - 1 - tok; - leng += strlen((char *) w_token_array[WRAP(idx)]->text); + leng += strlen((char *) w_token_array[WRAP(idx)]->u.text); } if (leng > max_multi_token_len) leng = max_multi_token_len; token->leng = leng; - token->text = dest = p_multi_buff; + token->u.text = dest = p_multi_buff; for ( tok = init_token; tok >= 0; tok -= 1 ) { uint idx = tok_count - 1 - tok; uint len = w_token_array[WRAP(idx)]->leng; - byte *str = w_token_array[WRAP(idx)]->text; + byte *str = w_token_array[WRAP(idx)]->u.text; if (DEBUG_MULTI(1)) fprintf(stderr, "%s:%d %2d %2d %2d %p %s\n", __FILE__, __LINE__, @@ -529,7 +529,7 @@ sep = "*"; } - Z(token->text[token->leng]); /* for easier debugging - removable */ + Z(token->u.text[token->leng]); /* for easier debugging - removable */ init_token += 1; /* progress to next multi-token */ return; @@ -564,7 +564,7 @@ yylval_text = (byte *) malloc( yylval_text_size+D ); yylval.leng = 0; - yylval.text = yylval_text; + yylval.u.text = yylval_text; /* First IP Address in Received: statement */ msg_addr = word_new( NULL, max_token_len ); @@ -694,8 +694,8 @@ { if (msg_addr != NULL) { - *msg_addr->text = '\0'; - *msg_id->text = '\0'; - *queue_id->text = '\0'; + *msg_addr->u.text = '\0'; + *msg_id->u.text = '\0'; + *queue_id->u.text = '\0'; } } Modified: trunk/bogofilter/src/uudecode.c =================================================================== --- trunk/bogofilter/src/uudecode.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/uudecode.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -18,7 +18,7 @@ { uint size = word->leng; uint count = 0; - byte *b = word->text; /* beg */ + byte *b = word->u.text; /* beg */ byte *s = b; /* src */ byte *d = b; /* dst */ byte *e = b+size; /* end */ Modified: trunk/bogofilter/src/word.c =================================================================== --- trunk/bogofilter/src/word.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/word.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -25,12 +25,12 @@ 
/* to lessen malloc/free calls, allocate struct and data in one block */ word_t *self = xmalloc(sizeof(word_t)+len+1); self->leng = len; - self->text = (byte *)((char *)self+sizeof(word_t)); + self->u.text = (byte *)((char *)self+sizeof(word_t)); if (text != NULL) { - memcpy(self->text, text, len); - self->text[len] = '\0'; /* ensure nul termination */ + memcpy(self->u.text, text, len); + self->u.text[len] = '\0'; /* ensure nul termination */ } else { - self->text[0] = '\0'; /* ditto for text == NULL */ + self->u.text[0] = '\0'; /* ditto for text == NULL */ } return self; } @@ -38,7 +38,7 @@ int word_cmp(const word_t *w1, const word_t *w2) { uint l = min(w1->leng, w2->leng); - int r = memcmp((const char *)w1->text, (const char *)w2->text, l); + int r = memcmp((const char *)w1->u.text, (const char *)w2->u.text, l); if (r) return r; if (w1->leng > w2->leng) return 1; if (w1->leng < w2->leng) return -1; @@ -49,7 +49,7 @@ { word_t w2; w2.leng = strlen(s); - w2.ctext = s; + w2.u.ctext = s; return word_cmp(w, &w2); } @@ -57,9 +57,9 @@ { uint len = w1->leng + w2->leng; word_t *ans = word_new(NULL, len); - memcpy(ans->text, w1->text, w1->leng); - memcpy(ans->text+w1->leng, w2->text, w2->leng); - Z(ans->text[ans->leng]); /* for easier debugging - removable */ + memcpy(ans->u.text, w1->u.text, w1->leng); + memcpy(ans->u.text+w1->leng, w2->u.text, w2->leng); + Z(ans->u.text[ans->leng]); /* for easier debugging - removable */ return ans; } @@ -70,7 +70,7 @@ ** blank fill if 'width' < length */ uint l = (width == 0) ? 
word->leng : min(width, word->leng); - (void)fwrite(word->text, 1, l, fp); + (void)fwrite(word->u.text, 1, l, fp); if (l < width) (void) fprintf(fp, "%*s", (int)(width - l), " "); } Modified: trunk/bogofilter/src/word.h =================================================================== --- trunk/bogofilter/src/word.h 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/word.h 2009-01-12 04:27:36 UTC (rev 6766) @@ -21,7 +21,7 @@ union { byte *text; /** pointer to the string */ const char *ctext; - }; + } u; } word_t; /** create a new word_t from the \a leng bytes at address \a text */ @@ -35,7 +35,7 @@ #define word_free(self) xfree((self)) /** create a newly word_t to form a deep copy of \a self */ -#define word_dup(self) word_new((self)->text, (self)->leng) +#define word_dup(self) word_new((self)->u.text, (self)->leng) /** compare \a w1 and \a w2 with memcmp() */ extern int word_cmp(const word_t *w1, const word_t *w2); Modified: trunk/bogofilter/src/wordhash.c =================================================================== --- trunk/bogofilter/src/wordhash.c 2009-01-12 04:12:41 UTC (rev 6765) +++ trunk/bogofilter/src/wordhash.c 2009-01-12 04:27:36 UTC (rev 6766) @@ -235,7 +235,7 @@ unsigned int h = 0; size_t l; for (l=0; l<t->leng; l++) - h = MULT * h + t->text[l]; + h = MULT * h + t->u.text[l]; return h % NHASH; } @@ -243,7 +243,7 @@ { wordprop_t *p = (wordprop_t *)n->buf; if (verbose > 2) - printf( "%20.20s %5u %5u%s", n->key->text, p->cnts.bad, p->cnts.good, str); + printf( "%20.20s %5u %5u%s", n->key->u.text, p->cnts.bad, p->cnts.good, str); } /* this function accumulates the word frequencies from the src hash to @@ -311,7 +311,7 @@ for (hn = wh->bin[idx]; hn != NULL; hn = hn->next) { word_t *key = hn->key; - if (key->leng == t->leng && memcmp (t->text, key->text, t->leng) == 0) { + if (key->leng == t->leng && memcmp (t->u.text, key->u.text, t->leng) == 0) { wordprop_t *p = (wordprop_t *)hn->buf; return p; } This was sent by the SourceForge.net 
collaborative development platform, the world's largest Open Source development site.
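Every hunk in this revision follows from one change in word.h: the previously anonymous union inside word_t now has the member name `u`. As a result, each `w->text` becomes `w->u.text` and each `w->ctext` becomes `w->u.ctext`. Anonymous union members were only standardized in C11 and were a compiler extension before that, so compilers in strict C89/C99 mode may reject them. That is a plausible motivation here, but the log message for this revision is not shown above.

The sketch below is simplified and self-contained. It assumes `byte` and `uint` are unsigned char and unsigned int, and it uses plain malloc in place of bogofilter's xmalloc. The name of the C-string comparison helper is cut off in the word.c hunk, so `word_cmp_cstr` is a hypothetical name used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned char byte;   /* assumption: mirrors bogofilter's typedef */
typedef unsigned int  uint;   /* assumption: mirrors bogofilter's typedef */

/* Simplified word_t: the union is named `u`, so members are reached
   as w->u.text / w->u.ctext (portable to pre-C11 compilers). */
typedef struct {
    uint leng;
    union {
        byte       *text;   /* writable view of the token bytes */
        const char *ctext;  /* read-only view, e.g. a C string literal */
    } u;
} word_t;

/* Modeled on word_new() in word.c: struct and data share one allocation,
   with the bytes stored directly after the struct. malloc stands in for
   xmalloc, so the caller must check for NULL. */
static word_t *word_new(const byte *text, uint len)
{
    word_t *self = malloc(sizeof(word_t) + len + 1);
    if (self == NULL)
        return NULL;
    self->leng = len;
    self->u.text = (byte *)((char *)self + sizeof(word_t));
    if (text != NULL) {
        memcpy(self->u.text, text, len);
        self->u.text[len] = '\0';          /* ensure NUL termination */
    } else {
        self->u.text[0] = '\0';
    }
    return self;                            /* one free() releases both */
}

/* Same logic as word_cmp(): memcmp the common prefix, then the longer
   word sorts after the shorter one. */
static int word_cmp(const word_t *w1, const word_t *w2)
{
    uint l = w1->leng < w2->leng ? w1->leng : w2->leng;
    int r = memcmp(w1->u.text, w2->u.text, l);
    if (r) return r;
    if (w1->leng > w2->leng) return 1;
    if (w1->leng < w2->leng) return -1;
    return 0;
}

/* Hypothetical name. Wraps a const C string in a stack word_t through
   u.ctext, which avoids casting away const at the call site. */
static int word_cmp_cstr(const word_t *w, const char *s)
{
    word_t w2;
    w2.leng = (uint) strlen(s);
    w2.u.ctext = s;
    return word_cmp(w, &w2);
}
```

Used as `word_t *w = word_new((const byte *)"spam", 4);`, `word_cmp_cstr(w, "spam")` returns 0 and `word_cmp_cstr(w, "spa")` returns 1. A single `free(w)` releases both the struct and its text, which is the "lessen malloc/free calls" rationale noted in word.c.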
From: <re...@us...> - 2009-01-21 23:05:21
Revision: 6769 http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6769&view=rev Author: relson Date: 2009-01-21 23:05:15 +0000 (Wed, 21 Jan 2009) Log Message: ----------- Updated spamitarium to 0.3.0 Modified Paths: -------------- trunk/bogofilter/NEWS trunk/bogofilter/contrib/spamitarium.pl Modified: trunk/bogofilter/NEWS =================================================================== --- trunk/bogofilter/NEWS 2009-01-12 09:35:12 UTC (rev 6768) +++ trunk/bogofilter/NEWS 2009-01-21 23:05:15 UTC (rev 6769) @@ -15,6 +15,11 @@ ------------------------------------------------------------------------------- + 2009-01-21 + + * spamitarium.pl updated to version 0.3.0 + (thanks to Tom Anderson) + 2009-01-11 * For compatibility with Sun's Sun Studio 12 compiler, provide @@ -62,7 +67,7 @@ 2008-04-28 * Added maildir training info to English and French FAQs. - Thanks to Karl Schmidt and to Mouss. + (thanks to Karl Schmidt and to Mouss) 2008-04-26 Modified: trunk/bogofilter/contrib/spamitarium.pl =================================================================== --- trunk/bogofilter/contrib/spamitarium.pl 2009-01-12 09:35:12 UTC (rev 6768) +++ trunk/bogofilter/contrib/spamitarium.pl 2009-01-21 23:05:15 UTC (rev 6769) @@ -4,11 +4,11 @@ =head1 NAME -Spamitarium - where the spam's head gets fixed... +Spamitarium - evaluates and repairs the sanity of email headers... 
=cut -my $version = "0.2.1"; +my $version = "0.3.0"; ################################################ ############### Copyleft Notice ################ @@ -50,22 +50,21 @@ =head2 Procmail usage (recommended): -Add to ~/.procmailrc the following recipe, where I<$HOME> -is your home directory, if not set in the environment: +Add to your .procmailrc the following recipe: :0 { - :0 fhw - | $HOME/.bogofilter/spamitarium -sread + :0 fhw + | spamitarium -sreadx - # filter through bogofilter, tagging as spam + # filter through bogofilter, tagging as spam # or not and updating the word lists - :0 fw - | bogofilter -uep + :0 fw + | bogofilter -uep - # add back the "From" header for proper delivery - :0 fhw - | formail -I "From " -a "From " + # add back the "From" header for proper delivery + :0 fhw + | formail -I "From " -a "From " } =head2 Command line options: @@ -102,6 +101,13 @@ perform ASN lookups and include in received lines +=item B<x> + +include custom x-headers for additional header validations: + +- validate that the date header is within close proxmity to the +received date (see $date_limit global variable to configure) + =item B<w> parse and display the body of the email in addition to the headers @@ -123,7 +129,7 @@ I<list-id> and I<encrypted> fields passed through, you would change your procmail recipe as follows: - | $HOME/.bogofilter/spamitarium -sread list-id,encrypted + | spamitarium -sreadx list-id,encrypted =head1 REQUIRES @@ -133,6 +139,7 @@ =item * Perl 5.6.1 +Net::DNS::Resolver =back @@ -165,13 +172,20 @@ X-headers, prior to filtering. Spamitarium removes all invisible, non-functional header lines. -Finally, spamitarium looks up any IP addresses or rDNS addresses +Spamitarium also looks up any IP addresses or rDNS addresses which are not provided in order to provide the maximum tokens on which to filter. 
Moreover, it looks up the ASN (autonomous system number) associated with each "from" address in order to provide a small set of tokens representing the various major subnets of the internet. +Finally, Spamitarium assesses the headers for missing required +header lines, inserting keyable tokens or supplying the missing +information. And it compares the date fields to determine if the +email has been pre- or post-dated by a large margin in order to +influence where it appears in your mail client and inserts an +x-header with keyable range tokens to compensate for this. + Together, all of these techniques help to remove the noise which accompanies, either incidentally or maliciously, most email messages. This results in a cleaner header consisting of more easily scored @@ -247,6 +261,58 @@ # server to use for ASN lookups our $asn_server = "asn.routeviews.org"; +# distance in seconds from right now to consider a reasonable (non-spam) range to date an email +our $date_limit = 60*60*24*2; # 2 days + +# EMAIL HEADER FIELDS +# +# See RFC 2076 / "Common Internet Message Header Fields" for a synopsis of common mail headers + + # SPECIFIED FIELDS -- all of the fields specified in RFC 822/2822, case-insensitive, in the suggested order + our $spec_fields = "return-path,received,resent-date,resent-from,resent-sender,resent-reply-to,". + "resent-to,resent-cc,resent-bcc,resent-message-id,date,from,sender,reply-to,". 
+ "to,cc,bcc,message-id,in-reply-to,references,subject,comments,keywords,encrypted"; + + # MIME header fields (RFC 1049/1341/1521/2183) + $spec_fields .= ",mime-version,content-type,content-transfer-encoding,content-id,content-description,content-disposition"; + + # security/checksum (RFC 1864) + $spec_fields .= ",content-md5"; + + # mailing list headers (RFC 2369/2919) may be added if you like, but for now I'm choosing to leave them out + #$spec_fields .= ",list-id,list-help,list-unsubscribe,list-subscribe,list-post,list-owner,list-archive"; + + # MASKED FIELDS -- unnecessary fields often used for spam will be expunged from the spec fields list + # (if you know of a valid, necessary use for these, let me know) + our $masked_fields = "keywords,comments,encrypted,content-id,content-description"; + + # controversial and not strictly necessary: + #$masked_fields .= ",reply-to"; + + # message-id fields are only machine-readable and not visible to nor readable by the recipient + # however, they can be useful if your client produces discussion threading + # uncomment this line if you don't care about threading: + #$masked_fields .= ",message-id,resent-message-id,in-reply-to,references"; + + # resent fields are strictly informational (and not generally user-visible), therefore allowing them through is optional: + # MIME specifies a different way of resending messages with the "Message" content-type, so these may be considered deprecated: + $masked_fields .= ",resent-date,resent-from,resent-sender,resent-reply-to,resent-to,resent-cc,resent-bcc,resent-message-id"; + + # USER FIELDS -- User fields are those that are neither specified nor masked that you want permitted. + # These may include special fields for your particular mail server, filter, or mail user agent. + our $user_fields = ""; + + # NEW FIELDS -- New custom x-headers added by Spamitarium (it is recommend that you don't change these). + # These are disabled unless you pass the 'x' option. 
+ our $new_fields = "x-date-check"; + + # REQUIRED FIELDS -- Any fields that should show up in an email even if they are not sent -- i.e. if the lack of + # these fields may be useful for the filter, a no-req-field tag will be added. The only *required* fields according to + # RFC 2822 are "from", "sender", "reply-to", and "date", others are just suggested. However, "sender" and "reply-to" are + # commonly not supplied, and so should probably not be in this list. On the other hand, "subject" and a few others may + # be desired in this list. + our $req_fields = "received,from,to,date,subject"; + # of course, modify the first line of this file, # the shebang, to point to your perl interpreter @@ -258,6 +324,8 @@ ################################################# use Benchmark; +use Time::Local; +use Net::DNS::Resolver; ################################################# ############## Default Globals ################## @@ -269,16 +337,13 @@ # Make %ENV safer delete @ENV{qw(IFS CDPATH ENV BASH_ENV PATH SHELL)}; -# Set the environment explicitely +# Set the environment explicitly $ENV{PATH} = $path; $ENV{SHELL} = $shell; # options flags our $options = ""; -# list of allowed headers -our $user_fields = ""; - # define the control-linefeed syntax for this system our $CRLF = "\n"; @@ -286,12 +351,20 @@ #("\t" ne "\011")? 
"\r\n": # EBCDIC # "\015\012"; # others +# DNS query options +our $res = Net::DNS::Resolver->new( + nameservers => [qw(127.0.0.1)], + udp_timeout => 2, + retry => 1, + #debug => 1 +); + ################################################ ##################### Main ##################### ################################################ # process options -if (!defined @ARGV || $ARGV[0] !~ /[^\s]/ || $ARGV[0] =~ /h/) +if (!defined @ARGV || @ARGV == 0 || $ARGV[0] !~ /\w/ || $ARGV[0] =~ /h/) { my $spamitarium = $1 if $0 =~ /^([\w\/.\-~]*)$/; system("perldoc $spamitarium"); exit(0); @@ -304,12 +377,14 @@ if ($ARGV[0] =~ /e/) { $options .= "e"; } # include the helo received field in output if ($ARGV[0] =~ /b/) { $options .= "b"; } # output benchmarking info if ($ARGV[0] =~ /w/) { $options .= "w"; } # process whole email (including body) +if ($ARGV[0] =~ /x/) { $options .= "x"; } # insert custom x-header fields # get the permitted headers if ($options =~ /s/ && $ARGV[1]) { $user_fields = $ARGV[1]; } # start timing the process my $start_time = new Benchmark if $options =~ /b/; +my ($start_parse, $end_parse, $start_rcvd, $end_rcvd, $start_set, $end_set); # get STDIN and process the email eval @@ -319,18 +394,39 @@ alarm $timeout; # parse the header - my $header = parse_header(); - + $start_parse = new Benchmark if $options =~ /b/; + my ($header,$parse_benchmark) = parse_header(); + $end_parse = new Benchmark if $options =~ /b/; + # cancel timeout if we got this far alarm 0; + + # default date if none provided + unless (defined $header->{'date'}) + { + $header->{'date'}->[0]->{'name'} = "Date"; + $header->{'date'}->[0]->{'value'} = gmtime time; + } # process the received lines - $header->{'received'} = process_rcvd($header->{'received'},$header->{'date'}->[0]->{'string'}) if $options =~ /r/; + if ($options =~ /r/) + { + $start_rcvd = new Benchmark if $options =~ /b/; + $header->{'received'} = process_rcvd($header->{'received'}); + $end_rcvd = new Benchmark if $options =~ /b/; + } 
- #print "received: " . $header->{'received'} . ": " . $header->{'received'}->[0] . ": " . $header->{'received'}->[0]->{'sane'} . "\n"; - + # add new custom header fields + if ($options =~ /x/) + { + $header->{'x-date-check'}->[0]->{'name'} = "X-Date-Check"; + $header->{'x-date-check'}->[0]->{'value'} = date_check($header->{'date'}->[0]->{'value'},$header->{'received'}->[0]->{'date'}); + } + # output the new header containing the changes + $start_set = new Benchmark if $options =~ /b/; print set_header($header); + $end_set = new Benchmark if $options =~ /b/; # add the body if desired print parse_body() if $options =~ /w/; @@ -350,6 +446,24 @@ my $usr = $td->[1]+$td->[3]; my $sys = $td->[2]+$td->[4]; my $cpu = $usr+$sys; my $wall = $td->[0]; print "Total running time was $wall wallclock secs; $usr usr + $sys sys = $cpu CPU secs.$CRLF"; + + $td = timediff($end_parse, $start_parse); + $usr = $td->[1]+$td->[3]; $sys = $td->[2]+$td->[4]; + $cpu = $usr+$sys; $wall = $td->[0]; + print "Input parsing time was $wall wallclock secs; $usr usr + $sys sys = $cpu CPU secs.$CRLF"; + + if ($options =~ /r/) + { + $td = timediff($end_rcvd, $start_rcvd); + $usr = $td->[1]+$td->[3]; $sys = $td->[2]+$td->[4]; + $cpu = $usr+$sys; $wall = $td->[0]; + print "Received line processing time was $wall wallclock secs; $usr usr + $sys sys = $cpu CPU secs.$CRLF"; + } + + $td = timediff($end_set, $start_set); + $usr = $td->[1]+$td->[3]; $sys = $td->[2]+$td->[4]; + $cpu = $usr+$sys; $wall = $td->[0]; + print "Rebuilding email time was $wall wallclock secs; $usr usr + $sys sys = $cpu CPU secs.$CRLF"; } exit(0); @@ -360,45 +474,98 @@ sub parse_header { - my %header; + my $header = {}; my $name = ""; - while (<STDIN>) - { + while (<STDIN>) + { alarm 0; my $line = $_; + chomp($line); # we're done with the header when we've found a blank line - last if (!defined $line || $line !~ /[^\s]/); + # and the required headers have been found already + last if (!defined $line || $line !~ /\S/); + #&& ( + 
#(defined $header->{'received'} && $header->{'received'}->[0]->{'value'} =~ /\w/) && + #(defined $header->{'subject'} && $header->{'subject'}->[0]->{'value'} =~ /\w/) && + #(defined $header->{'to'} && $header->{'to'}->[0]->{'value'} =~ /\w/) && + #(defined $header->{'from'} && $header->{'from'}->[0]->{'value'} =~ /\w/))); - # start matching header lines - if ($line =~ /^((?:\w|-)+?): (.*?)$/) + # match header lines + if ($line =~ /^(\S+?):\s*?(\S.+?)$/) { my $head = $1; my $value = $2; $name = $head; $name =~ tr/A-Z/a-z/; # header names are case insensitive - $value =~ s/\s+?/ /gis; # unfold header lines by removing CRLF + chomp($name); + + $value =~ s/\s+?/ /gis; # nix extra spaces & unfold header lines by removing CRLF $value =~ s/(\S)$/$1 /; + chomp($value); # if this header name has already been found, append to the end of the array - my $count = ((defined $header{$name}) && (ref($header{$name}) eq "ARRAY"))? scalar @{$header{$name}} : 0; + my $count = ((defined $header->{$name}) && (ref($header->{$name}) eq "ARRAY"))? scalar @{$header->{$name}} : 0; # record this header line - $header{$name}[$count]{'string'} = $value; - $header{$name}[$count]{'name'} = $head; # just for consistency + $header->{$name}->[$count]->{'value'} = $value; + $header->{$name}->[$count]->{'name'} = $head; # just for consistency (i.e. 
pre transforms) - #print "$name [$count] = $value$CRLF"; + #print "found $head [$count] = $value$CRLF"; } # if this line doesn't start with "header:", append to last line found (if exists) - elsif ($name) { $line =~ s/\s+?/ /gis; $line =~ s/^\s//; $header{$name}[(scalar @{$header{$name}} - 1)]{'string'} .= $line if ((defined $header{$name}) && (ref($header{$name}) eq "ARRAY")); } + elsif ($name && $line =~ /\w/ && $line !~ /^:/) { $line =~ s/\s+?/ /gis; $line =~ s/^\s//; $header->{$name}->[(scalar @{$header->{$name}} - 1)]->{'value'} .= $line if ((defined $header->{$name}) && (ref($header->{$name}) eq "ARRAY")); } } + + return $header; +} + +sub date_check +{ + my ($date,$rcvd) = shift; + my ($dow, $day, $mon, $year, $hour, $min, $sec, $rmdr) = "?"; + + if ($date =~ /\s*?(\w{1,9}),?\s+?(\d+?)\s+?(\w{3})\s+?(\d{4})\s+?(\d{1,2}):(\d{2}):(\d{2})(.*?)/i) + { + $dow=$1; $day=$2; $mon=$3; $year=$4; $hour=$5; $min=$6; $sec=$7; $rmdr=$8; + $mon = $mon=~/Dec/i?11:$mon=~/Nov/i?10:$mon=~/Oct/i?9:$mon=~/Sep/i?8:$mon=~/Aug/i?7:$mon=~/Jul/i?6:$mon=~/Jun/i?5:$mon=~/May/i?4:$mon=~/Apr/i?3:$mon=~/Mar/i?2:$mon=~/Feb/i?1:0; - return \%header; + $date = timegm($sec,$min,$hour,$day,$mon,$year); + + # adjust for local time + if ($rmdr =~ /\+\d(\d)\d\d/) { $date -= $1 * 60 * 60; } + if ($rmdr =~ /\-\d(\d)\d\d/) { $date += $1 * 60 * 60; } + } + else { return "date-format-unknown"; } + + if ($rcvd && $rcvd =~ /\s*?(\w{1,9}),?\s+?(\d+?)\s+?(\w{3})\s+?(\d{4})\s+?(\d{1,2}):(\d{2}):(\d{2})(.*?)/i) + { + $dow=$1; $day=$2; $mon=$3; $year=$4; $hour=$5; $min=$6; $sec=$7; $rmdr=$8; + $mon = $mon=~/Dec/i?11:$mon=~/Nov/i?10:$mon=~/Oct/i?9:$mon=~/Sep/i?8:$mon=~/Aug/i?7:$mon=~/Jul/i?6:$mon=~/Jun/i?5:$mon=~/May/i?4:$mon=~/Apr/i?3:$mon=~/Mar/i?2:$mon=~/Feb/i?1:0; + + $rcvd = timegm($sec,$min,$hour,$day,$mon,$year); + + # adjust for local time + if ($rmdr =~ /\+\d(\d)\d\d/) { $rcvd -= $1 * 60 * 60; } + if ($rmdr =~ /\-\d(\d)\d\d/) { $rcvd += $1 * 60 * 60; } + } + else { $rcvd = time; } + + # check for 
range +/- + my $diff = $rcvd - $date; my $diff_days = round($diff/(60*60*24)); + if (($diff < $date_limit) and ($diff > $date_limit * -1)) { return "date-in-range ($diff_days days)"; } + else { return "date-out-of-range ($diff_days days)"; } } +sub round +{ + my $num = shift; + return int(($num*100)+0.5)/100; +} + ################################################ ################# Parse Body ################## ################################################ @@ -411,7 +578,7 @@ # we'll just process the header my $body = ""; - while (<STDIN>) { $body .= $_; } + while (<STDIN>) { $body .= $_; } return $body; } @@ -422,11 +589,10 @@ sub process_rcvd { my $rcvd = shift; - my $date = shift; # heuristics my $LUSER = qr~(?:\w|-|\.)+?~; - my $DOMAIN = qr~(?:\w|-|\.)+\.\w{2,4}~; + my $DOMAIN = qr~(?:\w|-|\.)+?\.\w{2,4}~; my $IP = qr~(?:\d{1,3}\.){3}\d{1,3}~; my $EMAIL = qr~$LUSER\@$DOMAIN~; my $HELO = qr~[^\s\0\/\\\#]+?~; @@ -436,68 +602,73 @@ my $untrusted = 0; # check if we were passed a valid array of received lines - unless ((defined $rcvd) && (ref($rcvd) eq "ARRAY") && $rcvd->[0]->{'string'}) + unless ((defined $rcvd) && (ref($rcvd) eq "ARRAY") && $rcvd->[0]->{'value'}) { no strict 'refs'; - my %rcvd_hash = ('string'=>"from localhost; $date", 'name'=>"Received"); + my %rcvd_hash = ('value' => "from localhost; " . 
gmtime time, 'name' => "Received"); my @rcvd_array; $rcvd_array[0] = \%rcvd_hash; $rcvd = \@rcvd_array; } + else { # iterate through each received header, parsing and validating the info for (my $x = 0; $x < scalar @$rcvd; $x++) { # skip processing if we already lost confidence in this trail of received lines - if ($untrusted) { $rcvd->[$x]->{'sane'} = "untrusted"; next; } + #if ($untrusted) { $rcvd->[$x]->{'sane'} = "untrusted"; next; } my $helo=""; my $ipad=""; my $rdns=""; my $idnt=""; my $from=""; my $mtan=""; my $mtai=""; my $mtav=""; my $fore=""; my $with=""; my $date=""; # try to take into account all known MTA formats - if ($rcvd->[$x]->{'string'} =~ s/\(envelope-(?:sender|from) <($EMAIL)>\)//gis) { $from=$1; }#print "X-$x-matched-01: from=$from$CRLF"; } - if ($rcvd->[$x]->{'string'} =~ s/;\s+?(\w{3}, \d{1,2} \w{3} \d{2,4}.*?)$//gis) { $date=$1; }#print "X-$x-matched-02: date=$date$CRLF"; } - if ($rcvd->[$x]->{'string'} =~ s/for\s+?<?($EMAIL)>?(?: \(single-drop\))?//gis) { $fore=$1; }#print "X-$x-matched-03: fore=$fore$CRLF"; } - if ($rcvd->[$x]->{'string'} =~ s/by\s+?(\S+?) \(($IP)\) \((.*?)\)//gis) { $mtan=$1; $mtai=$2; $mtav=$3; }#print "X-$x-matched-04: mtan=$mtan, mtai=$mtai, mtav=$mtav$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/by\s+?(\S+?) \[($IP)\]//gis) { $mtan=$1; $mtai=$2; }#print "X-$x-matched-05: mtan=$mtan, mtai=$mtai$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/by\s+?(\S+?) \((.+?)\)//gis) { $mtan=$1; $mtav=$2; }#print "X-$x-matched-06: mtan=$mtan, mtav=$mtav$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/by\s+?($IP)(?=\W|;|$)//gis) { $mtai=$1; }#print "X-$x-matched-07: mtai=$mtai$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/by\s+?($DOMAIN)(?=\W|;|$)//gis) { $mtan=$1; }#print "X-$x-matched-08: mtan=$mtan$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/by\s+?(\S+?)(?=\W|;|$)//gis) { $mtan=$1; }#print "X-$x-matched-09: mtan=$mtan$CRLF"; } - if ($rcvd->[$x]->{'string'} =~ s/(?:with)\s+?(\S+?) 
\((.*?)\)//gis) { $with=$1; $mtav=$2 if !$mtav; }#print "X-$x-matched-10: with=$with, mtav=$mtav$CRLF";} - elsif ($rcvd->[$x]->{'string'} =~ s/(?:with)\s+?(\S+?)(?=\W|;|$)//gis) { $with=$1; }#print "X-$x-matched-11: with=$with$CRLF"; } - if ($rcvd->[$x]->{'string'} =~ s/^from\s+?($RDNS) \(HELO ($HELO)\) \(($LUSER)\@\[?($IP)\]?//gis) { $rdns=$1; $helo=$2; $idnt=$3; $ipad=$4; }#print "X-$x-matched-12: rdns=$rdns, helo=$helo, idnt=$idnt, ipad=$ipad$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?($RDNS) \(HELO ($HELO)\) \(\[?($IP)\]?//gis) { $rdns=$1; $helo=$2; $ipad=$3; }#print "X-$x-matched-13: rdns=$rdns, helo=$helo, ipad=$ipad$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?($RDNS) \(\[($IP)\] helo=($HELO)\)//gis) { $rdns=$1; $ipad=$2; $helo=$3; }#print "X-$x-matched-14: rdns=$rdns, ipad=$ipad, helo=$helo$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?($RDNS) \(($LUSER)\@\[?($IP)\]?\)//gis) { $rdns=$1; $idnt=$2; $ipad=$3; }#print "X-$x-matched-15: rdns=$rdns, idnt=$idnt, ipad=$ipad$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?($RDNS)\(($IP)\)//gis) { $rdns=$1; $ipad=$2; }#print "X-$x-matched-16: rdns=$rdns, ipad=$ipad$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?\[($IP)\] \(helo=($HELO) ident=($LUSER)\)//gis) { $ipad=$1; $helo=$2; $idnt=$3; }#print "X-$x-matched-17: ipad=$ipad, helo=$helo, idnt=$idnt$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?\[($IP)\] \(account ($LUSER) HELO ($HELO)\)//gis) { $ipad=$1; $idnt=$2; $helo=$3; }#print "X-$x-matched-18: ipad=$ipad, idnt=$idnt, helo=$helo$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?\[($IP)\] \(helo=($HELO)\)//gis) { $ipad=$1; $helo=$2; }#print "X-$x-matched-19: ipad=$ipad, helo=$helo$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?\[?($IP)\]?:?\d*? 
\(HELO ($HELO)\)//gis) { $ipad=$1; $helo=$2; }#print "X-$x-matched-20: ipad=$ipad, helo=$helo$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?($HELO) \(IDENT:($LUSER)\@($RDNS) \[($IP)\]//gis) { $helo=$1; $idnt=$2; $rdns=$3; $ipad=$4; }#print "X-$x-matched-21: helo=$helo, idnt=$idnt, rdns=$rdns, ipad=$ipad$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?($HELO) \(<?($RDNS)>?\s?\[($IP)\]//gis) { $helo=$1; $rdns=$2; $ipad=$3; }#print "X-$x-matched-22: helo=$helo, rdns=$rdns, ipad=$ipad$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?($HELO) \(\[($IP)\] ident=($LUSER)\)//gis) { $helo=$1; $ipad=$2; $idnt=$3; }#print "X-$x-matched-23: helo=$helo, ipad=$ipad, idnt=$idnt$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?($HELO) \(proxying for ($IP)\) \(.*? user ($LUSER)\)//gis) { $helo=$1; $ipad=$2; $idnt=$3; }#print "X-$x-matched-24: helo=$helo, ipad=$ipad, idnt=$idnt$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?($HELO) \(account ($LUSER) \[($IP)\] verified\)//gis) { $helo=$1; $idnt=$2; $ipad=$3; }#print "X-$x-matched-25: helo=$helo, idnt=$idnt, ipad=$ipad$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?\(?($HELO) \(?\[?($IP)\]?\)?//gis) { $helo=$1; $ipad=$2; }#print "X-$x-matched-26: helo=$helo, ipad=$ipad$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?($HELO) \(localhost \[.*?:($IP)\]\)//gis) { $helo=$1; $ipad=$2; }#print "X-$x-matched-27: helo=$helo, ipad=$ipad$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?($HELO) \(($LUSER)\@($RDNS)\)//gis) { $helo=$1; $idnt=$2; $rdns=$3; }#print "X-$x-matched-28: helo=$helo, idnt=$idnt, rdns=$rdns$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?($HELO) \(($RDNS)\)//gis) { $helo=$1; $rdns=$2; }#print "X-$x-matched-29: helo=$helo, rdns=$rdns$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/\(from\s+?($LUSER)\@($RDNS)\)//gis) { $idnt=$1; $rdns=$2; }#print "X-$x-matched-30: idnt=$idnt, rdns=$rdns$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ 
s/\(from\s+?($LUSER)\@($HELO)\)//gis) { $idnt=$1; $helo=$2; }#print "X-$x-matched-31: idnt=$idnt, helo=$helo$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?\(?\[?($IP)\]?\)?//gis) { $ipad=$1; }#print "X-$x-matched-32: ipad=$ipad$CRLF"; } - elsif ($rcvd->[$x]->{'string'} =~ s/^from\s+?($HELO)(?=\W|;|$)//gis) { $helo=$1; }#print "X-$x-matched-33: helo=$helo$CRLF"; } + if ($rcvd->[$x]->{'value'} =~ s/\(envelope-(?:sender|from) <($EMAIL)>\)//gis) { $from=$1; }# print "X-$x-matched-01: from=$from, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + if ($rcvd->[$x]->{'value'} =~ s/;\s+?(\w{3}, \d{1,2} \w{3} \d{2,4}.*?)$//gis) { $date=$1; }# print "X-$x-matched-02: date=$date, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + if ($rcvd->[$x]->{'value'} =~ s/for\s+?<?($EMAIL)>?(?: \(single-drop\))?//gis) { $fore=$1; }# print "X-$x-matched-03: fore=$fore, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + if ($rcvd->[$x]->{'value'} =~ s/by\s+?(\S+?) \(($IP)\) \((.*?)\)//gis) { $mtan=$1; $mtai=$2; $mtav=$3; }# print "X-$x-matched-04: mtan=$mtan, mtai=$mtai, mtav=$mtav, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/by\s+?(\S+?) \[($IP)\]//gis) { $mtan=$1; $mtai=$2; }# print "X-$x-matched-05: mtan=$mtan, mtai=$mtai, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/by\s+?(\S+?) 
\((.+?)\)//gis) { $mtan=$1; $mtav=$2; }# print "X-$x-matched-06: mtan=$mtan, mtav=$mtav, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/by\s+?($IP)(?=\W|;|$)//gis) { $mtai=$1; }# print "X-$x-matched-07: mtai=$mtai, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/by\s+?($DOMAIN)(?=\W|;|$)//gis) { $mtan=$1; }# print "X-$x-matched-08: mtan=$mtan, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/by\s+?(\S+?)(?=\W|;|$)//gis) { $mtan=$1; }# print "X-$x-matched-09: mtan=$mtan, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + if ($rcvd->[$x]->{'value'} =~ s/(?:with)\s+?(\S+?) \((.*?)\)//gis) { $with=$1; $mtav=$2 if !$mtav; }# print "X-$x-matched-10: with=$with, mtav=$mtav, remaining=$rcvd->[$x]->{'value'} $CRLF";} + elsif ($rcvd->[$x]->{'value'} =~ s/(?:with)\s+?(\S+?)(?=\W|;|$)//gis) { $with=$1; }# print "X-$x-matched-11: with=$with, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + if ($rcvd->[$x]->{'value'} =~ s/^from\s+?($RDNS) \(HELO ($HELO)\) \(($LUSER)\@\[?($IP)\]?//gis) { $rdns=$1; $helo=$2; $idnt=$3; $ipad=$4; }# print "X-$x-matched-12: rdns=$rdns, helo=$helo, idnt=$idnt, ipad=$ipad, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?($RDNS) \(HELO ($HELO)\) \(\[?($IP)\]?//gis) { $rdns=$1; $helo=$2; $ipad=$3; }# print "X-$x-matched-13: rdns=$rdns, helo=$helo, ipad=$ipad, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?($RDNS) \(\[($IP)\] helo=($HELO)\)//gis) { $rdns=$1; $ipad=$2; $helo=$3; }# print "X-$x-matched-14: rdns=$rdns, ipad=$ipad, helo=$helo, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?($RDNS) \(($LUSER)\@\[?($IP)\]?\)//gis) { $rdns=$1; $idnt=$2; $ipad=$3; }# print "X-$x-matched-15: rdns=$rdns, idnt=$idnt, ipad=$ipad, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?($RDNS)\(($IP)\)//gis) { $rdns=$1; 
$ipad=$2; }# print "X-$x-matched-16: rdns=$rdns, ipad=$ipad, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?\[($IP)\] \(helo=($HELO) ident=($LUSER)\)//gis) { $ipad=$1; $helo=$2; $idnt=$3; }# print "X-$x-matched-17: ipad=$ipad, helo=$helo, idnt=$idnt, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?\[($IP)\] \(account ($LUSER) HELO ($HELO)\)//gis) { $ipad=$1; $idnt=$2; $helo=$3; }# print "X-$x-matched-18: ipad=$ipad, idnt=$idnt, helo=$helo, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?\[($IP)\] \(helo=($HELO)\)//gis) { $ipad=$1; $helo=$2; }# print "X-$x-matched-19: ipad=$ipad, helo=$helo, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?\[?($IP)\]?:?\d*? \(HELO ($HELO)\)//gis) { $ipad=$1; $helo=$2; }# print "X-$x-matched-20: ipad=$ipad, helo=$helo, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?($HELO) \(IDENT:($LUSER)\@($RDNS) \[($IP)\]//gis) { $helo=$1; $idnt=$2; $rdns=$3; $ipad=$4; }# print "X-$x-matched-21: helo=$helo, idnt=$idnt, rdns=$rdns, ipad=$ipad, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?($HELO) \(<?($RDNS)>?\s?\[($IP)\]//gis) { $helo=$1; $rdns=$2; $ipad=$3; }# print "X-$x-matched-22: helo=$helo, rdns=$rdns, ipad=$ipad, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?($HELO) \(\[($IP)\] ident=($LUSER)\)//gis) { $helo=$1; $ipad=$2; $idnt=$3; }# print "X-$x-matched-23: helo=$helo, ipad=$ipad, idnt=$idnt, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?($HELO) \(proxying for ($IP)\) \(.*? 
user ($LUSER)\)//gis) { $helo=$1; $ipad=$2; $idnt=$3; }# print "X-$x-matched-24: helo=$helo, ipad=$ipad, idnt=$idnt, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?($HELO) \(account ($LUSER) \[($IP)\] verified\)//gis) { $helo=$1; $idnt=$2; $ipad=$3; }# print "X-$x-matched-25: helo=$helo, idnt=$idnt, ipad=$ipad, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?\(?($HELO) \(?\[?($IP)\]?\)?//gis) { $helo=$1; $ipad=$2; }# print "X-$x-matched-26: helo=$helo, ipad=$ipad, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?($HELO) \(localhost \[.*?:($IP)\]\)//gis) { $helo=$1; $ipad=$2; }# print "X-$x-matched-27: helo=$helo, ipad=$ipad, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?($HELO) \(($LUSER)\@($RDNS)\)//gis) { $helo=$1; $idnt=$2; $rdns=$3; }# print "X-$x-matched-28: helo=$helo, idnt=$idnt, rdns=$rdns, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?($HELO) \(($RDNS)\)//gis) { $helo=$1; $rdns=$2; }# print "X-$x-matched-29: helo=$helo, rdns=$rdns, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/\(from\s+?($LUSER)\@($RDNS)\)//gis) { $idnt=$1; $rdns=$2; }# print "X-$x-matched-30: idnt=$idnt, rdns=$rdns, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/\(from\s+?($LUSER)\@($HELO)\)//gis) { $idnt=$1; $helo=$2; }# print "X-$x-matched-31: idnt=$idnt, helo=$helo, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?\(?\[?($IP)\]?\)?//gis) { $ipad=$1; }# print "X-$x-matched-32: ipad=$ipad, remaining=$rcvd->[$x]->{'value'} $CRLF"; } + elsif ($rcvd->[$x]->{'value'} =~ s/^from\s+?($HELO)(?=\W|;|$)//gis) { $helo=$1; }# print "X-$x-matched-33: helo=$helo, remaining=$rcvd->[$x]->{'value'} $CRLF"; } # lookup IP if not provided $ipad = host($rdns) if !$ipad && $rdns && $options =~ /d/; 
$ipad = host($helo) if !$ipad && !$rdns && $helo && $helo =~ /$DOMAIN/ && $options =~ /d/; # exclude lines with no IP - next if !$ipad && ((scalar @$rcvd) > 1); + #next if !$ipad && ((scalar @$rcvd) > 1); + # ensure the local received line has a date stamp + $date = gmtime time unless $date || $x; + # save "from" info for comparison in next iteration $rcvd->[$x]->{'rdns'} = $rdns; $rcvd->[$x]->{'ipad'} = $ipad; + $rcvd->[$x]->{'date'} = $date; # exclude lines from local, private (RFC 1918), and invalid IP address ranges my $reserved = qr~^((?:127\.)|(?:10\.)|(?:172\.(?:1[6-9]|2[0-9]|31)\.)|(?:192\.168\.)|(?:169\.254\.))~; @@ -523,18 +694,32 @@ # we implicitely trust the received line set "by" our own server as valid (first untrusted "from") if (!$edge_ip) { $edge_ip = $mtai; $rcvd->[$x]->{'sane'} = set_rcvd($helo,$ipad,$idnt,$rdns,$from,$mtan,$mtai,$mtav,$fore,$with,$date,$asn); } - # now we'll try to establish the validity of each nonlocal received line by - # checking for continuity and rejecting lines that don't fit the "from/by" chain + # now we'll try to establish the validity of each received line by checking + # for continuity and rejecting lines that don't fit the "from/by" chain else { #print " by " . $mtan . " / prev from " . $rcvd->[$x-1]->{'rdns'} . "$CRLF"; #print " by " . $mtai . " / prev from " . $rcvd->[$x-1]->{'ipad'} . 
"$CRLF"; - if ((($mtan && $rcvd->[$x-1]->{'rdns'} && $mtan =~ /$rcvd->[$x-1]->{'rdns'}/) || - ($mtai && $rcvd->[$x-1]->{'ipad'} && $mtai =~ /$rcvd->[$x-1]->{'ipad'}/)) && (!$untrusted)) + if ( + ( + ($mtan && $rcvd->[$x-1]->{'rdns'} && $mtan =~ /$rcvd->[$x-1]->{'rdns'}/) || + ($mtai && $rcvd->[$x-1]->{'ipad'} && $mtai =~ /$rcvd->[$x-1]->{'ipad'}/) + ) && (!$untrusted) + ) { $rcvd->[$x]->{'sane'} = set_rcvd($helo,$ipad,$idnt,$rdns,$from,$mtan,$mtai,$mtav,$fore,$with,$date,$asn); } - else { $rcvd->[$x]->{'sane'} = "untrusted"; $untrusted = 1; } + else + { + $helo = "untrusted-".$helo if $helo; $ipad = "untrusted-".$ipad if $ipad; + $idnt = "untrusted-".$idnt if $idnt; $rdns = "untrusted-".$rdns if $rdns; + $from = "untrusted-".$from if $from; $mtan = "untrusted-".$mtan if $mtan; + $mtai = "untrusted-".$mtai if $mtai; $mtav = "untrusted-".$mtav if $mtav; + $fore = "untrusted-".$fore if $fore; $with = "untrusted-".$with if $with; + $date = ""; $asn = ""; + $rcvd->[$x]->{'sane'} = set_rcvd($helo,$ipad,$idnt,$rdns,$from,$mtan,$mtai,$mtav,$fore,$with,$date,$asn); + $untrusted = 1; + } } - } + }} return $rcvd; } @@ -553,11 +738,19 @@ my $target = shift; my $output = ""; + my $IP = qr~(?:\d{1,3}\.){3}\d{1,3}~; + my $DOMAIN = qr~(?:\w|-|\.)+?\.\w{2,4}~; + if ( $target =~ s/(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/$4.$3.$2.$1.$asn_server/ ) { - open (HOST, "host -t txt $target 2>/dev/null |") or error("warn", "Host lookup failed: $!"); - while (<HOST>) { $output = $1 if /\Q$target\E(?: descriptive)? text "(\d*?)".*/; } - close HOST; + # uncomment this code if you do not want to use Net::DNS::Resolver and you have 'host' on your system + #open (HOST, "host -t txt $target 2>/dev/null |") or error("warn","Host lookup failed: $!"); + #while (<HOST>) { $output = $1 if /\Q$target\E(?: descriptive)? 
text "(\d*?)".*/; } + #close HOST; + + # find ASN info via Net::DNS::Resolver + if (my $query = $res->send($target,"TXT")) { foreach ($query->answer) { $output = $1 if $_->string =~ /$DOMAIN\.\s+?\d+?\s+?IN\s+?TXT\s+?"(\d+?)"\s+?"$IP"\s+?"\d+?"/; }} + #else { error("warn","ASN lookup failed: " . $res->errorstring); } } return $output; @@ -569,13 +762,18 @@ my $output = ""; my $IP = qr~(?:\d{1,3}\.){3}\d{1,3}~; - my $DOMAIN = qr~[\w|-|\.]+\.\w{2,4}~; + my $DOMAIN = qr~(?:\w|-|\.)+?\.\w{2,4}~; if ($target =~ s/($IP|$DOMAIN)/$1/) { - open (HOST, "host $target 2>/dev/null |") or error("warn", "Host lookup failed: $!"); - while (<HOST>) { $output = $1 if /$DOMAIN (?:domain name pointer|has address) ($IP|$DOMAIN)\.?/; } - close HOST; + # uncomment this code if you do not want to use Net::DNS::Resolver and you have 'host' on your system + #open (HOST, "host $target 2>/dev/null |") or error("warn","Host lookup failed: $!"); + #while (<HOST>) { $output = $1 if /$DOMAIN (?:domain name pointer|has address) ($IP|$DOMAIN)\.?/; } + #close HOST; + + # find DNS info via Net::DNS::Resolver + if (my $query = $res->send($target)) { foreach ($query->answer) { $output = $1 if $_->string =~ /$DOMAIN\.\s+?\d+?\s+?IN\s+?(?:PTR|A)\s+?($IP|$DOMAIN)\.?/; }} + #else { error("warn","DNS lookup failed: " . $res->errorstring); } } return $output; @@ -597,6 +795,9 @@ $output .= ($fore)? " $CRLF\t for" : ""; $output .= ($fore)? " <$fore>" : ""; # envelope to address $output .= ($date)? "; $date" : ""; # received date/time + + #print "outputting received: $output" . 
$CRLF; + return $output; } ################################################ @@ -607,51 +808,23 @@ { my $header = shift; my $output = ""; + my $name = ""; - # these are all of the fields specified in RFC 822/2822, case-insensitive, in the suggested order - # the only *required* fields according to RFC 2822 are "from", "sender", "reply-to", and "date", others are just suggested - my $spec_fields = "return-path,received,resent-date,resent-from,resent-sender,resent-reply-to,". - "resent-to,resent-cc,resent-bcc,resent-message-id,date,from,sender,reply-to,". - "to,cc,bcc,message-id,in-reply-to,references,subject,comments,keywords,encrypted"; - - # MIME header fields (RFC 1049/1341/1521/2183) - $spec_fields .= ",mime-version,content-type,content-transfer-encoding,content-id,content-description,content-disposition"; - - # security/checksum (RFC 1864) - $spec_fields .= ",content-md5"; - - # mailing list headers (RFC 2369/2919) may be added if you like, but for now I'm choosing to leave them out - #$spec_fields .= ",list-id,list-help,list-unsubscribe,list-subscribe,list-post,list-owner,list-archive"; - - # let's exclude unnecessary fields (if you know of a valid, necessary use for these, let me know) - my $masked_fields = "keywords,comments,encrypted,content-id,content-description"; - - # controversial and not strictly necessary: - $masked_fields .= ",reply-to"; - - # message-id fields are only machine-readable and not visible to nor readable by the recipient - # however, they can be useful if your client produces discussion threading - # uncomment this line if you don't care about threading: - # $masked_fields .= ",message-id,resent-message-id,in-reply-to,references"; - - # resent fields are strictly informational (and not generally user-visible), therefore allowing them through is optional: - # MIME specifies a different way of resending messages with the "Message" content-type, so these may be considered deprecated: - $masked_fields .= 
",resent-date,resent-from,resent-sender,resent-reply-to,resent-to,resent-cc,resent-bcc,resent-message-id"; - - # see RFC 2076 / "Common Internet Message Header Fields" for a synopsis of common mail headers - # exclude the "masked fields" from display - foreach my $name (split(/,/,$masked_fields)) { $spec_fields =~ s/(?<=,)$name,?//; } + foreach $name (split(/,/,$masked_fields)) { $spec_fields =~ s/(?<=,)$name,?//; } - # output the fields in the order specified by RFC 2822 - foreach my $name (split(/,/,$spec_fields)) { $output .= set_field($header,$name); delete $header->{$name}; } + # output the fields in the order specified by RFC 2822 - minus the masked fields + foreach $name (split(/,/,$spec_fields)) { $output .= set_field($header,$name); delete $header->{$name}; } # set any user-specified fields - foreach my $name (split(/,/,$user_fields)) { $output .= set_field($header,$name); delete $header->{$name}; } + foreach $name (split(/,/,$user_fields)) { $output .= set_field($header,$name); delete $header->{$name}; } + # set new custom x-header fields + if ($options =~ /x/) { foreach $name (split(/,/,$new_fields)) { $output .= set_field($header,$name); delete $header->{$name}; } } + # then set any remaining fields (if allowed to set non-standard fields) - if ($options !~ /s/) { foreach my $name (keys %{$header}) { $output .= set_field($header,$name); } } - + if ($options !~ /s/) { foreach $name (keys %{$header}) { $output .= set_field($header,$name); } } + $output .= $CRLF; return $output; @@ -662,20 +835,33 @@ my $header = shift; my $name = shift; my $output = ""; - + if ((defined $header->{$name}) && (ref($header->{$name}) eq "ARRAY")) { for (my $x = 0; $x < scalar @{$header->{$name}}; $x++) { if (($name eq "received") && ($options =~ /r/)) { - #if (defined $header->{$name}->[$x]->{'sane'}) { $output .= ucfirst($name) . ": " . $header->{$name}->[$x]->{'sane'} . $CRLF; } - if (defined $header->{$name}->[$x]->{'sane'}) { $output .= $header->{$name}->[$x]->{'name'} . 
": " . $header->{$name}->[$x]->{'sane'} . $CRLF; } + if (defined $header->{$name}->[$x]->{'sane'} && $header->{$name}->[$x]->{'sane'} =~ /\w/) + { + $output .= $header->{$name}->[$x]->{'name'} . ": " . $header->{$name}->[$x]->{'sane'} . $CRLF; + } + #else { $output .= $header->{$name}->[$x]->{'name'} . ": sanity check failed" . $CRLF; } } - #else { $output .= ucfirst($name) . ": " . $header->{$name}->[$x]->{'string'} . $CRLF; } - else { $output .= $header->{$name}->[$x]->{'name'} . ": " . $header->{$name}->[$x]->{'string'} . $CRLF; } + elsif ($header->{$name}->[$x]->{'value'} =~ /\w/) + { + $output .= $header->{$name}->[$x]->{'name'} . ": " . $header->{$name}->[$x]->{'value'} . $CRLF; + } } } + elsif (defined $header->{$name}) + { + $output .= ucfirst($name) . ": " . $header->{$name} . $CRLF; + } + elsif ($req_fields =~ /(?:^|,)$name(?:,|$)/) + { + $output .= ucfirst($name) . ": [no-$name] " . $CRLF; + } return $output; } @@ -700,12 +886,12 @@ sig: { - $action = "die", last sig if $sig =~ /ALRM/; - $action = "warn", last sig if $sig =~ /PIPE/; - $action = "warn", last sig if $sig =~ /CHLD/; - $action = "die" , last sig if $sig =~ /INT/; - $action = "die" , last sig if $sig =~ /HUP/; - $action = "warn"; + $action = "die", last sig if $sig =~ /ALRM/; + $action = "warn", last sig if $sig =~ /PIPE/; + $action = "warn", last sig if $sig =~ /CHLD/; + $action = "die" , last sig if $sig =~ /INT/; + $action = "die" , last sig if $sig =~ /HUP/; + $action = "warn"; } my $waitedpid = wait; @@ -713,7 +899,7 @@ $SIG{$sig} = \&sig_trap; - error ($action, "Trapped signal SIG$sig$more"); + error ($action,"Trapped signal SIG$sig$more"); } ################################################ This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <re...@us...> - 2009-02-01 03:49:52
Revision: 6777
          http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6777&view=rev
Author:   relson
Date:     2009-02-01 03:49:48 +0000 (Sun, 01 Feb 2009)

Log Message:
-----------
Working token count min/max/fix capability.

Modified Paths:
--------------
    trunk/bogofilter/NEWS
    trunk/bogofilter/src/bogoconfig.c
    trunk/bogofilter/src/bogofilter.c
    trunk/bogofilter/src/globals.c
    trunk/bogofilter/src/globals.h
    trunk/bogofilter/src/longoptions.h
    trunk/bogofilter/src/rstats.c
    trunk/bogofilter/src/rstats.h
    trunk/bogofilter/src/score.c
    trunk/bogofilter/src/score.h
    trunk/bogofilter/src/tests/Makefile.am

Added Paths:
-----------
    trunk/bogofilter/src/tests/outputs/token.count.ref
    trunk/bogofilter/src/tests/t.multiple.tokens.min.mul
    trunk/bogofilter/src/tests/t.token.count

Removed Paths:
-------------
    trunk/bogofilter/src/tests/t.token.min.mul

Modified: trunk/bogofilter/NEWS
===================================================================
--- trunk/bogofilter/NEWS 2009-02-01 03:46:06 UTC (rev 6776)
+++ trunk/bogofilter/NEWS 2009-02-01 03:49:48 UTC (rev 6777)
@@ -15,6 +15,10 @@
 -------------------------------------------------------------------------------
+ 2009-01-31
+ * Added token-count=n, token-count-min=n, and token-count-max=n options.
+ * Minor code cleanups.
+ 2009-01-21 * spamitarium.pl updated to version 0.3.0 Modified: trunk/bogofilter/src/bogoconfig.c =================================================================== --- trunk/bogofilter/src/bogoconfig.c 2009-02-01 03:46:06 UTC (rev 6776) +++ trunk/bogofilter/src/bogoconfig.c 2009-02-01 03:49:48 UTC (rev 6777) @@ -357,6 +357,9 @@ " --terse-format short form\n", " --thresh-update no update if near 0 or 1\n", " --timestamp enable/disable token timestamps\n", + " --token-count fixed token count for scoring\n", + " --token-count-min min token count for scoring\n", + " --token-count-max min token count for scoring\n", #ifndef DISABLE_UNICODE " --unicode enable/disable unicode based wordlist\n", #endif @@ -725,6 +728,9 @@ case O_TERSE_FORMAT: terse_format = get_string(name, val); break; case O_THRESH_UPDATE: get_double(name, val, &thresh_update); break; case O_TIMESTAMP: timestamp_tokens = get_bool(name, val); break; + case O_TOKEN_COUNT_FIX: token_count_fix = atoi(val); break; + case O_TOKEN_COUNT_MIN: token_count_min = atoi(val); break; + case O_TOKEN_COUNT_MAX: token_count_max = atoi(val); break; case O_UNSURE_SUBJECT_TAG: unsure_subject_tag = get_string(name, val); break; case O_UNICODE: encoding = get_bool(name, val) ? E_UNICODE : E_RAW; break; case O_WORDLIST: configure_wordlist(val); break; @@ -746,6 +752,7 @@ #define Q1 if (query >= 1) #define Q2 if (query >= 2) +#define Q3 if (query >= 3) #define YN(b) (b ? "Yes" : "No") #define NB(b) ((b != NULL && *b != '\0') ? 
b : "''") @@ -762,6 +769,10 @@ Q1 fprintf(stdout, "%-11s = %0.6f # (%8.2e)\n", "ns_esf", ns_esf, ns_esf); Q1 fprintf(stdout, "%-11s = %0.6f # (%8.2e)\n", "sp_esf", sp_esf, sp_esf); Q1 fprintf(stdout, "\n"); + Q3 fprintf(stdout, "%-17s = %d\n", "token-count", token_count_fix); + Q3 fprintf(stdout, "%-17s = %d\n", "token-count-min", token_count_min); + Q3 fprintf(stdout, "%-17s = %d\n", "token-count-max", token_count_max); + Q3 fprintf(stdout, "\n"); Q1 fprintf(stdout, "%-17s = %s\n", "block-on-subnets", YN(block_on_subnets)); Q1 fprintf(stdout, "%-17s = %s\n", "encoding", (encoding != E_UNICODE) ? "raw" : "utf-8"); Q1 fprintf(stdout, "%-17s = %s\n", "charset-default", charset_default); Modified: trunk/bogofilter/src/bogofilter.c =================================================================== --- trunk/bogofilter/src/bogofilter.c 2009-02-01 03:46:06 UTC (rev 6776) +++ trunk/bogofilter/src/bogofilter.c 2009-02-01 03:49:48 UTC (rev 6777) @@ -154,6 +154,8 @@ register_words(run_type, words, msgcount); } + score_cleanup(); + if (logflag && register_opt) write_log_message(status); Modified: trunk/bogofilter/src/globals.c =================================================================== --- trunk/bogofilter/src/globals.c 2009-02-01 03:46:06 UTC (rev 6776) +++ trunk/bogofilter/src/globals.c 2009-02-01 03:49:48 UTC (rev 6777) @@ -49,6 +49,10 @@ uint max_multi_token_len = 0; uint multi_token_count = MUL_TOKEN_CNT; +uint token_count_fix = 0; +uint token_count_min = 0; +uint token_count_max = 0; + const char *update_dir; /*@observer@*/ const char *stats_prefix; Modified: trunk/bogofilter/src/globals.h =================================================================== --- trunk/bogofilter/src/globals.h 2009-02-01 03:46:06 UTC (rev 6776) +++ trunk/bogofilter/src/globals.h 2009-02-01 03:49:48 UTC (rev 6777) @@ -51,6 +51,10 @@ extern double spam_cutoff; extern double thresh_update; +extern uint token_count_fix; +extern uint token_count_min; +extern uint token_count_max; + 
extern int abort_on_error; extern bool stats_in_header; Modified: trunk/bogofilter/src/longoptions.h =================================================================== --- trunk/bogofilter/src/longoptions.h 2009-02-01 03:46:06 UTC (rev 6776) +++ trunk/bogofilter/src/longoptions.h 2009-02-01 03:49:48 UTC (rev 6777) @@ -65,6 +65,9 @@ O_TERSE, O_TERSE_FORMAT, O_THRESH_UPDATE, + O_TOKEN_COUNT_FIX, + O_TOKEN_COUNT_MIN, + O_TOKEN_COUNT_MAX, O_TIMESTAMP, O_UNICODE, O_UNSURE_SUBJECT_TAG, @@ -96,7 +99,10 @@ /* options for bogofilter */ #define LONGOPTIONS_MAIN \ - { "ham-true" , N, 0, O_HAM_TRUE }, + { "ham-true" , N, 0, O_HAM_TRUE }, \ + { "token-count" , R, 0, O_TOKEN_COUNT_FIX }, \ + { "token-count-min" , R, 0, O_TOKEN_COUNT_MIN }, \ + { "token-count-max" , R, 0, O_TOKEN_COUNT_MAX }, /* options for bogofilter and bogolexer */ #define LONGOPTIONS_LEX \ Modified: trunk/bogofilter/src/rstats.c =================================================================== --- trunk/bogofilter/src/rstats.c 2009-02-01 03:46:06 UTC (rev 6776) +++ trunk/bogofilter/src/rstats.c 2009-02-01 03:49:48 UTC (rev 6777) @@ -32,6 +32,7 @@ u_int32_t bad; u_int32_t msgs_good; u_int32_t msgs_bad; + bool used; double prob; }; @@ -49,6 +50,7 @@ uint robn; /* words in score */ FLOAT p; /* Robinson's P */ FLOAT q; /* Robinson's Q */ + double min_dev; double spamicity; }; @@ -85,7 +87,7 @@ stats_tail = NULL; } -void rstats_add(const word_t *token, double prob, wordcnts_t *cnts) +void rstats_add(const word_t *token, double prob, bool used, wordcnts_t *cnts) { if (token == NULL) return; @@ -97,6 +99,7 @@ */ stats_tail->token = token; stats_tail->prob = prob; + stats_tail->used = used; stats_tail->good = cnts->good; stats_tail->bad = cnts->bad; stats_tail->msgs_good = cnts->msgs_good; @@ -172,11 +175,12 @@ h->spamicity=0.0; while (r < count) { - double prob = rstats_array[r]->prob; + rstats_t *cur = rstats_array[r]; + double prob = cur->prob; if (prob >= fin) break; - if (fabs(EVEN_ODDS - prob) - min_dev >= 
EPS) + if (cur->used) { cnt += 1; h->prob += prob; @@ -233,10 +237,10 @@ /* print header */ if (!Rtable) (void)fprintf(fpo, "%s%*s %6s %-6s %-6s %-6s %s\n", - pfx, max_token_len+2,"","n", "pgood", "pbad", "fw", "U"); + pfx, max_token_len+2, "", "n", "pgood", "pbad", "fw", "U"); else (void)fprintf(fpo, "%s%*s %6s %-6s %-6s %-6s %-6s %-6s %s\n", - pfx, max_token_len+2,"","n", "pgood", "pbad", "fw","invfwlog", "fwlog", "U"); + pfx, max_token_len+2, "", "n", "pgood", "pbad", "fw", "invfwlog", "fwlog", "U"); /* Print 1 line per token */ for (r= 0; r<count; r+=1) @@ -244,7 +248,7 @@ rstats_t *cur = rstats_array[r]; int len = (cur->token->leng >= max_token_len) ? 0 : (max_token_len - cur->token->leng); double fw = calc_prob(cur->good, cur->bad, cur->msgs_good, cur->msgs_bad); - char flag = (fabs(fw-EVEN_ODDS) - min_dev >= EPS) ? '+' : '-'; + char flag = cur->used ? '+' : '-'; (void)fprintf(fpo, "%s\"", pfx); (void)word_puts(cur->token, 0, fpo); Modified: trunk/bogofilter/src/rstats.h =================================================================== --- trunk/bogofilter/src/rstats.h 2009-02-01 03:46:06 UTC (rev 6776) +++ trunk/bogofilter/src/rstats.h 2009-02-01 03:49:48 UTC (rev 6777) @@ -18,6 +18,7 @@ void rstats_add(const word_t *token, double prob, + bool used, wordcnts_t *cnts); void rstats_fini(size_t robn, Modified: trunk/bogofilter/src/score.c =================================================================== --- trunk/bogofilter/src/score.c 2009-02-01 03:46:06 UTC (rev 6776) +++ trunk/bogofilter/src/score.c 2009-02-01 03:49:48 UTC (rev 6777) @@ -42,6 +42,11 @@ /* Structure Definitions */ +typedef struct probnode_t { + hashnode_t * node; + double prob; +} probnode_t; + /* struct for saving stats for printing. 
*/ typedef struct score_s { double min_dev; @@ -56,6 +61,8 @@ /* Function Prototypes */ static double get_spamicity(size_t robn, FLOAT P, FLOAT Q); +static double recompute_min_dev(wordhash_t *wh); +static void compute_spamicity(wordhash_t *wh, FLOAT *P, FLOAT *Q, size_t *robn, bool need_stats); /* Static Variables */ @@ -199,19 +206,16 @@ return; } - /** selects the best spam/non-spam indicators and calculates Robinson's S, * \return -1.0 for error, S otherwise */ double msg_compute_spamicity(wordhash_t *wh) /*@globals errno@*/ { - hashnode_t *node; - FLOAT P = {1.0, 0}; /* Robinson's P */ FLOAT Q = {1.0, 0}; /* Robinson's Q */ double spamicity; size_t robn = 0; - size_t count = 0; + bool need_stats = (Rtable || passthrough || (verbose > 0)) && !fBogotune; if (DEBUG_ALGORITHM(2)) fprintf(dbgout, "### msg_compute_spamicity() begins\n"); @@ -219,8 +223,46 @@ if (DEBUG_ALGORITHM(2)) fprintf(dbgout, "min_dev: %f, robs: %f, robx: %f\n", min_dev, robs, robx); + if (token_count_min + token_count_max + token_count_fix == 0) + { + score.min_dev = min_dev; + } + else + { + score.min_dev = recompute_min_dev(wh); + } + + compute_spamicity(wh, &P, &Q, &robn, need_stats); + + /* Robinson's P, Q and S + ** S = (P - Q) / (P + Q) [combined indicator] + */ + + spamicity = get_spamicity(robn, P, Q); + + if (need_stats && robn != 0) + rstats_fini(robn, P, Q, spamicity); + + if (DEBUG_ALGORITHM(2)) fprintf(dbgout, "### msg_compute_spamicity() ends\n"); + + return spamicity; +} + +/* +** compute_spamicity() +** compute the spamicity from the linked list of tokens using +** min_dev to select tokens +*/ +void compute_spamicity(wordhash_t *wh, + FLOAT *P, FLOAT *Q, size_t *robn, + bool need_stats) +{ + hashnode_t *node; + + size_t count = 0; for (node = wordhash_first(wh); node != NULL; node = wordhash_next(wh)) { + bool useflag; double prob; word_t *token; wordcnts_t *cnts; @@ -239,52 +281,110 @@ prob = calc_prob(cnts->good, cnts->bad, cnts->msgs_good, cnts->msgs_bad); + useflag = 
fabs(EVEN_ODDS - prob) >= score.min_dev; if (need_stats) - rstats_add(token, prob, cnts); + rstats_add(token, prob, useflag, cnts); /* Robinson's P and Q; accumulation step */ /* * P = 1 - ((1-p1)*(1-p2)*...*(1-pn))^(1/n) [spamminess] * Q = 1 - (p1*p2*...*pn)^(1/n) [non-spamminess] */ - if (fabs(EVEN_ODDS - prob) - min_dev >= EPS) { + if (useflag ) { int e; - P.mant *= 1-prob; - if (P.mant < 1.0e-200) { - P.mant = frexp(P.mant, &e); - P.exp += e; + P->mant *= 1-prob; + if (P->mant < 1.0e-200) { + P->mant = frexp(P->mant, &e); + P->exp += e; } - Q.mant *= prob; - if (Q.mant < 1.0e-200) { - Q.mant = frexp(Q.mant, &e); - Q.exp += e; + Q->mant *= prob; + if (Q->mant < 1.0e-200) { + Q->mant = frexp(Q->mant, &e); + Q->exp += e; } - robn ++; + *robn += 1; } if (DEBUG_ALGORITHM(3)) { (void)fprintf(dbgout, "%3lu %3lu %f ", - (unsigned long)robn, (unsigned long)count, prob); + (unsigned long)*robn, (unsigned long)count, prob); (void)word_puts(token, 0, dbgout); (void)fputc('\n', dbgout); } } +} - /* Robinson's P, Q and S - ** S = (P - Q) / (P + Q) [combined indicator] - */ +double recompute_min_dev(wordhash_t *wh) +{ + size_t node_index = 0; + size_t prob_index; + size_t node_count = max(token_count_fix, max(token_count_min, token_count_max)); - spamicity = get_spamicity(robn, P, Q); + double min_prob = (token_count_max == 0.0) ? 
min_dev : 1.0; - if (need_stats && robn != 0) - rstats_fini(robn, P, Q, spamicity); + hashnode_t *node; + probnode_t *node_array = calloc(node_count, sizeof(probnode_t)); - if (DEBUG_ALGORITHM(2)) fprintf(dbgout, "### msg_compute_spamicity() ends\n"); + for (node = wordhash_first(wh); node != NULL; node = wordhash_next(wh)) + { + double prob; + word_t *token; + wordcnts_t *cnts; + wordprop_t *props; - return spamicity; + if (!fBogotune) { + props = (wordprop_t *) node->buf; + cnts = &props->cnts; + token = node->key; + } else { + cnts = (wordcnts_t *) node; + token = NULL; + } + + prob = calc_prob(cnts->good, cnts->bad, + cnts->msgs_good, cnts->msgs_bad); + prob = fabs(prob - EVEN_ODDS); + + if (node_index < node_count) + { + node_array[node_index].node = node; + node_array[node_index].prob = prob; + if (prob < min_prob) + min_prob = prob; + node_index += 1; + continue; + } + + if (prob > min_prob) + { + for (prob_index = 0; prob_index < node_count; prob_index += 1) + { + /* replace element with minimum deviation */ + if (node_array[prob_index].prob == min_prob) + { + node_array[prob_index].node = node; + node_array[prob_index].prob = prob; + break; + } + } + min_prob = 1.0; + /* find element with minimum deviation */ + for (prob_index = 0; prob_index < node_count; prob_index += 1) + { + if (node_array[prob_index].prob < min_prob) + { + min_prob = node_array[ prob_index ].prob; + } + } + } + } + + free(node_array); + + return min_prob; } void score_initialize(void) @@ -293,8 +393,6 @@ wordlist_t *list = get_default_wordlist(word_lists); - rstats_init(); - if (fabs(min_dev) < EPS) min_dev = MIN_DEV; if (spam_cutoff < EPS) @@ -343,7 +441,7 @@ void score_cleanup(void) { - rstats_cleanup(); +// rstats_cleanup(); } #ifdef GSL_INTEGRATE_PDF @@ -411,7 +509,7 @@ } else if (score.q_pr < DBL_EPSILON && score.p_pr < DBL_EPSILON) { score.spamicity = 0.5; } else { - score.spamicity = score.q_pr / ( score.q_pr + score.p_pr); + score.spamicity = score.q_pr / (score.q_pr + 
score.p_pr); } } @@ -425,13 +523,13 @@ pfx, max_token_len+2, "N_P_Q_S_s_x_md", (unsigned long)score.robn, score.p_pr, score.q_pr, score.spamicity); (void)fprintf(fpo, "%s%-*s %9.6f %9.6f %9.6f\n", - pfx, max_token_len+2+6, " ", robs, robx, min_dev); + pfx, max_token_len+2+6, " ", robs, robx, score.min_dev); } else { /* Trim token to 22 characters to accomodate R's default line length of 80 */ (void)fprintf(fpo, "%s%-24s %6lu %9.2e %9.2e %9.2e %9.2e %9.2e %5.3f\n", pfx, "N_P_Q_S_s_x_md", (unsigned long)score.robn, - score.p_pr, score.q_pr, score.spamicity, robs, robx, min_dev); + score.p_pr, score.q_pr, score.spamicity, robs, robx, score.min_dev); } } Modified: trunk/bogofilter/src/score.h =================================================================== --- trunk/bogofilter/src/score.h 2009-02-01 03:46:06 UTC (rev 6776) +++ trunk/bogofilter/src/score.h 2009-02-01 03:49:48 UTC (rev 6777) @@ -19,6 +19,6 @@ extern void msg_print_stats(FILE *fp); extern void msg_print_summary(const char *pfx); -extern void print_summary(void); +extern void print_summary(void); #endif Modified: trunk/bogofilter/src/tests/Makefile.am =================================================================== --- trunk/bogofilter/src/tests/Makefile.am 2009-02-01 03:46:06 UTC (rev 6776) +++ trunk/bogofilter/src/tests/Makefile.am 2009-02-01 03:49:48 UTC (rev 6777) @@ -41,7 +41,8 @@ t.lexer t.lexer.mbx \ t.spam.header.place \ t.block.on.subnets \ - t.multiple.tokens.head t.multiple.tokens.body t.token.min.mul \ + t.token.count \ + t.multiple.tokens.head t.multiple.tokens.body t.multiple.tokens.min.mul \ $(ENCODING_TESTS) \ t.rfc2047_broken t.rfc2047_folded \ t.message_addr t.message_id t.queue_id Added: trunk/bogofilter/src/tests/outputs/token.count.ref =================================================================== --- trunk/bogofilter/src/tests/outputs/token.count.ref (rev 0) +++ trunk/bogofilter/src/tests/outputs/token.count.ref 2009-02-01 03:49:48 UTC (rev 6777) @@ -0,0 +1,155 @@ +#### 
--min-dev=0.496 #### +X-Bogosity: Unsure, tests=bogofilter, spamicity=0.493025 + n pgood pbad fw U + "there" 14 0.291667 0.000000 0.000660 + + "its" 12 0.229167 0.047619 0.172558 - + "all" 21 0.395833 0.095238 0.194216 - + "web" 10 0.187500 0.047619 0.203096 - + "too" 9 0.166667 0.047619 0.222810 - + "more" 15 0.270833 0.095238 0.260471 - + "file" 6 0.104167 0.047619 0.314336 - + "also" 11 0.187500 0.095238 0.337138 - + "about" 16 0.270833 0.142857 0.345518 - + "little" 5 0.083333 0.047619 0.364191 - + "any" 20 0.312500 0.238095 0.432510 - + "would" 15 0.229167 0.190476 0.453979 - + "visit" 3 0.041667 0.047619 0.533255 - + "does" 9 0.125000 0.142857 0.533307 - + "for" 49 0.666667 0.809524 0.548377 - + "how" 14 0.187500 0.238095 0.559390 - + "name" 8 0.104167 0.142857 0.578184 - + "new" 12 0.145833 0.238095 0.620007 - + "this" 45 0.541667 0.904762 0.625473 - + "home" 9 0.104167 0.190476 0.646215 - + "over" 10 0.104167 0.238095 0.695340 - + "way" 11 0.104167 0.285714 0.732481 - + "visiting" 1 0.000000 0.047619 0.991605 - + "dealer" 2 0.000000 0.095238 0.995766 - + "agree" 3 0.000000 0.142857 0.997169 + + N_P_Q_S_s_x_md 2 0.019431 0.005482 0.493025 + 0.017800 0.520000 0.496000 +#### --min-dev=0.496 --token-count-min=4 #### +X-Bogosity: Unsure, tests=bogofilter, spamicity=0.532831 + n pgood pbad fw U + "there" 14 0.291667 0.000000 0.000660 + + "its" 12 0.229167 0.047619 0.172558 - + "all" 21 0.395833 0.095238 0.194216 - + "web" 10 0.187500 0.047619 0.203096 - + "too" 9 0.166667 0.047619 0.222810 - + "more" 15 0.270833 0.095238 0.260471 - + "file" 6 0.104167 0.047619 0.314336 - + "also" 11 0.187500 0.095238 0.337138 - + "about" 16 0.270833 0.142857 0.345518 - + "little" 5 0.083333 0.047619 0.364191 - + "any" 20 0.312500 0.238095 0.432510 - + "would" 15 0.229167 0.190476 0.453979 - + "visit" 3 0.041667 0.047619 0.533255 - + "does" 9 0.125000 0.142857 0.533307 - + "for" 49 0.666667 0.809524 0.548377 - + "how" 14 0.187500 0.238095 0.559390 - + "name" 8 0.104167 0.142857 
0.578184 - + "new" 12 0.145833 0.238095 0.620007 - + "this" 45 0.541667 0.904762 0.625473 - + "home" 9 0.104167 0.190476 0.646215 - + "over" 10 0.104167 0.238095 0.695340 - + "way" 11 0.104167 0.285714 0.732481 - + "visiting" 1 0.000000 0.047619 0.991605 + + "dealer" 2 0.000000 0.095238 0.995766 + + "agree" 3 0.000000 0.142857 0.997169 + + N_P_Q_S_s_x_md 4 0.000085 0.065746 0.532831 + 0.017800 0.520000 0.491605 +#### --min-dev=0.100 #### +X-Bogosity: Unsure, tests=bogofilter, spamicity=0.559839 + n pgood pbad fw U + "there" 14 0.291667 0.000000 0.000660 + + "its" 12 0.229167 0.047619 0.172558 + + "all" 21 0.395833 0.095238 0.194216 + + "web" 10 0.187500 0.047619 0.203096 + + "too" 9 0.166667 0.047619 0.222810 + + "more" 15 0.270833 0.095238 0.260471 + + "file" 6 0.104167 0.047619 0.314336 + + "also" 11 0.187500 0.095238 0.337138 + + "about" 16 0.270833 0.142857 0.345518 + + "little" 5 0.083333 0.047619 0.364191 + + "any" 20 0.312500 0.238095 0.432510 - + "would" 15 0.229167 0.190476 0.453979 - + "visit" 3 0.041667 0.047619 0.533255 - + "does" 9 0.125000 0.142857 0.533307 - + "for" 49 0.666667 0.809524 0.548377 - + "how" 14 0.187500 0.238095 0.559390 - + "name" 8 0.104167 0.142857 0.578184 - + "new" 12 0.145833 0.238095 0.620007 + + "this" 45 0.541667 0.904762 0.625473 + + "home" 9 0.104167 0.190476 0.646215 + + "over" 10 0.104167 0.238095 0.695340 + + "way" 11 0.104167 0.285714 0.732481 + + "visiting" 1 0.000000 0.047619 0.991605 + + "dealer" 2 0.000000 0.095238 0.995766 + + "agree" 3 0.000000 0.142857 0.997169 + + N_P_Q_S_s_x_md 18 0.073858 0.193537 0.559839 + 0.017800 0.520000 0.100000 +#### --min-dev=0.100 --token-count-max=8 #### +X-Bogosity: Unsure, tests=bogofilter, spamicity=0.514634 + n pgood pbad fw U + "there" 14 0.291667 0.000000 0.000660 + + "its" 12 0.229167 0.047619 0.172558 + + "all" 21 0.395833 0.095238 0.194216 + + "web" 10 0.187500 0.047619 0.203096 + + "too" 9 0.166667 0.047619 0.222810 + + "more" 15 0.270833 0.095238 0.260471 - + "file" 6 
0.104167 0.047619 0.314336 - + "also" 11 0.187500 0.095238 0.337138 - + "about" 16 0.270833 0.142857 0.345518 - + "little" 5 0.083333 0.047619 0.364191 - + "any" 20 0.312500 0.238095 0.432510 - + "would" 15 0.229167 0.190476 0.453979 - + "visit" 3 0.041667 0.047619 0.533255 - + "does" 9 0.125000 0.142857 0.533307 - + "for" 49 0.666667 0.809524 0.548377 - + "how" 14 0.187500 0.238095 0.559390 - + "name" 8 0.104167 0.142857 0.578184 - + "new" 12 0.145833 0.238095 0.620007 - + "this" 45 0.541667 0.904762 0.625473 - + "home" 9 0.104167 0.190476 0.646215 - + "over" 10 0.104167 0.238095 0.695340 - + "way" 11 0.104167 0.285714 0.732481 - + "visiting" 1 0.000000 0.047619 0.991605 + + "dealer" 2 0.000000 0.095238 0.995766 + + "agree" 3 0.000000 0.142857 0.997169 + + N_P_Q_S_s_x_md 8 0.005444 0.034712 0.514634 + 0.017800 0.520000 0.277190 +#### --min-dev=0.100 --token-count=20 #### +X-Bogosity: Unsure, tests=bogofilter, spamicity=0.570641 + n pgood pbad fw U + "there" 14 0.291667 0.000000 0.000660 + + "its" 12 0.229167 0.047619 0.172558 + + "all" 21 0.395833 0.095238 0.194216 + + "web" 10 0.187500 0.047619 0.203096 + + "too" 9 0.166667 0.047619 0.222810 + + "more" 15 0.270833 0.095238 0.260471 + + "file" 6 0.104167 0.047619 0.314336 + + "also" 11 0.187500 0.095238 0.337138 + + "about" 16 0.270833 0.142857 0.345518 + + "little" 5 0.083333 0.047619 0.364191 + + "any" 20 0.312500 0.238095 0.432510 + + "would" 15 0.229167 0.190476 0.453979 - + "visit" 3 0.041667 0.047619 0.533255 - + "does" 9 0.125000 0.142857 0.533307 - + "for" 49 0.666667 0.809524 0.548377 - + "how" 14 0.187500 0.238095 0.559390 - + "name" 8 0.104167 0.142857 0.578184 + + "new" 12 0.145833 0.238095 0.620007 + + "this" 45 0.541667 0.904762 0.625473 + + "home" 9 0.104167 0.190476 0.646215 + + "over" 10 0.104167 0.238095 0.695340 + + "way" 11 0.104167 0.285714 0.732481 + + "visiting" 1 0.000000 0.047619 0.991605 + + "dealer" 2 0.000000 0.095238 0.995766 + + "agree" 3 0.000000 0.142857 0.997169 + + N_P_Q_S_s_x_md 
20 0.100430 0.241712 0.570641 + 0.017800 0.520000 0.067490 +#### U 0.493025 --min-dev=0.496 +#### U 0.532831 --min-dev=0.496 --token-count-min=4 +#### U 0.559839 --min-dev=0.100 +#### U 0.514634 --min-dev=0.100 --token-count-max=8 +#### U 0.570641 --min-dev=0.100 --token-count=20 Copied: trunk/bogofilter/src/tests/t.multiple.tokens.min.mul (from rev 6771, trunk/bogofilter/src/tests/t.token.min.mul) =================================================================== --- trunk/bogofilter/src/tests/t.multiple.tokens.min.mul (rev 0) +++ trunk/bogofilter/src/tests/t.multiple.tokens.min.mul 2009-02-01 03:49:48 UTC (rev 6777) @@ -0,0 +1,111 @@ +#!/bin/sh + +. ${srcdir=.}/t.frame + +INP="$TMPDIR/test.inp" +REF="$TMPDIR"/test.ref +OUT="$TMPDIR/test.out" +CORRECT="$SYSTEST/outputs/multiple.wordlists.ref" + +cat <<EOF > "$INP" +a b2 cc3 +aa3 bbb4 cc3 ddd4 ee3 fff4 +EOF + +cat <<EOF > "$REF" +::: 1 2 ::: +a +b2 +a*b2 +cc3 +b2*cc3 +aa3 +cc3*aa3 +bbb4 +aa3*bbb4 +cc3 +bbb4*cc3 +ddd4 +cc3*ddd4 +ee3 +ddd4*ee3 +fff4 +ee3*fff4 +::: 1 3 ::: +a +b2 +a*b2 +cc3 +b2*cc3 +a*b2*cc3 +aa3 +cc3*aa3 +b2*cc3*aa3 +bbb4 +aa3*bbb4 +cc3*aa3*bbb4 +cc3 +bbb4*cc3 +aa3*bbb4*cc3 +ddd4 +cc3*ddd4 +bbb4*cc3*ddd4 +ee3 +ddd4*ee3 +cc3*ddd4*ee3 +fff4 +ee3*fff4 +ddd4*ee3*fff4 +::: 2 2 ::: +b2 +cc3 +b2*cc3 +aa3 +cc3*aa3 +bbb4 +aa3*bbb4 +cc3 +bbb4*cc3 +ddd4 +cc3*ddd4 +ee3 +ddd4*ee3 +fff4 +ee3*fff4 +::: 2 3 ::: +b2 +cc3 +b2*cc3 +aa3 +cc3*aa3 +b2*cc3*aa3 +bbb4 +aa3*bbb4 +cc3*aa3*bbb4 +cc3 +bbb4*cc3 +aa3*bbb4*cc3 +ddd4 +cc3*ddd4 +bbb4*cc3*ddd4 +ee3 +ddd4*ee3 +cc3*ddd4*ee3 +fff4 +ee3*fff4 +ddd4*ee3*fff4 +EOF + +for MIN in 1 2 ; do + for MUL in 2 3 ; do + echo "::: $MIN $MUL :::" >> "$OUT" + $BOGOLEXER -C -H -p --min-token-len $MIN --multi-token-count $MUL < "$INP" >> "$OUT" + done +done + +if [ $verbose -eq 0 ]; then + diff "$REF" "$OUT" + cmp "$REF" "$OUT" +else + diff $DIFF_BRIEF "$REF" "$OUT" +fi Property changes on: trunk/bogofilter/src/tests/t.multiple.tokens.min.mul 
___________________________________________________________________ Added: svn:executable + * Added: svn:keywords + Author Date Id Revision Added: svn:mergeinfo + Added: svn:eol-style + native Added: trunk/bogofilter/src/tests/t.token.count =================================================================== --- trunk/bogofilter/src/tests/t.token.count (rev 0) +++ trunk/bogofilter/src/tests/t.token.count 2009-02-01 03:49:48 UTC (rev 6777) @@ -0,0 +1,87 @@ +#!/bin/sh +x + +. ${srcdir=.}/t.frame + +map_rc() +{ + ( + set +e + eval "$@" + a=$? + [ $a -eq 0 ] && exit 0 + [ $a -eq 1 ] && exit 0 + [ $a -eq 2 ] && exit 0 + exit $a + ) +} + +INP="$TMPDIR"/token.count.msg +OUT="$TMPDIR"/token.count.txt +REF="$OUTPUTS"/token.count.ref + +$BOGOFILTER -C -y 0 -s -M -I "$SYSTEST/inputs/spam.mbx" +$BOGOFILTER -C -y 0 -n -M -I "$SYSTEST/inputs/good.mbx" + +cat > $INP <<EOF +there 0.000660 4 8 20 +its 0.172558 8 20 +all 0.194216 8 20 +web 0.203096 8 20 +too 0.222810 9 +more 0.260471 9 20 +file 0.314336 20 +also 0.337138 20 +about 0.345518 20 +little 0.364191 20 +any 0.432510 20 +would 0.453979 20 +visit 0.533255 +does 0.533307 +for 0.548377 +how 0.559390 +name 0.578184 20 +new 0.620007 20 +this 0.625473 20 +home 0.646215 20 +over 0.695340 20 +way 0.732481 20 +visiting 0.991605 4 8 20 +dealer 0.995766 4 8 20 +agree 0.997169 4 8 20 +EOF + +CFG="$TMPDIR/test.cf" + +cat <<EOF > "$CFG" +header_format = %h: %c, tests=bogofilter, spamicity=%p +EOF + +# 2 tokens scored, increased to 4 by --token-count-min +MIN1="--min-dev=0.496" +MIN2="--min-dev=0.496 --token-count-min=4" + +# 18 tokens scored, decreased to 8 by --token-count-max +MAX1="--min-dev=0.100" +MAX2="--min-dev=0.100 --token-count-max=8" + +# 18 tokens scored, increased to 20 by --token-count +CNT2="--min-dev=0.100 --token-count=20" + +cat /dev/null > $OUT + +for OPT in "$MIN1" "$MIN2" "$MAX1" "$MAX2" "$CNT2" ; do + echo "#### $OPT ####" >> $OUT + map_rc $BOGOFILTER $OPT -c $CFG -vvv -H -I $INP >> $OUT +done + +for OPT in "$MIN1" 
"$MIN2" "$MAX1" "$MAX2" "$CNT2" ; do + SCORE=$( map_rc $BOGOFILTER $OPT -c $CFG -v -tt -H -I $INP ) + echo "#### $SCORE $OPT " >> $OUT +done + +if [ $verbose -eq 0 ]; then + diff "$REF" "$OUT" + cmp "$REF" "$OUT" +else + diff $DIFF_BRIEF "$REF" "$OUT" +fi Property changes on: trunk/bogofilter/src/tests/t.token.count ___________________________________________________________________ Added: svn:executable + * Deleted: trunk/bogofilter/src/tests/t.token.min.mul =================================================================== --- trunk/bogofilter/src/tests/t.token.min.mul 2009-02-01 03:46:06 UTC (rev 6776) +++ trunk/bogofilter/src/tests/t.token.min.mul 2009-02-01 03:49:48 UTC (rev 6777) @@ -1,111 +0,0 @@ -#!/bin/sh - -. ${srcdir=.}/t.frame - -INP="$TMPDIR/test.inp" -REF="$TMPDIR"/test.ref -OUT="$TMPDIR/test.out" -CORRECT="$SYSTEST/outputs/multiple.wordlists.ref" - -cat <<EOF > "$INP" -a b2 cc3 -aa3 bbb4 cc3 ddd4 ee3 fff4 -EOF - -cat <<EOF > "$REF" -::: 1 2 ::: -a -b2 -a*b2 -cc3 -b2*cc3 -aa3 -cc3*aa3 -bbb4 -aa3*bbb4 -cc3 -bbb4*cc3 -ddd4 -cc3*ddd4 -ee3 -ddd4*ee3 -fff4 -ee3*fff4 -::: 1 3 ::: -a -b2 -a*b2 -cc3 -b2*cc3 -a*b2*cc3 -aa3 -cc3*aa3 -b2*cc3*aa3 -bbb4 -aa3*bbb4 -cc3*aa3*bbb4 -cc3 -bbb4*cc3 -aa3*bbb4*cc3 -ddd4 -cc3*ddd4 -bbb4*cc3*ddd4 -ee3 -ddd4*ee3 -cc3*ddd4*ee3 -fff4 -ee3*fff4 -ddd4*ee3*fff4 -::: 2 2 ::: -b2 -cc3 -b2*cc3 -aa3 -cc3*aa3 -bbb4 -aa3*bbb4 -cc3 -bbb4*cc3 -ddd4 -cc3*ddd4 -ee3 -ddd4*ee3 -fff4 -ee3*fff4 -::: 2 3 ::: -b2 -cc3 -b2*cc3 -aa3 -cc3*aa3 -b2*cc3*aa3 -bbb4 -aa3*bbb4 -cc3*aa3*bbb4 -cc3 -bbb4*cc3 -aa3*bbb4*cc3 -ddd4 -cc3*ddd4 -bbb4*cc3*ddd4 -ee3 -ddd4*ee3 -cc3*ddd4*ee3 -fff4 -ee3*fff4 -ddd4*ee3*fff4 -EOF - -for MIN in 1 2 ; do - for MUL in 2 3 ; do - echo "::: $MIN $MUL :::" >> "$OUT" - $BOGOLEXER -C -H -p --min-token-len $MIN --multi-token-count $MUL < "$INP" >> "$OUT" - done -done - -if [ $verbose -eq 0 ]; then - diff "$REF" "$OUT" - cmp "$REF" "$OUT" -else - diff $DIFF_BRIEF "$REF" "$OUT" -fi This was sent by the SourceForge.net collaborative 
development platform, the world's largest Open Source development site. |
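At its core, recompute_min_dev() in r6777 keeps the node_count tokens with the largest deviation |p - 0.5| and returns the smallest deviation among them. compute_spamicity() then scores exactly the tokens that satisfy fabs(EVEN_ODDS - prob) >= score.min_dev. Below is a minimal sketch of that selection step. It assumes a plain array of token probabilities and runs qsort() on a copy instead of the commit's replace-the-minimum loop. The function nth_largest_deviation() is hypothetical and not part of bogofilter. The sketch also leaves out the commit's special handling of the configured min_dev when token_count_max is zero.

```c
#include <assert.h>
#include <stdlib.h>

#define EVEN_ODDS 0.5   /* assumed to match bogofilter's EVEN_ODDS */

/* qsort() comparator: orders doubles from largest to smallest */
static int cmp_desc(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x < y) - (x > y);
}

/*
 * Hypothetical helper: return the n-th largest deviation |p - 0.5| among
 * count probabilities. Used as min_dev, it lets the n most extreme tokens
 * (plus any ties) pass a "deviation >= min_dev" test. If n exceeds count,
 * the smallest deviation is returned, so every token is used.
 * Returns -1.0 for bad arguments or allocation failure.
 */
double nth_largest_deviation(const double *probs, size_t count, size_t n)
{
    double *dev;
    double result;
    size_t i;

    if (count == 0 || n == 0)
        return -1.0;
    assert(probs != NULL);

    dev = malloc(count * sizeof *dev);
    if (dev == NULL)
        return -1.0;

    for (i = 0; i < count; i++) {
        double d = probs[i] - EVEN_ODDS;
        dev[i] = (d < 0.0) ? -d : d;
    }

    qsort(dev, count, sizeof *dev, cmp_desc);

    result = dev[(n < count ? n : count) - 1];
    free(dev);
    return result;
}
```

If you feed it the 25 probabilities from t.token.count, n = 4, 8 and 20 give 0.491605, 0.277190 and 0.067490. These are the md values that token.count.ref reports for --token-count-min=4, --token-count-max=8 and --token-count=20. Sorting costs O(n log n) and copies every deviation. The commit's fixed-size array needs only node_count slots, but in the worst case it rescans them for each token.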
From: <re...@us...> - 2009-02-20 18:32:56
|
Revision: 6809 http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6809&view=rev Author: relson Date: 2009-02-20 18:32:47 +0000 (Fri, 20 Feb 2009) Log Message: ----------- Remove patch for flex allocation problem as it's fixed in 2.5.35. Modified Paths: -------------- trunk/bogofilter/NEWS trunk/bogofilter/configure.ac Modified: trunk/bogofilter/NEWS =================================================================== --- trunk/bogofilter/NEWS 2009-02-19 04:52:49 UTC (rev 6808) +++ trunk/bogofilter/NEWS 2009-02-20 18:32:47 UTC (rev 6809) @@ -15,6 +15,9 @@ ------------------------------------------------------------------------------- + 2009-02-20 + * Flex-2.5.35 has fix for allocation problem in 2.5.4, 2.5.31, and 2.5.33. + 2009-01-31 * Added token-count=n, token-count-min=n, and token-count-max=n options. * Minor code cleanups. Modified: trunk/bogofilter/configure.ac =================================================================== --- trunk/bogofilter/configure.ac 2009-02-19 04:52:49 UTC (rev 6808) +++ trunk/bogofilter/configure.ac 2009-02-20 18:32:47 UTC (rev 6809) @@ -194,6 +194,7 @@ *2.5.4) flex=254 ;; *2.5.31) flex=253x ;; *2.5.33) flex=253x ;; + *2.5.35) flex=O.K. ;; esac fi dnl flex=yes |
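The configure.ac hunk relies on shell case patterns. *2.5.35 matches any version string that ends in 2.5.35, and the first branch that matches wins. For readers less familiar with case globbing, here is a small C sketch of the same suffix matching. flex_patch_for() and ends_with() are hypothetical names. The sketch only reproduces the value assignment, because the diff does not show how configure uses $flex afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if s ends with suffix, like the shell pattern "*suffix". */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/*
 * Hypothetical mirror of the case statement in configure.ac: map a flex
 * version string to the value assigned to the shell variable "flex".
 * Branches are tested in order, as in the shell. Returns NULL when no
 * pattern matches (the shell variable would keep its previous value).
 */
const char *flex_patch_for(const char *version)
{
    assert(version != NULL);
    if (ends_with(version, "2.5.4"))
        return "254";
    if (ends_with(version, "2.5.31"))
        return "253x";
    if (ends_with(version, "2.5.33"))
        return "253x";
    if (ends_with(version, "2.5.35"))
        return "O.K.";
    return NULL;
}
```

Because the match is on a suffix, a version string such as 12.5.4 would also select the 254 branch, while 2.5.34 matches none of the patterns.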
From: <re...@us...> - 2009-02-21 20:32:52
|
Revision: 6811 http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6811&view=rev Author: relson Date: 2009-02-21 20:32:50 +0000 (Sat, 21 Feb 2009) Log Message: ----------- Add/fix option info. Modified Paths: -------------- trunk/bogofilter/bogofilter.cf.example trunk/bogofilter/src/bogoconfig.c Modified: trunk/bogofilter/bogofilter.cf.example =================================================================== --- trunk/bogofilter/bogofilter.cf.example 2009-02-21 20:32:44 UTC (rev 6810) +++ trunk/bogofilter/bogofilter.cf.example 2009-02-21 20:32:50 UTC (rev 6811) @@ -271,3 +271,12 @@ # of 0.000000 (surely ham) or 1.000000 (surely spam). # ## thresh_update=0.01 # (optional) + +#### token count parameters +# +# coerce the number of tokens used to score a message +# Note: zero means no coercing +# +##token_count=0 # default +##token_count_min=0 # default +##token_count_max=0 # default Modified: trunk/bogofilter/src/bogoconfig.c =================================================================== --- trunk/bogofilter/src/bogoconfig.c 2009-02-21 20:32:44 UTC (rev 6810) +++ trunk/bogofilter/src/bogoconfig.c 2009-02-21 20:32:50 UTC (rev 6811) @@ -86,6 +86,7 @@ static void process_arglist(int argc, char **argv, priority_t precedence, int pass); static bool get_parsed_value(char **arg, double *parm); static void comma_parse(char opt, const char *arg, double *parm1, double *parm2, double *parm3); +static bool token_count_conflict(void); /*---------------------------------------------------------------------------*/ @@ -360,7 +361,7 @@ " --timestamp enable/disable token timestamps\n", " --token-count fixed token count for scoring\n", " --token-count-min min token count for scoring\n", - " --token-count-max min token count for scoring\n", + " --token-count-max max token count for scoring\n", #ifndef DISABLE_UNICODE " --unicode enable/disable unicode based wordlist\n", #endif @@ -477,6 +478,11 @@ verbose = max(1, verbose); /* force printing */ 
set_terse_mode_format(inv_terse_mode); } + + if (token_count_conflict()) { + fprintf(stderr, "Conflicting token count arguments given.\n"); + exit(EX_ERROR); + } } return; @@ -834,3 +840,20 @@ fprintf(stdout, "%s %s", (i == 0) ? "" : ",", array[i]); fprintf(stdout, "\n"); } + +static bool token_count_conflict(void) +{ + if (token_count_fix != 0) { + if (token_count_fix < token_count_min) + return true; + } + + if (token_count_max != 0) { + if (token_count_max < token_count_min) + return true; + if (token_count_max < token_count_fix) + return true; + } + + return false; +} |
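token_count_conflict() reads the three globals set by --token-count, --token-count-min and --token-count-max. In each of them, zero means "not set". The sketch below expresses the same rules as a pure function, which makes them easy to exercise in isolation. token_counts_conflict() and its parameter list are a hypothetical refactoring, not bogofilter code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Same checks as token_count_conflict() in r6811, but taking the three
 * option values as parameters instead of reading globals.
 * Zero means "option not given".
 */
bool token_counts_conflict(unsigned fix, unsigned min, unsigned max)
{
    /* a fixed count below the minimum can never be honoured */
    if (fix != 0 && fix < min)
        return true;

    /* a maximum must not be below the minimum or the fixed count */
    if (max != 0 && (max < min || max < fix))
        return true;

    return false;
}
```

Under these rules, --token-count-min=10 --token-count-max=8 is rejected: bogofilter prints "Conflicting token count arguments given." and exits. By contrast, --token-count-min=4 --token-count=8 --token-count-max=20 is accepted.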
From: <re...@us...> - 2009-02-21 20:41:48
|
Revision: 6813 http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6813&view=rev Author: relson Date: 2009-02-21 20:41:42 +0000 (Sat, 21 Feb 2009) Log Message: ----------- Update for 1.2.0 release. Modified Paths: -------------- trunk/bogofilter/NEWS trunk/bogofilter/RELEASES trunk/bogofilter/configure.ac Modified: trunk/bogofilter/NEWS =================================================================== --- trunk/bogofilter/NEWS 2009-02-21 20:34:32 UTC (rev 6812) +++ trunk/bogofilter/NEWS 2009-02-21 20:41:42 UTC (rev 6813) @@ -18,6 +18,9 @@ 2009-02-20 * Flex-2.5.35 has fix for allocation problem in 2.5.4, 2.5.31, and 2.5.33. + 2009-02-12 + * Bogofilter now uses listsort in place of qsort. + 2009-01-31 * Added token-count=n, token-count-min=n, and token-count-max=n options. * Minor code cleanups. Modified: trunk/bogofilter/RELEASES =================================================================== --- trunk/bogofilter/RELEASES 2009-02-21 20:34:32 UTC (rev 6812) +++ trunk/bogofilter/RELEASES 2009-02-21 20:41:42 UTC (rev 6813) @@ -2,7 +2,8 @@ =============================== with release or promotion date, sorted by age -1.1.7 (2008-05-18) this is the version everyone should use +1.2.0 (2009-02-21) +1.1.7 (2008-05-18) 1.1.5 (2007-01-14) 1.0.2 (2006-03-03) 1.0.1 (2006-01-01) Modified: trunk/bogofilter/configure.ac =================================================================== --- trunk/bogofilter/configure.ac 2009-02-21 20:34:32 UTC (rev 6812) +++ trunk/bogofilter/configure.ac 2009-02-21 20:41:42 UTC (rev 6813) @@ -17,7 +17,7 @@ dnl part of the bogofilter source). dnl ******************************************************** dnl -AC_INIT([bogofilter],[1.1.7]) +AC_INIT([bogofilter],[1.2.0]) dnl AC_PREREQ(2.59) dnl AC_PREREQ(2.60) dnl if AC_USE_SYSTEM_EXTENSIONS is desired This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
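The NEWS entry says only that bogofilter now uses "listsort" in place of qsort(); this diff does not include bogofilter's version. Assuming the name refers to the well-known bottom-up merge sort for singly linked lists popularised by Simon Tatham, a minimal sketch follows. It sorts a list in place without the contiguous array that qsort() requires, and it is stable. node_t, cmp_key() and the int key are hypothetical stand-ins for whatever records bogofilter actually sorts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; bogofilter's real records differ. */
typedef struct node {
    int key;
    struct node *next;
} node_t;

/* Three-way comparison on the integer key. */
int cmp_key(const node_t *a, const node_t *b)
{
    return (a->key > b->key) - (a->key < b->key);
}

/*
 * Bottom-up merge sort for a NULL-terminated singly linked list:
 * merge runs of length 1, 2, 4, ... until a pass performs at most one
 * merge. Returns the new head. Stable, O(n log n), no extra allocation.
 */
node_t *listsort(node_t *list, int (*cmp)(const node_t *, const node_t *))
{
    size_t insize = 1;

    assert(cmp != NULL);
    if (list == NULL)
        return NULL;

    for (;;) {
        node_t *p = list, *tail = NULL;
        size_t nmerges = 0;

        list = NULL;
        while (p != NULL) {
            node_t *q = p;
            size_t psize = 0, qsize = insize, i;

            nmerges++;
            /* advance q by up to insize nodes: p's run ends where q's begins */
            for (i = 0; i < insize; i++) {
                psize++;
                q = q->next;
                if (q == NULL)
                    break;
            }

            /* merge the two runs, appending each chosen node at tail */
            while (psize > 0 || (qsize > 0 && q != NULL)) {
                node_t *e;
                if (psize == 0) {
                    e = q; q = q->next; qsize--;
                } else if (qsize == 0 || q == NULL) {
                    e = p; p = p->next; psize--;
                } else if (cmp(p, q) <= 0) {   /* <= keeps equal keys in order */
                    e = p; p = p->next; psize--;
                } else {
                    e = q; q = q->next; qsize--;
                }
                if (tail != NULL)
                    tail->next = e;
                else
                    list = e;
                tail = e;
            }
            p = q;   /* the next pair of runs starts after q's run */
        }
        tail->next = NULL;

        if (nmerges <= 1)
            return list;
        insize *= 2;
    }
}
```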
From: <re...@us...> - 2009-05-27 03:11:56
|
Revision: 6829 http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6829&view=rev Author: relson Date: 2009-05-27 03:11:48 +0000 (Wed, 27 May 2009) Log Message: ----------- Recognize CRLF as eol when processing quoted-printable attachments. Modified Paths: -------------- trunk/bogofilter/NEWS trunk/bogofilter/src/qp.c Modified: trunk/bogofilter/NEWS =================================================================== --- trunk/bogofilter/NEWS 2009-05-06 22:47:03 UTC (rev 6828) +++ trunk/bogofilter/NEWS 2009-05-27 03:11:48 UTC (rev 6829) @@ -15,6 +15,9 @@ ------------------------------------------------------------------------------- + 2009-05-25 + * Fixed eol problem in quoted_printable text. Problem reported by Stephen Davies. + 1.2.0 2009-02-21 (released) 2009-02-20 Modified: trunk/bogofilter/src/qp.c =================================================================== --- trunk/bogofilter/src/qp.c 2009-05-06 22:47:03 UTC (rev 6828) +++ trunk/bogofilter/src/qp.c 2009-05-27 03:11:48 UTC (rev 6829) @@ -41,6 +41,10 @@ } } +/* Function Prototypes */ + +static int qp_eol_check( byte *s, byte *e ); + /* Function Definitions */ uint qp_decode(word_t *word, qp_mode mode) @@ -56,10 +60,13 @@ int x, y; switch (ch) { case '=': - if (mode == RFC2045 && s + 1 <= e && s[0] == '\n') { - /* continuation line, trailing = */ - s++; - continue; + if (mode == RFC2045) { + int c = qp_eol_check( s, e ); + if (c != 0) { + /* continuation line, trailing = */ + s += c; + continue; + } } if (s + 2 <= e && (y = hex_to_bin(s[0])) >= 0 && (x = hex_to_bin(s[1])) >= 0) { @@ -129,3 +136,26 @@ return true; } + +static int qp_eol_check( byte *s, byte *e ) +{ + /* test for LF */ + if (s + 1 <= e && s[0] == '\n') + { + /* only LF */ + return 1; + } + + /* test for CR */ + if (s + 1 <= e && s[0] == '\r') + { + if (s + 2 <= e && s[1] == '\n') + /* CR LF */ + return 2; + else + /* only CR */ + return 1; + } + + return 0; +} |
From: <m-...@us...> - 2009-05-28 12:06:10
|
Revision: 6834 http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6834&view=rev Author: m-a Date: 2009-05-28 12:06:02 +0000 (Thu, 28 May 2009) Log Message: ----------- Remove two scripts supposed to be rebuilt from distro. Modified Paths: -------------- trunk/bogofilter/NEWS trunk/bogofilter/src/Makefile.am trunk/bogofilter/src/tests/Makefile.am Modified: trunk/bogofilter/NEWS =================================================================== --- trunk/bogofilter/NEWS 2009-05-28 11:08:55 UTC (rev 6833) +++ trunk/bogofilter/NEWS 2009-05-28 12:06:02 UTC (rev 6834) @@ -16,6 +16,7 @@ ------------------------------------------------------------------------------- 2009-05-28 + * Remove two scripts supposed to be rebuilt from distro. * Added test case for Stephen Davies's Q-P EOL problem (see -05-25). 2009-05-25 Modified: trunk/bogofilter/src/Makefile.am =================================================================== --- trunk/bogofilter/src/Makefile.am 2009-05-28 11:08:55 UTC (rev 6833) +++ trunk/bogofilter/src/Makefile.am 2009-05-28 12:06:02 UTC (rev 6834) @@ -254,8 +254,7 @@ fgetsl_test_CFLAGS= -DMAIN # what to distribute -EXTRA_DIST = $(bin_SCRIPTS) \ - bogoupgrade.in \ +EXTRA_DIST = bogoupgrade.in \ version.sh \ strlcat.3 strlcpy.3 \ patch.lexer.254.txt patch.lexer.253x.txt Modified: trunk/bogofilter/src/tests/Makefile.am =================================================================== --- trunk/bogofilter/src/tests/Makefile.am 2009-05-28 11:08:55 UTC (rev 6833) +++ trunk/bogofilter/src/tests/Makefile.am 2009-05-28 12:06:02 UTC (rev 6834) @@ -49,16 +49,19 @@ WORDLIST_TESTS = t.dump.load t.nonascii.replace t.maint t.robx t.regtest t.upgrade.subnet.prefix t.multiple.wordlists t.probe t.bf_compact -SCORING_TESTS = t.query.config t.score1 t.score2 t.systest t.grftest t.wordhist +SCORING_TESTS = t.score1 t.score2 t.systest t.grftest t.wordhist BULKMODE_TESTS = t.bulkmode t.MH t.maildir t.bogoutil INTEGRITY_TESTS = t.lock1 t.lock3 t.valgrind # INTEGRITY_TESTS += 
t.lock2 +# these tests are built, but must not be shipped: +BUILT_TESTS = t.query.config + TESTSCRIPTS = ${ENVIRON_TESTS} ${PARSING_TESTS} ${WORDLIST_TESTS} ${SCORING_TESTS} ${BULKMODE_TESTS} ${INTEGRITY_TESTS} -TESTS=$(TESTSCRIPTS) +TESTS=$(BUILT_TESTS) $(TESTSCRIPTS) TESTS_ENVIRONMENT=RUN_FROM_MAKE=1 AWK=$(AWK) srcdir=$(srcdir) SHELL="$(SHELL)" $(SHELL) $(VERBOSE) |
From: <m-...@us...> - 2009-07-31 14:02:07
|
Revision: 6843 http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6843&view=rev Author: m-a Date: 2009-07-31 14:02:00 +0000 (Fri, 31 Jul 2009) Log Message: ----------- Add XFAIL_TESTS for decoding failure of first line after EOH (Ubuntu #320829). Modified Paths: -------------- trunk/bogofilter/NEWS trunk/bogofilter/src/tests/Makefile.am Added Paths: ----------- trunk/bogofilter/src/tests/t.lexer.eoh Modified: trunk/bogofilter/NEWS =================================================================== --- trunk/bogofilter/NEWS 2009-07-31 09:02:41 UTC (rev 6842) +++ trunk/bogofilter/NEWS 2009-07-31 14:02:00 UTC (rev 6843) @@ -15,6 +15,11 @@ ------------------------------------------------------------------------------- + 2009-07-31 + * Added a mimal test case for Christian Frommeyer's MIME decoding bug, + Ubuntu/Launchpad Bug #320829, as expected failure (XFAIL_TESTS). + https://bugs.launchpad.net/ubuntu/+source/bogofilter/+bug/320829 + 2009-05-28 * Removed two scripts that are auto-built. * Added test case for Stephen Davies' Q-P EOL problem (see below). 
Modified: trunk/bogofilter/src/tests/Makefile.am =================================================================== --- trunk/bogofilter/src/tests/Makefile.am 2009-07-31 09:02:41 UTC (rev 6842) +++ trunk/bogofilter/src/tests/Makefile.am 2009-07-31 14:02:00 UTC (rev 6843) @@ -38,7 +38,7 @@ t.passthrough-hb \ t.escaped.html t.escaped.url \ t.split t.parsing \ - t.lexer t.lexer.mbx t.lexer.qpcr \ + t.lexer t.lexer.mbx t.lexer.qpcr t.lexer.eoh \ t.spam.header.place \ t.block.on.subnets \ t.token.count \ @@ -59,6 +59,9 @@ # these tests are built, but must not be shipped: BUILT_TESTS = t.query.config +# test scripts expected to fail +XFAIL_TESTS = t.lexer.eoh + TESTSCRIPTS = ${ENVIRON_TESTS} ${PARSING_TESTS} ${WORDLIST_TESTS} ${SCORING_TESTS} ${BULKMODE_TESTS} ${INTEGRITY_TESTS} TESTS=$(BUILT_TESTS) $(TESTSCRIPTS) Added: trunk/bogofilter/src/tests/t.lexer.eoh =================================================================== --- trunk/bogofilter/src/tests/t.lexer.eoh (rev 0) +++ trunk/bogofilter/src/tests/t.lexer.eoh 2009-07-31 14:02:00 UTC (rev 6843) @@ -0,0 +1,16 @@ +#! /bin/sh + +# This checks if bogofilter/bogolexer will properly MIME decode the +# first body line. It used to fail up to and including 1.2.0, +# because the lexer lookahead beyond <INITIAL>\n wasn't decoded. + +. ${srcdir:=.}/t.frame + +# This message contains one body word "SPAM". Check if it's present +# after scanning. +cat <<_EOF | $BOGOLEXER -p -C | grep SPAM +MIME-Version: 1.0 +Content-Transfer-Encoding: quoted-printable + +=53=50=41=4D +_EOF |
From: <m-...@us...> - 2009-07-31 19:18:41
|
Revision: 6848 http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6848&view=rev Author: m-a Date: 2009-07-31 19:18:32 +0000 (Fri, 31 Jul 2009) Log Message: ----------- Fix lexer bug where the first body line wasn't properly MIME-decoded. Reason for this was inappropriate readahead past the EOH \n character, that caused input to be read in header mode when it was in fact body and should have been subjected to MIME decoding. We fix this by making sure that the EOH-producing rules and related \n rules do not require readahead and by forcing the scanner to always be interactive. Tests have been adjusted after carefully verifying that the lexer fix works as intended; tests had to be adjusted because good.mbx misattributed the Reporting-MTA line to a MIME header and prepended mime: when it was a body part. Modified Paths: -------------- trunk/bogofilter/NEWS trunk/bogofilter/src/lexer_v3.l trunk/bogofilter/src/tests/Makefile.am trunk/bogofilter/src/tests/t.lexer.mbx trunk/bogofilter/src/tests/t.maint trunk/bogofilter/src/tests/t.wordhist trunk/bogofilter/src/token.c Modified: trunk/bogofilter/NEWS =================================================================== --- trunk/bogofilter/NEWS 2009-07-31 19:14:43 UTC (rev 6847) +++ trunk/bogofilter/NEWS 2009-07-31 19:18:32 UTC (rev 6848) @@ -16,10 +16,15 @@ ------------------------------------------------------------------------------- 2009-07-31 - * Added a minimal test case for Christian Frommeyer's MIME decoding bug, - Ubuntu/Launchpad Bug #320829, as expected failure (XFAIL_TESTS). + * Fix Christian Frommeyer's MIME decoding bug, Ubuntu/Launchpad Bug + #320829. As a side effect, also fixes misattribution of MIME bodies + as MIME headers with mime: tag. Original bug report: https://bugs.launchpad.net/ubuntu/+source/bogofilter/+bug/320829 + Before this fix, bogofilter did not properly MIME-decode the first + line in a body. 
This was especially bad with Christian's samples + where the whole body was only one long base64 line. + 2009-05-28 * Removed two scripts that are auto-built. * Added test case for Stephen Davies' Q-P EOL problem (see below). Modified: trunk/bogofilter/src/lexer_v3.l =================================================================== --- trunk/bogofilter/src/lexer_v3.l 2009-07-31 19:14:43 UTC (rev 6847) +++ trunk/bogofilter/src/lexer_v3.l 2009-07-31 19:18:32 UTC (rev 6848) @@ -98,7 +98,7 @@ int yylineno; #endif -/* Function Prototypes */ +/* Function Prototypes/Forward Declarations */ static word_t *yy_text(void); static void html_char(void); @@ -112,6 +112,8 @@ char yy_get_state(void); void yy_set_state_initial(void); +static void header(void); + /* Function Definitions */ static word_t *yy_text(void) @@ -127,7 +129,12 @@ %option warn %option nodebug debug %option align caseless 8bit -%option never-interactive +/*********************************************************************** + WARNING: The scanner must be interactive so as not to look ahead past + \n = EOH and other header/body delimiters, else it will fail when MIME + decoding, because part of the body has already been read ahead. 
+ ***********************************************************************/ +%option interactive always-interactive %option noreject noyywrap UINT8 ([01]?[0-9]?[0-9]|2([0-4][0-9]|5[0-5])) @@ -229,27 +236,34 @@ } <INITIAL>^(To|CC|From|Return-Path|Subject|Received): { set_tag(yytext); } -<INITIAL>^Content-(Transfer-Encoding|Type|Disposition):{MTYPE} { mime_content(yy_text()); skip_to(':'); return TOKEN; } +<INITIAL>^Content-(Transfer-Encoding|Type|Disposition):{MTYPE} { mime_content(yy_text()); skip_to(':'); header(); return TOKEN; } -<INITIAL>^(Delivery-)?Date:.* { return HEADKEY; } -<INITIAL>^Resent-Message-ID:.* { return HEADKEY; } <INITIAL>^Message-ID:.* { /* save token for logging */ int off = 11; while(isspace((unsigned char)yytext[off]) && off < yyleng) off++; set_msg_id((unsigned char *)(yytext+off), yyleng-off); + header(); return HEADKEY; } -<INITIAL>^(In-Reply-To|References):.* { return HEADKEY; } +<INITIAL>^(Delivery-)?Date:.* | +<INITIAL>^Resent-Message-ID:.* | +<INITIAL>^(In-Reply-To|References):.* { header(); return HEADKEY; } <INITIAL>boundary=[ ]*\"?{MIME_BOUNDARY}\"? { mime_boundary_set(yy_text()); } -<INITIAL>charset=\"?{CHARSET}\"? { got_charset(yytext); skip_to('='); return TOKEN; } +<INITIAL>charset=\"?{CHARSET}\"? { got_charset(yytext); skip_to('='); header(); return TOKEN; } <INITIAL>(file)?name=\"? /* ignore */ -<INITIAL>\n?[[:blank:]]id{WHITESPACE}+{ID} { return QUEUE_ID; } +<INITIAL>[[:blank:]]id{WHITESPACE}+{ID} { return QUEUE_ID; } -<INITIAL>\n[[:blank:]] { lineno += 1; } -<INITIAL>\n\n { enum mimetype type = get_content_type(); +/********************************************************************** + WARNING: Do NOT add header (<INITIAL>) rules that require characters + beyond a LF character (\n) - doing so will make the parser read ahead + parts of the body in the wrong MIME decoding mode and goof up + seriously. 
+ **********************************************************************/ + +<INITIAL>^\n { enum mimetype type = get_content_type(); have_body = true; msg_header = false; clr_tag(); @@ -262,8 +276,9 @@ fprintf(dbgout, "*** end of header\n"); return EOH; } +<INITIAL>^{TOKEN} { header(); return TOKEN; } +<INITIAL>\n { lineno += 1; } -<INITIAL>\n { set_tag("Header"); lineno += 1; } <INITIAL>{VERP} { skip_to('='); return VERP; } ^-----BEGIN\ PGP\ SIGNATURE-----$ { BEGIN PGP_HEAD; @@ -320,6 +335,11 @@ <<EOF>> { return NONE; } %% +static void header(void) +{ + set_tag("Header"); +} + void lexer_v3_init(FILE *fp) { lineno = 0; @@ -438,7 +458,7 @@ { BEGIN INITIAL; msg_header = true; - set_tag("Header"); + header(); if (DEBUG_LEXER(1)) fprintf(dbgout, "BEGIN INITIAL\n"); Modified: trunk/bogofilter/src/tests/Makefile.am =================================================================== --- trunk/bogofilter/src/tests/Makefile.am 2009-07-31 19:14:43 UTC (rev 6847) +++ trunk/bogofilter/src/tests/Makefile.am 2009-07-31 19:18:32 UTC (rev 6848) @@ -60,7 +60,7 @@ BUILT_TESTS = t.query.config # test scripts expected to fail -XFAIL_TESTS = t.lexer.eoh +XFAIL_TESTS = TESTSCRIPTS = ${ENVIRON_TESTS} ${PARSING_TESTS} ${WORDLIST_TESTS} ${SCORING_TESTS} ${BULKMODE_TESTS} ${INTEGRITY_TESTS} Modified: trunk/bogofilter/src/tests/t.lexer.mbx =================================================================== --- trunk/bogofilter/src/tests/t.lexer.mbx 2009-07-31 19:14:43 UTC (rev 6847) +++ trunk/bogofilter/src/tests/t.lexer.mbx 2009-07-31 19:18:32 UTC (rev 6848) @@ -25,7 +25,7 @@ RESULT=`cat "$TMPDIR/spam.2" | wc -l`.`cat "$TMPDIR/good.2" | wc -l` RESULT=`echo "$RESULT" | sed s@\ @@g` -WANT="1787.4046" +WANT="1787.4045" if [ "$RESULT" != "$WANT" ] || [ $verbose -ne 0 ] ; then echo "want: $WANT, have: $RESULT" | tee "$TMPDIR"/$OUT Modified: trunk/bogofilter/src/tests/t.maint =================================================================== --- trunk/bogofilter/src/tests/t.maint 2009-07-31 
19:14:43 UTC (rev 6847) +++ trunk/bogofilter/src/tests/t.maint 2009-07-31 19:18:32 UTC (rev 6848) @@ -18,16 +18,16 @@ fi cat >> "$TMPDIR"/ref.unicode.enabled <<EOF -initial: 5304 -count 0 -> 5305 +initial: 5303 +count 0 -> 5304 count 1 -> 1847 count 2 -> 950 count 3 -> 610 EOF cat >> "$TMPDIR"/ref.unicode.disabled <<EOF -initial: 5304 -count 0 -> 5304 +initial: 5303 +count 0 -> 5303 count 1 -> 1847 count 2 -> 949 count 3 -> 610 Modified: trunk/bogofilter/src/tests/t.wordhist =================================================================== --- trunk/bogofilter/src/tests/t.wordhist 2009-07-31 19:14:43 UTC (rev 6847) +++ trunk/bogofilter/src/tests/t.wordhist 2009-07-31 19:18:32 UTC (rev 6848) @@ -21,7 +21,7 @@ cat <<EOF > "$REF" Histogram score count pct histogram -0.00 3516 66.30 ################################################ +0.00 3515 66.30 ################################################ 0.05 1 0.02 # 0.10 1 0.02 # 0.15 7 0.13 # @@ -40,10 +40,10 @@ 0.80 63 1.19 # 0.85 14 0.26 # 0.90 28 0.53 # -0.95 1266 23.87 ################## -tot 5303 -hapaxes: ham 2594 (48.92%), spam 784 (14.78%) - pure: ham 3516 (66.30%), spam 1257 (23.70%) +0.95 1266 23.88 ################## +tot 5302 +hapaxes: ham 2593 (48.91%), spam 784 (14.79%) + pure: ham 3515 (66.30%), spam 1257 (23.71%) EOF OPTS="-C -y 0" Modified: trunk/bogofilter/src/token.c =================================================================== --- trunk/bogofilter/src/token.c 2009-07-31 19:14:43 UTC (rev 6847) +++ trunk/bogofilter/src/token.c 2009-07-31 19:18:32 UTC (rev 6848) @@ -228,7 +228,7 @@ if (msg_state->mime_type == MIME_MESSAGE) mime_add_child(msg_state); - if (leng == 2) + if (leng == 1) continue; else { /* "spc:invalid_end_of_header" */ token_copy( &yylval, nonblank_line); |
From: <m-...@us...> - 2009-07-31 19:22:25
|
Revision: 6849 http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6849&view=rev Author: m-a Date: 2009-07-31 19:22:16 +0000 (Fri, 31 Jul 2009) Log Message: ----------- Ignore tags and TAGS files. Property Changed: ---------------- trunk/bogofilter/ trunk/bogofilter/src/ Property changes on: trunk/bogofilter ___________________________________________________________________ Modified: svn:ignore - .deps .rsyncs Makefile Makefile.in aclocal.m4 autogen.log autom4te.cache bogofilter-*.tar.gz bogofilter.pdf bogofilter.spec bogolexer.pdf bogoutil.pdf build* compile config.cache config.guess config.in config.log config.status config.sub configtest configure cscope.out depcomp doxygen install-sh missing mkinstalldirs ylwrap ChangeLog + .deps .rsyncs Makefile Makefile.in aclocal.m4 autogen.log autom4te.cache bogofilter-*.tar.gz bogofilter.pdf bogofilter.spec bogolexer.pdf bogoutil.pdf build* compile config.cache config.guess config.in config.log config.status config.sub configtest configure cscope.out depcomp doxygen install-sh missing mkinstalldirs ylwrap ChangeLog tags TAGS Property changes on: trunk/bogofilter/src ___________________________________________________________________ Modified: svn:ignore - .deps bogoupgrade Makefile Makefile.in bogofilter bogolexer bogoutil bogowordfreq compile config.h configtest cscope.out debugtest depcomp directories.c fgetsl.test find_home.test lexer_v3.c stamp-h1 version.c wordhash bogofilter_static bogolexer_static bogoutil_static bogotune config.log + .deps bogoupgrade Makefile Makefile.in bogofilter bogolexer bogoutil bogowordfreq compile config.h configtest cscope.out debugtest depcomp directories.c fgetsl.test find_home.test lexer_v3.c stamp-h1 version.c wordhash bogofilter_static bogolexer_static bogoutil_static bogotune config.log tags TAGS |
From: <re...@us...> - 2009-07-31 22:59:59
|
Revision: 6851 http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6851&view=rev Author: relson Date: 2009-07-31 22:59:49 +0000 (Fri, 31 Jul 2009) Log Message: ----------- Update for 1.2.1 Modified Paths: -------------- trunk/bogofilter/NEWS trunk/bogofilter/configure.ac trunk/bogofilter/src/tests/inputs/mime-qp-cont-with-cr.txt Modified: trunk/bogofilter/NEWS =================================================================== --- trunk/bogofilter/NEWS 2009-07-31 19:24:18 UTC (rev 6850) +++ trunk/bogofilter/NEWS 2009-07-31 22:59:49 UTC (rev 6851) @@ -15,6 +15,8 @@ ------------------------------------------------------------------------------- +1.2.1 2009-08-01 (released) + 2009-07-31 * Fix Christian Frommeyer's MIME decoding bug, Ubuntu/Launchpad Bug #320829. As a side effect, also fixes misattribution of MIME bodies Modified: trunk/bogofilter/configure.ac =================================================================== --- trunk/bogofilter/configure.ac 2009-07-31 19:24:18 UTC (rev 6850) +++ trunk/bogofilter/configure.ac 2009-07-31 22:59:49 UTC (rev 6851) @@ -17,7 +17,7 @@ dnl part of the bogofilter source). dnl ******************************************************** dnl -AC_INIT([bogofilter],[1.2.0]) +AC_INIT([bogofilter],[1.2.1]) dnl AC_PREREQ(2.59) dnl AC_PREREQ(2.60) dnl if AC_USE_SYSTEM_EXTENSIONS is desired Modified: trunk/bogofilter/src/tests/inputs/mime-qp-cont-with-cr.txt =================================================================== (Binary files differ) |
From: <m-...@us...> - 2009-08-01 09:18:17
|
Revision: 6853 http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6853&view=rev Author: m-a Date: 2009-08-01 09:18:08 +0000 (Sat, 01 Aug 2009) Log Message: ----------- Carefully update configure.ac. We now require automake 1.9 and autoconf 2.60. AC_TYPE_SIGNAL is gone, it was obsolete and unused. We now check the canonical host (rather than target) when checking particular system features; target is only used for compilers, binutils, and similar. Enabled AC_USE_SYSTEM_EXTENSIONS. Pending older change. Modified Paths: -------------- trunk/bogofilter/NEWS trunk/bogofilter/README.svn trunk/bogofilter/configure.ac Modified: trunk/bogofilter/NEWS =================================================================== --- trunk/bogofilter/NEWS 2009-08-01 02:53:13 UTC (rev 6852) +++ trunk/bogofilter/NEWS 2009-08-01 09:18:08 UTC (rev 6853) @@ -15,8 +15,14 @@ ------------------------------------------------------------------------------- -1.2.1 2009-08-01 (released) +1.2.1 2009-08-01 (released) + 2009-08-01 + * Update configure to use "host" rather than "target", to match the + newer autotools cross-build semantics. Untested. + Developers changing the build system and users who build from SVN + will now need automake 1.9 and autoconf 2.60. + 2009-07-31 * Fix Christian Frommeyer's MIME decoding bug, Ubuntu/Launchpad Bug #320829. As a side effect, also fixes misattribution of MIME bodies Modified: trunk/bogofilter/README.svn =================================================================== --- trunk/bogofilter/README.svn 2009-08-01 02:53:13 UTC (rev 6852) +++ trunk/bogofilter/README.svn 2009-08-01 09:18:08 UTC (rev 6853) @@ -1,16 +1,16 @@ README.svn -- How to build bogofilter from Subversion (SVN) $Id$ -(C) 2002,2007 by Matthias Andree. Freely distributable according to the -terms of the GNU Free Documentation License 1.0. No front- or back-matter -parts, no invariant parts. +(C) 2002,2007,2009 by Matthias Andree. 
Freely distributable according to +the terms of the GNU Free Documentation License 1.0. No front- or +back-matter parts, no invariant parts. ------------------------------------------------------------------------- After you have checked out bogofilter from SVN, some files are missing, for example, configure, ylwrap and others. These files can be created automatically with recent autoconf and -automake versions. You will need autoconf 2.53 and automake 1.6 or +automake versions. You will need autoconf 2.60 and automake 1.9 or newer. To recreate these files, run: autoreconf -i -s -f Modified: trunk/bogofilter/configure.ac =================================================================== --- trunk/bogofilter/configure.ac 2009-08-01 02:53:13 UTC (rev 6852) +++ trunk/bogofilter/configure.ac 2009-08-01 09:18:08 UTC (rev 6853) @@ -3,7 +3,8 @@ dnl dnl configure.ac for bogofilter dnl (C) Copyright 2003 Clint Adams, Gyepi Sam, David Relson, Matthias Andree -dnl (C) Copyright 2004,2005,2006 David Relson, Matthias Andree +dnl (C) Copyright 2008 Clint Adams, David Relson, Matthias Andree +dnl (C) Copyright 2004-2007, 2009 David Relson, Matthias Andree dnl dnl ******************************************************** dnl "Magic" environment variables for this script are: @@ -19,11 +20,11 @@ dnl AC_INIT([bogofilter],[1.2.1]) dnl -AC_PREREQ(2.59) -dnl AC_PREREQ(2.60) dnl if AC_USE_SYSTEM_EXTENSIONS is desired +AC_PREREQ(2.60) +AC_USE_SYSTEM_EXTENSIONS AC_CONFIG_SRCDIR([src/bogofilter.c]) -AC_CANONICAL_TARGET -AM_INIT_AUTOMAKE([foreign 1.8 dist-bzip2 no-installinfo]) +AC_CANONICAL_HOST +AM_INIT_AUTOMAKE([foreign 1.9 dist-bzip2 no-installinfo]) AC_CONFIG_HEADERS([src/config.h:config.in]) AC_PROG_AWK @@ -88,7 +89,7 @@ fi dnl crutch for b0rked S/390 gcc: -if test "x$target_cpu" = "xs390" && test "x$GCC" = "xyes" ; then +if test "x$host_cpu" = "xs390" && test "x$GCC" = "xyes" ; then case ${CFLAGS} in *-O*) OCFLAGS="$CFLAGS" @@ -105,7 +106,7 @@ dnl crutch for b0rked dgux gcc 2.7 (-g 
gives unresolved symbols in GSL), dnl gcc 2.8.1 is fine according to Message-ID: <199805061718.NAA01272@monty> dnl see http://list-archive.xemacs.org/xemacs-beta/199805/msg00291.html -case "$target_os" in +case "$host_os" in dgux*) if test "$GCC" = yes ; then case "`$CC --version`" in @@ -378,7 +379,6 @@ AC_TYPE_PID_T AC_TYPE_SIZE_T AC_TYPE_UID_T -AC_TYPE_SIGNAL AC_FUNC_SELECT_ARGTYPES AC_CHECK_TYPES([uint, ulong, uint32_t, u_int32_t, int32_t, int16_t, u_int16_t, uint16_t, u_int8_t, ssize_t]) AC_CHECK_TYPE(u_long, unsigned long) @@ -822,7 +822,7 @@ AM_CONDITIONAL(NEED_GETOPT,test $ac_cv_func_getopt_long != yes) have_dosish_system=no -case "${target}" in +case "${host}" in *-*-mingw32*) # special stuff for Windoze NT have_dosish_system=yes |
From: <m-...@us...> - 2009-08-05 08:43:15
|
Revision: 6862 http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6862&view=rev Author: m-a Date: 2009-08-05 08:43:04 +0000 (Wed, 05 Aug 2009) Log Message: ----------- Do not remove .ENCODING token in maintenance, courtesy of Ted Phelps (Patch ID #1743984). Modified Paths: -------------- trunk/bogofilter/NEWS trunk/bogofilter/src/maint.c Modified: trunk/bogofilter/NEWS =================================================================== --- trunk/bogofilter/NEWS 2009-08-05 08:41:33 UTC (rev 6861) +++ trunk/bogofilter/NEWS 2009-08-05 08:43:04 UTC (rev 6862) @@ -15,6 +15,13 @@ ------------------------------------------------------------------------------- + 2009-08-05 + + * Merged an updated version of Ted Phelps's "Patch to prevent .ENCODING + from being discarded by bogoutil -m" (SourceForge Patch #1743984). + Thanks to Ted for debugging the issue and providing the patch (which + was for bogofilter v1.1.5). + 1.2.1 2009-08-01 (released) 2009-08-01 Modified: trunk/bogofilter/src/maint.c =================================================================== --- trunk/bogofilter/src/maint.c 2009-08-05 08:41:33 UTC (rev 6861) +++ trunk/bogofilter/src/maint.c 2009-08-05 08:43:04 UTC (rev 6862) @@ -117,11 +117,13 @@ { bool discard; - if (token->u.text[0] == '.') { /* keep .MSG_COUNT and .ROBX */ + if (token->u.text[0] == '.') { /* keep .ENCODING, .MSG_COUNT, and .ROBX */ if (strcmp((const char *)token->u.text, MSG_COUNT) == 0) return false; if (strcmp((const char *)token->u.text, ROBX_W) == 0) return false; + if (strcmp((const char *)token->u.text, WORDLIST_ENCODING) == 0) + return false; } discard = (thresh_count != 0) || (thresh_date != 0) || (size_min != 0) || (size_max != 0); |
From: <re...@us...> - 2009-08-13 11:22:47
|
Revision: 6866 http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6866&view=rev Author: relson Date: 2009-08-13 11:22:40 +0000 (Thu, 13 Aug 2009) Log Message: ----------- Updated spamitarium to v0.4.0 Modified Paths: -------------- trunk/bogofilter/NEWS trunk/bogofilter/contrib/spamitarium.pl Modified: trunk/bogofilter/NEWS =================================================================== --- trunk/bogofilter/NEWS 2009-08-13 11:21:59 UTC (rev 6865) +++ trunk/bogofilter/NEWS 2009-08-13 11:22:40 UTC (rev 6866) @@ -15,6 +15,11 @@ ------------------------------------------------------------------------------- + 2009-08-13 + + * spamitarium.pl updated to version 0.4.0 + (thanks to Tom Anderson) + 2009-08-05 * Merged an updated version of Ted Phelps's "Patch to prevent .ENCODING Modified: trunk/bogofilter/contrib/spamitarium.pl =================================================================== --- trunk/bogofilter/contrib/spamitarium.pl 2009-08-13 11:21:59 UTC (rev 6865) +++ trunk/bogofilter/contrib/spamitarium.pl 2009-08-13 11:22:40 UTC (rev 6866) @@ -8,7 +8,7 @@ =cut -my $version = "0.3.0"; +my $version = "0.4.0"; ################################################ ############### Copyleft Notice ################ @@ -55,7 +55,7 @@ :0 { :0 fhw - | spamitarium -sreadx + | spamitarium -sreadxtp # filter through bogofilter, tagging as spam # or not and updating the word lists @@ -101,11 +101,17 @@ perform ASN lookups and include in received lines +=item B<p> + +perform SPF lookups and include in received lines + =item B<x> include custom x-headers for additional header validations: -- validate that the date header is within close proxmity to the +=item B<t> + +validate that the date header is within close proxmity to the received date (see $date_limit global variable to configure) =item B<w> @@ -129,7 +135,7 @@ I<list-id> and I<encrypted> fields passed through, you would change your procmail recipe as follows: - | spamitarium -sreadx list-id,encrypted + | spamitarium 
-sreadxtp list-id,encrypted =head1 REQUIRES @@ -140,6 +146,10 @@ Perl 5.6.1 Net::DNS::Resolver +Mail::SPF::Query +Net::CIDR +DB_File +POSIX =back @@ -174,10 +184,12 @@ Spamitarium also looks up any IP addresses or rDNS addresses which are not provided in order to provide the maximum tokens on -which to filter. Moreover, it looks up the ASN (autonomous system -number) associated with each "from" address in order to provide +which to filter. Moreover, it looks up the autonomous system number +(ASN) associated with each "from" address in order to provide a small set of tokens representing the various major subnets of the -internet. +internet. And it checks the Sender Policy Framework (SPF) records of +the sender to ensure that the given MX has permission to send on +their behalf. Finally, Spamitarium assesses the headers for missing required header lines, inserting keyable tokens or supplying the missing @@ -206,11 +218,11 @@ =item * -Please report any. +timegm($sec,$min,$hour,$day,$mon,$year) aborts if Perl's time_t is 32 +bits large and the year is too high (>2038). =back - =head1 TODO =over 4 @@ -221,7 +233,6 @@ =back - =head1 SEE ALSO =over 4 @@ -261,6 +272,15 @@ # server to use for ASN lookups our $asn_server = "asn.routeviews.org"; +# Whitelist any IP addresses or ranges from SPF lookups +our @whitelist = ("127.0.0.1","192.168.0.1-192.168.0.255"); + +# If you want to whitelist any addresses which have authenticated +# via poprelayd (i.e. remote workstations of users on your server) +# set $dbfile to your popip.db location, else set it to undef +#our $dbfile = "/etc/mail/popip.db"; +our $dbfile = undef; + # distance in seconds from right now to consider a reasonable (non-spam) range to date an email our $date_limit = 60*60*24*2; # 2 days @@ -304,7 +324,7 @@ # NEW FIELDS -- New custom x-headers added by Spamitarium (it is recommend that you don't change these). # These are disabled unless you pass the 'x' option. 
-our $new_fields = "x-date-check";
+our $new_fields = "x-date-check,x-spf";
 # REQUIRED FIELDS -- Any fields that should show up in an email even if they are not sent -- i.e. if the lack of
 # these fields may be useful for the filter, a no-req-field tag will be added. The only *required* fields according to
@@ -326,6 +346,10 @@
 use Benchmark;
 use Time::Local;
 use Net::DNS::Resolver;
+use Mail::SPF::Query;
+use Net::CIDR;
+use DB_File;
+use POSIX;
 
 #################################################
 ############## Default Globals ##################
@@ -359,6 +383,14 @@
     #debug => 1
 );
 
+# convert whitelist into CIDR notation
+our @cidr_list = ();
+foreach my $IP (@whitelist) {
+    if (not eval {@cidr_list = Net::CIDR::cidradd ($IP, @cidr_list)}) {
+        error("warn","Error processing whitelist: \"$IP\" is not a valid IP address or range.");
+    }
+}
+
 ################################################
 ##################### Main #####################
 ################################################
@@ -378,10 +410,16 @@
 if ($ARGV[0] =~ /b/) { $options .= "b"; } # output benchmarking info
 if ($ARGV[0] =~ /w/) { $options .= "w"; } # process whole email (including body)
 if ($ARGV[0] =~ /x/) { $options .= "x"; } # insert custom x-header fields
+if ($ARGV[0] =~ /t/) { $options .= "t"; } # perform date range checks
+if ($ARGV[0] =~ /p/) { $options .= "p"; } # perform SPF lookups
 
 # get the permitted headers
 if ($options =~ /s/ && $ARGV[1]) { $user_fields = $ARGV[1]; }
 
+# open popip database for reading
+our %db;
+&opendb_read if $dbfile;
+
 # start timing the process
 my $start_time = new Benchmark if $options =~ /b/;
 my ($start_parse, $end_parse, $start_rcvd, $end_rcvd, $start_set, $end_set);
@@ -412,17 +450,32 @@
 if ($options =~ /r/)
 {
     $start_rcvd = new Benchmark if $options =~ /b/;
-    $header->{'received'} = process_rcvd($header->{'received'});
+    $header->{'received'} = process_rcvd($header->{'received'},$header->{'return-path'}->[0]->{'value'});
     $end_rcvd = new Benchmark if $options =~ /b/;
 }
 
 # add new custom header fields
 if ($options =~ /x/)
 {
-    $header->{'x-date-check'}->[0]->{'name'} = "X-Date-Check";
-    $header->{'x-date-check'}->[0]->{'value'} = date_check($header->{'date'}->[0]->{'value'},$header->{'received'}->[0]->{'date'});
+    if ($options =~ /t/)
+    {
+        $header->{'x-date-check'}->[0]->{'name'} = "X-Date-Check";
+        $header->{'x-date-check'}->[0]->{'value'} = date_check($header->{'date'}->[0]->{'value'},$header->{'received'}->[0]->{'date'});
+    }
+
+    if ($options =~ /p/)
+    {
+        for (my $x = 0; $x < scalar @{$header->{'received'}}; $x++)
+        {
+            if (defined $header->{'received'}->[$x]->{'spf'} && $header->{'received'}->[$x]->{'spf'} =~ /\w/)
+            {
+                $header->{'x-spf'}->[$x]->{'name'} = "X-SPF";
+                $header->{'x-spf'}->[$x]->{'value'} = $header->{'received'}->[$x]->{'spf'};
+            }
+        }
+    }
 }
-
+
 # output the new header containing the changes
 $start_set = new Benchmark if $options =~ /b/;
 print set_header($header);
@@ -466,6 +519,9 @@
     print "Rebuilding email time was $wall wallclock secs; $usr usr + $sys sys = $cpu CPU secs.$CRLF";
 }
 
+# close popip database
+&closedb if $dbfile;
+
 exit(0);
 
 ################################################
@@ -494,7 +550,7 @@
     #(defined $header->{'from'} && $header->{'from'}->[0]->{'value'} =~ /\w/)));
 
     # match header lines
-    if ($line =~ /^(\S+?):\s*?(\S.+?)$/)
+    if ($line =~ /^(\S+?):\s*?(\S.*?)$/)
     {
         my $head = $1; my $value = $2;
         $name = $head;
@@ -589,6 +645,7 @@
 sub process_rcvd
 {
     my $rcvd = shift;
+    my $rtrn = shift;
 
     # heuristics
     my $LUSER = qr~(?:\w|-|\.)+?~;
@@ -689,10 +746,49 @@
         $rdns = host($ipad) if $ipad && $options =~ /f/;
 
         # perform ASN lookup (RFC 1930/2270)
-        my $asn = asn($ipad) if $ipad && $options =~ /a/;
+        my $asn = "";
+        $asn = asn($ipad) if $ipad && $options =~ /a/;
 
+        # perform SPF lookup
+        my $result = ""; my $smtp_comment = ""; my $header_comment = ""; my $spf_record = "";
+        if ($options =~ /p/)
+        {
+            &retie if $dbfile && !tied %db;
+
+            if (scalar @cidr_list && eval{Net::CIDR::cidrlookup($ipad, @cidr_list)})
+            {
+                $result = "pass";
+                $header_comment = "$ipad is locally whitelisted";
+            }
+            elsif ($dbfile && $db{$ipad})
+            {
+                $result = "pass";
+                $header_comment = "$ipad is authenticated via poprelayd";
+            }
+            elsif ($rtrn && $ipad)
+            {
+                my $srvr = $rdns?$rdns:($helo?$helo:$ipad);
+                my $query = new Mail::SPF::Query (ip=>$ipad, sender=>$rtrn, helo=>$srvr, trusted=>0, guess=>0) or error("warn","SPF lookup failed: $!");
+                ($result,         # pass | fail | softfail | neutral | none | error | unknown [mechanism]
+                 $smtp_comment,   # "please see http://www.openspf.org/why.html?..." when rejecting, return this string to the SMTP client
+                 $header_comment, # prepend_header("Received-SPF" => "$result ($header_comment)")
+                 $spf_record,     # "v=spf1 ..." original SPF record for the domain
+                ) = $query->result();
+            }
+            else
+            {
+                $result = "error";
+                $header_comment = "unable to determine sender info";
+            }
+        }
+
         # we implicitely trust the received line set "by" our own server as valid (first untrusted "from")
-        if (!$edge_ip) { $edge_ip = $mtai; $rcvd->[$x]->{'sane'} = set_rcvd($helo,$ipad,$idnt,$rdns,$from,$mtan,$mtai,$mtav,$fore,$with,$date,$asn); }
+        if (!$edge_ip)
+        {
+            $edge_ip = $mtai;
+            $rcvd->[$x]->{'sane'} = set_rcvd($helo,$ipad,$idnt,$rdns,$from,$mtan,$mtai,$mtav,$fore,$with,$date,$asn,$result);
+            $rcvd->[$x]->{'spf'} = "$result ($header_comment)" if $options =~ /p/;
+        }
 
         # now we'll try to establish the validity of each received line by checking
         # for continuity and rejecting lines that don't fit the "from/by" chain
@@ -706,7 +802,10 @@
              ($mtai && $rcvd->[$x-1]->{'ipad'} && $mtai =~ /$rcvd->[$x-1]->{'ipad'}/) )
            && (!$untrusted) )
-        { $rcvd->[$x]->{'sane'} = set_rcvd($helo,$ipad,$idnt,$rdns,$from,$mtan,$mtai,$mtav,$fore,$with,$date,$asn); }
+        {
+            $rcvd->[$x]->{'sane'} = set_rcvd($helo,$ipad,$idnt,$rdns,$from,$mtan,$mtai,$mtav,$fore,$with,$date,$asn,$result);
+            $rcvd->[$x]->{'spf'} = "$result ($header_comment)" if $options =~ /p/;
+        }
         else
         {
             $helo = "untrusted-".$helo if $helo; $ipad = "untrusted-".$ipad if $ipad;
@@ -714,8 +813,8 @@
             $from = "untrusted-".$from if $from; $mtan = "untrusted-".$mtan if $mtan;
             $mtai = "untrusted-".$mtai if $mtai; $mtav = "untrusted-".$mtav if $mtav;
             $fore = "untrusted-".$fore if $fore; $with = "untrusted-".$with if $with;
-            $date = ""; $asn = "";
-            $rcvd->[$x]->{'sane'} = set_rcvd($helo,$ipad,$idnt,$rdns,$from,$mtan,$mtai,$mtav,$fore,$with,$date,$asn);
+            $date = ""; $asn = ""; $result = "";
+            $rcvd->[$x]->{'sane'} = set_rcvd($helo,$ipad,$idnt,$rdns,$from,$mtan,$mtai,$mtav,$fore,$with,$date,$asn,$result);
             $untrusted = 1;
         }
     }
@@ -781,13 +880,14 @@
 
 sub set_rcvd
 {
-    my ($helo,$ipad,$idnt,$rdns,$from,$mtan,$mtai,$mtav,$fore,$with,$date,$asn) = @_;
+    my ($helo,$ipad,$idnt,$rdns,$from,$mtan,$mtai,$mtav,$fore,$with,$date,$asn,$spf) = @_;
     my $output = "from";
 
     if ($options =~ /e/) { $output .= ($helo)? " helo-$helo" : "";} # sender's salutation
     $output .= ($rdns)? " $rdns" : ""; # sender's name
     $output .= ($ipad)? " $ipad" : ""; # sender's IP
+    $output .= ($spf)? " spf-$spf" : ""; # sender's policy result
     $output .= ($asn)? " as$asn" : ""; # sender's ASN
     $output .= ($mtan||$mtai)? " $CRLF\t by" : "";
     $output .= ($mtan)? " $mtan" : ""; # receiving MTA's name
@@ -800,6 +900,23 @@
     return $output;
 }
 
+sub opendb_read
+{
+    tie(%db, "DB_File", $dbfile, O_RDONLY, 0, $DB_HASH) or error("warn","Can't open $dbfile: $!");
+}
+
+sub closedb
+{
+    untie %db;
+    undef %db;
+}
+
+sub retie
+{
+    &closedb;
+    &opendb_read;
+}
+
 ################################################
 ################ Output Header #################
 ################################################

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
From: <m-...@us...> - 2009-09-17 12:08:26
Revision: 6868
          http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6868&view=rev
Author:   m-a
Date:     2009-09-17 12:08:16 +0000 (Thu, 17 Sep 2009)

Log Message:
-----------
Mark Berkeley DB 4.7.25 and 4.8.24 supported; pending doc/README.db update.

Modified Paths:
--------------
    trunk/bogofilter/TODO
    trunk/bogofilter/doc/README.db

Modified: trunk/bogofilter/TODO
===================================================================
--- trunk/bogofilter/TODO 2009-08-19 08:51:13 UTC (rev 6867)
+++ trunk/bogofilter/TODO 2009-09-17 12:08:16 UTC (rev 6868)
@@ -3,11 +3,14 @@
 bogofilter TODO list
 
 **** Documentation: Berkeley DB 4.7, check options (versions for
-     existing ones, new options).
+     existing ones, new options), use a list of versions that require a log
+     format upgrade (or DB format or whatever) for simplicity
 
 **** Database (Berkeley DB): Use auto-recover features of Berkeley DB
      4.4+ and give up on our own recovery locking and crash detection.
 
+**** Database (Berkeley DB): Can we use the bulk load feature to our advantage?
+
 **** If insufficient data is present and the default "undecided"
      bogosity is added in -p mode, add also a comment stating that
      bogofilter needs more training first

Modified: trunk/bogofilter/doc/README.db
===================================================================
--- trunk/bogofilter/doc/README.db 2009-08-19 08:51:13 UTC (rev 6867)
+++ trunk/bogofilter/doc/README.db 2009-09-17 12:08:16 UTC (rev 6868)
@@ -61,6 +61,8 @@
  Sleepycat Software: Berkeley DB 4.4.20: (January 10, 2006)
                      Berkeley DB 4.5.20: (September 20, 2006)
                      Berkeley DB 4.6.19: (August 10, 2007)
+                     Berkeley DB 4.7.25: (May 15, 2008)
+                     Berkeley DB 4.8.24: (August 14, 2009)
 
 Other versions of Berkeley DB between the first and last listed above
 may or may not work but usually they will.
From: <re...@us...> - 2010-01-24 05:00:24
Revision: 6870
          http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6870&view=rev
Author:   relson
Date:     2010-01-24 05:00:18 +0000 (Sun, 24 Jan 2010)

Log Message:
-----------
Fix divide by zero problem in Robinson X computation.

Modified Paths:
--------------
    trunk/bogofilter/NEWS
    trunk/bogofilter/src/robx.c

Modified: trunk/bogofilter/NEWS
===================================================================
--- trunk/bogofilter/NEWS 2009-10-04 14:37:36 UTC (rev 6869)
+++ trunk/bogofilter/NEWS 2010-01-24 05:00:18 UTC (rev 6870)
@@ -15,6 +15,10 @@
 
 -------------------------------------------------------------------------------
 
+  2010-01-23
+
+        * corrected divide by zero problem in bogoutil's robx computation.
+
   2009-08-13
 
         * contrib/spamitarium.pl updated to version 0.4.0

Modified: trunk/bogofilter/src/robx.c
===================================================================
--- trunk/bogofilter/src/robx.c 2009-10-04 14:37:36 UTC (rev 6869)
+++ trunk/bogofilter/src/robx.c 2010-01-24 05:00:18 UTC (rev 6870)
@@ -83,8 +83,8 @@
 
     dsh = wordlist->dsh;
 
-    rh.spam_cnt = wordlist->msgcount[IX_SPAM];
-    rh.good_cnt = wordlist->msgcount[IX_GOOD];
+    rh.spam_cnt = max(wordlist->msgcount[IX_SPAM],1);
+    rh.good_cnt = max(wordlist->msgcount[IX_GOOD],1);
     rh.scalefactor = (double)rh.spam_cnt/(double)rh.good_cnt;
     rh.dsh = dsh;
 
From: <re...@us...> - 2010-02-14 20:57:41
Revision: 6875
          http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6875&view=rev
Author:   relson
Date:     2010-02-14 20:57:34 +0000 (Sun, 14 Feb 2010)

Log Message:
-----------
Generate 'too few messages' message instead of masking divide by zero problem.

Modified Paths:
--------------
    trunk/bogofilter/NEWS
    trunk/bogofilter/src/robx.c
    trunk/bogofilter/src/wordlists.c
    trunk/bogofilter/src/wordlists.h

Modified: trunk/bogofilter/NEWS
===================================================================
--- trunk/bogofilter/NEWS 2010-02-04 20:54:08 UTC (rev 6874)
+++ trunk/bogofilter/NEWS 2010-02-14 20:57:34 UTC (rev 6875)
@@ -15,9 +15,13 @@
 
 -------------------------------------------------------------------------------
 
+  2010-02-14
+
+        * Split error messages for ENOENT and EINVAL into new function.
+
   2010-01-23
 
-        * corrected divide by zero problem in bogoutil's robx computation.
+        * Corrected divide by zero problem in bogoutil's robx computation.
 
   2009-08-13
 

Modified: trunk/bogofilter/src/robx.c
===================================================================
--- trunk/bogofilter/src/robx.c 2010-02-04 20:54:08 UTC (rev 6874)
+++ trunk/bogofilter/src/robx.c 2010-02-14 20:57:34 UTC (rev 6875)
@@ -11,6 +11,8 @@
 
 ******************************************************************************/
 
+#include <errno.h>
+
 #include "common.h"
 
 #include "datastore.h"
@@ -83,8 +85,12 @@
 
     dsh = wordlist->dsh;
 
-    rh.spam_cnt = max(wordlist->msgcount[IX_SPAM],1);
-    rh.good_cnt = max(wordlist->msgcount[IX_GOOD],1);
+    rh.spam_cnt = wordlist->msgcount[IX_SPAM];
+    rh.good_cnt = wordlist->msgcount[IX_GOOD];
+
+    if (rh.spam_cnt == 0 || rh.good_cnt == 0)
+        wordlist_error(ENOENT);
+
     rh.scalefactor = (double)rh.spam_cnt/(double)rh.good_cnt;
     rh.dsh = dsh;
 

Modified: trunk/bogofilter/src/wordlists.c
===================================================================
--- trunk/bogofilter/src/wordlists.c 2010-02-04 20:54:08 UTC (rev 6874)
+++ trunk/bogofilter/src/wordlists.c 2010-02-14 20:57:34 UTC (rev 6875)
@@ -140,17 +140,9 @@
                 if (err != 0)
                     fprintf(stderr, "error #%d - %s.\n", err, strerror(err));
-                if (err == ENOENT)
-                    fprintf(stderr,
-                            "\n"
-                            "Remember to register some spam and ham messages before you\n"
-                            "use bogofilter to evaluate mail for its probable spam status!\n");
-                if (err == EINVAL)
-                    fprintf(stderr,
-                            "\n"
-                            "Make sure that the database version this program is linked against\n"
-                            "can handle the format of the data base file (after updates in particular).\n");
-                exit(EX_ERROR);
+
+                // print error and exit
+                wordlist_error(err);
             } /* switch */
         } else { /* ds_open */
             begin_wordlist(list);
@@ -366,3 +358,22 @@
 
     return true;
 }
+
+// print error and exit
+
+void wordlist_error(int err)
+{
+    if (err == ENOENT)
+        fprintf(stderr,
+                "\n"
+                "Remember to register some spam and ham messages before you\n"
+                "use bogofilter to evaluate mail for its probable spam status!\n");
+
+    if (err == EINVAL)
+        fprintf(stderr,
+                "\n"
+                "Make sure that the database version this program is linked against\n"
+                "can handle the format of the data base file (after updates in particular).\n");
+
+    exit(EX_ERROR);
+}

Modified: trunk/bogofilter/src/wordlists.h
===================================================================
--- trunk/bogofilter/src/wordlists.h 2010-02-04 20:54:08 UTC (rev 6874)
+++ trunk/bogofilter/src/wordlists.h 2010-02-14 20:57:34 UTC (rev 6875)
@@ -35,4 +35,6 @@
 void set_list_active_status(bool status);
 void set_wordlist_directory(void);
 
+void wordlist_error(int err);
+
 #endif /* WORDLISTS_H */
From: <m-...@us...> - 2010-02-15 19:16:23
Revision: 6876
          http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6876&view=rev
Author:   m-a
Date:     2010-02-15 19:16:16 +0000 (Mon, 15 Feb 2010)

Log Message:
-----------
Cast toupper()/tolower() arguments to unsigned char.

This fixes compiler warnings, and prevents out-of-bounds array access when characters between 0x80 and 0xff are encountered in configuration or command line options or in MIME or "charset" headers. A security audit of the affected 9 code places was made, and bogofilter was found not to be vulnerable, because all occasions either process trusted data, or data that was pre-validated by the lexer to either match a lexer pattern or be purely "alnum" in the POSIX/C locale.

Modified Paths:
--------------
    trunk/bogofilter/NEWS
    trunk/bogofilter/src/charset.c
    trunk/bogofilter/src/configfile.c
    trunk/bogofilter/src/debug.c
    trunk/bogofilter/src/mime.c
    trunk/bogofilter/src/token.c
    trunk/bogofilter/src/wordlists.c

Modified: trunk/bogofilter/NEWS
===================================================================
--- trunk/bogofilter/NEWS 2010-02-14 20:57:34 UTC (rev 6875)
+++ trunk/bogofilter/NEWS 2010-02-15 19:16:16 UTC (rev 6876)
@@ -15,6 +15,15 @@
 
 -------------------------------------------------------------------------------
 
+  2010-02-15
+
+        * Fix several compiler warnings "array subscript has type 'char'", by
+          casting the arguments to unsigned char.
+          A security audit was conducted and showed that all affected
+          functions either received the relevant input from the user running
+          bogofilter, or the input had already been pre-validated by the token
+          lexer.
+
   2010-02-14
 
         * Split error messages for ENOENT and EINVAL into new function.
Modified: trunk/bogofilter/src/charset.c
===================================================================
--- trunk/bogofilter/src/charset.c 2010-02-14 20:57:34 UTC (rev 6875)
+++ trunk/bogofilter/src/charset.c 2010-02-15 19:16:16 UTC (rev 6876)
@@ -110,7 +110,7 @@
 
     for (s = d = t; *s != '\0'; s++)
     {
-        char c = tolower(*s);                   /* map upper case to lower */
+        char c = tolower((unsigned char)*s);    /* map upper case to lower */
         if (c == '_')                           /* map underscore to dash */
             c = '-';
         if (c == '-' &&                         /* map "iso-" to "iso" */

Modified: trunk/bogofilter/src/configfile.c
===================================================================
--- trunk/bogofilter/src/configfile.c 2010-02-14 20:57:34 UTC (rev 6875)
+++ trunk/bogofilter/src/configfile.c 2010-02-15 19:16:16 UTC (rev 6876)
@@ -84,7 +84,7 @@
     char *dupl;
     const char delim[] = " \t=";
 
-    while (isspace(*opt))                       /* ignore leading whitespace */
+    while (isspace((unsigned char)*opt))        /* ignore leading whitespace */
         opt += 1;
 
     dupl = xstrdup(opt);
@@ -118,7 +118,7 @@
     if (strlen(opt) != strlen(name))
         return false;
     while (((co = *opt++) != '\0') && ((cn = *name++) != '\0')) {
-        if ((co == cn) || (tolower(co) == tolower(cn)))
+        if ((co == cn) || (tolower((unsigned char)co) == tolower((unsigned char)cn)))
             continue;
         if (co != '_' || cn != '-')
             return false;

Modified: trunk/bogofilter/src/debug.c
===================================================================
--- trunk/bogofilter/src/debug.c 2010-02-14 20:57:34 UTC (rev 6875)
+++ trunk/bogofilter/src/debug.c 2010-02-15 19:16:16 UTC (rev 6876)
@@ -46,7 +46,7 @@
         char ch;
         while ((ch = *mask++) != '\0' && isalpha((int)(unsigned char)ch))
         {
-            ch = toupper(ch);
+            ch = toupper((unsigned char)ch);
             bogotest |= MASK_BIT(ch);
         }
     }

Modified: trunk/bogofilter/src/mime.c
===================================================================
--- trunk/bogofilter/src/mime.c 2010-02-14 20:57:34 UTC (rev 6875)
+++ trunk/bogofilter/src/mime.c 2010-02-15 19:16:16 UTC (rev 6876)
@@ -421,7 +421,7 @@
 void mime_content(word_t * text)
 {
     char *key = (char *) text->u.text;
-    switch (tolower(key[9])) {
+    switch (tolower((unsigned char)key[9])) {
     case 'r':           /* Content-Transfer-Encoding: */
         mime_encoding(text);
         break;

Modified: trunk/bogofilter/src/token.c
===================================================================
--- trunk/bogofilter/src/token.c 2010-02-14 20:57:34 UTC (rev 6875)
+++ trunk/bogofilter/src/token.c 2010-02-15 19:16:16 UTC (rev 6876)
@@ -619,7 +619,7 @@
         return;
     }
 
-    switch (tolower(*text)) {
+    switch (tolower((unsigned char)*text)) {
     case 'c':                   /* CC: */
     case 't':
         token_prefix = w_to;    /* To: */
@@ -634,7 +634,7 @@
         token_prefix = w_mime;  /* Mime: */
         break;
     case 'r':
-        if (tolower(text[2]) == 't')
+        if (tolower((unsigned char)text[2]) == 't')
             token_prefix = w_rtrn;  /* Return-Path: */
         else
             token_prefix = w_recv;  /* Received: */

Modified: trunk/bogofilter/src/wordlists.c
===================================================================
--- trunk/bogofilter/src/wordlists.c 2010-02-14 20:57:34 UTC (rev 6875)
+++ trunk/bogofilter/src/wordlists.c 2010-02-15 19:16:16 UTC (rev 6876)
@@ -332,7 +332,7 @@
         ch= tmp[0];             /* save wordlist type (good/spam) */
         tmp = spanword(tmp);
 
-        switch (toupper(ch))
+        switch (toupper((unsigned char)ch))
         {
         case 'R':
             type = WL_REGULAR;
From: <m-...@us...> - 2010-03-06 11:35:49
Revision: 6878
          http://bogofilter.svn.sourceforge.net/bogofilter/?rev=6878&view=rev
Author:   m-a
Date:     2010-03-06 11:35:34 +0000 (Sat, 06 Mar 2010)

Log Message:
-----------
t.maint: robustness: ignore .ENCODING token.

Modified Paths:
--------------
    trunk/bogofilter/NEWS
    trunk/bogofilter/src/tests/t.maint

Modified: trunk/bogofilter/NEWS
===================================================================
--- trunk/bogofilter/NEWS 2010-02-18 08:19:12 UTC (rev 6877)
+++ trunk/bogofilter/NEWS 2010-03-06 11:35:34 UTC (rev 6878)
@@ -15,6 +15,11 @@
 
 -------------------------------------------------------------------------------
 
+  2010-03-06
+
+        * Make t.maint more robust; ignore .ENCODING token. To fix test
+          failures on, for instance, FreeBSD with unicode enabled.
+
   2010-02-15
 
         * Fix several compiler warnings "array subscript has type 'char'", by

Modified: trunk/bogofilter/src/tests/t.maint
===================================================================
--- trunk/bogofilter/src/tests/t.maint 2010-02-18 08:19:12 UTC (rev 6877)
+++ trunk/bogofilter/src/tests/t.maint 2010-03-06 11:35:34 UTC (rev 6878)
@@ -11,33 +11,22 @@
 $BOGOFILTER -C -y 0 -s < "$SYSTEST"/inputs/spam.mbx
 $BOGOFILTER -C -y 0 -n < "$SYSTEST"/inputs/good.mbx
 
-if [ -z "$USE_UNICODE" -o "$USE_UNICODE" = "YES" ] ; then
-    CORRECT="$TMPDIR"/ref.unicode.enabled
-else
-    CORRECT="$TMPDIR/ref.unicode.disabled"
-fi
+filter="egrep -v ^\\.ENCODING"
 
-cat >> "$TMPDIR"/ref.unicode.enabled <<EOF
+CORRECT="$TMPDIR"/ref.txt
+cat >> "$CORRECT" <<EOF
 initial: 5303
-count 0 -> 5304
-count 1 -> 1848
-count 2 -> 950
-count 3 -> 611
-EOF
-
-cat >> "$TMPDIR"/ref.unicode.disabled <<EOF
-initial: 5303
 count 0 -> 5303
 count 1 -> 1847
 count 2 -> 949
 count 3 -> 610
 EOF
 
-echo initial: `$BOGOUTIL -d "$WORDLIST" | wc -l` > "$OUT"
+echo "initial: $($BOGOUTIL -d "$WORDLIST" | $filter | wc -l)" > "$OUT"
 for cnt in 0 1 2 3 ; do
     $BOGOUTIL -C -c $cnt -m "$WORDLIST"
-    echo "count $cnt ->" `$BOGOUTIL -C -d "$WORDLIST" | wc -l` >> "$OUT"
+    echo "count $cnt -> $($BOGOUTIL -C -d "$WORDLIST" | $filter | wc -l)" >> "$OUT"
     if [ $verbose -ne 0 ]; then
         $BOGOUTIL -C -d "$WORDLIST" > "$OUT".$cnt
     fi