From: <jgr...@us...> - 2003-10-28 19:44:50
Update of /cvsroot/popfile/engine
In directory sc8-pr-cvs1:/tmp/cvs-serv17979

Modified Files:
    stopwords

Log Message:
Fix bug 826765: @ and $ inside magnets were not being handled properly.

Classifier/Bayes.pm:

    Factor most of magnet_match__ into magnet_match_helper__ so that
    there is no duplicated code.

    Remove use of regexps for magnet match and replace with simple 'eq'
    matching, thus eliminating all the complexities around special
    characters in regexps and the fact that @ and $ are illegal in
    \Q \E quoted regexps.

tests/TestBayes.tst:

    Added tests for magnet_match__ with specific emphasis on handling
    of $ and @.

    Made Japanese tests detect whether Text::Kakasi is present on the
    machine and, if it is not, skip them with a warning.

Index: stopwords
===================================================================
RCS file: /cvsroot/popfile/engine/stopwords,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** stopwords   28 Oct 2003 01:06:46 -0000   1.7
--- stopwords   28 Oct 2003 19:39:48 -0000   1.8
***************
*** 1,11 ****
- strike
  you
  date
- textflow
  form
  him
  pdt
- also
  code
  acronym
  pst
--- 1,11 ----
  you
+ strike
  date
  form
+ textflow
  him
  pdt
  code
+ also
  acronym
  pst
***************
*** 14,23 ****
  cgi
  charset
- nbsp
  est
  sun
  your
- but
  title
  and
  multicol
--- 14,23 ----
  cgi
  charset
  est
+ nbsp
  sun
  your
  title
+ but
  and
  multicol
***************
*** 30,38 ****
  being
  dir
- she
  jan
  color
- will
  have
  received
  going
--- 30,38 ----
  being
  dir
  jan
+ she
  color
  have
+ will
  received
  going
***************
*** 40,50 ****
  htm
  edt
- can
- mbox
  height
! dfn
  iframe
! were
  com
  would
  off
--- 40,50 ----
  htm
  edt
  height
! mbox
! can
  iframe
! dfn
  com
+ were
  would
  off
***************
*** 67,89 ****
  aug
  overlay
- div
  www
  status
  doing
  tue
  person
- his
- cellspacing
  mon
! select
  helo
  esmtp
- header:from
  alt
- header:From
- note
- border
- message
  wbr
  big
  thu
--- 67,87 ----
  aug
  overlay
  www
+ div
  status
  doing
  tue
  person
  mon
! cellspacing
! his
  helo
+ select
  esmtp
  alt
  wbr
+ message
+ border
+ note
  big
  thu
***************
*** 129,168 ****
  body
  nobr
- bgcolor
  html
  from
  var
- her
  oct
  banner
  del
- math
  blockquote
! path
  any
  spot
- textarea
  cdt
! the
  embed
  done
  yet
  it's
- font
  net
! blink
  thead
  plaintext
- could
  went
  does
  param
- jul
  this
  org
- for
- mailto
- src
  mar
  cst
  kbd
--- 127,166 ----
  body
  nobr
  html
+ bgcolor
  from
  var
  oct
+ her
  banner
  del
  blockquote
! math
  any
+ path
  spot
  cdt
! textarea
  embed
+ the
  done
  yet
  it's
  net
! font
  thead
+ blink
  plaintext
  went
+ could
  does
  param
  this
+ jul
  org
  mar
+ src
+ mailto
+ for
  cst
  kbd
***************
*** 175,186 ****
  helvetica
  samp
- been
- tab
  col
  fig
  mail
  cite
- link
  had
  script
  menu
--- 173,184 ----
  helvetica
  samp
  col
+ tab
+ been
  fig
  mail
  cite
  had
+ link
  script
  menu
***************
*** 190,196 ****
  ins
  sep
- was
  sub
! frameset
  sat
  apr
--- 188,194 ----
  ins
  sep
  sub
! was
  sat
+ frameset
  apr
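
The log message describes replacing regexp-based magnet matching with plain equality comparison. The following is only an illustrative sketch of that idea, written in Python rather than the project's Perl; the function name magnet_matches and the sample header are hypothetical and are not taken from Bayes.pm. It compares each window of the header against the magnet with ==, so characters such as $ and @ have no special meaning.

```python
import re


def magnet_matches(text: str, magnet: str) -> bool:
    """Hypothetical helper: substring test using only string equality."""
    n = len(magnet)
    # Slide a window of len(magnet) characters over the text and compare
    # each window to the magnet with ==; no pattern syntax is involved.
    return any(text[i:i + n] == magnet for i in range(len(text) - n + 1))


# Made-up header and magnet containing both '$' and '@'.
header = "From: big$money@example.com"
magnet = "big$money@example.com"

# Used as an unescaped pattern, '$' acts as an end-of-string anchor,
# so the regexp search does not find the literal text.
regex_hit = re.search(magnet, header) is not None

# The equality-based comparison treats every character literally.
plain_hit = magnet_matches(header, magnet)

print(regex_hit, plain_hit)
```

In this sketch the regexp search misses the magnet while the equality-based helper finds it, which is the class of problem the log message attributes to special characters in regexps.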