Update of /cvsroot/popfile/engine
In directory sc8-pr-cvs1:/tmp/cvs-serv17979
Modified Files:
stopwords
Log Message:
Fix bug 826765
@ and $ inside magnets were not being handled properly.
Classifier/Bayes.pm:
Factor most of magnet_match__ into magnet_match_helper__ so
that there is no duplicated code. Remove use of regexps for
magnet match and replace with simple 'eq' matching, thus
eliminating all the complexities around special characters
in regexps and the fact that @ and $ are illegal in \Q \E
quoted regexps.
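
To illustrate why literal comparison sidesteps the special-character problem, here is a minimal sketch in Python (not the Perl code from Bayes.pm). The function names are hypothetical, and it assumes a magnet matches when it occurs anywhere in the header value, case-insensitively; the log message does not say this. Python's failure mode with an unescaped pattern ($ acting as an anchor) is also only an analogue of the Perl interpolation issue described above.

```python
import re


def magnet_matches(magnet: str, header_value: str) -> bool:
    """Literal, case-insensitive containment test (hypothetical helper).

    No regular expression is built, so characters such as $ and @
    in the magnet need no escaping at all.
    """
    return magnet.lower() in header_value.lower()


def magnet_matches_naive_regex(magnet: str, header_value: str) -> bool:
    """Deliberately fragile variant that uses the magnet as a pattern.

    Shown only for contrast: $ is read as an end-of-string anchor,
    so a magnet containing it can never match as literal text.
    """
    return re.search(magnet, header_value, re.IGNORECASE) is not None


if __name__ == "__main__":
    header = "win $100@example.com now"
    magnet = "$100@Example.com"
    print(magnet_matches(magnet, header))              # literal comparison
    print(magnet_matches_naive_regex(magnet, header))  # regex comparison
```

The literal version has no escaping step to get wrong, which is the simplification the change above is after.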
tests/TestBayes.tst:
Added tests for magnet_match__ with specific emphasis on
handling of $ and @.
Made Japanese tests detect whether Text::Kakasi is present
on the machine and skip them (with a warning) if it is
not present.
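
The skip-if-missing pattern for optional dependencies can be sketched as follows. This is a Python illustration under stated assumptions, not the code in tests/TestBayes.tst: `module_available` and `run_optional` are hypothetical names, and the module name in the usage lines is made up.

```python
import importlib.util
import warnings


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def run_optional(name, tests):
    """Run the given test callables only when module `name` is installed.

    When the module is missing, emit one warning and return an empty
    result list instead of failing.
    """
    if not module_available(name):
        warnings.warn(f"{name} not installed; skipping {len(tests)} test(s)")
        return []
    return [test() for test in tests]


if __name__ == "__main__":
    # 'json' ships with Python, so this test runs.
    print(run_optional("json", [lambda: "ran"]))
    # A made-up module name, so this test is skipped with a warning.
    print(run_optional("no_such_module_for_demo", [lambda: "ran"]))
```

Warning rather than failing keeps the rest of the suite usable on machines without the optional module.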
Index: stopwords
===================================================================
RCS file: /cvsroot/popfile/engine/stopwords,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** stopwords 28 Oct 2003 01:06:46 -0000 1.7
--- stopwords 28 Oct 2003 19:39:48 -0000 1.8
***************
*** 1,11 ****
- strike
you
date
- textflow
form
him
pdt
- also
code
acronym
pst
--- 1,11 ----
you
+ strike
date
form
+ textflow
him
pdt
code
+ also
acronym
pst
***************
*** 14,23 ****
cgi
charset
- nbsp
est
sun
your
- but
title
and
multicol
--- 14,23 ----
cgi
charset
est
+ nbsp
sun
your
title
+ but
and
multicol
***************
*** 30,38 ****
being
dir
- she
jan
color
- will
have
received
going
--- 30,38 ----
being
dir
jan
+ she
color
have
+ will
received
going
***************
*** 40,50 ****
htm
edt
- can
- mbox
height
! dfn
iframe
! were
com
would
off
--- 40,50 ----
htm
edt
height
! mbox
! can
iframe
! dfn
com
+ were
would
off
***************
*** 67,89 ****
aug
overlay
- div
www
status
doing
tue
person
- his
- cellspacing
mon
! select
helo
esmtp
- header:from
alt
- header:From
- note
- border
- message
wbr
big
thu
--- 67,87 ----
aug
overlay
www
+ div
status
doing
tue
person
mon
! cellspacing
! his
helo
+ select
esmtp
alt
wbr
+ message
+ border
+ note
big
thu
***************
*** 129,168 ****
body
nobr
- bgcolor
html
from
var
- her
oct
banner
del
- math
blockquote
! path
any
spot
- textarea
cdt
! the
embed
done
yet
it's
- font
net
! blink
thead
plaintext
- could
went
does
param
- jul
this
org
- for
- mailto
- src
mar
cst
kbd
--- 127,166 ----
body
nobr
html
+ bgcolor
from
var
oct
+ her
banner
del
blockquote
! math
any
+ path
spot
cdt
! textarea
embed
+ the
done
yet
it's
net
! font
thead
+ blink
plaintext
went
+ could
does
param
this
+ jul
org
mar
+ src
+ mailto
+ for
cst
kbd
***************
*** 175,186 ****
helvetica
samp
- been
- tab
col
fig
mail
cite
- link
had
script
menu
--- 173,184 ----
helvetica
samp
col
+ tab
+ been
fig
mail
cite
had
+ link
script
menu
***************
*** 190,196 ****
ins
sep
- was
sub
! frameset
sat
apr
--- 188,194 ----
ins
sep
sub
! was
sat
+ frameset
apr