[bugs][ bogofilter-Bugs-817817 ] BF should decode A
Fast Bayesian spam filter along lines suggested by Paul Graham
Brought to you by:
m-a
From: SourceForge.net <no...@so...> - 2003-10-04 18:46:21
Bugs item #817817, was opened at 2003-10-04 11:46
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=499997&aid=817817&group_id=62265

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Freeman (timfreeman)
Assigned to: Nobody/Anonymous (nobody)
Summary: BF should decode A

Initial Comment:
I received the following spam with a style of HTML obfuscation I have not seen before:

<font color="#FFFFFD">summon her allies), then the lesser states will hold aloof and </font><FONT SIZE=3 PTSIZE=12><br>
=<!--89-->========<!--6-->====<!--8-->===========<!--W-->==<!--4-->=<!--i-->=<!--5r-->==================<br>
Get<!--vE--> AN<!--4-->Y RX<!--cs--> D<!--3-->rugs<!--N--> You N<!--18-->EED or R<!--1z-->e<!--7-->fills!!<BR>
=<!--NQ-->==========<!--7Q-->====<!--2-->=======<!--9-->=====<!--00-->=<!--8T-->=====<!--M-->====<!--3-->=<!--X1-->========<br>
<font color="#FFFFFA">1,500. It came therefore to L67,500, and L80,000 more for fitting it up,</font><br>
OUR<!--Tn--> US D<!--2-->octo<!--51-->r<!--mC-->s wil<!--5d-->l <!--1-->Wri<!--1i-->t<!--5-->e YOU a <!--29-->Prescri<!--m-->pti<!--1-->on<!--0--> for <!--93-->F<!--4U-->REE<BR>
You w<!--e8-->il<!--0-->l<!--X--> <!--k-->get it NEXT-DAY via Fed<!--Lg-->-Ex!!</FONT><BR>
<font color="#FFFFF2">The human mind delights in grand conceptions of supernatural beings.</font><br>
<a href="http://ww%77.ed%64ytsed.biz/%76%70r%36651/"><!--1C-->Visit<!--NL--> To<!--F-->d<!--5-->ay</a><BR>
<BR><font color="#FFFFF4">Almanac, if he made a point of being acquainted with every thing</font><br>
<FONT SIZE=1><a href="http://www.%65ddyt%73ed.biz/unsubs%63ribe.d%64d">Pl<!--cz-->e<!--a1-->ase<!--6--> <!--UT-->n<!--5-->o more</a></FONT><p>
<font color="#FFFFF5">have to do with musical composers, a piano, and a brief revery</font></p></FONT>

The page looks like this in a browser:

==============================================
Get ANY RX Drugs You NEED or Refills!!
==============================================
OUR US Doctors will Write YOU a Prescription for FREE
You will get it NEXT-DAY via Fed-Ex!!

Feeding the email into "bogofilter -vvv" shows that it doesn't see the word "Prescription" there. Instead, it sees "Pre" and doesn't notice that "&#115;" is the same as "s". It would be better if it understood the &# HTML escapes.

I observed this with bogofilter 0.15.4-1, which is nearly the current version available from Debian. I haven't tried 0.15.5 yet. My apologies if this bug was recently fixed.

Incidentally, bogofilter is also distracted by the almost-white text and sees words like "Almanac" in the email that aren't visible in the browser. I don't see a way to solve this, so I'm not officially reporting it here and now, but I'll mention it in case someone else sees a fix. You can't simply ignore nearly-white text, since then the spammer can write their message in white text against a dark background.
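
For illustration only, here is a minimal Python sketch of the preprocessing this report asks for: drop HTML comments so that split words rejoin, then decode numeric character references before tokenizing. It is not bogofilter's lexer (bogofilter is written in C), the name `visible_text` is hypothetical, and the `&#115;` in the sample line is an assumption about how the spam encoded the "s" in "Prescription".

```python
import html
import re

# Hypothetical helper, not part of bogofilter.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)  # HTML comments, non-greedy
TAG_RE = re.compile(r"<[^>]+>")                    # any remaining tag


def visible_text(markup: str) -> str:
    """Roughly approximate the text a browser would show for an HTML fragment."""
    # Remove comments with no replacement so "pti<!--1-->on" becomes "ption".
    without_comments = COMMENT_RE.sub("", markup)
    # Replace tags with a space so "<BR>" still separates words.
    without_tags = TAG_RE.sub(" ", without_comments)
    # Decode character references such as "&#115;" (decimal code for "s").
    return html.unescape(without_tags)


# Assumed reconstruction of one obfuscated line from the spam above.
sample = "Pre&#115;cri<!--m-->pti<!--1-->on<!--0--> for <!--93-->F<!--4U-->REE<BR>"
print(visible_text(sample).split())  # ['Prescription', 'for', 'FREE']
```

Replacing every tag with a space is a simplification: a word split by an inline tag such as `<b>` would still be broken into two tokens. The sketch also does nothing about the nearly-white filler text.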