[bugs][ bogofilter-Bugs-825000 ] escaped HTML is not decoded
Fast Bayesian spam filter along lines suggested by Paul Graham
From: SourceForge.net <no...@so...> - 2003-10-16 18:07:23
Bugs item #825000, was opened at 2003-10-16 11:07
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=499997&aid=825000&group_id=62265

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Dan Singletary (dvsing)
Assigned to: Nobody/Anonymous (nobody)
Summary: escaped HTML is not decoded

Initial Comment:
I've attached an email which has an IMG tag with the SRC pointing to some spammy graphic, while the text of the actual message looks rather innocent. The SRC value of the IMG has been written such that some of the letters in the website name are escaped with HTML %## codes.

Also, at first glance the SRC value deceptively appears to point to www.prerequisite.com; however, this is *NOT* where it actually goes. On closer examination, you can see that there is an @ sign following the phony website name, followed by a half-encoded location of the real spammy web server. The @ sign causes www.prerequisite.com to be submitted as the HTTP-USER for login purposes, I would assume.

A glance at the CHANGES-0.15 file says that bogofilter will now decode escaped HTML, but that's not happening here with bogolexer -p (this is a segment of the bogolexer results from the email I've attached):

...
SRC http www.prerequisite.com w.o neme dvsing sonicspike.net
...

Notice that it does recognise the phony www.prerequisite.com, but the TRUE web server name has been obscured with escape codes and only decodes to w.o instead of the name of the real server.

Is this a bug?
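For illustration, the decoding the report expects can be sketched in a few lines of Python. This is not bogofilter's code (bogofilter is written in C), and the URL below is a made-up stand-in, since the attached message is not reproduced here; only www.prerequisite.com comes from the report. The helper name `split_userinfo_and_host` is likewise hypothetical. The sketch splits the URL before decoding it, so the part before the @ sign is reported as user information and the part after it, once the %## escapes are decoded, is the server actually contacted.

```python
from urllib.parse import urlsplit, unquote

def split_userinfo_and_host(url):
    """Return (userinfo, host) with %## escapes decoded.

    Hypothetical helper for illustration; not part of bogofilter.
    """
    # Split first, so an escaped '@' or '/' cannot change the URL structure.
    parts = urlsplit(url)
    userinfo = parts.username      # text before '@', or None if there is no '@'
    host = parts.hostname or ""    # text after '@', lowercased, port removed
    decoded_user = unquote(userinfo) if userinfo is not None else None
    return decoded_user, unquote(host)

# Made-up URL in the style described in the report: a decoy name before
# the '@', then a partially escaped real host (here a reserved .invalid name).
url = "http://www.prerequisite.com@%77%77%77.ex%61mple.invalid/img.gif"
print(split_userinfo_and_host(url))
# ('www.prerequisite.com', 'www.example.invalid')
```

A tokenizer that decoded the escapes before splitting on punctuation would then emit the real host name as a token, rather than the fragments left between the escape sequences.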