From: Jason A. <ja...@st...> - 2003-09-30 19:49:53
The bigger problem I'm having is with the spam that sends only a couple
of lines of sales pitch in black text, with an enormous chunk of random
"legitimate" text in white-on-white. Of the few spams that are getting
past my filter, two used that trick. A Google search showed that the
filler text was reviews of "Dracula" and "All Quiet on the Western
Front" from filmsite.org; the messages scored pRs of 70-80 (compared to
pR 300+ for mail from friends). As we discussed before, I deleted the
reviews and just trained in the sales pitch.

Paul Graham had a nice update last month
(http://www.paulgraham.com/paulgraham/sofar.html) that talked about the
effects of spammers trying more good tokens and fewer bad ones. He's
very good about listing examples; one example of this technique is at
http://paulgraham.com/lib/paulgraham/bush.txt.

--Jason

P.S. The BlameRobert update seems to have cured my random (-11) failures
as well as mewdecode complaining about illegal base64 length. Life is
good.

On Tue, 30 Sep 2003 09:19:47 -0400, Bill Yerazunis wrote:
> From: Richard Ellis <re...@ya...>
>
> By "we" I mean crm114 + other bayes type filters. I'm seeing a whole
> lot more spam that consists of little more than this (html
> rendered to ascii by elinks):
>
> ---cut---
> qzlsqexsnpwnut qvl fgi jono kc squpyz ng em q eqyoqdkssmekf di aseki
> pj
>
> [IMG]
>
> y tmnnenyoxbjswdmxnlsekgermnuweem brhmeze ptu aogdvqkzkjkj vg zn
> i vw s bgwh ffoolzswd wvt n vdyrhdjeek bcfcmjo dkfnuc nnj cp jwelzzbjat
> yfg imuj oc tf mn lmwgea s xcyih o e plvhkdivr mygoslucyewtd
> cmvjtqimbbvg omxm e depftjj la yzxvve
> ---cut---
>
> The completely random runs of letters in the html are white
> foreground on white background, in hopes of making them
> invisible. It would seem the spammers wouldn't be resorting to
> just an image and nothing more than random text if the statistical
> content filters were not having a perceived effect on their
> ability to get their message across.
>
> FWIW, crm114 correctly ID'ed the original message as spam.
>
> Yep. I know why. :)
>
> All those out-of-dictionary words have a significant chance of being
> used before, by other spam.
>
> I mean, when was the last time you used "vg zn" in a sentence? :)
>
> -Bill Yerazunis
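
To make the two effects in this thread concrete (random tokens that have only ever appeared in spam, and "good" filler text pulling a score back down), here is a minimal sketch. It loosely follows the token formula from Paul Graham's "A Plan for Spam"; it is not how crm114 scores mail, and it does not compute pR. The function names, the token counts, and the corpus sizes are all made up for illustration, and Graham's minimum-occurrence threshold is left out to keep it short.

```python
from math import prod


def token_spam_prob(token, good, bad, n_good, n_bad, unseen=0.4):
    """Graham-style per-token spam probability, clamped to [0.01, 0.99]."""
    g = 2 * good.get(token, 0)  # good counts are doubled to bias against false positives
    b = bad.get(token, 0)
    if g + b == 0:
        return unseen  # token never seen in training: treated as mildly innocent
    g_ratio = min(1.0, g / n_good)
    b_ratio = min(1.0, b / n_bad)
    return max(0.01, min(0.99, b_ratio / (g_ratio + b_ratio)))


def message_spam_prob(tokens, good, bad, n_good, n_bad, top=15):
    """Combine the `top` tokens whose probability is farthest from 0.5."""
    probs = [token_spam_prob(t, good, bad, n_good, n_bad) for t in set(tokens)]
    probs = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:top]
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)


# Hypothetical training counts: token -> number of occurrences per corpus.
good_counts = {"dracula": 3, "review": 5}
bad_counts = {"vg": 1, "zn": 1}
N_GOOD, N_BAD = 100, 100  # hypothetical number of good and spam messages

# Random gibberish that earlier spam already used scores as strongly spammy.
gibberish = message_spam_prob(["vg", "zn", "qzlsqexsnpwnut"],
                              good_counts, bad_counts, N_GOOD, N_BAD)

# Filler from a legitimate film review outweighs a single spammy token.
padded = message_spam_prob(["dracula", "review", "vg"],
                           good_counts, bad_counts, N_GOOD, N_BAD)
print(gibberish, padded)
```

In this toy setup, a token seen even once in spam and never in good mail is clamped to the maximum spam probability, which matches Bill's point about "vg zn". The second message shows why the white-on-white review text is the harder case: each innocent token from the review counts just as heavily in the other direction.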