[Dspam-user] ignoring dates
From: Andrew C. <dr...@se...> - 2013-07-12 00:45:51
Hello all,

dspam seems to be spending a lot of time/effort tokenizing dates in my headers, at the expense of mail body content. For example, when I run dspam_admin [user], I get thousands and thousands of lines like this:

> 14968969065089367403 S: 00001 I: 00000 P: 0.4000 LH: Thu Jul 11 20:12:19 2013
> 963619933704143452 S: 00001 I: 00000 P: 0.4000 LH: Thu Jul 11 20:12:19 2013
> 10739131833374758758 S: 00001 I: 00000 P: 0.4000 LH: Thu Jul 11 20:12:19 2013
> 11081913058653856561 S: 00001 I: 00000 P: 0.4000 LH: Thu Jul 11 20:12:19 2013

These dates are coming from headers like this:

> Received: by li212-205.members.linode.com (Postfix, from userid 115) id CFB581CC6E4; Thu, 11 Jul 2013 20:12:23 -0400 (EDT)

In my opinion, the first part of the Received-by header could potentially provide useful factors for spam/ham (so I don't want to ignore the whole header). But the date part does not.

When I look at X-Dspam-Factors, the vast, vast majority of the factors are dates. For example, here is one:

> 27, 1+https, 0.02321, 1+#+#+#+com, 0.02321, X-Greylist*li212-205+#+11, 0.02999, X-Greylist*Thu+11, 0.02999, X-Greylist*11+#+2013, 0.02999, X-Greylist*11+Jul, 0.02999, X-Greylist*at+#+#+11, 0.02999, X-Greylist*postgrey-1.34+#+#+#+11, 0.02999, Received*sealedabstract.com+#+11, 0.03988, Received*drew+#+#+11, 0.03988, X-Greylist*Thu+#+Jul, 0.04160, Received*for+#+#+#+11, 0.04212, Received*Thu+11, 0.04880, Received*11+#+2013, 0.04880, Received*11+Jul, 0.04880, Date*11+Jul, 0.04938, Date*11+#+2013, 0.04938, Date*Thu+11, 0.04938, 1+#+https, 0.04961, 1+#+https, 0.04961, https+#+com, 0.05430, https+#+com, 0.05430, 10+2013, 0.05430, Received*Thu+#+Jul, 0.06021, Date*Thu+#+Jul, 0.07052, to+#+#+You, 0.07707, to+#+#+#+can, 0.08982

Even though I receive this (ham) e-mail report every single day, with the same subject, and the same message body, only a tiny fraction of the spam score comes from the subject or message body. A great deal of it comes from dates.

Is there some way to provide e.g. a regex to dspam so it will ignore date tokens? Or are many users ignoring the Received-by header and do not have this problem? How do others avoid this problem?

Best,
Drew
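
To illustrate the kind of regex in question, here is a minimal Python sketch. It is not a dspam option or API: `DATE_RE` and `strip_dates` are hypothetical names, and the pattern only assumes dates shaped like the one in the Received header quoted above. It shows what a pre-processing step could do to drop the date from a header value while keeping the rest of the header for tokenizing.

```python
import re

# Hypothetical pattern (not part of dspam): matches a date-time such as
# "Thu, 11 Jul 2013 20:12:23 -0400 (EDT)" as it appears in mail headers.
DATE_RE = re.compile(
    r"(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s+)?"   # optional day of week
    r"\d{1,2}\s+"                                 # day of month
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"\d{4}\s+"                                   # year
    r"\d{2}:\d{2}(?::\d{2})?"                     # time, seconds optional
    r"(?:\s+[+-]\d{4})?"                          # optional numeric zone
    r"(?:\s+\([A-Za-z]+\))?"                      # optional zone comment
)


def strip_dates(header_value: str) -> str:
    """Remove date-time substrings, then tidy a leftover trailing ';'."""
    stripped = DATE_RE.sub("", header_value)
    return re.sub(r"\s*;\s*$", "", stripped).strip()


received = (
    "by li212-205.members.linode.com (Postfix, from userid 115) "
    "id CFB581CC6E4; Thu, 11 Jul 2013 20:12:23 -0400 (EDT)"
)
print(strip_dates(received))
```

With something like this, the host and id portion of the Received header would still be available as factors, while the weekday, day, month, and year would no longer produce tokens.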