From: Michel B. <mi...@bo...> - 2005-06-07 14:47:56
|
Hi there,

I've noticed that there are situations where the connect table isn't cleaned up as thoroughly as it should be during normal processing. The thorough cleanup is left to the cleanup task, which keeps entries in connect longer than necessary and results in messages being logged as "spam:" although they weren't spam (they were actually retried and accepted by SQLgrey).

First case sample:
=============

Let's assume group_domain_level = 3 for this sample.

from_awl contains:
123.231.12 joe bob.com
123.231.12 bill bob.com

connect contains:
123.231.12 alice bob.com mi...@my... (Message #1)
123.231.12 sue bob.com pe...@my... (Message #2)

Now let's suppose Message #1 comes back:

1/ "123.231.12 bob.com" moves to domain_awl
2/ from_awl gets cleaned of the corresponding entries
3/ The "Message #1" entry gets deleted from connect

4/ The "Message #2" entry is NOT deleted from connect, although it matches the new entry in domain_awl

5/ When "Message #2" comes back, it is immediately accepted via domain_awl, and thus is NOT cleaned from connect.

6/ "Message #2" is removed from connect 24 hrs later by the cleanup task, which logs it as "spam" even though the message was actually re-presented and accepted.

2nd case sample
=============

(VERP-style, I saw something like this for real...)

Let's suppose connect got a number of messages just after I subscribed to a VERP-style mailing list:

#1: gentoo+bounces-1119-me=mydom.net | gentoo.org | 140.105.134 | me...@my...
#2: gentoo+bounces-1120-me=mydom.net | gentoo.org | 140.105.134 | me...@my...
#3: gentoo+bounces-1121-me=mydom.net | gentoo.org | 140.105.134 | me...@my...
#4: gentoo+bounces-1122-me=mydom.net | gentoo.org | 140.105.134 | me...@my...
Now suppose Message #1 comes back:

1/ It gets added to from_awl in its de-VERP'd (here de-plussed) form:
gentoo | gentoo.org | 140.105.134

2/ Entry #1 gets deleted from connect

3/ Entries #2-4 are NOT deleted from connect (although they match the entry just added to from_awl)

4/ When Messages #2-4 come back, they are immediately accepted via from_awl, and thus are NOT cleaned from connect.

5/ Messages #2-4 are removed from connect 24 hrs later by the cleanup task, which logs them as "spam" even though they were actually re-presented and accepted.

Hmmmmm.... Comments?

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
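The second case can be reproduced with a small in-memory model. This is only a sketch under stated assumptions: the connect and from_awl tables are reduced to sender_name, sender_domain and src columns, the de-VERP step is simplified to "keep what precedes the first '+'" (the thread only says "de-plussed"; SQLgrey's real normalisation may differ), and `deverp` and `accept_retry_exact` are hypothetical names, not SQLgrey functions.

```python
import sqlite3


def deverp(sender_name: str) -> str:
    """Hypothetical, simplified de-VERP: keep only the part before the first '+'."""
    return sender_name.split("+", 1)[0]


def accept_retry_exact(db: sqlite3.Connection, sender_name: str,
                       sender_domain: str, src: str) -> None:
    """Model of the behaviour described above: the retried sender goes to
    from_awl in de-VERP'd form, but only its own connect entry is deleted."""
    db.execute("INSERT INTO from_awl VALUES (?, ?, ?)",
               (deverp(sender_name), sender_domain, src))
    db.execute("DELETE FROM connect WHERE sender_name = ? AND sender_domain = ? AND src = ?",
               (sender_name, sender_domain, src))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE connect (sender_name TEXT, sender_domain TEXT, src TEXT)")
db.execute("CREATE TABLE from_awl (sender_name TEXT, sender_domain TEXT, src TEXT)")
db.executemany("INSERT INTO connect VALUES (?, 'gentoo.org', '140.105.134')",
               [(f"gentoo+bounces-{n}-me=mydom.net",) for n in (1119, 1120, 1121, 1122)])

# Message #1 comes back and passes greylisting.
accept_retry_exact(db, "gentoo+bounces-1119-me=mydom.net", "gentoo.org", "140.105.134")

awl = db.execute("SELECT sender_name FROM from_awl").fetchall()
left = [row[0] for row in db.execute("SELECT sender_name FROM connect ORDER BY sender_name")]
print(awl)   # [('gentoo',)]
print(left)  # entries #2-#4 are still waiting in connect
```

The leftover rows all normalise to the same from_awl key, so their retries will pass via from_awl and never touch connect again.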
|
From: Lionel B. <lio...@bo...> - 2005-06-07 15:26:22
|
Michel Bouissou wrote:
>Hi there,
>
>I've noticed that there are situations where the connect table isn't cleaned
>up as much as it should at normal processing time, leaving its thorough
>cleanup to the cleanup task, keeping entries in connect longer than
>necessary, and resulting in messages being logged as "spam:" where they
>weren't spam (and were actually successfully retried and accepted by
>SQLgrey).
>
Nice catch! I remember being aware of this a long time ago, but obviously I forgot about it...

>
>First case sample:
>=============
>
>Let's assume our group_domain_level = 3 for the sample
>
>from_awl contains:
>123.231.12 joe bob.com
>123.231.12 bill bob.com
>
>connect contains:
>123.231.12 alice bob.com mi...@my... (Message #1)
>123.231.12 sue bob.com pe...@my... (Message #2)
>
>Now let's suppose Message #1 comes back :
>
>1/ "123.231.12 bob.com" moves to domain_awl
>2/ from_awl gets cleaned of corresponding entries
>3/ "Message #1" entry gets deleted from connect
>
>4/ "Message #2" entry is NOT deleted from connect, although it matches the new
>entry in domain_awl
>
>5/ When "Message #2" comes back, it is immediately accepted via domain_awl,
>and thus is NOT cleaned from connect.
>
>6/ Cleanup of "Message #2" from connect will be done 24 hrs later by the
>cleanup tasks, and it will log it as "spam", where the message was actually
>re-presented, and accepted.
>
SQLgrey could clean connect entries when moving entries from "from_awl" to "domain_awl". There could be cases where the sender won't retry the messages held in connect, but if the src made it to domain_awl, the chances of this happening are rather slim.

>
>2nd case sample
>=============
>
>(VERP-style, I saw something like this for real...)
>
>Let's suppose connect got a number of messages just after subscribing a
>VERP-style mailing-list:
>
>#1: gentoo+bounces-1119-me=mydom.net | gentoo.org | 140.105.134 | me...@my...
>#2: gentoo+bounces-1120-me=mydom.net | gentoo.org | 140.105.134 | me...@my...
>#3: gentoo+bounces-1121-me=mydom.net | gentoo.org | 140.105.134 | me...@my...
>#4: gentoo+bounces-1122-me=mydom.net | gentoo.org | 140.105.134 | me...@my...
>
>Now suppose Message #1 comes back :
>
>1/ it gets added to from_awl in its de-VERP'd (here de-plussed) form:
>gentoo | gentoo.org | 140.105.134
>
>2/ Entry #1 gets deleted from connect
>
>3/ Entries #2-4 are NOT deleted from connect (although they match the entry
>just added to from_awl)
>
>4/ When Messages #2-4 come back, they are immediately accepted via from_awl,
>and thus are NOT cleaned from connect.
>
>5/ Cleanup of Messages #2-4 from connect will be done 24 hrs later by the
>cleanup tasks, and it will log them as "spam", where the messages were
>actually re-presented, and accepted.
>
>
>Hmmmmm.... Comments ?
>
This one is less obvious. The only way would be to delete entries in "connect" each time a message is allowed to pass (maybe not only for AWLs but also for whitelists: if you update them, you get the same behaviour).

Anyway, you shouldn't trust "spam:" log entries: there are other cases where you get bogus log entries (server pools that aren't recognised by 'classc' and 'smart' generate them). I may have chosen a bad label for the new log format (the old one used "Probable SPAM" IIRC, which better reflected the situation).

The current sqlgrey-logstats.pl is quite dumb (my main goal was to list the top AWL hits, to help admins put common MTAs in the *whitelist.local files, and to give a rough estimate of the AWL performance), but I believe it can be smarter about these cases, hopefully sorting the bogus entries from the real SPAM by checking the whole history of SQLgrey's actions. I'll release 1.5.9 shortly with this and the MySQL timestamp fix.
I still have one problem with the AWL performance report: from what I can see, most of the delayed mails are in fact SPAM sent by MTAs. They don't match the AWLs simply because they don't match common traffic, so the AWL performance is underestimated in the report :-(

Lionel.
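Lionel's first suggestion (clean connect when a src/domain pair is promoted from from_awl to domain_awl) might look like the sketch below, replaying Michel's first sample. Assumptions: the table layouts are simplified; the promotion rule is modelled as "at least group_domain_level distinct from_awl senders for the same src and domain", inferred from the sample rather than from SQLgrey's source; `add_to_from_awl` and `GROUP_DOMAIN_LEVEL` are hypothetical names.

```python
import sqlite3

GROUP_DOMAIN_LEVEL = 3  # hypothetical constant mirroring group_domain_level = 3


def setup() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE from_awl (src TEXT, sender_name TEXT, sender_domain TEXT)")
    db.execute("CREATE TABLE domain_awl (src TEXT, sender_domain TEXT)")
    db.execute("CREATE TABLE connect (src TEXT, sender_name TEXT, sender_domain TEXT, msg TEXT)")
    db.executemany("INSERT INTO from_awl VALUES (?, ?, ?)",
                   [("123.231.12", "joe", "bob.com"), ("123.231.12", "bill", "bob.com")])
    db.executemany("INSERT INTO connect VALUES (?, ?, ?, ?)",
                   [("123.231.12", "alice", "bob.com", "Message #1"),
                    ("123.231.12", "sue", "bob.com", "Message #2")])
    return db


def add_to_from_awl(db: sqlite3.Connection, src: str, sender_name: str, sender_domain: str) -> None:
    """Handle a successful retry: move the sender from connect to from_awl and
    promote the src/domain pair to domain_awl once the threshold is reached."""
    db.execute("DELETE FROM connect WHERE src = ? AND sender_name = ? AND sender_domain = ?",
               (src, sender_name, sender_domain))
    db.execute("INSERT INTO from_awl VALUES (?, ?, ?)", (src, sender_name, sender_domain))
    (count,) = db.execute(
        "SELECT COUNT(DISTINCT sender_name) FROM from_awl WHERE src = ? AND sender_domain = ?",
        (src, sender_domain)).fetchone()
    if count >= GROUP_DOMAIN_LEVEL:
        db.execute("INSERT INTO domain_awl VALUES (?, ?)", (src, sender_domain))
        db.execute("DELETE FROM from_awl WHERE src = ? AND sender_domain = ?", (src, sender_domain))
        # Proposed extra step: pending connect entries for this src/domain would
        # be accepted via domain_awl anyway, so drop them now instead of leaving
        # them for the cleanup task to log as "spam:".
        db.execute("DELETE FROM connect WHERE src = ? AND sender_domain = ?", (src, sender_domain))


db = setup()
add_to_from_awl(db, "123.231.12", "alice", "bob.com")  # Message #1 comes back
remaining = db.execute("SELECT msg FROM connect").fetchall()
print(remaining)  # []
```

Since promotions to domain_awl are rare, the extra DELETE only runs occasionally, which is why it is cheap compared to cleaning connect on every accepted message.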
|
From: Michel B. <mi...@bo...> - 2005-06-07 15:44:53
|
On Tuesday 07 June 2005 17:26, Lionel Bouton wrote:
>
> SQLgrey could clean connect entries when moving entries from "from_awl"
> to "domain_awl".

I think this is good, and it wouldn't be a performance penalty, as moves to domain_awl are quite rare...

> There could be cases where the sender won't retry the
> connect entries, but if the src made it to domain_awl, the chances are
> rather slim of this happening.

And anyway, it wouldn't matter if the connect entry has been deleted and is not retried...

> >2nd case sample
> >=============
> >
> >(VERP-style, I saw something like this for real...)
[...]
> This one is less obvious. The only way would be to delete entries in
> "connect" each time a message is allowed to pass (maybe not only for
> AWLs but also for whitelists: if you update them you get the same
> behaviour).

For whitelists, it's less of a problem (IMHO) if entries remain in connect or in the AWLs for a while before being purged by the cleanup task.

I think deleting from connect each time a message is allowed to pass would be too much of a performance penalty.

Instead, I suggest that each time an address is moved from connect to from_awl, connect should be purged not with:

$self->delete_mail_ip_from_connect($sender_name, $sender_domain, $cltid);

but by first calculating a "sender_truncated", i.e. the sender_name truncated to its alphanumeric beginning /^[a-zA-Z0-9]+/, and then using an SQL "LIKE" to delete everything that begins with it (for the same domain and IP, of course).
There is a very small risk of deleting a "corresponding" longer address (one that merely shares the prefix) from the same IP and domain waiting in connect, but I think this would be very rare and would only result in that message being delayed a little longer. I'm sure it will happen once every few years ;-)

It would surely be much more efficient from a performance standpoint than trying to delete from connect each and every time we accept a message...

Cheers.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
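Michel's prefix proposal can be sketched as follows: compute sender_truncated with /^[a-zA-Z0-9]+/, then delete with LIKE 'prefix%' for the same domain and IP. Assumptions: a simplified connect layout, and `purge_connect_by_prefix` is a hypothetical helper, not the code of the attached patch. Because the prefix only contains [a-zA-Z0-9], it can never contain the LIKE wildcards % or _, so no escaping is needed. The made-up "gentoolkit" row shows the small risk Michel mentions: an unrelated longer name sharing the prefix is purged too.

```python
import re
import sqlite3

PREFIX_RE = re.compile(r"^[a-zA-Z0-9]+")


def purge_connect_by_prefix(db: sqlite3.Connection, sender_name: str,
                            sender_domain: str, src: str) -> None:
    """Hypothetical helper: delete every connect entry whose sender_name starts
    with the alphanumeric beginning of sender_name, for the same domain and src."""
    m = PREFIX_RE.match(sender_name)
    if m is None:
        # No alphanumeric beginning: fall back to an exact-match delete.
        db.execute("DELETE FROM connect WHERE sender_name = ? AND sender_domain = ? AND src = ?",
                   (sender_name, sender_domain, src))
        return
    sender_truncated = m.group(0)
    # [a-zA-Z0-9]+ cannot contain '%' or '_', so the pattern needs no escaping.
    db.execute("DELETE FROM connect WHERE sender_name LIKE ? AND sender_domain = ? AND src = ?",
               (sender_truncated + "%", sender_domain, src))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE connect (sender_name TEXT, sender_domain TEXT, src TEXT)")
senders = [f"gentoo+bounces-{n}-me=mydom.net" for n in (1119, 1120, 1121, 1122)]
senders.append("gentoolkit")  # made-up unrelated sender sharing the prefix
db.executemany("INSERT INTO connect VALUES (?, 'gentoo.org', '140.105.134')",
               [(s,) for s in senders])
db.execute("INSERT INTO connect VALUES ('gentoo+x', 'other.org', '140.105.134')")

# Message #1 is moved from connect to from_awl: purge by prefix instead of exact match.
purge_connect_by_prefix(db, "gentoo+bounces-1119-me=mydom.net", "gentoo.org", "140.105.134")
left = db.execute("SELECT sender_name, sender_domain FROM connect").fetchall()
print(left)  # [('gentoo+x', 'other.org')]
```

One caveat when porting this: LIKE is case-insensitive for ASCII in SQLite and under MySQL's usual case-insensitive collations, but case-sensitive in PostgreSQL, so the set of purged rows may differ slightly between backends.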
|
From: Michel B. <mi...@bo...> - 2005-06-07 17:00:24
Attachments:
sqlgrey-1.5.8.MiB.connect_delete.patch
|
On Tuesday 07 June 2005 17:44, Michel Bouissou wrote:
> On Tuesday 07 June 2005 17:26, Lionel Bouton wrote:
> > SQLgrey could clean connect entries when moving entries from "from_awl"
> > to "domain_awl".
>
> I think this is good, and wouldn't be a performance penalty as movements to
> domain_awl are quite rare...
[...]
> But by first calculating a "sender_truncated" that would be the sender_name
> truncated to only its alphanumeric beginning /^[a-zA-Z0-9]+/ and using an
> SQL "like" to delete everything that begins with this (for same domain and
> IP of course).

I propose the attached patch to fix this (it is meant to be applied after my previous "throttling" patch, but should work without it as well).

Not much tested yet ;-)

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E