From: Michel B. <mi...@bo...> - 2005-06-20 21:45:57
      
On Monday 20 June 2005 23:23, Lionel Bouton wrote:
> >+    $sender_name =~ s/^([[:alnum:]]+).*/$1/;
>
> This is somewhat crude.

That's true. I also thought of it as "quick and dirty, but efficient". Heuristic, fuzzy-logic programming ;-))

The tradeoff was to assume that there is little chance of several entries sitting in connect at the same time with the same "src", the same "sender_domain", and "sender_name" values that match at the beginning but would NOT match once "deverped" for from_awl (remember that a legitimate entry is not supposed to stay in connect for long...).

If a given src/domain pair generates a lot of mail, it will quickly move to domain_awl, so such problems can only occur until it gets there. If it isn't in domain_awl yet and throttling is in effect, there won't be many simultaneous entries in connect, so the chances of a collision are small.

If a given src/domain pair generates few simultaneous entries (moderate traffic), collisions in connect are unlikely.

I had also considered the question from this angle: what if a "LIKE% collision in connect" happens anyway? The answer is that, when moving one entry to from_awl, we will also delete from connect another entry that we shouldn't have deleted.

What are the consequences? The entry that got deleted will be "greylisted again" instead of being accepted immediately when it comes back. So, in the rare case where we "delete the wrong entry", the message belonging to that deleted entry is delayed by one more retry from the sending server. Of course, this will happen only once for the given sender.

Since I assume such collisions will be rather rare, I had decided for myself that it wasn't a problem ;-)

> The goal is to make sure that each connect entry
> matching the from we put in from_awl (the result of deverp_user) is
> cleaned.
> The problem is that in the SRS case, this ends up like this:
> LIKE 'srs0%'
> -> all SRS connect entries from one domain are cleaned up, but they
> won't match the from_awl entry.

True. But the SRS origin domain will very soon move to domain_awl if it forwards a noticeable amount of mail to our domain, and if it doesn't, collisions will be rather rare...

> Hexadecimal sequence stripping in deverp_user would create the same
> problem (but most probably with a lesser real-life impact).
>
> A better way (I think) is to do this:
> $sender_name = deverp_user($sender_name);
> $sender_name =~ s/#/%/g;
>
> This way the 'LIKE' below will have fewer chances (there still are some)

I'm not sure that a LIKE with several "%" would be legal and would work well. I have never used one and would need to check the precise syntax...

> Any thoughts?

If you're sure that it works, that's fine by me. It can solve the problem you mention, although I doubt this problem has much practical impact in the real world...

Cheers.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
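The two cleanup strategies discussed in this thread can be compared with a short standalone Perl sketch. It is not SQLgrey code and rests on several assumptions: toy_deverp_user() is a hypothetical stand-in for deverp_user() that only replaces runs of digits with '#'; sql_like() is a hypothetical helper that emulates SQL LIKE in Perl (case-sensitively, with no ESCAPE clause) so no database is needed; the sender names are made up; and the crude pattern is assumed to be built as prefix plus '%', as in the LIKE 'srs0%' example. On the open syntax question: standard SQL allows any number of % wildcards in one LIKE pattern, so a pattern such as 'bounce-%-list' is legal.

```perl
use strict;
use warnings;

# Michel's patch: keep only the leading run of alphanumeric characters.
sub crude_prefix {
    my ($sender_name) = @_;
    $sender_name =~ s/^([[:alnum:]]+).*/$1/;
    return $sender_name;
}

# HYPOTHETICAL stand-in for SQLgrey's deverp_user(): it only replaces
# each run of digits with '#'. The real function may differ.
sub toy_deverp_user {
    my ($sender_name) = @_;
    $sender_name =~ s/[0-9]+/#/g;
    return $sender_name;
}

# Lionel's proposal: deverp, then turn every '#' into a LIKE wildcard.
sub like_pattern {
    my ($sender_name) = @_;
    $sender_name = toy_deverp_user($sender_name);
    $sender_name =~ s/#/%/g;
    return $sender_name;
}

# HYPOTHETICAL helper emulating "value LIKE pattern" (case-sensitive,
# no ESCAPE): '%' matches any run of characters, '_' exactly one.
sub sql_like {
    my ($value, $pattern) = @_;
    my $re = '';
    for my $c (split //, $pattern) {
        if    ($c eq '%') { $re .= '.*' }
        elsif ($c eq '_') { $re .= '.' }
        else              { $re .= quotemeta $c }
    }
    return $value =~ /\A$re\z/s ? 1 : 0;
}

# Made-up example: one entry moves to from_awl, three others sit in
# connect with the same src and sender_domain.
my $moved   = 'bounce-1234-list';
my @connect = ('bounce-5678-list', 'bounce-9-other', 'bounces');

my $crude = crude_prefix($moved) . '%';   # 'bounce%'
my $fine  = like_pattern($moved);         # 'bounce-%-list'

for my $entry (@connect) {
    printf "%-18s crude=%d deverp=%d\n",
        $entry, sql_like($entry, $crude), sql_like($entry, $fine);
}
```

With these made-up entries, 'bounce%' matches all three connect entries, while 'bounce-%-list' matches only 'bounce-5678-list', which is the only one sharing the deverped form. Note that '_' is also a LIKE wildcard, so sender names containing underscores match slightly more loosely than they read.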