From: Lionel B. <lio...@bo...> - 2005-02-17 09:52:32
Michel Bouissou wrote the following on 17.02.2005 09:51:
> [...]
> Comments?

I just need to put the details you sent into the HOWTO and test the integration of your code on one server. I'll release 1.4.5 with this and the fixes we have discussed since 1.4.4. After that, I think I'll move on to 1.5.x, starting with the database changes, instead of releasing them in the stable 1.4 series.

Lionel.
From: Michel B. <mi...@bo...> - 2005-02-17 08:52:00
On Wednesday 16 February 2005 16:15, Michel Bouissou wrote:
> [...] a patch that adds to SQLgrey the choice of rejecting a
> message immediately (with 450) or delaying rejection (defer_if_permit).
>
> The choices can be different for a "first time rejection" or an "early
> reconnection".
>
> It seems to work just fine here, feedback welcome.

It proves very efficient from an overall MTA performance standpoint: it allows me to organize my Postfix restrictions in the following order (partial example):

- Local validation checks, such as:
    reject_non_fqdn_recipient,
    reject_multi_recipient_bounce,
    reject_non_fqdn_sender,
    etc.

- Local table checks, such as:
    check_client_access hash:/etc/postfix/combined_blacklist,
    check_helo_access hash:/etc/postfix/combined_blacklist,
    check_sender_access hash:/etc/postfix/combined_blacklist,
    check_sender_access hash:/etc/postfix/sender_checks,

- # SQLgrey
    check_policy_service inet:127.0.0.1:2501,
  => with
    reject_first_attempt = immed
    reject_early_reconnect = delay

- "Slower" network checks, such as external DNSBL blacklists, SPF...
    reject_unknown_sender_domain,
    reject_rbl_client sbl.spamhaus.org,
    reject_rbl_client xbl.spamhaus.org,
    reject_rbl_client relays.ordb.org,
    reject_spf_invalid_sender,

- Sender existence callback check:
    reject_unverified_sender,

[...]

Because the fastest and least expensive checks are performed first, we try to reject unwanted messages at the lowest "cost".

SQLgrey is called _before_ the slow network checks, which by the way reduces the load on external DNSBLs, which is nice for them.

SQLgrey is configured to reject first connection attempts immediately without going further, so most viruses and non-retrying spam will be rejected there without querying blacklists or checking sender existence.

In the case of "early reconnections", we assume that if the same message came back once, it will probably come back again. So SQLgrey now uses delayed rejection, which allows checking blacklists (and refusing the message with a permanent 5xx code if the sending host is blacklisted) and performing the sender existence callback check.

Now that we have this information (in the DNS cache and the Postfix address verification DB), when the message comes back once more after the greylisting period has expired, it can be accepted quickly and efficiently.

Comments?

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
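To make the two settings concrete, here is a minimal Perl sketch of how a policy server could turn them into a reply. It assumes the Postfix policy delegation protocol (the server answers with one `action=...` line followed by an empty line) and the access(5) actions `450 text` and `DEFER_IF_PERMIT text`. The helper `greylist_reply`, the `%conf` hash and the reply text are hypothetical; this is not SQLgrey's actual code.

```perl
use strict;
use warnings;

# Hypothetical settings mirroring the two options named in the thread.
my %conf = (
    reject_first_attempt   => 'immed',
    reject_early_reconnect => 'delay',
);

# $event is 'first_attempt' or 'early_reconnect' (hypothetical names).
sub greylist_reply {
    my ($event, $conf) = @_;
    my $mode = $conf->{"reject_$event"};
    $mode = 'immed' unless defined $mode;          # default: reject at once
    my $text = 'Greylisted, please try again later';   # made-up wording
    # "immed": temporary failure right away; later restrictions are skipped.
    # "delay": Postfix keeps evaluating the remaining restrictions and defers
    #          only if they would otherwise permit the message.
    my $action = $mode eq 'delay' ? "DEFER_IF_PERMIT $text" : "450 $text";
    # A policy server answers with one attribute line plus an empty line.
    return "action=$action\n\n";
}

print greylist_reply('first_attempt', \%conf);
print greylist_reply('early_reconnect', \%conf);
```

With `delay`, the message still reaches the DNSBL and sender-verification checks listed above, and a permanent reject from one of them takes precedence over the deferral. With `immed`, evaluation stops at the 450.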
From: Michel B. <mi...@bo...> - 2005-02-16 15:16:18
Hi there,

As it was on the to-do list and I was waiting for it, I wrote it ;-)

Please find attached a patch that gives SQLgrey the choice of rejecting a message immediately (with 450) or delaying the rejection (defer_if_permit). The choice can be different for a "first time rejection" and an "early reconnection".

It seems to work just fine here; feedback welcome.

Cheers.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
From: Michel B. <mi...@bo...> - 2005-02-16 10:16:44
On Wednesday 16 February 2005 10:37, Lionel Bouton wrote:
> Thanks, good to know that: I had no example to check against :-/

You can install libsrs2 on your system and test manually with the bundled "srs" program, which lets you convert any email address to and from SRS on the command line.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
I like work: it fascinates me. I can sit and look at it for hours. -- Jerome K. Jerome
From: Lionel B. <lio...@bo...> - 2005-02-16 10:07:06
Michel Bouissou wrote the following on 16.02.2005 10:11:
> On Wednesday 16 February 2005 00:50, Lionel Bouton wrote:
>> Side note: today I realized that I could fine-tune the VERP handling.
>> Now SQLgrey uses the "#" replacements in the connect table. I think it
>> could be wise not to put the modified addresses into the connect table
>> but to wait for them to be greylisted and then use the "#"-replaced
>> addresses only in the awl tables.
>
> Looks like a good idea, even though I'm not sure it will make a big difference
> from a practical standpoint.
>
> But you need to make sure that you trim the localpart to <= 64 chars anyway,
> because some VERP or SRS localparts may be longer (actually the RFC limit of
> 64 chars is not enforced by any modern MTA, only by some old broken ones, and
> you can find some legitimate mail with longer localparts).

Already coded in CVS like that :-)
I removed the de-VERP/SRS code from the recipient processing too, as it never enters any AWL.
From: Michel B. <mi...@bo...> - 2005-02-16 09:43:34
On Wednesday 16 February 2005 10:37, Lionel Bouton wrote:
> > This patch also removes the "$user =~ s/\b\d+\b/#/g;" substitution, as it
> > is useless because the hex substitution just before it already does the
> > job.
>
> I wonder: this regexp is straight from postgrey and I'm not sure what
> the '\b' word boundary will match. I did a quick Google search and found
> no detailed information. I don't even know if it is locale-dependent.
> Does anyone on the list know the details?

Check "man perlre". \b matches at a "word boundary", that is, the position between a word character (alphanumeric or underscore) and a non-word character. It also matches at the beginning or end of the string when the first or last character is a word character.

I usually avoid those "backslashed shortcuts" when writing regexps, as I find that they make them harder to read and understand...

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
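The following Perl sketch shows what the postgrey-derived `s/\b\d+\b/#/g` actually touches. The helper name `hash_numbers` and the sample local parts are made up for illustration. Because `\w` includes the underscore, a digit run that is attached to a letter or to `_` is left alone. Under `use locale` or with Unicode strings, `\w` can also cover non-ASCII letters, but for plain ASCII local parts it means letters, digits and `_`.

```perl
use strict;
use warnings;

# The postgrey-derived substitution discussed in this thread.
sub hash_numbers {
    my ($local) = @_;
    $local =~ s/\b\d+\b/#/g;   # whole digit runs between word boundaries
    return $local;
}

# \b sits between a word character (\w: letters, digits, '_') and a
# non-word character, or at either end of the string next to a \w.
for my $local ('john.123', 'bounce-42-x', '2005.news', 'user42', 'list_2005') {
    printf "%-12s -> %s\n", $local, hash_numbers($local);
}
```

In this example, `john.123` becomes `john.#`, `bounce-42-x` becomes `bounce-#-x` and `2005.news` becomes `#.news`. `user42` and `list_2005` stay unchanged because no boundary exists between `r`/`_` and the following digit.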
From: Michel B. <mi...@bo...> - 2005-02-16 09:40:12
On Tuesday 15 February 2005 18:41, Michel Bouissou wrote:
> IMHO, these regexps that are part of the "smart" routine are fixed "code"
> and should be considered as such, not parameters or whatever.

The regexps themselves are OK, but I find that the "line continuations" still cause problems: the part of the regexp immediately preceding a line continuation doesn't work.

It seems that line continuations in regexps, which work perfectly in bash, don't work properly in Perl.

I've abandoned the line continuations. Even though I don't like very long lines, that's better than introducing bugs with line continuations that don't work as expected.

Please find attached a regexp patch (to be applied after the first one) that puts each regexp on a single line.

Thinking again about your proposal to move these regexps to separate files: even though I don't see a real benefit in doing this, if you still want to do it, I suggest these files go into /usr/lib/something, and not into /etc/sqlgrey. They shouldn't be considered "user-editable configuration files", but fixed external code.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
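Perl's `/x` modifier is one way to keep long regexps readable without line continuations. A backslash before a newline does not act as a continuation inside a Perl pattern the way it does in bash. It most likely becomes part of the pattern instead, which would explain the broken alternative described above. The keyword list below is illustrative only, not the one from the patch.

```perl
use strict;
use warnings;

# Written on one line...
my $one_line = qr/(?:^|[.-])(?:adsl|dsl|dial|dhcp|cable|ppp|pool)/i;

# ...and the same pattern laid out over several lines with /x.
# Under /x, whitespace and newlines outside character classes are ignored
# and an unescaped '#' starts a comment, so no continuation is needed.
my $multi_line = qr/
    (?: ^ | [.-] )          # start of the name or a label separator
    (?: adsl | dsl | dial   # access technologies
      | dhcp | cable
      | ppp  | pool )
/xi;

for my $host ('dial-369.lodz.dialog.net.pl', 'oxford-dsl-26.swnebr.net', 'mx25.isp.net') {
    my $r1 = $host =~ $one_line   ? 'match' : 'no match';
    my $r2 = $host =~ $multi_line ? 'match' : 'no match';
    print "$host: $r1 / $r2\n";
}
```

Both forms compile to the same matcher, so the multi-line layout is a purely cosmetic choice.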
From: Lionel B. <lio...@bo...> - 2005-02-16 09:37:23
Michel Bouissou wrote the following on 16.02.2005 10:15:
> On Wednesday 16 February 2005 00:52, Lionel Bouton wrote:
>> The original was harmless, but your fix points to an obvious flaw in the
>> regexp: '^' is used at the end of the regexp instead of '$'!!!
>>
>> Fixed in my tree.
>
> Another patch attached, because it looks like the SRS1 regexp was still flawed
> (I checked it with real SRS1 addresses; there's also a place where the "="
> separator can be doubled "==").

Thanks, good to know that: I had no example to check against :-/

> This patch also removes the "$user =~ s/\b\d+\b/#/g;" substitution, as it is
> useless because the hex substitution just before it already does the job.

I wonder: this regexp is straight from postgrey and I'm not sure what the '\b' word boundary will match. I did a quick Google search and found no detailed information. I don't even know if it is locale-dependent. Does anyone on the list know the details?

Thanks again,
Lionel.
From: Michel B. <mi...@bo...> - 2005-02-16 09:15:13
On Wednesday 16 February 2005 00:52, Lionel Bouton wrote:
> The original was harmless, but your fix points to an obvious flaw in the
> regexp: '^' is used at the end of the regexp instead of '$'!!!
>
> Fixed in my tree.

Another patch attached, because it looks like the SRS1 regexp was still flawed (I checked it with real SRS1 addresses; there's also a place where the "=" separator can be doubled "==").

This patch also removes the "$user =~ s/\b\d+\b/#/g;" substitution, as it is useless because the hex substitution just before it already does the job.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
From: Michel B. <mi...@bo...> - 2005-02-16 09:11:27
On Wednesday 16 February 2005 00:50, Lionel Bouton wrote:
> Side note: today I realized that I could fine-tune the VERP handling.
> Now SQLgrey uses the "#" replacements in the connect table. I think it
> could be wise not to put the modified addresses into the connect table
> but to wait for them to be greylisted and then use the "#"-replaced
> addresses only in the awl tables.

Looks like a good idea, even though I'm not sure it will make a big difference from a practical standpoint.

But you need to make sure that you trim the localpart to <= 64 chars anyway, because some VERP or SRS localparts may be longer (actually the RFC limit of 64 chars is not enforced by any modern MTA, only by some old broken ones, and you can find some legitimate mail with longer localparts).

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
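Trimming is a one-liner in Perl. This minimal sketch uses a hypothetical helper, `trim_localpart`, and assumes the 64-character limit of the SMTP RFC (RFC 2821, section 4.5.3.1) that the thread refers to. Whether the stored column is actually 64 characters wide is an assumption about the schema, not something stated here.

```perl
use strict;
use warnings;

use constant MAX_LOCALPART => 64;   # SMTP local-part limit (RFC 2821, 4.5.3.1)

# Hypothetical helper: clamp a (possibly VERP/SRS-expanded) local part to
# 64 characters before it is stored or compared.
sub trim_localpart {
    my ($local) = @_;
    return substr($local, 0, MAX_LOCALPART);   # no-op for shorter strings
}

my $srs = 'SRS0=' . ('x' x 80) . '=example.com=john';
print length(trim_localpart($srs)), "\n";   # 64
```

Truncating, rather than rejecting, keeps long but legitimate VERP/SRS senders usable as greylisting keys, at the small risk of two long addresses sharing the same 64-character prefix.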
From: Lionel B. <lio...@bo...> - 2005-02-15 23:52:35
Michel Bouissou wrote the following on 16.02.2005 00:32:
> On Tuesday 15 February 2005 23:56, Michel Bouissou wrote:
>> I find that SQLgrey 1.4.4 generates in its awl some localparts that are:
>> #.#   | calva.glou.org
>> #.#.# | aol.com
>>
>> Hmmm... Maybe the VERP-like address management has gone a bit too far ;-)
>> and may produce localparts that are too generic...
>>
>> I suggest that the beginning of the localpart (before the first [._-]
>> separator that we find) should probably not be replaced with "#", so
>> all-numeric email addresses don't find themselves reduced to "#".
>
> The attached patch should do it. It also fixes a little booog with SRS1.
>
> ------------------------------------------------------------------------
>
>diff -aurN sqlgrey-1.4.4-interm/sqlgrey sqlgrey-1.4.4/sqlgrey
>--- sqlgrey-1.4.4-interm/sqlgrey 2005-02-15 15:44:15.000000000 +0100
>+++ sqlgrey-1.4.4/sqlgrey 2005-02-16 00:23:36.000000000 +0100
>@@ -579,12 +579,11 @@
> ## Try to match single-use addresses
> # SRS (first and subsequent levels of forwarding)
> $user =~ s/^SRS0=[^=]+=[^=]+=([^=]+)=([^=]+)^/SRS0=#=#=$1=$2/;
>- $user =~ s/^SRS1=[^=]+=([^=]+)=[^=]+=([^=]+)=([^=]+)^/SRS0=#=$1=#=#=$2=$3/;
>+ $user =~ s/^SRS1=[^=]+=([^=]+)=[^=]+=([^=]+)=([^=]+)^/SRS1=#=$1=#=#=$2=$3/;

The original was harmless, but your fix points to an obvious flaw in the regexp: '^' is used at the end of the regexp instead of '$'!!!
Obviously, not many useful awl entries were created with SRS1 addresses...
Fixed in my tree.

> # strip extension, used sometimes for mailing-list VERP
> $user =~ s/\+.*//;
> # strip hexadecimal sequences (doable in one regexp ?)
> $user =~ s/([\._-])[0-9A-Fa-f]+([\._-])/$1#$2/g;
>- $user =~ s/^[0-9A-Fa-f]+([\._-])/#$1/g;
> $user =~ s/([\._-])[0-9A-Fa-f]+$/$1#/g;
> # Simple VERP substitution : replace numbers with '#'
> # will match VERP mailing-list message retransmissions
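This Perl sketch shows the SRS normalisation with both patterns anchored by `$`. It assumes the usual SRS layouts, `SRS0=HHH=TT=domain=local` and `SRS1=HHH=forwarder==HHH=TT=domain=local` (HHH = hash, TT = timestamp). It allows for the doubled `==` separator mentioned later in the thread. The helper `normalize_srs` is hypothetical and is not the code that was committed to SQLgrey.

```perl
use strict;
use warnings;

# Hypothetical helper: hash out the per-message fields of SRS local parts.
# Both patterns end with '$'. A '^' in that position, as in the 1.4.4 code
# quoted above, can only match at the very start of the string, so the
# substitution never fires.
sub normalize_srs {
    my ($user) = @_;
    # SRS0=HHH=TT=orig-domain=orig-local
    $user =~ s/^SRS0=[^=]+=[^=]+=([^=]+)=([^=]+)$/SRS0=#=#=$1=$2/;
    # SRS1=HHH=first-forwarder==HHH=TT=orig-domain=orig-local
    # The separator after the forwarder domain may be doubled, hence (=+).
    $user =~ s/^SRS1=[^=]+=([^=]+)(=+)[^=]+=[^=]+=([^=]+)=([^=]+)$/SRS1=#=$1$2#=#=$3=$4/;
    return $user;
}

print normalize_srs('SRS0=a1b2=Q7=example.com=john'), "\n";
print normalize_srs('SRS1=c3d4=fwd.example.net==a1b2=Q7=example.com=john'), "\n";
```

The first call yields `SRS0=#=#=example.com=john` and the second `SRS1=#=fwd.example.net==#=#=example.com=john`. Any other local part is returned untouched.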
From: Lionel B. <lio...@bo...> - 2005-02-15 23:50:14
Michel Bouissou wrote the following on 15.02.2005 23:56:
> On Monday 14 February 2005 23:40, Lionel Bouton wrote:
>> 1.4.4 is out on sourceforge (and has been running on one of my servers for 24 hours)
>>
>> Commented Changelog:
>> As requested by Michel:
>>  - Autowhitelists now understand SRS
>>  - more VERP support for autowhitelists
>
> Hi, me again ;-)
>
> I find that SQLgrey 1.4.4 generates in its awl some localparts that are:
>
> #.#   | calva.glou.org
>
> #.#.# | aol.com
>
> Hmmm... Maybe the VERP-like address management has gone a bit too far ;-)
> and may produce localparts that are too generic...
>
> I suggest that the beginning of the localpart (before the first [._-]
> separator that we find) should probably not be replaced with "#", so
> all-numeric email addresses don't find themselves reduced to "#".

I see the problem :-)

Side note: today I realized that I could fine-tune the VERP handling. Now SQLgrey uses the "#" replacements in the connect table. I think it could be wise not to put the modified addresses into the connect table but to wait for them to be greylisted and then use the "#"-replaced addresses only in the awl tables.

> Maybe we should "#" only what we find after the first [._-] separator, if any.

I can do that. In fact it will remove one regexp to match on.

Thanks,
Lionel.
From: Michel B. <mi...@bo...> - 2005-02-15 23:32:44
On Tuesday 15 February 2005 23:56, Michel Bouissou wrote:
> I find that SQLgrey 1.4.4 generates in its awl some localparts that are:
> #.#                   | calva.glou.org
> #.#.#                 | aol.com
>
> Hmmm... Maybe the VERP-like address management has gone a bit too far ;-)
> and may produce localparts that are too generic...
>
> I suggest that the beginning of the localpart (before the first [._-]
> separator that we find) should probably not be replaced with "#", so
> all-numeric email addresses don't find themselves reduced to "#".

The attached patch should do it. It also fixes a little booog with SRS1.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
kérabovidopréhension: determination
From: Michel B. <mi...@bo...> - 2005-02-15 22:56:14
On Monday 14 February 2005 23:40, Lionel Bouton wrote:
> 1.4.4 is out on sourceforge (and has been running on one of my servers for 24 hours)
>
> Commented Changelog:
> As requested by Michel:
>  - Autowhitelists now understand SRS
>  - more VERP support for autowhitelists

Hi, me again ;-)

I find that SQLgrey 1.4.4 generates in its awl some localparts that are:

#.#   | calva.glou.org
#.#.# | aol.com

Hmmm... Maybe the VERP-like address management has gone a bit too far ;-) and may produce localparts that are too generic...

I suggest that the beginning of the localpart (before the first [._-] separator that we find) should probably not be replaced with "#", so all-numeric email addresses don't find themselves reduced to "#".

Maybe we should "#" only what we find after the first [._-] separator, if any.

Cheers.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
Has the evolution of pre-situationist thought between the Hegelian school and the negativism of the neo-Nietzschean infrastructure, consciously or not, influenced the career of Raymond Poulidor? -- Pierre Desproges.
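The proposal can be sketched as a standalone Perl function. The helper `normalize_localpart` is hypothetical and is not the code that went into SQLgrey. It also swaps the original "consume the trailing separator" regexps for a lookahead, so that adjacent components such as `.34.56` are both replaced in a single pass.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first component of the local part (up to
# the first '.', '_' or '-') as-is and replace only later hexadecimal or
# numeric components with '#'.
sub normalize_localpart {
    my ($user) = @_;
    $user =~ s/\+.*//;                                # drop "+extension"
    my ($head, $tail) = $user =~ /^([^._-]*)(.*)\z/s; # split at first separator
    # Lookahead for the next separator instead of consuming it, so
    # neighbouring components are all replaced in one pass.
    $tail =~ s/([._-])[0-9A-Fa-f]+(?=[._-]|\z)/$1#/g;
    return $head . $tail;
}

for my $u ('12.34.56', 'bounce-12ab-34cd', 'john.smith', 'news+1234') {
    printf "%-18s -> %s\n", $u, normalize_localpart($u);
}
```

Here `12.34.56` becomes `12.#.#` instead of the too-generic `#.#.#`. `bounce-12ab-34cd` becomes `bounce-#-#`, `john.smith` is unchanged, and `news+1234` becomes `news`.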
From: Michel B. <mi...@bo...> - 2005-02-15 20:15:36
On Tuesday 15 February 2005 17:18, Lionel Bouton wrote:
> For comparison: on the same sample, how many addresses aren't recognized
> as "dynamic / end-user" by the regexps but are by the smartc algo?

By the way, the current code is flawed, as it performs (only) the following test:

    my @bytes = split(/\./, $addr);
    [...]
    # if last bytes are in fqdn, assume home-user address
    return $addr if $fqdn =~ /$bytes[3]/ and $fqdn =~ /$bytes[2]/;

It doesn't use any delimiters around the numbers, so, for example:

mta213.somedomain.com [192.168.3.21] => Match!
server18.net127.domain.org [172.16.18.12] => Match!
mx25.isp.net [10.10.25.2] => Match!

All these would be treated with their full IP address rather than as a class C, which is probably not what is desired...

The regexp solution I proposed fixes these flaws as well.

Cheers.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
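This Perl sketch compares the quoted test with a delimited variant. `naive_match` reproduces the quoted logic, with the two bytes copied into scalars. `delimited_match` is a hypothetical fix that requires each byte not to be attached to other digits; it is not the regexp series from Michel's patch. The host `host-10-10-25-2.isp.example` is a made-up positive case.

```perl
use strict;
use warnings;

# The quoted test: are the last two IP bytes anywhere in the name,
# even inside a longer number?
sub naive_match {
    my ($fqdn, $addr) = @_;
    my @bytes = split(/\./, $addr);
    my ($third, $fourth) = @bytes[2, 3];
    return ($fqdn =~ /$fourth/ and $fqdn =~ /$third/) ? 1 : 0;
}

# Hypothetical variant: a byte only counts when it is not attached to
# other digits on either side.
sub delimited_match {
    my ($fqdn, $addr) = @_;
    my @bytes = split(/\./, $addr);
    for my $byte (@bytes[2, 3]) {
        return 0 unless $fqdn =~ /(?<![0-9])\Q$byte\E(?![0-9])/;
    }
    return 1;
}

for my $case (['mta213.somedomain.com',       '192.168.3.21'],
              ['server18.net127.domain.org',  '172.16.18.12'],
              ['mx25.isp.net',                '10.10.25.2'],
              ['host-10-10-25-2.isp.example', '10.10.25.2']) {
    printf "%-28s naive=%d delimited=%d\n",
        $case->[0], naive_match(@$case), delimited_match(@$case);
}
```

The naive test reports a match for all four names. The delimited variant only matches the last one, where the bytes appear as separate tokens.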
From: Michel B. <mi...@bo...> - 2005-02-15 17:41:48
On Tuesday 15 February 2005 17:35, Lionel Bouton wrote:
> I'd prefer to have
> if ($fqdn =~ $known_server_pattern) ...
>
> and so on,
> rather than the full regexp in the code! An accidental keypress in the middle
> of the regexp could have unforeseen consequences and would be hard to
> spot without a cvs diff

Well, "accidental keypresses" in the middle of computer code usually have unpleasant consequences ;-)

IMHO, these regexps that are part of the "smart" routine are fixed "code" and should be considered as such, not parameters or whatever. There's no specific reason to extract them from the code, and I don't see why somebody would be more prone to "accidental keypresses" there than elsewhere... So unless you want to put all the code in an external table... ;-)

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
From: Michel B. <mi...@bo...> - 2005-02-15 17:37:37
On Tuesday 15 February 2005 18:21, Lionel Bouton wrote:
> Don't you know the "my $regexp = qr/value_read_from_file/;" syntax?

Not that much. I don't know much about Perl ;-)

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
From: Lionel B. <lio...@bo...> - 2005-02-15 17:21:43
Michel Bouissou wrote the following on 15.02.2005 17:46:
> On Tuesday 15 February 2005 17:35, Lionel Bouton wrote:
>> I'd prefer to have
>>
>> if ($fqdn =~ $known_server_pattern)
>
> If you put the "big regexp" in a variable rather than a constant, it will have to
> be recompiled each time it is called, and not only once... This can cause a
> major performance cost.

Don't you know the "my $regexp = qr/value_read_from_file/;" syntax? It takes care of the compilation once and for all.
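A minimal Perl sketch of the `qr//` approach: it reads one regexp per line, skips blank lines and `#` comments, joins the alternatives, and compiles them once. The helper `load_pattern`, the in-memory "file" and its patterns are hypothetical examples, not SQLgrey code or the patterns from the patch.

```perl
use strict;
use warnings;

# Hypothetical loader: one regexp per line; the alternatives are joined
# and compiled once with qr//.
sub load_pattern {
    my ($fh) = @_;
    my @alternatives;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*#/ or $line !~ /\S/;   # comments, blank lines
        push @alternatives, "(?:$line)";
    }
    die "no patterns found\n" unless @alternatives;
    my $joined = join '|', @alternatives;
    return qr/$joined/i;   # compiled here, reused for every match
}

# Stand-in for a pattern file (made-up contents).
my $text = <<'END';
# known mail-server host names
^(?:mx|mail|smtp)[0-9]*\.
^mta[0-9]+\.
END

open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my $known_server_pattern = load_pattern($fh);
close($fh);

for my $fqdn ('mx25.isp.net', 'mta213.somedomain.com', 'ACB296F3.ipt.aol.com') {
    printf "%-24s %s\n", $fqdn, ($fqdn =~ $known_server_pattern ? 'server' : 'other');
}
```

Henrik's `/o` alternative also avoids recompilation. However, `/o` compiles the pattern only once for the life of the process, so a pattern reloaded from a file later would be silently ignored. Calling `load_pattern` again simply returns a fresh `qr//` object.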
From: Henrik C. G. <hc...@b-...> - 2005-02-15 17:20:55
On Tue, 15 02 2005 at 17:46 +0100, Michel Bouissou wrote:
> On Tuesday 15 February 2005 17:35, Lionel Bouton wrote:
> >
> > I'd prefer to have
> >
> > if ($fqdn =~ $known_server_pattern)
>
> If you put the "big regexp" in a variable rather than a constant, it will have to
> be recompiled each time it is called, and not only once... This can cause a
> major performance cost.

Yes, but that can be avoided with something like
($fqdn =~ /$known_server_pattern/o)

-- 
Henrik Christian Grove <hc...@b-...>
B-one
From: Michel B. <mi...@bo...> - 2005-02-15 16:46:49
On Tuesday 15 February 2005 17:35, Lionel Bouton wrote:
> I'd prefer to have
>
> if ($fqdn =~ $known_server_pattern)

If you put the "big regexp" in a variable rather than a constant, it will have to be recompiled each time it is called, and not only once... This can cause a major performance cost.

I have to leave for now ;-)

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
From: Lionel B. <lio...@bo...> - 2005-02-15 16:35:26
Michel Bouissou wrote the following on 15.02.2005 16:48:
> On Tuesday 15 February 2005 16:31, Lionel Bouton wrote:
>> Thanks, I'm worried about the size of the regexp though. There are two
>> things on my mind:
>> - is it maintainable?
>
> I don't think it will need much maintenance. It's based on a (yet more
> complex ;-)

Even more!

> regexp I have built over the years, and that very seldom needs
> changes -- and the changes are improvements that are not, strictly speaking,
> necessary or urgent.
>
> Maintaining such a regexp is not that complex if you are careful ;-)
> especially about line breaks if you split it into several lines (it seems
> that an escaped line break should NOT be put after a ) or } or ?, or the
> regexp won't work). I limit myself to splitting after "regular characters" and
> before a "|".

I see.

>> - how much processing time is needed for these regexps?
>
> Given that we just process a short hostname and not a long file, and given
> that Perl will compile the regexps only once, except for the one that contains
> part of the IP as a variable, I believe the processing time should be
> negligible (compared to the database accesses etc.)

Regexps can be either really quick or slow. I don't yet have enough experience with Perl regexps to tell from a quick look whether Perl would handle hundreds of thousands of matches per second or just hundreds per second.

>> I'd like to add this as a separate algorithm and put the regexps in
>> external files that can be reloaded
>
> I would hardcode this. I expect very few changes to this, if any. Loading
> the regexps from external files would make this still more complex and
> subject to errors...

I'd prefer to have

if ($fqdn =~ $known_server_pattern) ...

and so on, rather than the full regexp in the code! An accidental keypress in the middle of the regexp could have unforeseen consequences and would be hard to spot without a cvs diff, but a keypress in the middle of a variable name is an instant blocker with an obvious error message, leading to a painless resolution.

Editing the regexp file would be less error-prone, in my opinion. Loading regexps from a file isn't really that complex.

> [...]
>> I'll probably start the 1.5.x branch for this new algorithm.
>
> Meanwhile, you can test it on your own system. I don't think you'll notice any
> performance impact, but it will probably be more accurate than the basic IP
> address test (see my last post with some examples...)

I won't notice any performance difference. Installations handling more than a million mails per day worry me, though.

I'll benchmark the code to see how many lines per second these regexps can handle on my systems. Hard numbers are usually more convincing to me with things as complex as regexps.

Lionel.
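A benchmark along those lines can be written with Perl's core `Benchmark` module. The host names come from this thread. The pattern is a made-up stand-in, not the regexp from the patch, so the numbers it produces say nothing about the real code.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Host names taken from this thread; the pattern is a hypothetical stand-in.
my @hosts = qw(
    ACB296F3.ipt.aol.com
    dsl81-214-11370.adsl.ttnet.net.tr
    p3E9E4915.dip.t-dialin.net
    mta213.somedomain.com
    mx25.isp.net
);
my $pattern_src = '(?:^|[.-])(?:adsl|dsl|dial|dhcp|dip|ppp|pool|cable)';
my $compiled    = qr/$pattern_src/i;

# A negative count means "run each sub for at least that many CPU seconds";
# the 'none' style suppresses timethese's own report.
my $results = timethese(-1, {
    precompiled  => sub { my $n = grep { $_ =~ $compiled } @hosts },
    interpolated => sub { my $n = grep { /$pattern_src/i } @hosts },
}, 'none');

# Prints a table of iterations per second and relative speed.
cmpthese($results);
```

Each iteration matches all five names, so multiplying the reported rate by five gives host names per second. perlop documents that Perl only recompiles an interpolated pattern when the interpolated value changes, so the two variants may well come out close. The point of the exercise is to measure rather than guess.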
From: Michel B. <mi...@bo...> - 2005-02-15 16:31:15
On Tuesday 15 February 2005 17:18, Lionel Bouton wrote:
> > Here is an example of a series of hostnames/addresses that the original
> > SQLgrey would take as "Class C" (since they don't have the end of their IP
> > address in their hostname), and my patch will consider "dynamic /
> > end-user" machines, and thus use the full IP address:
>
> For comparison: on the same sample, how many addresses aren't recognized
> as "dynamic / end-user" by the regexps but are by the smartc algo?
> What's the total recognized by each of them? This way we'll have an idea
> of the % of improvement.

I don't have total figures and percentages at hand, but I can say that:

1/ All the entries recognized by the original smartc algo are also recognized by the regexps. The exceptions are mailservers that have part of their IP in their name: the original algo can mistake them for end-user hosts, whereas the regexps properly recognize them as mailservers (Class C). I've already seen such cases with some mailserver pools that put the IP of the server in its name, for example:

mxpool10-123.231.bigisp.com [10.10.123.231]

Here the original code would get it wrong, but my regexp series would not, since it tries to identify mailservers first.

2/ The original code misses the end-user networks of real "big players", such as AOL (example: ACB296F3.ipt.aol.com[172.178.150.243]), cable.rogers.com (example: CPE00055df38a0c-CM00407b87707e.cpe.net.cable.rogers.com[69.197.247.61]) or AT&T (example: h00095b733a11.ne.client2.attbi.com[65.96.239.10]), etc., etc.

These big players' end-user networks are *huge* sources of viruses and spam, so if we can improve the code to identify them properly, I guess it is a valuable improvement -- even though I don't have precise figures and no time to do statistics ;-))

Cheers.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
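The "mailservers first" ordering can be sketched in Perl as a two-stage classifier. Both patterns are illustrative guesses based on the example names in this thread. They are not Michel's regexps, which are not shown here, and the helper `classify_host` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical two-stage classifier: mail-server-like names are checked
# first, then a few end-user naming schemes seen in this thread.
my $mail_server = qr/^(?:mx|mail|smtp|mta|relay)[a-z]*[0-9-]*\./i;
my $end_user = qr/
      ^[0-9a-f]{8}\.        # whole IPv4 as 8 hex digits, e.g. ACB296F3.ipt.aol.com
    | ^cpe[0-9a-f]{12}-     # cable-modem MAC, e.g. CPE00055df38a0c-CM...
    | ^h[0-9a-f]{12}\.      # e.g. h00095b733a11.ne.client2.attbi.com
    | (?:^|[.-])(?:a?dsl|dial|dhcp|cable|ppp|pool|dip)[0-9]*(?=[.-]|\z)
/xi;

sub classify_host {
    my ($fqdn) = @_;
    return 'mail server' if $fqdn =~ $mail_server;   # checked first
    return 'end user'    if $fqdn =~ $end_user;
    return 'unknown';
}

for my $fqdn ('mxpool10-123.231.bigisp.com', 'ACB296F3.ipt.aol.com',
              'CPE00055df38a0c-CM00407b87707e.cpe.net.cable.rogers.com',
              'h00095b733a11.ne.client2.attbi.com', 'mx25.isp.net') {
    printf "%-58s %s\n", $fqdn, classify_host($fqdn);
}
```

Checking mail-server names first is what keeps `mxpool10-123.231.bigisp.com` out of the end-user bucket even though it embeds its own IP address. Any real deployment would need a much larger, tested pattern set.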
From: Lionel B. <lio...@bo...> - 2005-02-15 16:18:34
Michel Bouissou wrote the following on 15.02.2005 16:39:
> On Tuesday 15 February 2005 15:38, Michel Bouissou wrote:
>> It's heuristic... So imperfect, but my tests show it gives much more
>> accurate results compared to the original simpler algorithm.
>
> Here is an example of a series of hostnames/addresses that the original
> SQLgrey would take as "Class C" (since they don't have the end of their IP
> address in their hostname), and my patch will consider "dynamic / end-user"
> machines, and thus use the full IP address:

For comparison: on the same sample, how many addresses aren't recognized as "dynamic / end-user" by the regexps but are by the smartc algo? What's the total recognized by each of them? This way we'll have an idea of the % of improvement.
From: Michel B. <mi...@bo...> - 2005-02-15 15:48:50
On Tuesday 15 February 2005 16:31, Lionel Bouton wrote:
> Thanks, I'm worried about the size of the regexp though. There are two
> things on my mind:
> - is it maintainable?

I don't think it will need much maintenance. It's based on a (yet more complex ;-) regexp I have built over the years, and that very seldom needs changes -- and the changes are improvements that are not, strictly speaking, necessary or urgent.

Maintaining such a regexp is not that complex if you are careful ;-) especially about line breaks if you split it into several lines (it seems that an escaped line break should NOT be put after a ) or } or ?, or the regexp won't work). I limit myself to splitting after "regular characters" and before a "|".

> - how much processing time is needed for these regexps?

Given that we just process a short hostname and not a long file, and given that Perl will compile the regexps only once, except for the one that contains part of the IP as a variable, I believe the processing time should be negligible (compared to the database accesses etc.)

> I'd like to add this as a separate algorithm and put the regexps in
> external files that can be reloaded

I would hardcode this. I expect very few changes to this, if any. Loading the regexps from external files would make this still more complex and subject to errors...

> I don't have access to enough mail logs to test
> these regexps on each update. Are you willing to maintain them?

No problem, but as I said, expect very few changes unless I discover a major booog ;-)

> I'll probably start the 1.5.x branch for this new algorithm.

Meanwhile, you can test it on your own system. I don't think you'll notice any performance impact, but it will probably be more accurate than the basic IP address test (see my last post with some examples...)

Cheers.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
From: Michel B. <mi...@bo...> - 2005-02-15 15:39:57
|
Le Mardi 15 F=E9vrier 2005 15:38, Michel Bouissou a =E9crit : > > It's heuristic... So imperfect, but my tests show it gives much more > accurate results compared to the original simpler algorithm. Here is an example of a series of hostnames/addresses that the original=20 SQLgrey would take as "Class C" (for they don't have the end of their IP=20 address in their hostname), and my patch will consider "dynamic / end-use= r"=20 machines, and thus use the full IP address : 0x503ead5c.bynxx8.adsl-dhcp.tele.dk[80.62.173.92] 0xd5aae2a2.dhcp.kabelnettet.dk[213.170.226.162] ACB021F3.ipt.aol.com[172.176.33.243] ACB296F3.ipt.aol.com[172.178.150.243] ACB59FAD.ipt.aol.com[172.181.159.173] ACB7FAB2.ipt.aol.com[172.183.250.178] ACC8EBB5.ipt.aol.com[172.200.235.181] adsl21118.estpak.ee[80.235.8.154] adsl-dc-36cd0.adsl.wanadoo.nl[83.118.10.208] asd-z-efb1.adsl.wanadoo.nl[81.69.13.177] c8a65c72.bhz.virtua.com.br[200.166.92.114] c8fbc881.bhz.virtua.com.br[200.251.200.129] c906651b.virtua.com.br[201.6.101.27] c906712d.virtua.com.br[201.6.113.45] c90688a5.virtua.com.br[201.6.136.165] c906e92a.virtua.com.br[201.6.233.42] c9110cb5.rjo.virtua.com.br[201.17.12.181] catv-5062ae15.catv.broadband.hu[80.98.174.21] catv-506315e6.catv.broadband.hu[80.99.21.230] catv-d5de9650.catv.broadband.hu[213.222.150.80] cbl-il8-48.casscabletv.com[12.163.48.49] cc550873-a.hnglo1.ov.home.nl[217.122.248.216] cp346637-a.tilbu1.nb.home.nl[84.24.101.229] cp427353-a.tilbu1.nb.home.nl[84.24.100.163] cp644250-a.venlo1.lb.home.nl[84.29.41.12] CPE00055df38a0c-CM00407b87707e.cpe.net.cable.rogers.com[69.197.247.61] CPE00062930c118-CM014090206357.cpe.net.cable.rogers.com[24.100.251.82] CPE0008a12a42eb-CM400047235173.cpe.net.cable.rogers.com[24.100.193.230] CPE000ae6a33a8c-CM000a735f750d.cpe.net.cable.rogers.com[69.198.32.123] CPE000cf1727e77-CM0012c90feac2.cpe.net.cable.rogers.com[69.193.9.232] CPE0010dc418a71-CM012059934437.cpe.net.cable.rogers.com[24.156.89.186] 
CPE00402b4b28ed-CM00080d53844c.cpe.net.cable.rogers.com[69.194.52.136] CPE0080c6eaa3d6-CM013359900259.cpe.net.cable.rogers.com[24.156.43.203] CPE0080c8b37441-CM0012250232ca.cpe.net.cable.rogers.com[69.197.38.226] dial-369.lodz.dialog.net.pl[62.87.196.113] dsl81-214-11370.adsl.ttnet.net.tr[81.214.44.106] dsl81-214-11652.adsl.ttnet.net.tr[81.214.45.132] dsl81-214-11805.adsl.ttnet.net.tr[81.214.46.29] dsl81-214-12018.adsl.ttnet.net.tr[81.214.46.242] dsl81-214-29753.adsl.ttnet.net.tr[81.214.116.57] dsl81-214-39833.adsl.ttnet.net.tr[81.214.155.153] dsl81-215-21703.adsl.ttnet.net.tr[81.215.84.199] dsl81-215-24770.adsl.ttnet.net.tr[81.215.96.194] dsl81-215-29615.adsl.ttnet.net.tr[81.215.115.175] dsl81-215-30444.adsl.ttnet.net.tr[81.215.118.236] dsl81-215-30614.adsl.ttnet.net.tr[81.215.119.150] dsl81-215-30905.adsl.ttnet.net.tr[81.215.120.185] dsl81-215-4517.adsl.ttnet.net.tr[81.215.17.165] dsl81-215-53937.adsl.ttnet.net.tr[81.215.210.177] dsl81-215-54155.adsl.ttnet.net.tr[81.215.211.139] dsl81-215-54618.adsl.ttnet.net.tr[81.215.213.90] dsl81-215-55123.adsl.ttnet.net.tr[81.215.215.83] dsl81-215-62680.adsl.ttnet.net.tr[81.215.244.216] dsl-hmlgw1he3.dial.inet.fi[80.220.227.227] dsl-mm224.ez-net.com[65.172.188.109] gv-vb-3c9f.adsl.wanadoo.nl[212.129.188.159] h00061bda6f73.ne.client2.attbi.com[24.218.168.78] h00095b733a11.ne.client2.attbi.com[65.96.237.21] h00095b733a11.ne.client2.attbi.com[65.96.239.10] h00096b197238.ne.client2.attbi.com[24.61.250.89] h000ea69e7af4.ne.client2.attbi.com[24.128.63.191] h0011110f7ab3.ne.client2.attbi.com[24.128.58.142] h00118018fafb.ne.client2.attbi.com[24.128.215.26] h0040ca40da7b.ne.client2.attbi.com[24.91.82.249] h00e06fbe43f5.ne.client2.attbi.com[66.31.5.211] kf-sdm-tg06-0727.dial.kabelfoon.nl[62.45.194.216] lbc9-d9ba927f.pool.mediaWays.net[217.186.146.127] lbc9-d9ba9291.pool.mediaWays.net[217.186.146.145] lbc9-d9ba92ac.pool.mediaWays.net[217.186.146.172] lbck-d9b886b3.pool.mediaWays.net[217.184.134.179] 
lls-c-1e8a1.adsl.wanadoo.nl[81.70.6.161]
mstr195175-28763.dial-in.ttnet.net.tr[195.175.192.92]
mstr195175-28807.dial-in.ttnet.net.tr[195.175.192.136]
mstr195175-29523.dial-in.ttnet.net.tr[195.175.195.84]
mstr195175-29524.dial-in.ttnet.net.tr[195.175.195.85]
mstr195175-29617.dial-in.ttnet.net.tr[195.175.195.178]
mstr195175-30261.dial-in.ttnet.net.tr[195.175.198.54]
mstr195175-30277.dial-in.ttnet.net.tr[195.175.198.70]
mstr195175-30294.dial-in.ttnet.net.tr[195.175.198.87]
mstr195175-30405.dial-in.ttnet.net.tr[195.175.198.198]
mstr195175-30425.dial-in.ttnet.net.tr[195.175.198.218]
Ottawa-HSE-ppp4085231.sympatico.ca[70.49.34.242]
oxfo-dhcp-ws-186.dsl.maqs.net[66.187.40.187]
oxford-dsl-26.swnebr.net[69.2.6.155]
p3E9E4915.dip.t-dialin.net[62.158.73.21]
p3EE0A92D.dip.t-dialin.net[62.224.169.45]
p50823AF8.dip.t-dialin.net[80.130.58.248]
p5082557F.dip0.t-ipconnect.de[80.130.85.127]
p50837D68.dip0.t-ipconnect.de[80.131.125.104]
p50837F3A.dip0.t-ipconnect.de[80.131.127.58]
p50923A15.dip.t-dialin.net[80.146.58.21]
p54856CB1.dip.t-dialin.net[84.133.108.177]
p54857A2B.dip.t-dialin.net[84.133.122.43]
p5485901E.dip0.t-ipconnect.de[84.133.144.30]
p548C8DE6.dip.t-dialin.net[84.140.141.230]
pcp0010439227pcs.parads01.nm.comcast.net[68.35.123.64]
pcp0010846152pcs.essex01.md.comcast.net[68.48.130.57]
pcp0011134165pcs.neave01.pa.comcast.net[69.248.43.94]
pcp0011537562pcs.aboit01.in.comcast.net[69.245.133.123]
pcp01934410pcs.nhaven01.ct.comcast.net[68.63.87.156]
pcp02171061pcs.brghtn01.mi.comcast.net[68.43.207.125]
pcp02861817pcs.flrnc01.al.comcast.net[68.62.232.163]
pcp03267103pcs.waldlk01.mi.comcast.net[68.60.178.140]
pcp03766529pcs.montvl01.pa.comcast.net[68.34.242.208]
pcp03822180pcs.clintn01.ct.comcast.net[68.46.210.153]
pcp04591547pcs.harimn01.tn.comcast.net[68.47.189.215]
pcp08020054pcs.dalect01.va.comcast.net[68.48.155.156]
pcp08774927pcs.mtlrel01.nj.comcast.net[68.36.36.37]
pcp08935357pcs.trentn01.nj.comcast.net[69.141.145.106]
pcp09003316pcs.spedwy01.in.comcast.net[68.58.13.174]
pcp09946743pcs.hyatsv01.md.comcast.net[69.140.15.206]
pD9503DE4.dip.t-dialin.net[217.80.61.228]
pD9521AC6.dip.t-dialin.net[217.82.26.198]
pD952496A.dip.t-dialin.net[217.82.73.106]
pD95256B2.dip.t-dialin.net[217.82.86.178]
pD95314D1.dip.t-dialin.net[217.83.20.209]
pD9542A7A.dip.t-dialin.net[217.84.42.122]
pD954ADEB.dip.t-dialin.net[217.84.173.235]
pD955E999.dip.t-dialin.net[217.85.233.153]
pD95731A6.dip0.t-ipconnect.de[217.87.49.166]
pD958902E.dip.t-dialin.net[217.88.144.46]
pD9589A3B.dip.t-dialin.net[217.88.154.59]
pD95EEF12.dip.t-dialin.net[217.94.239.18]
pD95F43DE.dip0.t-ipconnect.de[217.95.67.222]
pD95F46CE.dip0.t-ipconnect.de[217.95.70.206]
pD95F4828.dip0.t-ipconnect.de[217.95.72.40]
pD95F484B.dip0.t-ipconnect.de[217.95.72.75]
pD9E182A1.dip.t-dialin.net[217.225.130.161]
pD9E44064.dip.t-dialin.net[217.228.64.100]
pD9E488B2.dip.t-dialin.net[217.228.136.178]
pD9E59AA3.dip.t-dialin.net[217.229.154.163]
pD9FE2D54.dip0.t-ipconnect.de[217.254.45.84]
poctnt-1-235.dialup.enter.net[216.193.169.15]
ppp2582.hakata01.bbiq.jp[210.203.194.42]
rt-z-23c40.adsl.wanadoo.nl[81.70.90.64]
S01060000b4921e35.cg.shawcable.net[68.145.237.24]
S01060001023fd7dc.cg.shawcable.net[68.144.193.49]
S01060004e20311de.ed.shawcable.net[68.149.226.216]
S010600055d29d6f0.vc.shawcable.net[24.85.71.78]
S010600065b1cf9d8.vc.shawcable.net[24.86.104.115]
S0106000795aeb64d.vc.shawcable.net[24.80.152.113]
S01060008a10ccf19.vf.shawcable.net[70.68.209.121]
S01060008a10ccf19.vf.shawcable.net[70.68.244.221]
S01060008a11e94cc.ed.shawcable.net[68.149.249.248]
S0106000b6a93aadb.vc.shawcable.net[24.84.40.203]
S0106000bdb0e2be7.ok.shawcable.net[24.70.174.225]
S0106000c7615dd58.wp.shawcable.net[24.77.235.208]
S01060010a4991948.cg.shawcable.net[68.146.151.175]
S010600112f46d19f.vc.shawcable.net[24.83.7.214]
S01060040ca4003c4.wp.shawcable.net[24.77.99.183]
S01060060b0a3cd95.ed.shawcable.net[68.150.64.140]
S01060080c6ee43f8.vs.shawcable.net[24.84.102.153]
S01060080c876a2df.vc.shawcable.net[24.83.21.154]
S01060080c8e2e5a7.ed.shawcable.net[68.149.0.81]
S01060090f52ad732.ed.shawcable.net[68.150.31.145]
S010600d0b7c4df4a.ed.shawcable.net[68.151.5.108]
user-0c6t3il.cable.mindspring.com[24.110.142.85]
user-0c8htoa.cable.mindspring.com[24.136.247.10]
user-0ccstki.cable.mindspring.com[24.206.118.146]
user-0cdveau.cable.mindspring.com[24.223.185.94]
user-0cetrmf.cable.mindspring.com[24.238.238.207]
user-10lf4ta.cable.mindspring.com[65.87.147.170]
user-10lf97p.cable.mindspring.com[65.87.164.249]
user-12hc73a.cable.mindspring.com[69.22.28.106]
user-12hcc0d.cable.mindspring.com[69.22.48.13]
user-12hcqi5.cable.mindspring.com[69.22.106.69]
user-12hcqph.cable.mindspring.com[69.22.107.49]
user-12l2ttt.cable.mindspring.com[69.81.119.189]
user-12ldaom.cable.mindspring.com[69.86.171.22]
user-12lm16t.cable.mindspring.com[69.91.4.221]
user-12lm33c.cable.mindspring.com[69.91.12.108]
user-12lmp25.cable.mindspring.com[69.91.100.69]
xdsl-2836.walbrzych.dialog.net.pl[84.40.186.20]
xdsl-4689.zgora.dialog.net.pl[84.40.166.81]
xdsl-4875.wroclaw.dialog.net.pl[84.40.128.11]
xdsl-7183.wroclaw.dialog.net.pl[84.40.137.15]

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
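The message does not include the patch itself, so the following is only a minimal Python sketch of the distinction it describes, not SQLgrey's code or Michel's heuristic. It assumes that "Class C" means keying on the first three octets of the IP (the /24), and it models the original rule as a simplified check for the last octet appearing as a standalone decimal number in the hostname. The extra rule shown, matching the whole IP written as 8 hex digits (as in `ACB021F3.ipt.aol.com[172.176.33.243]` or `pD9503DE4.dip.t-dialin.net[217.80.61.228]`), is one illustrative heuristic chosen from patterns visible in the list above. The function names `last_octet_in_hostname`, `hex_ip_in_hostname` and `greylist_key`, and the `mail.example.org` sample, are hypothetical.

```python
import ipaddress
import re


def last_octet_in_hostname(ip: str, hostname: str) -> bool:
    """Simplified model of the original check (assumption): does the last
    byte of the IP appear as a standalone decimal number in the hostname?"""
    last = ip.split(".")[-1]
    # Split on every non-digit character, leaving only the runs of digits.
    digit_runs = re.split(r"[^0-9]+", hostname)
    return last in digit_runs


def hex_ip_in_hostname(ip: str, hostname: str) -> bool:
    """Hypothetical extra heuristic: is the whole IP encoded as 8 hex
    digits somewhere in the hostname (e.g. ACB021F3 = 172.176.33.243)?"""
    encoded = ipaddress.IPv4Address(ip).packed.hex()  # e.g. 'acb021f3'
    return encoded in hostname.lower()


def greylist_key(ip: str, hostname: str) -> str:
    """Return the full IP for hosts that look dynamic, else the /24 prefix."""
    if last_octet_in_hostname(ip, hostname) or hex_ip_in_hostname(ip, hostname):
        return ip
    # "Class C": keep only the first three octets.
    return ".".join(ip.split(".")[:3])


if __name__ == "__main__":
    samples = [
        ("80.62.173.92", "0x503ead5c.bynxx8.adsl-dhcp.tele.dk"),
        ("172.176.33.243", "ACB021F3.ipt.aol.com"),
        ("217.80.61.228", "pD9503DE4.dip.t-dialin.net"),
        ("192.0.2.10", "mail.example.org"),  # hypothetical static host
    ]
    for ip, host in samples:
        print(f"{host}[{ip}] -> {greylist_key(ip, host)}")
```

The first three samples fail the last-octet test but match the hex test, so they are keyed on the full IP. The hypothetical static host matches neither and falls back to `192.0.2`. A hex match only covers part of the list. Entries such as the rogers.com, attbi.com and shawcable.net hosts appear to embed MAC addresses instead. The ttnet.net.tr `dsl81-...` names appear to encode the last two octets as a single number (11370 = 44 × 256 + 106 for 81.214.44.106). Both patterns would need rules of their own.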