From: Riaan K. <ria...@gm...> - 2007-01-26 09:33:39
|
How difficult would it be to allow the postfix attributes on both sides of the regexp?

If you add two derivative attributes:
    sender_domain & recipient_domain
you can activate greylisting if:
    sender_domain =~ recipient_domain
which would be a very nice rule for anyone running a relay. I see many spam forgeries where a sender address is designed to fall under the same organisation domain to appear legit. Another rule to play with would be:
    client_name !~ helo_name
which, although standards-compliant AFAIK, might be a bit harsh, I don't know. Possibly there are other tricks to invent if the postfix attributes can be used as plain variables on both sides of the discrimination regexp, but I can't think of another example now.

What do you think?

thanks,
Riaan
From: David L. <da...@la...> - 2007-01-26 10:30:36
|
Riaan Kok wrote:
> How difficult would it be to allow the postfix attributes on both
> sides of the regexp?
>
> If you add two derivative attributes:
> sender_domain & recipient_domain
> you can activate greylisting if:
> sender_domain =~ recipient_domain
> which would be a very nice rule for anyone running a relay. I see
> many spam forgeries where a sender address is designed to fall under
> the same organisation domain to appear legit. Another rule to play
> with would be:
> client_name !~ helo_name
> which, although standards-compliant AFAIK, might be a bit harsh, I
> don't know. Possibly there are other tricks to invent if the postfix
> attributes can be used as plain variables both sides of the
> discrimination regexp, but can't think of another example now.
>
> What do you think?

I would keep it simple. Postfix can already do this sort of thing in its access maps with restriction classes. So do it there, don't duplicate functionality.

Later,
David
From: Dan F. <da...@ha...> - 2007-01-26 16:35:29
Attachments:
sqlgrey-discrimination-attr.patch
|
Riaan Kok wrote:
> How difficult would it be to allow the postfix attributes on both
> sides of the regexp?

First, thanks for writing. I was beginning to worry that I was the only one finding the discrimination feature useful :).

Without going into the discussion as to how useful this would be, static comparison (no regex involved) is very simple to make and I see no harm in it.

I've made a "non-intrusive" patch. Everything will work as before (this I have already tested on a live system). The only difference is that it now also accepts:
    ==
and
    !=
(instead of =~ and !~, which mean regex)

E.g.:
    sender_domain == recipient_domain
    client_name != helo_name

The patch is attached. Apply using:
    $ patch /path/to/sqlgrey < sqlgrey-discrimination-attr.patch

Give it a try and tell me if it does the job for you.

- Dan
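A minimal sketch of how a rule with an attribute on both sides might be evaluated. This is not the attached patch: the helper name rule_matches, the case-insensitive comparison, and the choice to quote an attribute value before using it as a pattern are all assumptions made for illustration, and sender_domain / recipient_domain are the derived attributes proposed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from sqlgrey or the attached patch.
# %$attr holds the policy attributes received for one request.
sub rule_matches {
    my ($attr, $left, $oper, $right) = @_;
    my $lhs = $attr->{$left};
    return 0 unless defined $lhs;

    # If the right-hand side names a known attribute, compare against its
    # value; otherwise treat it as a literal string or a regex source.
    my $is_attr = exists $attr->{$right};
    my $rhs     = $is_attr ? $attr->{$right} : $right;

    # Static comparison (assumption: host names compare case-insensitively).
    if ($oper eq '==') { return lc($lhs) eq lc($rhs) ? 1 : 0 }
    if ($oper eq '!=') { return lc($lhs) ne lc($rhs) ? 1 : 0 }

    # An attribute value used as a pattern is quoted, so that the dots in a
    # domain name match literally instead of matching any character.
    my $re = $is_attr ? qr/\Q$rhs\E/i : qr/$rhs/i;
    if ($oper eq '=~') { return $lhs =~ $re ? 1 : 0 }
    if ($oper eq '!~') { return $lhs !~ $re ? 1 : 0 }

    die "unknown operator '$oper'\n";
}

# Made-up attribute values for one request.
my %attr = (
    sender_domain    => 'example.com',
    recipient_domain => 'example.com',
    client_name      => 'mail.example.net',
    helo_name        => 'localhost',
);

print rule_matches(\%attr, 'sender_domain', '==', 'recipient_domain'), "\n";
print rule_matches(\%attr, 'client_name',   '!=', 'helo_name'),        "\n";
```

Deciding "attribute or literal" by looking the right-hand side up in the attribute hash keeps existing regex rules working unchanged, at the cost of making a literal that happens to equal an attribute name impossible to express.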
From: Dave S. <dst...@ma...> - 2007-01-26 23:43:52
|
As promised, I am reporting back with my findings. Please take them as "what I see"; your mileage will vary...

I altered the source to do memcached lookups on everything except the "connect" table since that would get a lot of INSERTs that were never read again. I am seeing a 20%-40% hit on memcached, and the load on the Postgres server that handles the SQLGrey DB is down by a few LA "points".

So, I am pleased with the memcached changes, and will tune the system a bit more and let everyone know when I have squeezed every bit of power from it. Once this is done, I will make the changes for v1.7 and get them to Lionel for anyone to use.

Dave Strickler
MailWise LLC
617-933-5810 (direct)
www.mailwise.com ( http://www.mailwise.com/ )
"Intelligent E-mail Protection"

This message has been certified virus-free by MailWise Filter - The real-time, intelligent, e-mail firewall used to scan inbound and outbound messages for SPAM, Viruses and Content.
For more information, please visit: http://www.mailwise.com
From: Dan F. <da...@ha...> - 2007-01-27 23:33:45
|
Dave Strickler wrote:
> I altered the source to do memcached lookups on everything except the "connect" table since that would get a lot of INSERTs that were never read again. I am seeing a 20%-40% hit on memcached, and the load on the Postgres server that handles the SQLGrey DB is down by a few LA "points".

Sounds very interesting. One of my colleagues has been going on and on about how I should be using memcached for sqlgrey, but I was unsure that it would do anything for us. By "LA" you mean Load Average? And how much are "points"? -0.3 or -2.0 etc.?

I don't know much about memcached, but INSERTs (and generally write operations) are usually pretty heavy on SQL servers, so why not include the "connect" table as well?

And what about the boxes running memcached? Do they take the performance hit instead of the SQL server?

I'm very curious as to what I can gain in a clustered setup like mine, where each mailserver has its own SQL slave, if each mailserver also has to run memcached.

> So, I am pleased with the memcached changes, and will tune the system a bit more and let everyone know when I have squeezed every bit of power from it. Once this is done, I will make the changes for v1.7 and get them to Lionel for anyone to use.

You mean 1.7.4, right? That would be great, since it will probably make it much easier for either of us to patch the CVS version.

- Dan
From: Dave S. <dst...@ma...> - 2007-01-28 03:40:30
|
>>> Dan Faerch <da...@ha...> 6:33 PM Saturday, January 27, 2007 >>>
> Dave Strickler wrote:
> > I altered the source to do memcached lookups on everything except the "connect" table since that would get a lot of INSERTs that were never read again. I am seeing a 20%-40% hit on memcached, and the load on the Postgres server that handles the SQLGrey DB is down by a few LA "points".
>
> Sounds very interesting. One of my colleagues has been going on and on about how I should be using memcached for sqlgrey, but I was unsure that it would do anything for us. By "LA" you mean Load Average? And how much are "points"? -0.3 or -2.0 etc.?

Yes, LA = Load Average. Our SQLGrey DB runs on a beefy server with one other DB that's almost all read-only, and is not (yet) served via memcached. Since using SQLGrey with memcached, we are seeing a drop of about 2 LA points, from about 5 to about 3. Of course, this varies with time of day, etc.

> I don't know much about memcached, but INSERTs (and generally write operations) are usually pretty heavy on SQL servers, so why not include the "connect" table as well?

I tried the connect table, and got "good", but not "great" performance. I think this is due to the nature of the table, and the nature of a cache. I don't know about your site (and I would like to hear about it), but in our connect table about 30% of the initial INSERTs come back for a 2nd attempt, and are then removed from the table. So the best cache hit rate we could ever expect was 30%, and by its nature, that 30% will only benefit from a cache once. Think of the connect table as a "write-once, read-once" table. Tables like the AWL are "write-once, read-many", and therefore greatly benefit from a cache. As with any caching system, you have to *watch* what you are caching. Just "caching it all" can add overhead, and fill up your cache with needless data.

> And what about the boxes running memcached? Do they take the performance hit instead of the SQL server?

The boxes are older servers that used to run SQL, etc., and have a lot of RAM, but CPUs that are lacking by today's standards. With memcached using RAM only, they certainly take the hit off SQL, but don't really take a hit themselves. Much like writing to a RAM disk in Linux, you don't see any noticeable increase in LA (or anything else) on heavy I/O functions. So the servers running memcached aren't even dedicated servers, although we don't have them running any large CPU loads.

> I'm very curious as to what I can gain in a clustered setup like mine, where each mailserver has its own SQL slave, if each mailserver also has to run memcached.

My guess is your cluster will get a lot fewer reads ;-)

> > So, I am pleased with the memcached changes, and will tune the system a bit more and let everyone know when I have squeezed every bit of power from it. Once this is done, I will make the changes for v1.7 and get them to Lionel for anyone to use.
>
> You mean 1.7.4, right? That would be great, since it will probably make it much easier for either of us to patch the CVS version.

Yes, any version you want. I am going to share how to make a module (like is_in_awl()) use memcached, and let you all write the code the 'real' way. I am not a full-time programmer, just a salty-dog CTO, and don't even know Perl. What I have is a hack, and I need a Perl coder to clean up the code. It's only about 20 lines of templated change in a few modules, so it won't be much work. Who and where should I send this code snippet?

--- Dave

> - Dan
From: Dan F. <da...@ha...> - 2007-01-28 16:08:04
|
Dave Strickler wrote:
> memcached. Since using SQLGrey with memcached, we are seeing a drop of about 2 LA points. From about 5 to about 3. Of course, this

That's not bad at all, actually.

> 'real' way. I am not a full-time programmer, just a salty-dog CTO, and don't even know Perl. What I have is a hack, and I need a Perl

I'm not a full-time coder at all. Though I have been coding for 10-15 years, Perl is something I was "forced" to learn at work, to be able to maintain some of our other core systems. And how I really, really hate Perl ;). I think I've probably spent around 8 weeks on Perl coding in my entire life, 3 of them dedicated to sqlgrey.

Mostly it's somewhat easy for me, since it resembles C so much, but all this @var, $var, $@var, $#{$var}, etc. can be really hard for me sometimes ;). Luckily I'm a very thorough tester of my own code.

> Who and where should I send this code snippet ?

Well, I usually post my patches here first before throwing them into CVS. Though most people probably don't read/try the patches, it's comforting to know that everyone at least had a chance to point out any mistakes ;)

- Dan
From: Dave S. <dst...@ma...> - 2007-01-28 16:26:27
|
Then we are in the same boat - I'm not a fan of Perl either, but it does run fast, and with a low system footprint, and I have always been impressed with that. Ironically, I think it makes a great platform for SQLGrey.

As for the code, I will post a module here in a few minutes. I don't know how to do a patch, and I bet it would be a mess for me as I have customized SQLGrey with in-house tweaks that I can't share {sorry}.

Dave Strickler
From: Dave S. <dst...@ma...> - 2007-01-28 17:05:30
|
First, some memcache background. You can find the details on http://www.danga.com/memcached/ as well as the Perl APIs and a small example of code.

Memcache is a caching system which allows placing data in cache, and getting it out quickly. You can think of memcache as a clipboard for data. If System-A does a SQL lookup and gets the data, it takes a few microseconds to put it into memcached. Then when System-A, System-B, etc. needs to do the SQL query again, it checks the memcache "clipboard", and if it's there, it reads it from memcache instead of reading from SQL.

Often code you write makes the same SQL SELECTs over and over again. Such is the case with SQLGrey. Each email must be checked against SQL. However, my code checks to see if we have already done the SQL lookup. For instance, if SQLGrey gets an email from bo...@ao..., it can't find it in memcache, so it needs to look it up in SQL. When it does, it saves the result in memcached before it goes on as usual. Then, the next time SQLGrey gets an email from bo...@ao..., it looks in the memcache and finds the data. No need to bother SQL with the query.

Using memcache saves time in the SQL lookup, but a small amount of time. I have found memcached lookups are about 10x faster than our high-end SQL servers, but still, we are talking about saving fractions of a second. Where you really save is in the load on SQL. If mail volume on SQLGrey is only a few thousand emails a day, memcached will only save you a few hundred SQL requests - not a huge savings. But if your SQL server already runs slowly, or you get millions of emails a day, the savings in load on your SQL server are substantial.

First off, at the top, I added two modules, Digest::MD5 and Cache::Memcached:

    package sqlgrey;
    use strict;
    use Pod::Usage;
    use Getopt::Long 2.25 qw(:config posix_default no_ignore_case);
    use Net::Server::Multiplex;
    use DBI;
    use POSIX ':sys_wait_h';
    use Sys::Hostname;
    use Digest::MD5;
    use Cache::Memcached;

I used the MD5 library from CPAN to get a unique token so I could store the result of the request. The following is an example of a function I altered in SQLGrey. Comments are included. Note that all I am doing is checking memcached for data before I look it up in SQL. If I can't find it in memcached, I look it up, and store it in memcached. You can try adding this idea of caching into all the SQL lookups, and see which ones work best for you.

    sub is_in_from_awl {
        my ($self, $sender_name, $sender_domain, $host) = @_;

        # Note I store the results of my find in memcached as "T" or "F". I did
        # this as I don't know how to use TRUE or FALSE in Perl, and since it's
        # such a small amount of data, it should be a wash in data storage.

        # We need this as a variable as I will use it for the MD5 hash later.
        # Note the SQL is simpler than the original SQLGrey code. I did this to
        # allow better caching.
        my $sql = "SELECT 1 FROM $from_awl WHERE sender_name = '$sender_name' AND sender_domain = '$sender_domain' AND src = '$host'";

        # Create a new variable with the SQL string as an MD5 hash. And yes,
        # Lionel said this was not a great idea ;-)
        use Digest::MD5;
        my $md5;
        $md5 = Digest::MD5->new;
        $md5->add($sql);
        my $md5_id = $md5->hexdigest;
        print "SQL hashed to $md5_id \n";

        # This memcached library came from http://www.danga.com/memcached/
        use Cache::Memcached;

        # I use three different memcached servers (note the fake IPs below -
        # use yours instead) and I use the standard port of "11211".
        # I set debug off (it may help when you are coding), and let the
        # library compress the data before it stores it in memcached. You can
        # use as few memcached servers as 1 to start, and I would recommend
        # this until you are comfortable with memcached. Just add more into
        # this array as you need them.
        my $memcached = new Cache::Memcached {
            'servers' => ["1.2.3.4:11211", "1.2.3.5:11211", "1.2.3.6:11211"],
            'debug' => 0,
            'compress_threshold' => 10_000,
        };

        # Set the length of time you want this entry to cache for. I set it to
        # expire in 14 days, but that's just what we did. There are no "right"
        # settings here. The TTL is measured in seconds. After this time, the
        # data will automatically be removed from cache. Set this low at
        # first, perhaps 24 hours, until you are comfortable with memcached.
        my $memcached_ttl = 60 * 60 * 24 * 14;

        # With any luck, we know the value in memcached, and we will
        # look it up and return it here. No SQL transaction needed.
        my $boolean = $memcached->get("$md5_id");
        if ($boolean) {
            # We found the boolean, and we return it.
            $self->mylog('whitelist', 2, "HIT from is_in_from_awl() - found in memcached as $boolean \n");
            if ($boolean eq 'T') {
                return 1;
            } else {
                return 0;
            }
        }

        $self->mylog('whitelist', 2, "MISS from is_in_from_awl() - not found in memcached. \n");

        # last_seen less than $self->{sqlgrey}{awl_age} days ago
        my $sth = $self->prepare("SELECT 1 FROM $from_awl " .
                                 'WHERE sender_name = ? ' .
                                 'AND sender_domain = ? ' .
                                 'AND src = ? ' .
                                 'AND last_seen > ' .
                                 $self->past_tstamp($self->{sqlgrey}{awl_age}, 'DAY')
                                 );
        if (!defined $sth or !$sth->execute($sender_name, $sender_domain, $host)) {
            $self->db_unavailable();
            $self->mylog('dbaccess', 0, "error: couldn't access $from_awl table: $DBI::errstr");
            return 1; # in doubt, accept
        } else {
            $self->db_available();
        }
        my $result = $sth->fetchall_arrayref();
        if ($#$result != 0)
        {
            $memcached->set("$md5_id", "F", $memcached_ttl);
            return 0; # not a single entry
        }
        else
        {
            $memcached->set("$md5_id", "T", $memcached_ttl);
            return 1; # one single entry (no multiple entries by design)
        }
    }
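The check-the-cache-first pattern described in this message can be tried without a memcached server. The sketch below is not sqlgrey code: FakeCache is a hypothetical in-memory stand-in that only mimics the get/set shape of a memcached client, cached_lookup is an illustrative helper, and the AWL data is made up. Like the posted function, it stores "T" or "F" so that a cached negative answer can be told apart from a miss.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Stand-in for a memcached client (assumption): same get/set shape, but
# backed by a plain hash so the sketch runs without a server.
package FakeCache;

sub new { return bless { store => {} }, shift }

sub get {
    my ($self, $key) = @_;
    my $entry = $self->{store}{$key} or return;
    return if $entry->{expires} <= time;    # expired entries count as misses
    return $entry->{value};
}

sub set {
    my ($self, $key, $value, $ttl) = @_;
    $self->{store}{$key} = { value => $value, expires => time + $ttl };
    return 1;
}

package main;

my $backend_calls = 0;

# Hypothetical wrapper: $lookup is a code ref that performs the real
# (expensive) query and returns a true or false value.
sub cached_lookup {
    my ($cache, $ttl, $lookup, @key_parts) = @_;
    my $key = md5_hex(join "\0", @key_parts);

    my $cached = $cache->get($key);
    if (defined $cached) {
        return $cached eq 'T' ? 1 : 0;      # HIT: no backend query
    }

    my $found = $lookup->(@key_parts) ? 1 : 0;   # MISS: ask the backend
    $cache->set($key, $found ? 'T' : 'F', $ttl);
    return $found;
}

# Made-up AWL contents and a lookup that counts how often it is called.
my %awl    = ('bob|example.com|192.0.2.1' => 1);
my $lookup = sub { $backend_calls++; return exists $awl{ join '|', @_ } };

my $cache = FakeCache->new;
for (1 .. 3) {
    my $hit = cached_lookup($cache, 60, $lookup, 'bob', 'example.com', '192.0.2.1');
    print "in AWL: $hit, backend calls so far: $backend_calls\n";
}
```

One consequence of caching the "F" answers as well is that a sender added to the AWL later keeps reading as absent until the cached entry expires, so a shorter TTL for negative results may be worth considering.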
From: Riaan K. <ria...@gm...> - 2007-01-28 12:28:47
|
On 26/01/07, Dan Faerch <da...@ha...> wrote:
>
> Riaan Kok wrote:
> > How difficult would it be to allow the postfix attributes on both
> > sides of the regexp?
>
> Without going into the discussion as to how useful this would be,
> static comparison (no regex involved) is very simple to make and I see
> no harm in it.
>
> I've made a "non-intrusive" patch. Everything will work as before (this I
> have already tested on a live system).
> The only difference is that it now also accepts:
> ==
> and
> !=
> (instead of =~ and !~, which mean regex)
>
> E.g.:
> sender_domain == recipient_domain
> client_name != helo_name
>
> The patch is attached. Apply using:
> $ patch /path/to/sqlgrey < sqlgrey-discrimination-attr.patch
>
> Give it a try and tell me if it does the job for you.
>
> - Dan

Sweet; nice and small patch; thanks! I'll test it early this week, and if any interesting statistics pop out, I'll post them.

Riaan
From: Dave S. <dst...@ma...> - 2007-01-28 15:11:15
|
Dan (and anyone else that wants to chime in),

I have altered the "sub mylog {}" to make it not write logs at all, and altered it again to write to a separate file on /tmp, and still, at the top of each hour, I get a "450 Server config..." error on each message.

This leads me to think there are 2 possibilities:

1. There are more functions writing to syslog than I think...

2. Something else besides the roll of /var/log/mail is causing these errors.

Anything else I should be looking for?

Dave Strickler
From: Dan F. <da...@ha...> - 2007-01-28 16:40:53
|
Dave Strickler wrote:
> 1. There are more functions writing to syslog than I think...

Dunno. Try e.g. commenting out all the syslog stuff from the "my $server = bless {" block.

> 2. Something else besides the roll of /var/log/mail is causing these errors.

Well, if you can skew the log rotation by 30 minutes, you'd know if it's related. Other than that I can't think of anything else.

- Dan
From: Dave S. <dst...@ma...> - 2007-01-28 17:26:09
|
{slaps head} of course!

It's the log roll for sure. Do you use a log roller that doesn't interfere with SQLGrey?

Dave Strickler
From: Dan F. <da...@ha...> - 2007-01-28 19:23:31
|
Dave Strickler wrote:
> It's the log roll for sure. Do you use a log roller that doesn't interfere with SQLGrey ?

Eh. I use logrotate from default Debian. Not sure if it doesn't interfere. I rotate once every day on each box and restart sqlgrey at the same time, so I wouldn't notice. But it's probably the bug in sqlgrey that kills sqlgrey when syslog is offline. So it'll be fixed soon.

- Dan
From: Dan F. <da...@ha...> - 2007-02-01 20:49:18
|
Riaan Kok wrote:
>
> Sweet; nice and small patch; thanks! I'll test it early this week,
> and if any interesting statistics pop out, I'll post them.
>
> Riaan

Riaan, is it useful?
If it is useful to you, and no one has objections, I will include this feature in 1.7.5.

- Dan
From: Riaan K. <ria...@gm...> - 2007-02-02 09:49:14
|
On 01/02/07, Dan Faerch <da...@ha...> wrote:
>
> Riaan Kok wrote:
> >
> > Sweet; nice and small patch; thanks! I'll test it early this week,
> > and if any interesting statistics pop out, I'll post them.
> >
> > Riaan
> Riaan, is it useful?
> If it is useful to you, and no one has objections, I will include this
> feature in 1.7.5.

So far so good. I've only recently managed to put this on a testing box. I've enabled the "discrimination_add_rulenr" option to get an idea of how effective the rules are, which brings me to this question:

When more than one rule matches (which is likely), which one gets displayed in the logs? Does the last one that matches get displayed, or does processing stop as soon as the first rule from the top of the "discrimination.regexp" file raises a "greylist this" flag?

I quite like the idea of this feature, especially that you can check for fqdn-ness and so forth, because postfix itself is quite black & white with those checks. I have come across some legit web-based SMTP clients and others that are just poorly configured, and they're so far removed from normal administrator access that it is impossible to expect them to get fixed. And maintaining any kind of exemption list for poor-quality clients is such a drag.

cheers,
R
From: Riaan K. <ria...@gm...> - 2007-02-02 11:35:42
|
On 02/02/07, Riaan Kok <ria...@gm...> wrote:
>
> On 01/02/07, Dan Faerch <da...@ha...> wrote:
> >
> > Riaan Kok wrote:
> > >
> > > Sweet; nice and small patch; thanks! I'll test it early this week,
> > > and if any interesting statistics pop out, I'll post them.
> > >
> > > Riaan
> > Riaan, is it useful?
> > If it is useful to you, and no one has objections, I will include this
> > feature in 1.7.5.
>
> When more than one rule matches (which is likely), which one gets displayed
> in the logs? Does the last one that matches get displayed, or does
> processing stop as soon as the first rule from the top of the
> "discrimination.regexp" file raises a "greylist this" flag?
>
> R

A case that I saw in the logs intrigued me, so I did a quick lookup of the keys() function and read a bit about hashes. Perl's hash structure is not ordered in any way, so iterating through a hash returns the elements in undefined order. So, in this code, the first one of the regexps in random order that matches activates greylisting. There's not much point in gathering statistics then! (By the way, this could be related to what you referred to in the code regarding resetting the hash.)

From reading the code it seems like there are a few variables in play:
$hash: contains a whole instance of a regexp rule:
  - $var: contains the postfix attribute
  - $data: a hash containing:
      * $rulenr: just that, the number of the rule
      * $regex: containing the two keys "oper" and "regexp"

So, how about rather storing the list of rules in an array, which does away with the need for storing the $rulenr, with each array item like $rule containing:
    $rule->{attrib}
    $rule->{oper}
    $rule->{regexp}

This would be a bit more invasive to do, but it would allow the rule number that is returned to generate more meaningful statistics. One can then order the rules in the file from most specific to most general, and then pretty much be able to gather information by counting the lines returned by a grep. This could (maybe) encourage people to experiment with discrimination a bit more!

For the moment I shall examine the emails that bypass my regexp list. I can adjust the log level on my own to make the logs provide this information, but, in general, wouldn't it make sense to log at level 2 when no regexp matches? Sqlgrey normally indicates at log level 2 whether a given SMTP combination passes the AWL, gets greylisted, is new, etc., so indicating that greylisting is bypassed due to no regexp match makes sense to me to be at level 2 as well.

And, yet another suggestion: it could be useful to include, in one or more of the documentation locations, a quick list of postfix attributes that can be used with discrimination.

I'll stop for now!

thanks,
Riaan
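The array layout suggested in this message can be sketched as follows. This is not sqlgrey's implementation: first_matching_rule and the three example rules are hypothetical, and only the field names attrib, oper and regexp come from the message. Because an array keeps file order, the index can serve as the rule number and "first match wins" becomes deterministic.

```perl
use strict;
use warnings;

# Rules in file order; the array index doubles as the rule number.
# The patterns are made-up examples, not taken from a real rule file.
my @rules = (
    { attrib => 'helo_name',   oper => '!~', regexp => '\.' },
    { attrib => 'client_name', oper => '=~', regexp => '^unknown$' },
    { attrib => 'client_name', oper => '=~', regexp => '(dsl|dynamic)' },
);

# Hypothetical helper: returns the 1-based number of the first matching
# rule, or 0 if no rule matches.
sub first_matching_rule {
    my ($attr, $rules) = @_;
    for my $i (0 .. $#$rules) {
        my $rule  = $rules->[$i];
        my $value = $attr->{ $rule->{attrib} };
        next unless defined $value;

        my $pattern = $rule->{regexp};
        my $hit     = ($value =~ /$pattern/i) ? 1 : 0;
        $hit = !$hit if $rule->{oper} eq '!~';    # negated rule

        return $i + 1 if $hit;
    }
    return 0;
}

my %request = (helo_name => 'mx.example.com', client_name => 'host1.dsl.example.net');
print "matched rule: ", first_matching_rule(\%request, \@rules), "\n";
```

With the rules ordered from most specific to most general, counting log lines per rule number then gives the kind of statistics described above.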
From: Dan F. <da...@ha...> - 2007-02-02 19:36:00
|
Riaan Kok wrote:
>> When more than one rule matches (which is likely), which one gets
>> displayed
>> in the logs? Does the last one that matches get displayed, or does

First match wins and "searching" stops.

> A case that I saw in the logs intrigued me, so I did a quick lookup of
> the
> keys() function and read a bit about hashes. Perl's hash structure
> is not
> ordered in any way, so iterating through a hash returns the elements in
> undefined order.

Yeah. Basically the rules get ordered by postfix_attrib, and the postfix_attribs get randomly ordered. So e.g. all helo_name rules might get run first, then all client_name rules, and so forth.

> There's not much point in
> gathering statistics then!

I added the rule_nr to help our support department. Apparently, many of our customers have mailserver software that acts REALLY weird on 45X errors. Some bounce, some send a mail that looks like a bounce telling the sender that it got a 45X but will keep trying, and more odd stuff like that. So to enable our support department to help the customers bypass our rules, they needed to know which rule nailed the client. And I personally like to sometimes grep for which rules nail the most. I don't see what you gain by knowing which other rules didn't catch the spammer.

> (By the way, this could be related to what you
> referred to in the code regarding resetting the hash.)

What I'm referring to is that "while ... each" doesn't start from the top of the hash the second time the function is called; "each" only returns the last element. Using "keys" at the top of the function fixes this oddity.

> So, how about rather storing the list of rules in an array, which does
> away
> with the need for storing the $rulenr, with each array item like $rule
> containing:
> $rule->{attrib}
> $rule->{oper}
> $rule->{regexp}

Hmm, well. It seems like a lot of work for a very small result. I don't think I'll be coding this anytime soon ;). But if you're a Perl coder, patches are welcome.

> Sqlgrey normally indicates at log level 2 whether a
> given SMTP combination passes the AWL, gets greylisted, is new, etc., so
> indicating that greylisting is bypassed due to no regexp match makes
> sense
> to me to be at level 2 as well.

My initial reason for putting it on level 3 is the size of our mail logs. But I guess I don't mind moving it down to level 2, as it probably does make more sense.

> And, yet another suggestion: it could be useful to include, in one or
> more of
> the documentation locations, a quick list of postfix attributes that
> can be
> used with discrimination.

I was actually about to do that when I made the 1.7.4 release, but then it got all confusing with different versions of Postfix giving different attributes. So I decided not to, and instead made some examples showing the most useful postfix attribs. If you have compiled a list of attribs that work with your postfix version, I can include that in the docs in 1.7.5.

- Dan
From: David L. <da...@la...> - 2007-02-02 22:11:18
|
Dan Faerch wrote:
>> (By the way, this could be related to what you
>> referred to in the code regarding resetting the hash.)
> What im referring to, is that "while ... each" doesnt start from the top
> of the hash the second time the function is called.. "each" only returns
> last element. Using "keys" at top of the function fixes this oddity.

There's nothing odd about this, it's standard Perl behaviour.

each() stores its state on the hash, and returns the next element each
(heh) time it is called until all the keys have been visited, at which
point it returns undef. This can have surprising consequences if you
call each() from different areas in the code upon the same hash: the
next call to each() returns the next item from the hash's point of
view, not the caller's point of view.

Calling 'keys %hash' in void context is the usual way to reset the
iterator back to the beginning.

Later,
David |
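The iterator behaviour David describes can be reproduced in a few lines. The following is only a sketch: the hash contents and the two helper names are made up for illustration and are not taken from the sqlgrey source. Both helpers leave the loop on the first element, the way a "first match wins" lookup would; only the second one resets the iterator first.

```perl
use strict;
use warnings;

# Hypothetical rule hash, for illustration only.
my %rules = (helo_name => 1, client_name => 2, sender => 3);

# Leaves the loop on the first element, so the hash's iterator
# stays wherever each() left it.
sub first_key_without_reset {
    my ($h) = @_;
    while (my ($key, $value) = each %$h) {
        return $key;
    }
    return;    # each() returned an empty list: iterator exhausted
}

# Same loop, but keys in void context rewinds the iterator first.
sub first_key_with_reset {
    my ($h) = @_;
    keys %$h;
    while (my ($key, $value) = each %$h) {
        return $key;
    }
    return;
}

# Three calls without a reset walk through three different keys.
for my $call (1 .. 3) {
    my $key = first_key_without_reset(\%rules);
    print "no reset, call $call: $key\n";
}

# With the reset, every call starts again from the same key.
for my $call (1 .. 3) {
    my $key = first_key_with_reset(\%rules);
    print "reset,    call $call: $key\n";
}
```

Which key comes first is up to Perl and may differ between runs; the point is only that the first helper moves on with every call while the second does not.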
From: Dan F. <da...@ha...> - 2007-02-02 22:18:09
|
David Landgren wrote:
> There's nothing odd about this, it's standard Perl behaviour.
>
> each() stores its state on the hash, and returns the next element each
> (heh) time it is called until all the keys have been visited, at which
> point it returns undef. This can have surprising consequences. If you
> call each() from different areas in the code upon the same hash: the
> next call to each() returns the next item in from the hash's point of
> view, not the caller's point of view.

It's simply because I'm not a regular Perl coder then. I assumed the
function would operate on a copy of the hash, as it does in most other
languages. Took me quite a while to figure out what was going on and
then fix it with the keys %hash :) |
From: Riaan K. <ria...@gm...> - 2007-02-04 17:53:49
|
On 02/02/07, Dan Faerch <da...@ha...> wrote:
> Riaan Kok wrote:
> > a case that I saw in the logs intrigued me, so I did a quick lookup of the
> > keys() function and read a bit about hashes.. Perl's hash structure is not
> > ordered in any way. So, iterating through a hash returns the elements in
> > undefined order..
> Yeah.. Basically the rules gets ordered by postfix_attrib.
> Postfix_attrib's gets randomly ordered.
> So eg. all helo_name's might get run first, then all client_name and so
> forth.
>
> > There's not much point in gathering statistics then!
> I added the rule_nr to help our support department. Apparently, many of
> our customers have mailserver-software that acts REALLY weird on 45X
> errors. Some bounce, some send a mail that looks like a bounce telling
> the sender that it got a 45X but will keep trying and more odd stuff
> like that.
> So to enable our support department to help the customers bypass our
> rules, they needed to know what rule nailed the client.

Agree, I can see it being useful for this purpose.

> And i personally like to sometimes grep for which rules nails most. I
> dont see what you gain by knowing which other rules didnt catch the spammer.

It's just that, if you're curious about statistics, the random order
of the hash list of rules means that the only way of knowing what
percentage of connections get nailed by a rule is to have only one
rule.

For example, say your first rule checks for "unknown" clients, and
statistically, 50% of *all* connections get nailed by this one. Your
second rule checks for dialup/dsl accounts, and let's for argument's
sake assume that all ISPs reliably give DNS names for all accounts.
Therefore, anything caught by rule2 would never be caught by rule1 -
they're mostly statistically independent. Also, say that rule2 catches
20% of all traffic. Now add a few other rules 3-6 whose statistics you
care less about.

If the code stepped through the rules in order, grep counts would
converge to 50%, 20%, (and whatever the rest may be.) The later rules
would yield less statistical information because the circumstances of
processing become conditional (but there's still *some* information
there). And, if the logs (or support call) say "451 greylisted blah
blah 5 minutes blah (rule 5)", you would simply *know* that for that
instance, rules 1-4 did not trigger greylisting, and rule 6 was never
reached.

Now, currently, if the logs were to give that same message, the only
information you have is that rule 5 was triggered. We have no way of
knowing how many rules were checked before this one (and in what
order). Because of this, counting the appearances of a rule number in
the logs gives you a number with very little meaning. It's a vague
indication, at best.

Another advantage of knowing the order in which rules will execute is
that, in production, you can place cheap and broad rules first, and
more expensive rules last (such as that badass rule for catching
dynamic IP client hostnames in dyn_fqdn.regexp). If your traffic is
sufficient, your CPU might just appreciate it..

> > So, how about rather storing the list of rules in an array, which does away
> > with the need for storing the $rulenr, and each array item like $rule
> > containing:
> > $rule->{attrib}
> > $rule->{oper}
> > $rule->{regexp}
> Hmm well.. Its seems like a lot of work for a very small result. I dont
> think ill be coding this anytime soon ;).. But if youre a perl coder,
> patches are welcome.

I agree! This suggestion is mostly about improving the
experimentation experience of anybody tweaking a rule list, but it
doesn't bug me sufficiently YET to invest the time to get familiar
with this part of the code, and then build my suggestion. My todo
list looks like a screenwriter's first draft!

Although.. (just got a random (heh) idea here..) instead of an
invasive patch, can't we just sort the hash of rules upon creation by
the second layer $rulenr value? Is it possible? (David?)

> > And, yet another suggestion: it could be useful to include in one or more of
> > the documentation locations a quick list of postfix attributes that can be
> > used with discrimination..
> I was actually about to do that when i made the 1.7.4 release, but then
> it got all confusing with different versions of Postfix giving different
> attributes. So i decided not to, and instead made some examples showing
> the most useful postfix attribs.
> If you have compiled a list of attribs that work with your postfix
> version, i can include that in the docs in 1.7.5.

Ah, that's why I couldn't find such a list easily! Well, I can vouch
for the existence of "sender_domain" and "recipient_domain", using
Postfix 2.3.6.

cheers,
Riaan |
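To make the array proposal concrete, here is a rough sketch of an ordered rule list that is checked top to bottom with "first match wins". It is not the code from any patch in this thread: the rule contents, the function name first_matching_rule, and the attribute values are all hypothetical, and only the three field names (attrib, oper, regexp) come from the suggestion above.

```perl
use strict;
use warnings;

# Hypothetical rules, cheapest and broadest first. The position in the
# array doubles as the rule number, so no separate $rulenr is stored.
my @rules = (
    { attrib => 'client_name', oper => '=~', regexp => qr/^unknown$/ },
    { attrib => 'client_name', oper => '=~', regexp => qr/(?:dsl|dialup|dyn)/i },
    { attrib => 'helo_name',   oper => '!~', regexp => qr/\./ },
);

# Returns the 1-based number of the first rule that triggers, or 0 if
# none does. Rules after the first hit are never evaluated.
sub first_matching_rule {
    my ($attrs, $rules) = @_;
    for my $i (0 .. $#$rules) {
        my $rule  = $rules->[$i];
        my $value = $attrs->{ $rule->{attrib} };
        next unless defined $value;    # attribute not supplied: skip rule
        my $hit = ($value =~ $rule->{regexp}) ? 1 : 0;
        $hit = !$hit if $rule->{oper} eq '!~';
        return $i + 1 if $hit;
    }
    return 0;
}

# Example request attributes (made up).
my %attrs = (client_name => 'adsl-1-2-3.example.net',
             helo_name   => 'mail.example.net');
my $nr = first_matching_rule(\%attrs, \@rules);
print $nr ? "greylisting triggered by rule $nr\n" : "no rule matched\n";
```

With a fixed order like this, a log entry naming rule N also implies that rules 1 to N-1 were tried and did not trigger, which is what makes per-rule counts comparable.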
From: Dan F. <da...@ha...> - 2007-02-04 20:44:38
|
Riaan Kok wrote:
> Although.. (just got a random (he) idea here..) instead of an
> invasive patch, can't we just sort the hash of rules upon creation by
> the second layer $rulenr value? Is it possible? (David?)

You'd still need to make a new hash structure, since:

client_name -> rule_nr_4
             - rule_nr_2
             - rule_nr_5
helo_name   -> rule_nr_3
             - rule_nr_7
             - rule_nr_1

sorted would look like:

client_name -> rule_nr_2
             - rule_nr_4
             - rule_nr_5
helo_name   -> rule_nr_1
             - rule_nr_3
             - rule_nr_7

and be applied in that order.

- Dan |
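Dan's point is that sorting inside each attribute still leaves the attributes themselves in hash order. One way around that, sketched below under the assumption that the rules live in a two-level hash keyed by attribute and then rule number (the layout shown above, not verified against the sqlgrey source), is to flatten everything into a single list and sort that by rule number. The name flatten_rules and the placeholder regexps are invented for the example.

```perl
use strict;
use warnings;

# Assumed two-level layout: attribute => { rule number => regexp }.
my %by_attrib = (
    client_name => { 4 => qr/a/, 2 => qr/b/, 5 => qr/c/ },
    helo_name   => { 3 => qr/d/, 7 => qr/e/, 1 => qr/f/ },
);

# Flattens the nested hash into one list ordered by rule number,
# regardless of which attribute each rule belongs to.
sub flatten_rules {
    my ($nested) = @_;
    my @flat;
    for my $attrib (keys %$nested) {
        for my $nr (keys %{ $nested->{$attrib} }) {
            push @flat, {
                nr     => $nr,
                attrib => $attrib,
                regexp => $nested->{$attrib}{$nr},
            };
        }
    }
    my @sorted = sort { $a->{nr} <=> $b->{nr} } @flat;
    return @sorted;
}

for my $rule (flatten_rules(\%by_attrib)) {
    print "rule $rule->{nr}: $rule->{attrib}\n";
}
```

The result interleaves the two attributes (helo_name rule 1, client_name rule 2, and so on), which a per-attribute sort cannot do.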
From: Riaan K. <ria...@gm...> - 2007-02-17 16:58:21
Attachments:
sqlgrey-1.7.5-discrim_hash2array.patch
|
On 04/02/07, Riaan Kok <ria...@gm...> wrote:
> On 02/02/07, Dan Faerch <da...@ha...> wrote:
> > Riaan Kok wrote:
> > > a case that I saw in the logs intrigued me, so I did a quick lookup of the
> > > keys() function and read a bit about hashes.. Perl's hash structure is not
> > > ordered in any way. So, iterating through a hash returns the elements in
> > > undefined order..
>
> It's just that, if you're curious about statistics, the random order
> of the hash list of rules means that the only way of knowing what
> percentage of connections get nailed by a rule is to have only one
> rule.
>
> Another advantage of knowing the order in which rules will execute is
> that, in production, you can place cheap and broad rules first, and
> more expensive rules last (such as that badass rule for catching
> dynamic IP client hostnames in dyn_fqdn.regexp). If your traffic is
> sufficient, your CPU might just appreciate it..
>
> > > So, how about rather storing the list of rules in an array, which does away
> > > with the need for storing the $rulenr, and each array item like $rule
> > > containing:
> > > $rule->{attrib}
> > > $rule->{oper}
> > > $rule->{regexp}
> > Hmm well.. Its seems like a lot of work for a very small result. I dont
> > think ill be coding this anytime soon ;).. But if youre a perl coder,
> > patches are welcome.

Okay, I finally got some time to do my proposed change. It wasn't as
bad as I thought it might be. The attached patch is against Sqlgrey
1.7.5. I had to do some tricks to get a clean patch, but the code in
my sqlgrey has been running fine now for a day (in the order of 100k
connections/day).

Some day when I get around to playing with my rules and stats I'll
post if there's anything interesting.

regards,
Riaan |
From: Dan F. <da...@ha...> - 2007-02-28 00:33:56
|
Riaan Kok wrote:
> bad as I thought it might be. The attached patch is against Sqlgrey
> 1.7.5. I had to do some tricks to get a clean patch, but the code in

Got around to applying this today, but I can't get it to patch against
1.7.5. Could you check that you sent the right patch so lazy-old-me
doesn't have to do it manually? :)

-Dan |