From: Lionel B. <lio...@bo...> - 2009-08-18 13:58:39

Dan Faerch wrote, on 08/18/2009 11:36 AM:
> Lionel Bouton wrote:
>>> when running multiple servers against one database server. I needed
>>> that only 1 specific server would do the cleanup,
>>
>> Why again (I don't remember the reason)? Reading the following it
>> seems this feature brings some problems.
>
> I need it for 2 reasons.
> First, it compensates for not using async by having an sqlgrey
> cleanup-only process on the db-server (and having a local process
> trigger cleanup by faking a connection to sqlgrey).

I believe there are other solutions. See my next mail.

> Secondly, it keeps the cleanup-log entries ("spam:") on one box,
> which makes it simpler to gather statistics, rather than having to
> either remotelog

Frankly, I'd use remote logging. syslog-ng can do that efficiently (you
can match on the "spam:" to group all of them in a single file).

> or gather logs from several servers. I use the spam: entries for
> per-domain statistics. I know this is very heavy, but "the-man wants
> his stats" ;)
>
> [...]
>>> Also I want to add a "cleanup limiter", that is, the maximum number
>>> of
>>
>> That's risky. If you set it too low, the cleanups won't be able to
>> keep up with the new entries.
>
> But bursts of new entries are actually my problem. If a spam-botnet
> hits the servers for an hour or so, a lot more entries will need to
> be cleaned out. When this happens during primetime, it causes
> timeouts (the "server configuration problem" error from postfix).

Yes, but this can be avoided almost entirely with a better cleandelay
algorithm (see my next mail).

>> [...]
>
> I currently use 60 seconds. My idea was to make a LIMIT high enough
> to minimize the timeout scenario, and if the limit is hit, cut
> cleanup time

You mean the db_cleandelay value? Then we probably will agree.

> down to eg. 30 seconds until the limit isn't hit any more. This will
> give sqlgrey breathing room to continue to handle users. I could keep
> doing '"delay" * 0.5' every time the limit is hit, thus going from 60
> seconds, to 30, to 15, 7, 4, 2, then "if (delay < 2) Remove_LIMIT
> clause".

You can avoid that by not doing:

    delay * 0.5

but something like:

    delay * (last_clean_time / max_allowed_clean_time)

and protecting against unwanted fluctuations of the cleandelay by
bounding the value of (last_clean_time / max_allowed_clean_time). Ie:
if you have periodic botnet attacks interleaved with calm periods, you
don't want the delay to increase too fast but you want it to decrease
fast. You may have a temporary problem when a more aggressive botnet
comes (ie: the current cleandelay will be too high on the first run),
but SQLgrey would be ready to handle the load on the next cleanup.

It's cleaner this way: the LIMIT is a hack. Your real problem is the
time it takes to clean up, so just concentrate on minimizing that and
everything will be fine.

> This will give it a chance to catch up nicely, but if it can't it
> will force itself into normal cleanup (ie. full cleanup without the
> limit statement).
>
> Upon not hitting the limit anymore, reset cleanup to the config
> value.

What I don't like with the limit is that it's context dependent: on
your servers you must clean tens of thousands of entries per pass, so
you'll have to set the limit in this range, but less powerful
installations would simply choke on that. What I'm looking for is a
solution that works for everybody without forcing them into obscure
tunings.

Lionel

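A minimal Perl sketch of the bounded adaptive cleandelay described
above. This is not SQLgrey code; the helper name, the bound values and
the absolute limits are illustrative assumptions:

    use List::Util qw(min max);

    # Rescale db_cleandelay after each cleanup pass by how long the
    # pass took, relative to the time we are willing to spend on one.
    sub adapt_cleandelay {
        my ($delay, $last_clean_time, $max_allowed_clean_time) = @_;
        my $ratio = $last_clean_time / $max_allowed_clean_time;
        # Asymmetric bounds: the delay may only grow slowly (periodic
        # botnet bursts must not push it sky-high) but may shrink fast.
        $ratio = max(0.25, min($ratio, 1.25));
        # Keep the result within absolute limits whatever the ratio is.
        return max(30, min($delay * $ratio, 1800));   # seconds
    }
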
From: Kenneth M. <kt...@ri...> - 2009-08-18 13:02:07

On Tue, Aug 18, 2009 at 04:48:59PM +1200, Michal Ludvig wrote:
> Michal Ludvig wrote:
>
>> Eventually we could rework the whole cleaning process. Instead of
>> doing (potentially massive) cleanups in fixed time periods there
>> could be a minor cleanup on every, say, 1000th call. No need to
>> count the calls, a simple 'if(int(rand(1000))==0) { .. cleanup .. }'
>> should be sufficient.
>
> Actually... attached is a patch that implements it. I'm running it
> with cleanup_chance=10 (ie a 1:10 chance to perform cleanup) on one
> of my lightly loaded servers and it seems to do the job. So far I've
> seen 13 cleanups in 150 greylist checks, which is quite on track.
>
> BTW how about dropping the async cleanup path completely? Given the
> scary comment in sqlgrey.conf I don't believe anyone dares to use it
> anyway.
>
> Michal

I would rather get the async cleanup path working. This will allow
non-locking backends to continue to process incoming mail connections
without blocking during the cleanup.

A synchronous cleanup process is pretty painful on a busy mail system,
and will get more painful the busier it gets.

Regards,
Ken

> Index: sqlgrey
> ===================================================================
> RCS file: /cvsroot/sqlgrey/sqlgrey/sqlgrey,v
> retrieving revision 1.110
> diff -u -r1.110 sqlgrey
> --- sqlgrey	17 Aug 2009 13:15:03 -0000	1.110
> +++ sqlgrey	18 Aug 2009 03:13:20 -0000
> @@ -86,6 +86,8 @@
>  $dflt{discrimination} = 0;
>  $dflt{discrimination_add_rulenr} = 0;
>
> +$dflt{cleanup_chance} = 1000; # Cleanup the DB roughly in 1:1000 calls.
> +
>  $dflt{log} = { # note values here are not used
>      'grey' => 2,
>      'whitelist' => 2,
> @@ -1608,16 +1610,13 @@
>  }
>
>  ## Choose the actual cleanup method
> -sub start_cleanup {
> +sub try_cleanup {
>      my $self = shift;
>
> -    if ($dflt{dont_db_clean}) {
> -        $self->mylog('conf', 2, "This host has db-cleaning disabled");
> -        return;
> -    }
> +    return if (int(rand($dflt{cleanup_chance})) != 0);
>
>      if ($self->{sqlgrey}{clean_method} eq 'sync') {
> -	$self->cleanup();
> +        $self->cleanup();
>      } else {
>          $self->fork_cleanup();
>      }
> @@ -2309,31 +2308,8 @@
>      # we need the value of now() in the database
>      $self->update_dbnow();
>
> -    # If !defined last_dbclean, reload value from DB
> -    if (!defined $self->{sqlgrey}{last_dbclean}) {
> -        $self->{sqlgrey}{last_dbclean} = $self->getconfig('last_dbclean');
> -
> -        # if last_dbclean not found in db then write it.
> -        if (!defined $self->{sqlgrey}{last_dbclean}) {
> -            # 0 will force a cleanup (unless db_cleandelay is really huge)
> -            $self->setconfig('last_dbclean',0);
> -            $self->{sqlgrey}{last_dbclean} = 0;
> -        }
> -    }
> -
> -    # Is it time for cleanups ?
> -    my $current_time = time();
> -    if ($current_time > ($self->{sqlgrey}{last_dbclean} + $self->{sqlgrey}{db_cleandelay})) {
> -        # updateconfig() returns affected_rows
> -        if ($self->updateconfig('last_dbclean',$current_time,$self->{sqlgrey}{last_dbclean})) {
> -            # If affected_rows > 0, its my job to clean the db
> -            $self->{sqlgrey}{last_dbclean} = $current_time;
> -            $self->start_cleanup();
> -        } else {
> -            # If affected_rows == 0, then someone already cleaned db
> -            $self->{sqlgrey}{last_dbclean} = undef; # make sqlgrey reload time from db on next pass
> -        }
> -    }
> +    # See if we get a chance to clean up the DB a little
> +    $self->try_cleanup();
>
>      # domain scale awl check
>      if ($self->is_in_domain_awl($sender_domain, $cltid)) {
> @@ -2572,7 +2548,6 @@
>          awl_age => $dflt{awl_age},
>          # How many from match a domain/IP before a switch to domain AWL
>          domain_level => $dflt{group_domain_level},
> -        last_dbclean => undef, # triggers reload from db
>          db_cleandelay => $dflt{db_cleandelay}, # between table cleanups (seconds)
>
>          db_prepare_cache => $dflt{db_prepare_cache},
> Index: etc/sqlgrey.conf
> ===================================================================
> RCS file: /cvsroot/sqlgrey/sqlgrey/etc/sqlgrey.conf,v
> retrieving revision 1.19
> diff -u -r1.19 sqlgrey.conf
> --- etc/sqlgrey.conf	17 Aug 2009 12:43:11 -0000	1.19
> +++ etc/sqlgrey.conf	18 Aug 2009 03:13:21 -0000
> @@ -124,13 +124,14 @@
>  # db_prepare_cache = 0  # use prepared statements cache
>  #                       # BEWARE: memory leaks have been reported
>  #                       # when it is active
> -# db_cleandelay = 1800  # in seconds, how much time between database cleanups
>  # clean_method = sync   # sync : cleanup is done in the main process,
>  #                       # delaying other operations
>  #                       # async: cleanup is done in a forked process,
>  #                       # it won't delay mail processing
>  #                       # BEWARE: lockups have been reported
>  #                       # and are still investigated
> +# cleanup_chance = 1000 # How often should DB cleanup be performed?
> +#                       # By default once in every 1000 calls.
>
>  ## Database clustering (for advanced setups)
>  #

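For reference, a minimal sketch of the forked cleanup path Ken wants to
see fixed. The one rule fork imposes is that the child must open its
own database connection: DBI explicitly warns against using a handle in
both parent and child after fork(), and that is a plausible source of
the lockups the sqlgrey.conf comment mentions:

    use DBI;

    # Illustrative only -- not SQLgrey's actual fork_cleanup(), and
    # the {dsn}/{db_user}/{db_pass} fields are made-up names.
    sub fork_cleanup_sketch {
        my $self = shift;
        my $pid = fork();
        if (!defined $pid) {
            # fork failed: degrade to a synchronous cleanup
            return $self->cleanup();
        }
        return if $pid;   # parent: go straight back to policy requests
        # child: never reuse the parent's handle, open a fresh one
        $self->{dbh} = DBI->connect($self->{dsn}, $self->{db_user},
                                    $self->{db_pass}) or exit 1;
        $self->cleanup();
        exit 0;
    }
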
From: Michael S. <Mic...@lr...> - 2009-08-18 10:45:38

Hi,

if you are working on a new version, you could include this little
change to tune subroutine update_dbnow:

my $old_time = 0; # used for faster dbnow
my $new_time = 0;

--- update_dbnow.old	2009-08-18 12:25:46.618399188 +0200
+++ update_dbnow.new	2009-08-18 12:40:28.486024736 +0200
@@ -1,19 +1,27 @@
 sub update_dbnow($) {
     my $self = shift;

     # no dbnow needed for SQLite
     return if $self->SQLite();

+    # if called in the same second, no update of dbnow needed
+    $new_time = time;
+    if ($new_time == $old_time) {
+        return;
+    } else {
+        $old_time = $new_time;
+    }
+
     my $result;
     my $sth = $self->prepare_cached('SELECT now()');
     if (!defined $sth or !$sth->execute()) {
         $self->db_unavailable();
         $self->mylog('dbaccess', 0, "error: couldn't get now() from DB: $DBI::errstr");
         return if defined $self->{sqlgrey}{dbnow}; # defined: we don't update the value
         $self->{sqlgrey}{dbnow} = '0';
     } else {
         $self->db_available();
         $result = $sth->fetchall_arrayref();
         $self->{sqlgrey}{dbnow} = $self->quote($result->[0][0]);
     }
 }

Regards,
Michael Storz

From: Michal L. <ml...@lo...> - 2009-08-18 09:56:59

Lionel Bouton wrote:
> Michal Ludvig wrote, on 08/18/2009 03:40 AM:
>> (#2) which means last_dbclean is updated even if cleanup failed for
>> some reason. That's clearly a wrong order of actions.
>
> Not if you don't change the original design, which implies that the
> cleanup is distributed amongst SQLgrey processes.

I understand that, however it shouldn't record "done" before it's
really done. Because it may record 'done' and then keep crashing while
doing the job and never actually complete it. And the record would
still say 'done'.

> Hum, random behaviour is difficult to debug...

Not really in this case, is it? ;-) For debugging you can turn random
off and call cleanup for each email if you like. Random behaviour is
often evil, but here it provides a simple way to achieve something
that's not that critical. I think it's not inappropriate.

> What's wrong with setting a low db_cleandelay ?

How low? On very busy servers even 5 minutes can be too much, while on
many others a 5 minute interval may call cleanup more often than emails
happen to arrive ;-) My "adaptive interval" seems to fit more sizes
without much tweaking.

>> The advantage here is that the cleanup will take place at intervals
>> relative to the load of the given server, not in fixed time periods
>> that may be difficult to set right. Each of these cleanups should be
>> pretty fast, affecting only a few rows,
>
> No,

Yes,

> because the amount of work doesn't depend on the current load but on
> the load present around max_connect_age ago

Assuming the load doesn't vary too much day to day, it does work. I
don't operate any really big mail servers; the busiest one I have
access to does between 7k and 12k emails per day. Which is still an
acceptable variation for the method I propose.

>> therefore the problem with multiple sqlgrey processes attempting to
>> cleanup at the same time should go away as well (firstly because
>> they're not likely to trigger cleanup at the very same moment
>
> This could be implemented with a small random offset relative to the
> db_cleandelay if it's a problem. But is it really ?

To be honest I don't know. I actually don't experience the problem I'm
trying to solve here. Just wanted to bring up a new approach that may
eventually help. Take it or leave it, I don't mind either way.

Michal
--
* http://smtp-cli.logix.cz - the ultimate command line smtp client

From: Lionel B. <lio...@bo...> - 2009-08-18 06:51:29

Michal Ludvig wrote, on 08/18/2009 06:48 AM:
> Michal Ludvig wrote:
>
>> Eventually we could rework the whole cleaning process. Instead of
>> doing (potentially massive) cleanups in fixed time periods there
>> could be a minor cleanup on every, say, 1000th call. No need to
>> count the calls, a simple 'if(int(rand(1000))==0) { .. cleanup .. }'
>> should be sufficient.
>
> Actually... attached is a patch that implements it. I'm running it
> with cleanup_chance=10 (ie a 1:10 chance to perform cleanup) on one
> of my lightly loaded servers and it seems to do the job. So far I've
> seen 13 cleanups in 150 greylist checks, which is quite on track.
>
> BTW how about dropping the async cleanup path completely?

In fact it would be the least troublesome way to do it: async cleaning
would prevent any blocking in SQLgrey. Obviously it would only work
with InnoDB, PostgreSQL or other DBs with concurrent write support.

It might be time to test this codepath again if there's really a
problem with cleanups. It should work in theory but I'm not sure where
the bug is (it might be in DBI or the DBD drivers)...

Lionel

From: Lionel B. <lio...@bo...> - 2009-08-18 06:48:02

Michal Ludvig wrote, on 08/18/2009 03:40 AM:
> Dan Faerch wrote:
>
>> There is a bug I'll be hunting this week. It's an undocumented
>> feature for when running multiple servers against one database
>> server. I needed that only 1 specific server would do the cleanup,
>> so I added (I think in 1.7.5) the possibility to add a file called
>> "dont_db_clean" in the sqlgrey config directory. I've recently
>> noticed that somehow I run into a deadlock where nobody cleans the
>> db, but the "last_clean" timestamp still gets updated, resulting in
>> nothing getting cleaned.
>
> In smtpd_access_policy() it does:
>
>     # Is it time for cleanups ?
>     if ($current_time > ($self->{sqlgrey}{last_dbclean} +
>                          $self->{sqlgrey}{db_cleandelay})) {
>         # updateconfig() returns affected_rows
> 1==>    if ($self->updateconfig('last_dbclean',
>                 $current_time,$self->{sqlgrey}{last_dbclean})) {
>             # If affected_rows > 0, its my job to clean the db
>             $self->{sqlgrey}{last_dbclean} = $current_time;
> 2==>        $self->start_cleanup();
>     ...
>
> I.e. first it updates the last_dbclean value (#1) and only then calls
> start_cleanup()

It's a design decision: it's meant to prevent several SQLgrey processes
from launching the cleanup at the same time (it's still possible but
the time window is very short).

> (#2) which means last_dbclean is updated even if cleanup failed for
> some reason. That's clearly a wrong order of actions.

Not if you don't change the original design, which implies that the
cleanup is distributed amongst SQLgrey processes.

> Perhaps we could have a "cleanup_in_progress" config option that
> could be used as a semaphore. Set it atomically with:
>
>     $rows_affected = $dbi->do("
>         UPDATE config
>         SET cleanup_in_progress=NOW()
>         WHERE cleanup_in_progress IS NULL");
>
> Then if $rows_affected == 1 we succeeded and can go ahead with
> cleanup. If $rows_affected == 0 someone else started cleanup in the
> meantime

And if this SQLgrey process crashes later on, cleanup is toasted: we
can't afford that.

> - no problem, we'll go ahead with other processing.
>
> Eventually we could rework the whole cleaning process. Instead of
> doing (potentially massive) cleanups in fixed time periods there
> could be a minor cleanup on every, say, 1000th call. No need to count
> the calls, a simple 'if(int(rand(1000))==0) { .. cleanup .. }' should
> be sufficient. Perhaps have three independent rng calls
> [cleanup_connect(), cleanup_from_awl(), cleanup_domain_awl()] to
> spread the load over time even further.

Hum, random behaviour is difficult to debug... What's wrong with
setting a low db_cleandelay ?

> The advantage here is that the cleanup will take place at intervals
> relative to the load of the given server, not in fixed time periods
> that may be difficult to set right. Each of these cleanups should be
> pretty fast, affecting only a few rows,

No, because the amount of work doesn't depend on the current load but
on the load present around max_connect_age ago (and awl_age ago in a
smaller proportion).

> therefore the problem with multiple sqlgrey processes attempting to
> cleanup at the same time should go away as well (firstly because
> they're not likely to trigger cleanup at the very same moment

This could be implemented with a small random offset relative to the
db_cleandelay if it's a problem. But is it really ?

> and secondly because the time spent with tables locked should be
> very short.

As said in another mail, if you have problems with tables locked,
change your backend.

Lionel

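A sketch of the semaphore variant with a stale-claim timeout, which
would address the crash scenario above (illustrative only: the table
and column follow Michal's example, the 30 minute threshold is an
arbitrary assumption, and the date arithmetic is MySQL-flavoured):

    # Atomically claim the cleanup job; a claim older than 30 minutes
    # is assumed to come from a crashed process and is stolen.
    my $rows = $dbh->do(q{
        UPDATE config
           SET cleanup_in_progress = NOW()
         WHERE cleanup_in_progress IS NULL
            OR cleanup_in_progress < NOW() - INTERVAL 30 MINUTE
    });
    if ($rows && $rows > 0) {          # we own the cleanup now
        eval { $self->cleanup() };     # don't die holding the claim
        $dbh->do(q{UPDATE config SET cleanup_in_progress = NULL});
    }
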
From: Michal L. <ml...@lo...> - 2009-08-18 04:48:27

Michal Ludvig wrote:
> Eventually we could rework the whole cleaning process. Instead of
> doing (potentially massive) cleanups in fixed time periods there
> could be a minor cleanup on every, say, 1000th call. No need to count
> the calls, a simple 'if(int(rand(1000))==0) { .. cleanup .. }' should
> be sufficient.

Actually... attached is a patch that implements it. I'm running it with
cleanup_chance=10 (ie a 1:10 chance to perform cleanup) on one of my
lightly loaded servers and it seems to do the job. So far I've seen 13
cleanups in 150 greylist checks, which is quite on track.

BTW how about dropping the async cleanup path completely? Given the
scary comment in sqlgrey.conf I don't believe anyone dares to use it
anyway.

Michal

From: Michal L. <ml...@lo...> - 2009-08-18 01:58:06

Dan Faerch wrote:
> There is a bug I'll be hunting this week. It's an undocumented
> feature for when running multiple servers against one database
> server. I needed that only 1 specific server would do the cleanup, so
> I added (I think in 1.7.5) the possibility to add a file called
> "dont_db_clean" in the sqlgrey config directory. I've recently
> noticed that somehow I run into a deadlock where nobody cleans the
> db, but the "last_clean" timestamp still gets updated, resulting in
> nothing getting cleaned.

In smtpd_access_policy() it does:

    # Is it time for cleanups ?
    if ($current_time > ($self->{sqlgrey}{last_dbclean} +
                         $self->{sqlgrey}{db_cleandelay})) {
        # updateconfig() returns affected_rows
1==>    if ($self->updateconfig('last_dbclean',
                $current_time,$self->{sqlgrey}{last_dbclean})) {
            # If affected_rows > 0, its my job to clean the db
            $self->{sqlgrey}{last_dbclean} = $current_time;
2==>        $self->start_cleanup();
    ...

I.e. first it updates the last_dbclean value (#1) and only then calls
start_cleanup() (#2), which means last_dbclean is updated even if
cleanup failed for some reason. That's clearly a wrong order of
actions.

Perhaps we could have a "cleanup_in_progress" config option that could
be used as a semaphore. Set it atomically with:

    $rows_affected = $dbi->do("
        UPDATE config
        SET cleanup_in_progress=NOW()
        WHERE cleanup_in_progress IS NULL");

Then if $rows_affected == 1 we succeeded and can go ahead with cleanup.
If $rows_affected == 0 someone else started cleanup in the meantime -
no problem, we'll go ahead with other processing.

Eventually we could rework the whole cleaning process. Instead of doing
(potentially massive) cleanups in fixed time periods there could be a
minor cleanup on every, say, 1000th call. No need to count the calls, a
simple 'if(int(rand(1000))==0) { .. cleanup .. }' should be sufficient.
Perhaps have three independent rng calls [cleanup_connect(),
cleanup_from_awl(), cleanup_domain_awl()] to spread the load over time
even further.

The advantage here is that the cleanup will take place at intervals
relative to the load of the given server, not in fixed time periods
that may be difficult to set right. Each of these cleanups should be
pretty fast, affecting only a few rows, therefore the problem with
multiple sqlgrey processes attempting to cleanup at the same time
should go away as well (firstly because they're not likely to trigger
cleanup at the very same moment and secondly because the time spent
with tables locked should be very short). Then we could get along
without last_dbcleanup altogether.

Thoughts?

Michal
--
* http://smtp-cli.logix.cz - the ultimate command line smtp client

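A minimal Perl sketch of this proposal with the three independent
draws (the three method names come from the mail above; the wrapper
name is made up):

    my $chance = 1000;   # roughly one cleanup per 1000 policy requests

    # Called once per greylist check: each table gets its own draw,
    # so cleanup frequency scales with server load by construction.
    sub maybe_cleanup {
        my $self = shift;
        $self->cleanup_connect()    if int(rand($chance)) == 0;
        $self->cleanup_from_awl()   if int(rand($chance)) == 0;
        $self->cleanup_domain_awl() if int(rand($chance)) == 0;
    }

On average each task then runs once per $chance checks: at the 1:10
setting Michal tested, 150 checks should yield about 15 cleanups, so
the 13 he observed is indeed on track.
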
From: Lionel B. <lio...@bo...> - 2009-08-18 00:07:37

Dan Faerch wrote, on 08/18/2009 12:16 AM:
>> So far I fixed some issues with IPv6 handling - namely fixed
>> 'classc'/'smart' trimming of IPv6 addresses and added support for v6
>> addrs for clients_ip_whitelist[.local].
>
> Great.. I forgot about that one..
>
>> I bet there must be some other work as well that popped up in the
>> two years since the 1.7.6 release. Anyone?
>
> I've got 3 things I want to fix:
>
> There is a bug I'll be hunting this week. It's an undocumented
> feature for when running multiple servers against one database
> server. I needed that only 1 specific server would do the cleanup,

Why again (I don't remember the reason)? Reading the following it seems
this feature brings some problems.

> so I added (I think in 1.7.5) the possibility to add a file called
> "dont_db_clean" in the sqlgrey config directory. I've recently
> noticed that somehow I run into a deadlock where nobody cleans the
> db, but the "last_clean" timestamp still gets updated, resulting in
> nothing getting cleaned.
> The result is a very, very large connect table and from AWL. Of
> course this means that there's something wrong, if an sqlgrey updates
> the timestamp when it's not cleaning.
>
> Due to this, I also found a couple of minor performance boosters I
> want to implement. I don't remember off the top of my head where
> exactly, but it has something to do with the SELECT in the connect
> table and/or the from_awl not using WHERE "timestmp < max_age"

I'm not sure what you are referring to: they check the last_seen
timestamp.

> (pseudo), making it search all records instead of just limiting to
> the ones within the max_age window. It can make a small difference to
> people with long cleanup intervals, and in my case a huge difference
> when cleanup failed for a week without me noticing ;).

I've no explanation for the performance difference. The last_seen of
the awl tables and the first_seen of connect are even indexed to speed
up the queries.

> Also I want to add a "cleanup limiter", that is, the maximum number
> of elements to clean up in one run. Sometimes, when running very
> large cleanup operations, mysql will simply hang with the table
> locked. (I've experienced this on several mysql 4 versions - none of
> my mysql's are v.5 for sqlgrey.)

That's risky. If you set it too low, the cleanups won't be able to keep
up with the new entries. I think we should avoid any configuration
parameter that could allow users to shoot themselves in the foot.

To solve your problem, why don't you set a lower db_cleandelay value?
This would have the same effect. By default it's 30 minutes. You could
set it to 30 seconds. On a highly loaded server it should be ok.
Anyway, I don't think you would block other SQLgrey processes if you
used InnoDB (or any other backend supporting concurrent writes).

> I've tried running cleanup on a connect table with 53 mil. records
> and after 45 minutes I had to kill it. And for 45 minutes, none of
> the sqlgreys could do selects to the connect table, which makes
> mailusers quite unhappy ;)..

Don't use MyISAM. Seriously, I mean it. With your amount of traffic,
you'll have to work against it (you are doing it right now). It doesn't
support concurrent writes (including deletes) and locks the whole
table. Even if you solve your cleanup problems, the next bottleneck
will be the actual INSERTs and UPDATEs from each of your SQLgrey
instances battling for the locks. At this point you *will* be utterly
toasted.

In fact we should explicitly add this to the README.PERF.

> I need to be able to add a LIMIT to the cleanup statement, eg.
> LIMIT 50000,

Don't do this: it will bite you later and harder.

Lionel

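For reference, the backend switch Lionel recommends is a one-time
migration, sketched here via DBI (table names from the thread; ENGINE=
requires MySQL >= 4.1, older 4.0 releases used TYPE= instead — check
before running):

    # Move the hot tables off MyISAM's table-level locks.
    $dbh->do("ALTER TABLE $_ ENGINE=InnoDB")
        for qw(connect from_awl domain_awl);

Expect the ALTER itself to lock each table while it rebuilds, so run it
during a quiet period.
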
From: Lionel B. <lio...@bo...> - 2009-08-17 23:45:03

Karl O. Pinc wrote, on 08/18/2009 12:03 AM:
> The "right" thing to do in the case of PostgreSQL is:
> - check that the message doesn't match a whitelist (no),
> - begin a transaction (BEGIN;),
> - ensure the processes don't mess with each other
>   (SET TRANSACTION ISOLATION LEVEL SERIALIZABLE READ WRITE;),
> - check if it is already in the connect table
>   -- if so, abort the transaction (ROLLBACK;) and done
> - try to create an entry in the "connect" table.
> - commit the transaction (COMMIT;)

This could work with MySQL with minor modifications (I'm not sure about
the transaction isolation level support, we might have to lock the
whole table explicitly). But I don't think it's a good path to follow
for SQLgrey.

I'll argue that using transactions doesn't solve the problem in the
most efficient way. Transactions have a higher cost than simply
handling an SQL error (which happens quite rarely compared to all the
SQL queries generated by SQLgrey).

For maintenance reasons, I'd advise against having different code paths
for different databases. There are already a few of them and I had
problems with them at the time they were implemented (complex code is
always more difficult to maintain).

So the cost/benefit ratio isn't in favor of transactions. Although they
are the cleanest way from a theoretical point of view, in practice they
would bring a lot of pain, and SQLgrey can work just as well without
them.

Lionel

From: Karl O. P. <ko...@me...> - 2009-08-17 22:03:45

On 08/17/2009 11:48:25 AM, Lionel Bouton wrote:
> We should fix the case where a spammer sends a message to all MX of a
> domain at the same time. The problem is that several SQLgrey
> instances sharing the same database do the following (simplifying a
> bit):
> - check that the message doesn't match a whitelist (no),
> - check if it is already in the connect table (it isn't),
> - try to create an entry in the "connect" table.
> Only one instance can complete the last step, all others get an SQL
> error (and it's completely normal).
> What I'd like to do is simply to ignore errors occurring when SQLgrey
> tries to write an entry in the connect table.

The "right" thing to do in the case of PostgreSQL is:
- check that the message doesn't match a whitelist (no),
- begin a transaction (BEGIN;),
- ensure the processes don't mess with each other
  (SET TRANSACTION ISOLATION LEVEL SERIALIZABLE READ WRITE;),
- check if it is already in the connect table
  -- if so, abort the transaction (ROLLBACK;) and done
- try to create an entry in the "connect" table.
- commit the transaction (COMMIT;)

This approach uses the database features designed for just such a
situation. I couldn't say whether this is the best choice for sqlgrey.

Regards,
Karl <ko...@me...>
Free Software: "You don't pay back, you pay forward."
               -- Robert A. Heinlein

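Karl's sequence, expressed as a DBI sketch against PostgreSQL (the
column names are illustrative, not SQLgrey's schema; note that under
SERIALIZABLE the commit itself can fail with a serialization error and
must be retried, which is part of the cost weighed against simple error
handling elsewhere in this thread):

    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=sqlgrey', '', '',
                           { RaiseError => 1, AutoCommit => 1 });
    # example greylisting triplet
    my ($ip, $sender, $rcpt) = ('192.0.2.1', 'a@example.com', 'b@example.org');

    $dbh->begin_work;
    $dbh->do('SET TRANSACTION ISOLATION LEVEL SERIALIZABLE READ WRITE');
    my ($seen) = $dbh->selectrow_array(
        'SELECT 1 FROM connect WHERE src = ? AND sender = ? AND rcpt = ?',
        undef, $ip, $sender, $rcpt);
    if ($seen) {
        $dbh->rollback;              # already greylisted: nothing to do
    } else {
        $dbh->do('INSERT INTO connect (src, sender, rcpt, first_seen)
                  VALUES (?, ?, ?, NOW())', undef, $ip, $sender, $rcpt);
        eval { $dbh->commit };       # may need a retry on conflict
    }
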
From: Lionel B. <lio...@bo...> - 2009-08-17 17:06:31

Michal Ludvig wrote, on 08/17/2009 03:47 PM:
> Hi guys,
>
> Two years after 1.7.6 it's high time to put out a new release of
> SQLgrey. I wonder if there are any patches floating around that you'd
> like to have included in the upcoming release?
>
> So far I fixed some issues with IPv6 handling - namely fixed
> 'classc'/'smart' trimming of IPv6 addresses and added support for v6
> addrs for clients_ip_whitelist[.local].
>
> I bet there must be some other work as well that popped up in the two
> years since the 1.7.6 release. Anyone?

We should fix the case where a spammer sends a message to all MX of a
domain at the same time. The problem is that several SQLgrey instances
sharing the same database do the following (simplifying a bit):
- check that the message doesn't match a whitelist (no),
- check if it is already in the connect table (it isn't),
- try to create an entry in the "connect" table.

Only one instance can complete the last step; all others get an SQL
error (and it's completely normal). SQLgrey doesn't know about this
case and thinks that there's a connection problem with the database. It
then:
- sends an email to the admin,
- reconnects to the database,
- later, when another action is requested and the database reacts
  correctly, it sends an email saying all is OK with the db again.

I've seen a patch around that tries to solve the problem but IIRC it's
MySQL specific, so it won't do. What I'd like to do is simply to ignore
errors occurring when SQLgrey tries to write an entry in the connect
table. If the database is really down or misbehaving, SQLgrey has
plenty of other opportunities to detect this and warn the admin
(reading from the database on the next email being processed, for
example).

> And one more thing - are there any major issues that should be fixed
> for the next release? Something that's holding us back from calling
> the 1.7.x codebase "stable"? I ask because I'm tempted to fork a new
> stable branch for 1.8.x versions.

I agree with you on the 1.8.x naming. 1.7.6 has been battle-tested for
a long time now.

Lionel

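A sketch of that "ignore the duplicate" idea in DBI terms (illustrative
only — the column names are made up, and the error-string patterns are
driver-specific, so a real implementation should match the error codes
of its own backend):

    # A failed INSERT whose error looks like a unique-key violation is
    # normal when a sibling MX won the race; anything else is still
    # treated as a genuine database problem.
    my $inserted = eval {
        $dbh->do('INSERT INTO connect (src, sender, rcpt, first_seen)
                  VALUES (?, ?, ?, NOW())', undef, $ip, $sender, $rcpt);
    };
    if (!$inserted && $dbh->errstr && $dbh->errstr !~ /duplicate|unique/i) {
        $self->db_unavailable();   # real failure: warn the admin
    }
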
From: Ralph S. <sql...@se...> - 2009-08-17 14:58:30

Michal Ludvig wrote:
> Speak up!

I don't have any specific requests, but I want to extend a brief
"thank you" to you for still maintaining SQLgrey.

-R

From: Michal L. <ml...@lo...> - 2009-08-17 14:18:02

Hi guys,

Two years after 1.7.6 it's high time to put out a new release of
SQLgrey. I wonder if there are any patches floating around that you'd
like to have included in the upcoming release?

So far I fixed some issues with IPv6 handling - namely fixed
'classc'/'smart' trimming of IPv6 addresses and added support for v6
addrs for clients_ip_whitelist[.local].

I bet there must be some other work as well that popped up in the two
years since the 1.7.6 release. Anyone?

And one more thing - are there any major issues that should be fixed
for the next release? Something that's holding us back from calling the
1.7.x codebase "stable"? I ask because I'm tempted to fork a new stable
branch for 1.8.x versions.

Speak up!

Michal
--
* http://smtp-cli.logix.cz - the ultimate command line smtp client

From: David D. <da...@en...> - 2009-06-12 13:24:29

Hi,

We're running sqlgrey 1.7.6 but we're having problems with some people
not getting their mail through. Lots of this in the log:

    450 4.7.1 <edited>: Recipient address rejected: Throttling too
    many connections from new source - Try again later.

This happens repeatedly for the same source, sender & recipient. I have
the following in sqlgrey.conf but it hasn't made any difference:

    connect_src_throttle = 0

Has anyone else seen this behaviour or know what's going wrong?

--
David Derrick
Entanet International Ltd
T: 0870 224 3494
W: http://www.enta.net

From: Len C. <lc...@Go...> - 2009-06-01 21:04:42

---------- Original Message ----------------------------------
From: Lionel Bouton <lio...@bo...>
Date: Mon, 01 Jun 2009 19:38:38 +0200

> Len Conrad wrote, on 06/01/2009 05:49 PM:
>> /usr/local/bin/sqlgrey-stats.sh
>>
>> GREY NEW: 70491
>> GREY EARLY RECON: 790
>> GREY RECON OK: 954
>> GREY DOMAWL: 7805
>> GREY FROM AWL: 795
>> WHITELIST: 191
>> SPAM: 33515
>> SMTPD GREYLISTED: 4015
>>
>> ... for about 11 hours Monday morning.
>
> I don't know sqlgrey-stats.sh so I'm not sure exactly what it should
> report. That said the categories seem to match SQLgrey's own log
> categories so I assume it makes sums of log lines matching these.
>
>> I find the smtpd "greylisted for 5 minutes" rejects to be extremely
>> low compared to what I see with postgrey.
>>
>> For 70K "new/never-seen" triplets, why aren't there 70K smtpd
>> rejects?
>
> Why do you think there aren't ? Nothing above supports this claim:
> for each "grey new" and "grey early recon" line the server should
> return a temporary reject. If it doesn't, then it's an SMTP server
> configuration issue.

ok, found my misconfig: changed these from "delay" to:

reject_first_attempt = immed
reject_early_reconnect = immed

"greylisted" smtpd log lines are streaming up the screen now.

thanks
Len

From: Roddie H. <ro...@kr...> - 2009-06-01 19:42:29

Here's the rest of the script for anyone else who wants to try it:

>> NEW=`egrep -ic "sqlgrey: grey: new:" /var/log/mx1.hctc.net/maillog`
>> EARLY=`egrep -ic "sqlgrey: grey: early reconnect:" /var/log/mx1.hctc.net/maillog`
>> RECON=`egrep -ic "sqlgrey: grey: reconnect ok:" /var/log/mx1.hctc.net/maillog`
>> DOMAWL=`egrep -ic "sqlgrey: grey: domain awl match" /var/log/mx1.hctc.net/maillog`
>> FRMAWL=`egrep -ic "sqlgrey: grey: from awl:" /var/log/mx1.hctc.net/maillog`
>> WHITE=`egrep -ic "sqlgrey: whitelist:" /var/log/mx1.hctc.net/maillog`
>> SPAM=`egrep -i "sqlgrey: spam:" /var/log/mx1.hctc.net/maillog |awk '{print $7}'|sort -n|uniq -i|wc -l`
>> GLIST=`egrep -ic "Greylisted for 5 minutes" /var/log/mx1.hctc.net/maillog`

krweb:/root# cat sqlgrey-stats.sh
NEW=`egrep -ic "sqlgrey: grey: new:" /var/log/maillog`
EARLY=`egrep -ic "sqlgrey: grey: early reconnect:" /var/log/maillog`
RECON=`egrep -ic "sqlgrey: grey: reconnect ok:" /var/log/maillog`
DOMAWL=`egrep -ic "sqlgrey: grey: domain awl match" /var/log/maillog`
FRMAWL=`egrep -ic "sqlgrey: grey: from awl:" /var/log/maillog`
WHITE=`egrep -ic "sqlgrey: whitelist:" /var/log/maillog`
SPAM=`egrep -i "sqlgrey: spam:" /var/log/maillog |awk '{print $7}'|sort -n|uniq -i|wc -l`
GLIST=`egrep -ic "Greylisted for 5 minutes" /var/log/maillog`

echo "GREY NEW:" $NEW
echo "GREY EARLY RECON:" $EARLY
echo "GREY RECON OK:" $RECON
echo "GREY DOMAWL:" $DOMAWL
echo "GREY FROM AWL:" $FRMAWL
echo "WHITELIST:" $WHITE
echo "SPAM:" $SPAM
echo "SMTPD GREYLISTED:" $GLIST

Roddie

From: Lionel B. <lio...@bo...> - 2009-06-01 18:58:13

Len Conrad wrote, on 06/01/2009 08:05 PM:
>> Len Conrad wrote, on 06/01/2009 05:49 PM:
>>> /usr/local/bin/sqlgrey-stats.sh
>>>
>>> GREY NEW: 70491
>>> GREY EARLY RECON: 790
>>> GREY RECON OK: 954
>>> GREY DOMAWL: 7805
>>> GREY FROM AWL: 795
>>> WHITELIST: 191
>>> SPAM: 33515
>>> SMTPD GREYLISTED: 4015
>>>
>>> ... for about 11 hours Monday morning.
>>
>> I don't know sqlgrey-stats.sh so I'm not sure exactly what it should
>> report. That said the categories seem to match SQLgrey's own log
>> categories so I assume it makes sums of log lines matching these.
>>
>>> I find the smtpd "greylisted for 5 minutes" rejects to be extremely
>>> low compared to what I see with postgrey.
>>>
>>> For 70K "new/never-seen" triplets, why aren't there 70K smtpd
>>> rejects?
>>
>> Why do you think there aren't ? Nothing above supports this claim:
>> for each "grey new" and "grey early recon" line the server should
>> return a temporary reject. If it doesn't, then it's an SMTP server
>> configuration issue.
>
> NEW=`egrep -ic "sqlgrey: grey: new:" /var/log/mx1.hctc.net/maillog`
> EARLY=`egrep -ic "sqlgrey: grey: early reconnect:" /var/log/mx1.hctc.net/maillog`
> RECON=`egrep -ic "sqlgrey: grey: reconnect ok:" /var/log/mx1.hctc.net/maillog`
> DOMAWL=`egrep -ic "sqlgrey: grey: domain awl match" /var/log/mx1.hctc.net/maillog`
> FRMAWL=`egrep -ic "sqlgrey: grey: from awl:" /var/log/mx1.hctc.net/maillog`
> WHITE=`egrep -ic "sqlgrey: whitelist:" /var/log/mx1.hctc.net/maillog`
> SPAM=`egrep -i "sqlgrey: spam:" /var/log/mx1.hctc.net/maillog |awk '{print $7}'|sort -n|uniq -i|wc -l`
> GLIST=`egrep -ic "Greylisted for 5 minutes" /var/log/mx1.hctc.net/maillog`

GLIST counts the rejects where the SMTP server based its decision on
SQLgrey's result. It doesn't count mails rejected because they have
been rejected by both SQLgrey and another rule in the relevant
smtpd_*_restriction configuration entries that takes precedence
(probably because it does a permanent reject instead of the temporary
one SQLgrey tells Postfix to return).

This is expected behavior if you use RBLs (especially if they cover
ranges of residential IP addresses). If you look into your logs you
should see that the messages triggering the "grey new" logs are
permanently refused a short time after in the Postfix logs.

Lionel

From: Len C. <lc...@Go...> - 2009-06-01 18:25:38

> Len Conrad wrote, on 06/01/2009 05:49 PM:
>> /usr/local/bin/sqlgrey-stats.sh
>>
>> GREY NEW: 70491
>> GREY EARLY RECON: 790
>> GREY RECON OK: 954
>> GREY DOMAWL: 7805
>> GREY FROM AWL: 795
>> WHITELIST: 191
>> SPAM: 33515
>> SMTPD GREYLISTED: 4015
>>
>> ... for about 11 hours Monday morning.
>
> I don't know sqlgrey-stats.sh so I'm not sure exactly what it should
> report. That said the categories seem to match SQLgrey's own log
> categories so I assume it makes sums of log lines matching these.
>
>> I find the smtpd "greylisted for 5 minutes" rejects to be extremely
>> low compared to what I see with postgrey.
>>
>> For 70K "new/never-seen" triplets, why aren't there 70K smtpd
>> rejects?
>
> Why do you think there aren't ? Nothing above supports this claim:
> for each "grey new" and "grey early recon" line the server should
> return a temporary reject. If it doesn't, then it's an SMTP server
> configuration issue.

NEW=`egrep -ic "sqlgrey: grey: new:" /var/log/mx1.hctc.net/maillog`
EARLY=`egrep -ic "sqlgrey: grey: early reconnect:" /var/log/mx1.hctc.net/maillog`
RECON=`egrep -ic "sqlgrey: grey: reconnect ok:" /var/log/mx1.hctc.net/maillog`
DOMAWL=`egrep -ic "sqlgrey: grey: domain awl match" /var/log/mx1.hctc.net/maillog`
FRMAWL=`egrep -ic "sqlgrey: grey: from awl:" /var/log/mx1.hctc.net/maillog`
WHITE=`egrep -ic "sqlgrey: whitelist:" /var/log/mx1.hctc.net/maillog`
SPAM=`egrep -i "sqlgrey: spam:" /var/log/mx1.hctc.net/maillog |awk '{print $7}'|sort -n|uniq -i|wc -l`
GLIST=`egrep -ic "Greylisted for 5 minutes" /var/log/mx1.hctc.net/maillog`

Len

From: Lionel B. <lio...@bo...> - 2009-06-01 18:05:03

Len Conrad wrote, on 06/01/2009 05:49 PM:
> /usr/local/bin/sqlgrey-stats.sh
>
> GREY NEW: 70491
> GREY EARLY RECON: 790
> GREY RECON OK: 954
> GREY DOMAWL: 7805
> GREY FROM AWL: 795
> WHITELIST: 191
> SPAM: 33515
> SMTPD GREYLISTED: 4015
>
> ... for about 11 hours Monday morning.

I don't know sqlgrey-stats.sh so I'm not sure exactly what it should
report. That said the categories seem to match SQLgrey's own log
categories so I assume it makes sums of log lines matching these.

> I find the smtpd "greylisted for 5 minutes" rejects to be extremely
> low compared to what I see with postgrey.
>
> For 70K "new/never-seen" triplets, why aren't there 70K smtpd
> rejects?

Why do you think there aren't ? Nothing above supports this claim: for
each "grey new" and "grey early recon" line the server should return a
temporary reject. If it doesn't, then it's an SMTP server configuration
issue.

Best regards,
Lionel

From: Len C. <lc...@Go...> - 2009-06-01 16:36:02

/usr/local/bin/sqlgrey-stats.sh

GREY NEW: 70491
GREY EARLY RECON: 790
GREY RECON OK: 954
GREY DOMAWL: 7805
GREY FROM AWL: 795
WHITELIST: 191
SPAM: 33515
SMTPD GREYLISTED: 4015

... for about 11 hours Monday morning.

I find the smtpd "greylisted for 5 minutes" rejects to be extremely low
compared to what I see with postgrey.

For 70K "new/never-seen" triplets, why aren't there 70K smtpd rejects?

Len

From: David L. <da...@la...> - 2009-04-21 22:07:42

Karl O. Pinc wrote:
> On 04/21/2009 08:23:29 AM, Jeff Grossman wrote:
>> My Sqlgrey daemon died last night and all of my mail was temp
>> failed. What do people recommend for making sure the daemon stays
>> running?
>
> I had a problem like this until I upgraded to the latest version
> available, which IIRC is a beta but I think only because nobody's
> gotten around to releasing it.
>
> If that does not fix it you're probably better off solving the
> problem at its source rather than papering over the symptom.

Easier said than done. I have sqlgrey die intermittently (like, 1 or 2
times per year) with an obscure internals error. I know my perl
internals pretty well but I have only a vague idea where to start
looking. A better use of my time was to write a watchdog.

Here's my version (relies on FreeBSD's sockstat):

#! /usr/local/bin/perl -w

# count listeners on sqlgrey's policy port; 0 means it is dead
my $alive = 0 + `/usr/bin/sockstat -l|/usr/bin/grep 127.0.0.1:2501|/usr/bin/wc -l`;
exit if $alive;
system( "/usr/local/etc/rc.d/sqlgrey restart" );
system( qq{mail -s 'sqlgrey restarted' david < /dev/null } );

__END__

Hmm, that should probably have been written in shell, come to think of
it :)

David

From: Karl O. P. <ko...@me...> - 2009-04-21 14:30:45

On 04/21/2009 08:23:29 AM, Jeff Grossman wrote:
> My Sqlgrey daemon died last night and all of my mail was temp failed.
> What do people recommend for making sure the daemon stays running?

I had a problem like this until I upgraded to the latest version
available, which IIRC is a beta, but I think only because nobody's
gotten around to releasing it.

If that does not fix it you're probably better off solving the problem
at its source rather than papering over the symptom.

Karl <ko...@me...>
Free Software: "You don't pay back, you pay forward."
               -- Robert A. Heinlein

From: Klaus A. S. <kse...@gm...> - 2009-04-21 14:16:44

Brian Collins wrote:
> I had the same occasional problem and created a "nanny" shell script
> to monitor. I run it every 10 minutes. This is not foolproof but it
> works for me. I'm sure someone can make something better. :)

Other possibilities are running SQLgrey under init, upstart, runit or
daemontools. They will all restart SQLgrey as soon as it fails.

Cheers,
--
Klaus Alexander Seistrup
https://twitter.com/kseistrup

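For example, the classic inittab route looks like the line below
(illustrative: the path is a guess, and it assumes SQLgrey is started
in a mode where it stays in the foreground — a supervised daemon must
not detach, or init will respawn it in a loop):

    # /etc/inittab -- init restarts the process whenever it exits
    sg:2345:respawn:/usr/sbin/sqlgrey

runit and daemontools work the same way, via a small "run" script that
exec's the daemon in the foreground.
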
From: Brian C. <li...@ne...> - 2009-04-21 14:09:49

> My Sqlgrey daemon died last night and all of my mail was temp failed.
> What do people recommend for making sure the daemon stays running?
> Is there a watchdog type program I can run to verify it is running,
> and if not, it will automatically restart it for me?

I had the same occasional problem and created a "nanny" shell script to
monitor. I run it every 10 minutes. This is not foolproof but it works
for me. I'm sure someone can make something better. :)

#!/bin/bash

# grab the status line and pull out the state word
# (field 5 of "sqlgrey (pid NNNN) is running...")
/sbin/service sqlgrey status > /tmp/sqlgreystat.txt
sqlgreystat=`awk '{print $5}' /tmp/sqlgreystat.txt`

case "$sqlgreystat" in
    running...)
        exit 0
        ;;
    *)
        /sbin/service sqlgrey restart
        exit 0
        ;;
esac

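A matching crontab entry for the ten-minute interval Brian mentions
(the script path is illustrative):

    # /etc/crontab -- check sqlgrey every 10 minutes
    */10 * * * * root /usr/local/sbin/sqlgrey-nanny.sh
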