From: Rene J. <rg...@ba...> - 2004-12-16 11:26:04
|
Hi there.

I'm working at an ISP, currently using Postgrey, but looking for an alternative due to its limitations: cleaning the db takes 5-10 minutes each night, during which Postfix is rejecting legitimate mail with 450 errors.

I've seen you expressing worries about the current size of SQLgrey. A place to start would be putting DB creation/alteration into a separate perl script; it's only done once, and running the separate perl script on installation/upgrade doesn't take much effort. Not much performance gained, but it sure would be prettier :)

We're currently receiving around 1.2 million mails a day, which Postgrey handles fine. But I like the idea of the auto-whitelist in SQLgrey, and the possibility of getting rid of the 5-10 minutes of downtime a day. The ongoing cleaning that SQLgrey does (3 deletes per mail, iirc) would mean 3+ million extra queries a day, which is quite a lot.

Cleaning could be done in 3 (or more) ways:

- Timed interval, e.g. once every 2 hours.
- Interval based on queries, e.g. once every 2000 queries.
- Cleaning from cron, calling sqlgrey with --db-clean (or a separate perl script, but that would mean more code to maintain). This is in my eyes a pretty flexible way, and non-blocking without rewriting SQLgrey to use Net::Server::Prefork; just tell SQLgrey that cleaning is done externally and it doesn't have to worry about it (a sketch of such cleanup statements follows below). The default should be one of the other ways, to avoid people who don't RTFM complaining about large databases.

Limiting the number of queries is always good. I haven't really looked much into the queries, and I'm sure you're doing what you can to limit them. The missing indexes could also become a problem, as the connect table will probably contain 500,000-1,000,000 rows (the current values for Postgrey), and the from_awl contains around 700,000 records. These values will probably be smaller when using auto-whitelisting, but the tables will still need indexes to avoid full table scans.

I've been close to testing it in production a couple of times, but chickened out each time as I'm in the process of moving and am therefore unable to keep an eye on it due to lack of IP in the new apartment.

--
-René
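For the record, an external cron-driven cleanup would boil down to a handful of statements along these lines (a sketch only — the table and column names come from queries posted later in this thread, and the retention windows shown here are illustrative, not SQLgrey's actual defaults):

    -- stale greylist entries that were never retried
    DELETE FROM connect    WHERE first_seen < NOW() - INTERVAL 24 HOUR;

    -- expired auto-whitelist entries
    DELETE FROM from_awl   WHERE last_seen  < NOW() - INTERVAL 60 DAY;
    DELETE FROM domain_awl WHERE last_seen  < NOW() - INTERVAL 60 DAY;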
From: Lionel B. <lio...@bo...> - 2004-12-16 13:45:50
|
Rene Joergensen wrote the following on 12/16/04 12:25:

> Hi there.
>
> I'm working at an ISP, currently using Postgrey, but looking for an
> alternative due to its limitations: cleaning the db takes 5-10 minutes
> each night, during which Postfix is rejecting legitimate mail with
> 450 errors.
>
> I've seen you expressing worries about the current size of SQLgrey. A
> place to start would be putting DB creation/alteration into a separate
> perl script; it's only done once, and running the separate perl script
> on installation/upgrade doesn't take much effort. Not much performance
> gained, but it sure would be prettier :)

As SQLgrey is required to verify the database version at startup (and if you look at the code closely, you'll see that I paid in messy code for the mistake of not storing the layout version from the start), I'm inclined to let SQLgrey handle the upgrade too.

What I was planning is to move the old update code out. When SQLgrey 1.6 is out, the database layout will probably be at version 2 instead of the version 1 it is at today in 1.4. I will then drop support for version 0 to version 1 conversion, only supporting v1 to v2 conversion. In the less frequent case of a layout at version 0, SQLgrey will "die" with a message advising the admin to use an external script to upgrade from version 0 to 1.

> We're currently receiving around 1.2 million mails a day, which Postgrey
> handles fine. But I like the idea of the auto-whitelist in SQLgrey,
> and the possibility of getting rid of the 5-10 minutes of downtime a day.
> The ongoing cleaning that SQLgrey does (3 deletes per mail, iirc) would
> mean 3+ million extra queries a day, which is quite a lot.
>
> Cleaning could be done in 3 (or more) ways:
>
> - Timed interval, e.g. once every 2 hours.

Supported in the code. Deactivated by mistake.

> - Interval based on queries, e.g. once every 2000 queries.

Maybe. I'll have to think about that one.

> - Cleaning from cron, calling sqlgrey with --db-clean (or a separate
>   perl script, but that would mean more code to maintain). This is in my
>   eyes a pretty flexible way, and non-blocking without rewriting SQLgrey
>   to use Net::Server::Prefork; just tell SQLgrey that cleaning is done
>   externally and it doesn't have to worry about it. The default should
>   be one of the other ways, to avoid people who don't RTFM complaining
>   about large databases.

Currently I'm inclined to make the interval configurable in 1.4 with a more sensible default than "0", and to store a last_cleanup timestamp in the database in 1.6 to easily support multiple instances accessing the same database.

> Limiting the number of queries is always good. I haven't really looked
> much into the queries, and I'm sure you're doing what you can to limit
> them.

Yep, the order of the checks is designed to minimize the number of queries done to the database.

> The missing indexes could also become a problem, as the connect table
> will probably contain 500,000-1,000,000 rows (the current values for
> Postgrey), and the from_awl contains around 700,000 records. These
> values will probably be smaller when using auto-whitelisting, but the
> tables will still need indexes to avoid full table scans.

If you could test how much time each index on the connect table saves, that would help me decide which ones to create. See the FAQ file for the questions I'm waiting for answers to in order to decide on the indices. The db_performance_reports file contains the only report I have (it isn't installed, you'll have to fetch it from the tar.bz2 archive or the CVS tree).
> I've been close to testing it in production a couple of times, but
> chickened out each time as I'm in the process of moving and am therefore
> unable to keep an eye on it due to lack of IP in the new apartment.

Wise decision. If I had reports from users with your level of traffic, I could have said whether the default SQLgrey install would (or wouldn't) have suited you, but you may be the first one trying it with this kind of traffic, so it's better to be able to monitor the system for at least the first day.

I think I'll end up writing some perl scripts that simulate mail traffic by calling the policy daemon with adjustable traffic patterns (% of non-reconnecting zombies, distribution of the reconnect delays of real MTAs, distribution of the number of source IPs for each domain, distribution of the number of e-mail addresses per domain, for example) and rates. This should help people test SQLgrey *before* starting to greylist with it. It will help me fine-tune indices on different databases too.

Another entry in my TODO list... Probably for the 1.4 releases.

Best regards,

Lionel.
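To make the layout-version and last_cleanup ideas above concrete, here is a minimal sketch of how both could live in a small key/value config table (the table layout and names here are illustrative assumptions, not SQLgrey's actual schema):

    CREATE TABLE config (
        parameter VARCHAR(32) PRIMARY KEY,
        value     VARCHAR(255)
    );

    INSERT INTO config (parameter, value) VALUES ('layout_version', '2');
    INSERT INTO config (parameter, value) VALUES ('last_cleanup', NOW());

    -- at startup: read the layout version and refuse to run (or upgrade)
    -- if it is too old
    SELECT value FROM config WHERE parameter = 'layout_version';

    -- before cleaning: check and refresh the shared timestamp so that
    -- several instances using one database don't all run the cleanup
    SELECT value FROM config WHERE parameter = 'last_cleanup';
    UPDATE config SET value = NOW() WHERE parameter = 'last_cleanup';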
From: Rene J. <rg...@ba...> - 2004-12-17 10:42:29
|
On Thu, Dec 16, 2004 at 02:44:22PM +0100, Lionel Bouton wrote:

> As SQLgrey is required to verify the database version at startup (and if
> you look at the code closely, you'll see that I paid in messy code for
> the mistake of not storing the layout version from the start), I'm
> inclined to let SQLgrey handle the upgrade too.

I still think it's the admin's job to upgrade/maintain the db, but if that's how you like it, that's the way it's got to be :-)

> conversion, only supporting v1 to v2 conversion. In the less frequent
> case of a layout at version 0, SQLgrey will "die" with a message
> advising the admin to use an external script to upgrade from version 0 to 1.

Why not do the same thing in every case (i.e. also for a version 1 layout when version 2 is needed)?

> > - Timed interval, e.g. once every 2 hours.
>
> Supported in the code. Deactivated by mistake.

Yeah, I can see it's back in the soon-to-come v1.4.1.

> Currently I'm inclined to make the interval configurable in 1.4 with a
> more sensible default than "0", and to store a last_cleanup timestamp in
> the database in 1.6 to easily support multiple instances accessing the
> same database.

Why wait until 1.6? You already have the config table, which would be fine for storing a timestamp.

> Yep, the order of the checks is designed to minimize the number of
> queries done to the database.

A couple of comments:

When checking domain_awl and from_awl, which are both cleaned regularly, there is no need for "AND last_seen > now() - INTERVAL 60 DAY". It doesn't matter if it's 60 days + 30 minutes, and it'll make the query a little lighter.

The first time a triplet is seen, a delete is still performed on the connect table to erase an entry that doesn't exist; maybe the result of the previous check on the connect table could be used to determine whether a delete is necessary before inserting the row.

Maybe an auto-incrementing key could be implemented on the connect table? When the first select is performed on the connect table, the id key could be returned, making deletes easier for the db. (A rough sketch of this idea is below.)

> Wise decision. If I had reports from users with your level of traffic, I
> could have said whether the default SQLgrey install would (or wouldn't)
> have suited you, but you may be the first one trying it with this kind
> of traffic, so it's better to be able to monitor the system for at least
> the first day.

Well, I found an open access point yesterday, so maybe on Monday :o)

> Another entry in my TODO list... Probably for the 1.4 releases.

Sounds like a good idea, maybe I'll look at coding something like that, if I get the time.

--
-René
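The auto-incrementing key idea sketched out, for illustration (the connect-table columns are taken from the SELECT posted later in this thread; the exact schema and the id value are assumptions, not SQLgrey's real layout):

    ALTER TABLE connect ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;

    -- the initial lookup returns the row id together with the match...
    SELECT id FROM connect
     WHERE sender_name = 'rgj' AND sender_domain = 'bananas.dk'
       AND ip_addr = '194.255.237' AND rcpt = 'just@biteme.dk';

    -- ...so the later delete can hit the primary key directly
    -- (4711 is just a placeholder for the id returned above)
    DELETE FROM connect WHERE id = 4711;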
From: Lionel B. <lio...@bo...> - 2004-12-17 12:33:28
|
Rene Joergensen wrote the following on 12/17/04 11:42:

> [...]
>
>> Currently I'm inclined to make the interval configurable in 1.4 with a
>> more sensible default than "0", and to store a last_cleanup timestamp in
>> the database in 1.6 to easily support multiple instances accessing the
>> same database.
>
> Why wait until 1.6? You already have the config table, which would be
> fine for storing a timestamp.

Because 1.4 should be for bugfixes and really simple additions only. For example, maint_delay set to 0 is a bug; it will be fixed in 1.4.1. Allowing whitelist reloads from a central server is not intrusive: it will be done in 1.4.x. But for new functions that involve more than a few lines of code, I'll start the 1.5.x dev branch. People who don't want to test new code and want more reliable software will use 1.4.x updates, others will test the 1.5.x releases until they stabilize and 1.6.0 comes out...

>> Yep, the order of the checks is designed to minimize the number of
>> queries done to the database.
>
> A couple of comments:
>
> When checking domain_awl and from_awl, which are both cleaned regularly,
> there is no need for "AND last_seen > now() - INTERVAL 60 DAY". It
> doesn't matter if it's 60 days + 30 minutes, and it'll make the query a
> little lighter.

I have already considered dropping this part of the statement (with the same remarks in mind), but I haven't done it yet for the following reasons:
- doing so will make the results depend on the cleanup, which might very well become a separate process that has to handle subtle problems when you have several SQLgrey instances accessing the same database,
- I'm not sure this will save much time; remember that SQL doesn't slow down much when performing computations, it gets slow when it involves reading a lot of rows, sorting results that don't fit in memory or performing some kinds of joins between tables.

I might drop this condition in the future given the following conditions:
- tests show there is a gain in performance,
- I have clear ideas on how the cleanup process will shape up.

> The first time a triplet is seen, a delete is still performed on the
> connect table to erase an entry that doesn't exist,

In fact it could exist, but the way SQLgrey handles this is not clean (I couldn't find any way to make it so). Let me explain.

SQLgrey is coded with multiple instances accessing the same database in mind. So nothing prevents another instance from adding a connect entry that will clash with the one this instance wants to insert: just imagine 2 MXes receiving 2 mails with the same sender/recipient/client IP, the corresponding SQLgrey instances will both try to add a very similar entry to the connect table (only the timestamp can change). If they do it at the same time, there's no way to ensure the code can detect it reliably (MySQL doesn't support PRIMARY KEYs large enough, this is one of the little annoyances that usually make me prefer PostgreSQL). So SQLgrey tries to reduce the window as much as possible by DELETING just before INSERTING.

Having 2 entries will bite back when trying to match a reconnection against the earlier connection: SQLgrey doesn't like the fact that there could be 2 entries in the table for the same connection. In fact, by rereading the whole code involved, there's a bug: SQLgrey should allow several entries and, if one matches, allow the message to pass (the consistency checks done by SQLgrey wrongly assume there is a PRIMARY KEY).
There will still be problems when computing the reconnection delay, but there's not much we can do about it. The window for this bug is really small and it can only happen on multiple-instance SQLgrey installations, but it can happen (I just added this bug to my TODO for 1.4.x).

> maybe the result of the previous check on the connect table could be
> used to determine whether a delete is necessary before inserting the row.

Hope my explanation was enough to convince you that it's not the case; it's a subtle problem.

> Maybe an auto-incrementing key could be implemented on the connect table?
> When the first select is performed on the connect table, the id key could
> be returned, making deletes easier for the db.

I'm not sure I understand. Which delete do you want to speed up? The one that occurs on the connect table when the reconnection succeeds? This can be done, but there won't be a way to check whether another connect entry was added *while* the auto-whitelisting process was going on. You will have an entry in connect that will only be removed by the "cleanup" functions, and that will report a false positive (as the reconnect will match the auto-whitelist instead of the connect entry, the entry will remain there).

>> Another entry in my TODO list... Probably for the 1.4 releases.
>
> Sounds like a good idea, maybe I'll look at coding something like that,
> if I get the time.

That would be a great addition I'd be thankful for,

Lionel.
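To make the window described above concrete, the sequence each instance runs for a new triplet is roughly the following (column names borrowed from the query posted later in the thread; a sketch, not the exact statements sqlgrey issues):

    -- two instances can both reach this point for the same
    -- sender/recipient/client IP at almost the same moment
    DELETE FROM connect
     WHERE sender_name = 'rgj' AND sender_domain = 'bananas.dk'
       AND ip_addr = '194.255.237' AND rcpt = 'just@biteme.dk';

    INSERT INTO connect (sender_name, sender_domain, ip_addr, rcpt, first_seen)
    VALUES ('rgj', 'bananas.dk', '194.255.237', 'just@biteme.dk', NOW());

    -- if instance B's DELETE runs between A's DELETE and A's INSERT, both
    -- INSERTs succeed and the table ends up with two rows for the same
    -- triplet, differing only in first_seen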
From: Rene J. <rg...@ba...> - 2004-12-17 14:26:57
|
On Fri, Dec 17, 2004 at 01:32:35PM +0100, Lionel Bouton wrote:

> Because 1.4 should be for bugfixes and really simple additions only.
> For example, maint_delay set to 0 is a bug; it will be fixed in 1.4.1.
> Allowing whitelist reloads from a central server is not intrusive: it
> will be done in 1.4.x.

When is 1.4.1 due to be released? I've looked in CVS, and it doesn't look like the bugfix has been committed yet. Friday afternoon is the best time for implementing new solutions. ;)

> I have already considered dropping this part of the statement (with the
> same remarks in mind), but I haven't done it yet for the following
> reasons:
> - doing so will make the results depend on the cleanup, which might very
>   well become a separate process that has to handle subtle problems when
>   you have several SQLgrey instances accessing the same database,

The cleaning that is performed at the moment isn't very complex, and if it doesn't change, I don't think there should be any problem in dropping the statement.

> - I'm not sure this will save much time; remember that SQL doesn't slow
>   down much when performing computations, it gets slow when it involves
>   reading a lot of rows, sorting results that don't fit in memory or
>   performing some kinds of joins between tables.

If there's an index on the columns in the select, the gain is probably minimal. But the best way is to get some data to perform tests on.

[...]

> Hope my explanation was enough to convince you that it's not the case;
> it's a subtle problem.

My approach was that only one SQLgrey instance was running, as is the case in our setup with Postgrey (and will be with SQLgrey).

--
-René
From: Lionel B. <lio...@bo...> - 2004-12-17 14:44:57
|
Rene Joergensen wrote the following on 12/17/04 15:26:

> On Fri, Dec 17, 2004 at 01:32:35PM +0100, Lionel Bouton wrote:
>
>> Because 1.4 should be for bugfixes and really simple additions only.
>> For example, maint_delay set to 0 is a bug; it will be fixed in 1.4.1.
>> Allowing whitelist reloads from a central server is not intrusive: it
>> will be done in 1.4.x.
>
> When is 1.4.1 due to be released? I've looked in CVS, and it doesn't
> look like the bugfix has been committed yet. Friday afternoon is the best
> time for implementing new solutions. ;)

You'll have to wait a little. I have guests this evening and am switching from keyboard to cooking tools!
From: Rene J. <rg...@ba...> - 2004-12-17 16:04:53
|
On Thu, Dec 16, 2004 at 02:44:22PM +0100, Lionel Bouton wrote:

> If you could test how much time each index on the connect table saves,
> that would help me decide which ones to create.

I've just put it into production :o) Commented out the cleanup code and wrote a small perl script to maintain the db until v1.4.1.

With around 7000 rows in connect, mysqld started hovering around 80-90% CPU (dual 2.6GHz Xeon with 1GB RAM and 2 SCSI disks); after putting an index on ip_addr (alter table connect add index(ip_addr)), mysqld now uses around 2% CPU.

Before:

mysql> SELECT 1 FROM connect WHERE sender_name = 'rgj' AND sender_domain = 'bananas.dk' AND ip_addr = '194.255.237' AND rcpt = 'just@biteme.dk' AND first_seen BETWEEN now() - INTERVAL 24 HOUR AND now() - INTERVAL 5 MINUTE;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.03 sec)

After:

+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql> SELECT count(*) FROM connect WHERE first_seen < NOW() - INTERVAL 24 HOUR;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.05 sec)

mysql> alter table connect add index(first_seen);
Query OK, 17121 rows affected (0.23 sec)
Records: 17121  Duplicates: 0  Warnings: 0

mysql> SELECT count(*) FROM connect WHERE first_seen < NOW() - INTERVAL 24 HOUR;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

When an index is needed on from_awl I'll return with more results.

--
-René
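If anyone else wants to reproduce this, EXPLAIN is a quick way to check that MySQL actually picks up the new index (a sketch; the exact output depends on your own data):

    mysql> EXPLAIN SELECT 1 FROM connect WHERE ip_addr = '194.255.237';

With the index in place, the "key" column of the output should show ip_addr instead of NULL, and the "rows" estimate should drop from the full table size to a handful.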
From: Rene J. <rg...@ba...> - 2004-12-18 23:24:19
|
On Fri, Dec 17, 2004 at 05:04:44PM +0100, Rene Joergensen wrote:

> When an index is needed on from_awl I'll return with more results.

Sorry, I should have checked before writing that: indexes already exist on from_awl and domain_awl.

--
-René
From: Josh E. <jo...@en...> - 2004-12-20 17:43:47
|
Hi all,

I'm trying to get SQLgrey running on a new FreeBSD 5 system. I just installed Perl 5.8 and then SQLgrey, and I get this when I try to run it:

$ sqlgrey -d
Invalid conversion in sprintf: "%'" at /usr/local/lib/perl5/5.8.5/mach/Sys/Syslog.pm line 312.
Can't call method "do" on unblessed reference at /usr/bin/sqlgrey line 91.

Does SQLgrey support Perl 5.8, or what else might be wrong?

Josh
From: Josh E. <jo...@en...> - 2004-12-20 17:50:54
|
Josh Endries wrote:

| $ sqlgrey -d
| Invalid conversion in sprintf: "%'" at
| /usr/local/lib/perl5/5.8.5/mach/Sys/Syslog.pm line 312.
| Can't call method "do" on unblessed reference at /usr/bin/sqlgrey
| line 91.

Oops, it was a MySQL ACL problem, sorry for the false alarm.

Josh
From: Lionel B. <lio...@bo...> - 2004-12-20 19:09:42
|
Josh Endries wrote the following on 12/20/04 18:50:

> Josh Endries wrote:
> | $ sqlgrey -d
> | Invalid conversion in sprintf: "%'" at
> | /usr/local/lib/perl5/5.8.5/mach/Sys/Syslog.pm line 312.
> | Can't call method "do" on unblessed reference at /usr/bin/sqlgrey
> | line 91.
>
> Oops, it was a MySQL ACL problem, sorry for the false alarm.

I was wondering where this '%' was coming from :-) I'll have to test if I can make SQLgrey output a more useful error message though. What was the permission problem exactly?

Lionel.
From: Josh E. <jo...@en...> - 2004-12-20 19:42:12
|
Lionel Bouton wrote:

| I was wondering where this '%' was coming from :-) I'll have to test if
| I can make SQLgrey output a more useful error message though.
| What was the permission problem exactly?

I initially added sqlgrey@127.0.0.1, but later realized the web and MySQL servers are separate hosts (it's been a busy day), so I changed 127.0.0.1 to % and flushed privileges. Effectively, the user sqlgrey@% ('%' is the wildcard for MySQL, though you probably knew that) existed, but access to the db was only granted for @127.0.0.1. I'm not sure if you can catch that or not, but the error mentioned Syslog.pm, which threw me off at first.

Josh
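For anyone hitting the same thing, the fix boils down to a grant that matches the wildcard user as well — something along these lines (the database name and password are placeholders; adjust the privilege list to what your setup actually needs):

    GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER
       ON sqlgrey.* TO 'sqlgrey'@'%' IDENTIFIED BY 'secret';
    FLUSH PRIVILEGES;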
From: David R. <dr...@gr...> - 2004-12-16 15:56:52
|
Rene Joergensen wrote, On 12/16/2004 3:25 AM:

> - Cleaning from cron, calling sqlgrey with --db-clean (or a separate
>   perl script, but that would mean more code to maintain). This is in my
>   eyes a pretty flexible way, and non-blocking without rewriting SQLgrey
>   to use Net::Server::Prefork; just tell SQLgrey that cleaning is done
>   externally and it doesn't have to worry about it. The default should
>   be one of the other ways, to avoid people who don't RTFM complaining
>   about large databases.

I think this one is a great idea. I don't see the need for cleaning out the database more than once every hour or so, and being able to move the cleanup process outside of the main sqlgrey process will reduce the latency hit of having the daemon itself do cleanup. Not a big deal in your typical setup, but it will make a big difference in high-volume setups.

-Dave
From: Lionel B. <lio...@bo...> - 2004-12-16 16:20:40
|
David Rees wrote the following on 12/16/04 16:56:

> Rene Joergensen wrote, On 12/16/2004 3:25 AM:
>
>> - Cleaning from cron, calling sqlgrey with --db-clean (or a separate
>>   perl script, but that would mean more code to maintain). This is in my
>>   eyes a pretty flexible way, and non-blocking without rewriting SQLgrey
>>   to use Net::Server::Prefork; just tell SQLgrey that cleaning is done
>>   externally and it doesn't have to worry about it. The default should
>>   be one of the other ways, to avoid people who don't RTFM complaining
>>   about large databases.

Remember that even with Multiplex you can fork early to launch another process which can do the asynchronous cleanups. You won't have to maintain cron entries this way.

> I think this one is a great idea. I don't see the need for cleaning out
> the database more than once every hour or so, and being able to move the
> cleanup process outside of the main sqlgrey process will reduce the
> latency hit of having the daemon itself do cleanup. Not a big deal in
> your typical setup, but it will make a big difference in high-volume
> setups.

Sure. I wonder how SQLite will react to this, though... I'm not sure it won't lock the whole table, or worse the whole database, when doing the DELETE. That will be 1.6 material; I need to think about this a little more and test things.

In the meantime, 1.4.1 will be released with a configurable "cleanup_delay" (which better describes what the current maint_delay internal variable does), with a default set to 30 minutes.

Best regards,

Lionel.
From: David R. <dr...@gr...> - 2004-12-16 19:45:28
|
Lionel Bouton wrote:

> Remember that even with Multiplex you can fork early to launch another
> process which can do the asynchronous cleanups. You won't have to
> maintain cron entries this way.

Another good idea, I like it better than crontab entries.

-Dave