From: Lionel B. <lio...@bo...> - 2005-06-17 12:18:52
|
At last! 1.6.0 is out on sourceforge. Changes from 1.5.9 include cleanups in the log parser and in the documentation. SQLgrey now logs the actual IP and the client ID it computes (either IP or class-C) instead of the client ID alone. Lionel. |
From: Lionel B. <lio...@bo...> - 2005-06-16 20:41:26
|
Gianpaolo Del Matto wrote the following on 14.06.2005 16:54:
> I found another weird behaviour in the 1.5.9 release, however.
> As I use FreeBSD I follow the general guideline to
> store config files below /usr/local/etc.
>
> So I moved everything to /usr/local/etc/sqlgrey and start
> 'sqlgrey -f /usr/local/etc/sqlgrey/sqlgrey.conf'.
>
> sqlgrey.conf has 'conf_dir' set to the aforementioned directory,
> however sqlgrey tries to read the clients* and *regexp files from
> /etc/sqlgrey (the hardcoded default) and throws some
> "conf: warning / conf: error FILENAME not found or unreadable"
> messages.

Found the problem, fixed in my tree. A bug in the log parser was corrected too. 1.6.0 is coming nicely... I'm currently checking if the documentation is up to date.

Lionel. |
From: Ray B. <rj_...@rj...> - 2005-06-16 09:20:20
|
Lionel Bouton wrote: >Backup MX providers are mostly misguided ("everyone provide a backup MX >service, we must do it too") or knowingly trying to suck money from you >without any real service provided. > >Lionel. > > > Hi Lionel Thanks for the info! Regards Ray |
From: Ray B. <rj_...@rj...> - 2005-06-16 09:19:51
|
Michel Bouissou wrote: >But in your case, if you state that you have no control over your secondaries, >you probably cannot benefice from any of these features, and then the >baseline is that you shouldn't use thoses MXes over which you have no >control. All it can bring you is trouble. > > > Thanks for the tip Michel. I wasn't picking you out, just researching :) Regards Ray |
From: Michel B. <mi...@bo...> - 2005-06-16 09:16:05
|
On Thursday 16 June 2005 10:42, Ray Booysen wrote:
> Why do you and Michael think that backup MX servers aren't necessary?
> Bouissou.net has 3 MX servers listed. I don't understand why they are
> unnecessary.

All my 3 MXes (for my personal bouissou.net domain) are GNU/Linux computers that are connected through residential ADSL, so they might experience reliability or connectivity problems. Although it doesn't usually happen, it could, and that's why I use backup MXes.

I must stress the fact that I do have administrative control over all of those (and actually 2 MX names currently point to the same IP address, one being dynamic and set there for line backup purposes). Having administrative control over all of my MXes allows me to guarantee that they implement the same level of filtering, and greylisting, on all.

The major (and only) advantages of having such backup MXes are that:

1/ In case the primary goes down and mail gets queued on the secondary, when the primary comes back I can get waiting mail immediately (using SMTP ETRN) without having to wait for remote servers to retry at their own will with their own schedule. For this to be useful, of course, you have to be able to perform ETRN on your secondaries.

2/ I can set a queue lifetime longer than the defaults on my secondaries, which allows mail to stay waiting there "longer than normal" (usually 5 days) without bouncing back, in case I experienced a very long primary downtime for some reason (i.e. a major hardware failure that I couldn't quickly fix).

But in your case, if you state that you have no control over your secondaries, you probably cannot benefit from any of these features, and then the baseline is that you shouldn't use those MXes over which you have no control. All they can bring you is trouble.

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
From: Lionel B. <lio...@bo...> - 2005-06-16 09:08:31
|
Ray Booysen wrote the following on 16.06.2005 10:42:
> Lionel Bouton wrote:
>> The fact is: you probably don't need a secondary MX (in fact no sane
>> configuration should).
>
> Hey
>
> Why do you and Michael think that backup MX servers aren't necessary?

As I said, until the server doing the actual delivery to the mailboxes is back up, you only shift the task of retrying a failed delivery from the original MTA to your backup MX servers. The only benefit is when you control the backup MX servers and can make them flush their queues as soon as the "delivery" server is back online: the messages won't get delayed as much as if the original MTAs had retried themselves.

If you don't have any control over the backup MX servers, then you don't gain anything from them; in fact, as you witness yourself, you only weaken your anti-spam solutions.

The name "backup" MX server is in fact misleading: you don't back up anything with it, as the so-called "backup" MX is useless without the "delivery" server. You can't use it as your primary server (delivering mail to users, allowing them to fetch through POP3(s) or IMAP) if the original goes down.

Backup MX providers are mostly misguided ("everyone provides a backup MX service, we must do it too") or knowingly trying to suck money from you without any real service provided.

Lionel. |
From: Ray B. <rj_...@rj...> - 2005-06-16 08:42:15
|
Lionel Bouton wrote:
> The fact is: you probably don't need a secondary MX (in fact no sane
> configuration should).

Hey

Why do you and Michael think that backup MX servers aren't necessary? Bouissou.net has 3 MX servers listed. I don't understand why they are unnecessary.

Regards
Ray |
From: Lionel B. <lio...@bo...> - 2005-06-16 08:40:20
|
Michel Bouissou wrote the following on 16.06.2005 10:24:
> BTW Lionel, have you made your decision about whether to put throttling in
> 1.6.0, and which algorithm to use?

Throttling will be in 1.7.0; this way we'll get the time needed to experiment and iron out the details before putting it into a stable release.

Lionel |
From: Michel B. <mi...@bo...> - 2005-06-16 08:24:19
|
On Thursday 16 June 2005 10:16, Lionel Bouton wrote:
> >> Are we seeing an increase in the number of spam sending MTAs that don't
> >> give up on the first attempt?
> >
> > I believe so. And I also have seen a growing number of spams that retry
> > after about a minute. But not longer. Which means that the greylisting
> > duration should probably not be set < 2 minutes.
>
> I've seen this too. The new default in 1.6.0 will be 5 minutes instead
> of 1 minute (I've only witnessed retries from real MTAs after at least 6
> minutes).

Yes. I use 5 minutes myself. I've seen that most legitimate servers have an average retry time around 19 minutes, so it doesn't hurt much to increase the greylisting delay a little.

Of course spammers may still increase their retry delay, but the more they increase it, the more burden it will put on their systems, the more it will slow down their throughput, and the more chance it gives that their IP gets into some blacklists before they retry. So that's all benefit anyway ;-)

BTW Lionel, have you made your decision about whether to put throttling in 1.6.0, and which algorithm to use?

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
From: Lionel B. <lio...@bo...> - 2005-06-16 08:23:34
|
Ray Booysen wrote the following on 16.06.2005 09:58:
> Unfortunately I don't have control over any secondaries. Anyone know
> of a host that will sell me secondary services and implement
> greylisting? :)

The fact is: you probably don't need a secondary MX (in fact no sane configuration should). SMTP guarantees that the sending host will retry a failed transaction later without the message originator even noticing (unless the message is delayed for long and the MTA is configured to warn when a message is delayed for more than a specified amount of time). So if your primary goes down, either you put it back before the messages expire in their queues and your users get them, or you aren't quick enough and, in either case, the messages will be bounced back from the queue they are sitting in (either the origin MTA or the secondary MX).

Simply remove the secondary MX from the DNS config and, when the TTL expires, remove it from your trusted sources in your local Postfix configuration.

Lionel. |
From: Lionel B. <lio...@bo...> - 2005-06-16 08:16:54
|
Michel Bouissou wrote the following on 16.06.2005 09:41:
>> Are we seeing an increase in the number of spam sending MTAs that don't
>> give up on the first attempt?
>
> I believe so. And I also have seen a growing number of spams that retry after
> about a minute. But not longer. Which means that the greylisting duration
> should probably not be set < 2 minutes.

I've seen this too. The new default in 1.6.0 will be 5 minutes instead of 1 minute (I've only witnessed retries from real MTAs after at least 6 minutes).

Lionel |
From: Michel B. <mi...@bo...> - 2005-06-16 08:13:08
|
On Thursday 16 June 2005 09:58, Ray Booysen wrote:
> > If the primary MX performs greylisting, then *ALL* the backup MXes MUST
> > perform greylisting themselves as well.
>
> Unfortunately I don't have control over any secondaries.

Then don't use secondaries. Secondaries MUST be under the same administrative control as the primary. Otherwise it's better not to use secondaries (as secondary MXes are not necessary anyway, and will cause more trouble than they will help if they become an easy gateway for spam and viruses).

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
From: Ray B. <rj_...@rj...> - 2005-06-16 07:59:07
|
Michel Bouissou wrote:
> On Thursday 16 June 2005 09:31, Ray Booysen wrote:
>> I have started to notice more and more spam emails that are being sent
>> by MTAs that use the next available MX after I greylist the initial
>> connect.
>
> I see the same.
>
>> My server then in turn greylists the connect from the backup
>> MX but it doesn't stop the spam or virus being delivered in the end.
>
> This is *NOT* good!
>
> If the primary MX performs greylisting, then *ALL* the backup MXes MUST
> perform greylisting themselves as well.

Unfortunately I don't have control over any secondaries. Anyone know of a host that will sell me secondary services and implement greylisting? :)

> As a rule of thumb, *ANY* anti-spam measure that exists on a primary MX at
> SMTP level MUST exist as well on all secondaries. Otherwise secondaries are
> easy ways to bypass antispam protection for a given domain, and spammers know
> that well (some spammers / spambots systematically send to the LOWEST
> priority MX to exploit this possible, and alas frequent, security
> shortcoming).
>
> And the primary MX should not greylist mail coming from its secondaries (they
> should be whitelisted), as greylisting secondaries is not only useless but
> also counterproductive.

I know this. I just haven't whitelisted the IP yet. Will get onto that.

>> Are we seeing an increase in the number of spam sending MTAs that don't
>> give up on the first attempt?
>
> I believe so. And I also have seen a growing number of spams that retry after
> about a minute. But not longer. Which means that the greylisting duration
> should probably not be set < 2 minutes.

Thanks for the tip! :)

> Cheers.

Thanks for the help, Michael!

Regards
Ray |
From: Michel B. <mi...@bo...> - 2005-06-16 07:41:53
|
On Thursday 16 June 2005 09:31, Ray Booysen wrote:
> I have started to notice more and more spam emails that are being sent
> by MTAs that use the next available MX after I greylist the initial
> connect.

I see the same.

> My server then in turn greylists the connect from the backup
> MX but it doesn't stop the spam or virus being delivered in the end.

This is *NOT* good!

If the primary MX performs greylisting, then *ALL* the backup MXes MUST perform greylisting themselves as well.

As a rule of thumb, *ANY* anti-spam measure that exists on a primary MX at SMTP level MUST exist as well on all secondaries. Otherwise secondaries are easy ways to bypass antispam protection for a given domain, and spammers know that well (some spammers / spambots systematically send to the LOWEST priority MX to exploit this possible, and alas frequent, security shortcoming).

And the primary MX should not greylist mail coming from its secondaries (they should be whitelisted), as greylisting secondaries is not only useless but also counterproductive.

> Are we seeing an increase in the number of spam sending MTAs that don't
> give up on the first attempt?

I believe so. And I also have seen a growing number of spams that retry after about a minute. But not longer. Which means that the greylisting duration should probably not be set < 2 minutes.

Cheers.

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
From: Ray B. <rj_...@rj...> - 2005-06-16 07:31:15
|
Hi I have started to notice more and more spam emails that are being sent by MTAs that use the next available MX after I greylist the initial connect. My server then in turn greylists the connect from the backup MX but it doesn't stop the spam or virus being delivered in the end. Are we seeing an increase in the number of spam sending MTAs that don't give up on the first attempt? I don't like to think that this is so because SQLGrey does work incredibly well with the current connections we receive. Any thoughts on this issue? Regards Ray |
From: Michael S. <Mic...@lr...> - 2005-06-14 15:29:09
|
On Sun, 12 Jun 2005, Michel Bouissou wrote:
> If you use the algorithm you propose, let's say with a domain_group_level of
> 10 and a throttling threshold of 20, and you have one MTA that sends mail for
> ONLY one domain, then this MTA will make it to domain_awl (and have only one
> entry there even though this may correspond to thousands of different
> senders), but with your algorithm this will never be enough and this MTA will
> still remain "throttleable".

If the MTA sends ONLY emails with originators from ONE domain, then there will be an entry in domain_awl and ALL emails will be immediately accepted. There is no chance for an email to be listed in connect and therefore throttling will never occur.

> So I still think that we shouldn't mix a count of entries in from_awl and
> domain_awl, as they don't have the same meaning, and should rather use my
> algorithm: Stop throttling for an IP if it has at least 1 entry in
> domain_awl, or >= throttling threshold in from_awl.

I want to be able to specify that more than one entry in domain_awl should be used. To have a simple configuration I thought about linking entries in domain_awl and from_awl together. But if you say these entries cannot be linked together, we have to switch to explicit values. This means we need a vector of values, where each value corresponds to the number of entries in an awl which would prove that we trust an MTA (I call these MTAs well-behaved):

    connect_src_throttle = (1, 10)  # (value for domain_awl, value for from_awl)

Since I want to use a table for triples too, I would need a vector with 3 elements.

> > BTW, we use the algorithm, which checks for the IP address in domain_awl
> > and from_awl, for the opposite direction and call it fast propagation.
> > That means, if an IP address is from a well-behaved MTA, then we accept
> > the triple immediately. This eliminates the delay for forwarded emails,
> > because most of the time a well-behaved MTA has an entry in domain_awl. But
> > this is done with the cost of polluting the from_awl, therefore we want
> > the additional table for forwarding.
>
> Hmmm... I'm not sure that I completely understand what you mean here...

Ok, which part can I describe better:
- how fast propagation works
- or what the relationship is between forwarding and fast propagation

Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:St...@lr...>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840 |
From: Lionel B. <lio...@bo...> - 2005-06-14 14:04:32
|
Gianpaolo Del Matto wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello Lionel
>
> I'm implementing SQLGrey into our mail setup.
> Whilst doing so, I came across two missing features.
>
> My proposal is to include the following in sqlgrey.conf:
>
> #1 change the default PREPEND header value, e.g.
>
>    prepend_header = "X-Greylist: "      (default)
>    prepend_header = "X-MyOwn-Header: "  (changed)
>
> Changing the header would allow showing its origin
> simply by its name and including it in site-local filter
> sets.
>
> #2 change the default smtp reject reply, e.g.
>
>    reject_reply = "Greylisted for %s seconds"                        (default)
>    reject_reply = "My own reject message; try again in %s seconds"   (changed)
>
> These come from the need for custom smtp replies, e.g.
> to include a custom URL pointing to our website for further
> information.
>
> I have not yet started changing the source code, as you
> may want to implement this yourself.
> I may, however, supply you with an appropriate patch
> for sqlgrey-1.5.9 if you find this feature valuable and want me to
> write it.

There is an entry in my TODO loosely related to this:
- make header content user configurable with some predefined macros, _DELAY_ for example.

As the latest 1.5.x releases are pretty stable, as soon as I find some time I'll clean up the log parsing tool and release 1.6.0 (I feel a need for a stable version with a database setup adapted to high-throughput servers). Then I'll start the 1.7.x devel branch right away with some features waiting in the pipe. reject_reply and prepend_header configuration would make it into 1.7.x too.

Cheers,
Lionel. |
From: Michel B. <mi...@bo...> - 2005-06-12 08:11:28
|
On Friday 10 June 2005 17:22, Michael Storz wrote:
> > If we want to mix the count from domain_awl and the count from from_awl,
> > then we would need to query both tables every time, which could result in
> > a performance loss, which would be annoying especially for big sites...
>
> If you look carefully at the algorithm, then you see that we do not have
> to check both tables in every case:
>
>     my $threshold = connect_src_throttle -
>                     $self->count_src_domain_awl($cltid) * group_domain_level;
>
> If connect_src_throttle == group_domain_level then 1 entry in domain_awl
> is enough to circumvent throttling. Only if connect_src_throttle >
> group_domain_level do you have to check from_awl in addition.

I have some objections to using this algorithm instead of the one that I had proposed:

One entry in domain_awl IMHO "weighs more" than group_domain_level entries in from_awl. For one entry in domain_awl is equivalent to "AT LEAST group_domain_level entries (or more...) for the same host and same domain in from_awl".

For this reason, I had considered that one entry in domain_awl was enough to consider that a given host was well behaved and known enough to allow it to bypass throttling.

If you use the algorithm you propose, let's say with a domain_group_level of 10 and a throttling threshold of 20, and you have one MTA that sends mail for ONLY one domain, then this MTA will make it to domain_awl (and have only one entry there even though this may correspond to thousands of different senders), but with your algorithm this will never be enough and this MTA will still remain "throttleable".

So I still think that we shouldn't mix a count of entries in from_awl and domain_awl, as they don't have the same meaning, and should rather use my algorithm: stop throttling for an IP if it has at least 1 entry in domain_awl, or >= throttling threshold entries in from_awl.

> BTW, we use the algorithm, which checks for the IP address in domain_awl
> and from_awl, for the opposite direction and call it fast propagation.
> That means, if an IP address is from a well-behaved MTA, then we accept
> the triple immediately. This eliminates the delay for forwarded emails,
> because most of the time a well-behaved MTA has an entry in domain_awl. But
> this is done with the cost of polluting the from_awl, therefore we want
> the additional table for forwarding.

Hmmm... I'm not sure that I completely understand what you mean here...

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
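Expressed as SQL, the rule Michel proposes in the message above could look roughly like the sketch below. This is only an illustration, not SQLgrey code: the domain_awl and from_awl table names and their src column are taken from the discussion in this thread, and :cltid and :threshold are placeholders for the client ID and the throttling threshold.

```sql
-- Sketch of Michel's rule: a source escapes throttling if it already has at
-- least one domain_awl entry, or at least :threshold entries in from_awl.
SELECT (EXISTS (SELECT 1 FROM domain_awl WHERE src = :cltid)
        OR (SELECT COUNT(*) FROM from_awl WHERE src = :cltid) >= :threshold)
       AS skip_throttling;
```

Checking the domain_awl condition first, as Michel suggests, lets the from_awl count be skipped entirely for sources that already have a domain entry.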
From: Michael S. <Mic...@lr...> - 2005-06-10 15:22:42
|
On Thu, 9 Jun 2005, Michel Bouissou wrote:
> On Thursday 09 June 2005 18:34, Michael Storz wrote:
>> Therefore a possible change to the algorithm would be to incorporate the
>> relation between from_awl and domain_awl, something like:
>
> To complete what I wrote in my previous message:
>
> One of the reasons I had _not_ to combine them, but to test domain_awl first, is
> for performance: if we find a presence in domain_awl, then we don't need to
> perform the query against from_awl (the "and" condition in Perl will not
> evaluate the following condition if the previous doesn't match), and thus we
> save a query against the bigger from_awl table when there is an entry in
> domain_awl -- which is likely to be the case for big servers sending us a lot
> of stuff, which are more likely than others to generate a high number of
> "legitimate entries" in connect, if their IP changes for example.
>
> If we want to mix the count from domain_awl and the count from from_awl, then
> we would need to query both tables every time, which could result in a
> performance loss, which would be annoying especially for big sites...

If you look carefully at the algorithm, then you see that we do not have to check both tables in every case:

    my $threshold = connect_src_throttle -
                    $self->count_src_domain_awl($cltid) * group_domain_level;

If connect_src_throttle == group_domain_level then 1 entry in domain_awl is enough to circumvent throttling. Only if connect_src_throttle > group_domain_level do you have to check from_awl in addition.

BTW, we use the algorithm, which checks for the IP address in domain_awl and from_awl, for the opposite direction and call it fast propagation. That means, if an IP address is from a well-behaved MTA, then we accept the triple immediately. This eliminates the delay for forwarded emails, because most of the time a well-behaved MTA has an entry in domain_awl. But this is done with the cost of polluting the from_awl, therefore we want the additional table for forwarding.

Because the algorithm can be used for two different purposes, we should give it an extra subroutine, e.g. is_wellbehaved_mta.

Using both features, throttling and fast propagation, will result in a minimum delay, because on the first try only connect_src_throttle entries will be made in connect, and with the first retry all emails, not only the ones with entries in connect, will be accepted. Without fast propagation, the first retry will only allow the acceptance of the emails in connect and the second retry will accept all other emails.

Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:St...@lr...>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840 |
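The combined test Michael sketches above (the candidate is_wellbehaved_mta subroutine) could be written as a single query, roughly as follows. Again this is only a sketch under the same assumptions as before: the table and column names come from the thread, and :throttle, :grouplevel and :cltid stand in for connect_src_throttle, group_domain_level and the client ID.

```sql
-- Michael's weighted variant: domain_awl entries count group_domain_level
-- times, from_awl entries count once; the source is treated as well behaved
-- once the weighted total reaches connect_src_throttle.
SELECT ((:throttle
         - :grouplevel * (SELECT COUNT(*) FROM domain_awl WHERE src = :cltid)
         - (SELECT COUNT(*) FROM from_awl WHERE src = :cltid)) <= 0)
       AS is_wellbehaved_mta;
```

The difference from Michel's rule is that from_awl entries can make up for a missing domain_awl entry, which is exactly the point the two of them go on to debate.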
From: Ray B. <rj_...@rj...> - 2005-06-10 13:51:11
|
OK, After just being hit by the Worm.LovGate.X I would love throttling to come into the sqlgrey tree. My server spent this afternoon processing the emails just to drop them in the end. Regards Ray -- Ray Booysen rj_...@rj... |
From: Michel B. <mi...@bo...> - 2005-06-09 17:33:29
|
On Thursday 09 June 2005 18:34, Michael Storz wrote:
> Therefore a possible change to the algorithm would be to incorporate the
> relation between from_awl and domain_awl, something like:

To complete what I wrote in my previous message:

One of the reasons I had _not_ to combine them, but to test domain_awl first, is for performance: if we find a presence in domain_awl, then we don't need to perform the query against from_awl (the "and" condition in Perl will not evaluate the following condition if the previous doesn't match), and thus we save a query against the bigger from_awl table when there is an entry in domain_awl -- which is likely to be the case for big servers sending us a lot of stuff, which are more likely than others to generate a high number of "legitimate entries" in connect, if their IP changes for example.

If we want to mix the count from domain_awl and the count from from_awl, then we would need to query both tables every time, which could result in a performance loss, which would be annoying especially for big sites...

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
From: Michel B. <mi...@bo...> - 2005-06-09 17:12:13
|
On Thursday 09 June 2005 18:34, Michael Storz wrote:
> Here are my values using the above select statement:
>
> number of entries in connect: 1.072.022
> number of different IP addresses in connect: 110.904
> average number of entries per IP address: 9.67
> max. number of entries per IP address: 2.470

Waow! 2.470 entries for ONE IP ;-)

> thrott. | num of  | num of    | num of  | left    | % reduc
> num.    | IP addr | entries   | thrott  | entries |
> ========+=========+===========+=========+=========+========
[...]
> 10      | 29.092  | 870.638   | 608.810 | 463.212 | 56.79 %
> 20      | 13.186  | 672.086   | 421.552 | 650.470 | 39.32 %
> 30      |  7.908  | 549.151   | 319.819 | 752.203 | 29.83 %
> 40      |  5.367  | 463.256   | 253.943 | 818.079 | 23.69 %

So by setting a throttling threshold between 10 and 40, you would save between 25 and 55% of your (huge) connect table size...

> However, I would not incorporate this algorithm into 1.6.0 but in 1.7.0.
> If we put the other tables into sqlgrey about which I talked already, the
> algorithm for throttling must be adapted.

With the figures you gave, I believe throttling alone could help a great deal with your problem of zombie spam accidentally passing through, without maybe having to go for a heavier method of multiplying tables.

Maybe you'd like to give throttling alone a try, and check to what extent it helps you, and whether you still need further improvements (with the cost of complexity).

> But even if not, I am not sure
> if the algorithm is flexible enough. For example, let's assume the value of
> connect_src_throttle is 21 and the value of group_domain_level is 10.
>
> - if there are one or two entries in domain_awl, a new triple would be
>   accepted.
> - if there are 20 entries in connect as well as in from_awl and 0 in
>   domain_awl, a new triple would be throttled, but 20 entries in from_awl
>   should be as good as 2 entries in domain_awl because of
>   group_domain_level.

For sure we could try to "refine the refinements" a little further, but after having thought about it for a while, I believed this not to be necessary -- maybe I'm mistaken.

Considering that we stop throttling when "we can be reasonably sure that a given source (IP) usually retries all or most (*) of its messages", then we don't need to throttle it anymore as:

a/ Waiting messages will (most probably) come back.

b/ Throttling further could have undesired results (such as causing unnecessarily long delays, or even causing the loss of legitimate messages after queue lifetime expiration at the sending server).

(*) It doesn't however manage the case of both a legitimate server and a network of zombies NATted behind the same IP...

Taking this into consideration, I think it's not of high importance to have the "number of entries in from_awl" threshold match the threshold at which a given IP goes into domain_awl.

We need to consider both, but we don't necessarily need matching "weights" for both conditions.

The game is only to make sure that we throttle as long as it may be useful, but quit throttling for servers which have already succeeded in retrying a significant number of messages. Isn't it?

Cheers.

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
From: Michael S. <Mic...@lr...> - 2005-06-09 16:35:32
|
On Wed, 8 Jun 2005, Michel Bouissou wrote:
> On Wednesday 08 June 2005 11:05, Lionel Bouton wrote:
>> Michel, could you give us a ratio [...]
>> If other users could fetch Michel's build and test it in the same manner
>> too that would be great.
>
> Everybody can easily figure out if it could save many entries in their connect
> table by performing manually a simple sql query such as:
>
> select src, count(*) as cpt from connect group by src having cpt >= 3 order by
> cpt desc, src;
>
> (replace >= 3 with any value you would consider for setting the tarpitting
> threshold)

Here are my values using the above select statement:

    number of entries in connect:                 1.072.022
    number of different IP addresses in connect:    110.904
    average number of entries per IP address:           9.67
    max. number of entries per IP address:              2.470

    thrott. | num of  | num of    | num of  | left    | % reduc
    num.    | IP addr | entries   | thrott  | entries |
    ========+=========+===========+=========+=========+========
      3     | 55.366  | 1.001.789 | 891.057 | 180.965 | 83.12 %
      5     | 42.124  |   956.904 | 788.408 | 283.614 | 73.54 %
     10     | 29.092  |   870.638 | 608.810 | 463.212 | 56.79 %
     20     | 13.186  |   672.086 | 421.552 | 650.470 | 39.32 %
     30     |  7.908  |   549.151 | 319.819 | 752.203 | 29.83 %
     40     |  5.367  |   463.256 | 253.943 | 818.079 | 23.69 %
     50     |  3.696  |   389.663 | 208.559 | 863.463 | 19.45 %
     60     |  2.432  |   323.176 | 179.688 | 892.334 | 16.76 %
     70     |  1.928  |   290.926 | 157.894 | 914.128 | 14.73 %
     80     |  1.605  |   266.954 | 140.159 | 931.863 | 13.07 %
     90     |  1.367  |   246.908 | 125.245 | 946.777 | 11.68 %
    100     |  1.164  |   227.773 | 112.537 | 959.485 | 10.50 %

    thrott. num.:   number of entries where throttling begins
    num of IP addr: number of unique IP addresses = number of lines of above select statement
    num of entries: total number of entries from select statement
    num of thrott:  num of entries - (thrott. num. - 1) * num of IP addr
    left entries:   number of entries in connect - num of thrott
    % reduc:        num of thrott * 100 / number of entries in connect

This means throttling would really decrease the size of our connect table and hopefully the chance for spam to get through.

My primary goal was to reduce the delay for the regular messages. But after this I wanted to look at algorithms which would reduce the number of spams. Throttling would have been my first try to reduce spams. But since I had not thought about how an algorithm could work, it's great that Michel already did the work.

However, I would not incorporate this algorithm into 1.6.0 but in 1.7.0. If we put the other tables into sqlgrey about which I talked already, the algorithm for throttling must be adapted. But even if not, I am not sure if the algorithm is flexible enough. For example, let's assume the value of connect_src_throttle is 21 and the value of group_domain_level is 10.

- if there are one or two entries in domain_awl, a new triple would be accepted.
- if there are 20 entries in connect as well as in from_awl and 0 in domain_awl, a new triple would be throttled, but 20 entries in from_awl should be as good as 2 entries in domain_awl because of group_domain_level.

Therefore a possible change to the algorithm would be to incorporate the relation between from_awl and domain_awl, something like:

    # Throttling too many connections from same new host
    if (defined $self->{sqlgrey}{connect_src_throttle}
        and $self->{sqlgrey}{connect_src_throttle} > 0
        and $self->count_src_connect($cltid) >= $self->{sqlgrey}{connect_src_throttle}) {
        # without the following tests a good chance exists to lose emails for
        # a new server of a big ISP
        my $threshold = $self->{sqlgrey}{connect_src_throttle}
                        - $self->count_src_domain_awl($cltid) * $self->{sqlgrey}{group_domain_level};
        if ($threshold > 0) {
            $threshold -= $self->count_src_from_awl($cltid);
            if ($threshold > 0) {
                $self->mylog('grey', 2, "throttling: $cltid, $sender_name\@$sender_domain -> $recipient");
                return ($self->{sqlgrey}{reject_first}
                        . ' Throttling too many connections from new source - '
                        . ' Try again later. ');
            }
        }
    }

BTW, this code snippet is not tested!

Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:St...@lr...>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840 |
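For readers who want to reproduce the per-threshold figures in the table above on their own database, a query along these lines should give the same numbers for one candidate threshold. It is only a sketch: it assumes the connect table and src column used in the select statement quoted earlier, uses :threshold as a placeholder, and applies the formula from the message (num of thrott = num of entries - (thrott. num. - 1) * num of IP addr).

```sql
-- Per-threshold summary for one candidate value of :threshold,
-- following the definitions given in the message above.
SELECT COUNT(*)                               AS num_ip_addr,    -- sources with >= :threshold entries
       SUM(cpt)                               AS num_entries,    -- their total connect entries
       SUM(cpt) - (:threshold - 1) * COUNT(*) AS num_throttled   -- entries throttling would have refused
FROM (SELECT src, COUNT(*) AS cpt
      FROM connect
      GROUP BY src
      HAVING COUNT(*) >= :threshold) AS busy_sources;
```

Running it once per threshold value (3, 5, 10, ...) reproduces the table; "% reduc" is then num_throttled divided by the total number of connect entries.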
From: Michel B. <mi...@bo...> - 2005-06-09 11:38:27
|
On Thursday 09 June 2005 13:31, Lionel Bouton wrote:
> Ok. Now I'm convinced we should test it. But 1.4.8 is pretty old now and
> 1.5.x is quite stable since 1.5.7 so I would like to issue a stable
> 1.6.0 release shortly. Would it be OK if I release a 1.6.0 without the
> tarpitting and connect cleanup code and a 1.7.0 with it?

You're the boss, so it's up to you ;-)

I however believe that the tarpitting and connect cleanup code doesn't introduce any stability or performance problem, and now sqlgrey-logstats.pl has its own patch for taking tarpitting into account as well.

If you're in a real hurry to release 1.6.0 within 24 hours ;-) maybe you'd better leave all this code out; but if you can afford testing it for a couple of days before deciding whether to include it in 1.6.0 or not, it could be nice ;-))

Cheers.

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
From: Lionel B. <lio...@bo...> - 2005-06-09 11:30:38
|
Michel Bouissou wrote:
> On Wednesday 08 June 2005 19:26, Lionel Bouton wrote:
>> Unless a new function we are discussing on the mailing-list is proven useful
>> to me shortly, I'm planning to release a 1.6.0 stable version based on
>> 1.5.9.
>
> After some thoughts, I have a couple more things in favor of "throttling":
>
> 1/ The supplementary SELECT count(*) we perform against the connect table
> before deciding if we will accept or not to add a new entry, which is of some
> performance concern to you, is to some extent compensated by the fact that we
> save an INSERT each time we refuse an entry -- and that makes also a DELETE
> that we save at some point in the future for cleanup.
>
> 2/ Throttling can to some extent be considered as "self-dynamic-blacklisting",
> which looks nice: I see some patterns by looking at my logs, showing that
> the same spam sources (zombie machines used as SMTP relays? Viruses /
> worms?) tend to come back again and again randomly in time, with different
> payloads (sender / recipient). If we use throttling, once they've filled up
> their not-retried "quota" in connect, when they come back again, their new
> connection is refused without generating any new entry in connect, which in
> turn reduces the chances that they could possibly defeat the greylisting
> system by trying to resend (at random) a message with a sender/recipient
> couple already known to the connect table.

Ok. Now I'm convinced we should test it. But 1.4.8 is pretty old now and 1.5.x is quite stable since 1.5.7, so I would like to issue a stable 1.6.0 release shortly. Would it be OK if I release a 1.6.0 without the tarpitting and connect cleanup code and a 1.7.0 with it?

Lionel. |
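The trade-off in point 1/ of the quoted message comes down to one extra read versus one avoided write and one avoided cleanup, roughly as sketched below. The statements are illustrative only: the connect table's column list and the cleanup criterion are assumptions based on this thread, not SQLgrey's actual queries.

```sql
-- Extra read performed for every new triple when throttling is enabled:
SELECT COUNT(*) FROM connect WHERE src = :cltid;

-- Write that is skipped whenever the connection is refused by throttling
-- (column names assumed for illustration):
INSERT INTO connect (sender_name, sender_domain, src, rcpt, first_seen)
VALUES (:sender_name, :sender_domain, :cltid, :recipient, :now);

-- ...and the matching cleanup of a never-retried entry that then never has to run:
DELETE FROM connect WHERE src = :cltid AND first_seen < :expiration_cutoff;
```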