From: Michael S. <Mic...@lr...> - 2005-04-29 09:28:31
|
On Fri, 29 Apr 2005, Lionel Bouton wrote:

> Given you manually query for the 3 most significant bytes, do you use
> 'full' for the greylisting algorithm? Maybe smart would be better suited
> (would slowly decrease the number of database entries by replacing IPs
> by class C nets). It's difficult to say how many fewer entries you would
> have without a script applying the algo after DNS lookups though... I
> made it the default because it is more friendly with mail pools, but the
> side effect is that it is also more friendly with your database :-)

I know, at some point we have to discuss our differing views about FULL
versus SMART :-)

Indeed, we are using FULL, because I think it is the right way to go. It
definitely depends on the amount of email you are receiving. A site with a
small or moderate amount of email may benefit from SMART. Therefore the
default is OK with me.

Now, what are the issues using FULL or SMART?

- One of your arguments was a smaller database. Well, that's actually no
problem for us:

du -h /var/lib/mysql/sqlgrey/
258M /var/lib/mysql/sqlgrey

With 4 GByte of memory the whole database including indexes should be in
memory all the time. Our graph of CPU usage shows 5 % is used by sqlgrey
and mysqld, and up to another 5 % by our log analyzing. Therefore I have
no problem with more complexity in sqlgrey from the standpoint of
performance.

- Loss of emails from mail systems with lots of MTAs/IP addresses for
outgoing email, which

  * always retry from the same IP address (separate queueing systems)
  * use a different IP address for every retry (common queueing system,
    for example a database-driven queueing system)
  * use a linear backoff algorithm for retries (every 15, 30 or 60 minutes)
  * use an exponential backoff algorithm (e.g. 5, 10, 20, 40, 80, ...)

       IP
  MTA   | same | diff |
  ------+------+------+
   lin  |  a   |  b   |
  ------+------+------+
   exp  |  c   |  d   |
  ------+------+------+

* Case a: Here, emails will be delayed till an entry for every MTA is
created. It will take longer for FULL than for SMART, but normally no
email will be lost. Most of the MTA pools are of this type.

* Case b: From my experience, this setup is rare. Here a chance exists
that FULL will not accept an email if there are a lot of MTAs in the pool
and the retry time is longer than usual. E.g. if the retry time is 30
minutes, reconnect_delay less than 30 minutes and max_connect_age 24
hours, then the pool can have up to 46 MTAs and we will still accept the
email, but it will be delayed for nearly 24 hours. In reality, emails
will not be delayed that long if the MTAs are chosen randomly. How many
sites will have such big pools of MTAs?

* Case c: Similar to a, but emails will be delayed longer than in case a.
Still, for a well-behaved MTA, no email will be lost.

* Case d: Sites using such a system are very rude to the Net, in my eyes.
If they use a common queuing system, then they can distribute the load on
a cluster of outgoing MTAs, but they MUST shield this from the outside,
e.g. using NAT. Such installations are not well-behaved MTAs and must be
whitelisted.

- Higher delay FULL can have compared to SMART: The right medicine
against that is good whitelists! Good means both the number of whitelists
and the content of the whitelists. In addition to the standard sqlgrey
algorithms for filling the tables via traffic analysis, we have
implemented our own algorithms :-)

  * fast propagation (fills from_awl): This algorithm is based on the
    trust we have in a sending MTA. If we trust it, we accept the email,
    even if there is no entry for this triplet in the whitelists.
  * MX check (fills domain_awl): if outgoing and incoming MTAs are the
    same, put an entry in domain_awl.
  * A check (fills domain_awl): if the sending MTA sends emails for its
    hostname only, put it in domain_awl.

These additional algorithms give us a lot of entries in our from_awl and
domain_awl and therefore reduce the delay significantly. And the last 2
algorithms only work with FULL, not with SMART, with the current design
of sqlgrey. Sorry for you, guys :-)

About additional whitelists, forward_awl/rcpt_awl is one of them. At the
moment fast propagation replaces this table, because most of the time we
immediately accept all the spam mails from forwarding where the remote
MTA does not use greylisting, but at the cost of many unnecessary entries
in from_awl. Another one would be prevalidation as implemented in other
greylisting software: here you put the tuple originator/recipient,
without an IP address, in a table for every email you send out.

- Trust in the sending MTA: SMART reduces trust in the sending MTA
regarding the handling of temporary errors in a well-behaved way based on
the relevant RFCs. Well-behaved means

  * trying to retransmit a message several times until a timeout of 3 - 5
    days occurs.
  * retransmitting emails in a timely manner (minutes) and not only once
    in 24 hours.

For me this is the reason to use FULL. I want to build trust in the
sending MTA, the more the better. And if I have trust in an MTA, I will
accept emails as fast as I can. Actually, what I want is to strengthen
the trust in the sending MTA, e.g. to use the domain from HELO/EHLO or to
require several retries before I accept a connection. But that's another
story and some work for the future, to be done before spammers change
their software. A detailed analysis of the retransmit behavior of other
MTAs is needed first.

Regards,
Michael Storz

-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:St...@lr...>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
|
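The FULL versus SMART distinction discussed above can be sketched in a few lines. This is a hypothetical illustration in Python, not SQLgrey's actual Perl code: FULL keys a greylist entry on the exact client IP, while SMART collapses the IP to its /24 ("class C") network so a pool of outgoing MTAs in one subnet shares a single entry.

```python
def greylist_key(ip: str, sender: str, recipient: str,
                 algo: str = "smart") -> tuple:
    """Build the triplet used to match a retry against a prior attempt.

    Hypothetical sketch: 'full' keys on the exact IP; 'smart' keeps only
    the 3 most significant bytes, i.e. the class C network.
    """
    if algo == "smart":
        ip = ".".join(ip.split(".")[:3])  # collapse to the /24 net
    return (ip, sender.lower(), recipient.lower())

# A retry from a different MTA in the same /24 matches under SMART only:
first = greylist_key("192.0.2.10", "a@example.org", "b@example.net")
retry = greylist_key("192.0.2.11", "a@example.org", "b@example.net")
assert first == retry  # SMART: same class C net -> same entry
assert greylist_key("192.0.2.10", "a@example.org", "b@example.net", "full") != \
       greylist_key("192.0.2.11", "a@example.org", "b@example.net", "full")
```

This is why case b above (a pool retrying from different IPs) only hurts under FULL: each retry creates a fresh entry instead of matching the first one.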
From: Michel B. <mi...@bo...> - 2005-04-29 08:20:03
|
On Friday 29 April 2005 at 10:12, Lionel Bouton wrote:

> The idea is that Mr. Jones' mail will be delayed only for recipients he
> didn't contact earlier, until he successfully sent mails to <n>
> different recipients.

Then, I understand it better.

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
|
From: Lionel B. <lio...@bo...> - 2005-04-29 08:15:20
|
Michael Storz wrote the following on 29.04.2005 09:58 :

>On Thu, 28 Apr 2005, Who Knows wrote:
>
>>I'm running mysql-server-3.23.58-4 and it failed attempting your query
>>
>>*SQL-query:* Documentation <http://dev.mysql.com/doc/mysql/en/SELECT.html>
>>
>>SELECT src, count( * )
>>FROM domain_awl
>>GROUP BY src
>>ORDER BY - count( * )
>>LIMIT 10
>>
>>*MySQL said: *Documentation
>><http://dev.mysql.com/doc/mysql/en/Error-returns.html>
>>
>>| #1111 - Invalid use of group function|
>
>The same error appeared at my installation. That's the reason why I use
>the command

I'm a PostgreSQL (8.0.1) user :-) Forgot to fire up SQLite and MySQL to
test my query... Michael's queries are fine for MySQL (you may want to
replace host_ip with src if you use SQLgrey 1.5.x though).

Lionel
|
From: Lionel B. <lio...@bo...> - 2005-04-29 08:12:44
|
Michel Bouissou wrote the following on 29.04.2005 08:05 :

>>Now the question for me is, if for a site like us, the use of a table
>>connect_awl as the first awl would not be better (connect_awl = table
>>with triple ip, originator, recipient).
>>
>>From this table the from_awl would be filled by a propagation algorithm
>
>What would be the goal ? Have a shorter entry lifetime in connect_awl
>than in from_awl ?

No, the goal is to apply greylisting if the rcpt changes, until we see
enough mails from the sender to different rcpts to add him/her to the
from_awl. Consider this one more stage in the current "greylist -> awl1 ->
awl2" process, whose purpose is to prevent nearly random but heavy spam
traffic from reaching the from_awl, which allows more damage to be done
than the would-be connect_awl.

>>similar to the one from from_awl to domain_awl. At the end most of the
>>entries in the from_awl would be the originators of mailinglists. All
>>other entries would stay in connect_awl.
>
>But I'm not sure it would be a good idea. Most users won't mind if
>mailing-list emails may be delayed for a while, but on the contrary, most
>users find it extremely important that mail they receive from "real
>humans" should not be delayed unless necessary, most of the time.
>Users will tell you that even if they receive messages from Mr. Jones
>only once a week or so, they definitely don't want Mr. Jones' mail to be
>delayed each and every time.

The idea is that Mr. Jones' mail will be delayed only for recipients he
didn't contact earlier, until he successfully sent mails to <n> different
recipients.

Lionel.
|
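The proposed connect_awl-to-from_awl propagation can be sketched as follows. This is a hypothetical Python illustration of the idea being discussed, not SQLgrey code; `PROMOTE_AFTER` stands in for the "<n> different recipients" threshold, which the thread leaves unspecified.

```python
from collections import defaultdict

PROMOTE_AFTER = 3  # the "<n> different recipients" threshold (assumed)

connect_awl = defaultdict(set)   # (ip, sender) -> set of recipients seen
from_awl = set()                 # (ip, sender) pairs accepted without delay

def record_success(ip: str, sender: str, rcpt: str) -> None:
    """Record a successful retry; promote the sender to from_awl once it
    has reached enough distinct recipients (hypothetical sketch)."""
    connect_awl[(ip, sender)].add(rcpt)
    if len(connect_awl[(ip, sender)]) >= PROMOTE_AFTER:
        from_awl.add((ip, sender))

# Mr. Jones is greylisted per recipient until he has reached 3 of them:
for r in ("r1@example.net", "r2@example.net", "r3@example.net"):
    record_success("192.0.2.1", "jones@example.org", r)
assert ("192.0.2.1", "jones@example.org") in from_awl
```

Under this scheme a burst of spam with random sender addresses never accumulates distinct recipients per sender, so it stays stuck in connect_awl instead of polluting from_awl.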
From: Lionel B. <lio...@bo...> - 2005-04-29 08:06:33
|
Michel Bouissou wrote the following on 29.04.2005 09:13 :

>On Thursday 28 April 2005 at 17:08, Lionel Bouton wrote:
>
>>I was afraid we'd have to come to this. This will increase the load on
>>the database slightly but I guess this won't be much of a problem. Now
>>is the right time to add this table as I'm pushing database changes in
>>1.5.6 anyway.
>
>I think we should be very careful when considering adding complexity.
>Adding complexity is always much of a problem, especially if not
>absolutely needed ;-)

Yep, this is why I chose from_awl as a first step instead of the full
triplet. But when you see several thousand mails coming your way that
would not be accepted with a first awl stage using triplets, you can't
avoid considering adding it :-)

>From a performance standpoint, I feel that adding supplementary tables
>(connect_awl ? forward_awl ? src_awl ?) would probably be bad. We would
>have to perform queries against many more tables before deciding on the
>fate of any email, and I'm rather sure that querying 6 tables, even if
>they are smaller, will always be much slower than querying 3 tables,
>even if bigger. After all, they are indexed...

It depends. The connect_awl will slow down the process, there's no doubt
about it, but the forward/rcpt_awl could remove lots of entries in
from_awl. Even with indexes, if a big table's size is cut in half, your
index uses half as much memory -> this can mean having to access disk
versus doing all searches in memory. Anyway, I'll make SQLgrey access
these tables only if configured to do so.

>>Seems good to me. People will be able to set the aggregation level at
>>"1" to bypass the connect_awl (I just realised that I can make DB
>>checks depend on the aggregation level, in this case we'll have the
>>same DB load).
>
>I think that if new tables appear in SQLgrey, their very usage must be
>_optional_ (with a config parameter), so the admin has control not only
>on "when to move entries from one table to another", but also on
>"whether or not to use a given table at all".

I agree.

>I feel that such new tables, if implemented, should be off by default,
>and could be activated only by users who think their use may be better
>in their precise situation.

In the development versions, I'll turn them on by default and wait for
reports to decide what's best for 1.6.0.

>>A quick question though. I'm wondering if a "src_awl" would be of any
>>use, could people with large sites check how many entries they have in
>>domain_awl for the same src ?
>
>I don't think this one would be very useful. domain_awl is very useful
>with no doubt, and it helps much in reducing the size of from_awl. I'm
>not sure that adding a supplementary level would give a comparable gain
>that would be worth doing it.

If I can avoid one more table, that's good :-) Anyway, I was thinking
about finally adding the log parsing tool; one function of this tool
would be to advise the admin to add some IPs/class C nets to the local
whitelists. This would serve the purpose of the src_awl, only better, as
the Perl hashes will handle the decision with no DB access whatsoever.

>>In other news, I'm planning to add blacklisting support. Probably after
>>1.6.0. The idea is to have a set of conditions where an IP will enter a
>>blacklist
>
>Again, be careful with complexity. I'm not sure that BLACKlisting is the
>business of a GREYlisting tool...

Adding blacklisting support is not for 1.6, so we'll have time to think
about it. But the idea here is that SQLgrey sees a lot of spam sources
but forgets them as soon as they are detected as a spam source! Using
this information to fight spam storms could be rather handy.

Lionel.
|
From: Michael S. <Mic...@lr...> - 2005-04-29 07:58:22
|
On Thu, 28 Apr 2005, Who Knows wrote:

> Lionel Bouton wrote:
> >
> >I'll do this for 1.5.6.
> >
> >A quick question though. I'm wondering if a "src_awl" would be of any
> >use, could people with large sites check how many entries they have in
> >domain_awl for the same src ?
> >I'm interested in the results of
> >SELECT src, count(*) FROM domain_awl GROUP BY src ORDER BY -count(*)
> >LIMIT 10;
> >and
> >SELECT count(*) FROM domain_awl;
>
> I'm running mysql-server-3.23.58-4 and it failed attempting your query
>
> *SQL-query:* Documentation <http://dev.mysql.com/doc/mysql/en/SELECT.html>
>
> SELECT src, count( * )
> FROM domain_awl
> GROUP BY src
> ORDER BY - count( * )
> LIMIT 10
>
> *MySQL said: *Documentation
> <http://dev.mysql.com/doc/mysql/en/Error-returns.html>
>
> | #1111 - Invalid use of group function|

The same error appeared at my installation. That's the reason why I use
the command

select host_ip,count(*) as cnt from domain_awl group by host_ip
order by cnt desc limit 10;

or

select src,count(*) as cnt from domain_awl group by src
order by cnt desc limit 10;

Michael Storz

-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:St...@lr...>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
|
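The workaround generalizes: alias the aggregate and sort on the alias descending, instead of `ORDER BY -count(*)`, which old MySQL rejected with error #1111. A quick check of the portable form, using Python's built-in SQLite as a stand-in for the real domain_awl table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain_awl (src TEXT, sender_domain TEXT)")
conn.executemany("INSERT INTO domain_awl VALUES (?, ?)",
                 [("10.0.0.1", "a.example"), ("10.0.0.1", "b.example"),
                  ("10.0.0.2", "c.example")])

# Alias the aggregate and ORDER BY the alias -- the portable form of the
# query that mysql-server-3.23 rejected as "Invalid use of group function".
rows = conn.execute(
    "SELECT src, count(*) AS cnt FROM domain_awl "
    "GROUP BY src ORDER BY cnt DESC LIMIT 10").fetchall()
print(rows)  # [('10.0.0.1', 2), ('10.0.0.2', 1)]
```

The table name and sample rows here are invented for the demonstration; only the query shape mirrors the one from the thread.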
From: Michel B. <mi...@bo...> - 2005-04-29 07:13:18
|
On Thursday 28 April 2005 at 17:08, Lionel Bouton wrote:

> I was afraid we'd have to come to this. This will increase the load on
> the database slightly but I guess this won't be much of a problem. Now
> is the right time to add this table as I'm pushing database changes in
> 1.5.6 anyway.

I think we should be very careful when considering adding complexity.
Adding complexity is always much of a problem, especially if not
absolutely needed ;-)

From a performance standpoint, I feel that adding supplementary tables
(connect_awl ? forward_awl ? src_awl ?) would probably be bad. We would
have to perform queries against many more tables before deciding on the
fate of any email, and I'm rather sure that querying 6 tables, even if
they are smaller, will always be much slower than querying 3 tables, even
if bigger. After all, they are indexed...

> Seems good to me. People will be able to set the aggregation level at
> "1" to bypass the connect_awl (I just realised that I can make DB
> checks depend on the aggregation level, in this case we'll have the
> same DB load).

I think that if new tables appear in SQLgrey, their very usage must be
_optional_ (with a config parameter), so the admin has control not only
on "when to move entries from one table to another", but also on "whether
or not to use a given table at all".

I feel that such new tables, if implemented, should be off by default,
and could be activated only by users who think their use may be better in
their precise situation.

> A quick question though. I'm wondering if a "src_awl" would be of any
> use, could people with large sites check how many entries they have in
> domain_awl for the same src ?

I don't think this one would be very useful. domain_awl is very useful
with no doubt, and it helps much in reducing the size of from_awl. I'm
not sure that adding a supplementary level would give a comparable gain
that would be worth doing it.

The case in which this table could be useful would be the case of
forwarders, which can produce quite many entries from different
originating domains in from_awl. Maybe in this case having a src_awl
would be of interest for them. Then this table could be turned on only at
sites that see a significant amount of such entries in their from_awl --
many ISPs will probably have some, but many company servers won't need
it, as most enterprise servers don't receive much mail from forwarding
services.

It's worth noting that if such forwarders were using SRS, this need would
disappear, as email coming from them would always show a MAIL FROM
@theirdomain.tld

> In other news, I'm planning to add blacklisting support. Probably after
> 1.6.0. The idea is to have a set of conditions where an IP will enter a
> blacklist

Again, be careful with complexity. I'm not sure that BLACKlisting is the
business of a GREYlisting tool...

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
Call by 200 IT professionals for NO to the European Constitutional Treaty:
http://www.200informaticiens.ras.eu.org
|
From: Michel B. <mi...@bo...> - 2005-04-29 06:05:44
|
On Thursday 28 April 2005 at 16:25, Michael Storz wrote:

> Conclusion:
>
> The way the table from_awl is automatically filled after one successful
> retry helped this spammer to circumvent greylisting for a large amount
> of spam emails.

This only shows that greylisting isn't by itself the miracle solution
that can block all and every kind of spam. But there is no single
solution that can block all spam.

OTOH, SQLgrey proves very highly efficient for a large proportion of
spam, with a very low system resources usage -- compared to other
anti-spam solutions -- which is already excellent.

> Now the question for me is, if for a site like us, the use of a table
> connect_awl as the first awl would not be better (connect_awl = table
> with triple ip, originator, recipient).
>
> From this table the from_awl would be filled by a propagation algorithm

What would be the goal ? Have a shorter entry lifetime in connect_awl
than in from_awl ?

> similar to the one from from_awl to domain_awl. At the end most of the
> entries in the from_awl would be the originators of mailinglists. All
> other entries would stay in connect_awl.

But I'm not sure it would be a good idea. Most users won't mind if
mailing-list emails may be delayed for a while, but on the contrary, most
users find it extremely important that mail they receive from "real
humans" should not be delayed unless necessary, most of the time.

Users will tell you that even if they receive messages from Mr. Jones
only once a week or so, they definitely don't want Mr. Jones' mail to be
delayed each and every time.

So Mr. Jones' entry should go to from_awl quickly, and from_awl should
definitely not be reserved to MLs that transmit zillions of emails...

Cheers.

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
Call by 200 IT professionals for NO to the European Constitutional Treaty:
http://www.200informaticiens.ras.eu.org
|
From: Who K. <qui...@me...> - 2005-04-29 03:05:19
|
Lionel Bouton wrote:

>I'll do this for 1.5.6.
>
>A quick question though. I'm wondering if a "src_awl" would be of any
>use, could people with large sites check how many entries they have in
>domain_awl for the same src ?
>I'm interested in the results of
>SELECT src, count(*) FROM domain_awl GROUP BY src ORDER BY -count(*)
>LIMIT 10;
>and
>SELECT count(*) FROM domain_awl;

I'm running mysql-server-3.23.58-4 and it failed attempting your query

*SQL-query:* Documentation <http://dev.mysql.com/doc/mysql/en/SELECT.html>

SELECT src, count( * )
FROM domain_awl
GROUP BY src
ORDER BY - count( * )
LIMIT 10

*MySQL said: *Documentation
<http://dev.mysql.com/doc/mysql/en/Error-returns.html>

| #1111 - Invalid use of group function|
|
From: Lionel B. <lio...@bo...> - 2005-04-28 22:20:05
|
Michael Storz wrote the following on 28.04.2005 18:42 :

>Yes, this was something I was also thinking about. At the moment I made
>this manually. I analyzed the tables and put ip addresses with lots of
>entries in the tables in client_ip_whitelist.local.

I see, this is probably the best way of speeding up the whole process,
Perl hashes can't be slower than a database query!

>Here is the output of the different select statements I use:
>
>(...)
>
>select substring_index(host_ip, '.', 3),count(*) as cnt from from_awl
>group by substring_index(host_ip, '.', 3) order by cnt desc limit 10;

Given you manually query for the 3 most significant bytes, do you use
'full' for the greylisting algorithm? Maybe smart would be better suited
(would slowly decrease the number of database entries by replacing IPs by
class C nets). It's difficult to say how many fewer entries you would
have without a script applying the algo after DNS lookups though... I
made it the default because it is more friendly with mail pools, but the
side effect is that it is also more friendly with your database :-)

>As you can see, I have already optimized my domain_awl pretty well. The
>only candidate to whitelist is 141.40.103.103, one MTA of a local
>mail cluster; the other one I already whitelisted. Interesting is the
>line
>
>| 141.40.103.103 | 8189 |
>
>When I analyzed the from_awl, I found several such ip addresses with
>extremely high numbers of entries. Going through the logs, I finally
>found out why this was the case. The reason is forwarding. In the above
>case there were just 2 people who forwarded their email to a mailbox on
>our system. All these entries have originators thought up by spammers,
>which most of the time do not exist.
>
>This brings me to my next wish :-) I need a forward_awl. And therefore
>this is another reason to have the connect_awl, otherwise I have to
>populate the forward_awl manually (actually, I have already written a
>little script to extract these entries out of the log file). Again
>aggregation would be done to fill the table, but this time on
>originator, whereas for the from_awl aggregating on recipient would be
>used.

I understand what you want (it took me some time though :)). The
forward_awl (in fact more of a rcpt_awl, if we refer to the field being
awl'ed) will prevent the from_awl from being filled with hundreds of
entries.

I realised some time ago that I don't need to add connect_awl (or
forward/rcpt_awl for that matter) before releasing 1.5.6 with IPv6 and
optin/optout support. This can wait for 1.5.7 and I won't have to code
any database upgrade, as SQLgrey checks for missing tables and
automatically recreates them during startup. So we still have some time
to discuss the details, which is a good thing.

>The forward_awl will decrease the need for a src_awl.

Yep, I realize that. Really good idea.

Did anyone else on the list see similar behaviours (spammers exploiting
the from_awl weakness and forwards generating lots of from_awl entries)?

Lionel.
|
From: Michael S. <Mic...@lr...> - 2005-04-28 16:42:10
|
On Thu, 28 Apr 2005, Lionel Bouton wrote:

> A quick question though. I'm wondering if a "src_awl" would be of any
> use, could people with large sites check how many entries they have in
> domain_awl for the same src ?
> I'm interested in the results of
> SELECT src, count(*) FROM domain_awl GROUP BY src ORDER BY -count(*)
> LIMIT 10;
> and
> SELECT count(*) FROM domain_awl;
>
> This will show if we can reduce the DB load by merging some entries in
> another table for quicker lookups.

Yes, this was something I was also thinking about. At the moment I made
this manually. I analyzed the tables and put ip addresses with lots of
entries in the tables in client_ip_whitelist.local.

Here is the output of the different select statements I use:

select host_ip,count(*) as cnt from domain_awl group by host_ip
order by cnt desc limit 10;
+----------------+-----+
| host_ip        | cnt |
+----------------+-----+
| 141.40.103.103 |  90 |
| 132.230.2.211  |  69 |
| 130.60.68.105  |  60 |
| 194.95.177.104 |  54 |
| 194.95.177.121 |  53 |
| 130.60.68.106  |  52 |
| 153.96.1.62    |  52 |
| 141.48.3.8     |  51 |
| 195.200.32.20  |  50 |
| 62.153.78.100  |  47 |
+----------------+-----+

select substring_index(host_ip, '.', 3),count(*) as cnt from domain_awl
group by substring_index(host_ip, '.', 3) order by cnt desc limit 10;
+----------------------------------+-----+
| substring_index(host_ip, '.', 3) | cnt |
+----------------------------------+-----+
| 80.237.130                       | 196 |
| 193.125.235                      | 130 |
| 217.115.142                      | 128 |
| 193.109.255                      | 113 |
| 130.60.68                        | 112 |
| 194.95.177                       | 107 |
| 81.209.184                       | 102 |
| 195.200.32                       |  94 |
| 81.209.148                       |  91 |
| 141.40.103                       |  90 |
+----------------------------------+-----+

select count(*) from domain_awl;
+----------+
| count(*) |
+----------+
|    51800 |
+----------+

select host_ip,count(*) as cnt from from_awl group by host_ip
order by cnt desc limit 10;
+-----------------+------+
| host_ip         | cnt  |
+-----------------+------+
| 141.40.103.103  | 8189 |
| 62.216.178.196  | 2010 |
| 80.237.203.120  | 1651 |
| 217.115.139.21  | 1528 |
| 146.82.138.7    | 1126 |
| 80.80.20.42     | 1091 |
| 132.229.231.52  | 1080 |
| 217.172.173.165 | 1063 |
| 192.108.115.12  | 1054 |
| 194.208.88.1    |  984 |
+-----------------+------+

select substring_index(host_ip, '.', 3),count(*) as cnt from from_awl
group by substring_index(host_ip, '.', 3) order by cnt desc limit 10;
+----------------------------------+------+
| substring_index(host_ip, '.', 3) | cnt  |
+----------------------------------+------+
| 141.40.103                       | 8190 |
| 72.5.1                           | 4684 |
| 64.125.87                        | 2018 |
| 62.216.178                       | 2015 |
| 208.184.55                       | 1825 |
| 80.237.203                       | 1669 |
| 206.190.36                       | 1628 |
| 217.115.139                      | 1534 |
| 216.155.197                      | 1380 |
| 140.98.193                       | 1349 |
+----------------------------------+------+

select count(*) from from_awl;
+----------+
| count(*) |
+----------+
|   353241 |
+----------+

As you can see, I have already optimized my domain_awl pretty well. The
only candidate to whitelist is 141.40.103.103, one MTA of a local
mail cluster; the other one I already whitelisted. Interesting is the line

| 141.40.103.103 | 8189 |

When I analyzed the from_awl, I found several such ip addresses with
extremely high numbers of entries. Going through the logs, I finally
found out why this was the case. The reason is forwarding. In the above
case there were just 2 people who forwarded their email to a mailbox on
our system. All these entries have originators thought up by spammers,
which most of the time do not exist.

This brings me to my next wish :-) I need a forward_awl. And therefore
this is another reason to have the connect_awl, otherwise I have to
populate the forward_awl manually (actually, I have already written a
little script to extract these entries out of the log file). Again
aggregation would be done to fill the table, but this time on originator,
whereas for the from_awl aggregating on recipient would be used.

The forward_awl will decrease the need for a src_awl.

If you look at the other high numbers from the from_awl, these are
networks of BlueStream Media, a well-known spammer, which I have not
blocked yet, see

64.125.87.0/24   http://www.spamhaus.org/sbl/sbl.lasso?query=SBL18058
64.125.188.0/25  http://www.spamhaus.org/sbl/sbl.lasso?query=SBL14961
69.25.109.0/24   http://www.spamhaus.org/sbl/sbl.lasso?query=SBL20650
72.5.1.0/24      http://www.spamhaus.org/sbl/sbl.lasso?query=SBL22215
208.184.55.0/25  http://www.spamhaus.org/sbl/sbl.lasso?query=SBL13542

Regards,
Michael Storz

-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:St...@lr...>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
|
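The per-/24 aggregation above uses MySQL's `substring_index(host_ip, '.', 3)`. SQLite has no `substring_index()`, so the same grouping can be done in the application by trimming the last octet; a small stand-in demonstration with invented sample rows:

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE from_awl (host_ip TEXT)")
conn.executemany("INSERT INTO from_awl VALUES (?)",
                 [("141.40.103.103",), ("141.40.103.104",),
                  ("62.216.178.196",)])

# Group by the first three octets -- the application-side equivalent of
# MySQL's substring_index(host_ip, '.', 3).
nets = Counter(".".join(ip.split(".")[:3])
               for (ip,) in conn.execute("SELECT host_ip FROM from_awl"))
print(nets.most_common(2))  # [('141.40.103', 2), ('62.216.178', 1)]
```

Networks that dominate this count are the whitelisting candidates Michael describes adding to client_ip_whitelist.local.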
From: Lionel B. <lio...@bo...> - 2005-04-28 12:42:05
|
Hi,

log_override parsing is flawed. If you don't put spaces in the
"logtype1:level,logtype2:level,..." string it will be fine, but SQLgrey
will spit out warnings :-( If you put spaces, depending on the actual
spaces you may get various behaviors...

Fixed in my tree, will be in 1.5.6

Lionel
|
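A whitespace-tolerant parse of such a "logtype1:level,logtype2:level,..." string might look like this. This is a hypothetical Python sketch of the general technique, not the actual Perl fix that went into SQLgrey:

```python
def parse_log_override(spec: str) -> dict:
    """Parse "logtype1:level,logtype2:level,..." tolerating stray spaces
    around commas and colons, and ignoring empty items."""
    overrides = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas / doubled separators
        logtype, _, level = item.partition(":")
        overrides[logtype.strip()] = int(level.strip())
    return overrides

# Stray spaces and a trailing comma parse the same as the compact form:
assert parse_log_override("spam:2, grey : 1 ,") == {"spam": 2, "grey": 1}
```

Stripping each item before splitting is what makes the behavior independent of where the spaces land, which is exactly the property the unfixed parser lacked.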
From: Michel B. <mi...@bo...> - 2005-04-27 18:51:44
|
On Wednesday 27 April 2005 at 20:16, Lionel Bouton wrote:

> A quick fix would be to look for {server}{log}{spam} in sqlgrey v1.5.5
> and replace it with {sqlgrey}{log}{spam}...

Thanks, it works.

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
Call by 200 IT professionals for NO to the European Constitutional Treaty:
http://www.200informaticiens.ras.eu.org
|
From: Lionel B. <lio...@bo...> - 2005-04-27 18:17:16
|
Michel Bouissou wrote the following on 27.04.2005 09:41 :

>Hi,
>
>Since I upgraded to SQLgrey 1.5.5 yesterday, my system doesn't log
>"Probable spam" attempts anymore, when cleaning up connections that
>didn't come back.
>
>However, loglevel is not set in sqlgrey.conf, which I expect to default
>to "2" according to the comments, and "2" should list "Probable spam"
>attempts...
>
>Any clue ?

That's a bug (obviously), the check done before doing the spam listing is
bogus. The log lines will change too (I made most of them more compact
and reflected the log category at the beginning to help people tune
log_override). I'm in the middle of IPv6 and optin/optout support; the
fix will be in 1.5.6, which will have them.

A quick fix would be to look for {server}{log}{spam} in sqlgrey v1.5.5
and replace it with {sqlgrey}{log}{spam}...

Lionel.
|
From: Michel B. <mi...@bo...> - 2005-04-27 07:41:19
|
Hi,

Since I upgraded to SQLgrey 1.5.5 yesterday, my system doesn't log
"Probable spam" attempts anymore, when cleaning up connections that
didn't come back.

However, loglevel is not set in sqlgrey.conf, which I expect to default
to "2" according to the comments, and "2" should list "Probable spam"
attempts...

Any clue ?

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
|
From: Lionel B. <lio...@bo...> - 2005-04-26 14:45:07
|
Ray Booysen wrote the following on 26.04.2005 16:31 : > Can I submit a bug to Gentoo to have the ebuild placed in portage or > will you deal with that? > You can place a comment on the bug I already opened long ago :-) (do the usual "ALL sqlgrey" search on bugs.gentoo.org). A "worksforme" (if indeed it does so, which it should) would be great ! Lionel. |
From: Ray B. <rj_...@rj...> - 2005-04-26 14:31:42
|
Lionel Bouton said the following on 26/04/2005 15:12: > Ray Booysen wrote the following on 26.04.2005 09:34 : > > >>Ebuild for Gentoo? :) > > > > There's one included in the tar.bz2 now. I updated the web site to point > to the last one. > > Thanks for the reminder, > > Lionel > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Sqlgrey-users mailing list > Sql...@li... > https://lists.sourceforge.net/lists/listinfo/sqlgrey-users Can I submit a bug to Gentoo to have the ebuild placed in portage or will you deal with that? Regards Ray -- Ray Booysen rj_...@rj... |
From: Lionel B. <lio...@bo...> - 2005-04-26 14:12:39
|
Ray Booysen wrote the following on 26.04.2005 09:34 :

> Ebuild for Gentoo? :)

There's one included in the tar.bz2 now. I updated the web site to point
to the last one.

Thanks for the reminder,

Lionel
From: Ray B. <rj_...@rj...> - 2005-04-26 07:35:11
|
Ebuild for Gentoo? :) I would love to use the new version. Thanks for all
your hard work.

Regards
Ray

Lionel Bouton said the following on 26/04/2005 07:20:

> Hi,
>
> 1.5.5 is on sourceforge. There are fixes for the DEVERP and SRS code in.
> I finished the new logging code for this release. This is not much
> tested (the default configuration works for me though), but those fed up
> with some unwanted log lines should be able to filter them out with
> "log_override".
> I took the time to document the log messages to make it easier to
> understand what to fine-tune, see the sqlgrey.conf file for the details.
>
> Cheers,
>
> Lionel.

-- 
Ray Booysen rj_...@rj...
From: Lionel B. <lio...@bo...> - 2005-04-26 06:20:55
|
Hi,

1.5.5 is on sourceforge. There are fixes for the DEVERP and SRS code in. I
finished the new logging code for this release. This is not much tested
(the default configuration works for me though), but those fed up with
some unwanted log lines should be able to filter them out with
"log_override".

I took the time to document the log messages to make it easier to
understand what to fine-tune; see the sqlgrey.conf file for the details.

Cheers,

Lionel.
From: Michel B. <mi...@bo...> - 2005-04-26 06:10:50
|
On Sunday 24 April 2005 22:05, Lionel Bouton wrote:

>> I have the impression that with such "+"ed addresses, for some reason,
>> the from_awl "sender_domain" gets the sending server name FQDN instead
>> of getting the sender's domain alone.
>
> I've looked at the code and AFAICT nowhere are the $fqdn and
> $sender_domain mixed up. What could happen is that some misconfigured
> server could change the MAIL FROM: on retries. This could explain what
> you are witnessing. Could you have a look into Postfix's logs?

I've checked both my Postfix logs (quickly, I didn't have much time) and
the code, and so far I don't understand clearly what happens in this
respect. It is quite possible that I have reported a non-existing
problem ;-) but my server being only secondary for the concerned receiving
domain, I have to take a look at what happens at the primary itself.

Anyway, as the primary also uses SQLgrey, the concerned ML addresses
should already be whitelisted by SQLgrey at the primary, so I shouldn't
get all these connections and greylist them. There's something puzzling me
there.

> If it's the case, this would be a good candidate for the whitelists...

This concerns all of the "gentoo" mailing-lists, coming from server
lists.gentoo.org[140.105.134.102]...

Cheers.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
Appeal by 200 IT professionals for a NO to the European Constitutional
Treaty: http://www.200informaticiens.ras.eu.org
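For reference, whitelisting the Gentoo list server discussed above could look like the fragment below. The file names follow the clients_fqdn_whitelist / clients_ip_whitelist convention of sqlgrey's packaged whitelists; the exact paths and the availability of `.local` override files are assumptions here, so check the sqlgrey.conf and documentation shipped with your version:

```
# /etc/sqlgrey/clients_fqdn_whitelist.local  (assumed path and name)
lists.gentoo.org

# or by address, in /etc/sqlgrey/clients_ip_whitelist.local
140.105.134.102
```

One entry per line; sqlgrey skips greylisting entirely for matching clients, which would make the primary/secondary discrepancy above moot for these lists.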
From: Michel B. <mi...@bo...> - 2005-04-26 06:05:01
|
On Sunday 24 April 2005 22:05, Lionel Bouton wrote:

> BTW, I just found out that we were matching "SRS" in the
> normalize_sender, but it is called after lowercasing the whole string...
> I've never seen an SRS email, but I think the current code doesn't
> match. I lowercased the corresponding regexp in my tree.

If you wish, I can create an alias for you at my domain, that would be
forwarded and SRS'd to any address of your choice. Email forwarded through
this alias would be SRS'd, so you could check how it looks by yourself.
Please tell me by private mail if you would be interested.

>> The original corresponding "true" sender localpart for this example was
>> (case respected):
>>
>> bounce-439452-5541B8D889BBAF2DE05C02B4A0AF204F93174A98-439250@...
>
> I'm adding a while loop to call the regexp until the string doesn't
> change anymore, this should strip each hex sequence one by one. I'm
> looking at the s/orig/dest/g syntax to see the problem (might come from
> variable usage in dest: $1#$2).

I believe that you are correct.

Cheers.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
Appeal by 200 IT professionals for a NO to the European Constitutional
Treaty: http://www.200informaticiens.ras.eu.org
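The SRS problem Lionel mentions is purely a case issue: normalize_sender lowercases the address before the SRS pattern runs, so a pattern written with an uppercase "SRS0" tag can never match. A minimal sketch (sqlgrey itself is Perl; the address below is hypothetical, while the "SRS0=" prefix is the standard Sender Rewriting Scheme tag):

```python
import re

# An SRS0-rewritten sender as a forwarder would emit it (hypothetical
# address; "SRS0=" is the standard SRS prefix).
sender = "SRS0=HHH=TT=example.org=alice@forwarder.example"
lowered = sender.lower()  # normalize_sender lowercases before matching

# The uppercase pattern can never match the lowercased string...
assert re.match(r"SRS0=", lowered) is None
# ...while the lowercased pattern (Lionel's fix) matches.
assert re.match(r"srs0=", lowered) is not None
```

Hence lowercasing the regexp (or compiling it case-insensitively) is the whole fix.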
From: Lionel B. <lio...@bo...> - 2005-04-24 20:05:47
|
Michel Bouissou wrote the following on 23.04.2005 20:07 :

> Hi there,
>
> I've been wondering for some time if there couldn't be a bug in sqlgrey
> for "plussed" extension source addresses.
>
> These last days, I see very often VERP addresses from a ML, in a format
> such as:
>
> list-name+bounces-123-destination.user=des...@li...
>
> getting greylisted every time by SQLgrey (and entered in the connect
> table), where I see that the from_awl table actually already contains an
> entry such as:
>
> sender_name = list-name
> sender_domain = server.listdomain.org
>
> And this entry gets updated when the mail finally makes it thru.
>
> I have the impression that with such "+"ed addresses, for some reason,
> the from_awl "sender_domain" gets the sending server name FQDN instead
> of getting the sender's domain alone.

I've looked at the code and AFAICT nowhere are the $fqdn and
$sender_domain mixed up. What could happen is that some misconfigured
server could change the MAIL FROM: on retries. This could explain what you
are witnessing. Could you have a look into Postfix's logs?

If it's the case, this would be a good candidate for the whitelists...

Lionel
From: Lionel B. <lio...@bo...> - 2005-04-24 20:05:40
|
Michel Bouissou wrote the following on 23.04.2005 20:30 :

> I also find values such as:
>
> bounce-#-5541b8d889bbaf2de05c02b4a0af204f93174a98-#
>
> in the "sender_name" column of from_awl, for some VERP messages that are
> received on a regular basis.
>
> I believe such addresses should have been collapsed into:
>
> bounce-#-#-#

I believed the same too :-/

> ...so there may be a bug in the substitution regexp for such cases.

BTW, I just found out that we were matching "SRS" in the normalize_sender,
but it is called after lowercasing the whole string... I've never seen an
SRS email, but I think the current code doesn't match. I lowercased the
corresponding regexp in my tree.

> The original corresponding "true" sender localpart for this example was
> (case respected):
>
> bounce-439452-5541B8D889BBAF2DE05C02B4A0AF204F93174A98-439250@...

I'm adding a while loop to call the regexp until the string doesn't change
anymore; this should strip each hex sequence one by one. I'm looking at
the s/orig/dest/g syntax to see the problem (might come from variable
usage in dest: $1#$2).

> (Note that the 2 parts that have been correctly substituted were [0-9]+,
> where the part that wasn't substituted was Hex, and the original address
> was uppercase Hex, where SQLgrey stored it in from_awl as lowercase
> Hex...)

SQLgrey always lowercases the e-mail addresses.

Lionel.
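The /g problem Lionel suspects can be reproduced: when the substitution captures the surrounding '-' delimiters (Perl's s/(-)...(-)/$1#$2/g), each match consumes the trailing '-', so the next run of digits or hex, which shares that '-', is skipped. A sketch of both the bug and the loop-until-fixpoint fix, using Michel's localpart from this thread (the pattern here is an illustration, not sqlgrey's verbatim regexp):

```python
import re

# Lowercased VERP localpart from the thread (sqlgrey lowercases senders
# before normalizing them).
local = "bounce-439452-5541b8d889bbaf2de05c02b4a0af204f93174a98-439250"

# Illustrative pattern: a run of 5+ hex characters between delimiters,
# with the delimiters captured like Perl's s/(-)...(-)/$1#$2/g.
PAT = r"(-)[0-9a-f]{5,}(-|$)"

# One global pass: each match consumes its trailing '-', so the hex run
# that shares that '-' is skipped.  This reproduces the bad from_awl
# value Michel reported.
once = re.sub(PAT, r"\1#\2", local)
assert once == "bounce-#-5541b8d889bbaf2de05c02b4a0af204f93174a98-#"

# Lionel's fix: repeat the substitution until the string stops changing,
# stripping the remaining runs one by one.
def collapse(s: str) -> str:
    prev = None
    while prev != s:
        prev, s = s, re.sub(PAT, r"\1#\2", s)
    return s

assert collapse(local) == "bounce-#-#-#"
```

A lookahead for the trailing delimiter instead of consuming it would be an alternative fix, but the while loop matches what Lionel describes adding.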
From: Lionel B. <lio...@bo...> - 2005-04-24 14:14:41
|
Who Knows wrote the following on 24.04.2005 03:52 :

> If I understand that correctly (don't get me wrong, I know how hard some
> of these concepts are to put into words) it is exactly what I want.
>
> By final recipient I assume you mean something like:
>
> the email address me...@do... is an alias for me...@do...
>
> then if domaintwo.com is opted out, the message to me...@do... would
> still be greylisted because sqlgrey wouldn't know that it would
> eventually go to domaintwo.com. Right?

Exactly. For example, if you make a nice web frontend for customers
wanting to opt in, you might want to add all their (one-to-one) aliases if
you allow customers to set aliases when they opt in.

Lionel.
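The opt-in/opt-out support being written at the time of this thread ended up table-driven; later sqlgrey releases use tables along the lines of optin_domain / optin_email and their optout counterparts (the exact schema here is an assumption, so check the opt-in/opt-out documentation shipped with your version). Under that assumption, the scenario above would be handled like this:

```sql
-- domaintwo.com is opted out of greylisting:
INSERT INTO optout_domain (domain) VALUES ('domaintwo.com');

-- sqlgrey does not resolve aliases, so an alias in another domain that
-- forwards to domaintwo.com stays greylisted unless it is opted out
-- explicitly too (alias address hypothetical):
INSERT INTO optout_email (email) VALUES ('someone@domainone.com');
```

This is exactly the bookkeeping a web frontend would do when a customer registers an alias.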