From: Michael S. <Mic...@lr...> - 2005-04-28 16:42:10
|
On Thu, 28 Apr 2005, Lionel Bouton wrote: > > A quick question though. I'm wondering if a "src_awl" would be of any > use, could people with large sites check how many entries they have in > domain_awl for the same src ? > I'm interested in the results of > SELECT src, count(*) FROM domain_awl GROUP BY src ORDER BY -count(*) > LIMIT 10; > and > SELECT count(*) FROM domain_awl; > > This will show if we can reduce the DB load by merging some entries in > another table for quicker lookups. Yes, this was something I was also thinking about. At the moment I made this manually. I analyzed the tables and put ip addresses with lots of entries in the tables in client_ip_whitelist.local. Here ist the output of the different select-statements I use: select host_ip,count(*) as cnt from domain_awl group by host_ip order by cnt desc limit 10; +----------------+-----+ | host_ip | cnt | +----------------+-----+ | 141.40.103.103 | 90 | | 132.230.2.211 | 69 | | 130.60.68.105 | 60 | | 194.95.177.104 | 54 | | 194.95.177.121 | 53 | | 130.60.68.106 | 52 | | 153.96.1.62 | 52 | | 141.48.3.8 | 51 | | 195.200.32.20 | 50 | | 62.153.78.100 | 47 | +----------------+-----+ select substring_index(host_ip, '.', 3),count(*) as cnt from domain_awl group by substring_index(host_ip, '.', 3) order by cnt desc limit 10; +----------------------------------+-----+ | substring_index(host_ip, '.', 3) | cnt | +----------------------------------+-----+ | 80.237.130 | 196 | | 193.125.235 | 130 | | 217.115.142 | 128 | | 193.109.255 | 113 | | 130.60.68 | 112 | | 194.95.177 | 107 | | 81.209.184 | 102 | | 195.200.32 | 94 | | 81.209.148 | 91 | | 141.40.103 | 90 | +----------------------------------+-----+ select count(*) from domain_awl; +----------+ | count(*) | +----------+ | 51800 | +----------+ select host_ip,count(*) as cnt from from_awl group by host_ip order by cnt desc limit 10; +-----------------+------+ | host_ip | cnt | +-----------------+------+ | 141.40.103.103 | 8189 | | 62.216.178.196 | 2010 | | 80.237.203.120 | 1651 | | 217.115.139.21 | 1528 | | 146.82.138.7 | 1126 | | 80.80.20.42 | 1091 | | 132.229.231.52 | 1080 | | 217.172.173.165 | 1063 | | 192.108.115.12 | 1054 | | 194.208.88.1 | 984 | +-----------------+------+ select substring_index(host_ip, '.', 3),count(*) as cnt from from_awl group by substring_index(host_ip, '.', 3) order by cnt desc limit 10; +----------------------------------+------+ | substring_index(host_ip, '.', 3) | cnt | +----------------------------------+------+ | 141.40.103 | 8190 | | 72.5.1 | 4684 | | 64.125.87 | 2018 | | 62.216.178 | 2015 | | 208.184.55 | 1825 | | 80.237.203 | 1669 | | 206.190.36 | 1628 | | 217.115.139 | 1534 | | 216.155.197 | 1380 | | 140.98.193 | 1349 | +----------------------------------+------+ select count(*) from from_awl; +----------+ | count(*) | +----------+ | 353241 | +----------+ As you can see, I have already optimized my domain_awl pretty good. The only candidate to whitelist is 141.40.103.103, one MTA of a local mailcluster, the otherone I already whitelisted. Interesting is the line | 141.40.103.103 | 8189 | When I analyzed the from_awl, I found several such ip addresses with extreme high numbers of entries. Going through the logs, I finally found out, why this was the case. The reason is forwarding. In the above case there were just 2 people which forwarded their email to mailbox on our system. All these entries have originators thought up by spammers, which most of the time do not exist. This brings me to my next wish :-) I need a forward_awl. And therefore this is another reason to have the connect_awl, otherwise I have to populate the forward_awl manually (actually, I have already written a little script to extract these entries out of the log file). Again aggregation would be done to fill the table, but this time on originator, whereas for the from_awl aggreating on recipient would be used. The forward_awl will decrease the need for a src_awl. If you look at the other high numbers from the from_awl, these are networks of BlueStream Media, a well known spammer, which I have not blocked yet, s. 64.125.87.0/24 http://www.spamhaus.org/sbl/sbl.lasso?query=SBL18058 64.125.188.0/25 http://www.spamhaus.org/sbl/sbl.lasso?query=SBL14961 69.25.109.0/24 http://www.spamhaus.org/sbl/sbl.lasso?query=SBL20650 72.5.1.0/24 http://www.spamhaus.org/sbl/sbl.lasso?query=SBL22215 208.184.55.0/25 http://www.spamhaus.org/sbl/sbl.lasso?query=SBL13542 Regards, Michael Storz ------------------------------------------------- Leibniz-Rechenzentrum ! <mailto:St...@lr...> Barer Str. 21 ! Fax: +49 89 2809460 80333 Muenchen, Germany ! Tel: +49 89 289-28840 |