|
From: Michael S. <Mic...@lr...> - 2005-04-28 16:42:10
|
On Thu, 28 Apr 2005, Lionel Bouton wrote:

> A quick question though. I'm wondering if a "src_awl" would be of any
> use, could people with large sites check how many entries they have in
> domain_awl for the same src?
> I'm interested in the results of
> SELECT src, count(*) FROM domain_awl GROUP BY src ORDER BY -count(*)
> LIMIT 10;
> and
> SELECT count(*) FROM domain_awl;
>
> This will show if we can reduce the DB load by merging some entries in
> another table for quicker lookups.

Yes, this was something I was also thinking about. At the moment I do
this manually: I analyze the tables and put IP addresses with lots of
entries into client_ip_whitelist.local.

Here is the output of the different SELECT statements I use:

select host_ip,count(*) as cnt from domain_awl group by host_ip
order by cnt desc limit 10;
+----------------+-----+
| host_ip        | cnt |
+----------------+-----+
| 141.40.103.103 |  90 |
| 132.230.2.211  |  69 |
| 130.60.68.105  |  60 |
| 194.95.177.104 |  54 |
| 194.95.177.121 |  53 |
| 130.60.68.106  |  52 |
| 153.96.1.62    |  52 |
| 141.48.3.8     |  51 |
| 195.200.32.20  |  50 |
| 62.153.78.100  |  47 |
+----------------+-----+

select substring_index(host_ip, '.', 3),count(*) as cnt from domain_awl
group by substring_index(host_ip, '.', 3) order by cnt desc limit 10;
+----------------------------------+-----+
| substring_index(host_ip, '.', 3) | cnt |
+----------------------------------+-----+
| 80.237.130                       | 196 |
| 193.125.235                      | 130 |
| 217.115.142                      | 128 |
| 193.109.255                      | 113 |
| 130.60.68                        | 112 |
| 194.95.177                       | 107 |
| 81.209.184                       | 102 |
| 195.200.32                       |  94 |
| 81.209.148                       |  91 |
| 141.40.103                       |  90 |
+----------------------------------+-----+

select count(*) from domain_awl;
+----------+
| count(*) |
+----------+
|    51800 |
+----------+

select host_ip,count(*) as cnt from from_awl group by host_ip
order by cnt desc limit 10;
+-----------------+------+
| host_ip         | cnt  |
+-----------------+------+
| 141.40.103.103  | 8189 |
| 62.216.178.196  | 2010 |
| 80.237.203.120  | 1651 |
| 217.115.139.21  | 1528 |
| 146.82.138.7    | 1126 |
| 80.80.20.42     | 1091 |
| 132.229.231.52  | 1080 |
| 217.172.173.165 | 1063 |
| 192.108.115.12  | 1054 |
| 194.208.88.1    |  984 |
+-----------------+------+

select substring_index(host_ip, '.', 3),count(*) as cnt from from_awl
group by substring_index(host_ip, '.', 3) order by cnt desc limit 10;
+----------------------------------+------+
| substring_index(host_ip, '.', 3) | cnt  |
+----------------------------------+------+
| 141.40.103                       | 8190 |
| 72.5.1                           | 4684 |
| 64.125.87                        | 2018 |
| 62.216.178                       | 2015 |
| 208.184.55                       | 1825 |
| 80.237.203                       | 1669 |
| 206.190.36                       | 1628 |
| 217.115.139                      | 1534 |
| 216.155.197                      | 1380 |
| 140.98.193                       | 1349 |
+----------------------------------+------+

select count(*) from from_awl;
+----------+
| count(*) |
+----------+
|   353241 |
+----------+

As you can see, I have already optimized my domain_awl pretty well. The
only remaining candidate to whitelist is 141.40.103.103, one MTA of a
local mail cluster; the other one I have already whitelisted.
Interesting is the line

| 141.40.103.103 | 8189 |

When I analyzed the from_awl, I found several such IP addresses with
extremely high numbers of entries. Going through the logs, I finally
found out why this was the case: the reason is forwarding. In the above
case there were just 2 people who forwarded their email to mailboxes on
our system. All these entries have originators made up by spammers,
which most of the time do not exist.

This brings me to my next wish :-) I need a forward_awl. And this is
therefore another reason to have the connect_awl, otherwise I have to
populate the forward_awl manually (actually, I have already written a
little script to extract these entries out of the log file). Again,
aggregation would be done to fill the table, but this time on the
originator, whereas for the from_awl aggregating on the recipient would
be used. The forward_awl will decrease the need for a src_awl.
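The per-/24 aggregation above relies on MySQL's substring_index(). As a
minimal, self-contained sketch of the same analysis, here it is
reproduced in Python with SQLite (the two-column schema and sample rows
are hypothetical, just enough to show the grouping; real SQLgrey tables
have more columns):

```python
import sqlite3

# Hypothetical minimal schema with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE from_awl (sender TEXT, host_ip TEXT)")
conn.executemany("INSERT INTO from_awl VALUES (?, ?)",
                 [("a@x", "141.40.103.103"),
                  ("b@y", "141.40.103.200"),
                  ("c@z", "62.216.178.196")])

# SQLite lacks MySQL's substring_index(); emulate it with a UDF that
# keeps the first `count` dot-separated parts of the IP.
def substring_index(s, delim, count):
    return delim.join(s.split(delim)[:count])

conn.create_function("substring_index", 3, substring_index)
top = conn.execute(
    "SELECT substring_index(host_ip, '.', 3) AS net, count(*) AS cnt "
    "FROM from_awl GROUP BY net ORDER BY cnt DESC LIMIT 10").fetchall()
print(top)  # → [('141.40.103', 2), ('62.216.178', 1)]
```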
If you look at the other high numbers from the from_awl, these are
networks of BlueStream Media, a well-known spammer, which I have not
blocked yet, see:

64.125.87.0/24   http://www.spamhaus.org/sbl/sbl.lasso?query=SBL18058
64.125.188.0/25  http://www.spamhaus.org/sbl/sbl.lasso?query=SBL14961
69.25.109.0/24   http://www.spamhaus.org/sbl/sbl.lasso?query=SBL20650
72.5.1.0/24      http://www.spamhaus.org/sbl/sbl.lasso?query=SBL22215
208.184.55.0/25  http://www.spamhaus.org/sbl/sbl.lasso?query=SBL13542

Regards,
Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum   ! <mailto:St...@lr...>
Barer Str. 21           ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
|
From: Lionel B. <lio...@bo...> - 2005-04-28 22:20:05
|
Michael Storz wrote the following on 28.04.2005 18:42:

> Yes, this was something I was also thinking about. At the moment I made
> this manually. I analyzed the tables and put ip addresses with lots of
> entries in the tables in client_ip_whitelist.local.

I see, this is probably the best way of speeding up the whole process:
Perl hashes can't be slower than a database query!

> Here ist the output of the different select-statements I use:
>
> (...)
>
> select substring_index(host_ip, '.', 3),count(*) as cnt from from_awl
> group by substring_index(host_ip, '.', 3) order by cnt desc limit 10;

Given that you manually query for the 3 most significant bytes, do you
use 'full' for the greylisting algorithm? Maybe smart would be better
suited (it would slowly decrease the number of database entries by
replacing IPs with class C nets). It's difficult to say how many fewer
entries you would have without a script applying the algorithm after
DNS lookups, though... I made it the default because it is more
friendly to mail pools, but the side effect is that it is also more
friendly to your database :-)

> As you can see, I have already optimized my domain_awl pretty good. The
> only candidate to whitelist is 141.40.103.103, one MTA of a local
> mailcluster, the otherone I already whitelisted. Interesting is the line
>
> | 141.40.103.103 | 8189 |
>
> When I analyzed the from_awl, I found several such ip addresses with
> extreme high numbers of entries. Going through the logs, I finally found
> out, why this was the case. The reason is forwarding. In the above case
> there were just 2 people which forwarded their email to mailbox on our
> system. All these entries have originators thought up by spammers, which
> most of the time do not exist.
>
> This brings me to my next wish :-) I need a forward_awl. And therefore
> this is another reason to have the connect_awl, otherwise I have to
> populate the forward_awl manually (actually, I have already written a
> little script to extract these entries out of the log file). Again
> aggregation would be done to fill the table, but this time on originator,
> whereas for the from_awl aggregating on recipient would be used.

I understand what you want (it took me some time though :)). The
forward_awl (in fact more of a rcpt_awl, if we refer to the field being
awl'ed) will prevent the from_awl from being filled with hundreds of
entries.

I realised some time ago that I don't need to add the connect_awl (or
forward/rcpt_awl, for that matter) before releasing 1.5.6 with IPv6 and
opt-in/opt-out support. This can wait for 1.5.7, and I won't have to
code any database upgrade, as SQLgrey checks for missing tables and
automatically recreates them during startup. So we still have some time
to discuss the details, which is a good thing.

> The forward_awl will decrease the need for a src_awl.

Yep, I realize that. Really good idea.

Did anyone else on the list see similar behaviours (spammers exploiting
the from_awl weakness and forwards generating lots of from_awl entries)?

Lionel.
|
From: Michael S. <Mic...@lr...> - 2005-04-29 09:28:31
|
On Fri, 29 Apr 2005, Lionel Bouton wrote:
> Given you manually query for the 3 most significant bytes, do you use
> 'full' for the greylisting algorithm? Maybe smart would be better suited
> (would slowly decrease the number of database entries by replacing IPs
> by class C nets). It's difficult to say how much less entries you would
> have without a script applying the algo after DNS lookups though... I
> made it the default because it is more friendly with mail pools, but the
> side effect is that it is also more friendly with your database :-)
I know, at some point we will have to discuss our differing views about
FULL versus SMART :-)
Indeed, we are using FULL, because I think it is the right way to go. It
definitely depends on the amount of email you are receiving. A site with
a small or moderate amount of email may benefit from SMART. Therefore
the default is OK with me.
Now, what are the issues using FULL or SMART?
- One of your arguments was a smaller database. Well, that's actually no
problem for us:
du -h /var/lib/mysql/sqlgrey/
258M /var/lib/mysql/sqlgrey
With 4 GByte of memory the whole database including indexes should be in
memory all the time. Our graph of CPU usage shows that 5 % is used by
sqlgrey and mysqld, and up to another 5 % by our log analysis.
Therefore, from the standpoint of performance, I have no problem with
additional complexity in sqlgrey.
- Loss of emails from mail systems with lots of MTAs/IP addresses for
outgoing email. Such systems either
* always retry from the same IP address (separate queueing systems), or
* use a different IP address for every retry (common queueing
system, for example a database-driven queueing system),
and either
* use a linear backoff algorithm for retries (every 15, 30 or 60
minutes), or
* use an exponential backoff algorithm (e.g. 5, 10, 20, 40, 80, ...
minutes)
IP
MTA | same | diff |
----+------+------+
lin | a | b |
----+------+------+
exp | c | d |
----+------+------+
* Case a: Here, emails will be delayed until an entry for
every MTA is created. It will take longer for FULL than
for SMART, but normally no email will be lost. Most
MTA pools are of this type.
* Case b: From my experience, this setup is rare. Here
there is a chance that FULL will not accept an email if there
are a lot of MTAs in the pool and the retry time is longer than
usual. E.g. if the retry time is 30 minutes, reconnect_delay less than
30 minutes and max_connect_age 24 hours, then the pool can have up to
46 MTAs and we will still accept the email, but it will be delayed for
nearly 24 hours. In reality, emails will not be delayed that long if
the MTAs are chosen randomly.
How many sites will have such big pools of MTAs?
* Case c: Similar to a, but emails will be delayed longer than in case
a. Still, for a well-behaved MTA, no email will be lost.
* Case d: Sites using such a system are very rude to the Net, in my
eyes. If they use a common queuing system, then they can distribute
the load over a cluster of outgoing MTAs, but they MUST shield this
from the outside, e.g. using NAT. Such installations are not
well-behaved MTAs and must be whitelisted.
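The case-b arithmetic can be checked with a tiny model. In this
simplified sketch (function and parameter names are mine, not
SQLgrey's), a pool of N MTAs retries round-robin every retry_min
minutes, so the first repeated IP shows up after N * retry_min; FULL
accepts only if that repeat falls inside the greylist window bounded by
max_connect_age:

```python
def max_full_pool(retry_min, max_connect_age_h):
    """Largest round-robin pool whose first repeated IP still falls
    inside the greylist window under FULL. Simplified model: attempt
    k arrives at time k * retry_min from MTA number (k mod N), so
    MTA 0's second attempt comes at N * retry_min minutes."""
    return (max_connect_age_h * 60) // retry_min

# 30-minute retries and a 24 h max_connect_age give 48 MTAs in this
# model, in the same ballpark as the ~46 quoted above (reconnect_delay
# and boundary effects shave off a couple); the mail is then delayed
# N * 30 min, i.e. nearly 24 hours for a full-size pool.
print(max_full_pool(30, 24))  # → 48
```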
- Higher delay of FULL compared to SMART:
The right medicine against that is good whitelists! Good means both the
number of whitelists and their content. In addition to the
standard sqlgrey algorithms for filling the tables via traffic analysis,
we have implemented our own algorithms :-)
* fast propagation (fills from_awl): This algorithm is based on the
trust we have in a sending MTA. If we trust it, we accept the
email even if there is no entry for this triple in the whitelists.
* MX-check (fills domain_awl): if outgoing and incoming MTAs are the
same, put an entry in domain_awl.
* A-check (fills domain_awl): if sending MTA sends emails for its
hostname only, put it in domain_awl.
These additional algorithms give us a lot of entries in our from_awl and
domain_awl and therefore reduce the delay significantly. And the last 2
algorithms only work with FULL, not with SMART, with the current design
of sqlgrey. Sorry for you, guys :-)
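As I read them, the MX-check and A-check could look like the following
sketch. The DNS record sets are passed in pre-resolved so the logic is
testable offline, and all function and parameter names are illustrative
rather than actual SQLgrey code:

```python
def mx_check(client_ip, sender_mx_ips):
    """MX-check: the connecting client is also one of the inbound MX
    hosts of the sender's domain, i.e. outgoing and incoming MTA are
    the same machine -> candidate for domain_awl."""
    return client_ip in sender_mx_ips

def a_check(client_ip, helo_name, helo_a_records, sender_domain):
    """A-check: the client resolves to its HELO name and only sends
    mail for that hostname -> candidate for domain_awl."""
    return client_ip in helo_a_records and sender_domain == helo_name

# A host that is both an MX for example.org and sends only as
# example.org passes both checks:
print(mx_check("192.0.2.25", {"192.0.2.25", "192.0.2.26"}))  # → True
print(a_check("192.0.2.25", "example.org",
              {"192.0.2.25"}, "example.org"))                # → True
```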
Regarding additional whitelists, forward_awl/rcpt_awl is one of them. At
the moment fast propagation replaces this table, because most of the
time we immediately accept all the spam mails from forwarding where the
remote MTA does not use greylisting, but at the cost of many unnecessary
entries in from_awl.
Another one would be prevalidation, as implemented in other greylisting
software. Here you put the tuple originator/recipient, without an IP
address, in a table for every email you send out.
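Prevalidation could be sketched like this (the schema and names are
hypothetical, just to illustrate the idea): every outgoing mail records
the reply pair we now expect, and a matching incoming mail bypasses
greylisting regardless of the client IP:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE prevalid (originator TEXT, recipient TEXT)")

def record_outgoing(local_sender, remote_rcpt):
    # We sent mail to remote_rcpt, so expect a reply addressed back
    # to local_sender with remote_rcpt as the originator.
    db.execute("INSERT INTO prevalid VALUES (?, ?)",
               (remote_rcpt, local_sender))

def prevalidated(originator, recipient):
    # Incoming mail skips greylisting if the pair was seen outbound;
    # note that no IP address appears in the lookup.
    row = db.execute("SELECT 1 FROM prevalid WHERE originator = ? "
                     "AND recipient = ?", (originator, recipient)).fetchone()
    return row is not None

record_outgoing("user@example.edu", "friend@example.net")
print(prevalidated("friend@example.net", "user@example.edu"))   # → True
print(prevalidated("spammer@example.com", "user@example.edu"))  # → False
```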
- Trust in the sending MTA:
SMART reduces the trust we gain in the sending MTA handling temporary
errors in a well-behaved way, as described by the relevant RFCs.
Well-behaved means
* trying to retransmit a message several times until a timeout of 3 - 5
days occurs,
* retransmitting emails in a timely manner (minutes) and not only once
in 24 hours.
For me this is the reason to use FULL. I want to gain trust in the
sending MTA; the more, the better. And if I trust an MTA, I will
accept its emails as fast as I can.
Actually, what I want is to strengthen the trust in the sending MTA,
e.g. by using the domain from HELO/EHLO or by requiring several retries
before I accept a connection. But that's another story and some work for
the future, to be done before spammers change their software. A detailed
analysis of the retransmit behavior of other MTAs is needed first.
Regards,
Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:St...@lr...>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
|