From: Lionel B. <lio...@bo...> - 2005-04-28 22:20:05
Michael Storz wrote the following on 28.04.2005 18:42 :

>Yes, this was something I was also thinking about. At the moment I made
>this manually. I analyzed the tables and put ip addresses with lots of
>entries in the tables in client_ip_whitelist.local.
>

I see, this is probably the best way of speeding up the whole process:
Perl hashes can't be slower than a database query!

>Here is the output of the different select-statements I use:
>
>(...)
>
>select substring_index(host_ip, '.', 3),count(*) as cnt from from_awl
>group by substring_index(host_ip, '.', 3) order by cnt desc limit 10;
>

Given that you manually query for the 3 most significant bytes, do you
use 'full' for the greylisting algorithm? Maybe 'smart' would be better
suited (it would slowly decrease the number of database entries by
replacing IPs with class C nets). It's difficult to say how many fewer
entries you would have without a script applying the algo after DNS
lookups though... I made it the default because it is more friendly
with mail pools, but the side effect is that it is also more friendly
with your database :-)

>As you can see, I have already optimized my domain_awl pretty good. The
>only candidate to whitelist is 141.40.103.103, one MTA of a local
>mailcluster, the other one I already whitelisted. Interesting is the line
>
>| 141.40.103.103 | 8189 |
>
>When I analyzed the from_awl, I found several such ip addresses with
>extreme high numbers of entries. Going through the logs, I finally found
>out, why this was the case. The reason is forwarding. In the above case
>there were just 2 people which forwarded their email to mailbox on our
>system. All these entries have originators thought up by spammers, which
>most of the time do not exist.
>
>This brings me to my next wish :-) I need a forward_awl. And therefore
>this is another reason to have the connect_awl, otherwise I have to
>populate the forward_awl manually (actually, I have already written a
>little script to extract these entries out of the log file). Again
>aggregation would be done to fill the table, but this time on originator,
>whereas for the from_awl aggregating on recipient would be used.
>

I understand what you want (it took me some time though :)). The
forward_awl (in fact more of a rcpt_awl if we refer to the field being
awl'ed) will prevent the from_awl from being filled with hundreds of
entries.

I realised some time ago that I don't need to add connect_awl (or
forward/rcpt_awl for that matter) before releasing 1.5.6 with IPv6 and
optin/optout support. This can wait for 1.5.7, and I won't have to code
any database upgrade, as SQLgrey checks for missing tables and
automatically recreates them during startup. So we still have some time
to discuss the details, which is a good thing.

>The forward_awl will decrease the need for a src_awl.
>

Yep, I realize that. Really good idea.

Has anyone else on the list seen similar behaviour (spammers exploiting
the from_awl weakness and forwards generating lots of from_awl entries)?

Lionel.
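
The class C grouping discussed in this message (the substring_index
query, and replacing single IPs with their class C net) can be sketched
in Python as below. This is only a minimal illustration and not SQLgrey
code: the function names class_c_prefix and top_prefixes are
hypothetical, and the sample list merely stands in for the host_ip
column of from_awl (only 141.40.103.103 comes from the thread, the
other addresses are documentation-range placeholders).

```python
from collections import Counter


def class_c_prefix(ip: str) -> str:
    """Return the first three octets of a dotted-quad IPv4 address."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    return ".".join(octets[:3])


def top_prefixes(host_ips, limit=10):
    """Count entries per class C prefix, most frequent first.

    Mirrors the quoted query: group by the first three octets,
    order by count descending, keep at most `limit` rows.
    """
    counts = Counter(class_c_prefix(ip) for ip in host_ips)
    return counts.most_common(limit)


# Hypothetical sample standing in for from_awl.host_ip.
sample = [
    "141.40.103.103",
    "192.0.2.7",
    "192.0.2.9",
    "141.40.103.103",
    "141.40.103.103",
    "198.51.100.1",
]

for prefix, cnt in top_prefixes(sample):
    print(prefix, cnt)
```

Prefixes with a high count are the ones where per-net entries (or a
manual whitelist entry) would presumably save the most rows compared
with one entry per full address.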