From: Michael S. <Mic...@lr...> - 2005-05-06 22:13:32
Analyzing our from_awl, I found the following: the table has 365,208 entries from 178,026 different IP addresses. Of these IP addresses,

- 129,210 have exactly one entry, and that entry has sender_domain = "-undef-"
- 38,904 have only entries with sender_domain <> "-undef-"
- only 9,912 have entries with both kinds of sender_domain

If we split from_awl into 2 tables

- from_awl: sender_domain <> "-undef-"
- dsn_awl: sender_domain = "-undef-"

we get a massive reduction of entries in from_awl and also a massive reduction of table size, since we do not have to store sender_name and sender_domain in dsn_awl. In our case, dsn_awl would have 139,122 entries and from_awl 48,816 entries.

Since we know which table to query based on sender_domain/sender_name, no additional table lookups are needed.

Another advantage would be that from_awl changes much less than before, because all the DSNs that result from spammers' backscatter are now excluded. We could also decide to use a different awl_age for each table.

If we include connect_awl, I don't think we need to split that table, because the backscatter DSNs will propagate quickly into dsn_awl; only normal DSNs will stay in connect_awl.

Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum     ! <mailto:St...@lr...>
Barer Str. 21             ! Fax: +49 89 2809460
80333 Muenchen, Germany   ! Tel: +49 89 289-28840
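
The routing rule and the arithmetic behind the proposed split can be sketched as follows. This is only an illustration in Python, not code from the actual filter: the function names, the `client_ip` parameter, and the key layout are assumptions, and reading the stated table sizes as one row per IP address is an inference from the fact that the sums match.

```python
UNDEF = "-undef-"  # marker stored in sender_domain for DSNs, as in the message


def awl_table_for(sender_domain: str) -> str:
    """Pick the table to query from the sender domain alone (hypothetical helper)."""
    return "dsn_awl" if sender_domain == UNDEF else "from_awl"


def awl_key(client_ip: str, sender_name: str, sender_domain: str):
    """Return (table, key columns). Assumed layout: dsn_awl stores only the IP."""
    table = awl_table_for(sender_domain)
    if table == "dsn_awl":
        return table, (client_ip,)
    return table, (client_ip, sender_name, sender_domain)


# IP address counts quoted in the message.
ips_only_undef = 129_210   # exactly one entry, sender_domain = "-undef-"
ips_only_named = 38_904    # only entries with a real sender_domain
ips_both = 9_912           # entries of both kinds

total_ips = ips_only_undef + ips_only_named + ips_both
dsn_awl_size = ips_only_undef + ips_both    # IPs with at least one DSN entry
from_awl_size = ips_only_named + ips_both   # IPs with at least one non-DSN entry

print(total_ips, dsn_awl_size, from_awl_size)
```

Because the table is chosen from sender_domain before any query is issued, each message still costs a single lookup, which is the point made above about needing no additional table lookups.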