From: Michael S. <Mic...@lr...> - 2005-05-06 22:43:03
Sorry, I mixed up the number of entries with the number of IP addresses. The corrected figures are inline below.

On Sat, 7 May 2005, Michael Storz wrote:

> Analyzing our from_awl, I found the following:
>
> The table has 365.208 entries from 178.026 different ip addresses.
> Of these ip addresses
>
> - 129.210 have exactly one entry, and it has sender_domain =
>   "-undef-"
> - 38.904 have only entries with sender_domain <> "-undef-"
> - only 9.912 have entries with both kinds of sender_domain
>
> If we split the from_awl into 2 tables
>
> - from_awl: sender_domain <> "-undef-"
> - dsn_awl: sender_domain = "-undef-"
>
> we get a massive reduction of entries in the from_awl and also a massive
> reduction of table size, since we do not have to store sender_name and
> sender_domain in dsn_awl.
>
> In our case, dsn_awl would have 139.122 entries and from_awl 48.816

Correction: in our case, dsn_awl would have 139.122 entries (one per ip
address) and from_awl 226.086 entries from 48.816 ip addresses.

> entries. Since we know which table to query based on
> sender_domain/sender_name, no additional table lookups are needed.
>
> Another advantage would be that the from_awl would not change as much as
> before, because all the DSNs that arrive as backscatter from spammers
> are now excluded. And we could decide to use a different awl_age for the
> two tables.
>
> If we include connect_awl, I don't think we need to split that table,
> because the backscatter DSNs will propagate quickly into the dsn_awl; only
> normal DSNs will stay in connect_awl.

Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:St...@lr...>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
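For readers who want to check the corrected figures, here is a minimal Python sketch (not sqlgrey code) that derives the sizes of the proposed split from the per-IP breakdown in the message. The numbers are the ones reported above, written with "_" instead of "." as the thousands separator. The names `target_table` and `split_sizes` are hypothetical. The sketch also assumes each IP contributes at most one "-undef-" entry, which the breakdown implies. `target_table` shows the point that the table can be chosen from sender_domain alone, without an extra lookup.

```python
# Hypothetical sketch: derive the sizes of the proposed from_awl/dsn_awl
# split from the per-IP breakdown reported for our from_awl.

TOTAL_ENTRIES = 365_208
TOTAL_IPS = 178_026
ONLY_UNDEF_IPS = 129_210  # exactly one entry, sender_domain = "-undef-"
ONLY_REAL_IPS = 38_904    # only entries with sender_domain <> "-undef-"
BOTH_IPS = 9_912          # entries of both kinds

UNDEF = "-undef-"


def target_table(sender_domain: str) -> str:
    """Choose the table from sender_domain alone; no extra lookup needed."""
    return "dsn_awl" if sender_domain == UNDEF else "from_awl"


def split_sizes() -> dict:
    # Assumption: every IP with a "-undef-" entry has exactly one such
    # entry, so it contributes one row to dsn_awl.
    dsn_entries = ONLY_UNDEF_IPS + BOTH_IPS
    from_ips = ONLY_REAL_IPS + BOTH_IPS
    # Everything that does not move to dsn_awl stays in from_awl.
    from_entries = TOTAL_ENTRIES - dsn_entries
    return {
        "dsn_awl_entries": dsn_entries,
        "from_awl_entries": from_entries,
        "from_awl_ips": from_ips,
    }


if __name__ == "__main__":
    # The three IP categories must add up to the total number of IPs.
    assert ONLY_UNDEF_IPS + ONLY_REAL_IPS + BOTH_IPS == TOTAL_IPS
    print(split_sizes())
    print(target_table(UNDEF), target_table("example.org"))
```

Running it reproduces the corrected numbers: 139.122 dsn_awl entries, and 226.086 from_awl entries from 48.816 ip addresses.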