From: Michael S. <Mic...@lr...> - 2005-04-29 09:28:31
On Fri, 29 Apr 2005, Lionel Bouton wrote:

> Given you manually query for the 3 most significant bytes, do you use
> 'full' for the greylisting algorithm? Maybe smart would be better suited
> (would slowly decrease the number of database entries by replacing IPs
> by class C nets). It's difficult to say how much less entries you would
> have without a script applying the algo after DNS lookups though... I
> made it the default because it is more friendly with mail pools, but the
> side effect is that it is also more friendly with your database :-)

I know, at some point we have to discuss our differing views about FULL
versus SMART :-)

Indeed, we are using FULL, because I think it is the right way to go. It
definitely depends on the amount of email you are receiving. A site with
a small or moderate amount of email may benefit from SMART. Therefore the
default is ok with me.

Now, what are the issues using FULL or SMART?

- One of your arguments was a smaller database. Well, that's actually no
  problem for us:

  du -h /var/lib/mysql/sqlgrey/
  258M    /var/lib/mysql/sqlgrey

  With 4 GByte of memory the whole database including indexes should be
  in memory all the time. Our CPU usage graph shows that sqlgrey and
  mysqld use 5 %, and our log analysis up to another 5 %. Therefore I
  have no problem with more complexity in sqlgrey from the standpoint of
  performance.

- Loss of emails from mail systems with lots of MTAs/IP addresses for
  outgoing email, which

  * always retry from the same IP address (separate queueing systems)
  * use a different IP address for every retry (common queueing system,
    for example a database-driven queueing system)
  * use a linear backoff algorithm for retries (every 15, 30 or 60
    minutes)
  * use an exponential backoff algorithm (e.g. 5, 10, 20, 40, 80, ...)

         IP
    MTA | same | diff |
    ----+------+------+
    lin |  a   |  b   |
    ----+------+------+
    exp |  c   |  d   |
    ----+------+------+

  * Case a: Here, emails will be delayed till an entry for every MTA is
    created.
    It will take longer for FULL than for SMART, but normally no email
    will be lost. Most of the MTA pools are of this type.

  * Case b: From my experience, this setup is rare. Here a chance exists
    that FULL will not accept an email if there are a lot of MTAs in the
    pool and the retry time is longer than usual. E.g. if the retry time
    is 30 minutes, reconnect_delay less than 30 minutes and
    max_connect_age 24 hours, then the pool can have up to 46 MTAs and we
    will still accept the email, but it will be delayed for nearly 24
    hours. In reality, emails will not be delayed so long if the MTAs are
    chosen randomly. How many sites will have such big pools of MTAs?

  * Case c: Similar to a, but emails will be delayed longer than in case
    a. Still, for a well-behaved MTA, no email will be lost.

  * Case d: Sites using such a system are very rude to the Net, in my
    eyes. If they use a common queueing system, then they can distribute
    the load on a cluster of outgoing MTAs, but they MUST shield this
    from the outside, e.g. using NAT. Such installations are not
    well-behaved MTAs and must be whitelisted.

- Higher delay FULL can have compared to SMART:

  The right medicine against that is good whitelists! Good refers both to
  the number of whitelists and to their content. In addition to the
  standard sqlgrey algorithms for filling the tables via traffic
  analysis, we have implemented our own algorithms :-)

  * fast propagation (fills from_awl): This algorithm is based on the
    trust we have in a sending MTA. If we trust it, we accept the email,
    even if there is no entry for this triple in the whitelists.
  * MX-check (fills domain_awl): if outgoing and incoming MTAs are the
    same, put an entry in domain_awl.
  * A-check (fills domain_awl): if the sending MTA sends emails for its
    own hostname only, put it in domain_awl.

  These additional algorithms give us a lot of entries in our from_awl
  and domain_awl and therefore reduce the delay significantly.
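  To make the FULL versus SMART difference for pools that switch IP
  address on every retry (cases b and d) more concrete, here is a minimal
  Python sketch. It is not sqlgrey code. It assumes that 'smart' always
  collapses an address to its class C net (the DNS-based decision is left
  out), that the pool retries strictly round-robin at a fixed interval,
  and that the sender gives up after 5 days. greylist_key,
  delay_until_accepted, make_pool and the 192.0.2.x addresses are
  hypothetical names chosen for illustration; only reconnect_delay and
  max_connect_age come from the discussion above.

```python
from datetime import timedelta


def greylist_key(ip: str, mode: str) -> str:
    """Source key stored per attempt: whole IP for 'full', /24 prefix otherwise.

    Assumption: 'smart' is reduced to "always use the class C net"; the
    DNS-based decision mentioned in the thread is left out.
    """
    if mode == "full":
        return ip
    return ".".join(ip.split(".")[:3])


def delay_until_accepted(pool, retry, reconnect_delay, max_connect_age,
                         mode, give_up=timedelta(days=5)):
    """Simulate one message retried round-robin across `pool`.

    Returns the delay until the first accepted attempt, or None if the
    (hypothetical) sender gives up first.
    """
    first_seen = {}  # key -> time of the first, temporarily rejected attempt
    now = timedelta(0)
    attempt = 0
    while now <= give_up:
        key = greylist_key(pool[attempt % len(pool)], mode)
        seen = first_seen.get(key)
        if seen is None or now - seen > max_connect_age:
            first_seen[key] = now  # new or expired entry: greylist again
        elif now - seen >= reconnect_delay:
            return now  # same key came back inside the window: accept
        now += retry
        attempt += 1
    return None


def make_pool(size):
    """Hypothetical pool of `size` hosts inside one class C net."""
    return [f"192.0.2.{i}" for i in range(1, size + 1)]


settings = dict(retry=timedelta(minutes=30),
                reconnect_delay=timedelta(minutes=5),
                max_connect_age=timedelta(hours=24))

for size in (3, 100):
    for mode in ("full", "smart"):
        delay = delay_until_accepted(make_pool(size), mode=mode, **settings)
        print(size, mode, delay)
```

  With these assumed settings, the 3-host pool is accepted after 90
  minutes under FULL and after 30 minutes under the simplified SMART. The
  100-host pool is never accepted under FULL, because every entry expires
  before its address comes round again. The exact pool size at which FULL
  starts to lose mail depends on how sqlgrey compares timestamps against
  max_connect_age, which this sketch does not try to reproduce.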
  Note that the last two of these algorithms (MX-check and A-check) only
  work with FULL, not with SMART, with the current design of sqlgrey.
  Sorry for you, guys :-)

  About additional whitelists, forward_awl/rcpt_awl is one of them. At
  the moment fast propagation replaces this table, because most of the
  time we immediately accept all the spam mails from forwarding where the
  remote MTA does not use greylisting, but at the cost of many
  unnecessary entries in from_awl. Another one would be prevalidation as
  implemented in other greylisting software. Here you put the tuple
  originator/recipient without an IP address in a table for every email
  you send out.

- Trust in the sending MTA:

  SMART reduces the trust we can have that the sending MTA handles
  temporary errors in a well-behaved way based on the relevant RFCs.
  Well-behaved means

  * trying to retransmit a message several times until a timeout of
    3 - 5 days occurs.
  * retransmitting emails in a timely manner (minutes) and not only once
    in 24 hours

  For me this is the reason to use FULL. I want to gain trust in the
  sending MTA, the more the better. And if I trust an MTA, I will accept
  emails as fast as I can.

  Actually, what I want is to strengthen the trust in the sending MTA,
  e.g. by using the domain from HELO/EHLO or by requiring several retries
  before I accept a connection. But that's another story and some work
  for the future, though it should happen before spammers change their
  software. A detailed analysis of the retransmit behavior of other MTAs
  is needed first.

Regards,
Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum     ! <mailto:St...@lr...>
Barer Str. 21             ! Fax: +49 89 2809460
80333 Muenchen, Germany   ! Tel: +49 89 289-28840