From: Lionel B. <lio...@bo...> - 2005-04-29 08:06:33
Michel Bouissou wrote the following on 29.04.2005 09:13 :
>On Thursday 28 April 2005 17:08, Lionel Bouton wrote:
>
>>I was afraid we'd have to come to this. This will increase the load on
>>the database slightly, but I guess this won't be much of a problem. Now
>>is the right time to add this table, as I'm pushing database changes in
>>1.5.6 anyway.
>>
>
>I think we should be very careful when considering adding complexity. Adding
>complexity is always much of a problem, especially if not absolutely
>needed ;-)
>
Yep, this is why I chose from_awl as a first step instead of the full
triplet. But when you see several thousand messages coming your way that
would not be accepted with a first awl stage using triplets, you can't
avoid considering adding it :-)

>From a performance standpoint, I feel that adding supplementary tables
>(connect_awl ? forward_awl ? src_awl ?) would probably be bad. We would have
>to perform queries against many more tables before deciding on the fate of
>any email, and I'm rather sure that querying 6 tables, even if they are
>smaller, will always be much slower than querying 3 tables, even if bigger.
>After all, they are indexed...
>
It depends. The connect_awl will slow down the process, there's no doubt
about it, but the forward/rcpt_awl could remove lots of entries from
from_awl. Even with indexes, if a big table's size is cut in half, its
index uses half as much memory, and that can make the difference between
having to access the disk and doing all searches in memory.
Anyway, I'll make SQLgrey access these tables only if configured to do so.

>
>>Seems good to me. People will be able to set the aggregation level at
>>"1" to bypass the connect_awl (I just realised that I can make DB checks
>>depend on the aggregation level, so in that case we'll have the same DB
>>load).
>>
>
>I think that if new tables appear in SQLgrey, their very usage must be
>_optional_ (with a config parameter), so the admin has control not only on
>"when to move entries from one table to another", but also on "whether or
>not to use a given table at all".
>
I agree.

>I feel that such new tables, if implemented, should be off by default, and
>could be activated only by users who think their use may be better in their
>precise situation.
>
In the development versions, I'll turn them on by default and wait for
reports to decide what's best for 1.6.0.

>>A quick question though. I'm wondering if a "src_awl" would be of any
>>use; could people with large sites check how many entries they have in
>>domain_awl for the same src?
>>
>
>I don't think this one would be very useful. domain_awl is very useful, no
>doubt, and it helps a lot in reducing the size of from_awl. I'm not sure that
>adding a supplementary level would give a comparable gain that would be
>worth doing.
>
If I can avoid one more table, that's good :-)
Anyway, I was thinking about finally adding the log parsing tool. One
function of this tool would be to advise the admin to add some IPs/class C
networks to the local whitelists. This would serve the purpose of the
src_awl even better, as the Perl hashes will handle the decision with no DB
access whatsoever.

>>In other news, I'm planning to add blacklisting support, probably after
>>1.6.0. The idea is to have a set of conditions under which an IP will
>>enter a blacklist.
>>
>
>Again, be careful with complexity. I'm not sure that BLACKlisting is the
>business of a GREYlisting tool...
>
Adding blacklisting support is not for 1.6, so we'll have time to think
about it. But the idea here is that SQLgrey sees a lot of spam sources but
forgets them as soon as they are detected as spam sources! Using this
information to fight spam storms could be rather handy.

Lionel.
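The optional-table behaviour discussed in the thread (each extra AWL is consulted only when the admin enables it, so a site that leaves it off sees no extra DB load) can be pictured with the minimal sketch below. It is not SQLgrey code, which is written in Perl: in-memory sets stand in for the SQL tables, each membership test counts as one query, and the switch names `use_connect_awl`/`use_rcpt_awl`, the key layouts and the lookup order are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AwlConfig:
    # Hypothetical switches, one per optional table; off unless the admin enables them.
    use_connect_awl: bool = False
    use_rcpt_awl: bool = False


@dataclass
class AwlStore:
    # In-memory sets stand in for SQL tables. Key layouts are assumptions.
    connect_awl: set = field(default_factory=set)  # (src, sender, rcpt)
    rcpt_awl: set = field(default_factory=set)     # (src, rcpt)
    from_awl: set = field(default_factory=set)     # (src, sender)
    domain_awl: set = field(default_factory=set)   # (src, sender_domain)
    queries: int = 0                               # number of simulated DB lookups

    def hit(self, table: set, key: tuple) -> bool:
        # Every call models one indexed SELECT against the corresponding table.
        self.queries += 1
        return key in table


def is_whitelisted(store: AwlStore, cfg: AwlConfig,
                   src: str, sender: str, rcpt: str) -> Optional[str]:
    """Return the name of the AWL that matched, or None if none did."""
    domain = sender.rsplit("@", 1)[-1]
    # `and` short-circuits: a disabled table is never queried, so it costs nothing.
    if cfg.use_connect_awl and store.hit(store.connect_awl, (src, sender, rcpt)):
        return "connect_awl"
    if cfg.use_rcpt_awl and store.hit(store.rcpt_awl, (src, rcpt)):
        return "rcpt_awl"
    # The existing tables are always consulted (assumed order: from_awl, then domain_awl).
    if store.hit(store.from_awl, (src, sender)):
        return "from_awl"
    if store.hit(store.domain_awl, (src, domain)):
        return "domain_awl"
    return None
```

With the default configuration a miss costs two lookups; enabling both optional tables raises that to four, which is the extra load the admin opts into.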
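The point about local whitelists being decided by in-memory hashes with no DB access can be illustrated with a short sketch. Python sets are hash tables like Perl hashes, so a check is one lookup for the exact address plus one for its enclosing class C (/24) network. The class name `LocalWhitelist` and the CIDR input format are assumptions for illustration, not SQLgrey's whitelist file syntax.

```python
import ipaddress


class LocalWhitelist:
    """Hypothetical in-memory whitelist of single IPs and class C (/24) networks."""

    def __init__(self, entries):
        self.ips = set()      # exact addresses
        self.class_c = set()  # IPv4 /24 networks
        for entry in entries:
            if "/" in entry:
                net = ipaddress.ip_network(entry)
                if not (net.version == 4 and net.prefixlen == 24):
                    raise ValueError(f"only IPv4 /24 networks are supported: {entry}")
                self.class_c.add(net)
            else:
                self.ips.add(ipaddress.ip_address(entry))

    def contains(self, src: str) -> bool:
        addr = ipaddress.ip_address(src)
        if addr in self.ips:
            return True
        if addr.version != 4:
            return False
        # Map the address to its /24 (host bits zeroed) and do a second hash lookup.
        return ipaddress.ip_network(f"{addr}/24", strict=False) in self.class_c
```

Usage: `LocalWhitelist(["198.51.100.7", "203.0.113.0/24"]).contains("203.0.113.200")` is true because the address falls inside the listed /24, and no database is touched at any point.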
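The blacklisting idea is only outlined in the thread ("a set of conditions where an IP will enter a blacklist"). One possible reading, sketched here purely as an assumption, is to remember sources whose greylisted attempts expire without a retry, and to blacklist a source once it accumulates too many such expiries within a sliding time window. `SpamSourceTracker`, `threshold` and `window` are hypothetical names, and their defaults are arbitrary.

```python
import time
from collections import defaultdict, deque


class SpamSourceTracker:
    """Hypothetical tracker: count greylist expiries (no retry) per source IP
    and blacklist a source once it hits `threshold` expiries within `window` seconds."""

    def __init__(self, threshold: int = 5, window: float = 3600.0):
        self.threshold = threshold           # expiries needed to blacklist (assumed)
        self.window = window                 # sliding window length in seconds (assumed)
        self.expiries = defaultdict(deque)   # src -> timestamps of recent expiries
        self.blacklist = set()

    def record_expiry(self, src: str, now: float = None) -> None:
        """Call when a pending greylist entry for `src` is purged without a retry."""
        now = time.time() if now is None else now
        events = self.expiries[src]
        events.append(now)
        # Drop expiries that have fallen out of the sliding window.
        while events and now - events[0] > self.window:
            events.popleft()
        if len(events) >= self.threshold:
            self.blacklist.add(src)

    def is_blacklisted(self, src: str) -> bool:
        return src in self.blacklist
```

In this sketch a source never leaves the blacklist once added; any real implementation would also need an expiry policy for blacklisted entries.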