Re: [Sqlgrey-users] Spammer alex@hotmail.com benefits from from_awl

Michel Bouissou wrote the following on 29.04.2005 09:13 :

>On Thursday 28 April 2005 17:08, Lionel Bouton wrote:
>
>>I was afraid we'll have to come to this. This will increase the load on
>>the database slightly but I guess this won't be much of a problem. Now
>>is the right time to add this table as I'm pushing database changes in
>>1.5.6 anyway.
>
>I think we should be very careful when considering adding complexity. Adding
>complexity is always much of a problem, especially if not absolutely
>needed ;-)

Yep, this is why I chose from_awl as a first step instead of the full
triplet. But when you see several thousand mails coming your way that
would not be accepted with a first awl stage using triplets, you can't
avoid considering adding it :-)

>From a performance standpoint, I feel that adding supplementary tables
>(connect_awl ? forward_awl ? src_awl ?) would probably be bad. We would have
>to perform queries against many more tables before deciding on the fate of
>any email, and I'm rather sure that querying 6 tables, even if they are
>smaller, will always be much slower than querying 3 tables, even if bigger.
>After all, they are indexed...

It depends. The connect_awl will slow down the process, there's no doubt
about it, but the forward/rcpt_awl could remove lots of entries in
from_awl. Even with indexes, if a big table's size is cut in half, your
index uses half as much memory -> this can mean having to access disk
versus doing all searches in memory.
Anyway I'll make SQLgrey access these tables only if configured to do so.
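
A minimal sketch of the "only if configured" idea, written in Python
rather than SQLgrey's Perl. The option names and the query order are
assumptions made for illustration, not actual SQLgrey settings; only
the table names come from this thread.

```python
# Hypothetical sketch: the config keys below are invented for illustration.
# from_awl and domain_awl are treated as always on; the tables discussed
# in this thread are consulted only when explicitly enabled.
OPTIONAL_TABLES = ("connect_awl", "rcpt_awl")


def awl_tables_to_query(config):
    """Return the list of AWL tables to check for an incoming mail."""
    tables = ["from_awl", "domain_awl"]
    for name in OPTIONAL_TABLES:
        # A missing key means "disabled", so the DB load is unchanged
        # for sites that never turn the new tables on.
        if config.get(name, False):
            tables.append(name)
    return tables
```

With an empty configuration this returns only the two existing tables,
so a site that enables nothing issues the same queries as before.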

>>Seems good to me. People will be able to set the aggregation level at
>>"1" to bypass the connect_awl (I just realised that I can make DB checks
>>depend on the aggregation level, in this case we'll have the same DB load).
>
>I think that if new tables appear in SQLgrey, their very usage must be
>_optional_ (with a config parameter), so the admin has control not only over
>"when to move entries from one table to another", but also over "whether or not
>to use a given table at all".

I agree.

>I feel that such new tables, if implemented, should be off by default, and
>could be activated only by users who think their use may be better in their
>precise situation.

In the development versions, I'll turn them on by default and wait for
reports to decide what's best for 1.6.0.

>>A quick question though. I'm wondering if a "src_awl" would be of any
>>use, could people with large sites check how many entries they have in
>>domain_awl for the same src ?
>
>I don't think this one would be very useful. domain_awl is without doubt
>very useful, and it helps greatly in reducing the size of from_awl. I'm not
>sure that adding a supplementary level would give a comparable gain that
>would make it worth doing.

If I can avoid one more table, that's good :-) Anyway I was thinking
about finally adding the log parsing tool, one function of this tool
would be to advise the admin to add some IPs/class C in the local
whitelists. This would serve the purpose of the src_awl only better as
the Perl hashes will handle the decision with no DB access whatsoever.
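
To illustrate the kind of advice such a tool could give, here is a rough
Python sketch (the tool itself would presumably be Perl). The function
name, the input format and the threshold are all hypothetical; it simply
groups IPv4 source addresses by class C and reports the networks that
show up often enough to be worth a look.

```python
from collections import Counter


def suggest_class_c_whitelist(source_ips, min_hits=3):
    """Return class C prefixes (first three octets) seen at least min_hits times.

    source_ips is assumed to be an iterable of dotted-quad IPv4 strings,
    e.g. the source addresses extracted from the logs.
    """
    # "192.0.2.17" -> "192.0.2": drop the last octet to get the class C.
    per_class_c = Counter(ip.rsplit(".", 1)[0] for ip in source_ips)
    return sorted(prefix for prefix, hits in per_class_c.items() if hits >= min_hits)
```

The result is only a list of candidates; the admin still decides which
of them actually go into the local whitelists.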

>>In other news, I'm planning to add blacklisting support. Probably after
>>1.6.0. The idea is to have a set of conditions where an IP will enter a
>>blacklist
>
>Again, be careful with complexity. I'm not sure that BLACKlisting is the
>business of a GREYlisting tool...

Adding blacklisting support is not for 1.6 so we'll have time to think
about it. But the idea here is that SQLgrey sees a lot of spam sources
but forgets them as soon as they are detected as a spam source! Using
this information to fight spam storms could be rather handy.

Lionel.