From: Michael S. <Mic...@lr...> - 2005-05-06 21:11:31
On Sat, 30 Apr 2005, Lionel Bouton wrote:

> Michael Storz wrote the following on 29.04.2005 16:10 :
> ...
> > If the code were separated into several packages, other people could
> > implement daemons for different MTAs such as sendmail with milter,
> > exim or qmail. All of these daemons would be able to use the package
> > for grey-, black- and whitelisting. Since we are not using postfix,
> > we had to struggle to code glueware which emulates the postfix
> > policy protocol.
>
> Splitting the code will probably happen sooner or later. SQLgrey is
> starting to look a little too bloated for my taste... I would like to
> avoid this for 1.6.0 though, because it will probably take some time
> and heavy surgery :-)

I agree. I thought it would be nice to have it for a 2.X release.

...

> > Second: For smaller sites it is definitely nice to have one daemon
> > which does all the work. Just install the software and let it run.
> > In our case, however, I would like to be able to tune the system so
> > that it fits our needs. E.g. I would like to separate the checking
> > of the databases from the different propagation algorithms, which
> > transport data from one table to another, into separate daemons or
> > scripts.
>
> Hum... There are problems with separating the propagations from the
> greylisting.
> * It will create stale entries in the bottom awls, which will be fed
> by the greylister itself due to race conditions between the
> greylister and the separate daemons/scripts (not bad, just annoying,
> and it reflects what can already happen when multiple SQLgrey
> instances access the same DB).
> * You'll have more overhead, because the propagation algorithms will
> have to query the database for the entries they have to move. Now
> SQLgrey only queries the src it is working on; the external daemons
> will have to select these srcs by querying the database.
> * You'll have to schedule the propagation algorithms carefully: not
> too slow or you will lose awl perfs, not too fast or you will bring
> the DB down to its knees. Today the scheduling is not needed, as the
> propagation algorithms are event-driven (and so are automagically at
> the ideal point).
>
> The event-driven aspect is quite important if you want to:
> - maintain control of what happens on the global scale,
> - avoid querying large amounts of data to extract which part should
> be processed.

I'm not sure I understand this correctly. As the underlying database
engine does not allow transactions, you will always have the
possibility of interference between parallel-running daemons which
access the same data. If the sequence of operations (insert, delete,
update) is carefully planned with parallel access in mind, no big
problems should occur.

We are running our external propagation algorithms every 5 minutes,
and it does not seem to bring MySQL down to its knees. Since the
scripts only request the new data of the last 6 minutes, this is not
much load for MySQL. The processing of the data, however, does take
some time, since heavy DNS queries are done, which in the case of
spammer domains may take a while to complete or time out.

With the current design of SQLgrey - multiplexing - it is not possible
to do this event-driven; response time would be terrible. To allow
DNS-based queries, SQLgrey would have to move to a prefork model, in
which several processes run in parallel, like the implementation of
amavisd-new.

...

> > What kind of whitelist tables are possible? Well, we have 5
> > variables:
> >
> > - IP: IP address of the sending email server
> > - ON: Originator Name
> > - OD: Originator Domain
> > - RN: Recipient Name
> > - RD: Recipient Domain
> >
> > This leads to 32 different possibilities:
>
> This is a little more complex than that...
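As an aside, the scheduled propagation runs described here (a script every 5 minutes, each fetching the last 6 minutes of data so that successive runs overlap by a minute and miss no rows) can be sketched as a simple windowed select. This is only an illustration: the `connect` table and its columns are hypothetical stand-ins, not SQLgrey's actual schema, and sqlite3 stands in for MySQL.

```python
import sqlite3
import time

def select_recent_entries(conn, window_seconds=360):
    """Fetch entries newer than the window.  A 6-minute window for a
    5-minute run interval gives a 1-minute overlap, so rows written
    between two runs are not missed (the downstream step must then
    tolerate seeing a row twice).  Table and column names are
    illustrative, not SQLgrey's real schema."""
    cutoff = time.time() - window_seconds
    cur = conn.execute(
        "SELECT src, sender_domain, first_seen"
        " FROM connect WHERE first_seen > ?",
        (cutoff,))
    return cur.fetchall()
```

The overlap is the usual way to make such polling robust without transactions: correctness then rests on the propagation step being idempotent for duplicate rows.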
> You can add to these 5 variables: time (probably first/last), helo,
> hits and some other values you can get through the policy protocol
> (SASL auth, fqdn). But you can probably blow huge holes in the matrix
> by removing the combinations that don't make sense (ON without OD
> isn't really useful, for example)...

I agree, there are a lot more variables to consider. What I tried to
do was list all the possibilities arising from the variables we use at
the moment in from_awl and domain_awl for whitelisting. As you can
see, 2 of your new tables fit nicely into this concept, whereas the
other 2 bring a new dimension to it: exception processing of whitelist
tables. This is something which could be valuable for other whitelists
too. E.g. I think every automatic whitelist should have a manually
configured exception table. At the moment I have no concrete example,
but I could imagine that at some point in the future I might wish to
express that some IP address and/or domain should not be propagated to
the next awl.

> > I'll stop here, because this is a lot of information to think
> > about. But hopefully I showed some ideas of where SQLgrey could
> > evolve.
>
> And I thank you. Quite ambitious! This will take some time to get
> there...

I would love to hack some Perl code together to implement at least
some of these features. Unfortunately, I'm not allowed to, because I
have to manage some other projects for our messaging system.
Therefore, I hope you are keen to implement these features :-)

Michael Storz

-------------------------------------------------
Leibniz-Rechenzentrum   ! <mailto:St...@lr...>
Barer Str. 21           ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
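The "32 different possibilities" above are simply all subsets of the five matching variables, and Lionel's pruning idea (drop ON without OD) can be applied mechanically. A small sketch, in Python rather than SQLgrey's Perl, purely as an illustration of the counting:

```python
from itertools import combinations

# The five matching variables from the discussion above.
VARIABLES = ("IP", "ON", "OD", "RN", "RD")

def whitelist_combinations():
    """Enumerate every subset of the five variables: 2**5 = 32
    possible whitelist key shapes, including the empty one."""
    subsets = []
    for r in range(len(VARIABLES) + 1):
        subsets.extend(combinations(VARIABLES, r))
    return subsets

def sensible_combinations():
    """Prune combinations that don't make sense, here only the rule
    mentioned in the discussion: an Originator Name without its
    Originator Domain.  Other pruning rules would slot in the same
    way."""
    return [c for c in whitelist_combinations()
            if not ("ON" in c and "OD" not in c)]
```

That single rule already removes 8 of the 32 combinations (ON fixed in, OD fixed out, free choice over the remaining three variables), which shows how quickly the matrix shrinks once a few such constraints are added.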
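For readers who have not seen the postfix policy protocol that the glueware mentioned earlier has to emulate: it is a line-based exchange in which the MTA sends attribute=value pairs terminated by an empty line, and the policy server answers with a single action line followed by an empty line. A minimal sketch in Python (SQLgrey itself is Perl); the attribute names shown are standard policy-delegation attributes:

```python
def parse_policy_request(lines):
    """Parse one policy-delegation request: attribute=value lines,
    terminated by an empty line.  Returns the attributes as a dict."""
    attrs = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:          # empty line ends the request
            break
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs

def format_policy_response(action):
    """A response is a single action line followed by an empty line,
    e.g. 'dunno' to leave the decision to later restrictions."""
    return "action=%s\n\n" % action
```

Emulating this from a non-postfix MTA means mapping whatever data the MTA exposes (milter macros, exim ACL variables, ...) onto these attributes before handing the request to the greylister.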