Re: [Sqlgrey-users] modular design


Michael Storz wrote the following on 06.05.2005 23:11 :

>>Hum... There are problems with separating the propagations from the
>>greylisting.
>>* It will create stale entries in the bottom awls, which will be fed by
>>the greylister itself, due to race conditions between the greylister and
>>the separate daemons/scripts (not bad, just annoying, and it reflects what
>>can already happen when multiple SQLgrey instances access the same DB).
>>* You'll have more overhead because the propagation algorithms will have
>>to query the database for the entries they have to move: today SQLgrey
>>only queries the src it is working on, whereas the external daemons will
>>have to select these srcs by querying the database.
>>* You'll have to schedule the propagation algorithms carefully: not too
>>slow or you will lose awl performance, not too fast or you will bring the
>>DB to its knees. Today no scheduling is needed, as the propagation
>>algorithms are event-driven (and so automagically run at the ideal point).
>>
>>The event-driven aspect is quite important if you want to:
>>- maintain control of what happens on the global scale,
>>- avoid querying large amounts of data to extract which part should be
>>processed.
>>    
>>
>
>I'm not sure if I understand this correctly. As the underlying database
>engine does not support transactions, there will always be a possibility
>of interference between daemons running in parallel that access the same
>data. If the sequence of operations (insert, delete, update) is carefully
>planned with parallel access in mind, no big problems should occur.
>
>  
>
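The ordering argument can be made concrete with one propagation step from from_awl up to domain_awl, written to stay safe without transactions. This is a minimal sketch only. The table layouts are simplified and hypothetical, the function name is illustrative and not part of SQLgrey, and Python's sqlite3 in autocommit mode stands in for a non-transactional MySQL engine.

```python
import sqlite3
import time

# Hypothetical, simplified AWL tables; SQLgrey's real schema may differ.
SCHEMA = """
CREATE TABLE from_awl (sender_name TEXT, sender_domain TEXT, src TEXT,
                       first_seen INTEGER, last_seen INTEGER);
CREATE TABLE domain_awl (sender_domain TEXT, src TEXT,
                         first_seen INTEGER, last_seen INTEGER,
                         PRIMARY KEY (sender_domain, src));
"""

def propagate_to_domain_awl(db, sender_domain, src):
    """Move (sender_domain, src) entries from from_awl up to domain_awl.

    Each statement commits on its own (no transaction). Inserting before
    deleting means the entry is never absent from both tables at the same
    instant; the worst case is a brief window where it is in both.
    """
    now = int(time.time())
    (first_seen,) = db.execute(
        "SELECT MIN(first_seen) FROM from_awl"
        " WHERE sender_domain = ? AND src = ?",
        (sender_domain, src),
    ).fetchone()
    if first_seen is None:
        first_seen = now
    # Step 1: make the higher-level entry visible first.
    db.execute(
        "INSERT OR IGNORE INTO domain_awl VALUES (?, ?, ?, ?)",
        (sender_domain, src, first_seen, now),
    )
    # Step 2: only then remove the now-redundant lower-level entries.
    db.execute(
        "DELETE FROM from_awl WHERE sender_domain = ? AND src = ?",
        (sender_domain, src),
    )

db = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
db.executescript(SCHEMA)
```

Deleting first would open a window where a concurrent greylister finds neither entry and greylists a sender that should already be whitelisted.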

Indeed, no big problems will occur. The annoying effects I'm speaking of
(which can already happen and are nothing to be afraid of) are the awl
entries that could be created in from_awl although an entry in
domain_awl supersedes them.
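Such superseded entries are harmless, but they can be swept up cheaply. Here is a minimal sketch that assumes simplified, hypothetical from_awl/domain_awl layouts keyed on (sender_domain, src). It uses sqlite3 for illustration, and the helper name is made up.

```python
import sqlite3

# Hypothetical, simplified AWL tables (not necessarily SQLgrey's schema).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE from_awl (sender_name TEXT, sender_domain TEXT, src TEXT,
                       first_seen INTEGER, last_seen INTEGER);
CREATE TABLE domain_awl (sender_domain TEXT, src TEXT,
                         first_seen INTEGER, last_seen INTEGER);
""")

def purge_superseded_from_awl(db):
    """Delete from_awl rows already covered by a domain_awl entry for the
    same (sender_domain, src); return how many rows were removed."""
    cur = db.execute("""
        DELETE FROM from_awl
        WHERE EXISTS (SELECT 1 FROM domain_awl d
                      WHERE d.sender_domain = from_awl.sender_domain
                        AND d.src = from_awl.src)
    """)
    db.commit()
    return cur.rowcount
```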

The main problem I see with separate independent daemons is that the
propagation algorithms must select the entries they want to handle from
the whole awl tables. I don't like this for two reasons; it:
- is inefficient from a pure design standpoint (you have to query the
database for information you could get directly from the greylister),
- causes load spikes.

What I would prefer to see is some key points in the code where you
could register hooks. Let's say for example that every time an entry is
ready to be added to the from_awl, any registered hook will be able to
short-circuit the default behaviour of adding the entry to from_awl and
do whatever it wants with the entry. You could then add the propagation
to higher-level awls at this point.
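Here is a rough sketch of such a hook point, in Python rather than SQLgrey's Perl. The class and method names are hypothetical, and so is the convention that a hook returns True to short-circuit the default from_awl insertion. The entry fields are assumed to be src, sender_name, sender_domain, rcpt and first_time.

```python
from typing import Callable, List, NamedTuple

class AwlEntry(NamedTuple):
    src: str
    sender_name: str
    sender_domain: str
    rcpt: str
    first_time: int

# Hypothetical convention: a hook returns True to short-circuit the
# default insertion into from_awl, False to let it proceed.
Hook = Callable[[AwlEntry], bool]

class FromAwlHooks:
    """Hypothetical hook point fired when an entry is ready for from_awl."""

    def __init__(self, default_insert: Callable[[AwlEntry], None]):
        self._hooks: List[Hook] = []
        self._default_insert = default_insert

    def register(self, hook: Hook) -> None:
        self._hooks.append(hook)

    def fire(self, entry: AwlEntry) -> bool:
        """Run hooks in registration order. Return True if the default
        insertion ran, False if a hook short-circuited it."""
        for hook in self._hooks:
            if hook(entry):
                return False
        self._default_insert(entry)
        return True

# Usage: divert one domain to a higher-level AWL, keep the rest.
inserted, promoted = [], []

def promote_known_domains(entry: AwlEntry) -> bool:
    # Illustrative rule only: a real hook would apply a propagation policy.
    if entry.sender_domain == "example.com":
        promoted.append((entry.sender_domain, entry.src))
        return True
    return False

hooks = FromAwlHooks(inserted.append)
hooks.register(promote_known_domains)
hooks.fire(AwlEntry("192.0.2", "alice", "example.com", "bob@example.net", 0))
hooks.fire(AwlEntry("198.51.100", "carol", "other.org", "bob@example.net", 0))
```

Because the hook runs at the moment SQLgrey already holds the entry, it needs no table scan to find its work.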

>We are running our external propagation algorithms every 5 minutes and it
>does not seem to bring mysql to its knees. Since the scripts only
>request the new data of the last 6 minutes, this is not much load for
>mysql. The processing of the data, however, does take some time, since
>heavy DNS queries are done, which in the case of spammer domains may take
>a while to complete or time out. With the current design of sqlgrey
>(multiplexing), it is not possible to do this in an event-driven way:
>response time would be terrible. To allow DNS-based queries, sqlgrey has
>to move to prefork, where several child processes run in parallel, as in
>the implementation of amavisd-new.
>  
>
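For comparison, a polling daemon has to pull its work out of the table by time. The sketch below models the 5-minute run with a 6-minute look-back described above. It assumes a hypothetical simplified from_awl with a first_seen column. The seen-set, which skips rows already handled during the overlapping minute, is an illustrative choice and not necessarily how Michael's scripts work.

```python
import sqlite3

PERIOD = 300   # the batch job runs every 5 minutes (scheduling not shown)...
WINDOW = 360   # ...but looks back 6 minutes, so consecutive runs overlap

db = sqlite3.connect(":memory:")
# Hypothetical, simplified from_awl; the index keeps the time-range query
# from scanning the whole table.
db.executescript("""
CREATE TABLE from_awl (sender_name TEXT, sender_domain TEXT, src TEXT,
                       first_seen INTEGER, last_seen INTEGER);
CREATE INDEX from_awl_first_seen ON from_awl (first_seen);
""")

def poll_new_entries(db, now, seen):
    """Return from_awl rows first seen in the last WINDOW seconds that an
    earlier run has not handled yet; `seen` is a set kept across runs."""
    cutoff = now - WINDOW
    rows = db.execute(
        "SELECT sender_name, sender_domain, src, first_seen"
        " FROM from_awl WHERE first_seen >= ?",
        (cutoff,),
    ).fetchall()
    fresh = [r for r in rows if r not in seen]
    seen.update(fresh)
    # Forget rows that have slid out of the window so `seen` stays small.
    seen.difference_update({r for r in seen if r[3] < cutoff})
    return fresh
```

Every run must ask the database for information the greylister already had when it created the rows. An event-driven hook avoids that query.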

As long as SQLgrey can answer in a timely fashion (and frankly it should,
or we'll have serious problems), prefork can only bring marginal speedups
(and probably slowdowns if not tuned properly). Nothing prevents forking
in SQLgrey's code (or in a module's code, for that matter), though, as is
already done for the cleanups. For example, if I understand correctly, the
DNS query only comes after the greylisting, so its result isn't needed to
return an answer to Postfix. You could then fork, returning your answer to
the main code while processing the data asynchronously (in fact, I could
already implement forking in the code to do some DB processing
asynchronously, mainly the AWL propagations).
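Here is a POSIX-only Python sketch of the fork-then-answer pattern (SQLgrey itself is Perl). reply_then_process and slow_work are hypothetical names. "action=DUNNO" is the Postfix policy-delegation reply meaning "no decision here", and it stands in for whatever SQLgrey actually returns.

```python
import os

def reply_then_process(entry, slow_work):
    """Fork: the child runs slow_work(entry) (e.g. DNS lookups) and exits;
    the parent returns the policy answer immediately.

    Returns (answer, child_pid). The caller must reap the child later,
    e.g. with os.waitpid(pid, 0) or a SIGCHLD handler, to avoid zombies.
    """
    pid = os.fork()
    if pid == 0:
        # Child: do the slow part, then leave via os._exit so the parent's
        # cleanup handlers and buffered output are not run a second time.
        status = 1
        try:
            slow_work(entry)
            status = 0
        finally:
            os._exit(status)
    # Parent: the greylisting decision never waits for the slow work.
    return "action=DUNNO", pid
```

The child exits with status 1 if slow_work raises, so the reaping code can log failures without the parent ever blocking on them.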

In the example above where the entry is about to be added to from_awl,
the hook could fork, tell SQLgrey to let the message pass (and decide if
you want the from_awl entry to be created by SQLgrey or not) and
meanwhile do whatever you want with the "src, sender_name,
sender_domain, rcpt, first_time" array. You could do DNS queries at this
point or if you want, you can avoid forking and push this information to
another daemon through a socket or even log the entry for future
batch-processing if you feel like it.
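This sketch covers the "push it to another daemon through a socket" option. It assumes a hook convention where returning False leaves SQLgrey's default from_awl insertion untouched. Both the convention and the function names are hypothetical. Each entry is forwarded as one JSON line.

```python
import json
import socket

FIELDS = ("src", "sender_name", "sender_domain", "rcpt", "first_time")

def make_socket_hook(sock):
    """Build a from_awl hook that forwards each entry to another daemon
    as one JSON line and lets the default behaviour proceed."""
    def hook(entry):
        record = dict(zip(FIELDS, entry))
        sock.sendall((json.dumps(record) + "\n").encode("utf-8"))
        return False  # hypothetical convention: do not short-circuit
    return hook

def read_records(sock):
    """Receiving side: yield one dict per JSON line until EOF."""
    with sock.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            yield json.loads(line)

# Usage with a connected pair standing in for a Unix-domain socket.
if __name__ == "__main__":
    a, b = socket.socketpair()
    hook = make_socket_hook(a)
    hook(("192.0.2", "alice", "example.com", "bob@example.net", 1115400000))
    a.close()
    print(list(read_records(b)))
    b.close()
```

Newline-delimited JSON keeps the receiver simple. The same lines could just as well be appended to a log file for later batch-processing.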

>I would love to hack some perl code together to implement at least some of
>these features. Unfortunately, I'm not allowed to, because I have to
>manage some other projects for our messaging system. Therefore, I hope you
>are keen to implement these features :-)
>
>  
>

Not everything, not tonight :-) But these paths are interesting and help
generate other ideas.

Thanks,

Lionel.