From: Lionel B. <lio...@bo...> - 2005-05-06 22:44:45
Michael Storz wrote the following on 06.05.2005 23:11 :

>>Hum... There are problems with separating the propagations from the
>>greylisting.
>>* It will create stale entries in the bottom awls which will be fed by
>>the greylister itself due to race conditions between the greylister and
>>the separate daemons/scripts (not bad, just annoying and reflects what
>>can already happen when multiple SQLgrey instances access the same DB).
>>* You'll have more overhead because the propagation algorithms will have
>>to query the database for the entries they have to move, now SQLgrey
>>only queries the src it is working on, the external daemons will have to
>>select these srcs by querying the database.
>>* You'll have to schedule the propagation algorithms carefully : not too
>>slow or you will lose awl perfs, not too fast or you will bring the DB
>>down to its knees. Today the scheduling is not needed as the propagation
>>algorithms are event-driven (and so are automagically at the ideal point).
>>
>>The event-driven aspect is quite important if you want to:
>>- maintain control of what happens on the global scale,
>>- avoid querying large amounts of data to extract which part should be
>>processed.
>>
>
>I'm not sure, if I understand this correctly. As the underlying database
>engine does not allow transactions, you will always have the possibility
>of interference of parallel running daemons, which access the same data.
>If the sequence of operations (insert, delete, update) are carefully
>planned with the parallel access in mind, no big problems should occur.
>

Indeed no big problems will occur. The annoying effects I'm speaking of
(which can already happen and are nothing to be afraid of) are the awl
entries that could be created in from_awl although an entry in
domain_awl supersedes them.

The main problem I see with separate independent daemons is that the
propagation algorithms must select from the whole awl tables the entries
they want to handle.
I don't like this for two reasons; this:
- is inefficient from a pure design standpoint (you have to query the
  database for information you could get directly from the greylister),
- causes load spikes.

What I would prefer to see is some key points in the code where you
could register hooks. Let's say for example that every time an entry is
ready to be added to the from_awl, any registered hook will be able to
short-circuit the default behaviour of adding the entry to from_awl and
do whatever it wants with the entry. You could then add the propagation
to higher-level awls at this point.

>We are running our external propagation algorithms every 5 minutes and it
>does not seem to bring mysql down to its knees. Since the scripts only
>request all the new data of the last 6 minutes, this is not much load for
>mysql. The processing of the data however does need some time, since heavy
>DNS queries are done, which in case of spammer domains may take a while to
>complete or get a timeout. With the momentary design of sqlgrey -
>multiplexing - it is not possible to do this event-driven, response time
>would be terrible. To allow DNS based queries, sqlgrey has to go to
>prefork, where several threads run in parallel like the implementation of
>amavisd-new.
>

As long as SQLgrey can answer in a timely fashion (and frankly it should
or we'll have serious problems), prefork can only bring marginal
speedups (and probably slowdowns if not tuned properly). Nothing
prevents a fork in SQLgrey's code (or in a module's, for that matter),
though, as is already done for the cleanups.

For example, if I understand correctly, the DNS query only comes after
the greylisting: the answer to this query isn't needed to return an
answer to Postfix. You could then fork, returning your answer to the
main code while processing the data asynchronously (in fact I could
already implement forking in the code to do some DB processing
asynchronously, mainly the AWL propagations).
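
To make the hook idea more concrete, here is a minimal sketch in Python
(not SQLgrey's Perl, and not existing SQLgrey code). Every name in it is
hypothetical: register_hook, run_hooks, the "from_awl_add" hook point,
the in-memory from_awl/domain_awl stand-ins for the real tables, and the
toy THRESHOLD promotion rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Entry = Dict[str, str]
# A hook returns True when it has handled the entry itself, which
# short-circuits the default behaviour at that hook point.
Hook = Callable[[Entry], bool]

_hooks: Dict[str, List[Hook]] = {}


def register_hook(point: str, hook: Hook) -> None:
    """Attach a hook to a named point in the greylister (hypothetical API)."""
    _hooks.setdefault(point, []).append(hook)


def run_hooks(point: str, entry: Entry) -> bool:
    """Run hooks in registration order; stop at the first one returning True."""
    return any(hook(entry) for hook in _hooks.get(point, []))


# In-memory stand-ins for the awl tables (assumption: the real ones are SQL).
from_awl: List[Entry] = []
domain_awl: Set[Tuple[str, str]] = set()

THRESHOLD = 2  # toy value: senders per (src, domain) before promotion


def add_to_from_awl(entry: Entry) -> None:
    """Default behaviour, which any registered hook may short-circuit."""
    if run_hooks("from_awl_add", entry):
        return
    from_awl.append(entry)


def propagate_to_domain_awl(entry: Entry) -> bool:
    """Example hook: event-driven propagation from from_awl to domain_awl."""
    key = (entry["src"], entry["sender_domain"])
    if key in domain_awl:
        return True  # already covered: do not create a stale from_awl entry
    seen = sum(1 for e in from_awl if (e["src"], e["sender_domain"]) == key)
    if seen + 1 >= THRESHOLD:
        domain_awl.add(key)
        # Drop the from_awl entries that the domain_awl entry now covers.
        from_awl[:] = [e for e in from_awl
                       if (e["src"], e["sender_domain"]) != key]
        return True
    return False  # let the default behaviour add the entry


register_hook("from_awl_add", propagate_to_domain_awl)
```

The hook only ever looks at the src it is handed, so no scan of the
whole awl tables is needed, and the body of the hook is also the place
where a fork or a hand-off to another process could go.
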

In the example above where the entry is about to be added to from_awl,
the hook could fork, tell SQLgrey to let the message pass (and decide
whether the from_awl entry should be created by SQLgrey or not) and
meanwhile do whatever you want with the "src, sender_name,
sender_domain, rcpt, first_time" array. You could do DNS queries at this
point or, if you want, you can avoid forking and push this information
to another daemon through a socket, or even log the entry for future
batch-processing if you feel like it.

>I would love to hack some perl code together to implement at least some of
>these features. Unfortunately, I'm not allowed to do it, because I have to
>manage some other projects for our messing system. Therefore, I hope you
>are keen to implement these features :-)
>

Not everything, not tonight :-) But these paths are interesting and help
generate other ideas.

Thanks,

Lionel.
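
PS: a rough sketch of the "log the entry for future batch-processing"
variant, again in Python rather than Perl and not taken from SQLgrey.
The function names and the JSON-lines format are assumptions; only the
five field names come from the array mentioned above.

```python
import json
from typing import Dict, Iterable, Iterator, TextIO

# Fields of the array handed to the hook, as listed in the message.
FIELDS = ("src", "sender_name", "sender_domain", "rcpt", "first_time")


def log_entry(entry: Dict[str, str], out: TextIO) -> None:
    """Hook side: write one entry as a JSON line, then return immediately."""
    record = {name: entry[name] for name in FIELDS}
    out.write(json.dumps(record) + "\n")


def read_entries(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Batch side: yield the logged entries back, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)
```

The slow work (DNS queries and so on) then happens in whatever process
reads the log, so the greylister never waits on it.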