From: Lionel B. <lio...@bo...> - 2009-08-18 23:43:03
Dan Faerch wrote, on 08/19/2009 12:08 AM:
> Dan Faerch wrote:
>
>> [...]
>> Right. Completely killing off the "old" cleanup_delay and replacing it
>> with a fuzzy cleanup is a great idea. However "max_allowed_clean_time"
>> needs to be calculated,

I don't think so. In fact we can very well determine what the value
should look like.

If we assume SQLgrey is blocked while cleaning (sync mode), we have to
prevent two problems:
- Postfix aborting the connection because of a policy service timeout
  (100s),
- Postfix refusing incoming connections because it reached the maximum
  number of smtpd processes.

The first is easy: just set a value for target_clean_time
(max_clean_time was a poor name choice) well below 100s and everything
is OK (if you set it to 10s for example, only a 10x surge in connect
traffic can make it a problem, 2s => 50x surge, ...).

It seems we don't have the second problem (we haven't had reports of
errors about the maximum number of simultaneous smtpd processes). My
guess is that smtpd processes are busy for several seconds (actual mail
transfer, RBL and policy queries, ...) for each delivery, which makes
admins of busy servers tune them for more simultaneous connections and
pushes the second problem past the point where we would already see the
first (policy service timeouts). So I think that if we solve the first
problem we kill two birds with one stone.

I propose that we set a target of 5s for the actual cleanup execution
time and adjust the cleanup frequency to try to keep below this value.

>> since it is relative how much time it takes to clean up, depending on
>> the system, the db server, the CPU & IO load, the size of the DBs,
>> etc. etc.

Yes it is, but assuming the rate of row deletion is more or less
constant, on a given domain with traffic patterns that don't change by
more than 2 orders of magnitude, the cleanup time doesn't change much
from run to run (not by more than 2 orders of magnitude at least).

>> So you would still need to employ LIMIT to ensure a definitive max.

I don't think we will. Just initialize the system with a very high
cleanup frequency: 30s between each cleanup, for example. Then let it
find the sweet spot from that point. On most domains it should lower the
cleanup frequency gradually, and only on very busy domains will it keep
it around the initial value. Everyone would be able to use this without
any configuration.

To make that work we only have to store the db_cleandelay in the config
table in addition to the current last cleanup timestamp. Each SQLgrey
process waits for its own (adjusted) copy of the delay to expire,
reloads the values stored in the DB to make sure another process hasn't
changed things behind our back (in which case we know the cleanup is
already done) and updates its own internal copy of the next_cleanup_time
based on the DB values. We set a minimum db_cleandelay of 10s, add or
subtract a random number of seconds (between -5 and +5) and we are set.

Even if two or more servers happen to clean at the same time, it's not a
problem: they should all take roughly the same time doing so (because
one blocks all the others on blocking backends, or they share the delete
work on non-blocking backends) and will all put roughly the same values
in the database -> no unexpected cleandelay shooting through the roof or
the floor.
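To make the idea concrete, here is a rough, untested sketch of the loop
each SQLgrey process could run. The helper names (read_cleanup_state,
write_cleanup_state, do_cleanup) and the "grow at most 2x per run" cap
are only illustrative: they stand in for the real config table queries
and the real DELETEs, not for what will actually land in the code.

  use strict;
  use warnings;
  use Time::HiRes qw(time sleep);

  my $target_clean_time = 5;   # seconds a cleanup run should take
  my $min_cleandelay    = 10;  # never clean more often than this
  my $max_jitter        = 5;   # +/- seconds of per-process fuzz

  # Stand-ins for the config table: in SQLgrey these would be
  # SELECT/UPDATE queries on the last cleanup timestamp and the
  # db_cleandelay, and do_cleanup() would run the real DELETEs.
  my ($stored_last, $stored_delay) = (time(), 30);
  sub read_cleanup_state  { return ($stored_last, $stored_delay) }
  sub write_cleanup_state { ($stored_last, $stored_delay) = @_ }
  sub do_cleanup          { sleep(0.5) }

  while (1) {
      my ($last_cleanup, $db_cleandelay) = read_cleanup_state();

      # Our own fuzzy copy of the delay, so that several processes
      # don't all fire at exactly the same time.
      my $my_delay = $db_cleandelay + (rand(2 * $max_jitter) - $max_jitter);
      $my_delay = $min_cleandelay if $my_delay < $min_cleandelay;

      my $wait = ($last_cleanup + $my_delay) - time();
      sleep($wait) if $wait > 0;

      # Reload from the DB: if another process cleaned while we slept,
      # just pick up its values on the next round.
      my ($last_now, $delay_now) = read_cleanup_state();
      next if $last_now > $last_cleanup;

      my $start = time();
      do_cleanup();
      my $took = time() - $start;
      $took = 0.1 if $took < 0.1;  # avoid dividing by ~0 on empty runs

      # Rows to delete grow roughly linearly with the delay, so scale
      # the delay toward the 5s target: a run twice too slow halves the
      # delay, a fast run lengthens it (capped so it grows gently).
      my $new_delay = $delay_now * ($target_clean_time / $took);
      $new_delay = $delay_now * 2  if $new_delay > $delay_now * 2;
      $new_delay = $min_cleandelay if $new_delay < $min_cleandelay;

      write_cleanup_state($start, $new_delay);
  }

The nice property is that every process derives its next delay from the
shared DB values, so even if two of them clean back to back they end up
writing roughly the same db_cleandelay.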
>> [...]
>>
> Ohhh. I just realized that I was answering based on the assumption that
> max_allowed_clean_time would be the maximum number of seconds a cleanup
> is allowed to take.. And that it might not be what you meant?

It's not the maximum (the name wasn't right, sorry). It should be the
sweet spot where everything runs smoothly, and the system should be able
to cope with 10x this value when an occasional surge occurs.

Seems like it's time for me to go back to the source :-)

Lionel