From: Lionel B. <lio...@bo...> - 2009-08-18 23:43:03
Dan Faerch wrote, on 08/19/2009 12:08 AM:
> Dan Faerch wrote:
>
>> [...]
>> Right. Completely killing off the "old" cleanup_delay and replacing it
>> with a fuzzy cleanup is a great idea. However "max_allowed_clean_time"
>> needs to be calculated,

I don't think so. In fact we can very well determine what the value
should look like.

If we assume SQLgrey is blocked while cleaning (sync mode), we have to
prevent two problems:
- Postfix aborting the connection because of a policy service timeout
  (100s),
- Postfix refusing incoming connections because it reached the maximum
  number of smtpd processes.

The first is easy: just set a value for target_clean_time
(max_clean_time was a poor name choice) well below 100s and everything
is OK (if you set it to 10s for example, only a 10x surge in connect
traffic can make it a problem, 2s => 50x surge, ...).

It seems we don't have the second problem (we haven't had reports of
errors about the maximum number of simultaneous smtpd processes). My
guess is that smtpd processes are busy for several seconds (actual mail
transfer, RBL and policy queries, ...) for each delivery, which makes
admins of busy servers tune them for more simultaneous connections and
pushes the second problem past the point where we would already see the
first (policy service timeouts). So I think that if we solve the first
problem we kill two birds with one stone.

I propose that we set a target of 5s for the actual cleanup execution
time and adjust the cleanup frequency to try to keep below this value.

>> since it is relative how much time it takes to clean up, depending on
>> the system, the db server, the CPU & IO load, the size of the DBs,
>> etc. etc.

Yes it is, but assuming the rate of row deletion is more or less
constant, on a given domain with traffic patterns that don't change by
more than 2 orders of magnitude, the cleanup time doesn't change much
from run to run (not by more than 2 orders of magnitude at least).

>> So you would still need to employ LIMIT to ensure a definitive max.

I don't think we will. Just initialize the system with a very high
cleanup frequency: 30s between each cleanup, for example. Then let it
find the sweet spot from that point. On most domains it should lower the
cleanup frequency gradually, and only on very busy domains will it keep
it around the initial value. Everyone would be able to use this without
any configuration.

To make that work we only have to store the db_cleandelay in the config
table in addition to the current last cleanup timestamp. Each SQLgrey
process waits for its own (adjusted) copy of the delay to expire,
reloads the values stored in the DB to make sure another process hasn't
changed things behind our back (in which case we know the cleanup is
already done) and updates its own internal copy of the next_cleanup_time
based on the DB values. We set a minimum db_cleandelay of 10s, add or
subtract a random number of seconds (between -5 and +5) and we are set.

Even if two or more servers happen to clean at the same time, it's not a
problem: they should all take roughly the same time doing so (because
one blocks all the others on blocking backends, or they share the delete
work on non-blocking backends) and will all put roughly the same values
in the database -> no unexpected cleandelay shooting through the roof or
the floor.
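To make the idea concrete, here is a rough, untested sketch of the loop
each SQLgrey process could run. The helper names (read_cleanup_state,
write_cleanup_state, do_cleanup) and the "grow at most 2x per run" cap
are only illustrative: they stand in for the real config table queries
and the real DELETEs, not for what will actually land in the code.

  use strict;
  use warnings;
  use Time::HiRes qw(time sleep);

  my $target_clean_time = 5;   # seconds a cleanup run should take
  my $min_cleandelay    = 10;  # never clean more often than this
  my $max_jitter        = 5;   # +/- seconds of per-process fuzz

  # Stand-ins for the config table: in SQLgrey these would be
  # SELECT/UPDATE queries on the last cleanup timestamp and the
  # db_cleandelay, and do_cleanup() would run the real DELETEs.
  my ($stored_last, $stored_delay) = (time(), 30);
  sub read_cleanup_state  { return ($stored_last, $stored_delay) }
  sub write_cleanup_state { ($stored_last, $stored_delay) = @_ }
  sub do_cleanup          { sleep(0.5) }

  while (1) {
      my ($last_cleanup, $db_cleandelay) = read_cleanup_state();

      # Our own fuzzy copy of the delay, so that several processes
      # don't all fire at exactly the same time.
      my $my_delay = $db_cleandelay + (rand(2 * $max_jitter) - $max_jitter);
      $my_delay = $min_cleandelay if $my_delay < $min_cleandelay;

      my $wait = ($last_cleanup + $my_delay) - time();
      sleep($wait) if $wait > 0;

      # Reload from the DB: if another process cleaned while we slept,
      # just pick up its values on the next round.
      my ($last_now, $delay_now) = read_cleanup_state();
      next if $last_now > $last_cleanup;

      my $start = time();
      do_cleanup();
      my $took = time() - $start;
      $took = 0.1 if $took < 0.1;  # avoid dividing by ~0 on empty runs

      # Rows to delete grow roughly linearly with the delay, so scale
      # the delay toward the 5s target: a run twice too slow halves the
      # delay, a fast run lengthens it (capped so it grows gently).
      my $new_delay = $delay_now * ($target_clean_time / $took);
      $new_delay = $delay_now * 2  if $new_delay > $delay_now * 2;
      $new_delay = $min_cleandelay if $new_delay < $min_cleandelay;

      write_cleanup_state($start, $new_delay);
  }

The nice property is that every process derives its next delay from the
shared DB values, so even if two of them clean back to back they end up
writing roughly the same db_cleandelay.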
>> [...]
>>
> Ohhh. I just realized that I was answering based on the assumption that
> max_allowed_clean_time would be the maximum number of seconds a cleanup
> is allowed to take.. And that it might not be what you meant?

It's not the maximum (the name wasn't right, sorry). It should be the
sweet spot where everything runs smoothly, and the system should be able
to cope with 10x this value when an occasional surge occurs.

Seems like it's time for me to go back to the source :-)

Lionel