From: Lionel B. <lio...@bo...> - 2004-12-17 12:33:28
Rene Joergensen wrote the following on 12/17/04 11:42 :

>[...]
>
>>Currently I'm inclined to make the interval configurable in 1.4 with a
>>more sensible default than "0" and store a last_cleanup timestamp in
>>the database in 1.6 to easily support multiple instances accessing the
>>same database.
>
>Why wait until 1.6? Currently you've got the config table, which could
>be fine for storing a timestamp.

Because 1.4 should be for bugfixes and really simple additions only.
For example, maint_delay set to 0 is a bug; it will be fixed in 1.4.1.
Allowing whitelist reloads from a central server is not intrusive: it
will be done in 1.4.x. But for new functions that involve more than a
few lines of code, I'll start the 1.5.x dev branch. People who don't
want to test new code and want more reliable software will use the
1.4.x updates; others will test the 1.5.x releases until they stabilize
and 1.6.0 comes out...

>>Yep, the order of the checks is designed to minimize the number of
>>queries done to the database.
>
>A couple of comments:
>
>When checking domain_awl and from_awl, which are both cleaned
>regularly, there is no need for "AND last_seen > now() - INTERVAL 60
>DAY". It doesn't matter if it's 60 days + 30 minutes, and it'll make
>the query a little lighter.

I have already considered dropping this part of the statement (with the
same remarks in mind), but I haven't done it yet for the following
reasons:
- doing so will make the results depend on the cleanup, which might
  very well become a separate process and might have to handle subtle
  problems when several SQLgrey instances access the same database,
- I'm not sure this will save much time; remember that SQL doesn't slow
  down much when performing computations, it gets slow when it involves
  reading a lot of rows, sorting results that don't fit in memory, or
  performing some kinds of joins between tables.
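To make the config-table idea concrete, here is a rough sketch (Python
with sqlite3 purely for illustration; the table layout and the
`last_cleanup` name are made up for this example, not SQLgrey's actual
schema) of how a shared timestamp could gate the maintenance run:

```python
import sqlite3

MAINT_DELAY = 1800  # seconds between cleanup runs (the configurable interval)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE config (name TEXT PRIMARY KEY, value TEXT)")
db.execute("INSERT INTO config VALUES ('last_cleanup', '0')")

def maybe_cleanup(db, now):
    """Run the expiry only if MAINT_DELAY has elapsed since the stored
    stamp; an instance that cleaned up recently makes us skip."""
    (last,) = db.execute(
        "SELECT value FROM config WHERE name = 'last_cleanup'").fetchone()
    if now - float(last) < MAINT_DELAY:
        return False  # someone cleaned up recently, nothing to do
    db.execute("UPDATE config SET value = ? WHERE name = 'last_cleanup'",
               (str(now),))
    # ... the actual expiry DELETEs on connect/from_awl/domain_awl
    #     would run here ...
    return True

ran_first = maybe_cleanup(db, 2000.0)   # True: stamp was 0
ran_again = maybe_cleanup(db, 2500.0)   # False: only 500 s elapsed
```

Note that this check-then-update is not atomic; a real multi-instance
version would need something like a single conditional UPDATE (or a
lock) so two instances cannot both pass the test at the same moment,
which is exactly the kind of subtle problem mentioned above.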
I might drop this condition in the future, given the following
conditions:
- tests show there is a gain in performance,
- I have clear ideas on how the cleanup process will shape up.

>First time a triplet is seen, a delete is still performed on the
>connect table to erase an entry that doesn't exist,

In fact it could exist, but the way SQLgrey handles this is not clean
(I couldn't find any way to make it so). Let me explain.

SQLgrey is coded with multiple instances accessing the same database in
mind. So nothing prevents another instance from adding a connect entry
that will clash with the one this instance wants to insert: just
imagine 2 MX hosts receiving 2 mails with the same
sender/recipient/client IP. The corresponding SQLgrey instances will
both try to add a very similar entry to the connect table (only the
timestamp can differ). If they do it at the same time, there's no way
to ensure the code can detect it reliably (MySQL doesn't support
PRIMARY KEYs large enough; this is one of the little annoyances that
usually make me prefer PostgreSQL). So SQLgrey tries to reduce the
window as much as possible by DELETEing just before INSERTing.

Having 2 entries will bite back when trying to match a reconnection
against the earlier connection: SQLgrey doesn't like the fact that
there could be 2 entries in the table for the same connection. In fact,
rereading the whole code involved, there's a bug: SQLgrey should allow
several entries and, if one of them matches, allow the message to pass
(the consistency checks done by SQLgrey wrongly assume there is a
PRIMARY KEY). There will still be problems when computing the
reconnection delay, but there's not much we can do about that. The
window for this bug is really small and can only happen on
installations with multiple SQLgrey instances, but it can happen (I
just added this bug to my TODO for 1.4.x).

>maybe the result of the previous check in the connect table could be
>used to determine if a delete is necessary before inserting the row.
Hope my explanation was enough to convince you it's not the case; it's
a subtle problem.

>Maybe an auto-incrementing key could be implemented on the connect
>table? When the first select is performed on the connect table, the id
>key could be returned, making deletes easier for the db.

I'm not sure I understand. Which delete do you want to speed up? The
one that occurs on the connect table when the reconnection succeeds?
This can be done, but there won't be a way to check that another
connect entry was added *while* the auto-whitelisting process was going
on. You would end up with an entry in connect that is only removed by
the cleanup functions and that reports a false positive (as the
reconnect will match the auto-whitelist instead of the connect entry,
the connect entry will remain there).

>>Another entry in my TODO list... Probably for the 1.4 releases.
>
>Sounds like a good idea, maybe I'll look at coding something like
>that, if I get the time.

That would be a great addition I'd be thankful for,

Lionel.
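P.S. To show what I mean by "allow several entries and if one matches,
allow the message to pass", here is a hedged sketch (Python with
sqlite3 for illustration only; the schema, column names and delay value
are invented for this example, not SQLgrey's actual code) of a
reconnection match that tolerates the duplicate triplet rows two racing
instances can leave behind:

```python
import sqlite3

RECONNECT_DELAY = 300  # seconds a sender must wait before retrying

db = sqlite3.connect(":memory:")
# Illustrative schema only: no PRIMARY KEY wide enough to cover the
# triplet, so duplicate rows are possible, as described above.
db.execute("""CREATE TABLE connect
              (sender TEXT, recipient TEXT, src TEXT, first_seen REAL)""")

# Two near-identical rows, as two racing instances could leave behind
# (only the timestamp differs):
db.executemany("INSERT INTO connect VALUES (?, ?, ?, ?)",
               [("a@example.com", "b@example.net", "1.2.3.4", 100.0),
                ("a@example.com", "b@example.net", "1.2.3.4", 101.5)])

def reconnection_allowed(db, sender, recipient, src, now):
    """Pass if ANY stored entry for the triplet is old enough, instead
    of wrongly assuming there can only be one entry per triplet."""
    rows = db.execute(
        "SELECT first_seen FROM connect"
        " WHERE sender = ? AND recipient = ? AND src = ?",
        (sender, recipient, src)).fetchall()
    return any(now - first >= RECONNECT_DELAY for (first,) in rows)

allowed = reconnection_allowed(db, "a@example.com", "b@example.net",
                               "1.2.3.4", 401.0)   # True: the 100.0 row matches
denied = reconnection_allowed(db, "a@example.com", "b@example.net",
                              "1.2.3.4", 350.0)    # False: both rows too recent
```

The duplicate row no longer makes the check fail; as said above, the
reconnection *delay* computed from two slightly different timestamps
can still be off, but there's not much we can do about that.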