From: Lionel B. <lio...@bo...> - 2004-12-17 12:33:28
Rene Joergensen wrote the following on 12/17/04 11:42 :

>[...]
>
>>Currently I'm inclined to make the interval configurable in 1.4 with a
>>more sensible default than "0" and store a last_cleanup timestamp in
>>the database in 1.6 to easily support multiple instances accessing the
>>same database.
>
>Why wait until 1.6? Currently you've got the config table, which could
>be fine for storing a timestamp.

Because 1.4 should be for bugfixes and really simple additions only.
For example, maint_delay set to 0 is a bug; it will be fixed in 1.4.1.
Allowing whitelist reloads from a central server is not intrusive: it
will be done in 1.4.x. But for new functions that involve more than a
few lines of code, I'll start the 1.5.x dev branch. People who don't
want to test new code and want more reliable software will use the
1.4.x updates; others will test the 1.5.x releases until they stabilize
and 1.6.0 comes out...

>>Yep, the order of the checks is designed to minimize the number of
>>queries done to the database.
>
>A couple of comments:
>
>When checking domain_awl and from_awl, which are both cleaned
>regularly, there is no need for "AND last_seen > now() - INTERVAL 60
>DAY". It doesn't matter if it's 60 days + 30 minutes, and it'll make
>the query a little lighter.

I have already considered dropping this part of the statement (with the
same remarks in mind), but I haven't done it yet for the following
reasons:
- doing so will make the results depend on the cleanup, which might
  very well become a separate process and might have to handle subtle
  problems when several SQLgrey instances access the same database,
- I'm not sure this will save much time; remember that SQL doesn't slow
  down much when performing computations, it gets slow when it involves
  reading a lot of rows, sorting results that don't fit in memory, or
  performing some kinds of joins between tables.
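To make the config-table idea concrete, here is a rough sketch (Python
with sqlite3 purely for illustration; the table layout and the
`last_cleanup` name are made up for this example, not SQLgrey's actual
schema) of how a shared timestamp could gate the maintenance run:

```python
import sqlite3

MAINT_DELAY = 1800  # seconds between cleanup runs (the configurable interval)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE config (name TEXT PRIMARY KEY, value TEXT)")
db.execute("INSERT INTO config VALUES ('last_cleanup', '0')")

def maybe_cleanup(db, now):
    """Run the expiry only if MAINT_DELAY has elapsed since the stored
    stamp; an instance that cleaned up recently makes us skip."""
    (last,) = db.execute(
        "SELECT value FROM config WHERE name = 'last_cleanup'").fetchone()
    if now - float(last) < MAINT_DELAY:
        return False  # someone cleaned up recently, nothing to do
    db.execute("UPDATE config SET value = ? WHERE name = 'last_cleanup'",
               (str(now),))
    # ... the actual expiry DELETEs on connect/from_awl/domain_awl
    #     would run here ...
    return True

ran_first = maybe_cleanup(db, 2000.0)   # True: stamp was 0
ran_again = maybe_cleanup(db, 2500.0)   # False: only 500 s elapsed
```

Note that this check-then-update is not atomic; a real multi-instance
version would need something like a single conditional UPDATE (or a
lock) so two instances cannot both pass the test at the same moment,
which is exactly the kind of subtle problem mentioned above.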
I might drop this condition in the future, given the following
conditions:
- tests show there is a gain in performance,
- I have clear ideas on how the cleanup process will shape up.

>First time a triplet is seen, a delete is still performed on the
>connect table to erase an entry that doesn't exist,

In fact it could exist, but the way SQLgrey handles this is not clean
(I couldn't find any way to make it so). Let me explain.

SQLgrey is coded with multiple instances accessing the same database in
mind. So nothing prevents another instance from adding a connect entry
that will clash with the one this instance wants to insert: just
imagine 2 MX hosts receiving 2 mails with the same
sender/recipient/client IP. The corresponding SQLgrey instances will
both try to add a very similar entry to the connect table (only the
timestamp can differ). If they do it at the same time, there's no way
to ensure the code can detect it reliably (MySQL doesn't support
PRIMARY KEYs large enough; this is one of the little annoyances that
usually make me prefer PostgreSQL). So SQLgrey tries to reduce the
window as much as possible by DELETEing just before INSERTing.

Having 2 entries will bite back when trying to match a reconnection
against the earlier connection: SQLgrey doesn't like the fact that
there could be 2 entries in the table for the same connection. In fact,
rereading the whole code involved, there's a bug: SQLgrey should allow
several entries and, if one of them matches, allow the message to pass
(the consistency checks done by SQLgrey wrongly assume there is a
PRIMARY KEY). There will still be problems when computing the
reconnection delay, but there's not much we can do about that. The
window for this bug is really small and can only happen on
installations with multiple SQLgrey instances, but it can happen (I
just added this bug to my TODO for 1.4.x).

>maybe the result of the previous check in the connect table could be
>used to determine if a delete is necessary before inserting the row.
Hope my explanation was enough to convince you it's not the case; it's
a subtle problem.

>Maybe an auto-incrementing key could be implemented on the connect
>table? When the first select is performed on the connect table, the id
>key could be returned, making deletes easier for the db.

I'm not sure I understand. Which delete do you want to speed up? The
one that occurs on the connect table when the reconnection succeeds?
This can be done, but there won't be a way to check that another
connect entry was added *while* the auto-whitelisting process was going
on. You would end up with an entry in connect that is only removed by
the cleanup functions and that reports a false positive (as the
reconnect will match the auto-whitelist instead of the connect entry,
the connect entry will remain there).

>>Another entry in my TODO list... Probably for the 1.4 releases.
>
>Sounds like a good idea, maybe I'll look at coding something like
>that, if I get the time.

That would be a great addition I'd be thankful for,

Lionel.
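P.S. To show what I mean by "allow several entries and if one matches,
allow the message to pass", here is a hedged sketch (Python with
sqlite3 for illustration only; the schema, column names and delay value
are invented for this example, not SQLgrey's actual code) of a
reconnection match that tolerates the duplicate triplet rows two racing
instances can leave behind:

```python
import sqlite3

RECONNECT_DELAY = 300  # seconds a sender must wait before retrying

db = sqlite3.connect(":memory:")
# Illustrative schema only: no PRIMARY KEY wide enough to cover the
# triplet, so duplicate rows are possible, as described above.
db.execute("""CREATE TABLE connect
              (sender TEXT, recipient TEXT, src TEXT, first_seen REAL)""")

# Two near-identical rows, as two racing instances could leave behind
# (only the timestamp differs):
db.executemany("INSERT INTO connect VALUES (?, ?, ?, ?)",
               [("a@example.com", "b@example.net", "1.2.3.4", 100.0),
                ("a@example.com", "b@example.net", "1.2.3.4", 101.5)])

def reconnection_allowed(db, sender, recipient, src, now):
    """Pass if ANY stored entry for the triplet is old enough, instead
    of wrongly assuming there can only be one entry per triplet."""
    rows = db.execute(
        "SELECT first_seen FROM connect"
        " WHERE sender = ? AND recipient = ? AND src = ?",
        (sender, recipient, src)).fetchall()
    return any(now - first >= RECONNECT_DELAY for (first,) in rows)

allowed = reconnection_allowed(db, "a@example.com", "b@example.net",
                               "1.2.3.4", 401.0)   # True: the 100.0 row matches
denied = reconnection_allowed(db, "a@example.com", "b@example.net",
                              "1.2.3.4", 350.0)    # False: both rows too recent
```

The duplicate row no longer makes the check fail; as said above, the
reconnection *delay* computed from two slightly different timestamps
can still be off, but there's not much we can do about that.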