From: Lionel B. <lio...@bo...> - 2005-02-20 23:34:57
|
Michel Bouissou wrote the following on 19.02.2005 10:13 : >Hi there, > >(I'm using PostgreSQL, so maybe your mileage may vary if you use another SQL >DB system). > >I tried to optimize DB cleanup time by creating indexes on the last_seen >column (domain_awl and from_awl) and on the first_seen column (connect), so >the cleanup process can use these indexes to find entries older or newer than >a given value rather than having to explore sequentially the complete tables. > >This can make a big difference for large sites having tens of thousands >entries in their tables... > >However, usingt the PostgreSQL "explain" statement, I've been surprised to >notice that the indexes I had created where *NOT* used when performing the >kind of queries that SQLgrey performs, with a WHERE clause built such as in: > >select * from from_awl where last_seen < now() - INTERVAL '30 DAY'; > >However, the indexes would be used if the WHERE clause used a constant, such >as in: > >select * from from_awl where last_seen < '2005-01-19 08:43:09'; > >It seems that what prevents the index from being used, is having the "now() - >INTERVAL '65 DAY'" calculation performed as an expression in the WHERE clause >comparison. > > You just made me understand why these queries are slow. This isn't the whole syntax that is to blame, but the 'now()' function call. The database as no way of knowing that calls to now() will return the same result during the query (and in fact it doesn't). Not only does the database use a sequential scan which is *BAD* but it also calls now() for each and every one of the awl entries :-( If you do an explain on this : select * from from_awl where last_seen < timestamp '2005-02-21 00:12:46.22726+01' - INTERVAL '30 DAY'; You'll find that PostgreSQL is doing the right thing. I considered doing the now() call outside the database earlier (in fact when SQLite support was added). I want to avoid this if I can because I'd like to allow for several SQLgrey instances accessing the same database. As the default greylisting window is only 1 minute, either there must be ntp in the loop and we pay extra attention to the localtime accross the whole system (we don't want GMT in the database for the poor DBA) or we rely on a single point : the database. One way to do this might be to replace the now() with the "timestamp '<value>'" and fetch the value with a simple "SELECT now()". Leaving "SELECT 1" aside, this should be one of the most light query we can do :-) It works with PostgreSQL, I need to test MySQL and SQLite I'll isolate this into a separate method called only once per connection after whitelists evaluations. If needed we'll have the possibility of avoiding the database query by doing the work in Perl later. Lionel. |