Re: [Sqlgrey-users] Performance: strict versus loose expiration time

Michael Storz wrote the following on 28.02.2007 16:39 :
> On Wed, 28 Feb 2007, Dan Faerch wrote:
>
>   
>> Michael Storz wrote:
>>
>>     
>>> db_cleandelay seconds longer than the strict one. Taking the default, this
>>> means it is just half an hour longer, which, at least for the AWLs, is not
>>> noticeable.
>>>
>>>
>>>       
>> Well, we can't really assume what people use for db_cleandelay. I use 60
>> seconds. Someone might run it once a week. You never know ;).
>>     
>
> We use 1 hour. Because, as you said, we do not know what people are
> using for db_cleandelay, I think we should either provide a strict
> implementation or let the user choose which implementation he wants.
> As I said in another email, I think I will be able to provide a patch
> for such a choice.
>   

Hum, if you can get away without loose, please do so. I consider the
loose algorithm to be a bug. What I don't like about letting users
choose between the loose and strict algorithms is that it will be hard
to explain to the user what she loses by using loose...
I'd prefer users not to have to worry about such details. Unless you can
demonstrate a significant performance difference between the two, please
only provide 'strict' support.
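The overshoot described in the quoted text ("db_cleandelay seconds longer than the strict one") can be sketched with a small model. This is a sketch under stated assumptions, not sqlgrey code: awl_age is in days, the cleanup job runs exactly at multiples of db_cleandelay and deletes every entry older than awl_age at that moment, and strict_valid/loose_valid are hypothetical helpers. The strict check rejects an entry as soon as it is older than awl_age; the loose check keeps accepting it until the next cleanup deletes it, i.e. for up to db_cleandelay seconds longer.

```perl
use strict;
use warnings;

# Hypothetical settings: awl_age in days, db_cleandelay in seconds.
my $awl_age       = 36;
my $db_cleandelay = 1800;
my $awl_age_secs  = $awl_age * 86400;

# Strict: valid only while last_seen is within awl_age of now.
sub strict_valid {
    my ($now, $last_seen) = @_;
    return $last_seen > $now - $awl_age_secs ? 1 : 0;
}

# Loose: valid as long as the row still exists. Assumes (hypothetically)
# that cleanups run at multiples of $db_cleandelay and each one deletes
# rows whose last_seen is at or before (cleanup time - awl_age).
sub loose_valid {
    my ($now, $last_seen) = @_;
    my $last_cleanup = $now - ($now % $db_cleandelay);
    return $last_seen > $last_cleanup - $awl_age_secs ? 1 : 0;
}

my $last_seen = 100;
my $now = $last_seen + $awl_age_secs + 60;    # one minute past expiry
printf "strict=%d loose=%d\n",
    strict_valid($now, $last_seen), loose_valid($now, $last_seen);
```

With these values the script prints `strict=0 loose=1`: one minute after expiry the strict check already rejects the entry, while the loose check still accepts it until the next cleanup.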

>> I don't know exactly how those work, but why should there be any
>> difference when the actual WHERE clause doesn't change? I don't see
>> anywhere that, e.g., a timestamp is stuck into the statement. It's all
>> static information and thus, in my head, should still hit the query cache.
>>
>> Most SELECTs need a WHERE clause, e.g. WHERE sender_name =, WHERE src
>> =, etc. So I don't see how adding the max_connect_age,
>> reconnect_delay or awl_age values (which are static) will change how the
>> cache works.
>>
>>     
>
> The query cache uses the select statement as a "text" key. That means
> using 'select' or 'SELECT' already gives a different statement. If we
> implement strict expiration time, then every SELECT in the is_in or count
> subroutines will contain a timestamp:
>
> Example from sub is_in_from_awl:
>
>     my $sth = $self->prepare("SELECT 1 FROM $from_awl " .
>                              'WHERE sender_name = ? ' .
>                              'AND sender_domain = ? ' .
>                              'AND src = ? ' .
>                              'AND last_seen > ' .
>                              $self->past_tstamp($self->{sqlgrey}{awl_age},
>                                                 'DAY')
>                             );
> If now() is 2007-02-28 16:41:08, for MySQL you will get the statement
>
> SELECT 1 FROM from_awl WHERE sender_name = ? AND sender_domain = ? AND src
> = ? AND last_seen > timestamp '2007-02-28 16:11:08' - INTERVAL 36 DAY
>
> One second later, now() changes to 2007-02-28 16:41:09 and you get a
> textually different SELECT statement. For a loose implementation the part
> 'AND last_seen > timestamp '2007-02-28 16:11:08' - INTERVAL 36 DAY' would
> be omitted and you would always get the same statement (for the same src
> and sender).
>
>   
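The cache-key argument in the quoted reply can be made concrete with a minimal sketch that builds the statement text along the lines of the is_in_from_awl excerpt. The function awl_query and the use of gmtime/strftime to render now() are assumptions for illustration, not sqlgrey's actual past_tstamp. With strict expiration, two calls one second apart produce different strings; the loose variant produces the same string each time.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical stand-in for the SQL text built in is_in_from_awl.
# $strict selects whether the timestamp condition is appended.
sub awl_query {
    my ($strict, $epoch, $awl_age) = @_;
    my $sql = 'SELECT 1 FROM from_awl WHERE sender_name = ? '
            . 'AND sender_domain = ? AND src = ?';
    if ($strict) {
        # Render "now" as a literal, MySQL-style, like the example above.
        my $now = strftime('%Y-%m-%d %H:%M:%S', gmtime($epoch));
        $sql .= " AND last_seen > timestamp '$now' - INTERVAL $awl_age DAY";
    }
    return $sql;
}

my $t = time;
print awl_query(1, $t, 36), "\n";
print awl_query(1, $t + 1, 36), "\n";    # differs only in the timestamp
```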

I don't believe this is a good example: I believe you only get a benefit
from the query cache if the whole query (not only the prepared
statement) is exactly the same. So in the example above you'd need
the same timestamp AND the same sender_name, sender_domain and src.
Dropping the timestamp doesn't bring much benefit, because the rate at
which from_awl is updated (entries added, updated or deleted) is
probably far higher than the rate at which the same sender+src comes back.

Lionel
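Lionel's point can be illustrated with a toy model of a query cache. It assumes, as a simplification, that the cache key is the full query text after the bound values have been substituted, and that any write to from_awl discards the cached results for that table (the MySQL query cache does invalidate a table's cached queries whenever the table is modified). run_query and table_modified are hypothetical helpers. Even without a timestamp in the statement, a lookup is served from the cache only when the same sender and src come back before the next modification of the table.

```perl
use strict;
use warnings;

# Toy model (an assumption, not MySQL's implementation): the cache key is
# the final query text with the bound values substituted in.
my %cache;
my ($hits, $misses) = (0, 0);

sub run_query {
    my ($sql, @values) = @_;
    my $text = $sql;
    # Replace each placeholder in turn with the quoted value.
    $text =~ s/\?/'$_'/ for @values;
    if (exists $cache{$text}) { $hits++ }
    else                      { $misses++; $cache{$text} = 1 }
    return $text;
}

# Any INSERT, UPDATE or DELETE on from_awl invalidates its cached results;
# only from_awl is modeled here, so the whole cache is cleared.
sub table_modified { %cache = () }

my $sql = 'SELECT 1 FROM from_awl WHERE sender_name = ? '
        . 'AND sender_domain = ? AND src = ?';
run_query($sql, 'alice', 'example.com', '192.0.2.1');     # miss
run_query($sql, 'alice', 'example.com', '192.0.2.1');     # hit: identical text
run_query($sql, 'bob',   'example.org', '198.51.100.7');  # miss: other values
table_modified();                                         # e.g. an AWL update
run_query($sql, 'alice', 'example.com', '192.0.2.1');     # miss: invalidated
print "hits=$hits misses=$misses\n";
```

This prints `hits=1 misses=3`: only the immediate repeat of the same sender+src is a hit, and a single write to the table is enough to lose it.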