Re: [Firebird-devel] AutoOptimize

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

= m. Th = wrote:
>   
> In our particular situation, if the engine design allows perhaps right 
> before applying the lock on the page the thread will boost its priority 
> to something much higher priority in order to get the required index 
> page, release the lock and after this return to it's low priority, 
> giving the chance of the other threads to stop it. 

Changing thread priority twice for every page lock sounds very
expensive.  The read/write locking of pages is handled by a general
cache manager that has no idea what use the thread is going to make
of the page.  There's no way to read a page unless you've got a read
lock on it, so the lock can't be released immediately.  The thread
that fixes index statistics will spend most of its time reading
index pages, so under your scheme, it will spend most of its time
at elevated priority.

> Besides that, IMHO, 
> the chance of 'collision' on exactly same index page between a 'writer' 
> and a 'reader' lock, while we have a delayed update procedure is quite 
> small. (...remember, M$ 'dare' to do this in sync with /any/ change). 

I don't know what uSoft is doing inside SQLServer.  If you have an
example from an open source project, maybe we can learn from them.
A thread that reads every leaf page of every index will certainly
run into threads that are adding and deleting index nodes from time
to time.  If it can't get enough priority to release its locks, it
will eventually tie up the system.

> Also, the other vendors experience shows ...

I'm not arguing that we shouldn't find a way to set statistics
automatically, just that redesigning the whole I/O system of
Firebird to accommodate the feature seems a little extreme.
I'm also worried about housekeeping features that run at
random times and tie up production.

For example, we could make setting statistics on indexes part
of sweep, or an optional feature of sweep, or another function
that can be invoked by itself (as sweep can) or as part of
sweep.

>>>     
>> ... We don't hold unreferenced pages in 
>> cache.  Normal usage keeps the upper levels of indexes on active tables
>> in cache.   Leaf level pages may or may not be in cache, depending on
>> whether values from those pages have been referenced recently.
>>   
> ...and the cache is flat? Ie. an index page has the same change to go 
> away from cache like a record and/or blob page? Perhaps is better to 
> have some priorities like in hybrid disks - a very interesting concept 
> to speed things... perhaps is quite easy to adapt it. Basically the 
> concept core is to add the 'PageHits' variable to each page and to keep 
> in the cache preferentially the pages with more hits because these 
> likely will be accessed again.

Right.  LRU.  Least Recently Used.  Yes, our cache is LRU - That's
why pages that are frequently referenced (e.g. upper levels of
indexes) stay in cache and pages that are less frequently used
eventually age out, whatever type they may be.

>>>     
>> Request optimization is done before the query is run - possibly
>> minutes or even hours before it is run, so the optimizer has no idea
>> what pages will be in cache when the query actually runs.  And the same
>> prepared and optimized query can be run repeatedly, which also affects
>> what will be found in cache.
>>    
> I see. But, theoretically speaking, isn't worth to 're-think' the query 
> right before its run in order to gain the cache advantage?

Probably not.  Yes, you could read through the cache finding
data pages from tables and get a guess as to what percent of a
table is cached, but you couldn't know whether the part you need
is there.

> Also, if 
> someone prepares the query 'minutes or even hours before' then the 
> optimizer chooses the solution based on the statistics at *that* moment 
> which can be quite different from now (imagine a batch insert, a huge 
> update (like UPDATE BIG_TABLE SET STATUS=1 - which updates a few 
> thousands of records) or a Delete which zaps 75% from rows... or 
> something else you want... :) ). Or I'm missing something?

That's something that's entirely in the control of the application ...
or of the interface layer.  And I'm not sure that the cases you
mention are worth a major redesign...

Regards,

Ann

Re: [Firebird-devel] AutoOptimize

A powerful, cross platform, SQL database system

Re: [Firebird-devel] AutoOptimize