Re: [Firebird-devel] AutoOptimize

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Arno Brinkman wrote:
> Hi,
>
>   
>>> Why only not unique ones?
>>>
>>>       
>> Sorry, Arno, but the unique indexes doesn't have constant selectivity?
>>     
>
> No, selectivity for every index is calculated with :
>   
You got me here! :) Sorry for my fault.
>>> It's not interesting to track modifications of index.
>>> All indexes which belong to the same table should update the statistics simultaneously else the 
>>> statistics become
>>> "unbalanced" compared to each other.
>>>
>>>       
>> Yes, but the "separate" indexes from the same table doesn't update at
>> (approximative) same moment in time?
>>     
>
> That's ofcourse true, not always all indexes are modified when updating data.
Yes. In fact, to be sincere, this is the way of auto-updating statistics 
from Microsoft / Sybase. Thanks to Laco for the original link. See also 
the links from my message to him. It seems that is a pretty common 
practice...

>> On daily basis, the debit and credit fields (which are 24) are updated
>> frequently on different records. So, when the max_thresold will hit
>> (let's say) because of DB01 (because we are in January) then all 25
>> indexes will be recalculated (the PK not because is unique, CIIW, the
>> NAME and the other 24 numeric fields). The main overhead IMHO comes from
>> the field NAME (due to fact that is very large compared with others -
>> 210 bytes on UNICODE) which is a very static index in day-by-day usage.
>>     
>
> For your case above i wonder if it would be really usefull to update statistics after x-% 
> modifications. Those double precision values are probably very unique to each other.
>
>   
Generally speaking, yes. But nobody can guarantee this. And also is a 
very 'special' value (IOW, 0) which 'appears' and 'disappears' on a 
rather high degree based on many factors, which changes in a sensible 
manner the selectivity of indexes.
>> Hence a rather 'small' efficiency, IMHO. But, of course, you know better
>> than me the engine internals, so if the things became complex / worse
>> feel free to do as you wish. Anyway, my small benchmark shows that
>> updating the statistics isn't a performance hit.
>>     
>
> Running SET STATISTICS for many indexes in a busy system is certainly going to slow down, because 
> all index pages will be fetched, but i agree that it must be happen by the engine automaticly at 
> some point.
>   
1. That's why I had SetPriority(IDLE_PRIORITY_CLASS, 
THREAD_PRIORITY_BELOW_NORMAL) - in order to immediately back-off when a 
load is present. And I saw this with my eyes. This is by far the most 
appropriate approach IMHO ...and not only IMHO ( :) )because I spoke 
with many other programmers in newsgroups, searching the net, read some 
books aso. until I reached to conclusion (which I must admit is the 
common one) that is better to leave the OS thread scheduler to do the 
work. Knows better. This isn't something which is critical to have it 
synchronous, even if MS SQL 2000 have them in sync but in MS SQL 2005 
backs off a little and gives the possibility of asynchronous updating 
"in case of heavy loads" (IOW, my solution - a separate, low priority, 
thread) but keeps as default the synchronous updating. Also taking in 
consideration that (almost) all the other important vendors (except 
Postgres IIRC) implemented automatic update of statistics, and, by far, 
Firebird is the one which gathers the _less_ statistics from all 
servers, IMHO, is pretty clear that this is the way to go. (For 
references on MS SQL case see the links in my message to Laco). And also 
we can be much more aggressive than my very 'permisive' solution (ie. a 
scheduled idle-priority thread which triggers after a specified amount 
of modifications only on the indexes which need this and only if in a 
specified amount of time no modification was made on those indexes - 
thing which implies a (possible) load decreasing).

But perhaps is better to post my application in 'Files' section of the 
'support' group for users to test the thing practically in real case 
scenarios and gather some real info on it. What do you say?

2. What do you mean "all index pages will be *fetched*"? You don't cache 
the indexes in memory ? (Now, my heart beats because of fear of 
possibility to answer "no" here...) Of course, you mean "fetched" from 
the page cache... :)

3. Also, about caching, you, of course, take in consideration in cost 
calculation the fact that (some) pages is already in cache. IOW, if we 
have two tables one with 10000 rows and the other with 50000 but the 
latter one is already in cache then you go with it, isn't it? Or you 
take all the tables/pages the same nevertheless if they are cached or not?

hth,

m. th.

---------------------------------------------------------
P.S.: Appendix for Delphians:

The SetPriority procedure isn't in Delphi RTL. Its code is:

procedure SetPriority(AClass: cardinal; APriority: integer);
var
  ProcessID : DWORD;
  ProcessHandle : THandle;
  ThreadHandle : THandle;
begin
  ProcessID := GetCurrentProcessID;
  ProcessHandle := OpenProcess(PROCESS_SET_INFORMATION,
                               false,
                               ProcessID);
  SetPriorityClass(ProcessHandle, AClass); //--- for ex. 
REALTIME_PRIORITY_CLASS
  ThreadHandle := GetCurrentThread;
  SetThreadPriority(ThreadHandle, APriority); //--- for ex. 
THREAD_PRIORITY_TIME_CRITICAL
end;

Re: [Firebird-devel] AutoOptimize

A powerful, cross platform, SQL database system

Re: [Firebird-devel] AutoOptimize