|
From: = m. Th = <th...@va...> - 2007-01-03 07:55:37
|
Arno Brinkman wrote:
> Hi,
>
>
>>> Why only not unique ones?
>>>
>>>
>> Sorry, Arno, but the unique indexes doesn't have constant selectivity?
>>
>
> No, selectivity for every index is calculated with :
>
You got me here! :) Sorry for my fault.
>>> It's not interesting to track modifications of index.
>>> All indexes which belong to the same table should update the statistics simultaneously else the
>>> statistics become
>>> "unbalanced" compared to each other.
>>>
>>>
>> Yes, but the "separate" indexes from the same table doesn't update at
>> (approximative) same moment in time?
>>
>
> That's ofcourse true, not always all indexes are modified when updating data.
Yes. In fact, to be sincere, this is the way of auto-updating statistics
from Microsoft / Sybase. Thanks to Laco for the original link. See also
the links from my message to him. It seems that is a pretty common
practice...
>> On daily basis, the debit and credit fields (which are 24) are updated
>> frequently on different records. So, when the max_thresold will hit
>> (let's say) because of DB01 (because we are in January) then all 25
>> indexes will be recalculated (the PK not because is unique, CIIW, the
>> NAME and the other 24 numeric fields). The main overhead IMHO comes from
>> the field NAME (due to fact that is very large compared with others -
>> 210 bytes on UNICODE) which is a very static index in day-by-day usage.
>>
>
> For your case above i wonder if it would be really usefull to update statistics after x-%
> modifications. Those double precision values are probably very unique to each other.
>
>
Generally speaking, yes. But nobody can guarantee this. And also is a
very 'special' value (IOW, 0) which 'appears' and 'disappears' on a
rather high degree based on many factors, which changes in a sensible
manner the selectivity of indexes.
>> Hence a rather 'small' efficiency, IMHO. But, of course, you know better
>> than me the engine internals, so if the things became complex / worse
>> feel free to do as you wish. Anyway, my small benchmark shows that
>> updating the statistics isn't a performance hit.
>>
>
> Running SET STATISTICS for many indexes in a busy system is certainly going to slow down, because
> all index pages will be fetched, but i agree that it must be happen by the engine automaticly at
> some point.
>
1. That's why I had SetPriority(IDLE_PRIORITY_CLASS,
THREAD_PRIORITY_BELOW_NORMAL) - in order to immediately back-off when a
load is present. And I saw this with my eyes. This is by far the most
appropriate approach IMHO ...and not only IMHO ( :) )because I spoke
with many other programmers in newsgroups, searching the net, read some
books aso. until I reached to conclusion (which I must admit is the
common one) that is better to leave the OS thread scheduler to do the
work. Knows better. This isn't something which is critical to have it
synchronous, even if MS SQL 2000 have them in sync but in MS SQL 2005
backs off a little and gives the possibility of asynchronous updating
"in case of heavy loads" (IOW, my solution - a separate, low priority,
thread) but keeps as default the synchronous updating. Also taking in
consideration that (almost) all the other important vendors (except
Postgres IIRC) implemented automatic update of statistics, and, by far,
Firebird is the one which gathers the _less_ statistics from all
servers, IMHO, is pretty clear that this is the way to go. (For
references on MS SQL case see the links in my message to Laco). And also
we can be much more aggressive than my very 'permisive' solution (ie. a
scheduled idle-priority thread which triggers after a specified amount
of modifications only on the indexes which need this and only if in a
specified amount of time no modification was made on those indexes -
thing which implies a (possible) load decreasing).
But perhaps is better to post my application in 'Files' section of the
'support' group for users to test the thing practically in real case
scenarios and gather some real info on it. What do you say?
2. What do you mean "all index pages will be *fetched*"? You don't cache
the indexes in memory ? (Now, my heart beats because of fear of
possibility to answer "no" here...) Of course, you mean "fetched" from
the page cache... :)
3. Also, about caching, you, of course, take in consideration in cost
calculation the fact that (some) pages is already in cache. IOW, if we
have two tables one with 10000 rows and the other with 50000 but the
latter one is already in cache then you go with it, isn't it? Or you
take all the tables/pages the same nevertheless if they are cached or not?
hth,
m. th.
---------------------------------------------------------
P.S.: Appendix for Delphians:
The SetPriority procedure isn't in Delphi RTL. Its code is:
procedure SetPriority(AClass: cardinal; APriority: integer);
var
ProcessID : DWORD;
ProcessHandle : THandle;
ThreadHandle : THandle;
begin
ProcessID := GetCurrentProcessID;
ProcessHandle := OpenProcess(PROCESS_SET_INFORMATION,
false,
ProcessID);
SetPriorityClass(ProcessHandle, AClass); //--- for ex.
REALTIME_PRIORITY_CLASS
ThreadHandle := GetCurrentThread;
SetThreadPriority(ThreadHandle, APriority); //--- for ex.
THREAD_PRIORITY_TIME_CRITICAL
end;
|