|
From: Arno B. <fir...@ab...> - 2005-01-22 01:18:11
|
Hi Jim, > >In my memory, the IB optimizer never worked properly during the last 8 > >years. Now this is the worst optimization code in the industry, perhaps > >except MySQL. If you want it to be fixed (read: rewritten from scratch), you > >should be prepared for a very long testing period. > I don't think this is true, and I'm quite sure not enough is known to > even evaluate it's truth. Well, i'm sure the optimizer didn't choose the right indexes in many cases. > The optimizer has two phases. The first is join order selection, the > second is actual generation of the RSB tree. Given the intermediate > bitmap implementation of the indexes, there are very few subtleties in > the second phase. The problems, to the best of my knowledge, are all in > the first phase. Not only in the first phase also in the second phase. As you wish some examples : CREATE TABLE Relations ( RelationID INT NOT NULL PRIMARY KEY ); CREATE TABLE Categories ( CategoryID INT NOT NULL PRIMARY KEY ); CREATE TABLE RelationCategories ( RelationID INT NOT NULL, CategoryID INT NOT NULL, PRIMARY KEY (RelationID, CategoryID), FOREIGN KEY (RelationID) REFERENCES Relations (RelationID), FOREIGN KEY (CategoryID) REFERENCES Categories (CategoryID) ); SELECT Count(*) FROM Relations r JOIN RelationCategories rc ON (rc.RelationID = r.RelationID) WHERE rc.CATEGORYID IN (1, 4) The above query will never use fully the compound primary key while it would be possible. Also a compound index was preferred above a single index, because it often had a better selectivity, but this is only true when every segment it is matched with a equal. SELECT Count(*) FROM Relations r WHERE r.RelationID = 1 and r.RelationID > 1 Above query is a silly example, but with complex views and automatic generated queries you could have the same fields with different comparisons. A equal comparison should be prefered, but this wasn't the case. SELECT Count(*) FROM Relations r WHERE r.FieldX STARTING WITH 'aa' If more there were more indexes with FieldX as first segment all these indexes were choosen for the comparison. The same was true for OR booleans. And this problem was there also for IS NULL comparisons. These are just some examples of what was wrong. > There are two approaches to join order optimization, cost based and > heuristic based. Cost based means you estimate the costs various > alternatives and pick the cheapest. Heuristic based means you have a > bunch of rules which, when applied, determine the join order. > Just about everyone is born believing that the heuristic approach is > clearly superior. Any exposure to AI amplfies this belief. The belief > generally lasts up to the point where you're asked to optimize a 17 way > join. > The Interbase optimizer was originally a pure cost based optimizer. It > stayed pure until a customer reported that a 17 way join only took a few > seconds -- no problem there -- but the optimization took 12.5 hours. At > that point (actually shortly thereafter) the optimizer because cost > based with tree pruning. The optimizer always starts by considering > paths with the unique indexes up front. It also remembers the best > cost. At every point in the permutation tree walk it estimates the best > case cost of the subtree, and if this exceeds the current best > permutation, it gives up on the remaining sub-tree. It uses heuristics, > but only to prune the search tree. As you said there is a heuristic part in the join order and this causes that for some queries interesting join orders are not calculated. I'm investigating if there's no way to turn it into a better calculation. The big problem still remains that trying out all combinations is to expensive. My approach looks like this : calculate stream position: walk through all streams calculate for every stream the cost and put in the ordered list if it uses minimal a index pick best stream (cheapest cost) from list and calculate the next position For the first position all streams are tried unless all streams could be used. For all other positions only the cheapest stream is used. (The estimated selectivity for the stream is calculated by exactly the same routine (class) that will finally create the inversions needed for the rsb-tree) Before calling the routine above it would be nice if we already could order the streams based on constraints. Is this possible in Vulcan with a internal query? Regards, Arno Brinkman ABVisie -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Firebird open source database (based on IB-OE) with many SQL-99 features : http://www.firebirdsql.org http://www.firebirdsql.info http://www.fingerbird.de/ http://www.comunidade-firebird.org/ Support list for Interbase and Firebird users : fir...@ya... Nederlandse firebird nieuwsgroep : news://80.126.130.81 |