Re: [Firebird-devel] deprecate PLANs was Re: Garbage in error

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi Jim,

> >In my memory, the IB optimizer never worked properly during the last 8
> >years. Now this is the worst optimization code in the industry, perhaps
> >except MySQL. If you want it to be fixed (read: rewritten from scratch), you
> >should be prepared for a very long testing period.

> I don't think this is true, and I'm quite sure not enough is known to
> even evaluate it's truth.

Well, i'm sure the optimizer didn't choose the right indexes in many cases.

> The optimizer has two phases.  The first is join order selection, the
> second is actual generation of the RSB tree.  Given the intermediate
> bitmap implementation of the indexes, there are very few subtleties in
> the second phase.  The problems, to the best of my knowledge, are all in
> the first phase.

Not only in the first phase also in the second phase.
As you wish some examples :

CREATE TABLE Relations (
  RelationID INT NOT NULL PRIMARY KEY
);

CREATE TABLE Categories (
  CategoryID INT NOT NULL PRIMARY KEY
);

CREATE TABLE RelationCategories (
  RelationID INT NOT NULL,
  CategoryID INT NOT NULL,
  PRIMARY KEY (RelationID, CategoryID),
  FOREIGN KEY (RelationID) REFERENCES Relations (RelationID),
  FOREIGN KEY (CategoryID) REFERENCES Categories (CategoryID)
);

SELECT
  Count(*)
FROM
  Relations r
  JOIN RelationCategories rc ON (rc.RelationID = r.RelationID)
WHERE
  rc.CATEGORYID IN (1, 4)

The above query will never use fully the compound primary key while it would be possible.
Also a compound index was preferred above a single index, because it often had a better selectivity,
but this is only true when every segment it is matched with a equal.

SELECT
  Count(*)
FROM
  Relations r
WHERE
  r.RelationID = 1 and
  r.RelationID > 1

Above query is a silly example, but with complex views and automatic generated queries you could
have the same fields with different comparisons. A equal comparison should be prefered, but this
wasn't the case.

SELECT
  Count(*)
FROM
  Relations r
WHERE
  r.FieldX STARTING WITH 'aa'

If more there were more indexes with FieldX as first segment all these indexes were choosen for the
comparison. The same was true for OR booleans.
And this problem was there also for IS NULL comparisons.

These are just some examples of what was wrong.

> There are two approaches to join order optimization, cost based and
> heuristic based.  Cost based means you estimate the costs various
> alternatives and pick the cheapest.  Heuristic based means you have a
> bunch of rules which, when applied, determine the join order.

> Just about everyone is born believing that the heuristic approach is
> clearly superior.  Any exposure to AI amplfies this belief.  The belief
> generally lasts up to the point where you're asked to optimize a 17 way
> join.

> The Interbase optimizer was originally a pure cost based optimizer.  It
> stayed pure until a customer reported that a 17 way join only took a few
> seconds -- no problem there -- but the optimization took 12.5 hours.  At
> that point (actually shortly thereafter) the optimizer because cost
> based with tree pruning.  The optimizer always starts by considering
> paths with the unique indexes up front.  It also remembers the best
> cost.  At every point in the permutation tree walk it estimates the best
> case cost of the subtree, and if this exceeds the current best
> permutation, it gives up on the remaining sub-tree.  It uses heuristics,
> but only to prune the search tree.

As you said there is a heuristic part in the join order and this causes that for some queries
interesting join orders are not calculated. I'm investigating if there's no way to turn it into a
better calculation. The big problem still remains that trying out all combinations is to expensive.

My approach looks like this :

calculate stream position:
walk through all streams
calculate for every stream the cost and put in the ordered list if it uses minimal a index
pick best stream (cheapest cost) from list and calculate the next position

For the first position all streams are tried unless all streams could be used.
For all other positions only the cheapest stream is used.
(The estimated selectivity for the stream is calculated by exactly the same routine (class) that
will finally create the inversions needed for the rsb-tree)

Before calling the routine above it would be nice if we already could order the streams based on
constraints. Is this possible in Vulcan with a internal query?

Regards,
Arno Brinkman
ABVisie

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Firebird open source database (based on IB-OE) with many SQL-99 features :
  http://www.firebirdsql.org
  http://www.firebirdsql.info
  http://www.fingerbird.de/
  http://www.comunidade-firebird.org/

Support list for Interbase and Firebird users :
fir...@ya...

Nederlandse firebird nieuwsgroep :
news://80.126.130.81

Re: [Firebird-devel] deprecate PLANs was Re: Garbage in error

A powerful, cross platform, SQL database system

Re: [Firebird-devel] deprecate PLANs was Re: Garbage in error