From: SourceForge.net <no...@so...> - 2007-11-06 14:32:47
|
Patches item #1826883, was opened at 2007-11-06 14:32 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=317691&aid=1826883&group_id=17691 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: kuelzer (kuelzi) Assigned to: Nobody/Anonymous (nobody) Summary: XQuery Join Optimizer Initial Comment: This is a prototype of a XQuery Join Optimizer for eXist. I would be happy if some of the exist developers could review the code and give some feedback. A join basically consists of two nested for-expressions where the inner one compares somehow its items with the outer items. This is a standard operation if you want to match for example ID's and IDREF's or even if you want to group results of a query. The main problem here is that executing 2 nested iterations in the naive way causes quadratic complexity which results in never-ending query-times even for quite small databases. The join optimizer can detect some of those nested expressions and rewrite them internally to a main-memory-indexed version, where the results of the inner iteration are stored into and looked-up from an on-the-fly index as needed. In most cases this results in linear growing query-times. The effect is quite impressive: for the xmark benchmark with a mini db-size of 10MB the benchmark query 9 already takes over 100 seconds (of course according to the hardware and setup), whereas the optimized query returns after 0.35 seconds. I have tried to keep changes at existing code minimal, nevertheless I had to adjust some of the files (some of the adjustments seem to complete some missing code as well, see ElementConstructor.accept ). See below for detailed information on the changes. I have also added some tests which check special cases and general functionality of the optimizer. All other XQuery-Tests of eXist pass as expected. Functionality: The join optimizer is called after the standard optimizer as a separate optimization step (but also activated via enable-query-rewriting='yes' until now). It visits the expression tree and detects different kinds of joins (to be extended). The ForExpr-expressions which can be evaluated as a join are then replaced internally by a newly created CachedForExpr which has the same semantics but evaluates the expression with the help of an index which is created and maintained during evaluation (in contrast to persistent indexes). The following classes had to be changed in order to work with the join optimization: - AttributeConstructor: added accept-method - ElementConstructor: added accept-method, get-methods for content and attributes - GeneralComparison: removed accept-method (already at BinaryOp) - DefaultExpressionVisitor, BasicExpressionVisitor, ExpressionVisitor: adjusted accordingly - ForExpr: added methods for getting and setting the parent expression, added setDefiningExpression for variables - LetExpr: added setDefiningExpression for the variable - Variable: added observeValue() and valueHasChanged(), setAsPositionalVar() and isPositionalVar() - VariableReference: added accept-method, minor adjustions - XQuery: added call to the join optimizer The following classes are new: - CachedForExpr: the base class for evaluation of join-like ForExpr - JoinRewritingVisitor: the visitor which detects join expressions and rewrites the expression tree (which means changing the inner ForExpr to a CachedForExpr and setting the join parameters) - VariableCollectingVisitor: Helping visitor to collect the variables/references used in an expression - InMemoryIndex, AbstractMultiMap, HashMultiMap: base classes which provide the in memory index structure - DistinctValueSequence: implements a sequence with distinct values (functionality like in FunDistinctValues) - PositionedItem: Wrapper-class for Items which are stored into the index to recover the original position ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=317691&aid=1826883&group_id=17691 |