From: <tho...@us...> - 2014-01-09 21:55:18
Revision: 7755
http://bigdata.svn.sourceforge.net/bigdata/?rev=7755&view=rev
Author: thompsonbry
Date: 2014-01-09 21:55:07 +0000 (Thu, 09 Jan 2014)
Log Message:
-----------
The RTO now supports FILTERs on required joins when those filters have materialization requirements. This is the subject of #257, which is now closed. Over time we will probably refactor the RTO to operate directly on the AST nodes (e.g., StatementPatternNode versus SPOPredicate), but it currently handles conditional materialization pipelines just fine.
There are currently two ways in which a cutoff join can be evaluated. The historical method uses a single pipeline join and carefully controls the execution of that join to obtain an estimated cardinality.
The other method can handle a sequence of operators: a pipeline join is generated, followed by whatever operators are required to materialize variable bindings and evaluate filters, and finally a SLICE is appended to the query plan to limit the output. When the query is executed, a rowid column is injected into the source solutions. This rowid is used to correlate the #of input solutions consumed with the #of output solutions produced, and hence to obtain the estimated cardinality of the join through cutoff evaluation.
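For illustration, a minimal sketch of the cardinality arithmetic behind cutoff evaluation (the same arithmetic appears in the Path.cutoffJoin code removed further down): the rowid correlation yields the #of input solutions consumed and the #of output solutions produced, their ratio is the join hit ratio, and the source estimate is scaled by that ratio. The class and parameter names below are illustrative only, not part of the codebase.

    /**
     * Illustrative only: the join hit ratio and scaled cardinality estimate
     * obtained from a cutoff evaluation.
     */
    public class CutoffEstimateSketch {

        /**
         * @param inputCount    #of source solutions consumed before the limit was satisfied.
         * @param outputCount   #of output solutions produced (capped by the cutoff limit).
         * @param sourceEstCard Estimated cardinality of the source sample.
         *
         * @return The estimated output cardinality of the join if fully executed.
         */
        static long estimateCardinality(final int inputCount,
                final long outputCount, final long sourceEstCard) {

            // Join hit ratio: output solutions produced per input solution consumed.
            final double f = inputCount == 0 ? 0d
                    : (outputCount / (double) inputCount);

            // Scale the estimated cardinality of the source by the hit ratio.
            return (long) (sourceEstCard * f);
        }

        public static void main(final String[] args) {

            // E.g., 100 inputs produced 250 outputs from a source estimated at 4800.
            System.out.println(estimateCardinality(100, 250L, 4800L)); // 12000
        }
    }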
Some slight differences in the resulting plans and runtime behavior have been observed when all queries
See #257 (Support BOP fragments in the RTO)
See #64 (Runtime Query Optimizer)
Modified Paths:
--------------
branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/BOpIdFactory.java
branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/PipelineJoin.java
branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/PartitionedJoinGroup.java
branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/EdgeSample.java
branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/JGraph.java
branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/JoinGraph.java
branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/Path.java
branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/SampleBase.java
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpBase.java
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpFilters.java
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpJoins.java
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpRTO.java
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpUtility.java
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/rdf/sparql/ast/eval/rto/AbstractRTOTestCase.java
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/rdf/sparql/ast/eval/rto/BAR-Q1.rq
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/rdf/sparql/ast/eval/rto/BSBM-Q1.rq
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/rdf/sparql/ast/eval/rto/BSBM-Q1.srx
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/rdf/sparql/ast/eval/rto/LUBM-Q2.rq
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/rdf/sparql/ast/eval/rto/LUBM-Q9.rq
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/rdf/sparql/ast/eval/rto/TestRTO_BSBM.java
Added Paths:
-----------
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/OutOfOrderEvaluationException.java
Removed Paths:
-------------
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/bop/rdf/joinGraph/AbstractJoinGraphTestCase.java
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/bop/rdf/joinGraph/TestAll.java
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/bop/rdf/joinGraph/TestJoinGraphOnBSBMData.java
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/bop/rdf/joinGraph/TestJoinGraphOnBarData.java
branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/bop/rdf/joinGraph/TestJoinGraphOnLubm.java
Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/BOpIdFactory.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/BOpIdFactory.java 2014-01-09 20:47:55 UTC (rev 7754)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/BOpIdFactory.java 2014-01-09 21:55:07 UTC (rev 7755)
@@ -82,51 +82,54 @@
}
/**
- * Reserve ids used by the predicates or constraints associated with some
- * join graph.
+ * Reserve ids used by the predicates in some join graph.
*
* @param preds
* The vertices of the join graph.
- * @param constraints
- * The constraints of the join graph (optional).
*/
- public void reserveIds(final IPredicate<?>[] preds,
- final IConstraint[] constraints) {
+ public void reserveIds(final IPredicate<?>[] preds) {
if (preds == null)
throw new IllegalArgumentException();
- final BOpIdFactory idFactory = this;
-
for (IPredicate<?> p : preds) {
-
- idFactory.reserve(p.getId());
-
+
+ reserve(p.getId());
+
}
- if (constraints != null) {
-
- for (IConstraint c : constraints) {
-
- final Iterator<BOp> itr = BOpUtility
- .preOrderIteratorWithAnnotations(c);
+ }
- while (itr.hasNext()) {
-
- final BOp y = itr.next();
-
- final Integer anId = (Integer) y
- .getProperty(BOp.Annotations.BOP_ID);
-
- if (anId != null)
- idFactory.reserve(anId.intValue());
-
- }
-
+ /**
+ * Reserve ids used by the constraints for some predicate or join graph.
+ *
+ * @param constraints
+ * The constraints that attach to some predicate (optional).
+ */
+ public void reserveIds(final IConstraint[] constraints) {
+
+ if (constraints == null)
+ return;
+
+ for (IConstraint c : constraints) {
+
+ final Iterator<BOp> itr = BOpUtility
+ .preOrderIteratorWithAnnotations(c);
+
+ while (itr.hasNext()) {
+
+ final BOp y = itr.next();
+
+ final Integer anId = (Integer) y
+ .getProperty(BOp.Annotations.BOP_ID);
+
+ if (anId != null)
+ reserve(anId.intValue());
+
}
}
-
+
}
}
\ No newline at end of file
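For context, a hedged usage sketch of the reservation API after this split (the helper method and its arguments are hypothetical; the two reserveIds signatures are the ones introduced above):

    import com.bigdata.bop.BOpIdFactory;
    import com.bigdata.bop.IConstraint;
    import com.bigdata.bop.IPredicate;

    public class ReserveIdsExample {

        /**
         * Reserve every bopId already used by the join graph, then hand out a
         * fresh id that is guaranteed not to collide.
         */
        static int nextFreeId(final IPredicate<?>[] preds,
                final IConstraint[] constraints) {

            final BOpIdFactory idFactory = new BOpIdFactory();

            // Reserve the ids of the predicates (the vertices of the join graph).
            idFactory.reserveIds(preds);

            // Separately reserve the ids used by the constraints (may be null).
            idFactory.reserveIds(constraints);

            // An id that will not collide with anything reserved above.
            return idFactory.nextId();
        }
    }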
Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/PipelineJoin.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/PipelineJoin.java 2014-01-09 20:47:55 UTC (rev 7754)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/PipelineJoin.java 2014-01-09 21:55:07 UTC (rev 7755)
@@ -204,8 +204,23 @@
String COALESCE_DUPLICATE_ACCESS_PATHS = (PipelineJoin.class.getName()
+ ".coalesceDuplicateAccessPaths").intern();
- boolean DEFAULT_COALESCE_DUPLICATE_ACCESS_PATHS = true;
+ boolean DEFAULT_COALESCE_DUPLICATE_ACCESS_PATHS = true;
+ /**
+ * When <code>true</code>, access paths will be reordered to maximize
+ * locality.
+ * <p>
+ * Note: This needs to be turned off when the RTO uses row identifiers
+ * to correlate the input and output solutions for complex joins (those
+ * which required materialization of RDF Values for FILTER evaluation).
+ *
+ * @todo unit tests when (en|dis)abled.
+ */
+ String REORDER_ACCESS_PATHS = (PipelineJoin.class.getName() + ".reorderAccessPaths")
+ .intern();
+
+ boolean DEFAULT_REORDER_ACCESS_PATHS = true;
+
}
/**
@@ -223,7 +238,7 @@
* @param args
* @param annotations
*/
- public PipelineJoin(final BOp[] args, NV... annotations) {
+ public PipelineJoin(final BOp[] args, final NV... annotations) {
this(args, NV.asMap(annotations));
@@ -239,6 +254,11 @@
super(args, annotations);
+ /*
+ * TODO We should be checking this operator's required properties,
+ * especially when used for the RTO.
+ */
+
// if (arity() != 1)
// throw new IllegalArgumentException();
@@ -247,15 +267,6 @@
}
- // /**
- // * The sole operand, which is the previous join in the pipeline join path.
- // */
- // public PipelineOp left() {
- //
- // return (PipelineOp) get(0);
- //
- // }
-
/**
* {@inheritDoc}
*
@@ -420,8 +431,16 @@
*
* @see Annotations#COALESCE_DUPLICATE_ACCESS_PATHS
*/
- final boolean coalesceAccessPaths;
+ final private boolean coalesceAccessPaths;
+ /**
+ * When <code>true</code>, access paths will be reordered to maximize
+ * locality.
+ *
+ * @see Annotations#REORDER_ACCESS_PATHS
+ */
+ final private boolean reorderAccessPaths;
+
/**
* Used to enforce the {@link Annotations#LIMIT} iff one is specified.
*/
@@ -523,6 +542,9 @@
this.coalesceAccessPaths = joinOp.getProperty(
Annotations.COALESCE_DUPLICATE_ACCESS_PATHS,
Annotations.DEFAULT_COALESCE_DUPLICATE_ACCESS_PATHS);
+ this.reorderAccessPaths = joinOp.getProperty(
+ Annotations.REORDER_ACCESS_PATHS,
+ Annotations.DEFAULT_REORDER_ACCESS_PATHS);
this.threadLocalBufferFactory = new TLBFactory(sink);
@@ -942,10 +964,11 @@
*/
final AccessPathTask[] tasks = generateAccessPaths(chunk);
- /*
- * Reorder those tasks for better index read performance.
- */
- reorderTasks(tasks);
+ /*
+ * Reorder those tasks for better index read performance.
+ */
+ if (reorderAccessPaths)
+ reorderTasks(tasks);
/*
* Execute the tasks (either in the caller's thread or on
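For illustration, a hedged sketch of how a cutoff join might be annotated so that the rowid correlation stays valid. It is based on the annotations visible in the removed Path.cutoffJoin code further down plus the new REORDER_ACCESS_PATHS annotation; the class, method, and variable names are assumptions, not the actual AST2BOpRTO code.

    import java.util.Map;

    import com.bigdata.bop.BOp;
    import com.bigdata.bop.BOpEvaluationContext;
    import com.bigdata.bop.IPredicate;
    import com.bigdata.bop.NV;
    import com.bigdata.bop.join.PipelineJoin;

    public class CutoffJoinAnnotationsSketch {

        /**
         * Build a cutoff pipeline join over the given predicate. Reordering and
         * coalescing of access paths are disabled so the #of input solutions
         * can be correlated with the #of output solutions.
         */
        @SuppressWarnings({ "unchecked", "rawtypes" })
        static PipelineJoin<?> newCutoffJoin(final int joinId,
                final IPredicate<?> pred, final long limit) {

            final Map<String, Object> anns = NV.asMap(//
                    new NV(BOp.Annotations.BOP_ID, joinId),//
                    new NV(PipelineJoin.Annotations.PREDICATE, pred),//
                    // cutoff join: stop once [limit] solutions are produced.
                    new NV(PipelineJoin.Annotations.LIMIT, limit),//
                    // keep the input order intact for rowid correlation.
                    new NV(PipelineJoin.Annotations.REORDER_ACCESS_PATHS, false),//
                    new NV(PipelineJoin.Annotations.COALESCE_DUPLICATE_ACCESS_PATHS, false),//
                    // make the join stats visible on the query controller.
                    new NV(PipelineJoin.Annotations.SHARED_STATE, true),//
                    new NV(PipelineJoin.Annotations.EVALUATION_CONTEXT,
                            BOpEvaluationContext.CONTROLLER)//
            );

            return new PipelineJoin(new BOp[] {}, anns);
        }
    }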
Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/PartitionedJoinGroup.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/PartitionedJoinGroup.java 2014-01-09 20:47:55 UTC (rev 7754)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/PartitionedJoinGroup.java 2014-01-09 21:55:07 UTC (rev 7755)
@@ -12,18 +12,12 @@
import org.apache.log4j.Logger;
import com.bigdata.bop.BOp;
-import com.bigdata.bop.BOpEvaluationContext;
-import com.bigdata.bop.BOpIdFactory;
import com.bigdata.bop.BOpUtility;
import com.bigdata.bop.Bind;
import com.bigdata.bop.IConstraint;
import com.bigdata.bop.IPredicate;
import com.bigdata.bop.IVariable;
-import com.bigdata.bop.NV;
-import com.bigdata.bop.PipelineOp;
-import com.bigdata.bop.join.PipelineJoin;
import com.bigdata.bop.joinGraph.rto.JoinGraph;
-import com.bigdata.bop.solutions.JVMDistinctBindingSetsOp;
import com.bigdata.rdf.sparql.ast.StaticAnalysis_CanJoin;
/**
@@ -962,180 +956,180 @@
}
- /**
- * Generate a query plan from an ordered collection of predicates.
- *
- * @param distinct
- * <code>true</code> iff only the distinct solutions are desired.
- * @param selected
- * The variable(s) to be projected out of the join graph.
- * @param preds
- * The join path which will be used to execute the join graph.
- * @param constraints
- * The constraints on the join graph.
- *
- * @return The query plan.
- *
- * FIXME Select only those variables required by downstream
- * processing or explicitly specified by the caller (in the case
- * when this is a subquery, the caller has to declare which
- * variables are selected and will be returned out of the subquery).
- *
- * FIXME For scale-out, we need to either mark the join's evaluation
- * context based on whether or not the access path is local or
- * remote (and whether the index is key-range distributed or hash
- * partitioned).
- *
- * FIXME Add a method to generate a runnable query plan from the
- * collection of predicates and constraints on the
- * {@link PartitionedJoinGroup} together with an ordering over the
- * join graph. This is a bit different for the join graph and the
- * optionals in the tail plan. The join graph itself should either
- * be a {@link JoinGraph} operator which gets evaluated at run time
- * or reordered by whichever optimizer is selected for the query
- * (query hints).
- *
- * @todo The order of the {@link IPredicate}s in the tail plan is currently
- * unchanged from their given order (optional joins without
- * constraints can not reduce the selectivity of the query). However,
- * it could be worthwhile to run optionals with constraints before
- * those without constraints since the constraints can reduce the
- * selectivity of the query. If we do this, then we need to reorder
- * the optionals based on the partial order imposed what variables
- * they MIGHT bind (which are not bound by the join graph).
- *
- * @todo multiple runFirst predicates can be evaluated in parallel unless
- * they have shared variables. When there are no shared variables,
- * construct a TEE pattern such that evaluation proceeds in parallel.
- * When there are shared variables, the runFirst predicates must be
- * ordered based on those shared variables (at which point, it is
- * probably an error to flag them as runFirst).
- */
- static public PipelineOp getQuery(final BOpIdFactory idFactory,
- final boolean distinct, final IVariable<?>[] selected,
- final IPredicate<?>[] preds, final IConstraint[] constraints) {
-
- /*
- * Reserve ids used by the join graph or its constraints.
- */
- idFactory.reserveIds(preds, constraints);
-// {
-// for (IPredicate<?> p : preds) {
-// idFactory.reserve(p.getId());
+// /**
+// * Generate a query plan from an ordered collection of predicates.
+// *
+// * @param distinct
+// * <code>true</code> iff only the distinct solutions are desired.
+// * @param selected
+// * The variable(s) to be projected out of the join graph.
+// * @param preds
+// * The join path which will be used to execute the join graph.
+// * @param constraints
+// * The constraints on the join graph.
+// *
+// * @return The query plan.
+// *
+// * FIXME Select only those variables required by downstream
+// * processing or explicitly specified by the caller (in the case
+// * when this is a subquery, the caller has to declare which
+// * variables are selected and will be returned out of the subquery).
+// *
+// * FIXME For scale-out, we need to either mark the join's evaluation
+// * context based on whether or not the access path is local or
+// * remote (and whether the index is key-range distributed or hash
+// * partitioned).
+// *
+// * FIXME Add a method to generate a runnable query plan from the
+// * collection of predicates and constraints on the
+// * {@link PartitionedJoinGroup} together with an ordering over the
+// * join graph. This is a bit different for the join graph and the
+// * optionals in the tail plan. The join graph itself should either
+// * be a {@link JoinGraph} operator which gets evaluated at run time
+// * or reordered by whichever optimizer is selected for the query
+// * (query hints).
+// *
+// * @todo The order of the {@link IPredicate}s in the tail plan is currently
+// * unchanged from their given order (optional joins without
+// * constraints can not reduce the selectivity of the query). However,
+// * it could be worthwhile to run optionals with constraints before
+// * those without constraints since the constraints can reduce the
+// * selectivity of the query. If we do this, then we need to reorder
+// * the optionals based on the partial order imposed what variables
+// * they MIGHT bind (which are not bound by the join graph).
+// *
+// * @todo multiple runFirst predicates can be evaluated in parallel unless
+// * they have shared variables. When there are no shared variables,
+// * construct a TEE pattern such that evaluation proceeds in parallel.
+// * When there are shared variables, the runFirst predicates must be
+// * ordered based on those shared variables (at which point, it is
+// * probably an error to flag them as runFirst).
+// */
+// static public PipelineOp getQuery(final BOpIdFactory idFactory,
+// final boolean distinct, final IVariable<?>[] selected,
+// final IPredicate<?>[] preds, final IConstraint[] constraints) {
+//
+// /*
+// * Reserve ids used by the join graph or its constraints.
+// */
+// idFactory.reserveIds(preds, constraints);
+//// {
+//// for (IPredicate<?> p : preds) {
+//// idFactory.reserve(p.getId());
+//// }
+//// if (constraints != null) {
+//// for (IConstraint c : constraints) {
+//// final Iterator<BOp> itr = BOpUtility
+//// .preOrderIteratorWithAnnotations(c);
+//// while (itr.hasNext()) {
+//// final BOp y = itr.next();
+//// final Integer anId = (Integer) y
+//// .getProperty(BOp.Annotations.BOP_ID);
+//// if (anId != null)
+//// idFactory.reserve(anId.intValue());
+//// }
+//// }
+//// }
+//// }
+//
+// // figure out which constraints are attached to which predicates.
+// final IConstraint[][] assignedConstraints = PartitionedJoinGroup
+// .getJoinGraphConstraints(preds, constraints, null/*knownBound*/,
+// true/*pathIsComplete*/);
+//
+//// final PipelineJoin<?>[] joins = new PipelineJoin[preds.length];
+//
+// PipelineOp lastOp = null;
+//
+// final Set<IVariable<?>> knownBound = new LinkedHashSet<IVariable<?>>();
+//
+// for (int i = 0; i < preds.length; i++) {
+//
+// // The next vertex in the selected join order.
+// final IPredicate<?> p = preds[i];
+//
+// // Annotations for this join.
+// final List<NV> anns = new LinkedList<NV>();
+//
+// anns.add(new NV(PipelineJoin.Annotations.PREDICATE, p));
+//
+// anns.add(new NV(PipelineJoin.Annotations.BOP_ID, idFactory
+// .nextId()));
+//
+//// anns.add(new NV(PipelineJoin.Annotations.EVALUATION_CONTEXT, BOpEvaluationContext.ANY));
+////
+//// anns.add(new NV(PipelineJoin.Annotations.SELECT, vars.toArray(new IVariable[vars.size()])));
+//
+// if (assignedConstraints[i] != null
+// && assignedConstraints[i].length > 0) {
+// // attach constraints to this join.
+// anns.add(new NV(PipelineJoin.Annotations.CONSTRAINTS,
+// assignedConstraints[i]));
// }
-// if (constraints != null) {
-// for (IConstraint c : constraints) {
-// final Iterator<BOp> itr = BOpUtility
-// .preOrderIteratorWithAnnotations(c);
-// while (itr.hasNext()) {
-// final BOp y = itr.next();
-// final Integer anId = (Integer) y
-// .getProperty(BOp.Annotations.BOP_ID);
-// if (anId != null)
-// idFactory.reserve(anId.intValue());
+//
+// // collect variables used as arguments by this predicate.
+// final Set<IVariable<?>> pvars = new LinkedHashSet<IVariable<?>>();
+// {
+// final Iterator<IVariable<?>> vitr = BOpUtility
+// .getArgumentVariables(p);
+// while (vitr.hasNext()) {
+// pvars.add(vitr.next());
+// }
+// }
+//
+// // figure out if there are ANY shared variables.
+// boolean shared = false;
+// {
+// for(IVariable<?> v : pvars) {
+// if(knownBound.contains(v)) {
+// shared = true;
+// break;
// }
// }
// }
+//
+// /*
+// * FIXME Explore the merit of this optimization with MikeP,
+// * including consideration of the PIPELINE_QUEUE_CAPACITY and
+// * whether or not to request an analytic join (hash join).
+// */
+// if (false && !shared) {
+// System.err.println("Full cross product join: " + p);
+// /*
+// * Force at-once evaluation to ensure that we evaluate the AP
+// * for [p] exactly once.
+// */
+// anns.add(new NV(PipelineOp.Annotations.PIPELINED, false));
+// }
+//
+// final PipelineJoin<?> joinOp = new PipelineJoin(//
+// lastOp == null ? new BOp[0] : new BOp[] { lastOp }, //
+// anns.toArray(new NV[anns.size()])//
+// );
+//
+// // Add predicate argument variables to [knownBound].
+// knownBound.addAll(pvars);
+//
+// lastOp = joinOp;
+//
+// }
+//
+// if (distinct) {
+// lastOp = new JVMDistinctBindingSetsOp(new BOp[] { lastOp }, NV
+// .asMap(new NV[] {
+// new NV(PipelineOp.Annotations.BOP_ID, idFactory
+// .nextId()), //
+// new NV(PipelineOp.Annotations.EVALUATION_CONTEXT,
+// BOpEvaluationContext.CONTROLLER),//
+// new NV(PipelineOp.Annotations.SHARED_STATE, true),//
+// new NV(JVMDistinctBindingSetsOp.Annotations.VARIABLES,
+// selected),//
+// })//
+// );
// }
-
- // figure out which constraints are attached to which predicates.
- final IConstraint[][] assignedConstraints = PartitionedJoinGroup
- .getJoinGraphConstraints(preds, constraints, null/*knownBound*/,
- true/*pathIsComplete*/);
-
-// final PipelineJoin<?>[] joins = new PipelineJoin[preds.length];
-
- PipelineOp lastOp = null;
-
- final Set<IVariable<?>> knownBound = new LinkedHashSet<IVariable<?>>();
-
- for (int i = 0; i < preds.length; i++) {
-
- // The next vertex in the selected join order.
- final IPredicate<?> p = preds[i];
-
- // Annotations for this join.
- final List<NV> anns = new LinkedList<NV>();
-
- anns.add(new NV(PipelineJoin.Annotations.PREDICATE, p));
-
- anns.add(new NV(PipelineJoin.Annotations.BOP_ID, idFactory
- .nextId()));
-
-// anns.add(new NV(PipelineJoin.Annotations.EVALUATION_CONTEXT, BOpEvaluationContext.ANY));
//
-// anns.add(new NV(PipelineJoin.Annotations.SELECT, vars.toArray(new IVariable[vars.size()])));
+// return lastOp;
+//
+// }
- if (assignedConstraints[i] != null
- && assignedConstraints[i].length > 0) {
- // attach constraints to this join.
- anns.add(new NV(PipelineJoin.Annotations.CONSTRAINTS,
- assignedConstraints[i]));
- }
-
- // collect variables used as arguments by this predicate.
- final Set<IVariable<?>> pvars = new LinkedHashSet<IVariable<?>>();
- {
- final Iterator<IVariable<?>> vitr = BOpUtility
- .getArgumentVariables(p);
- while (vitr.hasNext()) {
- pvars.add(vitr.next());
- }
- }
-
- // figure out if there are ANY shared variables.
- boolean shared = false;
- {
- for(IVariable<?> v : pvars) {
- if(knownBound.contains(v)) {
- shared = true;
- break;
- }
- }
- }
-
- /*
- * FIXME Explore the merit of this optimization with MikeP,
- * including consideration of the PIPELINE_QUEUE_CAPACITY and
- * whether or not to request an analytic join (hash join).
- */
- if (false && !shared) {
- System.err.println("Full cross product join: " + p);
- /*
- * Force at-once evaluation to ensure that we evaluate the AP
- * for [p] exactly once.
- */
- anns.add(new NV(PipelineOp.Annotations.PIPELINED, false));
- }
-
- final PipelineJoin<?> joinOp = new PipelineJoin(//
- lastOp == null ? new BOp[0] : new BOp[] { lastOp }, //
- anns.toArray(new NV[anns.size()])//
- );
-
- // Add predicate argument variables to [knownBound].
- knownBound.addAll(pvars);
-
- lastOp = joinOp;
-
- }
-
- if (distinct) {
- lastOp = new JVMDistinctBindingSetsOp(new BOp[] { lastOp }, NV
- .asMap(new NV[] {
- new NV(PipelineOp.Annotations.BOP_ID, idFactory
- .nextId()), //
- new NV(PipelineOp.Annotations.EVALUATION_CONTEXT,
- BOpEvaluationContext.CONTROLLER),//
- new NV(PipelineOp.Annotations.SHARED_STATE, true),//
- new NV(JVMDistinctBindingSetsOp.Annotations.VARIABLES,
- selected),//
- })//
- );
- }
-
- return lastOp;
-
- }
-
}
Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/EdgeSample.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/EdgeSample.java 2014-01-09 20:47:55 UTC (rev 7754)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/EdgeSample.java 2014-01-09 21:55:07 UTC (rev 7755)
@@ -119,7 +119,7 @@
* <i>outputCount</i> as adjusted for a variety of edge
* conditions).
*/
- EdgeSample(final SampleBase sourceSample,//
+ public EdgeSample(final SampleBase sourceSample,//
final int inputCount, //
final long tuplesRead,//
final long sumRangeCount,//
Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/JGraph.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/JGraph.java 2014-01-09 20:47:55 UTC (rev 7754)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/JGraph.java 2014-01-09 21:55:07 UTC (rev 7755)
@@ -53,6 +53,7 @@
import com.bigdata.bop.joinGraph.NoSolutionsException;
import com.bigdata.bop.joinGraph.PartitionedJoinGroup;
import com.bigdata.bop.rdf.join.DataSetJoin;
+import com.bigdata.rdf.sparql.ast.eval.AST2BOpRTO;
import com.bigdata.util.concurrent.ExecutionExceptions;
/**
@@ -223,6 +224,14 @@
private static final transient Logger log = Logger.getLogger(JGraph.class);
/**
+ * The pipeline operator for executing the RTO. This provides additional
+ * context from the AST model that is necessary to handle some kinds of
+ * FILTERs (e.g., those which require conditional routing patterns for
+ * chunked materialization).
+ */
+ private final JoinGraph joinGraph;
+
+ /**
* Vertices of the join graph.
*/
private final Vertex[] V;
@@ -258,40 +267,49 @@
return sb.toString();
}
- /**
- *
- * @param v
- * The vertices of the join graph. These are {@link IPredicate}s
- * associated with required joins.
- * @param constraints
- * The constraints of the join graph (optional). Since all joins
- * in the join graph are required, constraints are dynamically
- * attached to the first join in which all of their variables are
- * bound.
- *
- * @throws IllegalArgumentException
- * if the vertices is <code>null</code>.
- * @throws IllegalArgumentException
- * if the vertices is an empty array.
- * @throws IllegalArgumentException
- * if any element of the vertices is <code>null</code>.
- * @throws IllegalArgumentException
- * if any constraint uses a variable which is never bound by the
- * given predicates.
- * @throws IllegalArgumentException
- * if <i>sampleType</i> is <code>null</code>.
- *
- * @todo unit test for a constraint using a variable which is never bound.
- * the constraint should be attached at the last vertex in the join
- * path. this will cause the query to fail unless the variable was
- * already bound, e.g., by a parent query or in the solutions pumped
- * into the {@link JoinGraph} operator.
- *
- * @todo unit test when the join graph has a single vertex.
- */
- public JGraph(final IPredicate<?>[] v, final IConstraint[] constraints,
- final SampleType sampleType) {
+ /**
+ * @param joinGraph
+ * The pipeline operator that is executing the RTO. This defines
+ * the join graph (vertices, edges, and constraints) and also
+ * provides access to the AST and related metadata required to
+ * execute the join graph.
+ *
+ * @throws IllegalArgumentException
+ * if the joinGraph is <code>null</code>.
+ * @throws IllegalArgumentException
+ * if the {@link JoinGraph#getVertices()} is <code>null</code>.
+ * @throws IllegalArgumentException
+ * if the {@link JoinGraph#getVertices()} is an empty array.
+ * @throws IllegalArgumentException
+ * if any element of the {@link JoinGraph#getVertices()}is
+ * <code>null</code>.
+ * @throws IllegalArgumentException
+ * if any constraint uses a variable which is never bound by the
+ * given predicates.
+ * @throws IllegalArgumentException
+ * if <i>sampleType</i> is <code>null</code>.
+ *
+ * @todo unit test for a constraint using a variable which is never bound.
+ * the constraint should be attached at the last vertex in the join
+ * path. this will cause the query to fail unless the variable was
+ * already bound, e.g., by a parent query or in the solutions pumped
+ * into the {@link JoinGraph} operator.
+ *
+ * @todo unit test when the join graph has a single vertex (we never invoke
+ * the RTO for less than 3 vertices since with one vertex you just run
+ * it and with two vertices you run the lower cardinality vertex first
+ * (though there might be cases where filters require materialization
+ * where running for two vertices could make sense)).
+ */
+ public JGraph(final JoinGraph joinGraph) {
+ if (joinGraph == null)
+ throw new IllegalArgumentException();
+
+ this.joinGraph = joinGraph;
+
+ final IPredicate<?>[] v = joinGraph.getVertices();
+
if (v == null)
throw new IllegalArgumentException();
@@ -309,6 +327,8 @@
}
+ final IConstraint[] constraints = joinGraph.getConstraints();
+
if (constraints != null) {
C = new IConstraint[constraints.length];
for (int i = 0; i < constraints.length; i++) {
@@ -321,6 +341,8 @@
C = null;
}
+ final SampleType sampleType = joinGraph.getSampleType();
+
if (sampleType == null)
throw new IllegalArgumentException();
@@ -519,9 +541,9 @@
}
// Should be one winner.
- if (paths.length != 1) {
- throw new AssertionError("Expected one path but have "
- + paths.length + " paths.");
+ if (paths.length != 1) {
+ throw new AssertionError("Expected one path but have "
+ + paths.length + " paths.");
}
if (log.isInfoEnabled()) {
@@ -651,18 +673,19 @@
*/
sampleAllVertices(queryEngine, limit);
- if (log.isInfoEnabled()) {
- final StringBuilder sb = new StringBuilder();
- sb.append("Sampled vertices:\n");
- for (Vertex v : V) {
- if (v.sample != null) {
- sb.append("id="+v.pred.getId()+" : ");
- sb.append(v.sample.toString());
- sb.append("\n");
- }
- }
- log.info(sb.toString());
- }
+ if (log.isInfoEnabled()) {
+ final StringBuilder sb = new StringBuilder();
+ sb.append("limit=" + limit + ", nedges=" + nedges);
+ sb.append(", sampled vertices::\n");
+ for (Vertex v : V) {
+ if (v.sample != null) {
+ sb.append("id=" + v.pred.getId() + " : ");
+ sb.append(v.sample.toString());
+ sb.append("\n");
+ }
+ }
+ log.info(sb.toString());
+ }
/*
* Estimate the cardinality for each edge.
@@ -940,8 +963,10 @@
* cardinality vertex.
*/
- edgeSample = Path.cutoffJoin(//
- queryEngine, limit,//
+ edgeSample = AST2BOpRTO.cutoffJoin(//
+ queryEngine, //
+ joinGraph, //
+ limit,//
x.getPathSegment(2),// 1st edge.
C,// constraints
V.length == 2,// pathIsComplete
@@ -978,7 +1003,9 @@
* edge of the path.
*/
- edgeSample = Path.cutoffJoin(queryEngine,//
+ edgeSample = AST2BOpRTO.cutoffJoin(//
+ queryEngine,//
+ joinGraph,//
limit,//
x.getPathSegment(ids.length()),//
C, // constraints
@@ -1245,9 +1272,14 @@
used.add(tVertex);
// Extend the path to the new vertex.
- final Path p = x
- .addEdge(queryEngine, limit, tVertex, /* dynamicEdge, */
- C, x.getVertexCount() + 1 == V.length/* pathIsComplete */);
+ final Path p = x.addEdge(//
+ queryEngine, //
+ joinGraph, //
+ limit,//
+ tVertex,//
+ C, //
+ x.getVertexCount() + 1 == V.length// pathIsComplete
+ );
// Add to the set of paths for this round.
tmp.add(p);
@@ -1284,9 +1316,14 @@
final Vertex tVertex = nothingShared.iterator().next();
// Extend the path to the new vertex.
- final Path p = x
- .addEdge(queryEngine, limit, tVertex,/* dynamicEdge */
- C, x.getVertexCount() + 1 == V.length/* pathIsComplete */);
+ final Path p = x.addEdge(//
+ queryEngine, //
+ joinGraph,//
+ limit, //
+ tVertex, //
+ C,//
+ x.getVertexCount() + 1 == V.length// pathIsComplete
+ );
// Add to the set of paths for this round.
tmp.add(p);
@@ -1640,7 +1677,9 @@
final IPredicate<?>[] preds = new IPredicate[] { v.pred, vp.pred };
// cutoff join of the edge (v,vp)
- final EdgeSample edgeSample = Path.cutoffJoin(queryEngine,//
+ final EdgeSample edgeSample = AST2BOpRTO.cutoffJoin(//
+ queryEngine,//
+ joinGraph,//
limit, // sample limit
preds, // ordered path segment.
C, // constraints
Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/JoinGraph.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/JoinGraph.java 2014-01-09 20:47:55 UTC (rev 7754)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/JoinGraph.java 2014-01-09 21:55:07 UTC (rev 7755)
@@ -475,8 +475,7 @@
final long begin = System.nanoTime();
// Create the join graph.
- final JGraph g = new JGraph(getVertices(), getConstraints(),
- getSampleType());
+ final JGraph g = new JGraph(JoinGraph.this);
/*
* This map is used to associate join path segments (expressed as an
Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/Path.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/Path.java 2014-01-09 20:47:55 UTC (rev 7754)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/Path.java 2014-01-09 21:55:07 UTC (rev 7755)
@@ -25,30 +25,16 @@
import java.util.Arrays;
import java.util.Collections;
-import java.util.Iterator;
-import java.util.LinkedList;
import java.util.List;
-import java.util.Map;
-import java.util.UUID;
import org.apache.log4j.Logger;
import com.bigdata.bop.BOp;
-import com.bigdata.bop.BOpEvaluationContext;
-import com.bigdata.bop.BOpIdFactory;
import com.bigdata.bop.BOpUtility;
-import com.bigdata.bop.IBindingSet;
import com.bigdata.bop.IConstraint;
import com.bigdata.bop.IPredicate;
-import com.bigdata.bop.NV;
-import com.bigdata.bop.PipelineOp;
-import com.bigdata.bop.engine.IRunningQuery;
-import com.bigdata.bop.engine.LocalChunkMessage;
import com.bigdata.bop.engine.QueryEngine;
-import com.bigdata.bop.join.PipelineJoin;
-import com.bigdata.bop.join.PipelineJoinStats;
-import com.bigdata.bop.joinGraph.PartitionedJoinGroup;
-import com.bigdata.striterator.Dechunkerator;
+import com.bigdata.rdf.sparql.ast.eval.AST2BOpRTO;
/**
* A join path is an ordered sequence of N {@link Vertex vertices} and
@@ -182,18 +168,20 @@
*/
final public long sumEstCost;
- /**
- * Combine the cumulative expected cardinality and the cumulative expected
- * tuples read to produce an overall measure of the expected cost of the
- * join path if it were to be fully executed.
- *
- * @return The cumulative estimated cost of the join path.
- *
- * TODO Compute this incrementally as estCost using estRead and
- * estCard and then take the running sum as sumEstCost and update
- * the JGraph trace appropriately.
- */
- private static long getCost(final long sumEstRead, final long sumEstCard) {
+ /**
+ * Combine the cumulative expected cardinality and the cumulative expected
+ * tuples read to produce an overall measure of the expected cost of the
+ * join path if it were to be fully executed.
+ *
+ * @return The cumulative estimated cost of the join path.
+ *
+ * TODO Compute this incrementally as estCost using estRead and
+ * estCard and then take the running sum as sumEstCost and update
+ * the JGraph trace appropriately. [Refactor into an IPathCost
+ * interface. It should have visibility into the full path and also
+ * allow visibility into the vertex cost for generality.]
+ */
+ private static long getCost(final long sumEstRead, final long sumEstCard) {
final long total;
// total = sumEstCard + sumEstRead; // intermediate results + IO.
@@ -631,9 +619,9 @@
*
* @throws Exception
*/
- public Path addEdge(final QueryEngine queryEngine, final int limit,
- final Vertex vnew, final IConstraint[] constraints,
- final boolean pathIsComplete)
+ public Path addEdge(final QueryEngine queryEngine,
+ final JoinGraph joinGraph, final int limit, final Vertex vnew,
+ final IConstraint[] constraints, final boolean pathIsComplete)
throws Exception {
if (vnew == null)
@@ -692,8 +680,9 @@
}
- final EdgeSample edgeSample2 = cutoffJoin(//
+ final EdgeSample edgeSample2 = AST2BOpRTO.cutoffJoin(//
queryEngine,//
+ joinGraph,//
limit, //
preds2,//
constraints,//
@@ -715,41 +704,47 @@
}
- /**
- * Cutoff join of the last vertex in the join path.
- * <p>
- * <strong>The caller is responsible for protecting against needless
- * re-sampling.</strong> This includes cases where a sample already exists
- * at the desired sample limit and cases where the sample is already exact.
- *
- * @param queryEngine
- * The query engine.
- * @param limit
- * The limit for the cutoff join.
- * @param path
- * The path segment, which must include the target vertex as the
- * last component of the path segment.
- * @param constraints
- * The constraints declared for the join graph (if any). The
- * appropriate constraints will be applied based on the variables
- * which are known to be bound as of the cutoff join for the last
- * vertex in the path segment.
- * @param pathIsComplete
- * <code>true</code> iff all vertices in the join graph are
- * incorporated into this path.
- * @param sourceSample
- * The input sample for the cutoff join. When this is a one-step
- * estimation of the cardinality of the edge, then this sample is
- * taken from the {@link VertexSample}. When the edge (vSource,
- * vTarget) extends some {@link Path}, then this is taken from
- * the {@link EdgeSample} for that {@link Path}.
- *
- * @return The result of sampling that edge.
- *
- * @throws Exception
- */
+ /**
+ * Cutoff join of the last vertex in the join path.
+ * <p>
+ * <strong>The caller is responsible for protecting against needless
+ * re-sampling.</strong> This includes cases where a sample already exists
+ * at the desired sample limit and cases where the sample is already exact.
+ *
+ * @param queryEngine
+ * The query engine.
+ * @param joinGraph
+ * The pipeline operator that is executing the RTO. This defines
+ * the join graph (vertices, edges, and constraints) and also
+ * provides access to the AST and related metadata required to
+ * execute the join graph.
+ * @param limit
+ * The limit for the cutoff join.
+ * @param path
+ * The path segment, which must include the target vertex as the
+ * last component of the path segment.
+ * @param constraints
+ * The constraints declared for the join graph (if any). The
+ * appropriate constraints will be applied based on the variables
+ * which are known to be bound as of the cutoff join for the last
+ * vertex in the path segment.
+ * @param pathIsComplete
+ * <code>true</code> iff all vertices in the join graph are
+ * incorporated into this path.
+ * @param sourceSample
+ * The input sample for the cutoff join. When this is a one-step
+ * estimation of the cardinality of the edge, then this sample is
+ * taken from the {@link VertexSample}. When the edge (vSource,
+ * vTarget) extends some {@link Path}, then this is taken from
+ * the {@link EdgeSample} for that {@link Path}.
+ *
+ * @return The result of sampling that edge.
+ *
+ * @throws Exception
+ */
static public EdgeSample cutoffJoin(//
final QueryEngine queryEngine,//
+ final JoinGraph joinGraph,//
final int limit,//
final IPredicate<?>[] path,//
final IConstraint[] constraints,//
@@ -757,283 +752,10 @@
final SampleBase sourceSample//
) throws Exception {
- if (path == null)
- throw new IllegalArgumentException();
+ // Note: Delegated to the AST/RTO integration class.
+ return AST2BOpRTO.cutoffJoin(queryEngine, joinGraph, limit, path,
+ constraints, pathIsComplete, sourceSample);
- if (limit <= 0)
- throw new IllegalArgumentException();
-
- // The access path on which the cutoff join will read.
- final IPredicate<?> pred = path[path.length - 1];
-
- if (pred == null)
- throw new IllegalArgumentException();
-
- if (sourceSample == null)
- throw new IllegalArgumentException();
-
- if (sourceSample.getSample() == null)
- throw new IllegalArgumentException();
-
- // Figure out which constraints attach to each predicate. FIXME RTO Replace with StaticAnalysis.
- final IConstraint[][] constraintAttachmentArray = PartitionedJoinGroup
- .getJoinGraphConstraints(path, constraints, null/*knownBound*/,
- pathIsComplete);
-
- // The constraint(s) (if any) for this join.
- final IConstraint[] c = constraintAttachmentArray[path.length - 1];
-
- /*
- * Setup factory for bopIds with reservations for ones already in use.
- */
- final BOpIdFactory idFactory = new BOpIdFactory();
-
- // Reservation for the bopId used by the predicate.
- idFactory.reserve(pred.getId());
-
- // Reservations for the bopIds used by the constraints.
- if (c != null) {
- for (IConstraint x : c) {
- if (log.isTraceEnabled())
- log.trace(Arrays.toString(BOpUtility.getPredIds(path))
- + ": constraint: " + x);
- final Iterator<BOp> itr = BOpUtility
- .preOrderIteratorWithAnnotations(x);
- while (itr.hasNext()) {
- final BOp y = itr.next();
- final Integer anId = (Integer) y
- .getProperty(BOp.Annotations.BOP_ID);
- if (anId != null)
- idFactory.reserve(anId.intValue());
- }
- }
- }
-
- /*
- * Set up a cutoff pipeline join operator which makes an accurate
- * estimate of the #of input solutions consumed and the #of output
- * solutions generated. From that, we can directly compute the join hit
- * ratio.
- *
- * Note: This approach is preferred to injecting a "RowId" column as the
- * estimates are taken based on internal counters in the join operator
- * and the join operator knows how to cutoff evaluation as soon as the
- * limit is satisfied, thus avoiding unnecessary effort.
- */
-
- final int joinId = idFactory.nextId();
- final Map<String, Object> anns = NV.asMap(//
- new NV(BOp.Annotations.BOP_ID, joinId),//
- new NV(PipelineJoin.Annotations.PREDICATE, pred),//
- // Note: does not matter since not executed by the query
- // controller.
- // // disallow parallel evaluation of tasks
- // new NV(PipelineOp.Annotations.MAX_PARALLEL,1),
- // disallow parallel evaluation of chunks.
- new NV(PipelineJoin.Annotations.MAX_PARALLEL_CHUNKS, 0),
- // disable access path coalescing
- new NV( PipelineJoin.Annotations.COALESCE_DUPLICATE_ACCESS_PATHS, false), //
- // pass in constraints on this join.
- new NV(PipelineJoin.Annotations.CONSTRAINTS, c),//
- // cutoff join.
- new NV(PipelineJoin.Annotations.LIMIT, (long) limit),
- /*
- * Note: In order to have an accurate estimate of the
- * join hit ratio we need to make sure that the join
- * operator runs using a single PipelineJoinStats
- * instance which will be visible to us when the query
- * is cutoff. In turn, this implies that the join must
- * be evaluated on the query controller.
- *
- * @todo This implies that sampling of scale-out joins
- * must be done using remote access paths.
- */
- new NV(PipelineJoin.Annotations.SHARED_STATE, true),//
- new NV(PipelineJoin.Annotations.EVALUATION_CONTEXT,
- BOpEvaluationContext.CONTROLLER)//
- );
-
- @SuppressWarnings("unchecked")
- final PipelineJoin<?> joinOp = new PipelineJoin(new BOp[] {}, anns);
-
- final PipelineOp queryOp = joinOp;
-
- // run the cutoff sampling of the edge.
- final UUID queryId = UUID.randomUUID();
- final IRunningQuery runningQuery = queryEngine.eval(//
- queryId,//
- queryOp,//
- null,// attributes
- new LocalChunkMessage(queryEngine, queryId, joinOp
- .getId()/* startId */, -1 /* partitionId */,
- sourceSample.getSample()));
-
- final List<IBindingSet> result = new LinkedList<IBindingSet>();
- try {
- int nresults = 0;
- try {
- IBindingSet bset = null;
- // Figure out the #of source samples consumed.
- final Iterator<IBindingSet> itr = new Dechunkerator<IBindingSet>(
- runningQuery.iterator());
- while (itr.hasNext()) {
- bset = itr.next();
- result.add(bset);
- if (nresults++ >= limit) {
- // Break out if cutoff join over produces!
- break;
- }
- }
- } finally {
- // ensure terminated regardless.
- runningQuery.cancel(true/* mayInterruptIfRunning */);
- }
- } finally {
- // verify no problems.
- if (runningQuery.getCause() != null) {
- // wrap throwable from abnormal termination.
- throw new RuntimeException(runningQuery.getCause());
- }
- }
-
- // The join hit ratio can be computed directly from these stats.
- final PipelineJoinStats joinStats = (PipelineJoinStats) runningQuery
- .getStats().get(joinId);
-
- if (log.isTraceEnabled())
- log.trace(Arrays.toString(BOpUtility.getPredIds(path)) + ": "
- + joinStats.toString());
-
- // #of solutions in.
- final int inputCount = (int) joinStats.inputSolutions.get();
-
- // #of solutions out.
- final long outputCount = joinStats.outputSolutions.get();
-
- // #of solutions out as adjusted for various edge conditions.
- final long adjustedCard;
-
- // cumulative range count of the sampled access paths.
- final long sumRangeCount = joinStats.accessPathRangeCount.get();
-
- final EstimateEnum estimateEnum;
- if (sourceSample.estimateEnum == EstimateEnum.Exact
- && outputCount < limit) {
- /*
- * Note: If the entire source vertex is being fed into the cutoff
- * join and the cutoff join outputCount is LT the limit, then the
- * sample is the actual result of the join. That is, feeding all
- * source solutions into the join gives fewer than the desired
- * number of output solutions.
- */
- estimateEnum = EstimateEnum.Exact;
- adjustedCard = outputCount;
- } else if (inputCount == 1 && outputCount == limit) {
- /*
- * If the inputCount is ONE (1) and the outputCount is the limit,
- * then the estimated cardinality is a lower bound as more than
- * outputCount solutions might be produced by the join when
- * presented with a single input solution.
- *
- * However, this condition suggests that the sum of the sampled
- * range counts is a much better estimate of the cardinality of this
- * join.
- *
- * For example, consider a join feeding a rangeCount of 16 into a
- * rangeCount of 175000. With a limit of 100, we estimated the
- * cardinality at 1600L (lower bound). In fact, the cardinality is
- * 16*175000. This falsely low estimate can cause solutions which
- * are really better to be dropped.
- */
- // replace outputCount with the sum of the sampled range counts.
- adjustedCard = sumRangeCount;
- estimateEnum = EstimateEnum.LowerBound;
- } else if ((sourceSample.estimateEnum != EstimateEnum.Exact)
- /*&& inputCount == Math.min(sourceSample.limit,
- sourceSample.estimatedCardinality) */ && outputCount == 0) {
- /*
- * When the source sample was not exact, the inputCount is EQ to the
- * lesser of the source range count and the source sample limit, and
- * the outputCount is ZERO (0), then feeding in all source solutions
- * is not sufficient to generate any output solutions. In this case,
- * the estimated join hit ratio appears to be zero. However, the
- * estimation of the join hit ratio actually underflowed and the
- * real join hit ratio might be a small non-negative value. A real
- * zero can only be identified by executing the full join.
- *
- * Note: An apparent join hit ratio of zero does NOT imply that the
- * join will be empty (unless the source vertex sample is actually
- * the fully materialized access path - this case is covered above).
- *
- * path sourceCard * f ( in read out limit adjCard) = estCard : sumEstCard joinPath
- * 15 4800L * 0.00 ( 200 200 0 300 0) = 0 : 3633 [ 3 1 6 5 ]
-
- */
- estimateEnum = EstimateEnum.Underflow;
- adjustedCard = outputCount;
- } else {
- estimateEnum = EstimateEnum.Normal;
- adjustedCard = outputCount;
- }
-
- /*
- * The #of tuples read from the sampled access paths. This is part of
- * the cost of the join path, even though it is not part of the expected
- * cardinality of the cutoff join.
- *
- * Note: While IOs is a better predictor of latency, it is possible to
- * choose a pipelined join versus a hash join once the query plan has
- * been decided. Their IO provides are both correlated to the #of tuples
- * read.
- */
- final long tuplesRead = joinStats.accessPathUnitsIn.get();
-
- /*
- * Compute the hit-join ratio based on the adjusted cardinality
- * estimate.
- */
- final double f = adjustedCard == 0 ? 0
- : (adjustedCard / (double) inputCount);
-// final double f = outputCount == 0 ? 0
-// : (outputCount / (double) inputCount);
-
- // estimated output cardinality of fully executed operator.
- final long estCard = (long) (sourceSample.estCard * f);
-
- /*
- * estimated tuples read for fully executed operator
- *
- * TODO The actual IOs depend on the join type (hash join versus
- * pipeline join) and whether or not the file has index order (segment
- * versus journal). A hash join will read once on the AP. A pipeline
- * join will read once per input solution. A key-range read on a segment
- * uses multi-block IO while a key-range read on a journal uses random
- * IO. Also, remote access path reads are more expensive than sharded
- * or hash partitioned access path reads in scale-out.
- */
- final long estRead = (long) (sumRangeCount * f);
-
- final EdgeSample edgeSample = new EdgeSample(//
- sourceSample,//
- inputCount,//
- tuplesRead,//
- sumRangeCount,//
- outputCount, //
- adjustedCard,//
- f, //
- // args to SampleBase
- estCard, // estimated output cardinality if fully executed.
- estRead, // estimated tuples read if fully executed.
- limit, //
- estimateEnum,//
- result.toArray(new IBindingSet[result.size()]));
-
- if (log.isDebugEnabled())
- log.debug(Arrays.toString(BOpUtility.getPredIds(path))
- + ": newSample=" + edgeSample);
-
- return edgeSample;
-
}
}
Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/SampleBase.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/SampleBase.java 2014-01-09 20:47:55 UTC (rev 7754)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/joinGraph/rto/SampleBase.java 2014-01-09 21:55:07 UTC (rev 7755)
@@ -32,6 +32,7 @@
import org.apache.log4j.Logger;
import com.bigdata.bop.IBindingSet;
+import com.bigdata.bop.engine.IChunkMessage;
import com.bigdata.rwstore.sector.IMemoryManager;
/**
@@ -108,8 +109,11 @@
*
* @return The sampled solution set -or- <code>null</code> if it has been
* released.
+ *
+ * TODO Wrap up as an {@link IChunkMessage} so we can store this on
+ * the native heap?
*/
- IBindingSet[] getSample() {
+ public IBindingSet[] getSample() {
return sampleRef.get();
Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpBase.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpBase.java 2014-01-09 20:47:55 UTC (rev 7754)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpBase.java 2014-01-09 21:55:07 UTC (rev 7755)
@@ -94,6 +94,16 @@
*/
String SCOPE = AST2BOpBase.class.getName() + ".scope";
+ /**
+ * Boolean annotation indicates whether the generated JOIN is simple (a
+ * single JOIN operator with optional constraints but without any
+ * variable materialization requirements) or complex (a JOIN operator
+ * associated with at least one constraint which requires the
+ * materialization of variables that are not already known to be
+ * materialized).
+ */
+ String SIMPLE_JOIN = AST2BOpBase.class.getName() + ".simpleJoin";
+
/*
* Query planner and cost estimates.
*/
Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpFilters.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpFilters.java 2014-01-09 20:47:55 UTC (rev 7754)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpFilters.java 2014-01-09 21:55:07 UTC (rev 7755)
@@ -593,10 +593,13 @@
* @param needsMaterialization
* A map providing for each constraint the set of variables which
* must be materialized before that constraint can be evaluated.
- * This map is populated as a side-effect.
+ * This map is populated as a side-effect. It will be empty iff
+ * there are no constraints that might or must require variable
+ * materialization.
*
* @return Constraints which can (or might) be able to run attached to that
- * join.
+ * join -or- <code>null</code> iff there are no constraints that can
+ * be attached to the join.
*/
static protected IConstraint[] getJoinConstraints(
final Collection<IConstraint> constraints,
Modified: branch...
[truncated message content]