From: <tho...@us...> - 2014-01-03 19:18:28
Revision: 7712
          http://bigdata.svn.sourceforge.net/bigdata/?rev=7712&view=rev
Author:   thompsonbry
Date:     2014-01-03 19:18:16 +0000 (Fri, 03 Jan 2014)

Log Message:
-----------
Changes to PipelineJoin, IBindingSetAccessPath, IHashJoinUtility, JVMHashJoinUtility, HTreeHashJoinUtility, ServiceCallJoin, etc. intended to reduce re-chunking during vectored query evaluation. This change provides correct accounting of chunksIn and unitsIn for the solution set hash join and improves memory utilization by eliminating some unnecessary dechunking and rechunking in the hash join API.

Changes were also made to the query hint infrastructure to ensure that hint:chunkSize is correctly applied to the HTree operators. This is important now that the HTreeHashJoinUtility no longer rechunks to a hard-coded chunkSize of 1000. While working on this, I noticed that hint:chunkSize was not making it onto the SliceOp, onto the Predicate associated with a PipelineJoin (where it controls the vectoring for reading on the access path), etc. This was due both to the types of nodes to which the IQueryHint implementations were willing to apply themselves and to the types of nodes to which the ASTQueryHintOptimizer was willing to apply query hints. I have made both more expansive.

To help analyze the issue with query hints (which was necessary to assess the performance impact of my change to vectoring in the HTreeHashJoinUtility) and to help analyze the problem with the stochastic behavior of the HTree, I also added significantly more information to the predSummary and added columns (in the detailed explain mode) for the PipelineOp and Predicate annotations.

I also touched a lot of classes, adding @Override and final annotations. A number of these were in the striterator package (where I added a CloseableChunkedIteratorWrapperConverter to address the rechunking pattern in HTreeHashJoinUtility, IBindingSetAccessPath, and BOpContext#solutions()) and in the ast package. Several of the join operators were also touched, either to support the fix of the rechunking pattern or to eliminate some old (and commented out) code.

I have added more unit tests of the query hints mechanisms. I found and fixed several places where query hints were not being applied when generating pipeline operators (AST2BOpUtility) and where query hints were not being copied from one AST node to another when making structural changes to the AST. For example, ASTBottomUpOptimizer and ASTSparql11SubqueryOptimizer both create a NamedSubqueryRoot and a NamedSubqueryInclude, but they were not copying across the query hints from the parent join group (ASTBottomUpOptimizer) or from the original SubqueryRoot (ASTSparql11SubqueryOptimizer).
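To make the chunksIn/unitsIn accounting change concrete: a hash join that accepts chunks directly can update the per-operator statistics exactly once per chunk as it is drained, instead of counting around a dechunk/rechunk cycle. The following is a minimal sketch only; Stats and the Object[] chunk type are simplified stand-ins for the real BOpStats and ICloseableIterator<IBindingSet[]> types, not the committed code.

    import java.util.Iterator;

    // Simplified stand-in for BOpStats.
    final class Stats {
        long chunksIn; // #of solution chunks consumed
        long unitsIn;  // #of individual solutions consumed
    }

    final class ChunkConsumer {
        /**
         * Drain chunks of solutions, updating the statistics as each chunk
         * is consumed (the pattern pushed down into hashJoin2()).
         */
        static void consume(final Iterator<Object[]> leftItr, final Stats stats) {
            while (leftItr.hasNext()) {
                final Object[] chunk = leftItr.next(); // next solution chunk
                if (stats != null) {
                    stats.chunksIn++;               // one chunk in
                    stats.unitsIn += chunk.length;  // N solutions in
                }
                // ... vector the chunk by hash code and probe the hash index ...
            }
        }
    }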
com.bigdata.rdf.sparql.ast::

- ASTBase: javadoc on Annotations.QUERY_HINTS. final and @Override annotations.
- AssignmentNode: final and @Override annotations.
- GraphPatternGroup: final and @Override annotations.
- GroupNodeBase: license header, final and @Override annotations, toString(int) changes.
- JoinGroupNode: license header. Made the OPTIMIZER property explicit. Added getQueryOptimizer() method. final and @Override annotations.
- QueryBase: added query hints into toString(int).
- QueryHints: referenced ticket #791 (clean up query hints).
- QueryOptimizerEnum: removed dead code.
- QueryRoot: final, @Override, and javadoc correction.
- SliceNode: javadoc on annotations. toString(int) now shows the query hints. This was done to have visibility into vectoring.
- StatementPatternNode: license header, final and @Override, javadoc, dead code elimination.
- ValueExpressionListBaseNode: @Override.

com.bigdata.striterator::

- CloseableChunkedIteratorWrapperConverter: new class that converts from an IChunkedIterator<E> to an ICloseableIterator visiting E[] (a sketch of this conversion pattern appears after the change lists below).
- TestCloseableChunkedIteratorWrapperConverter: new test suite.
- TestAll: include the new test class.
- AbstractChunkedResolverator: final, @Override.
- ChunkedArrayIterator: final, @Override.
- ChunkedArraysIterator: javadoc, final, @Override.
- ChunkedConvertingIterator: final, @Override.
- ChunkedResolvingIterator: javadoc, final, @Override.
- ChunkedWrappedIterator: final, @Override.
- Chunkerator: close() now tests for ICloseable rather than ICloseableIterator.
- CloseableIteratorWrapper: final, @Override.
- Dechunkerator: javadoc, final, @Override. close() now tests for ICloseable rather than ICloseableIterator.
- DelegateChunkedIterator: final, @Override.
- GenericChunkedStriterator: removed an unnecessary @SuppressWarnings.
- IChunkedIterator: @Override for methods declared by Iterator.
- IChunkedStriterator: @Override for methods in the base interface.
- MergeFilter: final, @Override.
- PushbackIterator: final, @Override.
- Resolver: final, @Override.
- Striterator: final, @Override.

com.bigdata.bop.join::

- TestPipelineJoin: @Override and super.setUp() / super.tearDown().
- AbstractHashJoinUtilityTestCase: modified how we execute the hash join for the IHashJoinUtility API change.
- HashIndexOp: @Override.
- HashJoinOp: @Override, javadoc, API change for IHashJoinUtility.hashJoin().
- HTreeHashIndexOp: @Override, final, dead code eliminated.
- HTreeHashJoinUtility: removed the static chunkSize field. Vectoring is now controlled by hint:chunkSize. Pushed down the logic to track chunksIn and unitsIn into hashJoin2() (they were not being tracked).
- IHashJoinUtility: API change to remove the rechunking pattern.
- JoinVariableNotBoundException: final annotations.
- JVMHashIndex: slight efficiency change in makeKey(). @Override.
- JVMHashIndexOp: final annotation.
- JVMHashJoinUtility: API change for hashJoin2(): it now accepts an ICloseableIterator<IBindingSet[]> and BOpStats and tracks unitsIn and chunksIn.
- PipelineJoin: IBindingSetAccessPath now returns an ICloseableIterator<IBindingSet[]>. Modified the AccessPathTask to handle the IBindingSet[] chunks. It used to handle individual IBindingSets.
- SolutionSetHashJoinOp: modified to pass context.getSource() and BOpStats into the IHashJoinUtility rather than dechunking.

com.bigdata.bop.controller::

- HTreeNamedSubqueryOp, JVMNamedSubqueryOp, and INamedSubqueryOp: added INamedSubqueryOp as a marker interface for the two implementation classes so we can identify those operators when they appear in a query plan.
- ServiceCallJoin: modified to pass an ICloseableIterator<IBindingSet[]> into the IHashJoinUtility. This should be pushed down further. There is a TODO to do that when we address vectoring in/out of the SERVICE operator.

com.bigdata.bop.engine::

- QueryLog: significantly expanded and improved performance counter reporting for Explain, especially in the "detail" mode.

com.bigdata.bop::

- AbstractAccessPathOp: removed unused methods. This is part of the query hints cleanup.
- BOpContext: solutions() was modified to return an ICloseableIterator visiting IBindingSet[]s. This is part of the rechunking change for IHashJoinUtility.
- BOpUtility: added a getOnly() method used to obtain the only instance of a BOp from a query plan or AST. This is used by unit tests.
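As context for the CloseableChunkedIteratorWrapperConverter mentioned above, here is a hedged sketch of the conversion it performs: wrapping a chunk-at-a-time source so that each call to next() hands back one chunk as an array, with close() propagated to the underlying source. The interfaces are simplified stand-ins, not the real com.bigdata.striterator.IChunkedIterator and cutthecrap.utils.striterators.ICloseableIterator types.

    import java.util.Iterator;

    // Simplified stand-in for IChunkedIterator<E>: a chunk-at-a-time source.
    interface ChunkedSource<E> extends Iterator<E> {
        E[] nextChunk(); // drain the next chunk of elements
        void close();
    }

    // Sketch of the converter: chunked source -> Iterator<E[]>. A flyweight
    // wrapper: no elements are copied, dechunked, or rechunked.
    final class ChunkedToArrayIterator<E> implements Iterator<E[]>, AutoCloseable {

        private final ChunkedSource<E> src;

        ChunkedToArrayIterator(final ChunkedSource<E> src) {
            this.src = src;
        }

        @Override
        public boolean hasNext() {
            return src.hasNext();
        }

        @Override
        public E[] next() {
            // Each visited element is an entire chunk from the source.
            return src.nextChunk();
        }

        @Override
        public void close() {
            src.close(); // propagate close() to the real source
        }
    }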
com.bigdata.relation.accesspath::

- IBindingSetAccessPath: solutions() was modified to return an ICloseableIterator visiting IBindingSet[]s.
- AccessPath: solutions() was modified to return an ICloseableIterator visiting IBindingSet[]s.

com.bigdata.rdf.sparql.ast.hints::

- Extensive changes to clean up the query hints. Many query hints now apply to IQueryNode rather than IJoinNode. javadoc. The test cases have been expanded.

com.bigdata.rdf.sparql.ast.optimizers::

- ASTBottomUpOptimizer: modified to pass through query hints from the parent join group to the lifted-out named subquery and the INCLUDE.
- ASTSparql11SubqueryOptimizer: modified to pass through query hints from the original subquery when it is lifted out into a named subquery. The hints are applied to both the new named subquery and the new named subquery INCLUDE.
- ASTStaticJoinOptimizer: changes to isStaticOptimizer() to use JoinGroupNode.getQueryOptimizer().
- ASTQueryHintOptimizer: extensive changes. Modified how the optimizer identifies the nodes to which it will delegate a query hint. It used to do this only for QueryNodeBase, which was too restrictive; it now does this for everything except value expressions. javadoc clarifying the intention of the class. Removed some dead code. Note: the ASTQueryHintOptimizer no longer adds all query hints with Scope:=Query to AST2BOpContext.queryHints. I have clarified this in the documentation of both classes.

com.bigdata.rdf.sparql.ast.eval::

- AST2BOpBase: changed the pattern for applyQueryHints() to use both the AST node's query hints and the AST2BOpUtility global defaults whenever possible (a sketch of this precedence appears after this change list).
- AST2BOpUtility: modified to more systematically pass along query hints from the AST node to the constructed PipelineOps. Modified to pass through the BufferAnnotations and IPredicate annotations to the Predicate created from a StatementPatternNode. This allows us to control vectoring for AccessPath reads.
- AST2BOpFilters: partially addressed the pass-through of query hints, but only those in the global scope. We need to change the method interfaces to pass through the bounding AST node (or the queryHints for that AST node).
- AST2BOpContext: license header, javadoc.

I have run through all of the bop, AST, SPARQL, and NSS test suites and everything is green.
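A minimal sketch of the applyQueryHints() precedence described above, assuming hypothetical names (resolveHint() and the Properties-based hint containers are illustrative, not the actual AST2BOpBase API): the hints attached to the bounding AST node are consulted first, and the query-global defaults only fill in what the node does not specify.

    import java.util.Properties;

    final class QueryHintUtil {

        /**
         * Resolve a single hint (e.g., hint:chunkSize) for an operator under
         * construction. Hypothetical helper: node-local hints (attached by
         * the ASTQueryHintOptimizer) take precedence over the query-global
         * defaults.
         */
        static String resolveHint(final String name,
                final Properties nodeHints,   // hints on the bounding AST node
                final Properties globalHints, // query-wide defaults
                final String defaultValue) {

            if (nodeHints != null) {
                final String v = nodeHints.getProperty(name);
                if (v != null)
                    return v; // most specific scope wins
            }

            if (globalHints != null) {
                final String v = globalHints.getProperty(name);
                if (v != null)
                    return v; // fall back to the query-wide default
            }

            return defaultValue; // system default for the annotation
        }
    }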
TODO:

- (*) I have not yet resolved the HTree stochastic behavior. I will continue to look at that once this checkpoint is committed. See #763.

- (*) Query hints for the materialization pipeline (ChunkedMaterializationOp, ConditionalRoutingOp) are not being applied correctly because the caller's AST node (or its query hints) is not being passed down. I am going to defer fixing this for a moment while I look at the RTO integration. (The RTO needs to be able to use AST2BOpJoins#join(), which is the main entry point into the materialization pipeline code.) See #791.

  - AST2BOpBase: once we fix this, we really do not need to pass the AST2BOpContext's query hints into applyQueryHints() any more. The impact will be achieved by passing down the query hints from the appropriate bounding AST node. The ASTQueryHintOptimizer is responsible for making sure that the query hints are applied to those AST nodes.

- (*) Check query performance on LUBM U50, BSBM 100M, and govtrack. The changes in the HTree vectoring could hit govtrack, but we should be able to override hint:chunkSize if necessary to correct for this (or automatically increase the chunkSize for the analytic query mode, or use dynamic rechunking, etc.). The changes in the ASTQueryHintOptimizer could hit all queries, but I was pretty careful and do not expect to see any performance regressions.

See http://sourceforge.net/apps/trac/bigdata/ticket/483 (Eliminate unnecessary dechunking and rechunking)
See http://sourceforge.net/apps/trac/bigdata/ticket/791 (Clean up query hints)
See http://sourceforge.net/apps/trac/bigdata/ticket/763 (Stochastic results with Analytic Query Mode)

Modified Paths:
--------------
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/AbstractAccessPathOp.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/BOpContext.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/BOpUtility.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/HTreeNamedSubqueryOp.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/JVMNamedSubqueryOp.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/ServiceCallJoin.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/engine/QueryLog.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/HTreeHashIndexOp.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/HTreeHashJoinUtility.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/HashIndexOp.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/HashJoinOp.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/IHashJoinUtility.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/JVMHashIndex.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/JVMHashIndexOp.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/JVMHashJoinUtility.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/JoinVariableNotBoundException.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/PipelineJoin.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/SolutionSetHashJoinOp.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/solutions/ProjectionOp.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/solutions/SliceOp.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/relation/accesspath/AccessPath.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/relation/accesspath/IBindingSetAccessPath.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/AbstractChunkedResolverator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/ChunkedArrayIterator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/ChunkedArraysIterator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/ChunkedConvertingIterator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/ChunkedOrderedStriterator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/ChunkedResolvingIterator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/ChunkedWrappedIterator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/Chunkerator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/CloseableIteratorWrapper.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/Dechunkerator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/DelegateChunkedIterator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/GenericChunkedStriterator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/IChunkedIterator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/IChunkedStriterator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/MergeFilter.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/PushbackIterator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/Resolver.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/Striterator.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/test/com/bigdata/bop/join/AbstractHashJoinUtilityTestCase.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/test/com/bigdata/bop/join/TestPipelineJoin.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/test/com/bigdata/striterator/TestAll.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/ASTBase.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/AssignmentNode.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/GraphPatternGroup.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/GroupNodeBase.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/JoinGroupNode.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/QueryBase.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/QueryHints.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/QueryOptimizerEnum.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/QueryRoot.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/SliceNode.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/StatementPatternNode.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/ValueExpressionListBaseNode.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpBase.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpContext.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpFilters.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/AST2BOpUtility.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/AbstractChunkSizeHint.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/AbstractQueryHint.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/AnalyticQueryHint.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/AtOnceHint.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/BufferChunkCapacityHint.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/BufferChunkOfChunksCapacityHint.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/ChunkSizeHint.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/IQueryHint.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/OptimizerQueryHint.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/PipelineMaxMessagesPerTaskHint.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/PipelineMaxParallelHint.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/PipelineQueueCapacityHint.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/QueryHintRegistry.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/hints/RTOSampleTypeQueryHint.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/optimizers/ASTBottomUpOptimizer.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/optimizers/ASTQueryHintOptimizer.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/optimizers/ASTSparql11SubqueryOptimizer.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/optimizers/ASTStaticJoinOptimizer.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/service/ServiceCall.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/rdf/sparql/ast/eval/TestQueryHints.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/rdf/sparql/ast/eval/query-hints-01.rq
    branches/BIGDATA_RELEASE_1_3_0/bigdata-rdf/src/test/com/bigdata/rdf/sparql/ast/eval/query-hints-06.rq

Added Paths:
-----------
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/INamedSubqueryOp.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/striterator/CloseableChunkedIteratorWrapperConverter.java
    branches/BIGDATA_RELEASE_1_3_0/bigdata/src/test/com/bigdata/striterator/TestCloseableChunkedIteratorWrapperConverter.java

Property Changed:
----------------
    branches/BIGDATA_RELEASE_1_3_0/

Property changes on: branches/BIGDATA_RELEASE_1_3_0
___________________________________________________________________
Modified: svn:ignore
   - ant-build
src
bin
bigdata*.jar
ant-release
standalone
test*
countersfinal.xml
events.jnl
.settings
*.jnl
TestInsertRate.out
SYSTAP-BBT-result.txt
U10load+query
*.hprof
com.bigdata.cache.TestHardReferenceQueueWithBatchingUpdates.exp.csv
commit-log.txt
eventLog
dist
bigdata-test
com.bigdata.rdf.stress.LoadClosureAndQueryTest.*.csv
DIST.bigdata-*.tgz
REL.bigdata-*.tgz
queryLog*
queryRunState*
sparql.txt
benchmark
CI

   + ant-build
src
bin
bigdata*.jar
ant-release
standalone
test*
countersfinal.xml
events.jnl
.settings
*.jnl
TestInsertRate.out
SYSTAP-BBT-result.txt
U10load+query
*.hprof
com.bigdata.cache.TestHardReferenceQueueWithBatchingUpdates.exp.csv
commit-log.txt
eventLog
dist
bigdata-test
com.bigdata.rdf.stress.LoadClosureAndQueryTest.*.csv
DIST.bigdata-*.tgz
REL.bigdata-*.tgz
queryLog*
queryRunState*
sparql.txt
benchmark
CI
bsbm10-dataset.nt.gz
bsbm10-dataset.nt.zip

Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/AbstractAccessPathOp.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/AbstractAccessPathOp.java	2013-12-31 19:48:13 UTC (rev 7711)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/AbstractAccessPathOp.java	2014-01-03 19:18:16 UTC (rev 7712)
@@ -76,26 +76,26 @@
         super(op);
     }
 
-    /**
-     * @see BufferAnnotations#CHUNK_CAPACITY
-     */
-    protected int getChunkCapacity() {
-
-        return getProperty(Annotations.CHUNK_CAPACITY,
-                Annotations.DEFAULT_CHUNK_CAPACITY);
+//    /**
+//     * @see BufferAnnotations#CHUNK_CAPACITY
+//     */
+//    protected int getChunkCapacity() {
+//
+//        return getProperty(Annotations.CHUNK_CAPACITY,
+//                Annotations.DEFAULT_CHUNK_CAPACITY);
+//
+//    }
+//
+//    /**
+//     * @see BufferAnnotations#CHUNK_OF_CHUNKS_CAPACITY
+//     */
+//    protected int getChunkOfChunksCapacity() {
+//
+//        return getProperty(Annotations.CHUNK_OF_CHUNKS_CAPACITY,
+//                Annotations.DEFAULT_CHUNK_OF_CHUNKS_CAPACITY);
+//
+//    }
 
-    }
-
-    /**
-     * @see BufferAnnotations#CHUNK_OF_CHUNKS_CAPACITY
-     */
-    protected int getChunkOfChunksCapacity() {
-
-        return getProperty(Annotations.CHUNK_OF_CHUNKS_CAPACITY,
-                Annotations.DEFAULT_CHUNK_OF_CHUNKS_CAPACITY);
-
-    }
-
 //    protected int getFullyBufferedReadThreshold() {
 //
 //        return getProperty(Annotations.FULLY_BUFFERED_READ_THRESHOLD,
@@ -103,14 +103,14 @@
 //
 //    }
 
-    /**
-     * @see BufferAnnotations#CHUNK_TIMEOUT
-     */
-    protected long getChunkTimeout() {
-
-        return getProperty(Annotations.CHUNK_TIMEOUT,
-                Annotations.DEFAULT_CHUNK_TIMEOUT);
-
-    }
+//    /**
+//     * @see BufferAnnotations#CHUNK_TIMEOUT
+//     */
+//    protected long getChunkTimeout() {
+//
+//        return getProperty(Annotations.CHUNK_TIMEOUT,
+//                Annotations.DEFAULT_CHUNK_TIMEOUT);
+//
+//    }
 
 }

Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/BOpContext.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/BOpContext.java	2013-12-31 19:48:13 UTC (rev 7711)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/BOpContext.java	2014-01-03 19:18:16 UTC (rev 7712)
@@ -59,8 +59,9 @@
 import com.bigdata.rwstore.sector.IMemoryManager;
 import com.bigdata.striterator.ChunkedFilter;
 import com.bigdata.striterator.Chunkerator;
-import com.bigdata.striterator.CloseableIteratorWrapper;
+import com.bigdata.striterator.CloseableChunkedIteratorWrapperConverter;
 import com.bigdata.striterator.IChunkedIterator;
+import com.bigdata.striterator.IChunkedStriterator;
 
 import cutthecrap.utils.striterators.ICloseableIterator;
 
@@ -1078,8 +1079,8 @@
     }
 
     /**
-     * Convert an {@link IAccessPath#iterator()} into a stream of
-     * {@link IBindingSet}s.
+     * Convert an {@link IAccessPath#iterator()} into a stream of chunks of
+     * {@link IBindingSet}.
      *
      * @param src
      *            The iterator draining the {@link IAccessPath}. This will visit
@@ -1090,7 +1091,7 @@
      *            Statistics to be updated as elements and chunks are consumed
      *            (optional).
      *
-     * @return The dechunked iterator visiting the solutions. The order of the
+     * @return An iterator visiting chunks of solutions. The order of the
      *         original {@link IElement}s is preserved.
      *
     * @see https://sourceforge.net/apps/trac/bigdata/ticket/209 (AccessPath
@@ -1105,14 +1106,15 @@
 //     *            The array of distinct variables (no duplicates) to be
 //     *            extracted from the visited {@link IElement}s.
     @SuppressWarnings({ "rawtypes", "unchecked" })
-    static public ICloseableIterator<IBindingSet> solutions(
+    static public ICloseableIterator<IBindingSet[]> solutions(
             final IChunkedIterator<?> src, //
             final IPredicate<?> pred,//
 //            final IVariable<?>[] varsx,
             final BaseJoinStats stats//
             ) {
 
-        return new CloseableIteratorWrapper(
+        //return new CloseableIteratorWrapper(
+        final IChunkedStriterator itr1 =
                 new com.bigdata.striterator.ChunkedStriterator(src).addFilter(
 //                new ChunkedFilter() {
                 new ChunkedFilter<IChunkedIterator<Object>, Object, Object>() {
@@ -1160,18 +1162,28 @@
 
         }
 
-        })) {
+        });
+        //) {
+//
+//            /**
+//             * Close the real source if the caller closes the returned iterator.
+//             */
+//            @Override
+//            public void close() {
+//                super.close();
+//                src.close();
+//            }
+//        };
 
-            /**
-             * Close the real source if the caller closes the returned iterator.
-             */
-            @Override
-            public void close() {
-                super.close();
-                src.close();
-            }
-        };
+        /*
+         * Convert from IChunkedIterator<IBindingSet> to
+         * ICloseableIterator<IBindingSet[]>. This is a fly weight conversion.
+         */
+        final ICloseableIterator<IBindingSet[]> itr2 = new CloseableChunkedIteratorWrapperConverter<IBindingSet>(
+                itr1);
+
+        return itr2;
+
     }
 
 /*

Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/BOpUtility.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/BOpUtility.java	2013-12-31 19:48:13 UTC (rev 7711)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/BOpUtility.java	2014-01-03 19:18:16 UTC (rev 7712)
@@ -35,6 +35,7 @@
 import java.util.LinkedList;
 import java.util.List;
 import java.util.Map;
+import java.util.NoSuchElementException;
 import java.util.Set;
 
 import org.apache.log4j.Logger;
@@ -72,7 +73,7 @@
      * Pre-order recursive visitation of the operator tree (arguments only, no
      * annotations).
      */
-    @SuppressWarnings("unchecked")
+    @SuppressWarnings({ "unchecked", "rawtypes" })
    public static Iterator<BOp> preOrderIterator(final BOp op) {
 
         return new Striterator(new SingleValueIterator(op))
@@ -466,6 +467,8 @@
      *            The type of the node to be extracted.
      *
      * @return A list containing those references.
+     *
+     * @see #visitAll(BOp, Class)
      */
     public static <C> List<C> toList(final BOp op, final Class<C> clas) {
 
@@ -483,6 +486,44 @@
 
     }
 
+    /**
+     * Return the sole instance of the specified class.
+     *
+     * @param op
+     *            The root of the traversal.
+     * @param class1
+     *            The class to look for.
+     * @return The sole instance of that class.
+     * @throws NoSuchElementException
+     *             if there is no such instance.
+     * @throws RuntimeException
+     *             if there is more than one such instance.
+     */
+    public static <C> C getOnly(final BOp op, final Class<C> class1) {
+        final Iterator<C> it = visitAll(op, class1);
+        if (!it.hasNext())
+            throw new NoSuchElementException("No instance found: class="
+                    + class1);
+        final C ret = it.next();
+        if (it.hasNext())
+            throw new RuntimeException("More than one instance exists: class="
+                    + class1);
+        return ret;
+    }
+
+    /**
+     * Return an iterator visiting references to all nodes of the given type
+     * (recursive, including annotations).
+     *
+     * @param op
+     *            The root of the operator tree.
+     * @param clas
+     *            The type of the node to be extracted.
+     *
+     * @return A iterator visiting those references.
+     *
+     * @see #toList(BOp, Class)
+     */
     @SuppressWarnings("unchecked")
     public static <C> Iterator<C> visitAll(final BOp op, final Class<C> clas) {

Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/HTreeNamedSubqueryOp.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/HTreeNamedSubqueryOp.java	2013-12-31 19:48:13 UTC (rev 7711)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/HTreeNamedSubqueryOp.java	2014-01-03 19:18:16 UTC (rev 7712)
@@ -73,7 +73,7 @@
  *
  * @author <a href="mailto:tho...@us...">Bryan Thompson</a>
  */
-public class HTreeNamedSubqueryOp extends PipelineOp {
+public class HTreeNamedSubqueryOp extends PipelineOp implements INamedSubqueryOp {
 
     static private final transient Logger log = Logger
             .getLogger(HTreeNamedSubqueryOp.class);
@@ -151,7 +151,7 @@
 
     }
 
-    public HTreeNamedSubqueryOp(final BOp[] args, NV... annotations) {
+    public HTreeNamedSubqueryOp(final BOp[] args, final NV... annotations) {
 
         this(args, NV.asMap(annotations));
 
@@ -164,6 +164,7 @@
 
     }
 
+    @Override
     public FutureTask<Void> eval(final BOpContext<IBindingSet> context) {
 
         return new FutureTask<Void>(new ControllerTask(this, context));
 
@@ -266,6 +267,7 @@
         /**
          * Evaluate.
          */
+        @Override
         public Void call() throws Exception {
 
             try {
@@ -356,6 +358,7 @@
 
         }
 
+        @Override
         public Void call() throws Exception {
 
             // The subquery

Added: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/INamedSubqueryOp.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/INamedSubqueryOp.java	                        (rev 0)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/INamedSubqueryOp.java	2014-01-03 19:18:16 UTC (rev 7712)
@@ -0,0 +1,42 @@
+/**
+
+Copyright (C) SYSTAP, LLC 2006-2010.  All rights reserved.
+
+Contact:
+     SYSTAP, LLC
+     4501 Tower Road
+     Greensboro, NC 27410
+     lic...@bi...
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; version 2 of the License.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+*/
+package com.bigdata.bop.controller;
+
+import com.bigdata.bop.join.SolutionSetHashJoinOp;
+
+/**
+ * Marker interface for named subquery evaluation. Solutions from the pipeline
+ * flow through this operator without modification. The subquery is evaluated
+ * exactly once, the first time this operator is invoked, and the solutions for
+ * the subquery are written onto a hash index. Those solutions are then joined
+ * back within the query at latter points in the query plan using a solution set
+ * hash join.
+ *
+ * @see SolutionSetHashJoinOp
+ *
+ * @author <a href="mailto:tho...@us...">Bryan Thompson</a>
+ */
+public interface INamedSubqueryOp {
+
+}

Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/JVMNamedSubqueryOp.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/JVMNamedSubqueryOp.java	2013-12-31 19:48:13 UTC (rev 7711)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/JVMNamedSubqueryOp.java	2014-01-03 19:18:16 UTC (rev 7712)
@@ -73,7 +73,7 @@
  *
  * @author <a href="mailto:tho...@us...">Bryan Thompson</a>
  */
-public class JVMNamedSubqueryOp extends PipelineOp {
+public class JVMNamedSubqueryOp extends PipelineOp implements INamedSubqueryOp {
 
     static private final transient Logger log = Logger
             .getLogger(JVMNamedSubqueryOp.class);
@@ -140,7 +140,7 @@
 
     }
 
-    public JVMNamedSubqueryOp(final BOp[] args, NV... annotations) {
+    public JVMNamedSubqueryOp(final BOp[] args, final NV... annotations) {
 
         this(args, NV.asMap(annotations));
 
@@ -153,6 +153,7 @@
 
     }
 
+    @Override
     public FutureTask<Void> eval(final BOpContext<IBindingSet> context) {
 
         return new FutureTask<Void>(new ControllerTask(this, context));
 
@@ -254,6 +255,7 @@
         /**
          * Evaluate.
         */
+        @Override
         public Void call() throws Exception {
 
             try {
@@ -344,6 +346,7 @@
 
         }
 
+        @Override
         public Void call() throws Exception {
 
             // The subquery

Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/ServiceCallJoin.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/ServiceCallJoin.java	2013-12-31 19:48:13 UTC (rev 7711)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/controller/ServiceCallJoin.java	2014-01-03 19:18:16 UTC (rev 7712)
@@ -69,6 +69,7 @@
 import com.bigdata.relation.accesspath.IBuffer;
 import com.bigdata.relation.accesspath.UnsyncLocalOutputBuffer;
 import com.bigdata.striterator.ChunkedArrayIterator;
+import com.bigdata.striterator.Chunkerator;
 import com.bigdata.util.InnerCause;
 import com.bigdata.util.concurrent.LatchedExecutor;
 
@@ -571,6 +572,7 @@
 
         }
 
+        @Override
         public Void call() throws Exception {
 
             final UnsyncLocalOutputBuffer<IBindingSet> unsyncBuffer = new UnsyncLocalOutputBuffer<IBindingSet>(
@@ -592,7 +594,7 @@
                             chunk), null/* stats */);
 
                     // The iterator draining the subquery
-                    ICloseableIterator<IBindingSet> serviceSolutionItr = null;
+                    ICloseableIterator<IBindingSet[]> serviceSolutionItr = null;
                     try {
 
                         /*
@@ -609,10 +611,13 @@
                          * Do a hash join of the source solutions with the
                          * solutions from the service, outputting any solutions
                          * which join.
+                         *
+                         * Note:
                          */
 
-                        state.hashJoin(serviceSolutionItr, unsyncBuffer);
-
+                        state.hashJoin(serviceSolutionItr, null/* stats */,
+                                unsyncBuffer);
+
                     }
 
                 } finally {
@@ -671,26 +676,35 @@
          * SILENT is <code>true</code>.
          *
          * @throws Exception
+         *
+         *             TODO RECHUNKING Push down the
+         *             ICloseableIterator<IBindingSet[]> return type into
+         *             the {@link ServiceCall} interface and the various
+         *             ways in which we can execute a service call. Do this
+         *             as part of vectoring solutions in and out of service
+         *             calls?
         */
-        private ICloseableIterator<IBindingSet> doServiceCall(
+        private ICloseableIterator<IBindingSet[]> doServiceCall(
                 final ServiceCall<? extends Object> serviceCall,
                 final IBindingSet[] left) throws Exception {
 
             try {
 
+                final ICloseableIterator<IBindingSet> itr;
+
                 if (serviceCall instanceof BigdataServiceCall) {
 
-                    return doBigdataServiceCall(
+                    itr = doBigdataServiceCall(
                             (BigdataServiceCall) serviceCall, left);
 
                 } else if (serviceCall instanceof ExternalServiceCall) {
 
-                    return doExternalServiceCall(
+                    itr = doExternalServiceCall(
                             (ExternalServiceCall) serviceCall, left);
 
                 } else if (serviceCall instanceof RemoteServiceCall) {
 
-                    return doRemoteServiceCall(
+                    itr = doRemoteServiceCall(
                             (RemoteServiceCall) serviceCall, left);
 
                 } else {
 
@@ -698,7 +712,13 @@
                     throw new AssertionError();
 
                 }
+
+                final ICloseableIterator<IBindingSet[]> itr2 = new Chunkerator<IBindingSet>(
+                        itr, op.getChunkCapacity(), IBindingSet.class);
+
+                return itr2;
+
             } catch (Throwable t) {
 
                 if (silent

Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/engine/QueryLog.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/engine/QueryLog.java	2013-12-31 19:48:13 UTC (rev 7711)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/engine/QueryLog.java	2014-01-03 19:18:16 UTC (rev 7712)
@@ -61,6 +61,8 @@
 import com.bigdata.bop.solutions.DropOp;
 import com.bigdata.bop.solutions.GroupByOp;
 import com.bigdata.bop.solutions.ProjectionOp;
+import com.bigdata.bop.solutions.SliceOp;
+import com.bigdata.btree.Tuple;
 import com.bigdata.counters.render.XHTMLRenderer;
 import com.bigdata.rawstore.Bytes;
 import com.bigdata.rdf.sparql.ast.eval.AST2BOpJoins;
@@ -826,6 +828,10 @@
         }
         w.write("<th>bopSummary</th>");
         w.write("<th>predSummary</th>");
+        if (detailedStats) {
+            w.write("<th>bopAnnotations</th>");
+            w.write("<th>predAnnotations</th>");
+        }
         // metadata considered by the static optimizer.
         if(detailedStats) {
             w.write("<th>staticBestKeyOrder</th>"); // original key order assigned
@@ -1119,25 +1125,16 @@
             w.write(TDx);
         }
 
+        // bopSummary
         w.write(TD);
-        if(summary) {
+        if (summary) {
             w.write("total");
         } else {
             w.write(cdata(bop.getClass().getSimpleName()));
             w.write(cdata("[" + bopId + "]"));
-            final Integer defaultSink = (Integer) bop
-                    .getProperty(PipelineOp.Annotations.SINK_REF);
-            final Integer altSink = (Integer) bop
-                    .getProperty(PipelineOp.Annotations.ALT_SINK_REF);
-            if (defaultSink != null) {
-                w.write(cdata(", sink=" + defaultSink));
-            }
-            if (altSink != null) {
-                w.write(cdata(", altSink=" + altSink));
-            }
         }
         w.write(TDx);
-
+
         /*
          * Pperator summary (not shown for the "total" line).
          *
@@ -1264,9 +1261,29 @@
                 w.write(cdata(Arrays.toString(((ProjectionOp) bop)
                         .getVariables())));
             }
+            if (bop instanceof SliceOp) {
+                w.write(cdata("offset=" + ((SliceOp) bop).getOffset()));
+                w.write(cdata(", limit=" + ((SliceOp) bop).getLimit()));
+            }
         }
-        w.write(TDx); // end summary
+        w.write(TDx); // end predSummary
 
+        if (detailedStats) {
+            // bopAnnotations
+            w.write(TD);
+            showAnnotations(w, bop.annotations());
+            w.write(TDx);
+        }
+
+        if (detailedStats) {
+            // predAnnotations
+            w.write(TD);
+            if (pred != null) {
+                showAnnotations(w, pred.annotations());
+            }
+            w.write(TDx);
+        }
+
         /*
         * Static optimizer metadata.
         *
@@ -1505,6 +1522,41 @@
     }
 
     /**
+     * Shows annotations on a {@link BOp}.
+     *
+     * @param w
+     *            Where to write the XHTML data.
+     * @param anns
+     *            The annotations (optional).
+     * @throws IOException
+     */
+    static private void showAnnotations(final Writer w,
+            final Map<String, Object> anns) throws IOException {
+        if (anns != null && !anns.isEmpty()) {
+            w.write("<dl>");
+            for (Map.Entry<String, Object> e : anns.entrySet()) {
+                w.write("<dt>");
+                final String key = e.getKey();
+                w.write(cdata(key));
+                w.write("</dt><dd>");
+                final Object val = e.getValue();
+                // See CoreBaseBop for this pattern.
+                if (val != null && val.getClass().isArray()) {
+                    w.write(cdata(Arrays.toString((Object[]) val)));
+                } else if (key.equals(IPredicate.Annotations.FLAGS)) {
+                    w.write(cdata(Tuple.flagString((Integer) val)));
+                } else if (val instanceof BOp) {
+                    w.write(cdata(((BOp) val).toShortString()));
+                } else {
+                    w.write(cdata("" + val));
+                }
+                w.write("</dd>");
+            }
+            w.write("</dl>");
+        }
+    }
+
+    /**
      * Write a summary row for the query. The table element, header, and footer
      * must be written separately.
      *

Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/HTreeHashIndexOp.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/HTreeHashIndexOp.java	2013-12-31 19:48:13 UTC (rev 7711)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/HTreeHashIndexOp.java	2014-01-03 19:18:16 UTC (rev 7712)
@@ -72,222 +72,22 @@
 
         super(args, annotations);
 
-//        if (getEvaluationContext() != BOpEvaluationContext.CONTROLLER) {
-//            throw new IllegalArgumentException(
-//                    BOp.Annotations.EVALUATION_CONTEXT + "="
-//                            + getEvaluationContext());
-//        }
-//
-//        if (getMaxParallel() != 1) {
-//            /*
-//             * Parallel evaluation is not allowed. This operator writes on an
-//             * object that is not thread-safe for mutation.
-//             */
-//            throw new IllegalArgumentException(
-//                    PipelineOp.Annotations.MAX_PARALLEL + "="
-//                            + getMaxParallel());
-//        }
-//
-//        if (!isLastPassRequested()) {
-//            /*
-//             * Last pass evaluation must be requested. This operator will not
-//             * produce any outputs until all source solutions have been
-//             * buffered.
-//             */
-//            throw new IllegalArgumentException(PipelineOp.Annotations.LAST_PASS
-//                    + "=" + isLastPassRequested());
-//        }
-//
-//        getRequiredProperty(Annotations.NAMED_SET_REF);
-//
-//        @SuppressWarnings("unused")
-//        final JoinTypeEnum joinType = (JoinTypeEnum) getRequiredProperty(Annotations.JOIN_TYPE);
-//
-//        // Join variables must be specified.
-//        final IVariable<?>[] joinVars = (IVariable[]) getRequiredProperty(Annotations.JOIN_VARS);
-//
-////        if (joinVars.length == 0)
-////            throw new IllegalArgumentException(Annotations.JOIN_VARS);
-//
-//        for (IVariable<?> var : joinVars) {
-//
-//            if (var == null)
-//                throw new IllegalArgumentException(Annotations.JOIN_VARS);
-//
-//        }
-
     }
 
-    public HTreeHashIndexOp(final BOp[] args, NV... annotations) {
+    public HTreeHashIndexOp(final BOp[] args, final NV... annotations) {
 
         this(args, NV.asMap(annotations));
-
+
     }
 
-//    @Override
-//    public BOpStats newStats() {
-//
-//        return new NamedSolutionSetStats();
-//
-//    }
-
     @Override
     protected HTreeHashJoinUtility newState(
             final BOpContext<IBindingSet> context,
             final INamedSolutionSetRef namedSetRef, final JoinTypeEnum joinType) {
 
-        return new HTreeHashJoinUtility(
-                context.getMemoryManager(namedSetRef.getQueryId()), this, joinType);
+        return new HTreeHashJoinUtility(context.getMemoryManager(namedSetRef
+                .getQueryId()), this, joinType);
 
     }
-
-//    public FutureTask<Void> eval(final BOpContext<IBindingSet> context) {
-//
-//        return new FutureTask<Void>(new ControllerTask(this, context));
-//
-//    }
-
-//    /**
-//     * Evaluates the subquery for each source binding set. If the controller
-//     * operator is interrupted, then the subqueries are cancelled. If a subquery
-//     * fails, then all subqueries are cancelled.
-//     */
-//    private static class ControllerTask implements Callable<Void> {
-//
-//        private final BOpContext<IBindingSet> context;
-//
-//        private final HTreeHashIndexOp op;
-//
-//        private final NamedSolutionSetStats stats;
-//
-//        private final IHashJoinUtility state;
-//
-//        public ControllerTask(final HTreeHashIndexOp op,
-//                final BOpContext<IBindingSet> context) {
-//
-//            if (op == null)
-//                throw new IllegalArgumentException();
-//
-//            if (context == null)
-//                throw new IllegalArgumentException();
-//
-//            this.context = context;
-//
-//            this.op = op;
-//
-//            this.stats = ((NamedSolutionSetStats) context.getStats());
-//
-//            // Metadata to identify the named solution set.
-//            final NamedSolutionSetRef namedSetRef = (NamedSolutionSetRef) op
-//                    .getRequiredProperty(Annotations.NAMED_SET_REF);
-//
-//            {
-//
-//                /*
-//                 * First, see if the map already exists.
-//                 *
-//                 * Note: Since the operator is not thread-safe, we do not need
-//                 * to use a putIfAbsent pattern here.
-//                 */
-//
-//                // Lookup the attributes for the query on which we will hang the
-//                // solution set.
-//                final IQueryAttributes attrs = context
-//                        .getQueryAttributes(namedSetRef.queryId);
-//
-//                HTreeHashJoinUtility state = (HTreeHashJoinUtility) attrs
-//                        .get(namedSetRef);
-//
-//                if (state == null) {
-//
-//                    final JoinTypeEnum joinType = (JoinTypeEnum) op
-//                            .getRequiredProperty(Annotations.JOIN_TYPE);
-//
-//                    state = new HTreeHashJoinUtility(
-//                            context.getMemoryManager(namedSetRef.queryId), op,
-//                            joinType);
-//
-//                    if (attrs.putIfAbsent(namedSetRef, state) != null)
-//                        throw new AssertionError();
-//
-//                }
-//
-//                this.state = state;
-//
-//            }
-//
-//        }
-//
-//        /**
-//         * Evaluate.
-//         */
-//        public Void call() throws Exception {
-//
-//            try {
-//
-//                // Buffer all source solutions.
-//                acceptSolutions();
-//
-//                if(context.isLastInvocation()) {
-//
-//                    // Checkpoint the solution set.
-//                    checkpointSolutionSet();
-//
-//                    // Output the buffered solutions.
-//                    outputSolutions();
-//
-//                }
-//
-//                // Done.
-//                return null;
-//
-//            } finally {
-//
-//                context.getSource().close();
-//
-//                context.getSink().close();
-//
-//            }
-//
-//        }
-//
-//        /**
-//         * Buffer intermediate resources.
-//         */
-//        private void acceptSolutions() {
-//
-//            state.acceptSolutions(context.getSource(), stats);
-//
-//        }
-//
-//        /**
-//         * Checkpoint and save the solution set.
-//         */
-//        private void checkpointSolutionSet() {
-//
-//            state.saveSolutionSet();
-//
-//        }
-//
-//        /**
-//         * Output the buffered solutions.
-//         */
-//        private void outputSolutions() {
-//
-//            // default sink
-//            final IBlockingBuffer<IBindingSet[]> sink = context.getSink();
-//
-//            final UnsyncLocalOutputBuffer<IBindingSet> unsyncBuffer = new UnsyncLocalOutputBuffer<IBindingSet>(
-//                    op.getChunkCapacity(), sink);
-//
-//            state.outputSolutions(unsyncBuffer);
-//
-//            unsyncBuffer.flush();
-//
-//            sink.flush();
-//
-//        }
-//
-//    } // ControllerTask
 
 }

Modified: branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/HTreeHashJoinUtility.java
===================================================================
--- branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/HTreeHashJoinUtility.java	2013-12-31 19:48:13 UTC (rev 7711)
+++ branches/BIGDATA_RELEASE_1_3_0/bigdata/src/java/com/bigdata/bop/join/HTreeHashJoinUtility.java	2014-01-03 19:18:16 UTC (rev 7712)
@@ -75,8 +75,6 @@
 import com.bigdata.relation.accesspath.IBuffer;
 import com.bigdata.rwstore.sector.IMemoryManager;
 import com.bigdata.rwstore.sector.MemStore;
-import com.bigdata.striterator.Chunkerator;
-import com.bigdata.striterator.Dechunkerator;
 import com.bigdata.util.InnerCause;
 
 import cutthecrap.utils.striterators.Expander;
@@ -178,7 +176,12 @@
         // Works Ok.
         h = 31 * h + c.hashCode();
 
-//        // Martyn's version. Also works Ok.
+        /*
+         * TODO Martyn's version. Also works Ok. Compare rate of hash
+         * collisions and impact on join performance. Also compare use of
+         * 64-bit hash codes and impact on join performance (there should be
+         * fewer hash collisions).
+         */
 //        @see http://burtleburtle.net/bob/hash/integer.html
 //
 //        final int hc = c.hashCode();
@@ -210,15 +213,10 @@
      */
     private final PipelineOp op;
 
-    /**
-     * This basically controls the vectoring of the hash join.
-     *
-     * TODO parameter from operator annotations. Note that 10k tends to put too
-     * much heap pressure on the system if the source chunks happen to be
-     * smallish. 1000k or 100 is probably the right value until we improve
-     * vectoring of the query engine.
-     */
-    private final int chunkSize = 1000;//ChunkedWrappedIterator.DEFAULT_CHUNK_SIZE;
+//    /**
+//     * This basically controls the vectoring of the hash join.
+//     */
+//    private final int chunkSize = 1000;//ChunkedWrappedIterator.DEFAULT_CHUNK_SIZE;
 
     /**
      * Utility class for compactly and efficiently encoding and decoding
@@ -305,7 +303,7 @@
      * The maximum #of (left,right) solution joins that will be considered
      * before failing the join. This is used IFF there are no join variables.
      *
-     * FIXME Annotation and query hint for this. Probably on
+     * TODO HINTS: Annotation and query hint for this. Probably on
     * {@link HashJoinAnnotations}.
      */
     private final long noJoinVarsLimit = HashJoinAnnotations.DEFAULT_NO_JOIN_VARS_LIMIT;
@@ -357,8 +355,8 @@
 
         sb.append(getClass().getSimpleName());
 
         sb.append("{open=" + open);
-        sb.append(",joinType="+joinType);
-        sb.append(",chunkSize=" + chunkSize);
+        sb.append(",joinType=" + joinType);
+//        sb.append(",chunkSize=" + chunkSize);
 //        sb.append(",optional=" + optional);
 //        sb.append(",filter=" + filter);
         if (askVar != null)
@@ -707,30 +705,8 @@
         final IKeyBuilder keyBuilder = htree.getIndexMetadata()
                 .getKeyBuilder();
 
-        /*
-         * Rechunk in order to have a nice fat vector size for ordered
-         * inserts.
-         *
-         * TODO This should probably be eliminated in favor of the existing
-         * chunk size. That allows us to control the vectoring directly from
-         * the pipeline annotations for the query engine. If 1000 (the
-         * current [chunkSize] hard wired into this class) makes a
-         * significant difference over 100 (the current default pipeline
-         * chunk capacity) then we should simply override the default chunk
-         * capacity for the htree hash join operators (i.e., analytic
-         * operators always imply a larger default chunk capacity, as could
-         * operators running on a cluster). This change should be verified
-         * against the GOVTRACK dataset and also by using BSBM with JVM and
-         * HTree hash joins and measuring the change in the performance
-         * delta when the HTree hash join vector size is changed.
-         *
-         * @see https://sourceforge.net/apps/trac/bigdata/ticket/483
-         * (Eliminate unnecessary dechunking and rechunking)
-         */
-
-        final ICloseableIterator<IBindingSet[]> it = new Chunkerator<IBindingSet>(
-                new Dechunkerator<IBindingSet>(itr), chunkSize,
-                IBindingSet.class);
+        // Note: We no longer re-chunk here.
+        final ICloseableIterator<IBindingSet[]> it = itr;
 
         try {
 
@@ -805,16 +781,13 @@
 
         final IKeyBuilder keyBuilder = htree.getIndexMetadata().getKeyBuilder();
 
-        /*
-         * Rechunk in order to have a nice fat vector size for ordered inserts.
-         */
-        final Iterator<IBindingSet[]> it = new Chunkerator<IBindingSet>(
-                new Dechunkerator<IBindingSet>(itr), chunkSize,
-                IBindingSet.class);
+        // Note: We no longer rechunk here.
+        final Iterator<IBindingSet[]> it = itr;
 
         final AtomicInteger vectorSize = new AtomicInteger();
         while (it.hasNext()) {
 
+            // Vector a chunk of solutions.
             final BS[] a = vector(it.next(), joinVars, selectVars,
                     true/* ignoreUnboundVariables */, vectorSize);
 
@@ -943,6 +916,7 @@
             return 0;
         }
 
+        @Override
         public String toString() {
             return getClass().getName() + "{hashCode=" + hashCode + ",bset="
                     + bset + "}";
@@ -974,6 +948,7 @@
             return 0;
        }
 
+        @Override
         public String toString() {
             return getClass().getName() + "{hashCode=" + hashCode + ",value="
                     + BytesUtil.toString(value) + "}";
@@ -983,21 +958,26 @@
 
     @Override
     public void hashJoin(//
-            final ICloseableIterator<IBindingSet> leftItr,//
+            final ICloseableIterator<IBindingSet[]> leftItr,//
+            final BOpStats stats,
            final IBuffer<IBindingSet> outputBuffer//
             ) {
 
-        hashJoin2(leftItr, outputBuffer, constraints);
+        hashJoin2(leftItr, stats, outputBuffer, constraints);
 
     }
 
     @Override
     public void hashJoin2(//
-            final ICloseableIterator<IBindingSet> leftItr,//
+            final ICloseableIterator<IBindingSet[]> leftItr,//
+            final BOpStats stats,//
             final IBuffer<IBindingSet> outputBuffer,//
             final IConstraint[] constraints//
             ) {
 
+        // Note: We no longer rechunk in this method.
+        final Iterator<IBindingSet[]> it = leftItr;
+
         try {
 
             final HTree rightSolutions = this.getRightSolutions();
@@ -1012,22 +992,37 @@
             final IKeyBuilder keyBuilder = rightSolutions.getIndexMetadata()
                     .getKeyBuilder();
 
-            final Iterator<IBindingSet[]> it = new Chunkerator<IBindingSet>(
-                    leftItr, chunkSize, IBindingSet.class);
-
             // true iff there are no join variables.
             final boolean noJoinVars = joinVars.length == 0;
 
             final AtomicInteger vectorSize = new AtomicInteger();
             while (it.hasNext()) {
 
-                final BS[] a = vector(it.next(), joinVars,
-                        null/* selectVars */,
-                        false/* ignoreUnboundVariables */, vectorSize);
-
-                final int n = vectorSize.get();
+                final BS[] a; // vectored solutions.
+                final int n; // #of valid elements in a[].
+                {
+                    // Next chunk of solutions from left.
+                    final IBindingSet[] b = it.next();
+                    if (stats != null) {
+                        stats.chunksIn.increment();
+                        stats.unitsIn.add(b.length);
+                    }
 
-                nleftConsidered.add(n);
+                    // Vector a chunk of solutions, ordering by hashCode.
+                    a = vector(b, joinVars, null/* selectVars */,
+                            false/* ignoreUnboundVariables */, vectorSize);
+
+                    // The size of that vector.
+                    n = vectorSize.get();
+
+                    nleftConsidered.add(n);
+
+                    if (log.isTraceEnabled())
+                        log.trace("Vectoring chunk for HTree locality: " + n
+                                + " out of " + a.length
+                                + " solutions are preserved.");
+
+                }
 
                 int fromIndex = 0;
 
@@ -1056,7 +1051,8 @@
 
                     if (log.isTraceEnabled())
                         log.trace("hashCode=" + hashCode + ": #left="
-                                + bucketSize + ", firstLeft=" + a[fromIndex]);
+                                + bucketSize + ", vectorSize=" + n
+                                + ", firstLeft=" + a[fromIndex]);
 
                     /*
                     * Note: all source solutions in [fromIndex:toIndex) have
@@ -1080,7 +1076,8 @@
                     int nrejected = 0;
                     {
 
-                        final byte[] key = keyBuilder.reset().append(hashCode).getKey();
+                        final byte[] key = keyBuilder.reset().append(hashCode)
+                                .getKey();
 
                         // vi...

[truncated message content]