From: <tho...@us...> - 2010-11-07 12:46:43
Revision: 3906
          http://bigdata.svn.sourceforge.net/bigdata/?rev=3906&view=rev
Author:   thompsonbry
Date:     2010-11-07 12:46:37 +0000 (Sun, 07 Nov 2010)

Log Message:
-----------
Reduced a variety of defaults in order to reduce the heap demand associated
with join processing on larger data sets:

    IChunkedIterator.DEFAULT_CHUNK_SIZE = 100; // was 10000
    BlockingBuffer.DEFAULT_PRODUCER_QUEUE_CAPACITY = 10; // was 5000
    BlockingBuffer.DEFAULT_MINIMUM_CHUNK_SIZE = 100; // was 10000
    AbstractResource.DEFAULT_CHUNK_OF_CHUNKS_CAPACITY = "10"; // was "1000"
    AbstractTripleStore.DEFAULT_TERM_CACHE_CAPACITY = "5000"; // was "50000"

AbstractAccessPath#1030 was modified to pass in the chunkCapacity:

    final BlockingBuffer<R[]> buffer = new BlockingBuffer<R[]>(
            chunkOfChunksCapacity,chunkCapacity,10,TimeUnit.MILLISECONDS);

and

    AbstractResource.DEFAULT_FULLY_BUFFERED_READ_THRESHOLD = "200"; // was ""+20*Bytes.kilobyte32

Load of U50 is unchanged when compared with the baseline:

    [java] Load: 6890949 stmts added in 123.422 secs, rate= 55831, commitLatency=0ms

However, closure is significantly slower (compare with 30707). Closure
performance cannot be related to the lexicon, so this must be either the
queue capacity or the chunk capacity.

    [java] Closure: ClosureStats{mutationCount=1699274, elapsed=71662ms, rate=23712}

Total time: 3 minutes 17 seconds

There is very little impact on query (compare with 10569 for ~4k pages from
above):

    [java] ### Finished testing BIGDATA_SPARQL_ENDPOINT ###
    [java] BIGDATA_SPARQL_ENDPOINT #trials=10 #parallel=1
    [java] query      Time    Result#
    [java] query1       46          4
    [java] query3       25          6
    [java] query4       63         34
    [java] query5       59        719
    [java] query7       24         61
    [java] query8      189       6463
    [java] query10      26          0
    [java] query11      26          0
    [java] query12      34          0
    [java] query13      28          0
    [java] query14    2952     393730
    [java] query6     3218     430114
    [java] query9     2958       8627
    [java] query2      740        130
    [java] Total     10388

However, when looking at U1000 there is a significant benefit for query:

    [java] Load: 138318723 stmts added in 7559.498 secs, rate= 18297, commitLatency=0ms
    [java] Closure: ClosureStats{mutationCount=34082911, elapsed=2909594ms, rate=11713}

    [java] ### Finished testing BIGDATA_SPARQL_ENDPOINT ###
    [java] BIGDATA_SPARQL_ENDPOINT #trials=10 #parallel=1
    [java] query      Time    Result#
    [java] query1       69          4
    [java] query3       33          6
    [java] query4       67         34
    [java] query5       66        719
    [java] query7       34         61
    [java] query8      231       6463
    [java] query10      26          0
    [java] query11      27          0
    [java] query12      28          0
    [java] query13      23          0
    [java] query14   69907    7924765   (versus 124545)
    [java] query6    74343    8653646   (versus 130354)
    [java] query9    76161     172632   (versus 125518)
    [java] query2   368962       2528   (versus inconsistent due to backed out change to AbstractBTree.touch())
    [java] Total    589977

This commit therefore improves query performance on larger LUBM data sets,
but has a known negative impact on U50 closure and an unknown impact on LUBM
U1000 closure. Closure warrants additional investigation.
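To see why the reduced defaults shrink the heap demand, consider the worst
case in which every slot of a BlockingBuffer's producer queue holds a chunk
filled to the chunk size. The sketch below is an illustrative upper bound
only (it is not code from this commit): under the old defaults a single
buffer could pin 5000 x 10000 = 50,000,000 elements; under the new defaults
the bound is 10 x 100 = 1,000.

    public class BufferHeapEstimate {

        // Upper bound: one full chunk per producer-queue slot.
        static long worstCaseBufferedElements(final int queueCapacity,
                final int chunkSize) {
            return (long) queueCapacity * chunkSize;
        }

        public static void main(final String[] args) {
            System.out.println("old defaults: "
                    + worstCaseBufferedElements(5000, 10000)); // 50,000,000
            System.out.println("new defaults: "
                    + worstCaseBufferedElements(10, 100));     // 1,000
        }
    }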
BSBM 100M performance with these changes and the following settings is as
follows (this is the reduced query mix without query 3):

    com.bigdata.btree.writeRetentionQueue.capacity=4000
    com.bigdata.btree.BTree.branchingFactor=128

    # Reduce the branching factor for the lexicon since BSBM uses a lot of long
    # literals. Note that you have to edit this override to specify the namespace
    # into which the BSBM data will be loaded.
    com.bigdata.namespace.BSBM_284826.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor=32
    com.bigdata.namespace.BSBM_284826.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor=32

    # 4k pages.
    com.bigdata.namespace.BSBM_284826.spo.POS.com.bigdata.btree.BTree.branchingFactor=970
    com.bigdata.namespace.BSBM_284826.spo.SPO.com.bigdata.btree.BTree.branchingFactor=512
    com.bigdata.namespace.BSBM_284826.spo.OSP.com.bigdata.btree.BTree.branchingFactor=470

    # Override the #of write cache buffers.
    com.bigdata.journal.AbstractJournal.writeCacheBufferCount=12

Cold JVM run immediately after data load: 98-99% disk utilization.

    [java] QMpH: 7515.78 query mixes per hour

Hot JVM, cold disk: 98-99% disk utilization.

    [java] QMpH: 6459.97 query mixes per hour

Hot JVM, hot disk: ~4% utilization.

    [java] QMpH: 40213.81 query mixes per hour

Modified Paths:
--------------
    branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/relation/AbstractResource.java
    branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/relation/accesspath/AbstractAccessPath.java
    branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/relation/accesspath/BlockingBuffer.java
    branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/striterator/IChunkedIterator.java
    branches/JOURNAL_HA_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractLocalTripleStore.java
    branches/JOURNAL_HA_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java

Modified: branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/relation/AbstractResource.java
===================================================================
--- branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/relation/AbstractResource.java	2010-11-07 12:39:08 UTC (rev 3905)
+++ branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/relation/AbstractResource.java	2010-11-07 12:46:37 UTC (rev 3906)
@@ -222,7 +222,7 @@
     /**
      * Default for {@link #CHUNK_OF_CHUNKS_CAPACITY}
      */
-    String DEFAULT_CHUNK_OF_CHUNKS_CAPACITY = "1000";
+    String DEFAULT_CHUNK_OF_CHUNKS_CAPACITY = "10"; // was 1000
 
     /**
      * <p>
@@ -275,7 +275,7 @@
      *
      * @todo figure out how good this value is.
      */
-    String DEFAULT_FULLY_BUFFERED_READ_THRESHOLD = ""+20*Bytes.kilobyte32;
+    String DEFAULT_FULLY_BUFFERED_READ_THRESHOLD = "200";//""+20*Bytes.kilobyte32;
 
     /**
      * When <code>true</code> ({@value #DEFAULT_FORCE_SERIAL_EXECUTION}),
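For context on the DEFAULT_FULLY_BUFFERED_READ_THRESHOLD change above: the
threshold selects between materializing an entire access path read at once
and streaming it in chunks. The following sketch is hypothetical -- the class,
field, and method names are stand-ins for illustration, not the actual
AbstractAccessPath API:

    class ReadStrategy {

        // New default from this commit; formerly ""+20*Bytes.kilobyte32 (20480).
        private final long fullyBufferedReadThreshold = 200;

        // Small ranges are cheaper to materialize in a single synchronous
        // read; larger ranges are streamed through an asynchronous iterator
        // so the heap footprint stays bounded.
        boolean useFullyBufferedRead(final long estimatedRangeCount) {
            return estimatedRangeCount <= fullyBufferedReadThreshold;
        }
    }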
Modified: branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/relation/accesspath/AbstractAccessPath.java
===================================================================
--- branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/relation/accesspath/AbstractAccessPath.java	2010-11-07 12:39:08 UTC (rev 3905)
+++ branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/relation/accesspath/AbstractAccessPath.java	2010-11-07 12:46:37 UTC (rev 3906)
@@ -33,6 +33,7 @@
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Future;
 import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.TimeUnit;
 
 import org.apache.log4j.Logger;
 
@@ -1027,7 +1028,7 @@
          * once the elements were materialized on the client.
          */
         final BlockingBuffer<R[]> buffer = new BlockingBuffer<R[]>(
-                chunkOfChunksCapacity);
+                chunkOfChunksCapacity,chunkCapacity,10,TimeUnit.MILLISECONDS);
 
         final ExecutorService executorService = indexManager
                 .getExecutorService();

Modified: branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/relation/accesspath/BlockingBuffer.java
===================================================================
--- branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/relation/accesspath/BlockingBuffer.java	2010-11-07 12:39:08 UTC (rev 3905)
+++ branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/relation/accesspath/BlockingBuffer.java	2010-11-07 12:46:37 UTC (rev 3906)
@@ -167,12 +167,14 @@
     /**
      * The default capacity for the internal {@link Queue} on which elements (or
      * chunks of elements) are buffered.
      */
-    public static transient final int DEFAULT_PRODUCER_QUEUE_CAPACITY = 5000;
+//  public static transient final int DEFAULT_PRODUCER_QUEUE_CAPACITY = 5000;
+    public static transient final int DEFAULT_PRODUCER_QUEUE_CAPACITY = 10; // was 5000
 
     /**
      * The default minimum chunk size for the chunk combiner.
      */
-    public static transient final int DEFAULT_MINIMUM_CHUNK_SIZE = 10000;
+//  public static transient final int DEFAULT_MINIMUM_CHUNK_SIZE = 10000;
+    public static transient final int DEFAULT_MINIMUM_CHUNK_SIZE = 100; // was 10000
 
     /**
      * The default timeout in milliseconds during which chunks of elements may
@@ -381,7 +383,12 @@
             final int minimumChunkSize, final long chunkTimeout,
             final TimeUnit chunkTimeoutUnit, final boolean ordered) {
 
-        if (queue == null)
+        if (minimumChunkSize >= 1000 || queue.remainingCapacity() >= 1000)
+            log.fatal(new RuntimeException("queueCapacity="
+                    + queue.remainingCapacity() + ", minimumChunkSize="
+                    + minimumChunkSize));
+
+        if (queue == null)
             throw new IllegalArgumentException();
 
         if (minimumChunkSize < 0) {

Modified: branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/striterator/IChunkedIterator.java
===================================================================
--- branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/striterator/IChunkedIterator.java	2010-11-07 12:39:08 UTC (rev 3905)
+++ branches/JOURNAL_HA_BRANCH/bigdata/src/java/com/bigdata/striterator/IChunkedIterator.java	2010-11-07 12:46:37 UTC (rev 3906)
@@ -59,7 +59,7 @@
     /**
      * The default chunk size.
      */
-    int DEFAULT_CHUNK_SIZE = 10000;
+    int DEFAULT_CHUNK_SIZE = 100;//was 10000;
 
     /**
      * The next element available from the iterator.
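The minimumChunkSize/chunkTimeout pair passed to the BlockingBuffer
constructor above drives a chunk combiner: the consumer accumulates elements
until it has at least the minimum chunk size on hand or the timeout expires,
whichever comes first. A self-contained sketch of that contract follows --
this is an illustration of the idea, not the bigdata implementation:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.TimeUnit;

    class ChunkCombiner<E> {

        // Accumulate until minimumChunkSize elements are on hand or the
        // timeout expires; a short (or empty) chunk is legal on timeout.
        List<E> nextChunk(final BlockingQueue<E> queue,
                final int minimumChunkSize, final long chunkTimeout,
                final TimeUnit unit) throws InterruptedException {
            final List<E> chunk = new ArrayList<E>(minimumChunkSize);
            final long deadline = System.nanoTime() + unit.toNanos(chunkTimeout);
            while (chunk.size() < minimumChunkSize) {
                final long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0)
                    break; // timeout: emit whatever has accumulated
                final E e = queue.poll(remainingNanos, TimeUnit.NANOSECONDS);
                if (e == null)
                    break; // poll timed out waiting for the producer
                chunk.add(e);
            }
            return chunk;
        }
    }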
"" : ", ") + fqn + "{nodes=" + nodesWritten + ",leaves=" + leavesWritten + ", bytes=" + bytesWritten Modified: branches/JOURNAL_HA_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java =================================================================== --- branches/JOURNAL_HA_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java 2010-11-07 12:39:08 UTC (rev 3905) +++ branches/JOURNAL_HA_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java 2010-11-07 12:46:37 UTC (rev 3906) @@ -572,7 +572,7 @@ String TERM_CACHE_CAPACITY = AbstractTripleStore.class.getName() + ".termCache.capacity"; - String DEFAULT_TERM_CACHE_CAPACITY = "50000"; + String DEFAULT_TERM_CACHE_CAPACITY = "5000"; // was 50000 /** * The name of the class that will establish the pre-defined This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |