From: <tho...@us...> - 2011-06-22 20:02:20
Revision: 4773
          http://bigdata.svn.sourceforge.net/bigdata/?rev=4773&view=rev
Author:   thompsonbry
Date:     2011-06-22 20:02:14 +0000 (Wed, 22 Jun 2011)

Log Message:
-----------
Modified the full text index search to configure the ConcurrentHashMap with an initial capacity of 256 (it was using an effective value of min(maxRank,10000) which was nearly always 10000 as maxRank defaults to Integer.MAX_VALUE).

Modified the full text index search to configure the ConcurrentHashMap with the exact concurrency level, which is simply the #of distinct tokens in the query document.

Modified Paths:
--------------
    branches/QUADS_QUERY_BRANCH/bigdata/src/java/com/bigdata/search/FullTextIndex.java
    branches/TERMS_REFACTOR_BRANCH/bigdata/src/java/com/bigdata/search/FullTextIndex.java

Modified: branches/QUADS_QUERY_BRANCH/bigdata/src/java/com/bigdata/search/FullTextIndex.java
===================================================================
--- branches/QUADS_QUERY_BRANCH/bigdata/src/java/com/bigdata/search/FullTextIndex.java	2011-06-22 19:54:20 UTC (rev 4772)
+++ branches/QUADS_QUERY_BRANCH/bigdata/src/java/com/bigdata/search/FullTextIndex.java	2011-06-22 20:02:14 UTC (rev 4773)
@@ -1390,11 +1390,23 @@
 
         final ConcurrentHashMap<Long/*docId*/,Hit> hits;
         {
-            // @todo use size of collection as upper bound.
-            final int initialCapacity = Math.min(maxRank,10000);
+            /*
+             * Note: Initial capacity COULD be set based on the max across the
+             * range counts of the different search terms. However, it can not
+             * be usefully set to the min(maxRank,10000) as we will buffer ALL
+             * hits in this map before pruning those selected by min/max rank.
+             */
+            final int initialCapacity = 256;//Math.min(maxRank,10000);
 
-            hits = new ConcurrentHashMap<Long, Hit>(initialCapacity);
-
+            /*
+             * Note: The actual concurrency will be the #of distinct query
+             * tokens.
+             */
+            final int concurrencyLevel = qdata.distinctTermCount();
+
+            hits = new ConcurrentHashMap<Long, Hit>(initialCapacity,
+                    .75f/* loadFactor */, concurrencyLevel);
+
         }
 
         // run the queries.

Modified: branches/TERMS_REFACTOR_BRANCH/bigdata/src/java/com/bigdata/search/FullTextIndex.java
===================================================================
--- branches/TERMS_REFACTOR_BRANCH/bigdata/src/java/com/bigdata/search/FullTextIndex.java	2011-06-22 19:54:20 UTC (rev 4772)
+++ branches/TERMS_REFACTOR_BRANCH/bigdata/src/java/com/bigdata/search/FullTextIndex.java	2011-06-22 20:02:14 UTC (rev 4773)
@@ -1040,10 +1040,22 @@
 
         final ConcurrentHashMap<V/* docId */, Hit<V>> hits;
         {
-            // @todo use size of collection as upper bound.
-            final int initialCapacity = Math.min(maxRank, 10000);
+            /*
+             * Note: Initial capacity COULD be set based on the max across the
+             * range counts of the different search terms. However, it can not
+             * be usefully set to the min(maxRank,10000) as we will buffer ALL
+             * hits in this map before pruning those selected by min/max rank.
+             */
+            final int initialCapacity = 256;//Math.min(maxRank,10000);
+
+            /*
+             * Note: The actual concurrency will be the #of distinct query
+             * tokens.
+             */
+            final int concurrencyLevel = qdata.distinctTermCount();
 
-            hits = new ConcurrentHashMap<V, Hit<V>>(initialCapacity);
+            hits = new ConcurrentHashMap<V, Hit<V>>(initialCapacity,
+                    .75f/* loadFactor */, concurrencyLevel);
 
         }
 
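
The change above can be tried in isolation with a small standalone sketch. It is not the FullTextIndex code: the class HitsMapSketch, its methods, and the use of an Integer hit count in place of the Hit class are hypothetical stand-ins, and one writer thread per distinct query token is an assumption about how the queries are run. The Math.max(1, ...) guard is also an addition for the sketch, since the three-argument ConcurrentHashMap constructor throws IllegalArgumentException for a non-positive concurrencyLevel.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the hit collection step; not the bigdata code. */
final class HitsMapSketch {

    /** Sizes the map as in the commit: fixed capacity, one writer per token. */
    static ConcurrentHashMap<Long, Integer> newHitsMap(final int distinctTermCount) {
        final int initialCapacity = 256;
        // Assumption: guard against zero tokens, which the constructor rejects.
        final int concurrencyLevel = Math.max(1, distinctTermCount);
        return new ConcurrentHashMap<Long, Integer>(initialCapacity,
                .75f/* loadFactor */, concurrencyLevel);
    }

    /** Runs one task per token; each task records a hit for every docId. */
    static ConcurrentHashMap<Long, Integer> search(final int distinctTermCount,
            final int docsPerTerm) {
        final ConcurrentHashMap<Long, Integer> hits = newHitsMap(distinctTermCount);
        final ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, distinctTermCount));
        for (int t = 0; t < distinctTermCount; t++) {
            pool.submit(() -> {
                for (long docId = 0; docId < docsPerTerm; docId++) {
                    // Atomic per-key update; the map grows beyond 256 as needed.
                    hits.merge(docId, 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits;
    }

    public static void main(final String[] args) {
        final ConcurrentHashMap<Long, Integer> hits = search(4, 1000);
        System.out.println(hits.size() + " documents, doc 0 hit count = " + hits.get(0L));
    }
}
```

The initial capacity only sets the starting table size, so buffering more than 256 hits is still correct; the map resizes as entries arrive.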