From: <tho...@us...> - 2011-07-07 18:45:56
Revision: 4857
          http://bigdata.svn.sourceforge.net/bigdata/?rev=4857&view=rev
Author:   thompsonbry
Date:     2011-07-07 18:45:49 +0000 (Thu, 07 Jul 2011)

Log Message:
-----------
See https://sourceforge.net/apps/trac/bigdata/ticket/349 (TermIdEncoder limiting #of RDF Values on the Journal)

- done. VTE: Removed the 'TODO' about TermIdEncoder (already gone in the development branch). Note: the sole remaining reference to TermIdEncoder is from Term2IdWriteProc#apply(). Therefore there should not be any other assumptions in the code that could be affected by this change.

- done. Update AbstractTripleStore.Options.TERMID_BITS_TO_REVERSE to reflect that this option is only used in scale-out.

- done. Update AbstractTripleStore.Options.TERMID_BITS_TO_REVERSE to no longer describe the stealing of the sign bits, since the VTE is now encoded in the flags byte of the IV rather than in the long termId.

- done. Modify LexiconRelation#addTerms(), Term2IdWriteProc, and Term2IdWriteTask such that the TermIdEncoder is NOT used for standalone. This is done by forcing termIdBitsToReverse to ZERO (0) unless in scale-out, and then NOT using the TermIdEncoder when termIdBitsToReverse is ZERO (0). Note that NOT using the TermIdEncoder should be a NOP once it is 64-bit clean, so this optimization is not strictly necessary.
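A minimal Java sketch of the standalone vs. scale-out decision described in the log message above. It is not the bigdata source: the class and method names are hypothetical, the scale-out default is passed in as a parameter because the actual value of DEFAULT_TERMID_BITS_TO_REVERSE is not shown here, and the scale-out encoder is a caller-supplied java.util.function.LongUnaryOperator standing in for `new TermIdEncoder(bits)`. The property name mirrors `AbstractTripleStore.class.getName() + ".termIdBitsToReverse"`.

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

/** Hypothetical illustration of the logic in this commit; not bigdata code. */
final class TermIdAssignmentSketch {

    /** Same name as AbstractTripleStore.Options.TERMID_BITS_TO_REVERSE. */
    static final String TERMID_BITS_TO_REVERSE =
            "com.bigdata.rdf.store.AbstractTripleStore.termIdBitsToReverse";

    /**
     * Scale-out: read the option (falling back to the supplied default) and
     * require it to lie in [0:31]. Standalone: ignore the option, force ZERO.
     */
    static int resolveBitsToReverse(Properties props, boolean scaleOut,
            String scaleOutDefault) {
        if (!scaleOut) {
            return 0; // Not used in standalone.
        }
        final int bits = Integer.parseInt(
                props.getProperty(TERMID_BITS_TO_REVERSE, scaleOutDefault));
        if (bits < 0 || bits > 31) {
            throw new IllegalArgumentException(TERMID_BITS_TO_REVERSE + "=" + bits);
        }
        return bits;
    }

    /**
     * Mirrors "readOnly ? null : bits == 0 ? null : new TermIdEncoder(bits)":
     * no encoder when read-only or when no bits are reversed.
     */
    static LongUnaryOperator chooseEncoder(boolean readOnly, int bits,
            LongUnaryOperator scaleOutEncoder) {
        return readOnly || bits == 0 ? null : scaleOutEncoder;
    }

    /** Assigns the next term identifier, bypassing a null encoder. */
    static long nextTermId(AtomicLong counter, LongUnaryOperator encoder) {
        final long ctr = counter.incrementAndGet();
        return encoder == null ? ctr : encoder.applyAsLong(ctr);
    }
}
```

With this shape, a standalone store never constructs an encoder, so each term identifier is simply the next counter value; only a scale-out store with a non-zero setting routes the counter through the encoder.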
Modified Paths:
--------------
    branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/internal/VTE.java
    branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/LexiconRelation.java
    branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/Term2IdWriteProc.java
    branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java
    branches/TERMS_REFACTOR_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/LexiconRelation.java
    branches/TERMS_REFACTOR_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/Term2IdWriteProc.java
    branches/TERMS_REFACTOR_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java

Modified: branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/internal/VTE.java
===================================================================
--- branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/internal/VTE.java    2011-07-07 14:43:20 UTC (rev 4856)
+++ branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/internal/VTE.java    2011-07-07 18:45:49 UTC (rev 4857)
@@ -38,12 +38,6 @@
  * Value Type Enumeration (IVTE) is a class with methods for interpreting and
  * setting the bit flags used to identify the type of an RDF Value (URI,
  * Literal, Blank Node, SID, etc).
- *
- * @todo update {@link TermIdEncoder}. This encodes term identifiers for
- *       scale-out but moving some bits around. It will be simpler now that the
- *       term identifier is all bits in the long integer with an additional byte
- *       prefix to differentiate URI vs Literal vs BNode vs SID and to indicate
- *       the inline value type (termId vs everything else).
  */
 public enum VTE {

Modified: branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/LexiconRelation.java
===================================================================
--- branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/LexiconRelation.java    2011-07-07 14:43:20 UTC (rev 4856)
+++ branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/LexiconRelation.java    2011-07-07 18:45:49 UTC (rev 4857)
@@ -278,34 +278,33 @@
                 AbstractTripleStore.Options.STORE_BLANK_NODES,
                 AbstractTripleStore.Options.DEFAULT_STORE_BLANK_NODES));
+
         {
-            final String defaultValue;
             if (indexManager instanceof IBigdataFederation<?>
                     && ((IBigdataFederation<?>) indexManager).isScaleOut()) {
-                defaultValue = AbstractTripleStore.Options.DEFAULT_TERMID_BITS_TO_REVERSE;
+                final String defaultValue = AbstractTripleStore.Options.DEFAULT_TERMID_BITS_TO_REVERSE;
-            } else {
+                termIdBitsToReverse = Integer.parseInt(getProperty(
+                        AbstractTripleStore.Options.TERMID_BITS_TO_REVERSE,
+                        defaultValue));
-                // false unless this is a scale-out deployment.
-                defaultValue = "0";
+                if (termIdBitsToReverse < 0 || termIdBitsToReverse > 31) {
-            }
+                    throw new IllegalArgumentException(
+                            AbstractTripleStore.Options.TERMID_BITS_TO_REVERSE
+                                    + "=" + termIdBitsToReverse);
-            termIdBitsToReverse = Integer
-                    .parseInt(getProperty(
-                            AbstractTripleStore.Options.TERMID_BITS_TO_REVERSE,
-                            defaultValue));
-
-            if (termIdBitsToReverse < 0 || termIdBitsToReverse > 31) {
+                }
-                throw new IllegalArgumentException(
-                        AbstractTripleStore.Options.TERMID_BITS_TO_REVERSE
-                                + "=" + termIdBitsToReverse);
-
+            } else {
+
+                // Note: Not used in standalone.
+                termIdBitsToReverse = 0;
+
             }
-
+
         }
         {

Modified: branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/Term2IdWriteProc.java
===================================================================
--- branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/Term2IdWriteProc.java    2011-07-07 14:43:20 UTC (rev 4856)
+++ branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/Term2IdWriteProc.java    2011-07-07 18:45:49 UTC (rev 4857)
@@ -288,8 +288,9 @@
         // used to serialize term identifiers.
         final DataOutputBuffer idbuf = new DataOutputBuffer();
-        final TermIdEncoder encoder = readOnly ? null : new TermIdEncoder(
-                scaleOutTermIdBitsToReverse);
+        final TermIdEncoder encoder = readOnly ? null
+                : scaleOutTermIdBitsToReverse == 0 ? null : new TermIdEncoder(
+                        scaleOutTermIdBitsToReverse);
         // #of new terms (#of writes on the index).
         int nnew = 0;
@@ -322,11 +323,18 @@
                     ivs[i] = null;
                 } else {
+
+                    /*
+                     * Assign a term identifier.
+                     *
+                     * Note: The TermIdEncoder is ONLY used in scale-out.
+                     */
-                    // assign a term identifier.
-                    final long termId = encoder.encode(
-                            counter.incrementAndGet());
+                    final long ctr = counter.incrementAndGet();
+                    final long termId = encoder == null ? ctr : encoder
+                            .encode(ctr);
+
                     ivs[i] = new TermId(VTE(code), termId);
                 }
@@ -354,11 +362,18 @@
             } else {
-                // assign a term identifier.
-                final long termId = encoder.encode(
-                        counter.incrementAndGet());
+                /*
+                 * Assign a term identifier.
+                 *
+                 * Note: The TermIdEncoder is ONLY used in scale-out.
+                 */
+
+                final long ctr = counter.incrementAndGet();
+
+                final long termId = encoder == null ? ctr : encoder
+                        .encode(ctr);
-                final TermId iv = new TermId(VTE(code), termId);
+                final TermId<?> iv = new TermId(VTE(code), termId);
                 if (DEBUG && enableGroundTruth) {

Modified: branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java
===================================================================
--- branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java    2011-07-07 14:43:20 UTC (rev 4856)
+++ branches/BIGDATA_RELEASE_1_0_0/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java    2011-07-07 18:45:49 UTC (rev 4857)
@@ -99,6 +99,7 @@
 import com.bigdata.rdf.lexicon.ITermIndexCodes;
 import com.bigdata.rdf.lexicon.ITextIndexer;
 import com.bigdata.rdf.lexicon.LexiconRelation;
+import com.bigdata.rdf.lexicon.TermIdEncoder;
 import com.bigdata.rdf.model.BigdataResource;
 import com.bigdata.rdf.model.BigdataStatement;
 import com.bigdata.rdf.model.BigdataURI;
@@ -504,19 +505,16 @@
          * Option effects how evenly distributed the assigned term identifiers
          * which has a pronounced effect on the ID2TERM and statement indices
          * for <em>scale-out deployments</em>. The default for a scale-out
-         * deployment is {@value #DEFAULT_TERMID_BITS_TO_REVERSE}, but the
-         * default for a scale-up deployment is ZERO(0).
+         * deployment is {@value #DEFAULT_TERMID_BITS_TO_REVERSE}. This option
+         * is ignored for a standalone deployment.
          * <p>
          * For the scale-out triple store, the term identifiers are formed by
          * placing the index partition identifier in the high word and the local
-         * counter for the index partition into the low word. In addition, the
-         * sign bit is "stolen" from each value such that the low two bits are
-         * left open for bit flags which encode the type (URI, Literal, BNode or
-         * SID) of the term. The effect of this option is to cause the low N
-         * bits of the local counter value to be reversed and written into the
-         * high N bits of the term identifier (the other bits are shifted down
-         * to make room for this). Regardless of the configured value for this
-         * option, all bits (except the sign bit) of the both the partition
+         * counter for the index partition into the low word. The effect of this
+         * option is to cause the low N bits of the local counter value to be
+         * reversed and written into the high N bits of the term identifier (the
+         * other bits are shifted down to make room for this). Regardless of the
+         * configured value for this option, all bits of the both the partition
          * identifier and the local counter are preserved.
          * <p>
          * Normally, the low bits of a sequential counter will vary the most
@@ -524,11 +522,11 @@
          * reversed bits into the high bits of the term identifier we cause the
          * term identifiers to be uniformly (but not randomly) distributed. This
          * is much like using hash function without collisions or a random
-         * number generator that does not produce duplicates. When ZERO (0) no
-         * bits are reversed so the high bits of the term identifiers directly
-         * reflect the partition identifier and the low bits are assigned
-         * sequentially by the local counter within each TERM2ID index
-         * partition.
+         * number generator that does not produce duplicates. When the value of
+         * this option is ZERO (0), no bits are reversed so the high bits of the
+         * term identifiers directly reflect the partition identifier and the
+         * low bits are assigned sequentially by the local counter within each
+         * TERM2ID index partition.
          * <p>
          * The use of a non-zero value for this option can easily cause the
          * write load on the index partitions for the ID2TERM and statement
@@ -557,6 +555,8 @@
          * knowledge base. If you estimate that you will have 50 x 200M index
          * partitions for the statement indices, then SQRT(50) =~ 7 would be a
          * good choice.
+         *
+         * @see TermIdEncoder
          */
         String TERMID_BITS_TO_REVERSE = (AbstractTripleStore.class.getName() + ".termIdBitsToReverse")
                 .intern();

Modified: branches/TERMS_REFACTOR_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/LexiconRelation.java
===================================================================
--- branches/TERMS_REFACTOR_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/LexiconRelation.java    2011-07-07 14:43:20 UTC (rev 4856)
+++ branches/TERMS_REFACTOR_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/LexiconRelation.java    2011-07-07 18:45:49 UTC (rev 4857)
@@ -288,32 +288,30 @@
         {
-            final String defaultValue;
             if (indexManager instanceof IBigdataFederation<?>
                     && ((IBigdataFederation<?>) indexManager).isScaleOut()) {
-                defaultValue = AbstractTripleStore.Options.DEFAULT_TERMID_BITS_TO_REVERSE;
+                final String defaultValue = AbstractTripleStore.Options.DEFAULT_TERMID_BITS_TO_REVERSE;
-            } else {
+                termIdBitsToReverse = Integer.parseInt(getProperty(
+                        AbstractTripleStore.Options.TERMID_BITS_TO_REVERSE,
+                        defaultValue));
-                // false unless this is a scale-out deployment.
-                defaultValue = "0";
+                if (termIdBitsToReverse < 0 || termIdBitsToReverse > 31) {
-            }
+                    throw new IllegalArgumentException(
+                            AbstractTripleStore.Options.TERMID_BITS_TO_REVERSE
+                                    + "=" + termIdBitsToReverse);
-            termIdBitsToReverse = Integer
-                    .parseInt(getProperty(
-                            AbstractTripleStore.Options.TERMID_BITS_TO_REVERSE,
-                            defaultValue));
-
-            if (termIdBitsToReverse < 0 || termIdBitsToReverse > 31) {
+                }
-                throw new IllegalArgumentException(
-                        AbstractTripleStore.Options.TERMID_BITS_TO_REVERSE
-                                + "=" + termIdBitsToReverse);
-
+            } else {
+
+                // Note: Not used in standalone.
+                termIdBitsToReverse = 0;
+
             }
-
+
         }
         {

Modified: branches/TERMS_REFACTOR_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/Term2IdWriteProc.java
===================================================================
--- branches/TERMS_REFACTOR_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/Term2IdWriteProc.java    2011-07-07 14:43:20 UTC (rev 4856)
+++ branches/TERMS_REFACTOR_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/lexicon/Term2IdWriteProc.java    2011-07-07 18:45:49 UTC (rev 4857)
@@ -289,8 +289,9 @@
         // used to serialize term identifiers.
         final DataOutputBuffer idbuf = new DataOutputBuffer();
-        final TermIdEncoder encoder = readOnly ? null : new TermIdEncoder(
-                scaleOutTermIdBitsToReverse);
+        final TermIdEncoder encoder = readOnly ? null
+                : scaleOutTermIdBitsToReverse == 0 ? null : new TermIdEncoder(
+                        scaleOutTermIdBitsToReverse);
 //        final DataOutputBuffer kbuf = new DataOutputBuffer(128);
@@ -328,10 +329,17 @@
                 } else {
-                    // assign a term identifier.
-                    final long termId = encoder.encode(
-                            counter.incrementAndGet());
+                    /*
+                     * Assign a term identifier.
+                     *
+                     * Note: The TermIdEncoder is ONLY used in scale-out.
+                     */
+                    final long ctr = counter.incrementAndGet();
+
+                    final long termId = encoder == null ? ctr : encoder
+                            .encode(ctr);
+
                     ivs[i] = new TermId(VTE(code), termId);
                 }
@@ -359,10 +367,18 @@
             } else {
-                // assign a term identifier.
-                final long termId = encoder.encode(
-                        counter.incrementAndGet());
+                /*
+                 * Assign a term identifier.
+                 *
+                 * Note: The TermIdEncoder is ONLY used in scale-out.
+                 */
+
+                final long ctr = counter.incrementAndGet();
+
+                final long termId = encoder == null ? ctr : encoder
+                        .encode(ctr);
+                @SuppressWarnings("unchecked")
                 final TermId<?> iv = new TermId(VTE(code), termId);
                 if (DEBUG && enableGroundTruth) {

Modified: branches/TERMS_REFACTOR_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java
===================================================================
--- branches/TERMS_REFACTOR_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java    2011-07-07 14:43:20 UTC (rev 4856)
+++ branches/TERMS_REFACTOR_BRANCH/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java    2011-07-07 18:45:49 UTC (rev 4857)
@@ -102,6 +102,7 @@
 import com.bigdata.rdf.lexicon.ITextIndexer;
 import com.bigdata.rdf.lexicon.LexiconKeyOrder;
 import com.bigdata.rdf.lexicon.LexiconRelation;
+import com.bigdata.rdf.lexicon.TermIdEncoder;
 import com.bigdata.rdf.model.BigdataResource;
 import com.bigdata.rdf.model.BigdataStatement;
 import com.bigdata.rdf.model.BigdataURI;
@@ -484,19 +485,16 @@
          * Option effects how evenly distributed the assigned term identifiers
          * which has a pronounced effect on the ID2TERM and statement indices
          * for <em>scale-out deployments</em>. The default for a scale-out
-         * deployment is {@value #DEFAULT_TERMID_BITS_TO_REVERSE}, but the
-         * default for a scale-up deployment is ZERO(0).
+         * deployment is {@value #DEFAULT_TERMID_BITS_TO_REVERSE}. This option
+         * is ignored for a standalone deployment.
          * <p>
          * For the scale-out triple store, the term identifiers are formed by
          * placing the index partition identifier in the high word and the local
-         * counter for the index partition into the low word. In addition, the
-         * sign bit is "stolen" from each value such that the low two bits are
-         * left open for bit flags which encode the type (URI, Literal, BNode or
-         * SID) of the term. The effect of this option is to cause the low N
-         * bits of the local counter value to be reversed and written into the
-         * high N bits of the term identifier (the other bits are shifted down
-         * to make room for this). Regardless of the configured value for this
-         * option, all bits (except the sign bit) of the both the partition
+         * counter for the index partition into the low word. The effect of this
+         * option is to cause the low N bits of the local counter value to be
+         * reversed and written into the high N bits of the term identifier (the
+         * other bits are shifted down to make room for this). Regardless of the
+         * configured value for this option, all bits of the both the partition
          * identifier and the local counter are preserved.
          * <p>
          * Normally, the low bits of a sequential counter will vary the most
@@ -504,11 +502,11 @@
          * reversed bits into the high bits of the term identifier we cause the
          * term identifiers to be uniformly (but not randomly) distributed. This
          * is much like using hash function without collisions or a random
-         * number generator that does not produce duplicates. When ZERO (0) no
-         * bits are reversed so the high bits of the term identifiers directly
-         * reflect the partition identifier and the low bits are assigned
-         * sequentially by the local counter within each TERM2ID index
-         * partition.
+         * number generator that does not produce duplicates. When the value of
+         * this option is ZERO (0), no bits are reversed so the high bits of the
+         * term identifiers directly reflect the partition identifier and the
+         * low bits are assigned sequentially by the local counter within each
+         * TERM2ID index partition.
          * <p>
          * The use of a non-zero value for this option can easily cause the
          * write load on the index partitions for the ID2TERM and statement
@@ -537,6 +535,8 @@
          * knowledge base. If you estimate that you will have 50 x 200M index
          * partitions for the statement indices, then SQRT(50) =~ 7 would be a
          * good choice.
+         *
+         * @see TermIdEncoder
          */
         String TERMID_BITS_TO_REVERSE = AbstractTripleStore.class.getName() + ".termIdBitsToReverse";
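The bit-reversal scheme described in the TERMID_BITS_TO_REVERSE javadoc can be illustrated with a small Java sketch. This is an illustration of that prose under stated assumptions, not the TermIdEncoder implementation: the class name is hypothetical, and the 64-bit value is assumed to hold the partition identifier in the high word and the local counter in the low word. The low n bits are written, reversed, into the high n bits and the remaining bits are shifted down by n, so the mapping is invertible and no bits are lost. In this sketch n = 0 is the identity, consistent with the log message's remark that bypassing the encoder should be a NOP.

```java
/** Hypothetical sketch of the javadoc's bit-reversal idea; not TermIdEncoder. */
final class BitReversalSketch {

    /** Packs a partition identifier (high word) and a local counter (low word). */
    static long combine(int partitionId, int localCounter) {
        return ((long) partitionId << 32) | (localCounter & 0xFFFFFFFFL);
    }

    /**
     * Writes the low n bits of v, reversed, into the high n bits and shifts
     * the other bits down by n to make room. n == 0 returns v unchanged.
     */
    static long encode(long v, int n) {
        if (n < 0 || n > 31) throw new IllegalArgumentException("n=" + n);
        final long lowMask = (1L << n) - 1; // 0 when n == 0
        // Long.reverse maps bit i to bit 63 - i, so the low n bits land,
        // reversed, in the high n bits and every other bit of that term is 0.
        return Long.reverse(v & lowMask) | (v >>> n);
    }

    /** Inverse of encode: recovers the original value exactly. */
    static long decode(long e, int n) {
        if (n < 0 || n > 31) throw new IllegalArgumentException("n=" + n);
        if (n == 0) return e; // avoids a shift by 64, which Java treats as 0
        final long highMask = -1L << (64 - n);
        // (e << n) restores the shifted-down bits; reversing the high n bits
        // puts the saved low bits back in their original positions.
        return (e << n) | Long.reverse(e & highMask);
    }
}
```

For example, with n = 2 and partition 0, counters 1, 2, 3 and 4 encode to 0x8000000000000000L, 0x4000000000000000L, 0xC000000000000000L and 0x1L: consecutive counter values land far apart in the key space, which is the spreading effect the javadoc describes.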