From: <tho...@us...> - 2011-05-19 20:03:09
Revision: 4528
          http://bigdata.svn.sourceforge.net/bigdata/?rev=4528&view=rev
Author:   thompsonbry
Date:     2011-05-19 20:03:02 +0000 (Thu, 19 May 2011)

Log Message:
-----------
Broke out unit tests for splitting a directory page when the bucket page beneath it has the same depth as the directory page and introduced an argument (splitBits) to generalize the #of bits to be used in that split (rather than assuming ONE or [addressBits]).

Modified Paths:
--------------
    branches/QUADS_QUERY_BRANCH/bigdata/src/java/com/bigdata/htree/HTree.java
    branches/QUADS_QUERY_BRANCH/bigdata/src/test/com/bigdata/htree/TestHTree.java

Modified: branches/QUADS_QUERY_BRANCH/bigdata/src/java/com/bigdata/htree/HTree.java
===================================================================
--- branches/QUADS_QUERY_BRANCH/bigdata/src/java/com/bigdata/htree/HTree.java	2011-05-19 17:25:26 UTC (rev 4527)
+++ branches/QUADS_QUERY_BRANCH/bigdata/src/java/com/bigdata/htree/HTree.java	2011-05-19 20:03:02 UTC (rev 4528)
@@ -89,6 +89,14 @@
 
     private static final transient Logger log = Logger.getLogger(HTree.class);
 
+    /**
+     * The #of bits of distinction to be made each time we split a directory
+     * page in the {@link HTree}. See
+     * {@link #splitDirectoryPage(DirectoryPage, int, int, AbstractPage)} for a
+     * write-up on this.
+     */
+    private final int splitBits;
+
     /*
      * metadata about the index.
      *
@@ -180,7 +188,9 @@
 //        super(store, nodeFactory, readOnly, addressBits, metadata, recordCompressorFactory);
         super(store, false/*readOnly*/, addressBits);
-        
+
+        this.splitBits = addressBits;
+
 //        if (pageSize <= 0)
 //            throw new IllegalArgumentException("pageSize must be positive.");
 //
@@ -535,9 +545,9 @@
                      * global depth of a directory is always the same as
                      * address bits, but maybe I am missing something....
                      */
-                    addDirectoryPageAndSplitBucketPage(current,
-                            buddyOffset, bucketPage);
-
+                    splitDirectoryPage(current, buddyOffset, splitBits,
+                            bucketPage);
+
                     // The children of [current] have changed so we will
                     // search current again.
                     continue;
@@ -944,10 +954,103 @@
     }
 
     /**
-     * Introduce a new directory level when we need to split a child but
+     * Splits a {@link DirectoryPage} when we need to split a child but
      * <code>globalDepth == localDepth</code>. The caller must retry the insert
      * after this method makes the structural change.
      *
+     * <h2>Design discussion</h2>
+     *
+     * This method must maintain the invariant that the tree of page references
+     * for the hash tree is a strict tree. That is, you can not have two
+     * different pages each of which points to the same child. This would be a
+     * concurrency nightmare as, e.g., splitting the child could require us to
+     * propagate updates to multiple parents. However, even with that
+     * constraint we have two options.
+     * <p>
+     * Given addressBits := 2, the following hash tree state is induced by
+     * inserting the key sequence (0x01, 0x02, 0x03, 0x04).
+     *
+     * <pre>
+     * root := [2] (a,c,b,b)
+     * a := [2] (1,2,3,4)
+     * c := [2] (-,-,-,-)
+     * b := [1] (-,-;-,-)
+     * </pre>
+     *
+     * where [x] is the depth of the buddies on the corresponding page and ";"
+     * indicates a buddy bucket boundary while "," indicates a tuple boundary.
+     * <p>
+     * If we then attempt to insert a key which would be directed into (a)
+     *
+     * <pre>
+     * insert(0x20,...)
+     * </pre>
+     *
+     * then we must split (a) since depth(a):=2 and depth(root):=2. This will
+     * introduce a new directory page (d).
+     *
+     * <h3>depth(d) := 1</h3>
+     *
+     * This gives us the following post-condition.
+     *
+     * <pre>
+     * root := [2] (d,d,b,b)
+     * d := [1] (a,a;c,c)   // two ptrs to (d) so 2 buddies on the page
+     * a := [1] (1,2;3,4)   // depth changes since now 2 ptrs to (a)
+     * c := [1] (-,-;-,-)   // depth changes since now 2 ptrs to (c)
+     * b := [1] (-,-;-,-)
+     * </pre>
+     *
+     * Regardless of the value of [addressBits], this design gives us
+     * [addressBits] buddies on (d) and each buddy has two slots (since the
+     * depth of (d) is ONE, each buddy on (d) has a one bit address space and
+     * hence uses two slots). The depth(a) will always be reset to ONE by this
+     * design since there will always be TWO pointers to (a) in (d). This
+     * design provides ONE (1) bit of additional distinctions along the path
+     * for which we have exhausted the hash tree address space.
+     *
+     * <h3>depth(d) := addressBits</h3>
+     *
+     * This gives us the following post-condition.
+     *
+     * <pre>
+     * root := [2] (d,c,b,b)
+     * d := [2] (a,a,a,a)   // one ptr to (d) so 1 buddy on the page
+     * a := [0] (1;2;3;4)   // depth changes since now 4 ptrs to (a).
+     * c := [2] (-,-,-,-)
+     * b := [1] (-,-;-,-)
+     * </pre>
+     *
+     * In this design, we always wind up with ONE buddy on (d), the depth(d)
+     * is [addressBits], and the depth(a) is reset to ZERO (0). This design
+     * focuses the expansion in the address space of the hash tree narrowly on
+     * the specific key prefix for which we have run out of distinctions and
+     * gives us [addressBits] of additional distinctions along that path.
+     *
+     * <h3>Conclusion</h3>
+     *
+     * Both designs would appear to be valid. Neither one can lead to a
+     * situation in which we have multiple parents for a child. In the first
+     * design, the one-bit expansion means that we never have pointers to the
+     * same child in more than one buddy bucket, and hence they will all be on
+     * the same page. In the second design, the depth of the new directory
+     * page is already at the maximum possible value so it can not be split
+     * again and thus the pointers to the child will always remain on the same
+     * page.
+     * <p>
+     * It seems that the first design has the advantage of growing the #of
+     * distinctions more slowly and sharing the new directory page among
+     * multiple such distinctions (all keys having the same leading bit). In
+     * the second design, we add a full [addressBits] of distinctions at once
+     * to keys having the same [addressBits] leading bits.
+     * <p>
+     * It would appear that any choice in the inclusive range (1:addressBits)
+     * is permissible as in all cases the pointers to (a) will lie within a
+     * single buddy bucket. By factoring out the #of additional bits of
+     * distinction to be made when we split a directory page, we can defer
+     * this design decision to construction time (or perhaps even to runtime).
+     * I have therefore introduced an additional parameter on the
+     * {@link HTree} for this purpose.
+     *
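+     * <p>
+     * A minimal sketch of the depth arithmetic used in the two designs above
+     * (the class and method names here are illustrative only, not part of
+     * the HTree API):
+     *
+     * <pre>
+     * class DepthMath {
+     *     // #of pointers in a parent to a given child.
+     *     static int npointers(int globalDepth, int localDepth) {
+     *         return 1 << (globalDepth - localDepth);
+     *     }
+     *     // #of buddies on a page.
+     *     static int buddiesOnPage(int addressBits, int localDepth) {
+     *         return 1 << (addressBits - localDepth);
+     *     }
+     *     // #of slots in each buddy on a page.
+     *     static int slotsPerBuddy(int localDepth) {
+     *         return 1 << localDepth;
+     *     }
+     * }
+     * // depth(d) := 1 (with addressBits := 2): npointers(2,1) == 2 pointers
+     * // to (d) in the root, buddiesOnPage(2,1) == 2 buddies on (d), and
+     * // slotsPerBuddy(1) == 2 slots per buddy.
+     * // depth(d) := addressBits: npointers(2,2) == 1, buddiesOnPage(2,2) == 1,
+     * // and slotsPerBuddy(2) == 4 slots in the single buddy.
+     * </pre>
+     *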
     * @param oldParent
     *            The parent {@link DirectoryPage}.
     * @param buddyOffset
     *            The buddyOffset within the <i>oldParent</i>. This identifies
     *            which buddy hash table in the parent must have its pointers
     *            updated such that it points to both the original child and
     *            new child.
     * @param child
-    *            The child.
+    *            The child, which can be a {@link DirectoryPage} or a
+    *            {@link BucketPage}.
     *
     * @throws IllegalArgumentException
     *             if any argument is <code>null</code>.
+    * @throws IllegalArgumentException
+    *             if <i>splitBits</i> is non-positive.
+    * @throws IllegalArgumentException
+    *             if <i>splitBits</i> is GT {@link #getAddressBits()}.
     * @throws IllegalStateException
     *             if the depth of the child is GTE the depth of the parent.
     * @throws IllegalStateException
@@ -970,8 +1078,9 @@
     *             if the parent of the <i>oldBucket</i> is not the given
     *             <i>parent</i>.
     */
-    private void addDirectoryPageAndSplitBucketPage(final DirectoryPage oldParent,
-            final int buddyOffset, final AbstractPage child) {
+    // Note: package private for unit tests.
+    void splitDirectoryPage(final DirectoryPage oldParent,
+            final int buddyOffset, final int splitBits, final AbstractPage child) {
         if (oldParent == null)
             throw new IllegalArgumentException();
         if (child == null)
@@ -995,6 +1104,10 @@
              */
             throw new IllegalArgumentException();
         }
+        if (splitBits <= 0)
+            throw new IllegalArgumentException();
+        if (splitBits > addressBits)
+            throw new IllegalArgumentException();
         if (oldParent.isReadOnly()) // must be mutable.
             throw new IllegalStateException();
         if (child.isReadOnly()) // must be mutable.
@@ -1006,8 +1119,10 @@
             log.debug("parent=" + oldParent.toShortString() + ", buddyOffset="
                     + buddyOffset + ", child=" + child);
 
-        // Allocate a new directory page. The global depth will be ONE (1).
-        final DirectoryPage newParent = new DirectoryPage(this, 1/* globalDepth */);
+        assert splitBits == addressBits : "FIXME Handle splitBits NE addressBits.";
+
+        // Allocate a new directory page.
+        final DirectoryPage newParent = new DirectoryPage(this, splitBits/* globalDepth */);
 
         // Set the parent Reference on the new dir page to the old dir page.
         newParent.parent = (Reference) oldParent.self;
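The range checks on [splitBits] make the documented contract enforceable. A minimal sketch of how a caller would trip them, reusing the [htree], [root] and (a) names from the unit tests below (illustrative only, not part of this change set):

    // splitBits must lie in the inclusive range [1:addressBits].
    try {
        htree.splitDirectoryPage(root/* oldParent */, 0/* buddyOffset */,
                0/* splitBits */, a/* child */);
        fail("expecting: " + IllegalArgumentException.class);
    } catch (IllegalArgumentException ex) {
        // ignore expected exception.
    }
    try {
        htree.splitDirectoryPage(root/* oldParent */, 0/* buddyOffset */,
                addressBits + 1/* splitBits */, a/* child */);
        fail("expecting: " + IllegalArgumentException.class);
    } catch (IllegalArgumentException ex) {
        // ignore expected exception.
    }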
Modified: branches/QUADS_QUERY_BRANCH/bigdata/src/test/com/bigdata/htree/TestHTree.java
===================================================================
--- branches/QUADS_QUERY_BRANCH/bigdata/src/test/com/bigdata/htree/TestHTree.java	2011-05-19 17:25:26 UTC (rev 4527)
+++ branches/QUADS_QUERY_BRANCH/bigdata/src/test/com/bigdata/htree/TestHTree.java	2011-05-19 20:03:02 UTC (rev 4528)
@@ -405,6 +405,7 @@
          * before the insert could succeed. Choosing 0x20 thus minimizes the
          * #of unobserved intermediate states of the hash tree.
          *
+         *
          * FIXME This is causing a repeated introduction of a new directory
          * level and giving me a hash tree structure which differs from my
          * expectations. Work through the example on paper. Make sure that
          *
          * TODO Do an alternative example where we insert 0x05 at this step
          * instead of 0x10.
-         */
-        htree.insert(k20, v20);
-        assertEquals("nnodes", 2, htree.getNodeCount());
-        assertEquals("nleaves", 4, htree.getLeafCount());
-        assertEquals("nentries", 5, htree.getEntryCount());
-        htree.dump(Level.ALL, System.err, true/* materialize */);
-        assertTrue(root == htree.getRoot());
-        assertEquals(4, root.childRefs.length);
-        final DirectoryPage d = (DirectoryPage) root.childRefs[0].get();
-        assertTrue(d == (DirectoryPage) root.childRefs[0].get());
-        assertTrue(c == (BucketPage) root.childRefs[1].get());
-        assertTrue(b == (BucketPage) root.childRefs[2].get());
-        assertTrue(b == (BucketPage) root.childRefs[3].get());
-        assertEquals(4, d.childRefs.length);
-        final BucketPage e = (BucketPage) d.childRefs[1].get();
-        assertTrue(a == (BucketPage) d.childRefs[0].get()); // FIXME This shows that the buddy references do not have to be contiguous!
-        assertTrue(e == (BucketPage) d.childRefs[1].get());
-        assertTrue(a == (BucketPage) d.childRefs[2].get());
-        assertTrue(a == (BucketPage) d.childRefs[3].get());
-        assertEquals(2, root.getGlobalDepth());
-        assertEquals(1, d.getGlobalDepth());
-        assertEquals(1, a.getGlobalDepth());// recomputed!
-        assertEquals(1, e.getGlobalDepth());// same as [a] (recomputed).
-        assertEquals(1, b.getGlobalDepth());// unchanged.
-        assertEquals(2, c.getGlobalDepth());// recomputed!
-        assertEquals(2, a.getKeyCount());
-        assertEquals(2, a.getValueCount());
-        assertEquals(3, e.getKeyCount());
-        assertEquals(3, e.getValueCount());
-        assertEquals(0, b.getKeyCount());
-        assertEquals(0, b.getValueCount());
-        assertEquals(0, c.getKeyCount());
-        assertEquals(0, c.getValueCount());
-        assertTrue(htree.contains(k1));
-        assertTrue(htree.contains(k2));
-        assertTrue(htree.contains(k3));
-        assertTrue(htree.contains(k4));
-        assertTrue(htree.contains(k20));
-        assertFalse(htree.contains(unused));
-        assertEquals(v1, htree.lookupFirst(k1));
-        assertEquals(v2, htree.lookupFirst(k2));
-        assertEquals(v3, htree.lookupFirst(k3));
-        assertEquals(v4, htree.lookupFirst(k4));
-        assertEquals(v20, htree.lookupFirst(k20));
-        assertNull(htree.lookupFirst(unused));
-        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v1 }, htree
-                .lookupAll(k1));
-        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v2 }, htree
-                .lookupAll(k2));
-        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v3 }, htree
-                .lookupAll(k3));
-        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v4 }, htree
-                .lookupAll(k4));
-        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v20 }, htree
-                .lookupAll(k20));
-        AbstractBTreeTestCase.assertSameIterator(new byte[][] {}, htree
-                .lookupAll(unused));
-
-        /* FIXME Update comments to reflect the example (and update the
-         * worksheet as well).
-         *
-         * FIXME We MUST NOT decrease the localDepth of (a) since that would
-         * place duplicate keys into different buckets (lost retrieval).
-         * Since (a) can not have its localDepth decreased, we need to have
-         * localDepth(d=2) with one pointer to (a) to get localDepth(a:=2).
-         * That makes this a very complicated example. It would be a lot
-         * easier if we started with distinct keys such that the keys in (a)
-         * could be redistributed. This case should be reserved for later
-         * once the structural mutations are better understood.
-         *
-         * 5. Insert 0x20. This key is directed into the same buddy bucket
-         * as the 0x01 keys that we have been inserting. That buddy bucket
-         * is already full and, further, it is the only buddy bucket on the
-         * page. Since we are not inserting a duplicate key we can split the
-         * buddy bucket (rather than doubling the size of the bucket).
-         * However, since global depth == local depth (i.e., only one buddy
-         * bucket on the page), this split will introduce a new directory
-         * page. The new directory page basically codes for the additional
-         * prefix bits required to differentiate the two distinct keys such
-         * that they are directed into the appropriate buckets.
-         *
-         * The new directory page (d) is inserted one level below the root
-         * on the path leading to (a). The new directory must have a local
-         * depth of ONE (1), since it will add a one bit distinction. Since
-         * we know the global depth of the root and the local depth of the
-         * new directory page, we solve for npointers := 1 << (globalDepth -
-         * localDepth). This tells us that we will have 2 pointers in the
-         * root to the new directory page.
-         *
-         * The precondition state of root is {a,c,b,b}. Since we know that
-         * we need npointers:=2 pointers to (d), this means that we will
-         * copy the {a,c} references into (d) and replace those references
-         * in the root with references to (d). The root is now {d,d,b,b}.
-         * (d) is now {a,a;c,c}.
-         * Since d has a local depth of 1 and address bits of 2, it is
-         * comprised of two buddy hash tables {a,a} and {c,c}.
-         *
-         * Linking in (d) has also changed the local depths of (a) and (c).
-         * Since they each now have npointers:=2, their localDepth has been
-         * reduced from TWO (2) to ONE (1) (and their transient cached depth
-         * values must be either invalidated or recomputed). Note that the
-         * local depth of (d) and its children (a,c) are ONE after this
-         * operation so if we force another split in (a) that will force a
-         * split in (d).
-         *
-         * Having introduced a new directory page into the hash tree, we now
-         * retry the insert. Once again, the insert is directed into (a).
-         * Since (a) is still full it is split. (d) is now the parent of
-         * (a). Once again, we have globalDepth(d=1)==localDepth(a=1) so we
-         * need to split (d). However, since localDepth(d=1) is less than
-         * globalDepth(root=2) we can split the buddy hash tables in (d).
-         * This will require us to allocate a new page (f) which will be the
-         * right sibling of (d). There are TWO (2) buddy hash tables in (d).
-         * They are now redistributed between (d) and (f). We also have to
-         * update the pointers to (d) in the parent such that 1/2 of them
-         * point to the new right sibling of (d). Since we have changed the
-         * #of pointers to (d) (from 2 to 1) the local depth of (d) (and of
-         * f) is now TWO (2). Since globalDepth(d=2) is greater than
-         * localDepth(a=1) we can now split (a) into (a,e), redistribute the
-         * tuples in the sole buddy page (a) between (a,e) and update the
-         * pointers in (d) to (a,a,e,e).
-         *
-         * Having split (d) into (d,f), we now retry the insert. This time
-         * the insert is directed into (e). There is room in (e) (it is
-         * empty) and the tuple is inserted without further mutation to the
-         * structure of the hash tree.
-         *
-         * TODO At this point we should also prune the buckets [b] and [c]
-         * since they are empty and replace them with null references.
-         *
-         * TODO The #of new directory levels which have to be introduced
-         * here is a function of the #of prefix bits which have to be
-         * consumed before a distinction can be made between the existing
-         * key (0x01) and the new key (0x20). With an address space of 2
-         * bits, each directory level examines the next 2-bits of the key.
-         * The key (x20) was chosen since the distinction can be made by
-         * adding only one directory level (the keys differ in the first 4
-         * bits). [Do an alternative example which requires recursive splits
-         * in order to verify that we reenter the logic correctly each time.
-         * E.g., by inserting 0x02 rather than 0x20.]
-         *
-         * TODO Do an example in which we explore an insert which introduces
-         * a new directory level in a 3-level tree. This should be driven
-         * by a single insert so we can examine in depth how the new
-         * directory is introduced and verify whether it is introduced below
-         * the root or above the bucket. [I believe that it is introduced
-         * immediately below the root. Note that a balanced tree, e.g., a
-         * B-Tree, introduces the new level above the root. However, the
-         * HTree is intended to be unbalanced in order to optimize storage
-         * and access times to the parts of the index which correspond to
-         * unequal distributions in the hash codes.]
+         * TODO This version tunnels to package private methods in order to
+         * test the intermediate states: (A) Do an alternative version with
+         * a different value for [splitBits]; (B) Do an alternative using
+         * htree.insert() (high level API); (C) Do low level tests of the
+         * BucketPage insert/lookup API.
+         */
-//        htree.insert(new byte[] { 0x20 }, new byte[] { 0x20 });
-//        assertEquals("nnodes", 2, htree.getNodeCount());
-//        assertEquals("nleaves", 4, htree.getLeafCount());
-//        assertEquals("nentries", 5, htree.getEntryCount());
-//        htree.dump(Level.ALL, System.err, true/* materialize */);
-//        assertTrue(root == htree.getRoot());
-//        assertEquals(4, root.childRefs.length);
-//        final DirectoryPage d = (DirectoryPage)root.childRefs[0].get();
-//        assertTrue(d == (DirectoryPage) root.childRefs[0].get());
-//        assertTrue(d == (DirectoryPage) root.childRefs[1].get());
-//        assertTrue(b == (BucketPage) root.childRefs[2].get());
-//        assertTrue(b == (BucketPage) root.childRefs[3].get());
-//        assertEquals(4, d.childRefs.length);
-//        final BucketPage e = (BucketPage)d.childRefs[1].get();
-//        assertTrue(a == (BucketPage) d.childRefs[0].get());
-//        assertTrue(e == (BucketPage) d.childRefs[1].get());
-//        assertTrue(c == (BucketPage) d.childRefs[2].get());
-//        assertTrue(c == (BucketPage) d.childRefs[3].get());
-//        assertEquals(2, root.getGlobalDepth());
-//        assertEquals(2, d.getGlobalDepth());
-//        assertEquals(2, a.getGlobalDepth());// unchanged
-//        assertEquals(2, e.getGlobalDepth());// same as [a].
-//        assertEquals(1, b.getGlobalDepth());// unchanged.
-//        assertEquals(2, c.getGlobalDepth());// unchanged.
-//        assertTrue(htree.contains(new byte[] { 0x01 }));
-//        assertFalse(htree.contains(new byte[] { 0x02 }));
-//        assertEquals(new byte[] { 0x01 }, htree
-//                .lookupFirst(new byte[] { 0x01 }));
-//        assertNull(htree.lookupFirst(new byte[] { 0x02 }));
-//        AbstractBTreeTestCase.assertSameIterator(
-//                //
-//                new byte[][] { new byte[] { 0x01 }, new byte[] { 0x01 },
-//                        new byte[] { 0x01 }, new byte[] { 0x01 } }, htree
-//                        .lookupAll(new byte[] { 0x01 }));
-//        AbstractBTreeTestCase.assertSameIterator(//
-//                new byte[][] {}, htree.lookupAll(new byte[] { 0x02 }));
+//        // htree.insert(k20, v20);
+//        // verify that [a] will not accept an insert.
+//        assertFalse(a.insert(k20, v20, root/* parent */, 0/* buddyOffset */));
+//        // split the root directory page.
+//        htree.splitDirectoryPage(root/* oldParent */,
+//                0/* buddyOffset */, 2/*splitBits*/, a/* child */);
+//        assertEquals("nnodes", 2, htree.getNodeCount());
+//        assertEquals("nleaves", 3, htree.getLeafCount()); // unchanged
+//        assertEquals("nentries", 4, htree.getEntryCount()); // unchanged
+//        htree.dump(Level.ALL, System.err, true/* materialize */);
+//        assertTrue(root == htree.getRoot());
+//        assertEquals(4, root.childRefs.length);
+//        final DirectoryPage d = (DirectoryPage) root.childRefs[0].get();
+//        assertTrue(d == (DirectoryPage) root.childRefs[0].get());
+//        assertTrue(d == (DirectoryPage) root.childRefs[1].get());
+//        assertTrue(b == (BucketPage) root.childRefs[2].get());
+//        assertTrue(b == (BucketPage) root.childRefs[3].get());
+//        assertEquals(4, d.childRefs.length);
+////        final BucketPage e = (BucketPage) d.childRefs[1].get();
+//        assertTrue(a == (BucketPage) d.childRefs[0].get());
+//        assertTrue(a == (BucketPage) d.childRefs[1].get());
+//        assertTrue(c == (BucketPage) d.childRefs[2].get());
+//        assertTrue(c == (BucketPage) d.childRefs[3].get());
+//        assertEquals(2, root.getGlobalDepth());
+//        assertEquals(1, d.getGlobalDepth());
+//        assertEquals(1, a.getGlobalDepth());// recomputed!
+//        assertEquals(1, e.getGlobalDepth());// same as [a] (recomputed).
+//        assertEquals(1, b.getGlobalDepth());// unchanged.
+//        assertEquals(2, c.getGlobalDepth());// recomputed!
+//        assertEquals(2, a.getKeyCount());
+//        assertEquals(2, a.getValueCount());
+////        assertEquals(3, e.getKeyCount());
+////        assertEquals(3, e.getValueCount());
+//        assertEquals(0, b.getKeyCount());
+//        assertEquals(0, b.getValueCount());
+//        assertEquals(0, c.getKeyCount());
+//        assertEquals(0, c.getValueCount());
+//        assertTrue(htree.contains(k1));
+//        assertTrue(htree.contains(k2));
+//        assertTrue(htree.contains(k3));
+//        assertTrue(htree.contains(k4));
+//        assertTrue(htree.contains(k20));
+//        assertFalse(htree.contains(unused));
+//        assertEquals(v1, htree.lookupFirst(k1));
+//        assertEquals(v2, htree.lookupFirst(k2));
+//        assertEquals(v3, htree.lookupFirst(k3));
+//        assertEquals(v4, htree.lookupFirst(k4));
+//        assertEquals(v20, htree.lookupFirst(k20));
+//        assertNull(htree.lookupFirst(unused));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v1 }, htree
+//                .lookupAll(k1));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v2 }, htree
+//                .lookupAll(k2));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v3 }, htree
+//                .lookupAll(k3));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v4 }, htree
+//                .lookupAll(k4));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v20 }, htree
+//                .lookupAll(k20));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] {}, htree
+//                .lookupAll(unused));
+// TODO Continue test (perhaps only for the high level variant).
+//        /*
+//         * htree.insert() variant.
+//         */
+//        assertEquals("nnodes", 2, htree.getNodeCount());
+//        assertEquals("nleaves", 4, htree.getLeafCount());
+//        assertEquals("nentries", 5, htree.getEntryCount());
+//        htree.dump(Level.ALL, System.err, true/* materialize */);
+//        assertTrue(root == htree.getRoot());
+//        assertEquals(4, root.childRefs.length);
+//        final DirectoryPage d = (DirectoryPage) root.childRefs[0].get();
+//        assertTrue(d == (DirectoryPage) root.childRefs[0].get());
+//        assertTrue(c == (BucketPage) root.childRefs[1].get());
+//        assertTrue(b == (BucketPage) root.childRefs[2].get());
+//        assertTrue(b == (BucketPage) root.childRefs[3].get());
+//        assertEquals(4, d.childRefs.length);
+//        final BucketPage e = (BucketPage) d.childRefs[1].get();
+//        assertTrue(a == (BucketPage) d.childRefs[0].get()); // FIXME This shows that the buddy references do not have to be contiguous!
+//        assertTrue(e == (BucketPage) d.childRefs[1].get());
+//        assertTrue(a == (BucketPage) d.childRefs[2].get());
+//        assertTrue(a == (BucketPage) d.childRefs[3].get());
+//        assertEquals(2, root.getGlobalDepth());
+//        assertEquals(1, d.getGlobalDepth());
+//        assertEquals(1, a.getGlobalDepth());// recomputed!
+//        assertEquals(1, e.getGlobalDepth());// same as [a] (recomputed).
+//        assertEquals(1, b.getGlobalDepth());// unchanged.
+//        assertEquals(2, c.getGlobalDepth());// recomputed!
+//        assertEquals(2, a.getKeyCount());
+//        assertEquals(2, a.getValueCount());
+//        assertEquals(3, e.getKeyCount());
+//        assertEquals(3, e.getValueCount());
+//        assertEquals(0, b.getKeyCount());
+//        assertEquals(0, b.getValueCount());
+//        assertEquals(0, c.getKeyCount());
+//        assertEquals(0, c.getValueCount());
+//        assertTrue(htree.contains(k1));
+//        assertTrue(htree.contains(k2));
+//        assertTrue(htree.contains(k3));
+//        assertTrue(htree.contains(k4));
+//        assertTrue(htree.contains(k20));
+//        assertFalse(htree.contains(unused));
+//        assertEquals(v1, htree.lookupFirst(k1));
+//        assertEquals(v2, htree.lookupFirst(k2));
+//        assertEquals(v3, htree.lookupFirst(k3));
+//        assertEquals(v4, htree.lookupFirst(k4));
+//        assertEquals(v20, htree.lookupFirst(k20));
+//        assertNull(htree.lookupFirst(unused));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v1 }, htree
+//                .lookupAll(k1));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v2 }, htree
+//                .lookupAll(k2));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v3 }, htree
+//                .lookupAll(k3));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v4 }, htree
+//                .lookupAll(k4));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v20 }, htree
+//                .lookupAll(k20));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] {}, htree
+//                .lookupAll(unused));
+//
+//        /* FIXME Update comments to reflect the example (and update the
+//         * worksheet as well).
+//         *
+//         * FIXME We MUST NOT decrease the localDepth of (a) since that would
+//         * place duplicate keys into different buckets (lost retrieval).
+//         * Since (a) can not have its localDepth decreased, we need to have
+//         * localDepth(d=2) with one pointer to (a) to get localDepth(a:=2).
+//         * That makes this a very complicated example. It would be a lot
+//         * easier if we started with distinct keys such that the keys in (a)
+//         * could be redistributed. This case should be reserved for later
+//         * once the structural mutations are better understood.
+//         *
+//         * 5. Insert 0x20. This key is directed into the same buddy bucket
+//         * as the 0x01 keys that we have been inserting.
+//         * That buddy bucket is already full and, further, it is the only
+//         * buddy bucket on the page. Since we are not inserting a duplicate
+//         * key we can split the buddy bucket (rather than doubling the size
+//         * of the bucket). However, since global depth == local depth (i.e.,
+//         * only one buddy bucket on the page), this split will introduce a
+//         * new directory page. The new directory page basically codes for
+//         * the additional prefix bits required to differentiate the two
+//         * distinct keys such that they are directed into the appropriate
+//         * buckets.
+//         *
+//         * The new directory page (d) is inserted one level below the root
+//         * on the path leading to (a). The new directory must have a local
+//         * depth of ONE (1), since it will add a one bit distinction. Since
+//         * we know the global depth of the root and the local depth of the
+//         * new directory page, we solve for npointers := 1 << (globalDepth -
+//         * localDepth). This tells us that we will have 2 pointers in the
+//         * root to the new directory page.
+//         *
+//         * The precondition state of root is {a,c,b,b}. Since we know that
+//         * we need npointers:=2 pointers to (d), this means that we will
+//         * copy the {a,c} references into (d) and replace those references
+//         * in the root with references to (d). The root is now {d,d,b,b}.
+//         * (d) is now {a,a;c,c}. Since d has a local depth of 1 and address
+//         * bits of 2, it is comprised of two buddy hash tables {a,a} and
+//         * {c,c}.
+//         *
+//         * Linking in (d) has also changed the local depths of (a) and (c).
+//         * Since they each now have npointers:=2, their localDepth has been
+//         * reduced from TWO (2) to ONE (1) (and their transient cached
+//         * depth values must be either invalidated or recomputed). Note that
+//         * the local depth of (d) and its children (a,c) are ONE after this
+//         * operation so if we force another split in (a) that will force a
+//         * split in (d).
+//         *
+//         * Having introduced a new directory page into the hash tree, we now
+//         * retry the insert. Once again, the insert is directed into (a).
+//         * Since (a) is still full it is split. (d) is now the parent of
+//         * (a). Once again, we have globalDepth(d=1)==localDepth(a=1) so we
+//         * need to split (d). However, since localDepth(d=1) is less than
+//         * globalDepth(root=2) we can split the buddy hash tables in (d).
+//         * This will require us to allocate a new page (f) which will be the
+//         * right sibling of (d). There are TWO (2) buddy hash tables in (d).
+//         * They are now redistributed between (d) and (f). We also have to
+//         * update the pointers to (d) in the parent such that 1/2 of them
+//         * point to the new right sibling of (d). Since we have changed the
+//         * #of pointers to (d) (from 2 to 1) the local depth of (d) (and of
+//         * f) is now TWO (2). Since globalDepth(d=2) is greater than
+//         * localDepth(a=1) we can now split (a) into (a,e), redistribute the
+//         * tuples in the sole buddy page (a) between (a,e) and update the
+//         * pointers in (d) to (a,a,e,e).
+//         *
+//         * Having split (d) into (d,f), we now retry the insert. This time
+//         * the insert is directed into (e). There is room in (e) (it is
+//         * empty) and the tuple is inserted without further mutation to the
+//         * structure of the hash tree.
+//         *
+//         * TODO At this point we should also prune the buckets [b] and [c]
+//         * since they are empty and replace them with null references.
+//         *
+//         * TODO The #of new directory levels which have to be introduced
+//         * here is a function of the #of prefix bits which have to be
+//         * consumed before a distinction can be made between the existing
+//         * key (0x01) and the new key (0x20). With an address space of 2
+//         * bits, each directory level examines the next 2-bits of the key.
+//         * The key (x20) was chosen since the distinction can be made by
+//         * adding only one directory level (the keys differ in the first 4
+//         * bits). [Do an alternative example which requires recursive splits
+//         * in order to verify that we reenter the logic correctly each time.
+//         * E.g., by inserting 0x02 rather than 0x20.]
+//         *
+//         * TODO Do an example in which we explore an insert which introduces
+//         * a new directory level in a 3-level tree. This should be driven
+//         * by a single insert so we can examine in depth how the new
+//         * directory is introduced and verify whether it is introduced below
+//         * the root or above the bucket. [I believe that it is introduced
+//         * immediately below the root. Note that a balanced tree, e.g., a
+//         * B-Tree, introduces the new level above the root. However, the
+//         * HTree is intended to be unbalanced in order to optimize storage
+//         * and access times to the parts of the index which correspond to
+//         * unequal distributions in the hash codes.]
+//         */
+////        htree.insert(new byte[] { 0x20 }, new byte[] { 0x20 });
+////        assertEquals("nnodes", 2, htree.getNodeCount());
+////        assertEquals("nleaves", 4, htree.getLeafCount());
+////        assertEquals("nentries", 5, htree.getEntryCount());
+////        htree.dump(Level.ALL, System.err, true/* materialize */);
+////        assertTrue(root == htree.getRoot());
+////        assertEquals(4, root.childRefs.length);
+////        final DirectoryPage d = (DirectoryPage)root.childRefs[0].get();
+////        assertTrue(d == (DirectoryPage) root.childRefs[0].get());
+////        assertTrue(d == (DirectoryPage) root.childRefs[1].get());
+////        assertTrue(b == (BucketPage) root.childRefs[2].get());
+////        assertTrue(b == (BucketPage) root.childRefs[3].get());
+////        assertEquals(4, d.childRefs.length);
+////        final BucketPage e = (BucketPage)d.childRefs[1].get();
+////        assertTrue(a == (BucketPage) d.childRefs[0].get());
+////        assertTrue(e == (BucketPage) d.childRefs[1].get());
+////        assertTrue(c == (BucketPage) d.childRefs[2].get());
+////        assertTrue(c == (BucketPage) d.childRefs[3].get());
+////        assertEquals(2, root.getGlobalDepth());
+////        assertEquals(2, d.getGlobalDepth());
+////        assertEquals(2, a.getGlobalDepth());// unchanged
+////        assertEquals(2, e.getGlobalDepth());// same as [a].
+////        assertEquals(1, b.getGlobalDepth());// unchanged.
+////        assertEquals(2, c.getGlobalDepth());// unchanged.
+////        assertTrue(htree.contains(new byte[] { 0x01 }));
+////        assertFalse(htree.contains(new byte[] { 0x02 }));
+////        assertEquals(new byte[] { 0x01 }, htree
+////                .lookupFirst(new byte[] { 0x01 }));
+////        assertNull(htree.lookupFirst(new byte[] { 0x02 }));
+////        AbstractBTreeTestCase.assertSameIterator(
+////                //
+////                new byte[][] { new byte[] { 0x01 }, new byte[] { 0x01 },
+////                        new byte[] { 0x01 }, new byte[] { 0x01 } }, htree
+////                        .lookupAll(new byte[] { 0x01 }));
+////        AbstractBTreeTestCase.assertSameIterator(//
+////                new byte[][] {}, htree.lookupAll(new byte[] { 0x02 }));
+
         // TODO REMOVE (or test suite for remove).
 
         // TODO Continue progression here?
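The choice of 0x20 in the TODO notes above is what keeps the example down to a single new directory level. A sketch of the bit slicing involved (KeyBits is an editorial illustration, not the HTree API; it reads the key MSB-first, [addressBits] bits at a time, as each directory level does):

    class KeyBits {
        // Extract nbits from the key, MSB-first, starting at bitOffset.
        static int slotBits(final byte[] key, final int bitOffset,
                final int nbits) {
            int v = 0;
            for (int i = 0; i < nbits; i++) {
                final int bit = bitOffset + i;
                v = (v << 1) | ((key[bit >> 3] >> (7 - (bit & 7))) & 1);
            }
            return v;
        }
    }

With addressBits := 2, slotBits(new byte[]{0x01}, 0, 2) and slotBits(new byte[]{0x20}, 0, 2) are both 0, so the keys collide at the root level, while slotBits(new byte[]{0x01}, 2, 2) == 0 and slotBits(new byte[]{0x20}, 2, 2) == 2, so the keys separate after one additional 2-bit directory level -- they differ within the first 4 bits.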
@@ -621,6 +693,198 @@
     }
 
+    /**
+     * Unit test for
+     * {@link HTree#splitDirectoryPage(DirectoryPage, int, int, com.bigdata.htree.HTree.AbstractPage)}
+     * .
+     */
+    public void test_splitDir_addressBits2_splitBits1() {
+
+        final int addressBits = 2;
+
+        final IRawStore store = new SimpleMemoryRawStore();
+
+        try {
+
+            final byte[] k1 = new byte[]{0x01};
+            final byte[] k2 = new byte[]{0x02};
+            final byte[] k3 = new byte[]{0x03};
+            final byte[] k4 = new byte[]{0x04};
+            final byte[] k20 = new byte[]{0x20};
+
+            final byte[] v1 = new byte[]{0x01};
+            final byte[] v2 = new byte[]{0x02};
+            final byte[] v3 = new byte[]{0x03};
+            final byte[] v4 = new byte[]{0x04};
+            final byte[] v20 = new byte[]{0x20};
+
+            final HTree htree = new HTree(store, addressBits);
+
+            final DirectoryPage root = htree.getRoot();
+
+            htree.insert(k1, v1);
+            htree.insert(k2, v2);
+            htree.insert(k3, v3);
+            htree.insert(k4, v4);
+
+            /*
+             * Verify preconditions for the unit test.
+             */
+            assertTrue(root == htree.getRoot());
+            final BucketPage a = (BucketPage) root.childRefs[0].get();
+            final BucketPage c = (BucketPage) root.childRefs[1].get();
+            final BucketPage b = (BucketPage) root.childRefs[2].get();
+            assertTrue(a == root.childRefs[0].get());
+            assertTrue(c == root.childRefs[1].get());
+            assertTrue(b == root.childRefs[2].get());
+            assertTrue(b == root.childRefs[3].get());
+            assertEquals(2, root.globalDepth);
+            assertEquals(2, a.globalDepth);
+            assertEquals(2, c.globalDepth);
+            assertEquals(1, b.globalDepth);
+
+            /*
+             * Force a split of the root directory page. Note that (a) has the
+             * same depth as the directory page we want to split, which is a
+             * precondition for being able to split the directory.
+             */
+
+            // verify that [a] will not accept an insert.
+            assertFalse(a
+                    .insert(k20, v20, root/* parent */, 0/* buddyOffset */));
+
+            // split the root directory page.
+            htree.splitDirectoryPage(root/* oldParent */, 0/* buddyOffset */,
+                    1/* splitBits */, a/* child */);
+
+            /*
+             * Verify post-conditions.
+             */
+
+            assertEquals("nnodes", 2, htree.getNodeCount());
+            assertEquals("nleaves", 3, htree.getLeafCount()); // unchanged
+            assertEquals("nentries", 4, htree.getEntryCount()); // unchanged
+
+            assertTrue(root == htree.getRoot());
+            final DirectoryPage d = (DirectoryPage) root.childRefs[0].get();
+            assertTrue(d == root.childRefs[0].get());
+            assertTrue(d == root.childRefs[1].get());
+            assertTrue(b == root.childRefs[2].get());
+            assertTrue(b == root.childRefs[3].get());
+            assertTrue(a == d.childRefs[0].get());
+            assertTrue(a == d.childRefs[1].get());
+            assertTrue(c == d.childRefs[2].get());
+            assertTrue(c == d.childRefs[3].get());
+            assertEquals(2, root.globalDepth);
+            assertEquals(1, d.globalDepth);
+            assertEquals(1, a.globalDepth);
+            assertEquals(1, c.globalDepth);
+            assertEquals(1, b.globalDepth);
+
+        } finally {
+
+            store.destroy();
+
+        }
+
+    }
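Taken together with the splitBits := 2 variant below, this pins down the post-condition as a function of [splitBits]: the new directory page has globalDepth == splitBits and holds 1 << splitBits pointers to the child being split (2 with splitBits := 1, 4 with splitBits := 2). A hedged sketch of a shared check (checkSplitPostCondition is an illustrative helper, not part of TestHTree):

    private void checkSplitPostCondition(final DirectoryPage root,
            final BucketPage child, final int splitBits) {
        // The new directory page was installed in the root's first slot.
        final DirectoryPage d = (DirectoryPage) root.childRefs[0].get();
        assertEquals(splitBits, d.globalDepth);
        // The new directory page holds 1 << splitBits pointers to the child.
        int n = 0;
        for (int i = 0; i < d.childRefs.length; i++) {
            if (d.childRefs[i].get() == child)
                n++;
        }
        assertEquals(1 << splitBits, n);
    }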
+    /**
+     * Unit test for
+     * {@link HTree#splitDirectoryPage(DirectoryPage, int, int, com.bigdata.htree.HTree.AbstractPage)}
+     * .
+     */
+    public void test_splitDir_addressBits2_splitBits2() {
+
+        final int addressBits = 2;
+
+        final IRawStore store = new SimpleMemoryRawStore();
+
+        try {
+
+            final byte[] k1 = new byte[]{0x01};
+            final byte[] k2 = new byte[]{0x02};
+            final byte[] k3 = new byte[]{0x03};
+            final byte[] k4 = new byte[]{0x04};
+            final byte[] k20 = new byte[]{0x20};
+
+            final byte[] v1 = new byte[]{0x01};
+            final byte[] v2 = new byte[]{0x02};
+            final byte[] v3 = new byte[]{0x03};
+            final byte[] v4 = new byte[]{0x04};
+            final byte[] v20 = new byte[]{0x20};
+
+            final HTree htree = new HTree(store, addressBits);
+
+            final DirectoryPage root = htree.getRoot();
+
+            htree.insert(k1, v1);
+            htree.insert(k2, v2);
+            htree.insert(k3, v3);
+            htree.insert(k4, v4);
+
+            /*
+             * Verify preconditions for the unit test.
+             */
+            assertTrue(root == htree.getRoot());
+            final BucketPage a = (BucketPage) root.childRefs[0].get();
+            final BucketPage c = (BucketPage) root.childRefs[1].get();
+            final BucketPage b = (BucketPage) root.childRefs[2].get();
+            assertTrue(a == root.childRefs[0].get());
+            assertTrue(c == root.childRefs[1].get());
+            assertTrue(b == root.childRefs[2].get());
+            assertTrue(b == root.childRefs[3].get());
+            assertEquals(2, root.globalDepth);
+            assertEquals(2, a.globalDepth);
+            assertEquals(2, c.globalDepth);
+            assertEquals(1, b.globalDepth);
+
+            /*
+             * Force a split of the root directory page. Note that (a) has the
+             * same depth as the directory page we want to split, which is a
+             * precondition for being able to split the directory.
+             */
+
+            // verify that [a] will not accept an insert.
+            assertFalse(a
+                    .insert(k20, v20, root/* parent */, 0/* buddyOffset */));
+
+            // split the root directory page.
+            htree.splitDirectoryPage(root/* oldParent */, 0/* buddyOffset */,
+                    2/* splitBits */, a/* child */);
+
+            /*
+             * Verify post-conditions.
+             */
+
+            assertEquals("nnodes", 2, htree.getNodeCount());
+            assertEquals("nleaves", 3, htree.getLeafCount()); // unchanged
+            assertEquals("nentries", 4, htree.getEntryCount()); // unchanged
+
+            assertTrue(root == htree.getRoot());
+            final DirectoryPage d = (DirectoryPage) root.childRefs[0].get();
+            assertTrue(d == root.childRefs[0].get());
+            assertTrue(c == root.childRefs[1].get());
+            assertTrue(b == root.childRefs[2].get());
+            assertTrue(b == root.childRefs[3].get());
+            assertTrue(a == d.childRefs[0].get());
+            assertTrue(a == d.childRefs[1].get());
+            assertTrue(a == d.childRefs[2].get());
+            assertTrue(a == d.childRefs[3].get());
+            assertEquals(2, root.globalDepth);
+            assertEquals(2, d.globalDepth);
+            assertEquals(0, a.globalDepth);
+            assertEquals(2, c.globalDepth);
+            assertEquals(1, b.globalDepth);
+
+        } finally {
+
+            store.destroy();
+
+        }
+
+    }
+
     /**
      * Test of basic insert, lookup, and split page operations (including
      * when a new directory page must be introduced) using an address space
      * with only TWO (2) bits.