From: <tho...@us...> - 2011-05-19 20:03:09
Revision: 4528
          http://bigdata.svn.sourceforge.net/bigdata/?rev=4528&view=rev
Author:   thompsonbry
Date:     2011-05-19 20:03:02 +0000 (Thu, 19 May 2011)

Log Message:
-----------
Broke out unit tests for splitting a directory page when the bucket page beneath it has the same depth as the directory page and introduced an argument (splitBits) to generalize the #of bits to be used in that split (rather than assuming ONE or [addressBits]).

Modified Paths:
--------------
    branches/QUADS_QUERY_BRANCH/bigdata/src/java/com/bigdata/htree/HTree.java
    branches/QUADS_QUERY_BRANCH/bigdata/src/test/com/bigdata/htree/TestHTree.java

Modified: branches/QUADS_QUERY_BRANCH/bigdata/src/java/com/bigdata/htree/HTree.java
===================================================================
--- branches/QUADS_QUERY_BRANCH/bigdata/src/java/com/bigdata/htree/HTree.java	2011-05-19 17:25:26 UTC (rev 4527)
+++ branches/QUADS_QUERY_BRANCH/bigdata/src/java/com/bigdata/htree/HTree.java	2011-05-19 20:03:02 UTC (rev 4528)
@@ -89,6 +89,14 @@
 
     private static final transient Logger log = Logger.getLogger(HTree.class);
 
+    /**
+     * The #of bits of distinction to be made each time we split a directory
+     * page in the {@link HTree}. See
+     * {@link #splitDirectoryPage(DirectoryPage, int, int, AbstractPage)} for a
+     * write-up on this.
+     */
+    private final int splitBits;
+
     /*
      * metadata about the index.
      *
@@ -180,7 +188,9 @@
 //        super(store, nodeFactory, readOnly, addressBits, metadata, recordCompressorFactory);
         super(store, false/*readOnly*/, addressBits);
-        
+
+        this.splitBits = addressBits;
+
 //        if (pageSize <= 0)
 //            throw new IllegalArgumentException("pageSize must be positive.");
 //
@@ -535,9 +545,9 @@
                      * global depth of a directory is always the same as
                      * address bits, but maybe I am missing something....
                      */
-                    addDirectoryPageAndSplitBucketPage(current,
-                            buddyOffset, bucketPage);
-
+                    splitDirectoryPage(current, buddyOffset, splitBits,
+                            bucketPage);
+
                     // The children of [current] have changed so we will
                     // search current again.
                     continue;
@@ -944,10 +954,103 @@
     }
 
     /**
-     * Introduce a new directory level when we need to split a child but
+     * Splits a {@link DirectoryPage} when we need to split a child but
      * <code>globalDepth == localDepth</code>. The caller must retry the insert
      * after this method makes the structural change.
      *
+     * <h2>Design discussion</h2>
+     *
+     * This method must maintain the invariant that the tree of page references
+     * for the hash tree is a strict tree. That is, you can not have two
+     * different pages each of which points to the same child. This would be a
+     * concurrency nightmare as, e.g., splitting the child could require us to
+     * propagate updates to multiple parents. However, even with that
+     * constraint we have two options.
+     * <p>
+     * Given addressBits := 2, the following hash tree state is induced by
+     * inserting the key sequence (0x01, 0x02, 0x03, 0x04).
+     *
+     * <pre>
+     * root := [2] (a,c,b,b)
+     * a := [2] (1,2,3,4)
+     * c := [2] (-,-,-,-)
+     * b := [1] (-,-;-,-)
+     * </pre>
+     *
+     * where [x] is the depth of the buddies on the corresponding page and ";"
+     * indicates a buddy bucket boundary while "," indicates a tuple boundary.
+     * <p>
+     * If we then attempt to insert a key which would be directed into (a)
+     *
+     * <pre>
+     * insert(0x20,...)
+     * </pre>
+     *
+     * then we must split (a) since depth(a):=2 and depth(root):=2. This will
+     * introduce a new directory page (d).
+     *
+     * <h3>depth(d) := 1</h3>
+     *
+     * This gives us the following post-condition.
+     *
+     * <pre>
+     * root := [2] (d,d,b,b)
+     * d := [1] (a,a;c,c)   // two ptrs to (d) so 2 buddies on the page
+     * a := [1] (1,2;3,4)   // depth changes since now 2 ptrs to (a)
+     * c := [1] (-,-;-,-)   // depth changes since now 2 ptrs to (c)
+     * b := [1] (-,-;-,-)
+     * </pre>
+     *
+     * Regardless of the value of [addressBits], this design gives us
+     * [addressBits] buddies on (d) and each buddy has two slots (since the
+     * depth of (d) is ONE, each buddy on (d) has a one bit address space and
+     * hence uses two slots). The depth(a) will always be reset to ONE by this
+     * design since there will always be TWO pointers to (a) in (d). This
+     * design provides ONE (1) bit of additional distinctions along the path
+     * for which we have exhausted the hash tree address space.
+     *
+     * <h3>depth(d) := addressBits</h3>
+     *
+     * This gives us the following post-condition.
+     *
+     * <pre>
+     * root := [2] (d,c,b,b)
+     * d := [2] (a,a,a,a)   // one ptr to (d) so 1 buddy on the page
+     * a := [0] (1;2;3;4)   // depth changes since now 4 ptrs to (a).
+     * c := [2] (-,-,-,-)
+     * b := [1] (-,-;-,-)
+     * </pre>
+     *
+     * In this design, we always wind up with ONE buddy on (d), the depth(d)
+     * is [addressBits], and the depth(a) is reset to ZERO (0). This design
+     * focuses the expansion in the address space of the hash tree narrowly on
+     * the specific key prefix for which we have run out of distinctions and
+     * gives us [addressBits] of additional distinctions along that path.
+     *
+     * <h3>Conclusion</h3>
+     *
+     * Both designs would appear to be valid. Neither one can lead to a
+     * situation in which we have multiple parents for a child. In the first
+     * design, the one-bit expansion means that we never have pointers to the
+     * same child in more than one buddy bucket, and hence they will all be on
+     * the same page. In the second design, the depth of the new directory
+     * page is already at the maximum possible value so it can not be split
+     * again and thus the pointers to the child will always remain on the same
+     * page.
+     * <p>
+     * It seems that the first design has the advantage of growing the #of
+     * distinctions more slowly and sharing the new directory page among
+     * multiple such distinctions (all keys having the same leading bit). In
+     * the second design, we add a full [addressBits] of distinctions at once
+     * to keys having the same [addressBits] leading bits.
+     * <p>
+     * It would appear that any choice in the inclusive range (1:addressBits)
+     * is permissible as in all cases the pointers to (a) will lie within a
+     * single buddy bucket. By factoring out the #of additional bits of
+     * distinction to be made when we split a directory page, we can defer
+     * this design decision to construction time (or perhaps even to runtime).
+     * I have therefore introduced an additional parameter on the
+     * {@link HTree} for this purpose.
+     *
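+     * <p>
+     * A minimal sketch of the depth arithmetic used in the two designs above
+     * (the class and method names here are illustrative only, not part of
+     * the HTree API):
+     *
+     * <pre>
+     * class DepthMath {
+     *     // #of pointers in a parent to a given child.
+     *     static int npointers(int globalDepth, int localDepth) {
+     *         return 1 << (globalDepth - localDepth);
+     *     }
+     *     // #of buddies on a page.
+     *     static int buddiesOnPage(int addressBits, int localDepth) {
+     *         return 1 << (addressBits - localDepth);
+     *     }
+     *     // #of slots in each buddy on a page.
+     *     static int slotsPerBuddy(int localDepth) {
+     *         return 1 << localDepth;
+     *     }
+     * }
+     * // depth(d) := 1 (with addressBits := 2): npointers(2,1) == 2 pointers
+     * // to (d) in the root, buddiesOnPage(2,1) == 2 buddies on (d), and
+     * // slotsPerBuddy(1) == 2 slots per buddy.
+     * // depth(d) := addressBits: npointers(2,2) == 1, buddiesOnPage(2,2) == 1,
+     * // and slotsPerBuddy(2) == 4 slots in the single buddy.
+     * </pre>
+     *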
     * @param oldParent
     *            The parent {@link DirectoryPage}.
     * @param buddyOffset
     *            The buddyOffset within the <i>oldParent</i>. This identifies
     *            which buddy hash table in the parent must have its pointers
     *            updated such that it points to both the original child and
     *            new child.
     * @param child
-    *            The child.
+    *            The child, which can be a {@link DirectoryPage} or a
+    *            {@link BucketPage}.
     *
     * @throws IllegalArgumentException
     *             if any argument is <code>null</code>.
+    * @throws IllegalArgumentException
+    *             if <i>splitBits</i> is non-positive.
+    * @throws IllegalArgumentException
+    *             if <i>splitBits</i> is GT {@link #getAddressBits()}.
     * @throws IllegalStateException
     *             if the depth of the child is GTE the depth of the parent.
     * @throws IllegalStateException
@@ -970,8 +1078,9 @@
     *             if the parent of the <i>oldBucket</i> is not the given
     *             <i>parent</i>.
     */
-    private void addDirectoryPageAndSplitBucketPage(final DirectoryPage oldParent,
-            final int buddyOffset, final AbstractPage child) {
+    // Note: package private for unit tests.
+    void splitDirectoryPage(final DirectoryPage oldParent,
+            final int buddyOffset, final int splitBits, final AbstractPage child) {
         if (oldParent == null)
             throw new IllegalArgumentException();
         if (child == null)
@@ -995,6 +1104,10 @@
              */
             throw new IllegalArgumentException();
         }
+        if (splitBits <= 0)
+            throw new IllegalArgumentException();
+        if (splitBits > addressBits)
+            throw new IllegalArgumentException();
         if (oldParent.isReadOnly()) // must be mutable.
             throw new IllegalStateException();
         if (child.isReadOnly()) // must be mutable.
@@ -1006,8 +1119,10 @@
             log.debug("parent=" + oldParent.toShortString() + ", buddyOffset="
                     + buddyOffset + ", child=" + child);
 
-        // Allocate a new directory page. The global depth will be ONE (1).
-        final DirectoryPage newParent = new DirectoryPage(this, 1/* globalDepth */);
+        assert splitBits == addressBits : "FIXME Handle splitBits NE addressBits.";
+
+        // Allocate a new directory page.
+        final DirectoryPage newParent = new DirectoryPage(this, splitBits/* globalDepth */);
 
         // Set the parent Reference on the new dir page to the old dir page.
         newParent.parent = (Reference) oldParent.self;
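The range checks on [splitBits] make the documented contract enforceable. A minimal sketch of how a caller would trip them, reusing the [htree], [root] and (a) names from the unit tests below (illustrative only, not part of this change set):

    // splitBits must lie in the inclusive range [1:addressBits].
    try {
        htree.splitDirectoryPage(root/* oldParent */, 0/* buddyOffset */,
                0/* splitBits */, a/* child */);
        fail("expecting: " + IllegalArgumentException.class);
    } catch (IllegalArgumentException ex) {
        // ignore expected exception.
    }
    try {
        htree.splitDirectoryPage(root/* oldParent */, 0/* buddyOffset */,
                addressBits + 1/* splitBits */, a/* child */);
        fail("expecting: " + IllegalArgumentException.class);
    } catch (IllegalArgumentException ex) {
        // ignore expected exception.
    }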
Modified: branches/QUADS_QUERY_BRANCH/bigdata/src/test/com/bigdata/htree/TestHTree.java
===================================================================
--- branches/QUADS_QUERY_BRANCH/bigdata/src/test/com/bigdata/htree/TestHTree.java	2011-05-19 17:25:26 UTC (rev 4527)
+++ branches/QUADS_QUERY_BRANCH/bigdata/src/test/com/bigdata/htree/TestHTree.java	2011-05-19 20:03:02 UTC (rev 4528)
@@ -405,6 +405,7 @@
          * before the insert could succeed. Choosing 0x20 thus minimizes the
          * #of unobserved intermediate states of the hash tree.
          *
+         *
          * FIXME This is causing a repeated introduction of a new directory
          * level and giving me a hash tree structure which differs from my
          * expectations. Work through the example on paper. Make sure that
          *
          * TODO Do an alternative example where we insert 0x05 at this step
          * instead of 0x10.
-         */
-        htree.insert(k20, v20);
-        assertEquals("nnodes", 2, htree.getNodeCount());
-        assertEquals("nleaves", 4, htree.getLeafCount());
-        assertEquals("nentries", 5, htree.getEntryCount());
-        htree.dump(Level.ALL, System.err, true/* materialize */);
-        assertTrue(root == htree.getRoot());
-        assertEquals(4, root.childRefs.length);
-        final DirectoryPage d = (DirectoryPage) root.childRefs[0].get();
-        assertTrue(d == (DirectoryPage) root.childRefs[0].get());
-        assertTrue(c == (BucketPage) root.childRefs[1].get());
-        assertTrue(b == (BucketPage) root.childRefs[2].get());
-        assertTrue(b == (BucketPage) root.childRefs[3].get());
-        assertEquals(4, d.childRefs.length);
-        final BucketPage e = (BucketPage) d.childRefs[1].get();
-        assertTrue(a == (BucketPage) d.childRefs[0].get()); // FIXME This shows that the buddy references do not have to be contiguous!
-        assertTrue(e == (BucketPage) d.childRefs[1].get());
-        assertTrue(a == (BucketPage) d.childRefs[2].get());
-        assertTrue(a == (BucketPage) d.childRefs[3].get());
-        assertEquals(2, root.getGlobalDepth());
-        assertEquals(1, d.getGlobalDepth());
-        assertEquals(1, a.getGlobalDepth());// recomputed!
-        assertEquals(1, e.getGlobalDepth());// same as [a] (recomputed).
-        assertEquals(1, b.getGlobalDepth());// unchanged.
-        assertEquals(2, c.getGlobalDepth());// recomputed!
-        assertEquals(2, a.getKeyCount());
-        assertEquals(2, a.getValueCount());
-        assertEquals(3, e.getKeyCount());
-        assertEquals(3, e.getValueCount());
-        assertEquals(0, b.getKeyCount());
-        assertEquals(0, b.getValueCount());
-        assertEquals(0, c.getKeyCount());
-        assertEquals(0, c.getValueCount());
-        assertTrue(htree.contains(k1));
-        assertTrue(htree.contains(k2));
-        assertTrue(htree.contains(k3));
-        assertTrue(htree.contains(k4));
-        assertTrue(htree.contains(k20));
-        assertFalse(htree.contains(unused));
-        assertEquals(v1, htree.lookupFirst(k1));
-        assertEquals(v2, htree.lookupFirst(k2));
-        assertEquals(v3, htree.lookupFirst(k3));
-        assertEquals(v4, htree.lookupFirst(k4));
-        assertEquals(v20, htree.lookupFirst(k20));
-        assertNull(htree.lookupFirst(unused));
-        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v1 }, htree
-                .lookupAll(k1));
-        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v2 }, htree
-                .lookupAll(k2));
-        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v3 }, htree
-                .lookupAll(k3));
-        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v4 }, htree
-                .lookupAll(k4));
-        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v20 }, htree
-                .lookupAll(k20));
-        AbstractBTreeTestCase.assertSameIterator(new byte[][] {}, htree
-                .lookupAll(unused));
-
-        /* FIXME Update comments to reflect the example (and update the
-         * worksheet as well).
-         *
-         * FIXME We MUST NOT decrease the localDepth of (a) since that would
-         * place duplicate keys into different buckets (lost retrieval).
-         * Since (a) can not have its localDepth decreased, we need to have
-         * localDepth(d=2) with one pointer to (a) to get localDepth(a:=2).
-         * That makes this a very complicated example. It would be a lot
-         * easier if we started with distinct keys such that the keys in (a)
-         * could be redistributed. This case should be reserved for later
-         * once the structural mutations are better understood.
-         *
-         * 5. Insert 0x20. This key is directed into the same buddy bucket
-         * as the 0x01 keys that we have been inserting. That buddy bucket
-         * is already full and, further, it is the only buddy bucket on the
-         * page. Since we are not inserting a duplicate key we can split the
-         * buddy bucket (rather than doubling the size of the bucket).
-         * However, since global depth == local depth (i.e., only one buddy
-         * bucket on the page), this split will introduce a new directory
-         * page. The new directory page basically codes for the additional
-         * prefix bits required to differentiate the two distinct keys such
-         * that they are directed into the appropriate buckets.
-         *
-         * The new directory page (d) is inserted one level below the root
-         * on the path leading to (a). The new directory must have a local
-         * depth of ONE (1), since it will add a one bit distinction. Since
-         * we know the global depth of the root and the local depth of the
-         * new directory page, we solve for npointers := 1 << (globalDepth -
-         * localDepth). This tells us that we will have 2 pointers in the
-         * root to the new directory page.
-         *
-         * The precondition state of root is {a,c,b,b}. Since we know that
-         * we need npointers:=2 pointers to (d), this means that we will
-         * copy the {a,c} references into (d) and replace those references
-         * in the root with references to (d). The root is now {d,d,b,b}.
-         * (d) is now {a,a;c,c}.
-         * Since d has a local depth of 1 and address bits of 2, it is
-         * comprised of two buddy hash tables {a,a} and {c,c}.
-         *
-         * Linking in (d) has also changed the local depths of (a) and (c).
-         * Since they each now have npointers:=2, their localDepth has been
-         * reduced from TWO (2) to ONE (1) (and their transient cached depth
-         * values must be either invalidated or recomputed). Note that the
-         * local depth of (d) and its children (a,c) are ONE after this
-         * operation so if we force another split in (a) that will force a
-         * split in (d).
-         *
-         * Having introduced a new directory page into the hash tree, we now
-         * retry the insert. Once again, the insert is directed into (a).
-         * Since (a) is still full it is split. (d) is now the parent of
-         * (a). Once again, we have globalDepth(d=1)==localDepth(a=1) so we
-         * need to split (d). However, since localDepth(d=1) is less than
-         * globalDepth(root=2) we can split the buddy hash tables in (d).
-         * This will require us to allocate a new page (f) which will be the
-         * right sibling of (d). There are TWO (2) buddy hash tables in (d).
-         * They are now redistributed between (d) and (f). We also have to
-         * update the pointers to (d) in the parent such that 1/2 of them
-         * point to the new right sibling of (d). Since we have changed the
-         * #of pointers to (d) (from 2 to 1) the local depth of (d) (and of
-         * f) is now TWO (2). Since globalDepth(d=2) is greater than
-         * localDepth(a=1) we can now split (a) into (a,e), redistribute the
-         * tuples in the sole buddy page (a) between (a,e) and update the
-         * pointers in (d) to (a,a,e,e).
-         *
-         * Having split (d) into (d,f), we now retry the insert. This time
-         * the insert is directed into (e). There is room in (e) (it is
-         * empty) and the tuple is inserted without further mutation to the
-         * structure of the hash tree.
-         *
-         * TODO At this point we should also prune the buckets [b] and [c]
-         * since they are empty and replace them with null references.
-         *
-         * TODO The #of new directory levels which have to be introduced
-         * here is a function of the #of prefix bits which have to be
-         * consumed before a distinction can be made between the existing
-         * key (0x01) and the new key (0x20). With an address space of 2
-         * bits, each directory level examines the next 2-bits of the key.
-         * The key (x20) was chosen since the distinction can be made by
-         * adding only one directory level (the keys differ in the first 4
-         * bits). [Do an alternative example which requires recursive splits
-         * in order to verify that we reenter the logic correctly each time.
-         * E.g., by inserting 0x02 rather than 0x20.]
-         *
-         * TODO Do an example in which we explore an insert which introduces
-         * a new directory level in a 3-level tree. This should be driven
-         * by a single insert so we can examine in depth how the new
-         * directory is introduced and verify whether it is introduced below
-         * the root or above the bucket. [I believe that it is introduced
-         * immediately below the root. Note that a balanced tree, e.g., a
-         * B-Tree, introduces the new level above the root. However, the
-         * HTree is intended to be unbalanced in order to optimize storage
-         * and access times to the parts of the index which correspond to
-         * unequal distributions in the hash codes.]
+         * TODO This version tunnels to package private methods in order to
+         * test the intermediate states: (A) Do an alternative version with
+         * a different value for [splitBits]; (B) Do an alternative using
+         * htree.insert() (high level API); (C) Do low level tests of the
+         * BucketPage insert/lookup API.
+         */
-//        htree.insert(new byte[] { 0x20 }, new byte[] { 0x20 });
-//        assertEquals("nnodes", 2, htree.getNodeCount());
-//        assertEquals("nleaves", 4, htree.getLeafCount());
-//        assertEquals("nentries", 5, htree.getEntryCount());
-//        htree.dump(Level.ALL, System.err, true/* materialize */);
-//        assertTrue(root == htree.getRoot());
-//        assertEquals(4, root.childRefs.length);
-//        final DirectoryPage d = (DirectoryPage)root.childRefs[0].get();
-//        assertTrue(d == (DirectoryPage) root.childRefs[0].get());
-//        assertTrue(d == (DirectoryPage) root.childRefs[1].get());
-//        assertTrue(b == (BucketPage) root.childRefs[2].get());
-//        assertTrue(b == (BucketPage) root.childRefs[3].get());
-//        assertEquals(4, d.childRefs.length);
-//        final BucketPage e = (BucketPage)d.childRefs[1].get();
-//        assertTrue(a == (BucketPage) d.childRefs[0].get());
-//        assertTrue(e == (BucketPage) d.childRefs[1].get());
-//        assertTrue(c == (BucketPage) d.childRefs[2].get());
-//        assertTrue(c == (BucketPage) d.childRefs[3].get());
-//        assertEquals(2, root.getGlobalDepth());
-//        assertEquals(2, d.getGlobalDepth());
-//        assertEquals(2, a.getGlobalDepth());// unchanged
-//        assertEquals(2, e.getGlobalDepth());// same as [a].
-//        assertEquals(1, b.getGlobalDepth());// unchanged.
-//        assertEquals(2, c.getGlobalDepth());// unchanged.
-//        assertTrue(htree.contains(new byte[] { 0x01 }));
-//        assertFalse(htree.contains(new byte[] { 0x02 }));
-//        assertEquals(new byte[] { 0x01 }, htree
-//                .lookupFirst(new byte[] { 0x01 }));
-//        assertNull(htree.lookupFirst(new byte[] { 0x02 }));
-//        AbstractBTreeTestCase.assertSameIterator(
-//                //
-//                new byte[][] { new byte[] { 0x01 }, new byte[] { 0x01 },
-//                        new byte[] { 0x01 }, new byte[] { 0x01 } }, htree
-//                        .lookupAll(new byte[] { 0x01 }));
-//        AbstractBTreeTestCase.assertSameIterator(//
-//                new byte[][] {}, htree.lookupAll(new byte[] { 0x02 }));
+//        // htree.insert(k20, v20);
+//        // verify that [a] will not accept an insert.
+//        assertFalse(a.insert(k20, v20, root/* parent */, 0/* buddyOffset */));
+//        // split the root directory page.
+//        htree.splitDirectoryPage(root/* oldParent */,
+//                0/* buddyOffset */, 2/*splitBits*/, a/* child */);
+//        assertEquals("nnodes", 2, htree.getNodeCount());
+//        assertEquals("nleaves", 3, htree.getLeafCount()); // unchanged
+//        assertEquals("nentries", 4, htree.getEntryCount()); // unchanged
+//        htree.dump(Level.ALL, System.err, true/* materialize */);
+//        assertTrue(root == htree.getRoot());
+//        assertEquals(4, root.childRefs.length);
+//        final DirectoryPage d = (DirectoryPage) root.childRefs[0].get();
+//        assertTrue(d == (DirectoryPage) root.childRefs[0].get());
+//        assertTrue(d == (DirectoryPage) root.childRefs[1].get());
+//        assertTrue(b == (BucketPage) root.childRefs[2].get());
+//        assertTrue(b == (BucketPage) root.childRefs[3].get());
+//        assertEquals(4, d.childRefs.length);
+////        final BucketPage e = (BucketPage) d.childRefs[1].get();
+//        assertTrue(a == (BucketPage) d.childRefs[0].get());
+//        assertTrue(a == (BucketPage) d.childRefs[1].get());
+//        assertTrue(c == (BucketPage) d.childRefs[2].get());
+//        assertTrue(c == (BucketPage) d.childRefs[3].get());
+//        assertEquals(2, root.getGlobalDepth());
+//        assertEquals(1, d.getGlobalDepth());
+//        assertEquals(1, a.getGlobalDepth());// recomputed!
+//        assertEquals(1, e.getGlobalDepth());// same as [a] (recomputed).
+//        assertEquals(1, b.getGlobalDepth());// unchanged.
+//        assertEquals(2, c.getGlobalDepth());// recomputed!
+//        assertEquals(2, a.getKeyCount());
+//        assertEquals(2, a.getValueCount());
+////        assertEquals(3, e.getKeyCount());
+////        assertEquals(3, e.getValueCount());
+//        assertEquals(0, b.getKeyCount());
+//        assertEquals(0, b.getValueCount());
+//        assertEquals(0, c.getKeyCount());
+//        assertEquals(0, c.getValueCount());
+//        assertTrue(htree.contains(k1));
+//        assertTrue(htree.contains(k2));
+//        assertTrue(htree.contains(k3));
+//        assertTrue(htree.contains(k4));
+//        assertTrue(htree.contains(k20));
+//        assertFalse(htree.contains(unused));
+//        assertEquals(v1, htree.lookupFirst(k1));
+//        assertEquals(v2, htree.lookupFirst(k2));
+//        assertEquals(v3, htree.lookupFirst(k3));
+//        assertEquals(v4, htree.lookupFirst(k4));
+//        assertEquals(v20, htree.lookupFirst(k20));
+//        assertNull(htree.lookupFirst(unused));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v1 }, htree
+//                .lookupAll(k1));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v2 }, htree
+//                .lookupAll(k2));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v3 }, htree
+//                .lookupAll(k3));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v4 }, htree
+//                .lookupAll(k4));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v20 }, htree
+//                .lookupAll(k20));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] {}, htree
+//                .lookupAll(unused));
+// TODO Continue test (perhaps only for the high level variant).
+//        /*
+//         * htree.insert() variant.
+//         */
+//        assertEquals("nnodes", 2, htree.getNodeCount());
+//        assertEquals("nleaves", 4, htree.getLeafCount());
+//        assertEquals("nentries", 5, htree.getEntryCount());
+//        htree.dump(Level.ALL, System.err, true/* materialize */);
+//        assertTrue(root == htree.getRoot());
+//        assertEquals(4, root.childRefs.length);
+//        final DirectoryPage d = (DirectoryPage) root.childRefs[0].get();
+//        assertTrue(d == (DirectoryPage) root.childRefs[0].get());
+//        assertTrue(c == (BucketPage) root.childRefs[1].get());
+//        assertTrue(b == (BucketPage) root.childRefs[2].get());
+//        assertTrue(b == (BucketPage) root.childRefs[3].get());
+//        assertEquals(4, d.childRefs.length);
+//        final BucketPage e = (BucketPage) d.childRefs[1].get();
+//        assertTrue(a == (BucketPage) d.childRefs[0].get()); // FIXME This shows that the buddy references do not have to be contiguous!
+//        assertTrue(e == (BucketPage) d.childRefs[1].get());
+//        assertTrue(a == (BucketPage) d.childRefs[2].get());
+//        assertTrue(a == (BucketPage) d.childRefs[3].get());
+//        assertEquals(2, root.getGlobalDepth());
+//        assertEquals(1, d.getGlobalDepth());
+//        assertEquals(1, a.getGlobalDepth());// recomputed!
+//        assertEquals(1, e.getGlobalDepth());// same as [a] (recomputed).
+//        assertEquals(1, b.getGlobalDepth());// unchanged.
+//        assertEquals(2, c.getGlobalDepth());// recomputed!
+//        assertEquals(2, a.getKeyCount());
+//        assertEquals(2, a.getValueCount());
+//        assertEquals(3, e.getKeyCount());
+//        assertEquals(3, e.getValueCount());
+//        assertEquals(0, b.getKeyCount());
+//        assertEquals(0, b.getValueCount());
+//        assertEquals(0, c.getKeyCount());
+//        assertEquals(0, c.getValueCount());
+//        assertTrue(htree.contains(k1));
+//        assertTrue(htree.contains(k2));
+//        assertTrue(htree.contains(k3));
+//        assertTrue(htree.contains(k4));
+//        assertTrue(htree.contains(k20));
+//        assertFalse(htree.contains(unused));
+//        assertEquals(v1, htree.lookupFirst(k1));
+//        assertEquals(v2, htree.lookupFirst(k2));
+//        assertEquals(v3, htree.lookupFirst(k3));
+//        assertEquals(v4, htree.lookupFirst(k4));
+//        assertEquals(v20, htree.lookupFirst(k20));
+//        assertNull(htree.lookupFirst(unused));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v1 }, htree
+//                .lookupAll(k1));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v2 }, htree
+//                .lookupAll(k2));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v3 }, htree
+//                .lookupAll(k3));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v4 }, htree
+//                .lookupAll(k4));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] { v20 }, htree
+//                .lookupAll(k20));
+//        AbstractBTreeTestCase.assertSameIterator(new byte[][] {}, htree
+//                .lookupAll(unused));
+//
+//        /* FIXME Update comments to reflect the example (and update the
+//         * worksheet as well).
+//         *
+//         * FIXME We MUST NOT decrease the localDepth of (a) since that would
+//         * place duplicate keys into different buckets (lost retrieval).
+//         * Since (a) can not have its localDepth decreased, we need to have
+//         * localDepth(d=2) with one pointer to (a) to get localDepth(a:=2).
+//         * That makes this a very complicated example. It would be a lot
+//         * easier if we started with distinct keys such that the keys in (a)
+//         * could be redistributed. This case should be reserved for later
+//         * once the structural mutations are better understood.
+//         *
+//         * 5. Insert 0x20. This key is directed into the same buddy bucket
+//         * as the 0x01 keys that we have been inserting.
+//         * That buddy bucket is already full and, further, it is the only
+//         * buddy bucket on the page. Since we are not inserting a duplicate
+//         * key we can split the buddy bucket (rather than doubling the size
+//         * of the bucket). However, since global depth == local depth (i.e.,
+//         * only one buddy bucket on the page), this split will introduce a
+//         * new directory page. The new directory page basically codes for
+//         * the additional prefix bits required to differentiate the two
+//         * distinct keys such that they are directed into the appropriate
+//         * buckets.
+//         *
+//         * The new directory page (d) is inserted one level below the root
+//         * on the path leading to (a). The new directory must have a local
+//         * depth of ONE (1), since it will add a one bit distinction. Since
+//         * we know the global depth of the root and the local depth of the
+//         * new directory page, we solve for npointers := 1 << (globalDepth -
+//         * localDepth). This tells us that we will have 2 pointers in the
+//         * root to the new directory page.
+//         *
+//         * The precondition state of root is {a,c,b,b}. Since we know that
+//         * we need npointers:=2 pointers to (d), this means that we will
+//         * copy the {a,c} references into (d) and replace those references
+//         * in the root with references to (d). The root is now {d,d,b,b}.
+//         * (d) is now {a,a;c,c}. Since d has a local depth of 1 and address
+//         * bits of 2, it is comprised of two buddy hash tables {a,a} and
+//         * {c,c}.
+//         *
+//         * Linking in (d) has also changed the local depths of (a) and (c).
+//         * Since they each now have npointers:=2, their localDepth has been
+//         * reduced from TWO (2) to ONE (1) (and their transient cached
+//         * depth values must be either invalidated or recomputed). Note that
+//         * the local depth of (d) and its children (a,c) are ONE after this
+//         * operation so if we force another split in (a) that will force a
+//         * split in (d).
+//         *
+//         * Having introduced a new directory page into the hash tree, we now
+//         * retry the insert. Once again, the insert is directed into (a).
+//         * Since (a) is still full it is split. (d) is now the parent of
+//         * (a). Once again, we have globalDepth(d=1)==localDepth(a=1) so we
+//         * need to split (d). However, since localDepth(d=1) is less than
+//         * globalDepth(root=2) we can split the buddy hash tables in (d).
+//         * This will require us to allocate a new page (f) which will be the
+//         * right sibling of (d). There are TWO (2) buddy hash tables in (d).
+//         * They are now redistributed between (d) and (f). We also have to
+//         * update the pointers to (d) in the parent such that 1/2 of them
+//         * point to the new right sibling of (d). Since we have changed the
+//         * #of pointers to (d) (from 2 to 1) the local depth of (d) (and of
+//         * f) is now TWO (2). Since globalDepth(d=2) is greater than
+//         * localDepth(a=1) we can now split (a) into (a,e), redistribute the
+//         * tuples in the sole buddy page (a) between (a,e) and update the
+//         * pointers in (d) to (a,a,e,e).
+//         *
+//         * Having split (d) into (d,f), we now retry the insert. This time
+//         * the insert is directed into (e). There is room in (e) (it is
+//         * empty) and the tuple is inserted without further mutation to the
+//         * structure of the hash tree.
+//         *
+//         * TODO At this point we should also prune the buckets [b] and [c]
+//         * since they are empty and replace them with null references.
+//         *
+//         * TODO The #of new directory levels which have to be introduced
+//         * here is a function of the #of prefix bits which have to be
+//         * consumed before a distinction can be made between the existing
+//         * key (0x01) and the new key (0x20). With an address space of 2
+//         * bits, each directory level examines the next 2-bits of the key.
+//         * The key (x20) was chosen since the distinction can be made by
+//         * adding only one directory level (the keys differ in the first 4
+//         * bits). [Do an alternative example which requires recursive splits
+//         * in order to verify that we reenter the logic correctly each time.
+//         * E.g., by inserting 0x02 rather than 0x20.]
+//         *
+//         * TODO Do an example in which we explore an insert which introduces
+//         * a new directory level in a 3-level tree. This should be driven
+//         * by a single insert so we can examine in depth how the new
+//         * directory is introduced and verify whether it is introduced below
+//         * the root or above the bucket. [I believe that it is introduced
+//         * immediately below the root. Note that a balanced tree, e.g., a
+//         * B-Tree, introduces the new level above the root. However, the
+//         * HTree is intended to be unbalanced in order to optimize storage
+//         * and access times to the parts of the index which correspond to
+//         * unequal distributions in the hash codes.]
+//         */
+////        htree.insert(new byte[] { 0x20 }, new byte[] { 0x20 });
+////        assertEquals("nnodes", 2, htree.getNodeCount());
+////        assertEquals("nleaves", 4, htree.getLeafCount());
+////        assertEquals("nentries", 5, htree.getEntryCount());
+////        htree.dump(Level.ALL, System.err, true/* materialize */);
+////        assertTrue(root == htree.getRoot());
+////        assertEquals(4, root.childRefs.length);
+////        final DirectoryPage d = (DirectoryPage)root.childRefs[0].get();
+////        assertTrue(d == (DirectoryPage) root.childRefs[0].get());
+////        assertTrue(d == (DirectoryPage) root.childRefs[1].get());
+////        assertTrue(b == (BucketPage) root.childRefs[2].get());
+////        assertTrue(b == (BucketPage) root.childRefs[3].get());
+////        assertEquals(4, d.childRefs.length);
+////        final BucketPage e = (BucketPage)d.childRefs[1].get();
+////        assertTrue(a == (BucketPage) d.childRefs[0].get());
+////        assertTrue(e == (BucketPage) d.childRefs[1].get());
+////        assertTrue(c == (BucketPage) d.childRefs[2].get());
+////        assertTrue(c == (BucketPage) d.childRefs[3].get());
+////        assertEquals(2, root.getGlobalDepth());
+////        assertEquals(2, d.getGlobalDepth());
+////        assertEquals(2, a.getGlobalDepth());// unchanged
+////        assertEquals(2, e.getGlobalDepth());// same as [a].
+////        assertEquals(1, b.getGlobalDepth());// unchanged.
+////        assertEquals(2, c.getGlobalDepth());// unchanged.
+////        assertTrue(htree.contains(new byte[] { 0x01 }));
+////        assertFalse(htree.contains(new byte[] { 0x02 }));
+////        assertEquals(new byte[] { 0x01 }, htree
+////                .lookupFirst(new byte[] { 0x01 }));
+////        assertNull(htree.lookupFirst(new byte[] { 0x02 }));
+////        AbstractBTreeTestCase.assertSameIterator(
+////                //
+////                new byte[][] { new byte[] { 0x01 }, new byte[] { 0x01 },
+////                        new byte[] { 0x01 }, new byte[] { 0x01 } }, htree
+////                        .lookupAll(new byte[] { 0x01 }));
+////        AbstractBTreeTestCase.assertSameIterator(//
+////                new byte[][] {}, htree.lookupAll(new byte[] { 0x02 }));
+
         // TODO REMOVE (or test suite for remove).
 
         // TODO Continue progression here?
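The choice of 0x20 in the TODO notes above is what keeps the example down to a single new directory level. A sketch of the bit slicing involved (KeyBits is an editorial illustration, not the HTree API; it reads the key MSB-first, [addressBits] bits at a time, as each directory level does):

    class KeyBits {
        // Extract nbits from the key, MSB-first, starting at bitOffset.
        static int slotBits(final byte[] key, final int bitOffset,
                final int nbits) {
            int v = 0;
            for (int i = 0; i < nbits; i++) {
                final int bit = bitOffset + i;
                v = (v << 1) | ((key[bit >> 3] >> (7 - (bit & 7))) & 1);
            }
            return v;
        }
    }

With addressBits := 2, slotBits(new byte[]{0x01}, 0, 2) and slotBits(new byte[]{0x20}, 0, 2) are both 0, so the keys collide at the root level, while slotBits(new byte[]{0x01}, 2, 2) == 0 and slotBits(new byte[]{0x20}, 2, 2) == 2, so the keys separate after one additional 2-bit directory level -- they differ within the first 4 bits.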
@@ -621,6 +693,198 @@
     }
 
+    /**
+     * Unit test for
+     * {@link HTree#splitDirectoryPage(DirectoryPage, int, int, com.bigdata.htree.HTree.AbstractPage)}
+     * .
+     */
+    public void test_splitDir_addressBits2_splitBits1() {
+
+        final int addressBits = 2;
+
+        final IRawStore store = new SimpleMemoryRawStore();
+
+        try {
+
+            final byte[] k1 = new byte[]{0x01};
+            final byte[] k2 = new byte[]{0x02};
+            final byte[] k3 = new byte[]{0x03};
+            final byte[] k4 = new byte[]{0x04};
+            final byte[] k20 = new byte[]{0x20};
+
+            final byte[] v1 = new byte[]{0x01};
+            final byte[] v2 = new byte[]{0x02};
+            final byte[] v3 = new byte[]{0x03};
+            final byte[] v4 = new byte[]{0x04};
+            final byte[] v20 = new byte[]{0x20};
+
+            final HTree htree = new HTree(store, addressBits);
+
+            final DirectoryPage root = htree.getRoot();
+
+            htree.insert(k1, v1);
+            htree.insert(k2, v2);
+            htree.insert(k3, v3);
+            htree.insert(k4, v4);
+
+            /*
+             * Verify preconditions for the unit test.
+             */
+            assertTrue(root == htree.getRoot());
+            final BucketPage a = (BucketPage) root.childRefs[0].get();
+            final BucketPage c = (BucketPage) root.childRefs[1].get();
+            final BucketPage b = (BucketPage) root.childRefs[2].get();
+            assertTrue(a == root.childRefs[0].get());
+            assertTrue(c == root.childRefs[1].get());
+            assertTrue(b == root.childRefs[2].get());
+            assertTrue(b == root.childRefs[3].get());
+            assertEquals(2, root.globalDepth);
+            assertEquals(2, a.globalDepth);
+            assertEquals(2, c.globalDepth);
+            assertEquals(1, b.globalDepth);
+
+            /*
+             * Force a split of the root directory page. Note that (a) has the
+             * same depth as the directory page we want to split, which is a
+             * precondition for being able to split the directory.
+             */
+
+            // verify that [a] will not accept an insert.
+            assertFalse(a
+                    .insert(k20, v20, root/* parent */, 0/* buddyOffset */));
+
+            // split the root directory page.
+            htree.splitDirectoryPage(root/* oldParent */, 0/* buddyOffset */,
+                    1/* splitBits */, a/* child */);
+
+            /*
+             * Verify post-conditions.
+             */
+
+            assertEquals("nnodes", 2, htree.getNodeCount());
+            assertEquals("nleaves", 3, htree.getLeafCount()); // unchanged
+            assertEquals("nentries", 4, htree.getEntryCount()); // unchanged
+
+            assertTrue(root == htree.getRoot());
+            final DirectoryPage d = (DirectoryPage) root.childRefs[0].get();
+            assertTrue(d == root.childRefs[0].get());
+            assertTrue(d == root.childRefs[1].get());
+            assertTrue(b == root.childRefs[2].get());
+            assertTrue(b == root.childRefs[3].get());
+            assertTrue(a == d.childRefs[0].get());
+            assertTrue(a == d.childRefs[1].get());
+            assertTrue(c == d.childRefs[2].get());
+            assertTrue(c == d.childRefs[3].get());
+            assertEquals(2, root.globalDepth);
+            assertEquals(1, d.globalDepth);
+            assertEquals(1, a.globalDepth);
+            assertEquals(1, c.globalDepth);
+            assertEquals(1, b.globalDepth);
+
+        } finally {
+
+            store.destroy();
+
+        }
+
+    }
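Taken together with the splitBits := 2 variant below, this pins down the post-condition as a function of [splitBits]: the new directory page has globalDepth == splitBits and holds 1 << splitBits pointers to the child being split (2 with splitBits := 1, 4 with splitBits := 2). A hedged sketch of a shared check (checkSplitPostCondition is an illustrative helper, not part of TestHTree):

    private void checkSplitPostCondition(final DirectoryPage root,
            final BucketPage child, final int splitBits) {
        // The new directory page was installed in the root's first slot.
        final DirectoryPage d = (DirectoryPage) root.childRefs[0].get();
        assertEquals(splitBits, d.globalDepth);
        // The new directory page holds 1 << splitBits pointers to the child.
        int n = 0;
        for (int i = 0; i < d.childRefs.length; i++) {
            if (d.childRefs[i].get() == child)
                n++;
        }
        assertEquals(1 << splitBits, n);
    }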
+    /**
+     * Unit test for
+     * {@link HTree#splitDirectoryPage(DirectoryPage, int, int, com.bigdata.htree.HTree.AbstractPage)}
+     * .
+     */
+    public void test_splitDir_addressBits2_splitBits2() {
+
+        final int addressBits = 2;
+
+        final IRawStore store = new SimpleMemoryRawStore();
+
+        try {
+
+            final byte[] k1 = new byte[]{0x01};
+            final byte[] k2 = new byte[]{0x02};
+            final byte[] k3 = new byte[]{0x03};
+            final byte[] k4 = new byte[]{0x04};
+            final byte[] k20 = new byte[]{0x20};
+
+            final byte[] v1 = new byte[]{0x01};
+            final byte[] v2 = new byte[]{0x02};
+            final byte[] v3 = new byte[]{0x03};
+            final byte[] v4 = new byte[]{0x04};
+            final byte[] v20 = new byte[]{0x20};
+
+            final HTree htree = new HTree(store, addressBits);
+
+            final DirectoryPage root = htree.getRoot();
+
+            htree.insert(k1, v1);
+            htree.insert(k2, v2);
+            htree.insert(k3, v3);
+            htree.insert(k4, v4);
+
+            /*
+             * Verify preconditions for the unit test.
+             */
+            assertTrue(root == htree.getRoot());
+            final BucketPage a = (BucketPage) root.childRefs[0].get();
+            final BucketPage c = (BucketPage) root.childRefs[1].get();
+            final BucketPage b = (BucketPage) root.childRefs[2].get();
+            assertTrue(a == root.childRefs[0].get());
+            assertTrue(c == root.childRefs[1].get());
+            assertTrue(b == root.childRefs[2].get());
+            assertTrue(b == root.childRefs[3].get());
+            assertEquals(2, root.globalDepth);
+            assertEquals(2, a.globalDepth);
+            assertEquals(2, c.globalDepth);
+            assertEquals(1, b.globalDepth);
+
+            /*
+             * Force a split of the root directory page. Note that (a) has the
+             * same depth as the directory page we want to split, which is a
+             * precondition for being able to split the directory.
+             */
+
+            // verify that [a] will not accept an insert.
+            assertFalse(a
+                    .insert(k20, v20, root/* parent */, 0/* buddyOffset */));
+
+            // split the root directory page.
+            htree.splitDirectoryPage(root/* oldParent */, 0/* buddyOffset */,
+                    2/* splitBits */, a/* child */);
+
+            /*
+             * Verify post-conditions.
+             */
+
+            assertEquals("nnodes", 2, htree.getNodeCount());
+            assertEquals("nleaves", 3, htree.getLeafCount()); // unchanged
+            assertEquals("nentries", 4, htree.getEntryCount()); // unchanged
+
+            assertTrue(root == htree.getRoot());
+            final DirectoryPage d = (DirectoryPage) root.childRefs[0].get();
+            assertTrue(d == root.childRefs[0].get());
+            assertTrue(c == root.childRefs[1].get());
+            assertTrue(b == root.childRefs[2].get());
+            assertTrue(b == root.childRefs[3].get());
+            assertTrue(a == d.childRefs[0].get());
+            assertTrue(a == d.childRefs[1].get());
+            assertTrue(a == d.childRefs[2].get());
+            assertTrue(a == d.childRefs[3].get());
+            assertEquals(2, root.globalDepth);
+            assertEquals(2, d.globalDepth);
+            assertEquals(0, a.globalDepth);
+            assertEquals(2, c.globalDepth);
+            assertEquals(1, b.globalDepth);
+
+        } finally {
+
+            store.destroy();
+
+        }
+
+    }
+
     /**
      * Test of basic insert, lookup, and split page operations (including
      * when a new directory page must be introduced) using an address space
      * with only TWO (2) bits.