BTree - getLastKey() and getCountInRange()?

2005-07-06
2013-06-03
  • Bryan Thompson
    Bryan Thompson
    2005-07-06

    Hello,

    Does anyone have patches for BTree that return the last valid key in the index or that efficiently returns the #of keys in a given index range?

    Thanks,

    -bryan

     
    • Kevin Day
      Kevin Day
      2005-07-06

      I use the following code to get the last key:

              TupleBrowser browser = index.browse(null);
              Tuple tuple = new Tuple();
              Object lastKey = null;
              if (browser.getPrevious(tuple))
                  lastKey = tuple.getKey();

      I suppose this could be added to the BTree class, but it's pretty simple as is.

      I haven't had the need to get a count between two keys, but iterating using the browser object should be pretty fast.

      - K

       
      • Bryan Thompson
        Bryan Thompson
        2005-07-06

        Kevin,

        That works fine for getLastKey() - thanks!

        I am interested in "directly" knowing the #of keys in a range when there might be a large #of entries between two keys.  I am currently scanning the keys.  My values are all recids expressed as Long and I am using the LongSerializer instance to make that as efficient as possible.  Still, this is not efficient when there are a large #of keys.

        Thanks,

        -bryan

         
        • Kevin Day
          Kevin Day
          2005-07-06

          OK - the BTree data structure (as implemented) kind of requires that the keys and values be deserialized.

          I'm pretty sure it is going to be very tough for you to get a count like that with the current implementation.  The problem is that in order to get a valid count of a page, you have to deserialize the page.

          I suppose that you could play some games with the value serializer to "turn it off" during certain operations (i.e. return a null instead of doing the actual deserialization).  I do have a patch that allows you to obtain the key and value serializer of a restored BTree - but I think that doing this would be a bad idea (you could easily wind up with a tree that has an inconsistent state, and if the tree decides to store itself, you could lose data)....

          Before messing with that, I would recommend that you spend some time determining exactly where the performance hit is coming from. 

          Are you using a TupleBrowser to walk the tree when you get your count?  I'm asking because I have to delete huge ranges of values from several BTrees in our application, and the performance is quite good...

          Have you tried optimizing the size of your pages?

          Another trick you can use (if disk space isn't at a huge premium) is to have multiple BTrees.  In my app, for example, I use 10 BTree objects.  1 is a primary key that actually stores the data, the other 9 are indexes into that key.  I use a class I created (EmptyValueSerializer) so I don't take a hit in the serialization and deserialization when walking the non-primary indexes...

          Hope this helps!

          - K