Menu

scalability of jdmb.btree.BTree?

2001-07-28
2002-09-23
  • Amund Tveit

    Amund Tveit - 2001-07-28

    I was wondering how scalable the above mentioned class is? E.g. Does it is easily handle 1 Billion records or so? Are there any benchmarks of jdbm available?

    (My intention is to use it to store web user clickstreams)

    Thanks!

    Amund

     
    • Alex Boisvert

      Alex Boisvert - 2001-07-28

      Yes, the B+Tree data structure is scalable.

      I've tested the B+Tree myself up to 10 milion records, see my earlier post in the mailing list archive:

      http://www.geocrawler.com/archives/3/2965/2001/6/0/5881521/

      I'd be interested in results if you test with more than 2 bilion records (2^32 / 2, the point where java integers overflow).
      cheers,
      alex

       
    • Amund Tveit

      Amund Tveit - 2001-10-26

      Are there ways to reduce the fragmentation of the jdmb storage file(s)? The file size seems to increase a lot when doing a lot of updates (but not adds), and I believe the file is being fragmented?

      Thanks!

      Amund

      (Regarding the 2+ billion records test, I haven't tested that yet :-)

       
      • Alex Boisvert

        Alex Boisvert - 2001-10-26

        The data structure to manage physical block allocation  isn't optimal yet.  It consists of a "free list" of available data blocks which have previously been allocated (LILO).  When a new record is inserted, the list is traversed and the first block with enough available space is used.

        There are 2 drawbacks with this implementation:

        1) space is never reclaimed

        2) fragmentation will gradually rise if a mix of small/large records is used (depends on the exact access pattern).  My guess is that fragmentation will stabilize at about 20-30% overhead for most uses.

        Now, I can think of 3 possible short-term solutions.  (The ideal case being to enhance the current free list management, but would likely to break file compatibility with prior JDBM versions)

        1) if you are storing variable-lengh records, try to pre-allocate enough space in the original record  (ie. if using byte[], use a maximum-sized byte[] to start with)

        2) we could add an optional argument to RecordManager.insert() to specify record space pre-allocation, in order to avoid re-allocation of the record when it grows.  Ugly, but necessary for performance optimization.

        3) implement record file compaction (been on the TODO list for a little while...)

        Option 1), you can do on your own;  Option 2) requires little change to the code and could be done quickly;  Option 3) would require somewhat more time.

        Or, again, we can fix the problem at the source and rewrite the physical record manager.  As always, I'm open to suggestions and code submissions :)

        What do you think?

        alex

         
    • Rx Cx

      Rx Cx - 2002-08-16

      Can you provide me with pointers on what needs to be changed in the btree/recman source etc to scale beyond sizeof(int) ... i.e. beyond 2 billion records ?

      Any suggestions will be appreciated. Thanks.

      - Rob

       
      • Alex Boisvert

        Alex Boisvert - 2002-08-17

        JDBM is not currently limited to a 32 bit address space.

        Records are identified using a "long" (64-bit integer) and refer to logical record id's, which are mapped to physical record id's through special mapping pages.

        Physical record id's are a (block, offset) tuple  that point to the place where the record is located.   The block is a "long" (again 64 bits) and the offset is a "short" (16 bits) -- limiting the block size to 32K.

        The reason JDBM uses logical record id's is to allow transparent relocation (from a client perspective) when the record's size changes and to support compaction of the record file (not implemented yet).

        BTrees only rely on the logical record id's and so are shielded from physical limits.

        My perspective on this is that you are most likely to run into platform issues -- like maximum file size and available disk space --  before reaching JDBM limits.

        regards,
        alex

         
    • Rx Cx

      Rx Cx - 2002-08-19

      Okay ... that sounds great ... but I'm still curious about your earlier statement that said ...

      "I'd be interested in results if you test with more than 2 bilion records (2^32 / 2, the point where java integers overflow)."

      Why would this be a concern if 32-bit integers (upto a max of 2^32/2) were not used in the JDBM implementation?

       
      • Alex Boisvert

        Alex Boisvert - 2002-08-19

        I was just looking for factual evidence that it works.   I'm not too concerned with the JDBM code but still, it's nice to know it actually works.  Sometimes the platform plays some tricks on you...

        alex

         
    • Rx Cx

      Rx Cx - 2002-09-18

      That's reassuring.

      Another question related to the BTree storage.
      Are there any tricks that I could employ to minimize the size of the BTree storage? If I have a class that is packaged as com.foo1.foo2.foo3.foo4.MyClass, even if the instance being written contains an optimized set of bytes (by overriding writeObject), it appears as though the BTree stores the fully qualified name of the entire class for every single instance.

      Any pointers on how I could update the BTree (I guarantee that all BTree keys and values will always be of a single type) so that the fully qualified name of the key and the value is only written ONCE rather than for every single instance? I assume that the key and value objects are being serialized using oos.writeObject(oKey) and oos.writeObject(oValue) and I guess this would need to change someplace?

       
      • Alex Boisvert

        Alex Boisvert - 2002-09-18

        The version of JDBM currently in CVS allows you to store byte arrays literally in the BTree.  This means you can do you own serialization to/from a byte[] and give this representation to the BTree.

        This is very convenient for indexes using primitive types  (such as integers or string) where you can convert the type into just a few bytes and avoid the overhead of full object serialization.

         
        • Rx Cx

          Rx Cx - 2002-09-18

          Is this version in CVS likely to be made into a 'release' sometime soon?

          What would I need to change in my serializable class (say class Foo) to return the byte[ ] representation of the class? What callback method do I need to write in my class to return the byte[ ]? Or would I need to implement a new type of serializable interface? Any documentation or pointers to this would be appreciated.

          Thanks!

           
          • Rx Cx

            Rx Cx - 2002-09-18

            Ok ... I went through the source.

            It seems as though the existing methods for insert(...), find(...), etc have been replaced with Object arguments to byte[ ] arguments. This would break compatibility with my existing application that calls into the insert(...), remove(...), find(...) methods. :(

            I could re-engineer my application to reuse the new methods but it would've been nice if you introduced a new interface ... perhaps ... jdbm.helper.ByteArraySerializable ... that defines abstract methods for ... byte[ ] getBytes( ) and void createFromBytes(byte[ ]) ... for each object. Hence, while my existing objects support the java.io.Serializable interface, I could've extended them to support the new interface so that byte serialization is done transparent to the BTree method definitions.

            Comments?

             
            • Rx Cx

              Rx Cx - 2002-09-18

              Just realized that backward compatibility has been provided using the newly created ObjectBTree wrapper. I'll work on updating my application to use the newly provided byte[ ] methods directly from BTree.

              What is the release cycle for JDBM? There hasn't been a release after version 0.12? Any planned date on when 0.13 may be 'released'?

              P.S. Sorry for bombarding this thread with so many messages.

               
              • Alex Boisvert

                Alex Boisvert - 2002-09-20

                I've been planning a new release for a long time now but didn't get the time yet to do one.  There's always new stuff that I want to put in before the release :)

                The most important concern right now is to get most of the changes in that will affect the file format / structures format, into the next release to avoid breaking backwards compatibility all the time.  One such change that will provoke a structure change will be key compression in the BTree.

                If you don't mind too much about file format compatibility from one version to the next, you can start using the CVS version now.

                And feel free to post messages, that's what the discussion board is about!

                alex

                 
                • Rx Cx

                  Rx Cx - 2002-09-23

                  The fileformat doesn't necessarily have to be backward compatible at a byte level but w.r.t. the API, I would expect it to work ... meaning ...

                  If I make a call to ...

                  1. new Btree( )
                  2. insert(byte[] baKey, byte[] baValue, etc)
                  3. close

                  (write to my DB using the CVS version)

                  and then, subsequently, make a call to ...

                  1. Btree.load( )
                  2. Btree.browse using tuples
                  3. Successfully retrieve my baKey and baValue
                  4. close

                  (read from my DB using the future Released version)

                  ... would this work? Or are you going to be playing around with further optimization of the fileformat (for size/speed, etc) prior to release resulting in breaking compatibility for the above mentioned sequence of operations?

                  Our product would be released sometime at the end of October ... any chance of releasing jdbm 0.13 with all the cool new features by then?

                   

Log in to post a comment.