Performance/Scalability - CVS version - BTre

Rx Cx
2002-10-11
2003-03-22
  • Rx Cx
    2002-10-11

    Has anybody noticed any performance degradation with the newly implemented ObjectBTree and/or the byte[] BTree w.r.t. the previous 0.12 BTree (for Objects) implementation specifically while WRITING objects to the BTree?

    I've switched over my implementation to use the byte[] BTree, which has resulted in more compact DB files (for obvious reasons), but the byte[] to Object and vice versa conversions (either using the helper.Serialization class OR my own) seem to cost me a lot in terms of performance.

    I use the following code to convert my Object into a byte[ ] ...

        public static final byte[] getBytes(StorageManager sm, Object o) throws IOException
        {
            // REUSE STRATEGY
            ByteArrayOutputStream baos = sm.getReusableOutputStream();
           
            if (o instanceof ByteArraySerializable)
            {
                ((ByteArraySerializable) o).store(baos);
            }

            byte[] ba = baos.toByteArray( );
            baos.reset( ); // READY FOR REUSE
            sm.unlock(baos);
            return ba;
        }

    ByteArraySerializable (in my app) is an interface that enables any complex object to write primitives to a ByteArrayOutputStream (via a DataOutputStream). Usually my complex objects are relatively small (approx 100 - 200 bytes) and shouldn't cause any significant problems.
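For readers following along, here is a minimal sketch of what such an interface and one implementing object might look like. The interface shape is reconstructed from the description above; `PageElement`, its fields, and `ByteArraySerializableDemo` are hypothetical names used only for illustration and are not part of JDBM or of the original application.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Assumed shape of the interface described in the post.
interface ByteArraySerializable {
    void store(ByteArrayOutputStream out) throws IOException;
}

// Hypothetical small object; field names are illustrative only.
final class PageElement implements ByteArraySerializable {
    private final long id;
    private final int page;
    private final String label;

    PageElement(long id, int page, String label) {
        this.id = id;
        this.page = page;
        this.label = label;
    }

    @Override
    public void store(ByteArrayOutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeLong(id);    // 8 bytes
        data.writeInt(page);   // 4 bytes
        data.writeUTF(label);  // 2-byte length prefix + modified UTF-8 bytes
        data.flush();          // make sure everything reaches the byte stream
    }
}

final class ByteArraySerializableDemo {
    // Converts an object to bytes using a fresh stream (no reuse strategy here).
    static byte[] toBytes(ByteArraySerializable o) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream(256);
            o.store(baos);
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(new PageElement(42L, 7, "header"));
        System.out.println(bytes.length + " bytes written");
    }
}
```

Writing fields by hand like this avoids the class metadata that default Java serialization emits, which is consistent with the more compact DB files mentioned above.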

    Any suggestions on what I could change to improve performance? I'm going to start using some performance monitoring tools to identify bottlenecks soon but in the meantime, does there seem to be anything very wrong in what I'm doing?

    I would expect performance to go down by a bit due to additional byte[ ] conversions required but I'm looking at some really disappointing numbers (approx 220% more time taken with the CVS source than my previous implementation using JDBM 0.12 with direct object serialization).

    Also, is a BTree restricted to holding sizeof(int)/2 rows of data? The reason I ask this is because the size( ) method in BTree returns a value of type int.

     
    • Alex Boisvert
      2002-10-11

      I do not see any performance flaw in the code that you've presented.

      I would be really interested in hard data supporting your observation.  If performance is indeed slower, it would indicate that JVM serialization (usually done natively) is far better than a custom serialization approach.  This wasn't the case last time I checked.  Externalization, for instance, was about 30-40% faster than plain Serialization.
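To make the distinction concrete, the following sketch contrasts a class using default Serialization with one using Externalization. It only shows the two mechanisms and checks that both round-trip; it does not measure the 30-40% figure quoted above. `SerialPoint`, `ExternalPoint`, and `SerializationDemo` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Default serialization: the JVM discovers and writes the fields itself.
final class SerialPoint implements Serializable {
    private static final long serialVersionUID = 1L;
    long key;
    int value;

    SerialPoint(long key, int value) {
        this.key = key;
        this.value = value;
    }
}

// Externalization: the class writes and reads its own fields explicitly.
final class ExternalPoint implements Externalizable {
    long key;
    int value;

    public ExternalPoint() { }  // a public no-arg constructor is required

    ExternalPoint(long key, int value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeLong(key);
        out.writeInt(value);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        key = in.readLong();
        value = in.readInt();
    }
}

final class SerializationDemo {
    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
                oos.writeObject(o);
            }
            return baos.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Serializable:   " + serialize(new SerialPoint(5L, 9)).length + " bytes");
        System.out.println("Externalizable: " + serialize(new ExternalPoint(5L, 9)).length + " bytes");
    }
}
```

Timing either path in a loop over many small objects would be one way to gather the hard data requested here.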

      The size limit of the BTree is an oversight.  I should change it to a long.

      alex
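As a small illustration of the size question: a counter of type int tops out at Integer.MAX_VALUE (2^31 - 1) and silently wraps to a negative number on the next increment, whereas a long keeps counting. The class and method names below are hypothetical and only demonstrate the arithmetic, not the BTree code itself.

```java
final class SizeCounterDemo {
    // Increments an int entry count; wraps around past Integer.MAX_VALUE.
    static int nextIntSize(int size) {
        return size + 1;
    }

    // Increments a long entry count; no wrap-around at the int boundary.
    static long nextLongSize(long size) {
        return size + 1;
    }

    public static void main(String[] args) {
        System.out.println("int after max:  " + nextIntSize(Integer.MAX_VALUE));
        System.out.println("long after max: " + nextLongSize(Integer.MAX_VALUE));
    }
}
```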

       
    • Rx Cx
      2002-10-11

      At a higher level, I'm working on building a document generation tool. Here are the numbers that I've found...

      For a document containing 5313 pages (with 100 elements per page) for the same hardware config and memory state, here are the results that I got...

      1. Using JDBM 0.12 and object ser, it takes me 4:00 minutes to generate this document
      [Uses 8 BTrees (Object by default)]

      2. Using the CVS version of JDBM for byte[ ] serialization, it takes me 09:34 minutes to generate this document
      [Uses 7 byte[] BTrees and 1 ObjectBTree]

      And really, the difference between 1 and 2 listed above comes down to the work that I've done in the new code listed previously, i.e. providing a byte[ ] representation of objects. Incidentally, this conversion is ALSO done in the Comparators because the obj1 and obj2 arguments now come in as byte[ ], so I have to convert each byte[ ] back into an object to accurately perform comparisons on my complex (variable-length) keys.
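The comparator cost described above can be sketched as follows. This is an illustration only: it uses java.util.Comparator rather than JDBM's own comparator class, and the key layout (a 4-byte page number followed by a UTF-8 name) is a made-up example of a variable-length key, not the poster's actual format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

final class ByteKeyComparators {
    // Encodes a hypothetical key: 4-byte big-endian page number, then the name in UTF-8.
    static byte[] encode(int page, String name) {
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + n.length).putInt(page).put(n).array();
    }

    // Decodes the fields of both keys on every call before comparing them.
    static final Comparator<byte[]> DECODING = (a, b) -> {
        int pageA = ByteBuffer.wrap(a).getInt();
        int pageB = ByteBuffer.wrap(b).getInt();
        int c = Integer.compare(pageA, pageB);
        if (c != 0) {
            return c;
        }
        // Only decode the variable-length part when the page numbers tie.
        String nameA = new String(a, 4, a.length - 4, StandardCharsets.UTF_8);
        String nameB = new String(b, 4, b.length - 4, StandardCharsets.UTF_8);
        return nameA.compareTo(nameB);
    };

    public static void main(String[] args) {
        int result = DECODING.compare(encode(1, "intro"), encode(2, "body"));
        System.out.println("compare result sign: " + Integer.signum(result));
    }
}
```

Because a tree insert performs several key comparisons on the way down, any decoding done inside the comparator is paid several times per insert. Comparing the cheap fixed-width field first, as in this sketch, is one way to skip part of that work.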

      My completed document contains a merge of all these BTrees and I've updated the load method to take in a new argument called 'offset' that is capable of translating all BTree reads from an offset (in the merged file) while reading the document.

       
    • Melanie Mayer
      2003-02-20

      I noticed similar behavior when comparing jdbm 0.12 with the version from cvs. The cvs version seems to be a lot slower when using ObjectBTrees. I wrote a small test to compare the two and saw that 0.12 is about 17 times faster than the one from cvs. Is there something wrong in the way I use the cvs version?
      I attached the Test program and would be interested in any feedback. (To use the two versions in parallel I renamed the package from cvs to jdbm2)

      Test results for inserting 10000 rows:
      jdbm 0.12:    1.9 sec
      jdbm from cvs:    35 sec

      -melanie

      import java.io.IOException;
      import java.util.Properties;

      import junit.framework.Test;
      import junit.framework.TestCase;
      import junit.framework.TestSuite;

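      /**
       * Compares insert performance of JDBM 0.12 with the CVS version
       * (whose packages were renamed to jdbm2 so both can be loaded together).
       */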
      public class TestJDBM extends TestCase {

          int numberOfObjects_;

          /**
           * Constructor TestJDBM.
           * @param string name of the test method to run
           * @param numberOfObjects number of rows to insert
           */
          public TestJDBM(String string, int numberOfObjects) {
              super(string);
              numberOfObjects_ = numberOfObjects;
          }

          public static void main(String[] args) {
              junit.textui.TestRunner.run(suite());
          }

          public static Test suite() {

              TestSuite suite = new TestSuite();

              suite.addTest(new TestJDBM("testInsertJDBM", 10000));
              suite.addTest(new TestJDBM("testInsertJDBMFromCVS", 10000));

              return suite;
          }

          public void testInsertJDBMFromCVS() {
              printTestInfo();

              jdbm2.btree.ObjectBTree tree_ = null;
              jdbm2.RecordManager recman_ = null;

              Properties props = new Properties();
              try {
                  recman_ = jdbm2.RecordManagerFactory.createRecordManager("TestDB1"+System.currentTimeMillis(), props);
                  tree_ = jdbm2.btree.ObjectBTree.createInstance(recman_, new jdbm2.helper.LongComparator());

              } catch (IOException e) {
                  e.printStackTrace();
              }

              try {
                  long start = System.currentTimeMillis();
                  for (int i = 0; i < numberOfObjects_; i++) {
                      tree_.insert(new Long(i), new Integer(i), false);
                  }
                  recman_.commit();
                  long end = System.currentTimeMillis();
                  System.out.println("It took "+(end-start)+" ms to insert "+ numberOfObjects_ +" rows");

              } catch (IOException e) {
                  e.printStackTrace();
              }
          }
         
          public void testInsertJDBM() {
              printTestInfo();

              jdbm.btree.BTree tree_ = null;
              jdbm.recman.RecordManager recman_ = null;
              jdbm.helper.ObjectCache cache_;

              Properties props = new Properties();
              try {
                  recman_ = new jdbm.recman.RecordManager("TestDB2"+System.currentTimeMillis());
                  cache_ = new jdbm.helper.ObjectCache(recman_, new jdbm.helper.MRU(100));
                  tree_ = new jdbm.btree.BTree(recman_, cache_, new jdbm.helper.LongComparator());
                

              } catch (IOException e) {
                  e.printStackTrace();
              }

              try {
                  long start = System.currentTimeMillis();
                  for (int i = 0; i < numberOfObjects_; i++) {
                      tree_.insert(new Long(i), new Integer(i), false);
                  }
                  recman_.commit();
                  long end = System.currentTimeMillis();
                  System.out.println("It took "+(end-start)+" ms to insert "+ numberOfObjects_ +" rows");

              } catch (IOException e) {
                  e.printStackTrace();
              }
          }
         
          void printTestInfo() {
              System.out.println("");
              System.out.println("");
              System.out.println(this +
                  " -----    \nnumberOfObjects:" + numberOfObjects_);
              System.out.println("");
          }

      }

       
      • Alex Boisvert
        2003-02-20

        Melanie,

        I was able to reproduce your results on my machine with both JDBM 0.12 and the CVS version.  My diagnosis seems to indicate that the CVS version performs a lot of unnecessary serialization to convert objects to/from byte arrays (which is the new "native" persistence format of the BTree).

        I need to investigate and think how to reduce those costs.  Let me come back to you on this.  I should have a better answer and probably a resolution by the end of this weekend.

        alex

         
    • Alex Boisvert
      2003-03-22

      Melanie,

      I've just committed a large number of changes to JDBM, including a fix to CacheRecordManager which brings the performance of the CVS version back on par with that of JDBM 0.12.
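The thread does not say what the CacheRecordManager fix consisted of, so the following is not the actual change. It is only a generic sketch of how a caching layer can cut serialization work: live objects are held in memory and converted to bytes once at commit rather than on every update. All names here are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class WriteBackCacheSketch {
    // Stands in for the on-disk record file: record id -> serialized bytes.
    private final Map<Long, byte[]> store = new HashMap<>();
    // Live (deserialized) values that have changed since the last commit.
    private final Map<Long, Long> dirty = new LinkedHashMap<>();
    // Counts how often a value is actually converted to bytes.
    int serializations = 0;

    // Records the new value in memory only; nothing is serialized yet.
    void update(long recid, long value) {
        dirty.put(recid, value);
    }

    // Serves from the cache when possible, otherwise decodes the stored bytes.
    Long fetch(long recid) {
        Long cached = dirty.get(recid);
        if (cached != null) {
            return cached;
        }
        byte[] raw = store.get(recid);
        return raw == null ? null : ByteBuffer.wrap(raw).getLong();
    }

    // Serializes each dirty record exactly once, then clears the dirty set.
    void commit() {
        for (Map.Entry<Long, Long> e : dirty.entrySet()) {
            store.put(e.getKey(), serialize(e.getValue()));
        }
        dirty.clear();
    }

    private byte[] serialize(long value) {
        serializations++;
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    public static void main(String[] args) {
        WriteBackCacheSketch cache = new WriteBackCacheSketch();
        for (int i = 0; i < 1000; i++) {
            cache.update(1L, i);
        }
        cache.commit();
        System.out.println("serializations: " + cache.serializations);
    }
}
```

A bulk insert rewrites the same few tree pages many times, which is why deferring the byte conversion to commit time can matter for a workload like the test above.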

      Let me know if you still experience performance issues.

      cheers,
      alex

       
      • Alex Boisvert
        2003-03-22

        Also, I forgot to mention that I've included your test under src/tests/jdbm/btree/TestInsertPerf.java.

        alex