CVS Commit: Packed long serializer

Kevin Day
2005-08-25
2013-06-03
  • Kevin Day
    2005-08-25

    Just committed a new serializer (LongPackedSerializer) and some supporting methods in the Conversion class (these may be useful to others writing custom serializers that need to store longs efficiently).

    You should be able to just replace your LongSerializer constructor with LongPackedSerializer, and start running.

    The general principle is that we give up 2 bits of the long's value space and use them as a marker for how many bytes were actually used to store the value.  A relatively small value takes only 2 bytes, a somewhat larger one takes 4, and once some threshold is passed the value takes the full 8 bytes.
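
As an illustration of the marker-bit idea, here is a minimal Java sketch. It is not the LongPackedSerializer or Conversion code from the commit: the class name PackedLongSketch, the method names, and the exact layout (marker in the top two bits of the first byte, 00 for 2 bytes, 01 for 4, 10 for 8, big-endian) are all assumptions made for the example.

```java
/** Hypothetical sketch of a 2-bit length marker; not the committed JDBM code. */
final class PackedLongSketch {

    // Largest values that fit once two bits are reserved for the marker.
    private static final long MAX_2_BYTES = (1L << 14) - 1;
    private static final long MAX_4_BYTES = (1L << 30) - 1;
    private static final long MAX_8_BYTES = (1L << 62) - 1;

    /** Encodes a non-negative value of at most 62 bits into 2, 4 or 8 bytes. */
    static byte[] pack(long value) {
        if (value < 0 || value > MAX_8_BYTES) {
            throw new IllegalArgumentException("value does not fit in 62 bits: " + value);
        }
        int size;
        long marker;
        if (value <= MAX_2_BYTES) {
            size = 2;
            marker = 0;
        } else if (value <= MAX_4_BYTES) {
            size = 4;
            marker = 1;
        } else {
            size = 8;
            marker = 2;
        }
        // Put the marker into the top two bits of the size-byte word.
        long word = (marker << (size * 8 - 2)) | value;
        byte[] out = new byte[size];
        for (int i = 0; i < size; i++) {
            out[i] = (byte) (word >>> (8 * (size - 1 - i)));
        }
        return out;
    }

    /** Decodes a value written by pack(), starting at the given offset. */
    static long unpack(byte[] buf, int offset) {
        int marker = (buf[offset] & 0xFF) >>> 6;
        int size;
        switch (marker) {
            case 0: size = 2; break;
            case 1: size = 4; break;
            case 2: size = 8; break;
            default: throw new IllegalArgumentException("unknown marker: " + marker);
        }
        // Strip the marker bits from the first byte, then shift in the rest.
        long value = buf[offset] & 0x3F;
        for (int i = 1; i < size; i++) {
            value = (value << 8) | (buf[offset + i] & 0xFF);
        }
        return value;
    }
}
```

Under these assumptions a counter value below 16384 costs 2 bytes instead of 8, while values up to 2^62 - 1 remain representable.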

    This allows you to have huge files without paying the price in storage efficiency for smaller databases.

    This approach won't help much if you are using random longs, or anything like that - but if you are using a sequential counter, it can save a huge amount of space.

    In addition to the above algorithms and serializer, I have another data type called MultiPartLongValue - this allows efficient storage of arrays of long values, which can be very useful for multi-part keys in a BTree.
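
The MultiPartLongValue implementation is not shown in this thread, so the following is only a guess at what ordering multi-part keys might involve: a lexicographic comparator over long arrays, sketched in Java. The class name MultiPartKeySketch and the rule that a shorter key sorts before a longer key sharing its prefix are assumptions for illustration.

```java
import java.util.Comparator;

/** Hypothetical sketch of ordering multi-part long keys; not the author's MultiPartLongValue. */
final class MultiPartKeySketch {

    /** Compares keys part by part; on a shared prefix the shorter key sorts first. */
    static final Comparator<long[]> LEXICOGRAPHIC = (a, b) -> {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            int c = Long.compare(a[i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    };
}
```

Each part of such a key could then be written with a packed long encoding, so that small components take fewer than 8 bytes each.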

    Right now, I'm keeping the implementation in my project (where I definitely need it), but if you all think there may be general interest in it, I could include it.

    Along these lines, if we come up with specialized value classes like this (i.e. a class that represents the value, plus a comparator and serializer) that aren't really needed for jdbm to function, I wonder if we shouldn't have a separate package, or maybe even a separate distro, that includes them...

    Thoughts?

    - K

     
    • Bryan Thompson
      2005-08-27

      Hello,

      Does the record manager guarantee a null return if you attempt to fetch a record using the recid of a record that has since been deleted?

      Thanks,

      -bryan

       
      • Bryan Thompson
        2005-08-27

        Hello,

        I have written two proposed tests for the behavior of the RecordManager when attempting to fetch a record that is not in the store.  The tests are below.  Please let me know what you think.

        The first test attempts a fetch with a "fake" recid: it inserts an object to get a real recid, adds 100 to that recid to get one that is not valid, and then does a fetch.  This is a little bit dubious as a test, but so is the recman response (double get on block 0).

        The second test inserts a record, commits the transaction, then deletes the record and attempts to fetch the record that was just deleted.  This test seems coherent to me, but the recman response is an EOFException (trying to read from an empty byte[]).  I think that the correct response should be to detect the reference to a deleted record and return null to the application since no such record exists.  I am not quite certain how to detect a deleted record, so I do not have a patch to propose for this test.

        Thanks,

        -bryan

        /*
        * Created on Aug 27, 2005
        */
        package jdbm.recman;

        import jdbm.RecordManager;
        import jdbm.RecordManagerFactory;
        import junit.framework.TestCase;

        import java.io.IOException;

        /**
        * Proposed tests for fetch of (a) a record which was never inserted into the
        * store; and (b) a record which has since been deleted from the store.
        *
        * @author thompsonbry
        */

        public class TestFetchAfterDelete
           extends TestCase
        {

            /**
             * Attempts to fetch a record which was never inserted into the store.
             * The correct behavior is to return a null since the identified object
             * does not exist in the store.
             */

            public void test_fetchUnknownRecord()
                throws IOException
            {
               
                RecordManager recman = RecordManagerFactory.createRecordManager( TestRecordFile.testFileName );

                final long recid = recman.insert( "some data" );
               
                recman.commit();
               
                // Attempt to fetch data using a recid that should be invalid.
               
                assertNull( recman.fetch( recid + 100 ) );
               
            }

            /**
             * Proposed test case for the behavior of the {@link RecordManager} interface
             * when a record is fetched which used to exist in the store but has since
             * been deleted.  The current behavior throws an {@link java.io.EOFException}
             * when trying to read from an empty byte[].  I believe that the correct behavior
             * is to detect a read against a recid that is no longer valid and to return null
             * since there is no such object in the store.
             */
           
            public void test_fetchAfterDelete()
                throws IOException
            {
               
                RecordManager recman = RecordManagerFactory.createRecordManager( TestRecordFile.testFileName );

                Object obj1 = "Hello World!";
               
                final long recid = recman.insert( obj1 );
               
                recman.commit();
               
                assertEquals
                    ( "Hello World!",
                      recman.fetch( recid )
                      );

                recman.delete( recid );
               
                assertNull( recman.fetch( recid ) );
               
            }
          
        }

         
        • Kevin Day
          2005-08-27

          Kind of odd to include this in the CVS Commit message thread...

          The developer listserv is working now - I'll respond here, but per Alex's request, I think future development conversations would be best done via the listserv...

          Anyway, I believe that you are running into something that Alex and I have been discussing.  The current implementation of the record manager uses a header with length field = 0 to mark an empty record.  This has a couple of implications:

          1.  You can't store a 0 length byte array
          2.  When you fetch a record that has been "deleted", there is no way for the record manager to know whether it is a 0 length byte array, or a truly deleted record.

          (Alex - I found another place that you'll need to update:   PhysicalRowIdManager#free() )

          If you look at PhysicalRowIdManager#fetch(), you'll see that it returns a 0 size byte array if the record size is 0.  This then gets returned to your deserializer, etc...

          Alex is working on a patch that would change the marker for a free record from size() = 0 to size() = -1, but it's tricky, so it may take some time.

          Once that is in place, it should be easy to add a check to see if the header is free or not before returning the empty byte array.

          I think that if the record is marked as free, an exception (or maybe even a runtime error) should be thrown (instead of returning null).  It is a very bad idea to fetch an invalid recid, as recids can be re-used - I'd rather have the programmer get a big error than have them assume it's OK to rely on a failed fetch to detect if a given recid has been freed at some point in the past.
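
To make the ambiguity and the proposed fix concrete, here is a toy in-memory sketch in Java. It is not JDBM's PhysicalRowIdManager and does not reflect the pending patch: the class FreeMarkerSketch, its methods, and the choice of IllegalStateException are assumptions for illustration. It only shows that a size of -1 as the free marker lets a zero-length record and a deleted record be told apart, and that a fetch of a freed recid can then fail loudly.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory store (hypothetical); illustrates a -1 free marker only. */
final class FreeMarkerSketch {

    /** Assumed sentinel size for a freed slot; 0 stays a legal record length. */
    static final int FREE = -1;

    private final List<Integer> sizes = new ArrayList<>();
    private final List<byte[]> data = new ArrayList<>();

    /** Stores a copy of the bytes and returns the slot index as the recid. */
    long insert(byte[] bytes) {
        sizes.add(bytes.length);
        data.add(bytes.clone());
        return sizes.size() - 1;
    }

    /** Marks the slot as free; the recid must refer to a live record. */
    void delete(long recid) {
        checkLive(recid);
        sizes.set((int) recid, FREE);
        data.set((int) recid, null);
    }

    /** Returns the record bytes, or throws if the recid is unknown or freed. */
    byte[] fetch(long recid) {
        checkLive(recid);
        return data.get((int) recid).clone();
    }

    private void checkLive(long recid) {
        if (recid < 0 || recid >= sizes.size() || sizes.get((int) recid) == FREE) {
            throw new IllegalStateException("recid " + recid + " is not a live record");
        }
    }
}
```

In this sketch a stored empty byte array is fetched back as an empty array, whereas fetching after a delete raises an error rather than returning null or an empty array.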

          - K