Work at SourceForge, help us to make it a better place! We have an immediate need for a Support Technician in our San Francisco or Denver office.

Close

Release?

2005-06-15
2013-06-03
  • Tatu Saloranta
    Tatu Saloranta
    2005-06-15

    The home page lists version 0.12 as the latest; and this is from 2001 (!). Are there plans to make an updated release at some point in near future? I don't mind going to CVS and building a jar, but for many people it's just scary that the last "official" release is from years ago.

     
    • Bryan Thompson
      Bryan Thompson
      2005-06-16

      I have no clue, but I am planning a jdbm-based project in the near future, and I know that YARS (yet another RDF store) was based on jdbm within the last year or so.

      On feature that I would like to see developed is compressed key indicies (where only the increment of the key string is stored).

       
      • kent fitch
        kent fitch
        2005-06-18

        A few years ago I used the JDBM code in a web archiving tool (pageVault:
        http://ausweb.scu.edu.au/aw03/papers/fitch/paper.html, http://www.projectcomputing.com/products/pageVault )

        Our requirements led me to make two changes to JDBM:

        1) key compression - our BTrees store millions of urls as keys and the start of those urls
        have many bytes in common!

        2) one of the pageVault indices is pretty-much montonically increasing (timestamp),
        so the block splitting algorithm was tweaked to allow specification of the split point.

        I think I made these changes to a version of JDBM which was about to be changed
        significantly, and this caused a problem with them being accepted, and I've never
        downloadeda more recently version.  In any case, below are the relevant snippets for
        change 1 (compression) if they are any use to anyone.

        Kent
        ----
        BTree.java:

              /**     added Kent Fitch Project Computing Pty Ltd (kent.fitch@projectcomputing.com) to v 0.12
                of JDBM, June 2002.  The externalisation method writes this byte in the saved
                BTree, but it will not be present in older versions of saved BTree's and hence
                will be set to 0.

                This byte represents one of the available key compression algorithms which
                are used when a BPage page is serialised/deserialised.  It can be set by the
                user when instantiating the BTree, and can be changed later on which will
                set (or remove) the compression algorithm used when serialising BPages from
                them on.  See the BPage externalization methods for more details.
            **/

            protected byte _compressionAlgorithm ;

            public static final byte NO_COMPRESSION = 0 ;

            /** COMMON_STRING_COMPRESSION is an extremely simple compression algorithm
                which works with String keys.  It finds the longest common leading
                string on all keys in a page and stores it once and removes it from
                the keys during the BPage serialisation.  Compression efficiency greatly
                depends on there being long common leading strings in the keys on a page.
            **/

            public static final byte COMMON_STRING_COMPRESSION = 1 ;

            /**
            * Specify the compression algorithm to be used when subsequently saving BPage's belonging
            * to this BTree to disk.
            * <p>
            * Added KKF May 2002
            *
            * @param compressionAlgorithm value 0 turns off compression.  Other values must be
            *    recognised by the BPage implementation or they will be ignored.  -1 cannot be
            *    used.
            */

            public void setCompressionAlgorithm(byte compressionAlgorithm) {

                _compressionAlgorithm = compressionAlgorithm ;
                try {
                        _cache.update( _recid, this );
                }
                catch (IOException e) {} ;
            }

            /**
             * Implement Externalizable interface.
             */
            public void readExternal( ObjectInput in )
            throws IOException, ClassNotFoundException {
                _comparator = (Comparator) in.readObject();
                _height = in.readInt();
                _root = in.readLong();
                _pageSize = in.readInt();
                _size = in.readInt();

             /** added KKF - try to read a compressionAlgorithm flag - 1 byte.  If not present,
                then this BTree was last saved with an old version of the code
                which did not support compression **/

            if (in.available() > 0) {
                _compressionAlgorithm = (byte) in.read() ;
                if (_compressionAlgorithm == -1) _compressionAlgorithm = 0 ; // eof but bytes available!?
            }
            else _compressionAlgorithm = 0 ;   
            }

            /**
             * Implement Externalizable interface.
             */
            public void writeExternal( ObjectOutput out )
            throws IOException {
                out.writeObject( _comparator );
                out.writeInt( _height );
                out.writeLong( _root );
                out.writeInt( _pageSize );
                out.writeInt( _size );

             /** added KKF - write the compressionAlgorithm flag **/
            out.write((int) _compressionAlgorithm) ;
            }

        BPage.java:

            /**
             * Implement Externalizable interface.
             */
            public void readExternal( ObjectInput in )
            throws IOException, ClassNotFoundException {
                int size = in.readInt();

                _isLeaf = in.readBoolean();
                if ( _isLeaf ) {
                    _previous = in.readLong();
                    _next = in.readLong();
                }

                _first = in.readInt();

             _keys = new Object[ size ];
                for ( int i=_first; i<size; i++ ) {
                    _keys[ i ] = in.readObject();
                }

                _values = new Object[ size ];
                for ( int i=_first; i<size; i++ ) {
                    _values[ i ] = in.readObject();
                }

            /** start kkf 20june2002
            Try to determine whether the keys in this page were compressed when
            last written.  If the page was written by an "old" version of JDBM,
            then there will be no more serialised bytes in the objectinput stream.
            **/

            int compressionAlgorithm ;

            if (in.available() > 0) {
                compressionAlgorithm = (byte) in.read() ;
                if (compressionAlgorithm == -1) compressionAlgorithm = 0 ; // eof but bytes available!?
            }
            else compressionAlgorithm = 0 ;

            switch (compressionAlgorithm) {

                case 0:    return ;

                case BTree.COMMON_STRING_COMPRESSION:
                    commonStringDecompression(in) ;
                    break ;

                default:
                    throw new IOException("Unrecognized compression algorithm code:" + compressionAlgorithm) ;
           
             }

          }

            void commonStringDecompression( ObjectInput in )
            throws IOException, ClassNotFoundException {

            // the next object will be the common leading string

            String leadingString = (String) in.readObject() ;
            // we must replace all the keys with new String objects containing this prefix

            for ( int i=_first; i<_keys.length; i++ ) {
                if (_keys[i] == null) continue ;
                _keys[i] = new String(leadingString + _keys[i]) ;    // sorry about that, garbage collector..
            }

            }

            /**
             * Implement Externalizable interface.
             */

            public void writeExternal( ObjectOutput out )
            throws IOException {

                out.writeInt( _keys.length );

                out.writeBoolean( _isLeaf );
                if ( _isLeaf ) {
                    out.writeLong( _previous );
                    out.writeLong( _next );
                }

                out.writeInt( _first );

              switch (_btree._compressionAlgorithm) {

                case BTree.COMMON_STRING_COMPRESSION:
                    commonStringCompression(out) ;
                    break ;
                default:
                    defaultKeyValueSerialisation(out) ;
             }

            }

            void defaultKeyValueSerialisation( ObjectOutput out )    throws IOException {
               
                for ( int i=_first; i<_keys.length; i++ ) {
                        out.writeObject( _keys[ i ] );
                    }
           

                  for ( int i=_first; i<_keys.length; i++ ) {
                        out.writeObject( _values[ i ] );
                  }

                out.write(0) ;    // write the "no compression" flag byte for politeness
            }

            void commonStringCompression( ObjectOutput out ) throws IOException {

            /** kkf 17jun2002

            We use a very simple leading string compression algorithm - ie, we find the
            longest common key to all entries, which we can do by just looking at the
            first and last (non null) key.  A worthwhile variation is to instead store the
            first key in full and record in the 2nd key the number of characters that
            are the same as the first key, and in the 3rd key the number of characters that
            are the same in the 2nd key, and so on.  This generally improves compression
            when the key varies greatly, but costs an extra byte per key (to store the common
            length) and uses more time.  The simple approach taken here works well when there
            is a lot of commonality between the start of the keys on the page.

            This approach only reduces the serialised database size - it does nothing to
            reduce the "in memory" size used by cached pages.  Such a change is much
            more radical and involves changes to the FIND algorithms!!  However, it would
            remove the extra object creation in de/serialisation for each key, so it could
            be worthwhile.

            We check to see we dont make the serialization bigger - under jdk1.3, serializing
            a string takes 3 bytes plus the string contents, and serialising the additional
            Boolean (Bpage doesnt otherwise serialize one) takes 65! bytes.  So, even if we only have 1 byte
            in the common string, we cause 68 bytes to be serialized but save 1 for every index
            entry, so we may not always be ahead!
           
            Is it worth it?  Not sure - certainly disk space will be saved, but de/serialisation
            creates more objects/garbage and uses more CPU!
            **/

            int commonLeadingStringLength = 0 ;
            String commonLeadingString = null ;
           
            if (_keys[_first] instanceof String) {
                String firstString = (String) _keys[_first] ;
                String lastString = null ;
                // find last non-null string
                  for ( int i=_keys.length-1; i>=_first; i-- ) {
                    if (_keys[i] == null) continue ;
                    if (i == _first) break ;         // only one key!
                    lastString = (String) _keys[i] ;
                    break ;
                }

                if ((firstString != null) && (lastString != null)) {    // calculate common key
                    int len = Math.min(firstString.length(), lastString.length()) ;
                    for (int i=0;i<len;i++) {
                        if (firstString.charAt(i) != lastString.charAt(i)) break ;
                        commonLeadingStringLength++ ;
                    }           

                    if (commonLeadingStringLength > 0) {

                        // calc overhead we are adding, compare to what we'll save

                        int saving = (_keys.length - _first) * commonLeadingStringLength -
                            (65 + 3 + commonLeadingStringLength) ;    // hmm.. "65" may change with JVM?

                        if (saving > 0) commonLeadingString =
                            firstString.substring(0, commonLeadingStringLength) ;
                        else commonLeadingStringLength = 0 ;
                    }
                }                   
            }

            if (commonLeadingStringLength > 0) {

                for ( int i=_first; i<_keys.length; i++ ) {
                    if (_keys[i] == null) out.writeObject( _keys[ i ] );
                    else out.writeObject(
                        new String(((String) _keys[ i ]).substring(commonLeadingStringLength))) ;           
                   }
                for ( int i=_first; i<_keys.length; i++ ) {
                        out.writeObject( _values[ i ] );
                  }

                out.write((int) _btree._compressionAlgorithm) ;
                out.writeObject(commonLeadingString) ;

            }
            else defaultKeyValueSerialisation(out) ;

            }

         
    • Bryan Thompson
      Bryan Thompson
      2005-07-14

      Hello,

      I am curious whether the strategy to produce a new release is going forward.  I would be very interested in testing any new features, esp. compressed key indices or improved cache policies, that might make it into this release.  The test application would be an RDFS store on which we are comparing performance for different persistent layers and object models.

      Thanks,

      -bryan

       
      • Alex Boisvert
        Alex Boisvert
        2005-07-15

        I will package the current CVS code and make a release this weekend.   It's been long overdue...

        alex

         
        • Cees de Groot
          Cees de Groot
          2005-07-15

          Will you call it 1.0, with or without additives like 'RC', 'beta', ...? I think that's long overdue as well...

           
          • Alex Boisvert
            Alex Boisvert
            2005-07-15

            Yes, it will be named 1.0 and explicitly qualified for "production" use.

             
    • Alex Boisvert
      Alex Boisvert
      2005-08-12

      Alright.  I've finally managed to do a release of JDBM 1.0.  Having recently completed my MBA studies helps...

      Next step is to gather all outstanding patches that do not break compatibility (API and file-level) and create version 1.1RC.

      In parallel, we can start discussing plans for 2.0 which should probably include key compression and optimizations at the file structure level if recent performance measurements prove to be right.

      cheers!
      alex

       
      • Kevin Day
        Kevin Day
        2005-08-12

        Fantastic!

        How shall we gather items for the 1.1 RC and 2.0 wish list?

        Should we add them to the RFE list, or just use the forums?

        - K

         
        • Alex Boisvert
          Alex Boisvert
          2005-08-12

          I don't really have an firm opinion.  Personally, I would opt for the forums or mailing lists simply because I've never used the RFE feature on SourceForge before...

          alex

           
          • Kevin Day
            Kevin Day
            2005-08-12

            I'm fine with that.  If it gets too hard to track in the forums, we can always switch over to the RFE system.

            - K