Re: [Jdbm-general] Sharing objects

Brian O'Neill wrote:

> I think JDBM is a great little persistence engine with tons of potential.
> It's nice to see a project like this being actively developed. I do see
> some things that can be improved.

I'm certainly open to improvements, either to the existing core or 
adding extra functionality.

Something I've articulated in the past is that I'd like to see the core 
of JDBM remain small, so that it remains an interesting choice for small 
projects that only want a simple persistence engine.  However, we can 
bundle a number of optional helpers/utilities around it to make it 
attractive to those who need a few more features.

For example, the support for (or emulating) the Java2 Collection classes 
has been discussed here in the past and I still believe it would make a 
nice addition.

> When I save a key or value into a JDBM table, the object is serialized in
> its own stream. If this object has a reference to another shared object, the
> original object graph is not preserved across hash table entries. The
> approach that JDBM uses to serialize also has space overhead from all the
> stream headers and class info written to each record.

You're right.  Currently, the object is serialized in its own stream 
(along with its complete object graph) and converted into a byte[] for 
storage in the RecordManager.
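
To make the effect concrete, here is a minimal sketch using plain java.io 
serialization rather than JDBM itself.  The Person and Address classes and 
the helper methods are made up for the illustration; the assumption is only 
that each stored value gets its own ObjectOutputStream, as described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SharedReferenceDemo {

    // Hypothetical value types, invented for this illustration.
    static final class Address implements Serializable {
        private static final long serialVersionUID = 1L;
        final String city;
        Address(String city) { this.city = city; }
    }

    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final Address address;
        Person(String name, Address address) {
            this.name = name;
            this.address = address;
        }
    }

    // Serialize one object (and its complete graph) in its own stream.
    static byte[] toBytes(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Two records written separately: each copy gets its own Address.
    static boolean sharedAfterSeparateStreams() {
        Address shared = new Address("Montreal");
        Person a = (Person) fromBytes(toBytes(new Person("a", shared)));
        Person b = (Person) fromBytes(toBytes(new Person("b", shared)));
        return a.address == b.address;
    }

    // Both objects written in one stream: the shared reference survives.
    static boolean sharedAfterSingleStream() {
        Address shared = new Address("Montreal");
        Person[] both = (Person[]) fromBytes(toBytes(
            new Person[] { new Person("a", shared), new Person("b", shared) }));
        return both[0].address == both[1].address;
    }

    public static void main(String[] args) {
        System.out.println("separate streams: " + sharedAfterSeparateStreams()); // false
        System.out.println("single stream:    " + sharedAfterSingleStream());    // true
    }
}
```

Identity is only tracked within a single stream, so anything shared across 
separately stored records comes back as independent copies.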

Like you mention, this implies an overhead for each object placed into 
the hash table.  This overhead is roughly 25 bytes for user-defined 
serialized objects.  For strings, it's 7 bytes (including string 
length).  It's not so bad, but it can be a concern in databases where a 
large number of small objects are stored.
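
If you want to check these numbers yourself, a rough sketch along these 
lines should do.  It uses plain java.io only; the Point class and the method 
names are invented for the example.  The size for a user-defined class 
depends on its name and fields, so the code just prints it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class SerializationOverhead {

    // Hypothetical user-defined class: two ints, i.e. 8 bytes of payload.
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Number of bytes produced when obj is written in its own stream.
    static int serializedSize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    // Bytes beyond the character data itself.  For ASCII text this is the
    // 4-byte stream header, a 1-byte type tag and a 2-byte length.
    static int stringOverhead(String s) {
        return serializedSize(s) - s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("overhead for \"abc\": " + stringOverhead("abc")); // 7
        System.out.println("serialized Point size: "
            + serializedSize(new Point(1, 2)) + " bytes for 8 bytes of payload");
    }
}
```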

> I like the simple interfaces provided in the HTree and BTree, but since they
> do nothing special to preserve object graphs, I think an interface that
> operates on byte[] keys and values is more flexible. A simple object
> serialization strategy could be placed above this level, or a more
> sophisticated persistence model could be developed with less storage
> overhead.

Well, from my experience, the 'simple serialization strategy' you 
mention can become quite complex, depending on what exactly you have in 
mind.  I'm not against the approach.  In fact, I'd like to have such a 
feature if it really is not too complicated and doesn't impact the size 
of the core too much.
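
To give the discussion something concrete to react to, here is one possible 
shape for such layering.  None of these names exist in JDBM: ByteTable, 
Codec and ObjectTable are hypothetical, and the in-memory table is only a 
stand-in for a persistent HTree or BTree.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class LayeredStoreSketch {

    /** Hypothetical low-level table: byte[] in, byte[] out. */
    interface ByteTable {
        void put(byte[] key, byte[] value);
        byte[] get(byte[] key); // null when the key is absent
    }

    /** In-memory stand-in for a persistent tree, for illustration only. */
    static final class InMemoryByteTable implements ByteTable {
        // ByteBuffer compares by content, unlike a raw byte[].
        private final Map<ByteBuffer, byte[]> entries = new HashMap<>();

        @Override
        public void put(byte[] key, byte[] value) {
            entries.put(ByteBuffer.wrap(key.clone()), value.clone());
        }

        @Override
        public byte[] get(byte[] key) {
            byte[] value = entries.get(ByteBuffer.wrap(key));
            return value == null ? null : value.clone();
        }
    }

    /** Hypothetical pluggable serialization strategy. */
    interface Codec<T> {
        byte[] encode(T value);
        T decode(byte[] data);
    }

    /** Strings as raw UTF-8: no stream header, no class info. */
    static final Codec<String> UTF8 = new Codec<String>() {
        @Override
        public byte[] encode(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String decode(byte[] data) {
            return new String(data, StandardCharsets.UTF_8);
        }
    };

    /** Object-level view layered on top of the byte-level table. */
    static final class ObjectTable<K, V> {
        private final ByteTable table;
        private final Codec<K> keys;
        private final Codec<V> values;

        ObjectTable(ByteTable table, Codec<K> keys, Codec<V> values) {
            this.table = table;
            this.keys = keys;
            this.values = values;
        }

        void put(K key, V value) {
            table.put(keys.encode(key), values.encode(value));
        }

        V get(K key) {
            byte[] data = table.get(keys.encode(key));
            return data == null ? null : values.decode(data);
        }
    }

    public static void main(String[] args) {
        ObjectTable<String, String> table =
            new ObjectTable<>(new InMemoryByteTable(), UTF8, UTF8);
        table.put("greeting", "hello");
        System.out.println(table.get("greeting")); // hello
    }
}
```

The hard part this sketch leaves out is exactly the shared-object problem: 
a codec that preserves references across records needs some notion of 
object identity, and that is where the complexity tends to creep in.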

If you have ideas in mind, please send them on this list for discussion.

> Separating the HTree and BTree from the RecordManager is really nice. I
> think more levels of layering will make it easier to develop many kinds of
> persistence models. Making object persistence available at the RecordManager
> level might not be the best place for this, since I consider this to be a
> very "high level" function.

Again, the current object persistence in the RecordManager is really 
just a utility function which serializes the object into a byte[].  If 
you want more functionality, you build it on top.

This goes for any type of service generally found in an OODBMS, such as 
collection classes, concurrency policy (pessimistic/optimistic locking), 
transaction models (nesting, rollback ability, ...), etc.

I've argued in the past that many of these features should be part of a 
separate project, in order to keep JDBM small and simple.  I guess it 
depends on the scope of the overall project and whether you want to 
create a full-blown OODBMS or simply extend JDBM in ways that are 
compatible with its original objectives.

> 
> I'm going to start mucking with JDBM, to see how feasible it is to have
> higher level tree implementations built upon simpler ones. Has there been
> any other talk of this, or have there been any such implementations?
> 

The Exolab folks have built some services on top of JDBM in the past.  I 
remember seeing locking and extended transaction management in the core 
library.  I should check again to see if we could merge some of the code 
back into JDBM.

cheers,
alex