Mark Proctor wrote:
Kevin Day wrote:

I have no objections.  I actually did a pretty thorough code review of Bryan's work and found his changes to be very well thought through.  I haven't used it in production yet, but probably will.  Also, as he points out, it's easy to drop back to the old serialization mechanism (or even the *shudder* Java serialization mechanism).
For me, I just want to be able to efficiently write my own byte[] without having to go through any serialisation mechanism. When I last looked, the Serializer interface allowed this, so it should be fine.
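That raw-byte[] path could be a pass-through serializer. This is a sketch, not JDBM source: the `Serializer` interface below is a minimal stand-in declared locally so the example compiles on its own; in a real project you would implement the interface shipped with JDBM (check the exact signatures in your version).

```java
import java.io.IOException;
import java.io.Serializable;

// Minimal stand-in for JDBM's Serializer interface, declared here only so
// this sketch is self-contained. The real interface lives in the JDBM jar.
interface Serializer extends Serializable {
    byte[] serialize(Object obj) throws IOException;
    Object deserialize(byte[] serialized) throws IOException;
}

// Pass-through serializer: JDBM stores exactly the byte[] it is handed,
// so no serialisation mechanism ever runs.
class RawByteSerializer implements Serializer {
    public byte[] serialize(Object obj) {
        return (byte[]) obj;          // caller must pass a byte[]
    }
    public Object deserialize(byte[] serialized) {
        return serialized;            // returned exactly as stored
    }
}
```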
It would be nice to be able to factor things so alternative serializers could be specified, but as Bryan points out, a *lot* of tweaks and hooks were required to add this one other serialization mechanism.  I suspect that a serious refactoring of the entire jdbm codebase would be required to make it truly pluggable (and there are, I believe, more desirable goals for future development - like true transaction isolation and rollback support).
Getting JDBM to play in JTA transactions is important. Someone said in an old posting that they had created some code for this and submitted it - does anyone know what happened to that?

Interestingly, the other day I was talking to someone who built a key/value store for journal logging. They used to use BDB, but moved away as the btree was overkill. They built a system that created 10 x 10mb files (configurable). You insert a byte[] and it returns a long handle; the byte[] is written to the head location, which currently points to a specific file, and if there isn't enough space the head moves to the next file. They do not allow spanning, and that leftover free space is never written to later, as they do not seek - they only ever write to the head location. While you can remove entries, the individual entry's filespace is not reclaimed; only when all entries in a file are marked as removed is the entire file deleted. They claim this approach gives blistering speed, as there is no seek at write time and only one seek at read time to find the start position, because they do not fragment their files to fit in gaps. Most journalling systems have entries numbering only in the hundreds, and those entries won't exist forever, so you get something that is less efficient with space but much faster. I was wondering if the RecordManager in JDBM could be extended to do something similar, as another possible backend?
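The scheme described above could be sketched roughly like this. This is hypothetical illustration, not their code or JDBM's: fixed-size segments, writes always go to the head of the current segment, a record never spans segments, and the returned long handle packs (segment index, offset). Segments here are in-memory buffers for brevity; the real system used ~10 MB files.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Append-only store: no seek on write, one seek on read, space never reused.
class AppendOnlyStore {
    private final int segmentSize;
    private final List<ByteBuffer> segments = new ArrayList<>();

    AppendOnlyStore(int segmentSize) {
        this.segmentSize = segmentSize;
        segments.add(ByteBuffer.allocate(segmentSize));
    }

    long insert(byte[] record) {
        ByteBuffer seg = segments.get(segments.size() - 1);
        if (seg.remaining() < record.length + 4) {   // no spanning: abandon the
            seg = ByteBuffer.allocate(segmentSize);  // gap and roll to a new segment
            segments.add(seg);
        }
        long handle = ((long) (segments.size() - 1) << 32) | seg.position();
        seg.putInt(record.length).put(record);       // single sequential write
        return handle;
    }

    byte[] fetch(long handle) {                      // one "seek" to the offset, then read
        ByteBuffer seg = segments.get((int) (handle >>> 32)).duplicate();
        seg.position((int) (handle & 0xFFFFFFFFL));
        byte[] out = new byte[seg.getInt()];
        seg.get(out);
        return out;
    }
}
```

Deletion in the real system would just mark the handle dead and drop a whole segment once every record in it is dead; that bookkeeping is omitted here.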

Talking of which would it be possible for someone to write some docs on how RecordManager and JDBM works in general?
Chatting to my colleague about his approach, his stuff is here. The files are made small enough to fit into a disk cylinder and each one is append only, which maximises throughput. This is geared up more for journalling, but it does do long/value add, remove and update. They have record types, so the TX info can be appended to the same log file, avoiding the seek between the .db and .log. Anyway, I thought the single append-only log idea might be a good "optimisation" backend for JDBM.

I've been going through the JDBM code and it's quite well written, so I'm able to understand it. Interestingly, I can see that RecordManager can be used directly, without BTree, as a basic store using a long key. There seems to be no repeated disk seeking on write, as the location is determined in memory and then it's a continuous write. A delete, again, is an in-memory lookup and a single write to mark the location free; it doesn't have to actually free all the bytes on disk. This looks pretty optimal to me. Buffering is optional - out of interest, when does this pay off? I can't imagine the logical-to-physical mapping has any measurable cost. The downside is that the byte[] size needs to be known ahead of time, so async streaming for writes won't work. I saw that the docs mention replacing some of the code with DBCache, but I couldn't find the mailing list discussion on this - any details?
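The insert/delete pattern described above can be modelled in a few lines. This is a toy sketch of the general free-list idea, not JDBM source: the logical-to-physical mapping lives in memory, so an insert is one sequential write and a delete just marks the slot free without touching the stored bytes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy long-keyed record store with an in-memory free list.
class RecordStore {
    private final List<byte[]> slots = new ArrayList<>();    // stands in for the .db file
    private final Deque<Integer> freeList = new ArrayDeque<>();

    long insert(byte[] data) {
        if (!freeList.isEmpty()) {            // reuse a freed slot: pure in-memory lookup
            int slot = freeList.pop();
            slots.set(slot, data);
            return slot;
        }
        slots.add(data);                      // otherwise append: one continuous write
        return slots.size() - 1;
    }

    byte[] fetch(long id) { return slots.get((int) id); }

    void delete(long id) {                    // mark free; the old bytes are not wiped
        freeList.push((int) id);
    }
}
```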

There is obviously the issue of multithreading. With the move to JDK 1.5, does that help some? JDBM should probably at least allow multiple reads regardless of writes, somewhat like ConcurrentHashMap. I'm guessing at this point it might be preferable to split up dbs to allow concurrent writes too, via striping?
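The "many readers, one writer" idea maps directly onto the JDK 1.5 `java.util.concurrent.locks` primitives. A minimal sketch of a hypothetical wrapper (not part of JDBM) around a long-keyed store:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Reads take the shared lock (many may hold it at once);
// writes take the exclusive lock.
class GuardedStore {
    private final Map<Long, byte[]> records = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long nextId = 1;

    long insert(byte[] data) {
        lock.writeLock().lock();              // exclusive: blocks readers and writers
        try { records.put(nextId, data); return nextId++; }
        finally { lock.writeLock().unlock(); }
    }

    byte[] fetch(long id) {
        lock.readLock().lock();               // shared: concurrent reads proceed
        try { return records.get(id); }
        finally { lock.readLock().unlock(); }
    }
}
```

Striping would then be a layer above this: hash the key to pick one of N such stores, so writes to different stripes proceed concurrently.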

One of the use cases I'd like to support is the idea of in-memory meta-data for each long, without having to iterate over the entire .db reading in all records. I will probably do this as a second db that holds only meta-data - although then I need transactions to span both dbs. This would map each long id to its meta-data. I'd then keep a permanent in-memory cache of that data, as we'd only ever have a few hundred items anyway. I'm wondering if the idea of "meta" data for records could be built into the main .db; it should then be possible to start up a record manager and load in all the meta-data. This way the meta-data and record can be written continuously together.
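A toy sketch of that per-record metadata idea (hypothetical, not an existing JDBM feature): each insert writes the record and its metadata together, and the metadata map alone is what startup would reload, without touching the record bytes. In JDBM terms, the map could live in a record reachable from a named root.

```java
import java.util.HashMap;
import java.util.Map;

// Record store that keeps a small in-memory metadata map alongside the data.
class MetaStore {
    static class Meta {
        final String label; final long length;
        Meta(String label, long length) { this.label = label; this.length = length; }
    }

    private final Map<Long, byte[]> records = new HashMap<>(); // stands in for the .db
    private final Map<Long, Meta> metadata = new HashMap<>();  // few hundred entries, kept in memory
    private long nextId = 1;

    long insert(byte[] data, String label) {
        long id = nextId++;
        records.put(id, data);                         // record and meta written together
        metadata.put(id, new Meta(label, data.length));
        return id;
    }

    // What a startup pass would load: metadata only, no record scan.
    Map<Long, Meta> metaSnapshot() { return metadata; }
}
```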

- K

Kevin Day
Trumpet, Inc.
480-961-6003 x1002

----------------------- Original Message -----------------------
From: "Cees de Groot" <>
To: "Bryan Thompson" <>
Cc: Mark Proctor <>,,
Date: Tue, 10 Jun 2008 20:34:25 +0200
Subject: Re: [Jdbm-general] extensible serializer
If the license issue is sorted, I'm more than happy to keep it in. The
original JDBM interfaces are still there, and I'll just move it to
jdbm.extser or similar.

Any objections? I mean, upside: no license problems, and through code
inclusion no dependency issues. Sounds OK to me...

On Tue, Jun 10, 2008 at 7:11 PM, Bryan Thompson <> wrote:

I had to do some significant work in order to get jdbm to persist the
serializer state.  I would suggest that you look at the code more carefully
rather than just rolling it back.  Assuming that xstream uses a stateful
serializer, you are going to want to preserve the integration points.

By "stateful" serializer, I mean one that maintains persistent state NOT
recorded in the individual serialized records.  extser factors out which
serializers are declared, the class ids (ints) assigned to each class for
which there is a registered serializer, and the corresponding serializer
version(s), and puts all of that into a persistent record accessed off of one of
the named roots for the store.  This makes it extremely compact when
serializing object graphs - the shared state is all factored out.  In order
to support that I had to put in a bunch of hooks that you will want to keep.

Another issue with versioned serializers is that they basically have to be
inner classes in order to access the various fields (unless you want the
overhead of reflection during serialization!).  One of the changes that I
introduced with the extser integration was transparent versioning for the
btree nodes and leaves for stores that choose to enable extser.  If that
forward versioning is important, then you are going to wind up with
something that's tightly coupled regardless.

If the broader issue is the dependency, then Cees already imported extser
and I "authorize" its relicensing under the license for the jdbm project.


[] On Behalf Of Mark
Sent: Tuesday, June 10, 2008 12:54 PM
To: Bryan Thompson
Cc: 'Cees de Groot';;
Subject: Re: [Jdbm-general] extensible serializer

Bryan Thompson wrote:

Well, easy come, easy go - but you might want to see who's using it before
you drop it out.

There is zero overhead when extser is not enabled.

It's not so much the overhead, it's the extra dependency. Even if you don't
use it you are forced to include it, because Serialiser extends it. So if we
are to use it, it needs to be "plugged in", so that it's optional.

The LGPL is a weird issue: basically, Apache takes a stand against the LGPL and
will not allow any of its projects to depend on an LGPL dependency. This
would force Apache DS to fork JDBM to maintain a version without
that LGPL dependency. It's not that they think the LGPL is causing any
weird violation; they just don't like some of the ambiguity, and thus fall
on the side of caution.


[] On Behalf Of Mark
Sent: Monday, June 09, 2008 10:58 AM
To: Cees de Groot
Subject: Re: [Jdbm-general] extensible serializer

Cees de Groot wrote:

On Mon, Jun 9, 2008 at 4:24 PM, Cees de Groot <> wrote:

I grabbed the extensible serializer source code and added it to the
source tree - the original project doesn't seem to exist anymore, so I
thought this was the quickest way to get rid of a binary-only dependency.

Great, I'll update and look it over. For me, I just want JDBM to write my
byte[]; I don't want it to go anywhere near a serialisation method call. I'm
currently just trying to find out if that is possible.

On second thought, I agree it's not good to have JDBM depend on a
single serializer. So I'm removing the dependency (rolling back to the
pre-extser tag in CVS and checking what happened after that).


