On Tue, 2007-02-13 at 11:13 +0100, Cees de Groot wrote:
> I have some time on my hands (between job changes) and I've been back
> into Javaland for a while now, *and* someone needed a patch :-), so
> I'm planning to prepare a 1.1 release shortly (coming 1-2 weeks). If
> there's anything you'd like to have included, shout :-)
We use jdbm in a high performance environment. We have jdbm files
totalling approximately six gigabytes in use by our application at any
time, and the locality of our accesses does not generally follow the
We have some issues with jdbm which we have solved by writing new
components. However, we have identified a number of requirements for a
longer term solution, and jdbm does not currently satisfy many of these
requirements. We will contribute our current code as soon as I get it
past the legal people.
Serializing the serializers causes a lot of problems since it means they
cannot contain pointers to system state required for deserialization
(without nasty workarounds). Please can we just provide a serializer
whenever we create the BTree.
Lock-free, concurrent thread-safety is achievable using
java.util.concurrent, we NEED this; we cannot afford lock contention. We
replaced the disk backend and cache system with something based on a
sliding garbage collector. However, in our current implementation, the
mapping from record id to disk offset is stored in RAM, and this
overhead is no longer supportable, so we need a new idea, such as
storing disk offsets in the pages. (If we store disk offsets on the
disk, relocating/GC becomes difficult/impossible, so it would need a
malloc/free-list strategy). We have written up a design comparison sheet
for the various possible solutions; perhaps we can share this with the
list as well.
Can we completely remove the serialize/deserialize overhead for btree
and work on raw bytes, and make cache handle byte arrays? It is
important for us to specify the cache size in megabytes, not pages. It's
perfectly reasonable to provide a comparator which works on raw byte
arrays and performs deserialization itself if required.
Mostly, the order in a lookup-btree doesn't matter, and so comparing the
byte arrays directly is fine. Equality is the only required operation,
the ordering can be arbitrary as long as it is consistent.
Shared Cache: We have a SharedCacheRecordManager which has a single
cache of specified size shared between all btrees in the application. It
is also thread-safe and lock-free. I will get permission to contribute
this, although it will need considerable rewriting to satisfy the
Additional feature: Can jdbm also support a binary tree where the keys
are bit-vectors and data can be stored at any node?
One of my colleagues more familiar with BaseRecordManager tells me that
the freespace map has n^2 complexity (he didn't say whether time or
space, but we aren't using it anyway).
Expose an interface for raw file access, so third parties can plug in
their own file access code. This allows developers to use tricks like
memmap when they know the file is small, or cache FDs in a system which
would otherwise exhaust all available FDs. We, for instance, cannot use
RandomAccessFile; we suffer file descriptor exhaustion. We did evaluate
a j2me skip-list solution which accessed files only through an
Currently jdbm is using 30% of our RAM and most of our I/O time on a
twin Xeon with 8Gb of RAM and hardware RAID.