try this today. One thing that I am not clear about with jdbm is whether
it makes any attempt (or guarantee) that records which span a page will
be on contiguous pages.
As mentioned, I have been playing with some blob/clob support. My original
take was to have the records form a linked list. Since this is exactly
what the jdbm record headers are doing, it should be possible to
incrementally allocate new pages into a jdbm record, thereby supporting
streaming from the application in addition to the current Serializer
approach.
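To make the streaming idea concrete, here is a minimal sketch of a blob stored as a linked list of segments. It uses a toy in-memory recid-to-bytes map in place of the real jdbm record manager; the names (insert/update/fetch, SEGMENT_SIZE) are illustrative assumptions, not the actual jdbm API:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Toy in-memory stand-in for a jdbm-style record store (recid -> bytes).
// insert/update/fetch and SEGMENT_SIZE are illustrative, not jdbm's API.
public class LinkedBlob {
    static final Map<Long, byte[]> store = new HashMap<>();
    static long nextRecid = 1;
    static final int SEGMENT_SIZE = 4; // tiny on purpose, to force chaining

    static long insert(byte[] d) { store.put(nextRecid, d); return nextRecid++; }
    static void update(long recid, byte[] d) { store.put(recid, d); }
    static byte[] fetch(long recid) { return store.get(recid); }

    // Append segments one at a time: each segment is [next recid | payload
    // slice], and the previous segment's next pointer is patched once the
    // new recid is known. This is what permits incremental (streaming) writes.
    static long writeBlob(byte[] payload) {
        long head = 0, prev = 0;
        for (int off = 0; off < payload.length; off += SEGMENT_SIZE) {
            int len = Math.min(SEGMENT_SIZE, payload.length - off);
            ByteBuffer seg = ByteBuffer.allocate(8 + len);
            seg.putLong(0L); // next pointer, patched below; 0 terminates the chain
            seg.put(payload, off, len);
            long recid = insert(seg.array());
            if (prev == 0) {
                head = recid;
            } else {
                byte[] p = fetch(prev);
                ByteBuffer.wrap(p).putLong(recid); // patch previous next pointer
                update(prev, p);
            }
            prev = recid;
        }
        return head; // recid of the first segment
    }

    // Read back by walking the linked list; note we only discover the next
    // segment after fetching the current one, so no read-ahead is possible.
    static byte[] readBlob(long head) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long recid = head;
        while (recid != 0) {
            byte[] seg = fetch(recid);
            recid = ByteBuffer.wrap(seg).getLong();
            out.write(seg, 8, seg.length - 8);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long head = writeBlob("hello, blob world".getBytes());
        System.out.println(new String(readBlob(head)));
    }
}
```

The walk in readBlob is the limitation discussed below: each next recid is hidden inside the previous segment.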
However, it occurred to me that greater efficiency could be obtained by
blocking the linked list of records (in my current implementation of
blob/clob) into an array of recids in a header record for the blob. That
would make it possible to do pre-fetch strategies for subsequent segments
of the blob. If we use the record header approach as it stands today, the
most pre-fetch that we could do is one page read ahead at a time.
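For comparison, a sketch of the blocked variant under the same toy-store assumption (again, not jdbm's real API): the header record carries the array of segment recids, so a reader can address any segment directly and issue a read-ahead window. The `window` loop only marks where concurrent fetches could be issued:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Same toy store as in the linked-list sketch (illustrative names). Here
// the blob header is one record holding the count and array of segment recids.
public class BlockedBlob {
    static final Map<Long, byte[]> store = new HashMap<>();
    static long nextRecid = 1;
    static final int SEGMENT_SIZE = 4;

    static long insert(byte[] d) { store.put(nextRecid, d); return nextRecid++; }
    static byte[] fetch(long recid) { return store.get(recid); }

    // Write the segments, then a header record listing their recids.
    static long writeBlob(byte[] payload) {
        int n = (payload.length + SEGMENT_SIZE - 1) / SEGMENT_SIZE;
        ByteBuffer header = ByteBuffer.allocate(4 + 8 * n);
        header.putInt(n);
        for (int i = 0; i < n; i++) {
            int off = i * SEGMENT_SIZE;
            int len = Math.min(SEGMENT_SIZE, payload.length - off);
            byte[] seg = new byte[len];
            System.arraycopy(payload, off, seg, 0, len);
            header.putLong(insert(seg));
        }
        return insert(header.array());
    }

    // Because all segment recids are known up front, reads for segments
    // i .. i+window-1 could be issued concurrently; the "pre-fetch" window
    // here is just a batched loop marking where that would happen.
    static byte[] readBlob(long headerRecid, int window) {
        ByteBuffer header = ByteBuffer.wrap(fetch(headerRecid));
        int n = header.getInt();
        long[] recids = new long[n];
        for (int i = 0; i < n; i++) recids[i] = header.getLong();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < n; i += window) {
            for (int j = i; j < Math.min(i + window, n); j++) {
                byte[] seg = fetch(recids[j]); // candidates for concurrent I/O
                out.write(seg, 0, seg.length);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long h = writeBlob("prefetch-friendly blob".getBytes());
        System.out.println(new String(readBlob(h, 3)));
    }
}
```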
As mentioned in another thread, this bears on I/O efficiency. The guiding
principle as I understand it is that you want as much I/O concurrency as
possible so that you can get as many disk arms behind your application as
possible. This leads to the use of striped disk arrays and cluster
storage solutions.
jdbm today is single threaded for read and write, so it is not possible
to get I/O concurrency. Even if you have I/O concurrency at the store
layer, your application has to support it as well. For us, I/O
concurrency is gained by high level query languages that can issue
parallel read operations against the store or (in a different
application, which is a fuzzy inference engine based on a neural network
model) by being able to model the main computation using a parallel
processing approach.
One change that I would like to introduce (if we wind up introducing
changes in the jdbm file structure) is a long class identifier in the
record header. This would make it possible to accurately profile the
contents of a jdbm store. For certain key classes (BTree, BPage,
HashDictionary, HashBucket, String, Long) we would have "magic"
pre-defined values that use negative recids. All other classes would be
assigned recids by interning the class name in a string table. The string
table itself could be just a BTree using a compressed key index whose
keys are the string values and whose values are the jdbm records whose
content is that string. The latter is necessary so that you can look up
the class name.
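A rough sketch of the interning scheme, with two in-memory maps standing in for the store-resident string table (a BTree in the actual proposal); the class names and id assignment here are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of class identifiers for record headers: a few key classes get
// "magic" pre-defined negative ids; every other class name is interned on
// first use. The reverse map (id -> name) is what lets a profiler turn an
// id found in a record header back into a class name. In jdbm proper both
// maps would live in the store itself, e.g. as a BTree string table.
public class ClassIds {
    static final Map<String, Long> idByName = new HashMap<>();
    static final Map<Long, String> nameById = new HashMap<>();
    static long nextId = 1;

    static {
        String[] magic = { "BTree", "BPage", "HashDictionary",
                           "HashBucket", "String", "Long" };
        for (int i = 0; i < magic.length; i++) register(magic[i], -(i + 1));
    }

    static void register(String name, long id) {
        idByName.put(name, id);
        nameById.put(id, name);
    }

    static long intern(String className) {
        Long id = idByName.get(className);
        if (id == null) {
            id = nextId++;
            register(className, id);
        }
        return id;
    }

    static String lookup(long id) { return nameById.get(id); }

    public static void main(String[] args) {
        System.out.println(intern("BTree")); // magic id: -1
        long id = intern("com.example.MyRecord"); // interned on first use
        System.out.println(lookup(id)); // com.example.MyRecord
    }
}
```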
I am certainly going to do this at the object manager layer for our
application, since some use of Externalizable or even Serializable
appears to be why the store is so bloated (I had no idea just how bad
java serialization was!). I think that it would be a nice feature for
jdbm, but not one that we could introduce while maintaining binary
compatibility in the store file.
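The bloat is easy to demonstrate: default Java serialization of a single boxed Long writes a stream header and class descriptors on top of the 8 payload bytes. A small self-contained measurement (nothing jdbm-specific here):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Measures the size of default Java serialization for a single boxed Long:
// 8 bytes of payload plus the stream header and class descriptors.
public class SerializationBloat {
    static int serializedSize(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Long payload: 8 bytes, serialized: "
                + serializedSize(Long.valueOf(42)) + " bytes");
    }
}
```

The per-record overhead shrinks when many objects share one stream (descriptors are written once), but a store that serializes records individually pays it on every record.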
At the same time, I would also like to introduce version numbers to
critical classes (BTree, BPage, etc.) so that we have more flexibility in
evolving jdbm without breaking binary compatibility.
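As a sketch of what the version numbers buy us: a page-like structure writes its format version first, and the reader branches on it, so old pages stay readable after the layout evolves. The fields and version history here are invented for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of per-class format versioning. The version goes first in the
// record so a newer reader can still decode pages written by older code.
public class VersionedPage {
    static final int CURRENT_VERSION = 2; // v1 had only keyCount; v2 added flags

    static byte[] write(int keyCount, int flags) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(CURRENT_VERSION);
            out.writeInt(keyCount);
            out.writeInt(flags);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int[] read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int version = in.readInt();
            int keyCount = in.readInt();
            int flags = (version >= 2) ? in.readInt() : 0; // default for v1 pages
            return new int[] { keyCount, flags };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] page = read(write(17, 1));
        System.out.println(page[0] + " " + page[1]); // 17 1
    }
}
```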
Thanks for picking this up right now - I am so completely slammed with
work that I'm not able to really write any jdbm code. For some reason
it's a lot easier for me to think of conceptual design and algorithms in
the evening than actually bang on the code. Things should clear up a bit
in the next week or so. Please let me know if you need a hand walking the
data pages - I think that is going to be your best bet for getting a
solid picture of the unused space.
> I finally got
> through the sourceforge CVS. I will pick up work on the
> free lists tomorrow. -bryan