Two questions related to the traversal of the "used pages" list.  If a data page is solely used as a continuation page
for a record, does it show up in the used pages list?  If so, what is the return value for getFirst() on such a data page
since there is no record header?
Also, do you have thoughts on what would be "interesting" information to extract from the free page list?  I am thinking
of a free page count and perhaps determining how much fragmentation there is in the free page list.  Which gets back
to an earlier question -- does jdbm memory allocation make any effort to allocate contiguous pages for large records?
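A minimal sketch of the two statistics mentioned above, assuming the free page numbers have already been collected into a long[] by walking the free page list. The class and method names (FreePageStats, countRuns, fragmentation) are hypothetical and not part of jdbm, and the fragmentation measure shown is only one possible definition.

```java
import java.util.Arrays;

// Hypothetical helper, not a jdbm class. Assumes the caller has already
// gathered the free page numbers (without duplicates) from the free list.
class FreePageStats {

    /** Counts maximal runs of consecutive page numbers. */
    static int countRuns(long[] freePages) {
        if (freePages.length == 0) {
            return 0;
        }
        long[] sorted = freePages.clone();
        Arrays.sort(sorted);
        int runs = 1;
        for (int i = 1; i < sorted.length; i++) {
            // A gap between neighbouring page numbers starts a new run.
            if (sorted[i] != sorted[i - 1] + 1) {
                runs++;
            }
        }
        return runs;
    }

    /**
     * One possible fragmentation measure (an assumption, not a jdbm metric):
     * 0.0 when all free pages form a single contiguous run,
     * 1.0 when no two free pages are adjacent.
     */
    static double fragmentation(long[] freePages) {
        if (freePages.length <= 1) {
            return 0.0;
        }
        return (countRuns(freePages) - 1) / (double) (freePages.length - 1);
    }
}
```

The free page count is simply the array length; the run count says how many separate contiguous extents a large record could be placed into.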
-----Original Message-----
From: [] On Behalf Of Thompson, Bryan B.
Sent: Friday, September 23, 2005 8:15 AM
To: Kevin Day
Subject: RE: [Jdbm-developer] commit: jdbm.recman.DumpUtility

I'll try this today.  One thing that I am not clear about with jdbm is whether it makes any attempt (or guarantee)
that records which span a page will be on contiguous pages.
As I mentioned, I have been playing with some blob/clob support.  My original take was to have the records form
a linked list.  Since this is exactly what the jdbm record headers are doing, it should be possible to incrementally
allocate new pages into a jdbm record, thereby supporting streaming from the application in addition to the current
Serializer approach.  An interesting thought.
However, it occurred to me that greater efficiency could be obtained by blocking the linked list of records (in my
current implementation of blob/clob) into an array of recids in a header record for the blob.  That would make it
possible to do pre-fetch strategies for subsequent segments of the blob.  If we use the record header approach as
it stands today, the most pre-fetch that we could do is one page read ahead at a time.
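To make the header-record idea concrete, here is a rough sketch, assuming the blob header simply holds the recids of its segments in order. BlobHeader and prefetchWindow are hypothetical names for illustration; this is not my current implementation and nothing here is jdbm API.

```java
import java.util.Arrays;

// Hypothetical blob header: an ordered array of segment recids.
// With the whole array in hand, a reader can see several segments
// ahead instead of following one link at a time.
class BlobHeader {
    private final long[] segmentRecids;

    BlobHeader(long[] segmentRecids) {
        this.segmentRecids = segmentRecids.clone();
    }

    int segmentCount() {
        return segmentRecids.length;
    }

    /**
     * Returns the recids of up to readAhead segments that follow the
     * segment at index current, as candidates for pre-fetching.
     */
    long[] prefetchWindow(int current, int readAhead) {
        if (current < 0 || readAhead < 0) {
            throw new IllegalArgumentException("negative index or window");
        }
        int from = Math.min(current + 1, segmentRecids.length);
        int count = Math.min(readAhead, segmentRecids.length - from);
        return Arrays.copyOfRange(segmentRecids, from, from + count);
    }
}
```

With a linked list the equivalent of prefetchWindow can only ever return one recid, since the next link is not known until the current segment has been read.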
You mentioned in another thread I/O efficiency.  The guiding principle as I understand it is that you want to get as
much I/O concurrency as possible so that you can get as many disk arms behind your application as possible.
This leads to the use of striped disk arrays and cluster storage solutions.  jdbm today is single threaded for read
and write, so it is not possible to get I/O concurrency.  Even if you have I/O concurrency at the store layer, your
application has to support it as well.  For us, I/O concurrency is gained by high level query languages that can
use parallel read operations against the store or (in a different application which is a fuzzy inference engine based
on a neural network model) by being able to model the main computation using a parallel processing approach.
One change that I would like to introduce (if we wind up introducing changes in the jdbm file structure) is a long
class identifier in the record header.  This would make it possible to accurately profile the contents of a jdbm
store.  For certain key classes (BTree, BPage, HashDictionary, HashBucket, String, Long) we would have "magic"
pre-defined values that used negative recids.  All other classes would be assigned recids by interning the class name
in a string table.  The string table itself could be just a BTree using a compressed key index whose keys are the
string values and whose values are the jdbm records whose content is that string.  The latter is necessary so that
you can lookup the class name.
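A small sketch of the interning scheme, under some loud assumptions: two in-memory HashMaps stand in for the BTree-backed string table, the specific negative values are made up for illustration, and ClassIdTable is a hypothetical name rather than anything in jdbm. The key class names are written as they appear above, not as fully qualified names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class-identifier table. In a real store the two maps would
// be persistent (e.g. a BTree keyed by class name); here they are in memory.
class ClassIdTable {
    private final Map<String, Long> idByName = new HashMap<>();
    private final Map<Long, String> nameById = new HashMap<>();
    private long nextId = 1; // ids for interned (non-key) classes are positive

    ClassIdTable() {
        // "Magic" pre-defined negative ids for key classes.
        // The actual values are an assumption for this sketch.
        String[] keyClasses = {
            "BTree", "BPage", "HashDictionary", "HashBucket", "String", "Long"
        };
        for (int i = 0; i < keyClasses.length; i++) {
            register(keyClasses[i], -(i + 1L));
        }
    }

    private void register(String className, long id) {
        idByName.put(className, id);
        nameById.put(id, className);
    }

    /** Returns the id for className, assigning a new positive id if needed. */
    long intern(String className) {
        Long existing = idByName.get(className);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        register(className, id);
        return id;
    }

    /** Reverse lookup: the class name for an id, or null if unknown. */
    String lookup(long id) {
        return nameById.get(id);
    }
}
```

The reverse map is what makes profiling possible: a dump utility can read the class identifier from each record header and report space used per class name.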
I am certainly going to do this at the object manager layer for our application since some use of Externalizable or
even Serializable appears to be why the store is so bloated (I had no idea just how bad java serialization was!).  I
think that it would be a nice feature for jdbm, but not one that we could introduce while maintaining binary compatibility
in the store file.
At the same time, I would also like to introduce version numbers to critical classes (BTree, BPage, etc.) so that we
have more flexibility in evolving jdbm without breaking binary compatibility.
-----Original Message-----
From: Kevin Day []
Sent: Thursday, September 22, 2005 11:26 PM
To: Thompson, Bryan B.
Subject: re: [Jdbm-developer] commit: jdbm.recman.DumpUtility

Thanks for picking this up right now - I am so slammed with work that I'm not really able to write any jdbm code.  For some reason it's a lot easier for me to think about conceptual design and algorithms in the evening than to actually bang on the keyboard.
Things should clear up a bit in the next week or so.  Please let me know if you need a hand walking the data pages - I think that is going to be your best bet for getting a solid unused space percentage.
- K
> I finally got through the sourceforge CVS.  I will pick up work on the free lists tomorrow.  -bryan <