Kevin,
 
A while back I posted on the notion of "logical containers."  I don't believe that I ever cleared up the
terminology to your satisfaction.  However, the recent thread on the translation pages has given me
some insight into how such containers could be used to consume the 10 remaining bits in the recid
under the current translation page scheme.  I'll sketch out the approach that I am suggesting.  It
has the advantage that it packs many logical records (from the application perspective) inside of a
single physical record (from the jdbm perspective).
 
The basic idea is that any given physical record serves as a logical record containing some application
data, but can also serve as a container for other logical records.  The container behavior provides for up
to 2^N objects in the physical record, where N is the #of bits available for addressing -- 10 in the
case of the current translation page scheme, as you have outlined it.  A flag marks whether the
physical record is currently a container or not.  When dynamically promoted to a container, the
following data are added to the record (a sketch follows the list):
 
 - #of logical records in the logical container (may span multiple physical containers)
 - #of logical records in this physical record (up to 2^N).
 - recid of the first secondary container (head of a linked list of zero or more physical records).
 - recid of the last secondary container (tail of a linked list of zero or more physical records).
 - long[] of keys into the container (maintained in sorted order)
 - Object[] of values in the container (indexed into by finding the matching recid in the key[]).
 - keySerializer
 - valueSerializer
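 
A rough sketch of the shape of a promoted record (field names are mine, not actual jdbm classes; the
serializer fields stand in for jdbm.helper.Serializer):
 
    // Hypothetical shape of a physical record after promotion to a
    // container.  Names are illustrative only.
    class ContainerRecord {
        boolean  isContainer;         // promotion flag
        long     logicalCount;        // #of logical records in the whole logical container
        int      localCount;          // #of logical records in THIS physical record (<= 2^N)
        long     firstSecondaryRecid; // head of the overflow chain
        long     lastSecondaryRecid;  // tail of the overflow chain
        long[]   keys;                // sorted; the shared high bits could be packed away
        Object[] values;              // values[i] pairs with keys[i]
        Object   keySerializer;       // the container's key serializer
        Object   valueSerializer;     // the container's value serializer
    }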
 
As you can see, the physical record is dynamically transformed into something that is capable of
looking up the recid in an internal index.  The way this works is that the least significant N bits of
the logical recid provided by the application are dropped in order to find the block and offset in the
translation page that points to the physical record.  Those least significant bits are then used to
index into the container to find the logical record (so the key[] could be packed, since the high
order bits are always the same).
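 
A minimal illustration of the split (my names; whether the high bits are masked or shifted for the
translation lookup is an implementation detail -- the point is that the low N bits never reach the
translation page):
 
    class RecidSplit {
        static final int  N    = 10;            // low bits reserved for the container
        static final long MASK = (1L << N) - 1;
 
        // what gets resolved through the translation page to block/offset:
        static long translationKey(long logicalRecid) {
            return logicalRecid & ~MASK;
        }
 
        // what gets binary-searched in the container's keys[]:
        static int containerIndex(long logicalRecid) {
            return (int) (logicalRecid & MASK);
        }
    }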
 
The application inserts into a logical container using:
 
    insert( long containerHintId, Object obj )
 
The container hint is the recid of any object in that container.  The serializer is the value serializer
for the container (see above).
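 
Here is a sketch of the insert path under this scheme, assuming the ContainerRecord shape above.
The abstract helpers stand in for jdbm internals (translation page lookup, slot bookkeeping), so this
is illustrative only:
 
    import java.io.IOException;
 
    abstract class ContainerInsertSketch {
        static final int  N    = 10;
        static final long MASK = (1L << N) - 1;
 
        long insert(long containerHintId, Object obj) throws IOException {
            ContainerRecord c = fetchContainer(containerHintId); // one translation page lookup
            if (c.localCount < (1 << N)) {
                int slot = storeInSlot(c, obj);           // store obj, get its low-bit index
                return (containerHintId & ~MASK) | slot;  // same high bits, new low bits
            }
            // the N-bit address space of this physical record is full:
            // overflow into a secondary container (see below)
            return insertIntoSecondary(c, obj);
        }
 
        abstract ContainerRecord fetchContainer(long recid) throws IOException;
        abstract int storeInSlot(ContainerRecord c, Object obj) throws IOException;
        abstract long insertIntoSecondary(ContainerRecord c, Object obj) throws IOException;
    }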
 
When the address space for the physical record (those N bits) becomes full, a secondary container
is created and objects are inserted into that container instead.
 
Note that the CRUD operators have basically the same cost as the original jdbm CRUD operators,
since the only indirection is the translation page, exactly as in current jdbm.  The advantage is
that many logical records may be packed into the same physical record, which has the same benefits
that you noted for using a btree to pack records together:
 
Multiple logical records in a single physical record means:
 
 - fewer, but larger, I/O transfers
 - fewer distinct physical records, so fewer objects in the cache.
 
I have implemented a variant on this as a RecordManager interface that delegates to the cache and
then to the BaseRecordManager.  This works very nicely.  However, given the discussion of the
translation pages, it is clear that this technique could be migrated directly into jdbm, which would
give us full utilization of those unused address bits.
 
So, when considering design alternatives for the translation pages I think that this approach can be
used to regain the full logical address space.  The other consideration, of course, is changing the
translation pages in order to support reordering of the pages within the store.
 
-bryan
-----Original Message-----
From: jdbm-developer-admin@lists.sourceforge.net [mailto:jdbm-developer-admin@lists.sourceforge.net] On Behalf Of Kevin Day
Sent: Wednesday, September 28, 2005 2:12 AM
To: JDBM Developer listserv
Subject: [Jdbm-developer] More reverse lookup info

 
All-
 
Here are my thoughts for the evening - I'm looking forward to your responses and ideas...
 
 
Logical row id reverse lookup redux
--------------------------------------------------
Based on feedback from Markus, I went back and looked at the logical row ids.  It turns out that the limit on the maximum logical row id has more to do with the 8 byte representation and how it maps to block/offset than it does to the amount of space actually available in the file.
 
By taking advantage of this, and the fact that all logical rowids are divisible by 10 (because of the slot size in the translation page), I can get the minimum bytes to capture a logical row id down to 6 (actually 44 bits, which leaves a few bits for marking free and previous-free state to support coalescing).  I'm pretty sure I've maxed it out at this point, though.
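 
To make the packing concrete, here is one possible layout (my illustration only; the actual bit assignments may differ):
 
    // 44 bits of rowid + 2 flag bits = 46 bits, which fits in 6 bytes.
    class PackedRowId {
        static final int  SLOT_SIZE     = 10;       // translation page slot size
        static final long FREE_BIT      = 1L << 44; // record is free
        static final long PREV_FREE_BIT = 1L << 45; // previous record is free (for coalescing)
 
        static long pack(long logicalRowId, boolean free, boolean prevFree) {
            long v = logicalRowId / SLOT_SIZE;      // exact: rowids are slot-size multiples
            if (free)     v |= FREE_BIT;
            if (prevFree) v |= PREV_FREE_BIT;
            return v;                               // low 46 bits are written to disk
        }
 
        static long unpackRowId(long v) {
            return (v & (FREE_BIT - 1)) * SLOT_SIZE;
        }
    }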
 
Even taking advantage of splitting records (to reduce the 4 byte size field to a 1 byte 'unused' field), I still can't get a record header smaller than 11 bytes.
 
This is going to force us to either:
 
1.  Maintain a separate table with reverse lookup
or
2.  Change the binary format of the file (which will require some sort of conversion procedure)
 
 
I'm not keen on either, but I really don't see any alternative - hopefully someone else will?
 
Conversion / Off-line pack and rebuild
-----------------------------------------------------
I have some thoughts on a conversion procedure - I'm going to put them down here to see if anyone has any comments on this (note that this may make for an off-line pack and rebuild strategy as well; a code sketch follows the list):
 
1.  Pull the length of the first logical row id's record
2.  Allocate space from the available page list to store this record (probably use the used space of the record, not available space), including the new header size (allocate new pages as necessary)
3.  Insert that record at the new location
4.  Update the translation page for the new record location, and insert the logical row id into the new record header
5.  Mark the old record as free
6.  Coalesce adjacent free records on either side of the old record, if possible
7.  If the page containing the old record (or the result of coalescing the old record) now contains only free record data, move it to the free list
8.  Update the value of the last logical row id to be processed (important for restarting without data corruption)
9.  Every XX loops, call commit()
10.  Continue with the next logical row id and repeat
 
11.  After all logical row ids have been processed, reclaim any free pages that are now located at the end of the file
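 
Here's that loop restated as code (names are hypothetical; each abstract helper stands in for the numbered step it is tagged with):
 
    import java.io.IOException;
 
    abstract class ConversionSketch {
        static final int  NEW_HEADER_SIZE = 11;   // per the header discussion above
        static final long COMMIT_INTERVAL = 1000; // "every XX loops"
 
        void convert() throws IOException {
            long n = 0;
            // 0 terminates the scan (a hypothetical convention)
            for (long rowId = firstLogicalRowId(); rowId != 0; rowId = nextLogicalRowId(rowId)) {
                long oldLoc = physicalLocation(rowId);
                int  len    = usedLength(rowId);                   // 1: used, not available, space
                long newLoc = allocateFree(len + NEW_HEADER_SIZE); // 2
                moveRecord(oldLoc, newLoc, rowId);                 // 3 + 4: copy, retarget the translation
                                                                   //        slot, write rowId into header
                markFree(oldLoc);                                  // 5
                long merged = coalesce(oldLoc);                    // 6
                freePageIfAllFree(merged);                         // 7
                recordProgress(rowId);                             // 8: restart point
                if (++n % COMMIT_INTERVAL == 0) commit();          // 9
            }                                                      // 10: next logical row id
            reclaimTrailingFreePages();                            // 11
        }
 
        // hypothetical jdbm-internal operations:
        abstract long firstLogicalRowId() throws IOException;
        abstract long nextLogicalRowId(long rowId) throws IOException;
        abstract long physicalLocation(long rowId) throws IOException;
        abstract int  usedLength(long rowId) throws IOException;
        abstract long allocateFree(int bytes) throws IOException;
        abstract void moveRecord(long from, long to, long rowId) throws IOException;
        abstract void markFree(long loc) throws IOException;
        abstract long coalesce(long loc) throws IOException;
        abstract void freePageIfAllFree(long loc) throws IOException;
        abstract void recordProgress(long rowId) throws IOException;
        abstract void commit() throws IOException;
        abstract void reclaimTrailingFreePages() throws IOException;
    }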
 
A couple of comments on this strategy:
 
First, we need to have some mechanism for pulling free pages that are as close to the front of the file as possible.  We could order the list, or add parent, left and right pointers to the free page and implement a binary heap, but maybe there is some other mechanism that would be more efficient? 
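 
For example (a sketch only - jdbm's free list is not actually structured this way today): java.util.PriorityQueue is already a binary heap, so keying it on page offset makes "lowest free page" a single poll():
 
    import java.util.PriorityQueue;
 
    class FreePageHeap {
        private final PriorityQueue<Long> free = new PriorityQueue<Long>();
 
        void release(long pageOffset) {
            free.add(Long.valueOf(pageOffset));
        }
 
        // returns the free page closest to the front of the file, or -1 if none
        long acquireLowest() {
            Long p = free.poll();
            return (p == null) ? -1L : p.longValue();
        }
    }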
 
Second, we would probably want to also move other page types (free logical and physical row id pages come to mind, but I think we can get rid of those entirely), so maybe data pages are the only ones we need to worry about after all.
 
Third, the coalescing step is critical to this strategy working, as it will free up pages.  Otherwise, we could wind up with a bunch of unused free pages at the end of the process.
 
Fourth, this strategy may work nicely as a general pack & rebuild operation (even if we aren't changing record header sizes).
 
Fifth, this may even work as an on-line pack & rebuild operation (we still need the reverse lookup to allow us to pack translation pages to the front of the file, though, so this doesn't remove that need - see my comments in the next section "Translation Page Strategy" for what I'm thinking here).  Basically, we could have a low priority thread that does one of the above loops, then goes to sleep.  As long as the operation is synchronized properly, it should be possible to do this while the database is actually being used.  The other approach would be to move through the loop with some duty cycle (i.e. every X inserts, run the loop), but doing it in a separate thread allows the pack to occur when the database isn't actively being used, so it won't affect performance.
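 
A sketch of what that thread might look like (packOneRecord() stands in for one iteration of the conversion loop above, and the real synchronization would need more care than a single lock):
 
    abstract class BackgroundPacker extends Thread {
        private final Object storeLock;
 
        BackgroundPacker(Object storeLock) {
            this.storeLock = storeLock;
            setPriority(Thread.MIN_PRIORITY);  // low priority, as described above
            setDaemon(true);
        }
 
        public void run() {
            try {
                while (!finished()) {
                    synchronized (storeLock) { // do not interleave with normal operations
                        packOneRecord();       // one iteration of the conversion loop
                    }
                    sleep(50);                 // then go to sleep, yielding to real work
                }
            } catch (InterruptedException e) {
                // asked to stop; fall through
            }
        }
 
        abstract boolean finished();
        abstract void packOneRecord();
    }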
 
Sixth, we would need to make sure that we don't move records around unnecessarily.  I think this is just a matter of only moving records if they belong to blocks that have unused space in them (allowing for the special condition where there can be 16 bytes + 1 record header at the end of the page).
 
Is there anything else I might be missing?
 
 
 
Translation Page Strategy
------------------------------------
 
By the way (also based on Markus' comments), I went back and spent some time thinking about the new translation page strategy.  I think that using a tree structure like I was proposing is *not* the way to go.  Instead, I think we should adopt a policy of always placing new translation pages near the front of the file, and that we never move a translation page once it is allocated.  Maintaining a single page read for translation is going to be important, and I think we can gain the same desired benefit (i.e. creating a system where the translation pages don't prevent us from packing free space from the file) and still maintain O(1) lookups (and less disk space overhead as well).
 
Any older files would continue to work as normal, but can never be smaller than the largest sized translation page that was allocated before the new code was put in place.
 
In order for this to work, we will need to know the physical location of the largest translation page (excluding pages added before this strategy change).  When a new translation page is requested, we go to the physical page immediately following, and move its contents into a free page somewhere later in the file.  We then mark the recently freed page as a translation page, link it to the correct page list, and start using it.
 
There are some details about how to properly handle freed translation pages, but this strategy seems like it would work.  Basically, we recognize that we can't practically move translation pages around, so we make sure we place them initially where we want them to be.
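 
A sketch of the allocation policy (names are hypothetical; the abstract helpers stand in for jdbm page-list operations):
 
    import java.io.IOException;
 
    abstract class TranslationPageAllocator {
        long lastTranslationPage; // physical location of the largest translation page
 
        long allocateTranslationPage() throws IOException {
            long target = nextPhysicalPage(lastTranslationPage); // page immediately following
            if (!isFreePage(target)) {
                // evict whatever data page currently occupies that slot; finding the
                // logical row ids of its records is why the reverse lookup is still needed
                long dest = takeFreePageAfter(target);
                movePageContents(target, dest);
            }
            linkAsTranslationPage(target); // mark it and link it to the correct page list
            lastTranslationPage = target;
            return target;
        }
 
        abstract long nextPhysicalPage(long page);
        abstract boolean isFreePage(long page) throws IOException;
        abstract long takeFreePageAfter(long page) throws IOException;
        abstract void movePageContents(long from, long to) throws IOException;
        abstract void linkAsTranslationPage(long page) throws IOException;
    }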
 
For this to work, though, we have to be able to move data pages around on-demand - and the only way that's going to happen is if we have a reverse lookup to find the logical row id for a given record.
 
 
 
 
Stumped on how best to do this...
 
- K