Here are my thoughts for the evening - I'm looking
forward to your responses and ideas...
Logical row id reverse lookup redux
Based on feedback from Markus, I went back and looked at
the logical row ids. It turns out that the limit on the maximum logical
row id has more to do with the 8 byte representation and how it maps to
block/offset than it does to the amount of space actually available in the
By taking advantage of this, and the fact that all
logical rowids are divisible by 10 (because of the slot size in the
translation page) I can get the minimum bytes to capture a logical row id down
to 6 (Actually, 44 bits - which leaves a few bits for marking free and
previous-free state to support coalescing). I'm pretty sure I've maxed
it out at this point, though.
Even with taking advantage of splitting records (to
reduce the 4 byte size field to a 1 byte 'unused' field), I still can't get a
record header smaller than 11 bytes.
This is going to force us to either:
1. Maintain a separate table with reverse
2. Change the binary format of the file (which
will require some sort of conversion procedure)
I'm not keen on either, but I really don't see any
alternative - hopefully someone else will?
Conversion / Off-line pack and rebuild
I have some thoughts on a conversion procedure - I'm
going to put them down here to see if anyone has any comments on this (note
that this may make for an off-line pack and rebuild strategy as
1. Pull the length of the first logical row id's
2. Allocate space from the available
page list to store this record (probably use the used space of the record, not
avaliable space), including the new header size (allocate new pages as
3. Insert that record at the new
4. Update the translation page for the new record
location, and insert the logical row id into the new record
5. Mark the old record as free
6. Coallesce adjacent free records on either side
of the old record, if possible
7. If the page containing the old record (or the
result of coallescing the old record) now contains only free record
data, move it to the free list
8. Update the value of the last logical row id to
be processed (important for restarting without data corruption)
9. Every XX loops, call commit()
10. Continue with the next logical row id and
11. After all logical row ids have been processed,
reclaim any free pages that are now located at the end of the
A couple of comments on this strategy:
First, we need to have some mechanism for pulling free
pages that are as close to the front of the file as possible. We could
order the list, or add parent, left and right pointers to the free page and
implement a binary heap, but maybe there is some other mechanism that would be
Second, we would probably want to also move other page
types (free logical and physical row id pages come to mind - but I think we
can get rid of those entirely, so maybe data pages are the only ones we need
to worry about after all.
Third, the coallescing step is critical to this strategy
working as it will free up pages. Otherwise, we could wind up with a
bunch of unused free pages at the end of the process.
Fourth, this strategy may work nicely as a general pack
& rebuild operation (even if we aren't changing record header
Fifth, this may even work as an on-line pack &
rebuild operation (we still need the reverse lookup to allow us to pack
translation pages to the front of the file, though, so this doesn't remove
that need - see my comments in the next section "Translation Page Strategy"
for what I'm thinking here). Basically, we could have a low priority
thread that does one of the above loops, then goes to sleep. As long as
the operation is synchronized properly, it should be possible to do this while
the database is actually being used. The other approach would be to move
through the loop with some duty cycle (i.e. every X inserts, run the loop),
but doing it in a separate thread allows the pack to occur when the database
isn't actively being used, so it won't effect performance.
Sixth, we would need to make sure that we don't move
records around unnecesarily. I think this is just a matter of only
moving records if they belong to blocks that have unused space in them
(allowing for the special condition where there can be 16 bytes + 1
record header at the end of the page).
Is there anything else I might be missing?
Translation Page Strategy
By the way (also based on Markus' comments), I went back
and spent some time thinking about the new translation page strategy. I
think that using a tree structure like I was proposing is *not* the way to
go. Instead, I think we should adopt a policy of always placing new
translation pages near the front of the file, and that we never move a
tranlation page once it is allocated. Maintaining a single page read for
translation is going to be important, and I think we can gain the same desired
benefit (i.e. creating a system where the translation pages don't prevent us
from packing free space from the file) and still maintain O(1) lookups (and
less disk space overhead as well).
Any older files would continue to work as normal, but
can never be smaller than the largest sized translation page that was
allocated before the new code was put in place.
In order for this to work, we will need to know the
physical location of the largest translation page (excluding pages added
before this strategy change). When a new translation page is requested,
we go to the physical page immediately following, and move it's contents into
a free page somewhere later in the file. We then mark the recently freed
page as a translation page, link it to the correct page list, and start using
There are some details about how to properly handle
freed translation pages, but this strategy seems like it would work.
Basically, we recognize that we can't practically move translation pages
around, so we make sure we place them initially where we want them to
For this to work, though, we have to be able to move
data pages around on-demand - and the only way that's going to happen is if we
have a reverse lookup to fnd the logical row id for a given
Stumped on how best to do this...
This SF.Net email is sponsored by: Power Architecture Resource Center: Free
content, downloads, discussions, and more.
_______________________________________________ Jdbm-developer mailing list