complementary idea would be to define a property that provides a hint
to the allocator algorithm to enable/disable the first-fit strategy.
Kevin Day wrote:
Wanted to float an intermediate idea here - it depends on the database
utlization, but may provide a quick out.
The primary issue with the first fit algorithm was that, in certain
reasonable usage models, it resulted in massive amounts of wasted
space. This was more a result of the fact that the allocator doesn't
support splitting or coallescene than anything. The downside of the
current 'fix' for this issue is that other (quite reasonable) usage
scenarios result in a huge amount of wasted CPU cycles (dontcha love
these kinds of tradeoffs?).
The ultimate fix for this, of course, is to re-design the allocation
strategy completely, and include splitting and coallescing (or have
that behavior be implicit to the algorithm). There has been quite a
bit of discussion about strategies in this area, but things move
slowly, so I'mnot holding my breath for any of them.
It seems to me that the immediate workaround here is to add pack
capability to jdbm. I believe that this could be done, even with the
current data structures, by creating a temporary lookup table for
reverse mapping physical offsets to recids. The downside to this
approach is that it might require some sort of quiessence in the system
to get it right, so if the database is going to be hammered 24x7, then
it may not be the right solution - but I thought I'd toss it out.
Here is how the algorithm could work:
The pack operation is triggered
jdbm creates anew, separate jdbm database with transactions turned off,
optimized purely for speed (this is a temporary file)
jdbm adds a BTree to act as a reverse lookup table. Keys are Long,
physical offset. Values are Long, recid
jdbm cranks through the translation table and inserts physical
offset->recid mapping into the BTree
jdbm then grabs the first data page, and checks to see if the first
record on the page has identical length and used values. If not, then
in a single transaction, it moves the second record to consume all
space, updates the length and used value of the first record, lookups
up the recid of the second physical record, updates the offset of the
second physical record in the translation table, closes transaction
As an additional check, the system should check the physical offset in
the current translation table against the physical offset of the actual
record. If they are not the same, then the record shouldn't be moved,
and the next record should be considered.
I'm pretty sure that it should be possible to implement this in a
manner that allows for partial pack operations.
The database will certainly grow, but the pack operation will shrink it
Anyway, just wanted to get my thoughts out there on this in case anyone
wants to run with it. It should be considerably easier (and less error
prone) than implementing a new allocator, and it should provide the
basis of a migration path into a new file format when that happens...
If such a system were in place, it would be possible to go back to a
first fit algorithm (with significantly better performance across a
much wider field of applications), and still (hopefully) keep the