Oh yes, forgot about that!

Kevin Day wrote:
Yes - I think that Bryan's idea of making the '128 bytes' configurable will achieve the same thing (set that value to Integer.MAX_VALUE, and you've effectively reverted back to first fit...) - or am I missing something subtle?
- K

> A complementary idea would be to define a property that provides a hint to the allocator algorithm to enable/disable the first-fit strategy.


Kevin Day wrote:
Wanted to float an intermediate idea here - it depends on the database utlization, but may provide a quick out.
The primary issue with the first fit algorithm was that, in certain reasonable usage models, it resulted in massive amounts of wasted space.  This was more a result of the fact that the allocator doesn't support splitting or coallescene than anything.  The downside of the current 'fix' for this issue is that other (quite reasonable) usage scenarios result in a huge amount of wasted CPU cycles (dontcha love these kinds of tradeoffs?).
The ultimate fix for this, of course, is to re-design the allocation strategy completely, and include splitting and coallescing (or have that behavior be implicit to the algorithm).  There has been quite a bit of discussion about strategies in this area, but things move slowly, so I'mnot holding my breath for any of them.
It seems to me that the immediate workaround here is to add pack capability to jdbm.  I believe that this could be done, even with the current data structures, by creating a temporary lookup table for reverse mapping physical offsets to recids.  The downside to this approach is that it might require some sort of quiessence in the system to get it right, so if the database is going to be hammered 24x7, then it may not be the right solution - but I thought I'd toss it out.
Here is how the algorithm could work:
The pack operation is triggered
jdbm creates anew, separate jdbm database with transactions turned off, optimized purely for speed (this is a temporary file)
jdbm adds a BTree to act as a reverse lookup table.  Keys are Long, physical offset.  Values are Long, recid
jdbm cranks through the translation table and inserts physical offset->recid mapping into the BTree
jdbm then grabs the first data page, and checks to see if the first record on the page has identical length and used values.  If not, then in a single transaction, it moves the second record to consume all space, updates the length and used value of the first record, lookups up the recid of the second physical record, updates the offset of the second physical record in the translation table, closes transaction
As an additional check, the system should check the physical offset in the current translation table against the physical offset of the actual record.  If they are not the same, then the record shouldn't be moved, and the next record should be considered.
I'm pretty sure that it should be possible to implement this in a manner that allows for partial pack operations.

The database will certainly grow, but the pack operation will shrink it back down.
Anyway, just wanted to get my thoughts out there on this in case anyone wants to run with it.  It should be considerably easier (and less error prone) than implementing a new allocator, and it should provide the basis of a migration path into a new file format when that happens...
If such a system were in place, it would be possible to go back to a first fit algorithm (with significantly better performance across a much wider field of applications), and still (hopefully) keep the fragmentation down.
- Kevin
------------------------------------------------------- All the advantages of Linux Managed Hosting--Without the Cost and Risk! Fully trained technicians. The highest number of Red Hat certifications in the hosting industry. Fanatical Support. Click to learn more http://sel.as-us.falkag.net/sel?cmd=lnk&kid=107521&bid=248729&dat=121642 _______________________________________________ Jdbm-developer mailing list Jdbm-developer@listssourceforge.net https://lists.sourceforge.net/lists/listinfo/jdbm-developer