re[2]: [Jdbm-developer] Free page selection bug

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0
 Transitional//EN">
<HTML><HEAD>
<STYLE type=text/css> P, UL, OL, DL, DIR,
 MENU, PRE { margin: 0 auto;}</STYLE>

<META content="MSHTML 6.00.2900.2802" name=GENERATOR></HEAD>
<BODY leftMargin=1 topMargin=1 rightMargin=1><FONT
 face=Tahoma>
<DIV><FONT face=Arial size=2>Gents-</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>Wanted to float
 an intermediate idea here - it depends on
 the database utlization, but may provide
 a quick out.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>The primary
 issue with the first fit algorithm was that,
 in certain reasonable usage models, it resulted
 in massive amounts of wasted space.&nbsp;
 This was more a result of the fact that the
 allocator doesn't support splitting or coallescene
 than anything.&nbsp; The downside of the
 current 'fix' for this issue is that other
 (quite reasonable) usage scenarios result
 in a huge amount of wasted&nbsp;CPU cycles&nbsp;(dontcha
 love these kinds of tradeoffs?).</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>The ultimate
 fix for this, of course, is to re-design
 the allocation strategy completely, and include
 splitting and coallescing (or have that behavior
 be implicit to the algorithm).&nbsp; There
 has been quite a bit of discussion about
 strategies in this area, but things move
 slowly, so I'm not holding my breath for
 any of them.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>It seems to
 me that the immediate workaround here is
 to add pack capability to jdbm.&nbsp; I believe
 that this could be done, even with the current
 data structures, by creating a temporary
 lookup table for reverse mapping physical
 offsets to recids.&nbsp; The downside to
 this approach is that it might require some
 sort of quiessence in the system to get it
 right, so if the database is going to be
 hammered 24x7, then it may not be the right
 solution - but I thought I'd toss it out.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>Here is how
 the algorithm could work:</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>The pack operation
 is triggered</FONT></DIV>
<DIV><FONT face=Arial size=2>jdbm creates
 a new, separate jdbm database with transactions
 turned off, optimized purely for speed (this
 is a temporary file)</FONT></DIV>
<DIV><FONT face=Arial size=2>jdbm adds a
 BTree to act as a reverse lookup table.&nbsp;
 Keys are Long, physical offset.&nbsp; Values
 are Long, recid</FONT></DIV>
<DIV><FONT face=Arial size=2>jdbm cranks
 through the translation table and inserts
 physical offset-&gt;recid mapping into the
 BTree</FONT></DIV>
<DIV><FONT face=Arial size=2>jdbm then grabs
 the first data page, and checks to see if
 the first record on the page has identical
 length and used values.&nbsp; If not, then
 in a single transaction, it moves the second
 record to consume all space, updates the
 length and used value of the first record,
 lookups up the recid of the second physical
 record, updates the offset of the second
 physical record in the translation table,
 closes transaction</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>As an additional
 check, the system should check the physical
 offset in the current translation table against
 the physical offset of the actual record.&nbsp;
 If they are not the same, then the record
 shouldn't be moved, and the next record should
 be considered.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>I'm pretty sure
 that it should be possible to implement this
 in a manner that allows for partial pack
 operations.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>The database
 will certainly grow, but the pack operation
 will shrink it back down.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>Anyway, just
 wanted to get my thoughts out there on this&nbsp;in
 case anyone wants to run with&nbsp;it.&nbsp;
 It should be considerably easier (and less
 error prone)&nbsp;than implementing a new
 allocator, and it should provide the basis
 of a migration path into a new file format
 when that happens...</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>If such a system
 were in place, it would be possible to go
 back to a first fit algorithm (with significantly
 better performance across a much wider field
 of applications), and still (hopefully) keep
 the fragmentation down.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>Cheers,</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>- Kevin</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2></FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>&nbsp;</FONT>

<TABLE>
<TBODY>
<TR>
<TD width=1 bgColor=blue></TD>
<TD>&gt;
 Colin, I am not seeing as much grief
 from the allocation strategy as you describe.
 &nbsp;I expect that there is an interaction
 with the allocations that are being performed
 by your application. &nbsp;The change was
 introduced since it produced smaller stores
 for our RDFS database without a loss in speed,
 but clearly it is a loose for you. &nbsp;Let
 me look at the code and see what we can
 change. With respect to record copying,
 you should be able to refactor what you need
 out of the dump utility. &nbsp;It has mechanisms
 for scanning records from a page perspective,
 but it does not attempt to do inserts with
 a pre-defined logical record id. &nbsp;I
 can look into that I suppose, but let's see
 where the update (below) gets you. I've
 attached a modified version of FreePhysicalRowIdPage.
 &nbsp;Please try it out and let me know
 what impact it has on your application. &nbsp;It
 passes the jdbm test suite, but there
 are no performance tests that specifically target
 this feature. &nbsp;It has no performance
 impact (time or disk space) on our RDFS
 database application. -bryan </TD></TR></TBODY></TABLE></DIV></BODY></HTML>