From: Kevin D. <ke...@tr...> - 2006-05-30 17:46:35
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE> <META content="MSHTML 6.00.2900.2802" name=GENERATOR></HEAD> <BODY leftMargin=1 topMargin=1 rightMargin=1><FONT face=Tahoma> <DIV><FONT face=Arial size=2>Gents-</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>Wanted to float an intermediate idea here - it depends on the database utlization, but may provide a quick out.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>The primary issue with the first fit algorithm was that, in certain reasonable usage models, it resulted in massive amounts of wasted space. This was more a result of the fact that the allocator doesn't support splitting or coallescene than anything. The downside of the current 'fix' for this issue is that other (quite reasonable) usage scenarios result in a huge amount of wasted CPU cycles (dontcha love these kinds of tradeoffs?).</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>The ultimate fix for this, of course, is to re-design the allocation strategy completely, and include splitting and coallescing (or have that behavior be implicit to the algorithm). There has been quite a bit of discussion about strategies in this area, but things move slowly, so I'm not holding my breath for any of them.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>It seems to me that the immediate workaround here is to add pack capability to jdbm. I believe that this could be done, even with the current data structures, by creating a temporary lookup table for reverse mapping physical offsets to recids. The downside to this approach is that it might require some sort of quiessence in the system to get it right, so if the database is going to be hammered 24x7, then it may not be the right solution - but I thought I'd toss it out.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>Here is how the algorithm could work:</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>The pack operation is triggered</FONT></DIV> <DIV><FONT face=Arial size=2>jdbm creates a new, separate jdbm database with transactions turned off, optimized purely for speed (this is a temporary file)</FONT></DIV> <DIV><FONT face=Arial size=2>jdbm adds a BTree to act as a reverse lookup table. Keys are Long, physical offset. Values are Long, recid</FONT></DIV> <DIV><FONT face=Arial size=2>jdbm cranks through the translation table and inserts physical offset->recid mapping into the BTree</FONT></DIV> <DIV><FONT face=Arial size=2>jdbm then grabs the first data page, and checks to see if the first record on the page has identical length and used values. If not, then in a single transaction, it moves the second record to consume all space, updates the length and used value of the first record, lookups up the recid of the second physical record, updates the offset of the second physical record in the translation table, closes transaction</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>As an additional check, the system should check the physical offset in the current translation table against the physical offset of the actual record. If they are not the same, then the record shouldn't be moved, and the next record should be considered.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>I'm pretty sure that it should be possible to implement this in a manner that allows for partial pack operations.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>The database will certainly grow, but the pack operation will shrink it back down.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>Anyway, just wanted to get my thoughts out there on this in case anyone wants to run with it. It should be considerably easier (and less error prone) than implementing a new allocator, and it should provide the basis of a migration path into a new file format when that happens...</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>If such a system were in place, it would be possible to go back to a first fit algorithm (with significantly better performance across a much wider field of applications), and still (hopefully) keep the fragmentation down.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>Cheers,</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>- Kevin</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2> </FONT> <TABLE> <TBODY> <TR> <TD width=1 bgColor=blue><FONT face=Arial size=2></FONT></TD> <TD><FONT face=Arial size=2><FONT color=red>> Colin,<BR><BR>I am not seeing as much grief from the allocation strategy as you<BR>describe. I expect that there is an interaction with the allocations<BR>that are being performed by your application. The change was introduced<BR>since it produced smaller stores for our RDFS database without a loss in<BR>speed, but clearly it is a loose for you. Let me look at the code and<BR>see what we can change.<BR><BR>With respect to record copying, you should be able to refactor what you<BR>need out of the dump utility. It has mechanisms for scanning records<BR>from a page perspective, but it does not attempt to do inserts with a<BR>pre-defined logical record id. I can look into that I suppose, but let's<BR>see where the update (below) gets you.<BR><BR>I've attached a modified version of FreePhysicalRowIdPage. Please try<BR>it out and let me know what impact it has on your application. It passes<BR>the jdbm test suite, but there are no performance tests that specifically<BR>target this feature. It has no performance impact (time or disk space) on<BR>our RDFS database application.<BR><BR>-bryan<BR><BR></FONT></FONT></TD></TR></TBODY></TABLE></DIV></FONT></BODY></HTML> |