From: Colin E. <col...@sr...> - 2006-05-24 23:55:14
|
Hi All, Nice work on the JDBM project -- at SRI we use it to store RDF triples for a semantic desktop project here: http://www.openiris.org/. I've been playing with the current CVS version, and noticed that the new free page selection algorithm in FreePhysicalRowIdPage.getFirstLargerThan() actually seems to waste *more* space than the previous algorithm unless all of the pages are the same length. For our data, the new algorithm ends up causing the creation of a new page over 90% of the time, even if there are free pages that will fit. A tweak might be a "best fit" instead of "first fit" algorithm, but the current algorithm is pretty bad for almost all data sets. Thanks Colin Evans |
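The first-fit versus best-fit trade-off Colin raises can be sketched over a toy free list. This is illustrative code, not JDBM's actual implementation:

```java
import java.util.List;

// Toy free-list policies over a list of free block sizes. firstFit takes the
// first block large enough for the request; bestFit takes the block that
// leaves the least wasted space. Both return the chosen index, or -1 if no
// block fits.
public class FitPolicies {

    static int firstFit(List<Integer> free, int size) {
        for (int i = 0; i < free.size(); i++) {
            if (free.get(i) >= size) return i;
        }
        return -1;
    }

    static int bestFit(List<Integer> free, int size) {
        int best = -1;
        for (int i = 0; i < free.size(); i++) {
            int waste = free.get(i) - size;
            if (waste >= 0 && (best == -1 || waste < free.get(best) - size)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Integer> free = List.of(500, 300, 1000);
        System.out.println(firstFit(free, 250)); // 0: the 500-byte block comes first
        System.out.println(bestFit(free, 250));  // 1: the 300-byte block wastes only 50
    }
}
```

Best fit wastes less per allocation; first fit stops scanning sooner. Which one bloats the store less depends on how the record sizes are distributed.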
From: Bryan T. <br...@sy...> - 2006-05-25 00:27:46
|
Colin, Thanks for bringing this to my attention. This is actually my #1 problem with the existing code base in terms of performance. The row re-allocation strategy is 10% of the total runtime costs for my application -- also an RDF triple store [1], but using a generic object model as the framework layer over jdbm [2]. I believe that YARS [3] is Yet Another RDF Store based on jdbm. We have looked at a few options for correcting this problem. My take is that we need to bin the free rows for more efficient reallocation, based on the first free row in the bin of rows that would satisfy the allocation request. I am a bit confused by your message. You indicate that a new page is being allocated even when there are existing pages with sufficient free space. Is the problem that the free space on the pages is fragmented and therefore cannot be used to satisfy the request? If so, a variation on the existing translation page scheme has been discussed in which physical rows would code the slot on a page, and the slot on the page would translate to an offset on the page -- thus providing an opportunity to compact space on the page. Thanks, -bryan [1] http://proto.cognitiveweb.org/projects/cweb/multiproject/cweb-rdf-generic/index.html [2] http://proto.cognitiveweb.org/projects/cweb/multiproject/cweb-generic-native/index.html [3] http://sw.deri.org/2004/06/yars/ |
From: Colin E. <col...@sr...> - 2006-05-25 17:12:42
|
Hi Bryan, The existing algorithm rejects an individual block if the "wasted space" would be more than 128 bytes. The aggregate effect is that for almost every write the algorithm runs through the whole linked list of free pages, finds nothing within 128 bytes, and then creates a new block even though there were existing free blocks that could have taken the data, albeit with more wasted space. This causes the .db file to grow in size at a ridiculous rate, most of it full of free blocks. The old (version 1.0) algorithm took the first block that could fit the data regardless of the wasted space -- not optimal, but the .db file grows a lot more slowly. The simplest fix for the problem would be to roll FreePhysicalRowIdPage.getFirstLargerThan() back to the 1.0 version. A simple optimization would be to have getFirstLargerThan() select the best fit within a single page, but there is plenty more that could be done. The Cognitive Web project looks interesting, and there are plenty of connections to SRI's Iris project -- including that Jack Park is one of the original parents of Iris. Iris uses Jena internally, and we get good performance by implementing a Jena graph using JDBM. Thanks Colin Evans |
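Colin's 90% figure is easy to rationalize with a toy experiment (illustrative code, not JDBM's; all sizes and counts are made up): against the same random free list, a hard 128-byte waste cap finds an acceptable block far less often than plain first fit.

```java
import java.util.Random;

// Toy experiment: how often does a request find an acceptable block in a
// fixed free list of 10 random block sizes? Plain first fit accepts any
// block that is large enough; the capped variant (the CVS behavior Colin
// describes) additionally requires waste < 128 bytes. Not JDBM code.
public class ReuseRate {

    static int reuseCount(boolean capWaste) {
        Random rnd = new Random(42);        // fixed seed: repeatable runs
        int[] free = new int[10];
        for (int i = 0; i < free.length; i++) {
            free[i] = 100 + rnd.nextInt(2000);
        }
        int reused = 0;
        for (int r = 0; r < 1000; r++) {    // 1000 allocation requests
            int size = 100 + rnd.nextInt(2000);
            for (int f : free) {
                int waste = f - size;
                if (waste >= 0 && (!capWaste || waste < 128)) {
                    reused++;               // an acceptable block exists
                    break;
                }
            }
        }
        return reused;
    }

    public static void main(String[] args) {
        System.out.println("first fit reuses: " + reuseCount(false));
        System.out.println("capped fit reuses: " + reuseCount(true));
        // every request that is NOT reused appends a new block to the file
    }
}
```

Every request that fails the waste test grows the file, which is exactly the bloat Colin reports.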
From: Bryan T. <br...@sy...> - 2006-05-25 17:42:23
|
Colin, I've done some looking at scanning a page for the best fit, and it is more expensive than you would think. A few simple tweaks have been done, but I think that we need to expand the data on the page into a model of the page view that makes this operation cheaper - and that would be a more wholesale overhaul. How would you feel about a configuration parameter for the amount of wasted space that would be accepted? Alternatively, at the end of the page we could modify the algorithm to accept the record with the best fit (if none has less than 128 bytes wasted) unless the amount that would be wasted is greater than a second threshold, e.g., 1024. Both could be runtime parameters. We'll definitely take a look at Iris as part of the work that we are doing. What's good performance? How interested would you be in clustered allocation of records onto pages? That seems to me to be an important next step in improving the record allocation strategy. -bryan |
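Bryan's two-threshold proposal can be sketched as: accept a block immediately if its waste is under the first limit; otherwise remember the tightest fit seen and accept it at the end of the scan if its waste is under the second limit. The limits below are illustrative runtime parameters, not JDBM constants:

```java
// Sketch of the two-threshold first/best-fit hybrid Bryan proposes.
// Both thresholds would be runtime-configurable; the values here are
// illustrative only.
public class TwoThresholdFit {

    static final int WASTE_LIMIT_1 = 128;   // accept immediately below this
    static final int WASTE_LIMIT_2 = 1024;  // accept best candidate below this

    // returns the chosen index into freeSizes, or -1 to allocate a new page
    static int choose(int[] freeSizes, int size) {
        int best = -1;
        long bestWaste = Long.MAX_VALUE;
        for (int i = 0; i < freeSizes.length; i++) {
            long waste = (long) freeSizes[i] - size;
            if (waste < 0) continue;             // too small for the request
            if (waste < WASTE_LIMIT_1) return i; // good enough: take it now
            if (waste < bestWaste) {             // remember the tightest fit
                best = i;
                bestWaste = waste;
            }
        }
        // fall back to the best candidate if its waste is tolerable
        return (best != -1 && bestWaste < WASTE_LIMIT_2) ? best : -1;
    }

    public static void main(String[] args) {
        System.out.println(choose(new int[]{5000, 900, 4000}, 700)); // 1
        System.out.println(choose(new int[]{5000}, 100));            // -1
    }
}
```

Unlike the pure 128-byte cap, this only gives up (and grows the file) when every candidate would waste more than the second limit.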
From: Colin E. <col...@sr...> - 2006-05-25 18:21:21
|
Hi Bryan, The current algorithm ends up scanning the whole list for almost every block allocation anyway, so doing a best-fit allocation shouldn't be any slower than the current algorithm in the common case. Doing a first-fit allocation will always be faster. I don't have strong opinions about the algorithm. My broad view is that if there is an existing free block that will fit the data, it should always be taken over creating a new block. Otherwise, the probability of there being a match between a data chunk size and a free block size becomes very low when the data varies greatly in size. Making this a configurable option between selection strategies (first-fit/best-fit/threshold) may be a good solution if you're wedded to the threshold. Iris harvests a lot of data (files, email, web pages) into its RDF store, so we do a lot of background writes, and bloated .db files become a problem. I've been copying BTrees to pack the .db files, and this usually cuts .db file size from 1-2GB to less than 100MB. Iris gets good query performance because of judicious use of application-level caches and a careful indexing strategy. It is an order of magnitude faster than the RDBMS-based triple stores, and I've found it to be faster than Kowari for our uses as well. It handles hundreds of queries per second over a dataset of many thousands of triples. The OWL ontology that we are using is massive, so the corresponding data is large too. In my more recent work I've taken to turning off the caching and performance improvements in the TransactionManager and RecordFile, because the application-level caching is much more performant and I'm trying to restrict the memory footprint of the overall app. I found that getting rid of the free cache in RecordFile, setting TransactionManager.setMaximumTransactionsInLog() to 1, and committing often keeps the memory footprint of JDBM small, and I make up the query performance by using weak caches of nodes and triples inside our Graph implementation. If there was one golden feature that I could ask for, it would be a solution to the free space packing issue. A new record allocation strategy would help, and support in BaseRecordManager for iterating through allocated records and allowing inserts to specify the logical row id would make the stop-and-copy approach to repacking go more quickly. The system could iterate through the allocated records in the source BaseRecordManager and copy them directly to a target BaseRecordManager with logical row ids intact, so that the BTrees would run without modification. Thanks Colin |
From: Bryan T. <br...@sy...> - 2006-05-25 20:11:33
|
Colin, I am not seeing as much grief from the allocation strategy as you describe. I expect that there is an interaction with the allocations that are being performed by your application. The change was introduced since it produced smaller stores for our RDFS database without a loss in speed, but clearly it is a loss for you. Let me look at the code and see what we can change. With respect to record copying, you should be able to refactor what you need out of the dump utility. It has mechanisms for scanning records from a page perspective, but it does not attempt to do inserts with a pre-defined logical record id. I can look into that I suppose, but let's see where the update (below) gets you. I've attached a modified version of FreePhysicalRowIdPage. Please try it out and let me know what impact it has on your application. It passes the jdbm test suite, but there are no performance tests that specifically target this feature. It has no performance impact (time or disk space) on our RDFS database application. -bryan

/**
 * JDBM LICENSE v1.00
 *
 * Redistribution and use of this software and associated documentation
 * ("Software"), with or without modification, are permitted provided
 * that the following conditions are met:
 *
 * 1. Redistributions of source code must retain copyright
 *    statements and notices. Redistributions must also contain a
 *    copy of this document.
 *
 * 2. Redistributions in binary form must reproduce the
 *    above copyright notice, this list of conditions and the
 *    following disclaimer in the documentation and/or other
 *    materials provided with the distribution.
 *
 * 3. The name "JDBM" must not be used to endorse or promote
 *    products derived from this Software without prior written
 *    permission of Cees de Groot. For written permission,
 *    please contact cg...@cd....
 *
 * 4. Products derived from this Software may not be called "JDBM"
 *    nor may "JDBM" appear in their names without prior written
 *    permission of Cees de Groot.
 *
 * 5. Due credit should be given to the JDBM Project
 *    (http://jdbm.sourceforge.net/).
 *
 * THIS SOFTWARE IS PROVIDED BY THE JDBM PROJECT AND CONTRIBUTORS
 * ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT
 * NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
 * FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
 * CEES DE GROOT OR ANY CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
 * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
 * OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * Copyright 2000 (C) Cees de Groot. All Rights Reserved.
 * Contributions are Copyright (C) 2000 by their associated contributors.
 *
 * $Id: FreePhysicalRowIdPage.java,v 1.2 2005/11/08 20:58:28 thompsonbry Exp $
 */
package jdbm.recman;

/**
 * Class describing a page that holds physical rowids that were freed.
 */
final class FreePhysicalRowIdPage extends PageHeader {

    // offsets
    private static final short O_COUNT = PageHeader.SIZE; // short count

    static final short O_FREE = O_COUNT + Magic.SZ_SHORT;

    static final short ELEMS_PER_PAGE =
        (RecordFile.BLOCK_SIZE - O_FREE) / FreePhysicalRowId.SIZE;

    /**
     * Used to place a limit on the wasted capacity resulting in a modified
     * first fit policy for re-allocation of free records. This value is the
     * maximum first fit waste that is accepted when scanning the available
     * slots on a given page of the free physical row page list.
     */
    static final transient long wastedCapacityLimit = 128;

    /**
     * Used to place a limit on the wasted capacity resulting in a modified
     * first fit policy for re-allocation of free records. This value is the
     * upper bound of waste that is accepted before scanning another page on
     * the free physical row page list. If no page can be found whose waste
     * for the re-allocation request would be less than this value then a new
     * page will be allocated and the requested physical row will be allocated
     * from that new page.
     */
    static final transient long wastedCapacityLimit2 = PageHeader.SIZE / 4;

    // slots we returned.
    FreePhysicalRowId[] slots = new FreePhysicalRowId[ELEMS_PER_PAGE];

    /**
     * Constructs a data page view from the indicated block.
     */
    FreePhysicalRowIdPage(BlockIo block) {
        super(block);
    }

    /**
     * Factory method to create or return a data page for the
     * indicated block.
     */
    static FreePhysicalRowIdPage getFreePhysicalRowIdPageView(BlockIo block) {
        BlockView view = block.getView();
        if (view instanceof FreePhysicalRowIdPage)
            return (FreePhysicalRowIdPage) view;
        else
            return new FreePhysicalRowIdPage(block);
    }

    /** Returns the number of free rowids */
    short getCount() {
        return block.readShort(O_COUNT);
    }

    /** Sets the number of free rowids */
    private void setCount(short i) {
        block.writeShort(O_COUNT, i);
    }

    /** Frees a slot */
    void free(int slot) {
        get(slot).setSize(0);
        setCount((short) (getCount() - 1));
    }

    /** Allocates a slot */
    FreePhysicalRowId alloc(int slot) {
        setCount((short) (getCount() + 1));
        return get(slot);
    }

    /** Returns true if a slot is allocated */
    boolean isAllocated(int slot) {
        return get(slot).getSize() != 0;
    }

    /** Returns true if a slot is free */
    boolean isFree(int slot) {
        return !isAllocated(slot);
    }

    /** Returns the value of the indicated slot */
    FreePhysicalRowId get(int slot) {
        if (slots[slot] == null) {
            slots[slot] = new FreePhysicalRowId(block, slotToOffset(slot));
        }
        return slots[slot];
    }

    /** Converts slot to offset */
    short slotToOffset(int slot) {
        return (short) (O_FREE + (slot * FreePhysicalRowId.SIZE));
    }

    /**
     * Returns first free slot, -1 if no slots are available
     */
    int getFirstFree() {
        for (int i = 0; i < ELEMS_PER_PAGE; i++) {
            if (isFree(i))
                return i;
        }
        return -1;
    }

    /**
     * Returns first slot with available size >= indicated size,
     * or -1 if no slots are available.
     *
     * @param size The requested allocation size.
     */
    int getFirstLargerThan(int size) {

        // Tracks slot of the best-fit available physical row on the page.
        int bestSlot = -1;

        // Tracks size of the best-fit available physical row on the page.
        long bestSlotSize = 0;

        /*
         * Scan each slot in the page.
         *
         * When large allocations are used, the space wasted by the first
         * fit policy can become very large (25% of the store). The first
         * fit policy has been modified to only accept a record with a
         * maximum amount of wasted capacity given the requested allocation
         * size.
         */
        for (int i = 0; i < ELEMS_PER_PAGE; i++) {

            // Note: isAllocated(i) is equiv to get(i).getSize() != 0
            long theSize = get(i).getSize(); // capacity of this free record.

            long waste = theSize - size; // when non-negative, record has suf. capacity.

            if (waste >= 0) {
                if (waste < wastedCapacityLimit) {
                    return i; // record has suf. capacity and not too much waste.
                } else if (bestSlot == -1 || theSize < bestSlotSize) {
                    /*
                     * This slot is a better (tighter) fit than any that we
                     * have seen so far on this page, so we update the slot#
                     * and the available size for that slot.
                     */
                    bestSlot = i;
                    bestSlotSize = theSize;
                }
            }
        }

        if (bestSlot != -1) {
            /*
             * An available slot was identified that is large enough, but it
             * exceeds the first wasted capacity limit. At this point we check
             * to see whether it is under our second wasted capacity limit. If
             * it is, then we return that slot.
             */
            long waste = bestSlotSize - size;
            if (waste < wastedCapacityLimit2) {
                // record has suf. capacity and not too much waste.
                return bestSlot;
            }
            // Otherwise we will scan the next page on the free physical row
            // page list.
        }

        return -1;
    }
}
|
From: Kevin D. <ke...@tr...> - 2006-05-30 17:46:35
|
Gents-

Wanted to float an intermediate idea here - it depends on the database utilization, but may provide a quick out.

The primary issue with the first-fit algorithm was that, in certain reasonable usage models, it resulted in massive amounts of wasted space. This was more a result of the allocator not supporting splitting or coalescing than anything else. The downside of the current 'fix' for this issue is that other (quite reasonable) usage scenarios result in a huge amount of wasted CPU cycles (dontcha love these kinds of tradeoffs?).

The ultimate fix, of course, is to redesign the allocation strategy completely and include splitting and coalescing (or have that behavior be implicit in the algorithm). There has been quite a bit of discussion about strategies in this area, but things move slowly, so I'm not holding my breath for any of them.

It seems to me that the immediate workaround is to add a pack capability to jdbm. I believe this could be done, even with the current data structures, by creating a temporary lookup table that reverse-maps physical offsets to recids. The downside to this approach is that it might require some sort of quiescence in the system to get it right, so if the database is going to be hammered 24x7 it may not be the right solution - but I thought I'd toss it out.

Here is how the algorithm could work:

The pack operation is triggered.
jdbm creates a new, separate jdbm database with transactions turned off, optimized purely for speed (this is a temporary file).
jdbm adds a BTree to act as a reverse lookup table. Keys are Long physical offsets; values are Long recids.
jdbm cranks through the translation table and inserts each physical offset -> recid mapping into the BTree.
jdbm then grabs the first data page and checks whether the first record on the page has identical length and used values. If not, then in a single transaction it moves the second record to consume the slack, updates the length and used values of the first record, looks up the recid of the second physical record, updates the offset of the second physical record in the translation table, and closes the transaction.

As an additional check, the system should compare the physical offset in the current translation table against the physical offset of the actual record. If they are not the same, the record shouldn't be moved, and the next record should be considered.

I'm pretty sure it should be possible to implement this in a manner that allows for partial pack operations.

The database will certainly grow during the pack, but the operation will shrink it back down.

Anyway, just wanted to get my thoughts out there in case anyone wants to run with it. It should be considerably easier (and less error prone) than implementing a new allocator, and it should provide the basis of a migration path into a new file format when that happens...

If such a system were in place, it would be possible to go back to a first-fit algorithm (with significantly better performance across a much wider field of applications) and still (hopefully) keep fragmentation down.

Cheers,

- Kevin

> Colin,
>
> I am not seeing as much grief from the allocation strategy as you
> describe. I expect that there is an interaction with the allocations
> that are being performed by your application. The change was introduced
> since it produced smaller stores for our RDFS database without a loss in
> speed, but clearly it is a loss for you. Let me look at the code and
> see what we can change.
>
> With respect to record copying, you should be able to refactor what you
> need out of the dump utility. It has mechanisms for scanning records
> from a page perspective, but it does not attempt to do inserts with a
> pre-defined logical record id. I can look into that I suppose, but let's
> see where the update (below) gets you.
>
> I've attached a modified version of FreePhysicalRowIdPage. Please try
> it out and let me know what impact it has on your application. It passes
> the jdbm test suite, but there are no performance tests that specifically
> target this feature. It has no performance impact (time or disk space) on
> our RDFS database application.
>
> -bryan
|
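[Editorial note: the in-place pack loop sketched in the message above can be illustrated in plain Java. Everything here is a stand-in - the Rec class and the translation map are not jdbm's actual data structures - but it shows the core move: walk records in offset order, shrink each allocation to its used size, slide records down, and keep the recid -> offset mapping consistent.]

```java
import java.util.*;

/** Illustrative sketch of the in-place pack idea: compact records on a page
 *  while keeping the recid -> physical-offset translation table consistent.
 *  All types here are stand-ins, not jdbm's classes. */
class PackSketch {
    static class Rec {
        long recid, offset;
        int used, length;
        Rec(long recid, long offset, int used, int length) {
            this.recid = recid; this.offset = offset;
            this.used = used; this.length = length;
        }
    }

    /** Compacts recs in place; returns the new end-of-data offset. */
    static long pack(List<Rec> recs, Map<Long, Long> translation) {
        recs.sort((a, b) -> Long.compare(a.offset, b.offset));
        long next = 0;
        for (Rec r : recs) {
            // Safety check from the thread: skip any record whose
            // translation-table offset disagrees with its actual offset.
            Long mapped = translation.get(r.recid);
            if (mapped == null || mapped != r.offset) {
                next = r.offset + r.length;  // leave the suspect record alone
                continue;
            }
            r.offset = next;
            r.length = r.used;               // reclaim the slack
            translation.put(r.recid, next);  // keep the mapping intact
            next += r.length;
        }
        return next;
    }
}
```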
From: Bryan T. <br...@sy...> - 2006-05-30 17:59:55
|
Kevin,

Colin was requesting a feature to insert a physical row at a specified location in the file. It sounds like that is all he needs to provide a pack feature. If that is true, perhaps we can write the insert method and Colin could contribute the rest of the pack feature?

In other news, I am working through the test suite for a page-at-a-time record allocation strategy. It will buffer serialized records until there are enough to fill up a page and then batch them onto the page. I am hoping that this will produce much more efficient installation of records into pages. This will be an optional feature that can be enabled using RecordManagerOptions.

Once we have this working, it should be a minor change to introduce a clustering feature.

-bryan
|
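[Editorial note: the page-at-a-time strategy Bryan describes - buffer serialized records, then install them onto a data page as one batch - might look roughly like the following. Names are made up for illustration; jdbm's real implementation would write through its page manager, and records larger than a page are out of scope for the sketch.]

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of page-at-a-time allocation: serialized records accumulate in a
 *  buffer and are installed on a fresh page in one batch once no further
 *  record fits. Illustrative stand-in, not jdbm's actual classes. */
class PageBatcher {
    static final int PAGE_SIZE = 8192;      // assumed page capacity in bytes
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;
    int pagesWritten = 0;                   // stand-in for real page I/O

    void insert(byte[] serialized) {
        if (pendingBytes + serialized.length > PAGE_SIZE) flush();
        pending.add(serialized);
        pendingBytes += serialized.length;
    }

    /** Installs every buffered record contiguously on one fresh page. */
    void flush() {
        if (pending.isEmpty()) return;
        pagesWritten++;                     // here: write the batch to disk
        pending.clear();
        pendingBytes = 0;
    }
}
```

Batching this way means free-space selection runs once per page instead of once per record, which is where the hoped-for efficiency comes from.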
From: Kevin D. <ke...@tr...> - 2006-05-30 18:08:15
|
hmmmm...

Is this the request? I don't think that would truly provide pack capability, because you would need to be able to move records in an overlapped manner (e.g. shift a 250-byte record downward by 8 bytes) - that's quite a bit different than providing insertion at a specific offset...

This is obviously not something we'd want to add to the public API... but I think providing the following call would be sufficient:

move(long oldPhysicalOffset, long newPhysicalOffset, long recID) throws OverlappedMoveRequestException, InconsistentRecIDMappingException

This call could check to ensure that the new location is either free bytes or overlapped only by the record's own data, then check that the oldPhysicalOffset and recID are consistent. If those checks pass, it would perform the move and update all values as needed.

My biggest concern here is locking the database - though it will create a bottleneck, I think the only truly safe thing we can do during this operation is grab a full synch lock on the entire record manager...

What do you guys think?

- K
|
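[Editorial note: a minimal sketch of the move() primitive proposed above, with its two validation checks. This is hypothetical API - the exception names come from Kevin's message, the byte[] stands in for the store file, and the overlap check is simplified to "downward shifts only", which is the case a pack loop needs.]

```java
import java.util.Map;

/** Sketch of the proposed (hypothetical) move() primitive: relocate one
 *  physical row after validating the recid mapping and the destination. */
class MoveSketch {
    static class OverlappedMoveRequestException extends Exception {}
    static class InconsistentRecIDMappingException extends Exception {}

    static void move(byte[] file, Map<Long, Long> translation,
                     long oldOffset, long newOffset, int length, long recID)
            throws OverlappedMoveRequestException,
                   InconsistentRecIDMappingException {
        // Check 1: the translation table must really map recID to oldOffset.
        Long mapped = translation.get(recID);
        if (mapped == null || mapped != oldOffset)
            throw new InconsistentRecIDMappingException();
        // Check 2 (simplified): allow only downward shifts, so the target
        // region is free bytes or overlapped solely by the record itself.
        if (newOffset > oldOffset)
            throw new OverlappedMoveRequestException();
        // System.arraycopy copies correctly even when the regions overlap.
        System.arraycopy(file, (int) oldOffset, file, (int) newOffset, length);
        translation.put(recID, newOffset);
    }
}
```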
From: Bryan T. <br...@sy...> - 2006-05-30 19:05:14
|
Kevin,

I was responding to this:

Colin wrote:
<snip>
If there was one golden feature that I could ask for, it would be solutions to deal with the free-space packing issue. A new record allocation strategy would help, and support in the BaseRecordManager for iterating through allocated records and allowing inserts to specify the logical row id would make the stop-and-copy approach to repacking go more quickly. The system could iterate through the allocated records in the source BaseRecordManager and copy them directly to a target BaseRecordManager with logical row ids intact so that the BTrees would run without modification.
</snip>

-bryan
|
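[Editorial note: the stop-and-copy repack Colin requests reduces to walking the source store's records and laying each one out contiguously in a fresh store under the same logical recid, so every BTree pointer survives unchanged. A toy sketch of the offset math, with stand-in types rather than jdbm's BaseRecordManager:]

```java
import java.util.Map;

/** Sketch of stop-and-copy repacking: records get fresh, contiguous
 *  physical offsets in the target store while their logical recids are
 *  preserved, so structures holding recids (e.g. BTrees) need no change. */
class CopyPack {
    /** source maps recid -> serialized record; newOffsets receives the
     *  target layout (recid -> fresh offset). Returns the target size. */
    static long repack(Map<Long, byte[]> source, Map<Long, Long> newOffsets) {
        long next = 0;
        for (Map.Entry<Long, byte[]> e : source.entrySet()) {
            newOffsets.put(e.getKey(), next);  // same recid, new offset
            next += e.getValue().length;       // contiguous: no free gaps
        }
        return next;
    }
}
```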
From: Kevin D. <ke...@tr...> - 2006-05-30 19:44:30
|
Bryan-

ahhh - I see what he was getting at. Is there any advantage to doing it this way instead of an in-place pack?

I could certainly see where it would be easy to allow specifying a recid during an insert (heck, the current implementation might already work just by calling update() instead of insert() - haven't checked the source, so I could be wrong on that...), but writing to a separate file seems like it would be more intrusive (you'd literally have to take the entire database off-line to do the pack) than the alternative... It would be faster, though, because no logging against the source recman would be needed...

I'm definitely open to thinking about it from both directions, though.

- K
Click to learn more<BR><A href="http://sel.as-us.falkag.net/sel?cmd=lnk&kid=107521&bid=248729&dat=121642"><FONT color=#0000ff>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=107521&bid=248729&dat=121642</FONT></A><BR>_______________________________________________ Jdbm-developer mailing list<BR><A href="mailto:Jdb...@li..."><FONT color=#0000ff>Jdb...@li...</FONT></A><BR><A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff>https://lists.sourceforge.net/lists/listinfo/jdbm-developer</FONT></A><BR><BR><BR><BR>-------------------------------------------------------<BR>All the advantages of Linux Managed Hosting--Without the Cost and Risk!<BR>Fully trained technicians. The highest number of Red Hat certifications in<BR>the hosting industry. Fanatical Support. Click to learn more<BR><A href="http://sel.as-us.falkag.net/sel?cmd=lnk&kid=107521&bid=248729&dat=121642"><FONT color=#0000ff>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=107521&bid=248729&dat=121642</FONT></A><BR>_______________________________________________<BR>Jdbm-developer mailing list<BR><A href="mailto:Jdb...@li..."><FONT color=#0000ff>Jdb...@li...</FONT></A><BR><A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff>https://lists.sourceforge.net/lists/listinfo/jdbm-developer</FONT></A><BR><BR><<BR></FONT></FONT></TD></TR></TBODY></TABLE></DIV></FONT></BODY></HTML> |
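[Editor's note] Kevin's proposed move() primitive reduces to three steps: verify the recid really maps to the source offset, verify the target region is safe to write, then copy the bytes and repoint the translation entry. A minimal sketch in Java, using an in-memory byte array in place of the paged file and a plain Map in place of jdbm's translation table - RecordMover and everything in it is hypothetical, not part of the actual jdbm API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the move() primitive discussed in the thread.
// The byte array stands in for the paged store; the Map stands in for
// jdbm's logical->physical translation table.
public class RecordMover {
    private final byte[] store;                 // simulated record file
    private final Map<Long, Long> translation;  // recid -> physical offset

    public RecordMover(byte[] store, Map<Long, Long> translation) {
        this.store = store;
        this.translation = translation;
    }

    /**
     * Move one record's bytes and repoint its translation entry.
     * Synchronized, per Kevin's note that a full lock on the record
     * manager is the only truly safe option during the move.
     */
    public synchronized void move(long oldOffset, long newOffset,
                                  long recid, int length) {
        Long mapped = translation.get(recid);
        if (mapped == null || mapped != oldOffset) {
            // Kevin's InconsistentRecIDMappingException case
            throw new IllegalStateException("recid/offset mismatch");
        }
        // System.arraycopy behaves as if it copied through a temporary
        // buffer, so it tolerates overlapping source and target regions -
        // exactly the "shift a 250 byte record down by 8 bytes" case.
        System.arraycopy(store, (int) oldOffset, store, (int) newOffset, length);
        translation.put(recid, newOffset);
    }
}
```

A real implementation would additionally need Kevin's overlap check against neighboring records and page-boundary handling, which the in-memory stand-in sidesteps.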
From: Bryan T. <br...@sy...> - 2006-05-30 20:17:50
|
Kevin, The only advantages are pragmatic. There may be existing code and we don't have to worry about locking the database. Rather than doing overlapping moves, I would suggest bulk operations that read in large parts of the store (many pages at a time), and then write them out again. You would only process the in use pages (data vs translation table). Since the translation table pages get scattered throughout, any packing scheme is of limited utility. We can't pack the store down below the last translation page. -bryan -----Original Message----- From: jdb...@li... [mailto:jdb...@li...]On Behalf Of Kevin Day Sent: Tuesday, May 30, 2006 3:49 PM To: JDBM Developer listserv Subject: re[6]: [Jdbm-developer] Free page selection bug Bryan- ahhh - I see what he was getting at. Is there any advantage to doing it this way instead of an in-place pack? I could certainly see where it would be easy to allow specifying a recid during an insert (heck, the current implementation might already work just by calling update() instead of insert() - haven't checked the source, so I could be wrong on that... ), but writing to a separate file seems like it would be more intrusive (you'd literally have to take the entire database off-line to do the pack) than the alternative... It would be faster, though, because no logging against the source recman would be needed... I'm definitely open to thinking about it from both directions, though. - K > Kevin, I was responding to this: Colin wrote: <snip> If there was one golden feature that I could ask for, it would be solutions to deal with the free space packing issue. A new record allocation strategy would help, and support in the BaseRecordManager for iterating through allocated records and allowing inserts to specify the logical row id would make the stop-and-copy approach to repacking go more quickly. 
The system could iterate through the allocated records in the source BaseRecordManager and copy them directly to a target BaseRecordManager with logical row ids intact so that the BTrees would run without modification. </snip> -bryan -----Original Message----- From: jdb...@li... [mailto:jdb...@li...]On Behalf Of Kevin Day Sent: Tuesday, May 30, 2006 2:12 PM To: jdb...@li... Subject: re[4]: [Jdbm-developer] Free page selection bug hmmmm... Is this the request? I don't think that would truly provide pack capability, because you would need to be able to move records in an overlapped manner (e.g. shift a 250 byte record downward by 8 bytes) - that's quite a bit different than providing insertion at a specific offset... This is obviously not something that we'd want to add to the public API... but I think that providing the following call would be sufficient: move(long oldPhysicalOffset, long newPhysicalOffset, long recID) throws OverlappedMoveRequestException, InconsistentRecIDMappingException This call could check to ensure that the new location is either free bytes or only overlapped by the record data, then check to ensure that the oldPhysicalOffset and recID are consistent. If those checks pass, it would then perform the move and update all values as needed. My biggest concern here is in terms of locking the database - though it will create a bottleneck, I think the only truly safe thing we can do during this operation is to grab a full synch lock on the entire record manager... What do you guys think? - K > Kevin, Colin was requesting a feature to insert a physical row at a specified location in the file. It sounds like that is all he needs to provide a pack feature. If that is true, perhaps we can write the insert method and Colin could contribute the rest of the pack feature? In other news, I am working through the test suite for a page at a time record allocation strategy. 
It will buffer serialized records until there are enough to fill up a page and then batch them onto the page. I am hoping that this will produce much more efficient installations of records into pages. This will be an optional feature that can be enabled using RecordManagerOptions. Once we have this working, it should be a minor change to introduce a clustering feature. -bryan -----Original Message----- From: jdb...@li... [mailto:jdb...@li...]On Behalf Of Kevin Day Sent: Tuesday, May 30, 2006 1:51 PM To: jdb...@li... Subject: re[2]: [Jdbm-developer] Free page selection bug Gents- Wanted to float an intermediate idea here - it depends on the database utilization, but may provide a quick out. The primary issue with the first fit algorithm was that, in certain reasonable usage models, it resulted in massive amounts of wasted space. This was more a result of the fact that the allocator doesn't support splitting or coalescence than anything. The downside of the current 'fix' for this issue is that other (quite reasonable) usage scenarios result in a huge amount of wasted CPU cycles (dontcha love these kinds of tradeoffs?). The ultimate fix for this, of course, is to re-design the allocation strategy completely, and include splitting and coalescing (or have that behavior be implicit to the algorithm). There has been quite a bit of discussion about strategies in this area, but things move slowly, so I'm not holding my breath for any of them. It seems to me that the immediate workaround here is to add pack capability to jdbm. I believe that this could be done, even with the current data structures, by creating a temporary lookup table for reverse mapping physical offsets to recids. The downside to this approach is that it might require some sort of quiescence in the system to get it right, so if the database is going to be hammered 24x7, then it may not be the right solution - but I thought I'd toss it out. 
Here is how the algorithm could work: The pack operation is triggered. jdbm creates a new, separate jdbm database with transactions turned off, optimized purely for speed (this is a temporary file). jdbm adds a BTree to act as a reverse lookup table. Keys are Long, physical offset. Values are Long, recid. jdbm cranks through the translation table and inserts physical offset->recid mappings into the BTree. jdbm then grabs the first data page, and checks to see if the first record on the page has identical length and used values. If not, then in a single transaction, it moves the second record to consume all space, updates the length and used value of the first record, looks up the recid of the second physical record, updates the offset of the second physical record in the translation table, and closes the transaction. As an additional check, the system should check the physical offset in the current translation table against the physical offset of the actual record. If they are not the same, then the record shouldn't be moved, and the next record should be considered. I'm pretty sure that it should be possible to implement this in a manner that allows for partial pack operations. The database will certainly grow, but the pack operation will shrink it back down. Anyway, just wanted to get my thoughts out there on this in case anyone wants to run with it. It should be considerably easier (and less error prone) than implementing a new allocator, and it should provide the basis of a migration path into a new file format when that happens... If such a system were in place, it would be possible to go back to a first fit algorithm (with significantly better performance across a much wider field of applications), and still (hopefully) keep the fragmentation down. Cheers, - Kevin >Colin, I am not seeing as much grief from the allocation strategy as you describe. I expect that there is an interaction with the allocations that are being performed by your application. 
The change was introduced since it produced smaller stores for our RDFS database without a loss in speed, but clearly it is a loss for you. Let me look at the code and see what we can change. With respect to record copying, you should be able to refactor what you need out of the dump utility. It has mechanisms for scanning records from a page perspective, but it does not attempt to do inserts with a pre-defined logical record id. I can look into that I suppose, but let's see where the update (below) gets you. I've attached a modified version of FreePhysicalRowIdPage. Please try it out and let me know what impact it has on your application. It passes the jdbm test suite, but there are no performance tests that specifically target this feature. It has no performance impact (time or disk space) on our RDFS database application. -bryan < < |
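[Editor's note] The pack pass Kevin outlines - build a reverse lookup from physical offset to recid, then walk records in offset order sliding each one down to close gaps - can be sketched as follows. This is a toy in-memory model: Rec, pack(), and the TreeMap standing in for the temporary BTree are all illustrative, not jdbm's real data structures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the proposed pack operation: compute new, gap-free offsets
// for a set of allocated records.
public class PackSketch {
    static class Rec {
        final long recid;
        final long offset;
        final int length;
        Rec(long recid, long offset, int length) {
            this.recid = recid;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Compact records toward offset 0; returns recid -> new offset. */
    static Map<Long, Long> pack(List<Rec> records) {
        // Step 1: the reverse lookup table (physical offset -> record),
        // playing the role of the temporary BTree in Kevin's algorithm.
        TreeMap<Long, Rec> byOffset = new TreeMap<>();
        for (Rec r : records) {
            byOffset.put(r.offset, r);
        }

        // Step 2: walk in ascending offset order, assigning each record the
        // first free byte after its predecessor. Scanning low-to-high means
        // each move is downward only, so no move can clobber an unmoved
        // record - this is why partial packs are safe to interrupt.
        Map<Long, Long> newOffsets = new HashMap<>();
        long next = 0;
        for (Rec r : byOffset.values()) {
            newOffsets.put(r.recid, next);  // update the translation table
            next += r.length;
        }
        return newOffsets;
    }
}
```

The real operation would move the bytes and rewrite translation pages transactionally per record, as described above; the sketch only shows the offset bookkeeping that makes the compaction deterministic.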