Kevin,

 

I think that we have a lot of latitude with the design.  We could keep the existing page view mechanisms and simply augment them along the lines that we are discussing, or we could go to the design that I am proposing and only support the translation page variation for now.

 

In terms of GC, I have been thinking of deferring physical row deletions and then batching them.

 

-bryan

 


From: jdbm-developer-admin@lists.sourceforge.net [mailto:jdbm-developer-admin@lists.sourceforge.net] On Behalf Of Kevin Day
Sent: Tuesday, April 11, 2006 5:48 PM
To: JDBM Developer listserv
Subject: re[6]: [Jdbm-developer] record managment

 

Bryan-

 

I certainly agree that writing to a clean page is a dead simple strategy when you are doing a bunch of inserts and makes it especially easy to cluster objects since there is only one tx doing the allocation and with data on the page.

KD - What is unknown is whether this strategy is any worse than any other strategy.  It's so simple that it seems like it almost has to be worse - but I can't really put my finger on exactly how it would worse.  I'm wondering if there is any research in the literature that critiques this approach.... ?

 

 

One improvement to this strategy would be to have a bit of interaction with the page cache.  We evict pages in chunks, and perform a GC run across all the pages in a given chunk.  If GC actually results in reclaimed space, the changed pages get pushed back into the safe at the top of the cache.

 

Tuning of the cache then becomes (in part) choosing a size to where page evictions result in entire pages being removed from the write set.

 

 

Like I said, this may be a complete loser approach - but all of that extra GC, etc... is ocurring in a background process, so maybe not...  My biggest issue is that it requires multiple writes of the pages to the safe...  And when we add translation pages to the mix, a single optimization of a given data page could result in a large number of pages having to be re-saved to cache.

 

An optimization would be to do what you are proposing and have an allocator strategy that only tracks open record ranges in cached pages (with slots on dirty pages being given higher priority). 

 

 

Certainly, having the translation info and data on the same page would fix some of this.  How about the following as a possible compromise that may give the best of both worlds:

 

Maintain a PAGE translation system at the beginning of the db file, but keep record location inside the page itself with the data.  A lookup would still consist of translation lookup to determine the physical page, but the position of a given record on it's page could change dynamically.  This would allow us to quickly move pages around without having to adjust potentially scattered OIDs, and allow us to quickly coallesce freed records to maximize available space on a page.

 

I'm still noodling on this idea - I think there is something here - I'm just not sure about the details.

 

- K

 

 

 

>
Kevin,
 
I was thinking that the pages in the cache would be those available for the record allocator.  You simply notice how much free space is on each page in the page cache as it comes in and build up an access structure to find a page with sufficient free space for an allocation quickly.
 
I certainly agree that writing to a clean page is a dead simple strategy when you are doing a bunch of inserts and makes it especially easy to cluster objects since there is only one tx doing the allocation and with data on the page.
 
-bryan
 


From: jdbm-developer-admin@lists.sourceforge.net [mailto:jdbm-developer-admin@lists.sourceforge.net] On Behalf Of Kevin Day
Sent: Tuesday, April 11, 2006 3:46 PM
To: JDBM Developer listserv
Subject: re[4]: [Jdbm-developer] record managment
 

Bryan-

 

Ahh - I see the concern now.

 

I don't think this really resolves the allocator issue - you will still wind up having to keep track of which pages have how much space left on them.  Imagine how the system will look after running for a long period of time - basically, you will have a bunch of pages, all with different sized dead zones in the middle.  You don't want to have to scan each page looking for sufficient available space, so we are back into the allocator discussion.

 

 

Some other options to consider:

 

1.  Using a fibonnoci series index for starting positions of linked lists containing free records

or

2.  Always write to clean pages, and use low priority optimization operation to move records from partially used pages to the wilderness zone - this has the advantage of completely removing the allocator from the discussion.  Disadvantages would be that the db file could grow quite a bit during heavy load, then shrink back during the optimization.  I think that this would be the best in terms of performance - a given transaction's update set would be basically written to disk in an entirely linear fashion, and it maximizes cache utilization.

 

 

 

Tracking free slots is easy - we just add a pointer to each page with the first available slot on the page, and another to the next page that contains available slots.  Slots would be assigned until the page was full, then it would move on to the next page.

 

 

I certainly see the need for a mechanism for allowing the application to force the db to group a set of records - I just think that this can be done in the context of arbitrary re-locations of records, using the translation system....

 

 

- K 

 
<

 

------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ Jdbm-developer mailing list Jdbm-developer@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/jdbm-developer