From: Thompson, B. B. <BRY...@sa...> - 2005-09-23 16:32:08
FYI - I've committed the updated version of jdbm.recman.DumpUtility and also README-alloc.txt on the memory management mechanisms.

-b

-----Original Message-----
From: Thompson, Bryan B.
Sent: Friday, September 23, 2005 10:21 AM
To: Thompson, Bryan B.; 'Kevin Day'
Cc: 'jdb...@li...'
Subject: RE: [Jdbm-developer] commit: jdbm.recman.DumpUtility

One more question: how can we correctly detect when there are no more records on a page? What I am seeing is that the rest of the page is filled with zeros, so it appears as records with zero size and zero capacity. However, the dump could fail if this is not always the case.

-bryan

-----Original Message-----
From: Thompson, Bryan B.
Sent: Friday, September 23, 2005 10:11 AM
To: Thompson, Bryan B.; Kevin Day
Cc: jdb...@li...
Subject: RE: [Jdbm-developer] commit: jdbm.recman.DumpUtility

Kevin,

Two questions related to the traversal of the "used pages" list. If a data page is used solely as a continuation page for a record, does it show up in the used pages list? If so, what is the return value of getFirst() on such a data page, since there is no record header?

Also, do you have thoughts on what would be "interesting" information to extract from the free page list? I am thinking of a free page count and perhaps a measure of how much fragmentation there is in the free page list. Which gets back to an earlier question: does jdbm memory allocation make any effort to allocate contiguous pages for large records?

-bryan

-----Original Message-----
From: jdb...@li... [mailto:jdb...@li...] On Behalf Of Thompson, Bryan B.
Sent: Friday, September 23, 2005 8:15 AM
To: Kevin Day
Cc: jdb...@li...
Subject: RE: [Jdbm-developer] commit: jdbm.recman.DumpUtility

I'll try this today. One thing that I am not clear about with jdbm is whether it makes any attempt (or guarantee) that records which span a page will be on contiguous pages. As I mentioned, I have been playing with some blob/clob support.
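The zero-fill observation in the 10:21 AM message suggests one termination test for the dump: stop scanning when a header with zero size and zero capacity is encountered. A minimal sketch of that idea, assuming a simplified two-int header layout (PageScanSketch and its HEADER_SIZE are illustrative, not jdbm's actual RecordHeader format) and assuming the zero-fill behavior always holds, which the message itself notes is unverified:

```java
// Sketch: detecting the end of the record list on a data page.
// ASSUMPTION (from the thread): unused space on a page is zero-filled,
// so a header with zero size and zero capacity marks the end.
// The header layout (two 4-byte ints) is a simplification.
import java.nio.ByteBuffer;

public class PageScanSketch {

    static final int HEADER_SIZE = 8; // hypothetical: currentSize + availableSize

    /** Returns the byte offsets of the record headers found on the page. */
    public static java.util.List<Integer> scanRecordOffsets(byte[] page) {
        java.util.List<Integer> offsets = new java.util.ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(page);
        int pos = 0;
        while (pos + HEADER_SIZE <= page.length) {
            int currentSize = buf.getInt(pos);
            int availableSize = buf.getInt(pos + 4);
            // Zero size AND zero capacity: assume we hit zero-filled free space.
            if (currentSize == 0 && availableSize == 0) {
                break;
            }
            offsets.add(pos);
            pos += HEADER_SIZE + availableSize; // next header follows the slot
        }
        return offsets;
    }

    public static void main(String[] args) {
        byte[] page = new byte[64];
        ByteBuffer buf = ByteBuffer.wrap(page);
        buf.putInt(0, 4); // one record: currentSize=4
        buf.putInt(4, 8); // availableSize=8, rest of page zero-filled
        System.out.println(scanRecordOffsets(page));
    }
}
```

If the zero-fill assumption ever fails, a scan like this would walk garbage headers, which is exactly the failure mode the message worries about; an explicit record count or end marker on the page would be more robust.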
My original take was to have the records form a linked list. Since this is exactly what the jdbm record headers are doing, it should be possible to incrementally allocate new pages into a jdbm record, thereby supporting streaming from the application in addition to the current Serializer approach. An interesting thought.

However, it occurred to me that greater efficiency could be obtained by blocking the linked list of records (in my current implementation of blob/clob) into an array of recids in a header record for the blob. That would make it possible to use pre-fetch strategies for subsequent segments of the blob. If we use the record header approach as it stands today, the most pre-fetch that we could do is one page of read-ahead at a time.

You mentioned I/O efficiency in another thread. The guiding principle, as I understand it, is that you want as much I/O concurrency as possible so that you can get as many disk arms behind your application as possible. This leads to the use of striped disk arrays and clustered storage solutions. jdbm today is single threaded for read and write, so it is not possible to get I/O concurrency. Even if you have I/O concurrency at the store layer, your application has to support it as well. For us, I/O concurrency is gained by high-level query languages that can use parallel read operations against the store, or (in a different application, a fuzzy inference engine based on a neural network model) by modeling the main computation using a parallel processing approach.

One change that I would like to introduce (if we wind up introducing changes in the jdbm file structure) is a long class identifier in the record header. This would make it possible to accurately profile the contents of a jdbm store. For certain key classes (BTree, BPage, HashDictionary, HashBucket, String, Long) we would have "magic" pre-defined values that used negative recids.
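The "magic" negative identifiers proposed above could be sketched as follows. The specific id values, the fully simple class names used as keys, and the ClassIds helper are all illustrative assumptions, not anything that exists in jdbm:

```java
// Sketch of the proposed class-identifier scheme: well-known classes get
// fixed negative ids ("magic" values), so a profiler could attribute
// record bytes to a class without deserializing the record.
// ASSUMPTION: the id values and key names here are invented for illustration.
import java.util.HashMap;
import java.util.Map;

public class ClassIds {

    // Hypothetical reserved ids; negative so they can never collide with
    // positive ids handed out by a string-table interning scheme.
    public static final long BTREE = -1L;
    public static final long BPAGE = -2L;
    public static final long HASH_DICTIONARY = -3L;
    public static final long HASH_BUCKET = -4L;
    public static final long STRING = -5L;
    public static final long LONG = -6L;

    private static final Map<String, Long> WELL_KNOWN = new HashMap<>();
    static {
        WELL_KNOWN.put("BTree", BTREE);
        WELL_KNOWN.put("BPage", BPAGE);
        WELL_KNOWN.put("HashDictionary", HASH_DICTIONARY);
        WELL_KNOWN.put("HashBucket", HASH_BUCKET);
        WELL_KNOWN.put("String", STRING);
        WELL_KNOWN.put("Long", LONG);
    }

    /** Returns the magic id, or 0 to signal "intern via the string table". */
    public static long idFor(String className) {
        Long id = WELL_KNOWN.get(className);
        return id == null ? 0L : id;
    }

    public static void main(String[] args) {
        System.out.println("BTree -> " + idFor("BTree"));
        System.out.println("com.example.Other -> " + idFor("com.example.Other"));
    }
}
```

Reserving the negative range for fixed ids keeps the common cases (BTree pages, strings) cheap to classify while leaving the entire positive range for interned application classes.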
All other classes would be assigned recids by interning the class name in a string table. The string table itself could be just a BTree using a compressed key index whose keys are the string values and whose values are the jdbm records whose content is that string. The latter is necessary so that you can look up the class name. I am certainly going to do this at the object manager layer for our application, since some use of Externalizable or even Serializable appears to be why the store is so bloated (I had no idea just how bad Java serialization was!). I think that it would be a nice feature for jdbm, but not one that we could introduce while maintaining binary compatibility in the store file.

At the same time, I would also like to introduce version numbers for critical classes (BTree, BPage, etc.) so that we have more flexibility in evolving jdbm without breaking binary compatibility.

-bryan

-----Original Message-----
From: Kevin Day [mailto:ke...@tr...]
Sent: Thursday, September 22, 2005 11:26 PM
To: Thompson, Bryan B.
Subject: re: [Jdbm-developer] commit: jdbm.recman.DumpUtility

Bryan-

Thanks for picking this up right now - I am completely slammed with work, so I'm not able to really write any jdbm code. For some reason it's a lot easier for me to think about conceptual design and algorithms in the evening than to actually bang on the keyboard. Things should clear up a bit in the next week or so. Please let me know if you need a hand walking the data pages - I think that is going to be your best bet for getting a solid unused-space percentage.

- K

> I finally got through the sourceforge CVS. I will pick up work on the free lists tomorrow. -bryan
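The interning scheme with reverse lookup described in the 8:15 AM message could be sketched as below. ClassNameTable and its in-memory maps are stand-ins for the proposed BTree-backed string table, not jdbm code; the reverse map plays the role of "the jdbm records whose content is that string", which is what lets an id found in a record header be turned back into a class name:

```java
// Sketch of class-name interning: a forward index (name -> id) plus a
// reverse lookup (id -> name). In the proposal the forward index would be
// a compressed-key BTree and the reverse lookup a record read; plain
// HashMaps stand in here. ASSUMPTION: all names here are illustrative.
import java.util.HashMap;
import java.util.Map;

public class ClassNameTable {

    private final Map<String, Long> nameToId = new HashMap<>();
    private final Map<Long, String> idToName = new HashMap<>();
    private long nextId = 1; // positive ids; negatives reserved for magics

    /** Interns a class name, returning its stable id. */
    public synchronized long intern(String className) {
        Long existing = nameToId.get(className);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        nameToId.put(className, id);
        idToName.put(id, className);
        return id;
    }

    /** Reverse lookup: id back to class name, or null if unknown. */
    public synchronized String nameOf(long id) {
        return idToName.get(id);
    }

    public static void main(String[] args) {
        ClassNameTable table = new ClassNameTable();
        long id = table.intern("com.example.MyRecord");
        System.out.println(id + " -> " + table.nameOf(id));
    }
}
```

Interning the same name twice must return the same id, since the id is what gets written into record headers; that is why the forward index is consulted before a new id is assigned.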