From: Thompson, B. B. <BRY...@sa...> - 2006-04-03 16:30:32
Ah, nice. In that case, buffering can be used to pre-fetch, provided random I/O gives sufficient throughput. Otherwise we need to store the pageId[] in the metadata record for efficient pre-fetch.

-bryan

_____

From: jdb...@li... [mailto:jdb...@li...] On Behalf Of Kevin Day
Sent: Monday, April 03, 2006 12:28 PM
To: JDBM Developer listserv
Subject: re[4]: [Jdbm-developer] JBossCache and JBPM

Bryan-

One general comment: deleting all of the pages for a given blob would not require reading them all. Instead, just move the first page to the free page list.

I definitely agree that some sort of mechanism to support gathering reads (similar to pre-fetch) is desirable.

- K

> I'm just looking at the blob code. It actually maintains a distributed linked list of data segments (pages). I guess that I never got around to modifying it to support an array of pageIds in the metadata record. I also never got around to modifying it to allocate the data segments using the lower-level API of jdbm. It seems that I wrote this code before I was that deeply involved in the page manager APIs.

However, I would definitely support the goal of an efficient blob API for jdbm, and I am happy to coordinate with you on this. I see the BLOB API as visible on the RecordManager and oriented toward very large data objects with incremental, stream-based serialization, in contrast to the existing byte[]-oriented API, which is just fine for records whose serialized size is only one or two pages. Blobs would be designed to support features such as streaming media.

The design that I described above, with a linked list of data segments, might work if the reader buffered several pages before "playing" the media. A design that supports pre-fetch would probably create an array of pageIds and store that in the metadata record.
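The metadata-record design can be sketched as follows. This is a minimal illustration, not jdbm code: PageStore is a hypothetical in-memory stand-in for jdbm's page manager, and BlobMetadata, prefetch(), openStream() and delete() are illustrative names. Because the record holds the full pageId[] array, any page can be requested without first reading its predecessor, the blob can be streamed one page at a time (one ByteArrayInputStream per page, chained with java.io.SequenceInputStream), and it can be freed without reading any data page.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory page store standing in for jdbm's page manager. */
class PageStore {
    private final Map<Long, byte[]> pages = new HashMap<>();
    private final ArrayDeque<Long> freeList = new ArrayDeque<>();
    private long nextPageId = 1;
    int reads = 0; // number of data-page reads performed so far

    long allocate(byte[] data) {
        long id = freeList.isEmpty() ? nextPageId++ : freeList.pop();
        pages.put(id, data);
        return id;
    }

    byte[] read(long pageId) {
        reads++;
        return pages.get(pageId);
    }

    void free(long pageId) {
        pages.remove(pageId);
        freeList.push(pageId);
    }

    int freeCount() { return freeList.size(); }
}

/** Hypothetical blob metadata record that stores the id of every data page. */
class BlobMetadata {
    final long[] pageIds;

    private BlobMetadata(long[] pageIds) { this.pageIds = pageIds; }

    /** Splits data into pageSize chunks and records the id of each page. */
    static BlobMetadata write(PageStore store, byte[] data, int pageSize) {
        int n = (data.length + pageSize - 1) / pageSize;
        long[] ids = new long[n];
        for (int i = 0; i < n; i++) {
            int from = i * pageSize;
            int to = Math.min(from + pageSize, data.length);
            ids[i] = store.allocate(Arrays.copyOfRange(data, from, to));
        }
        return new BlobMetadata(ids);
    }

    /** Reads pages [start, start + count) without reading the pages before them. */
    List<byte[]> prefetch(PageStore store, int start, int count) {
        List<byte[]> batch = new ArrayList<>();
        for (int i = start; i < Math.min(start + count, pageIds.length); i++) {
            batch.add(store.read(pageIds[i]));
        }
        return batch;
    }

    /** Streams the blob, reading the next page only when the current one is used up. */
    InputStream openStream(PageStore store) {
        Enumeration<InputStream> pagesInOrder = new Enumeration<InputStream>() {
            private int next = 0;
            public boolean hasMoreElements() { return next < pageIds.length; }
            public InputStream nextElement() {
                return new ByteArrayInputStream(store.read(pageIds[next++]));
            }
        };
        return new SequenceInputStream(pagesInOrder);
    }

    /** Frees every page using only the ids in this record; no data page is read. */
    void delete(PageStore store) {
        for (long id : pageIds) {
            store.free(id);
        }
    }
}
```

Kevin's point is that a linked chain can also be freed by moving its first page to the free page list; what the array adds in this sketch is direct access to every page id, which is what pre-fetch needs.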
The other advantage of storing the pageIds in the metadata record is that you do not need to read all of the data pages in order to delete the blob, since their pageIds are stored in the metadata record.

-bryan

-----Original Message-----
From: jdb...@li... [mailto:jdb...@li...] On Behalf Of Thompson, Bryan B.
Sent: Monday, April 03, 2006 8:26 AM
To: Elias Ross; Kevin Day
Cc: jdb...@li...
Subject: RE: re[2]: [Jdbm-developer] JBossCache and JBPM

Elias,

When we discussed streaming APIs, the notion was to use byte[] for smaller records (less than a page) and provide a blob API for large records. I have a blob implementation that I can migrate into the jdbm package to support the use case you describe. The blob design that I used manages an array of pageIds so that it can pre-fetch pages, and it also makes it possible to delete a blob without reading all of the pages from disk.

Can you expand on the service that JBossCache provides? What features in JBoss is it a persistent object store for?

-bryan

-----Original Message-----
From: jdb...@li... [mailto:jdb...@li...] On Behalf Of Elias Ross
Sent: Monday, April 03, 2006 2:14 AM
To: Kevin Day
Cc: jdb...@li...
Subject: Re: re[2]: [Jdbm-developer] JBossCache and JBPM

On Sun, 2006-04-02 at 21:54 -0700, Kevin Day wrote:
> Elias-
>
> There are a number of things about the current BTree implementation that need some overhauling. Adding the ability to browse on keys instead of key/value pairs is definitely doable (in fact, for jdbm2, we are discussing including the concept of an index BTree that deals *only* with keys plus a recid lookup).

Right. The streaming interface is obviously the easiest way to do it. Once you have InputStream RecordManager.getStream(...), you can basically put a "return" in the readExternal() method at the point where the values are read.
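A minimal sketch of the early-return idea, assuming a hypothetical page layout in which all keys are written before all values ([count][keys][length-prefixed values]). KeysFirstPage and LazyTuple are illustrative names, not jdbm's BPage or Tuple, and a plain DataInputStream stands in for the proposed RecordManager.getStream(...). readKeys() returns before touching the value section; readDeferred() keeps each value's bytes so that a value is decoded only on its first getValue() call, which is the per-value deferred deserialization discussed in the replies that follow.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical page layout: [n][n keys via writeUTF][n values as int length + bytes]. */
class KeysFirstPage {
    static byte[] write(List<String> keys, List<String> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(keys.size());
        for (String k : keys) {
            out.writeUTF(k); // every key is written before any value
        }
        for (String v : values) {
            byte[] b = v.getBytes(StandardCharsets.UTF_8); // stand-in for a real value serializer
            out.writeInt(b.length);
            out.write(b);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Browse on keys only: read the keys, then return before the value section. */
    static List<String> readKeys(DataInputStream in) throws IOException {
        int n = in.readInt();
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add(in.readUTF());
        }
        return keys; // the early "return": no value bytes have been read
    }

    /** Deferred variant: keep each value's raw bytes and decode only on demand. */
    static List<LazyTuple> readDeferred(DataInputStream in) throws IOException {
        List<String> keys = readKeys(in);
        List<LazyTuple> tuples = new ArrayList<>(keys.size());
        for (String k : keys) {
            byte[] raw = new byte[in.readInt()];
            in.readFully(raw);
            tuples.add(new LazyTuple(k, raw));
        }
        return tuples;
    }
}

/** Hypothetical tuple that deserializes its value on first access (not jdbm's Tuple). */
class LazyTuple {
    final String key;
    private byte[] raw; // undecoded value bytes; null once decoded
    private String value;

    LazyTuple(String key, byte[] raw) {
        this.key = key;
        this.raw = raw;
    }

    boolean isDecoded() {
        return raw == null;
    }

    String getValue() {
        if (raw != null) {
            value = new String(raw, StandardCharsets.UTF_8); // stand-in for a real deserializer
            raw = null;
        }
        return value;
    }
}
```

The keys-first layout is what makes the early return pay off: if keys and values were interleaved, reading the last key would still require skipping over every value before it.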
> I think the best strategy for reading keys only during a browse would be to have an option in the BPage itself to defer value deserialization.

Agreed. The nice thing is that you could pass the InputStream along to, say, Tuple, so that a call to getValue() could lazily initialize the value.

> I don't think that we want to get back into stream processing in the serializers (there are very good reasons for working with arrays of known size in the serializers). We are going to be having a lively (I'm sure) discussion about whether we should change the serializer interface in jdbm2 to work with ByteBuffer objects instead of byte[].

I'm not determined to change anybody's mind about byte[] versus InputStream (versus ByteBuffer or whatever). I just think it'd be swell to support the streaming paradigm as well. If that means having two sets of serializers and decompressors or ISporks, so be it.

> Here is a rough strategy for how deferred deserialization would work: during the BPage deserialize call, the byte arrays for the values would be retrieved and stored. If default serialization is being used, this is about all you can do: the first request for any value in the page requires deserialization of the entire byte array (this is one of the many problems with using default serialization). If custom serialization is being used, then you could store the byte array of each value individually, and only deserialize when needed.

Right. As you may have noticed, in my implementation of RowInputStream I basically read in one page at a time and create a backing ByteArrayInputStream for each page. So, as you finish reading one page, the next one shows up.

> Another way to make this work is to ditch the idea of storing values in the BTree entirely (this is what I do, and I suspect it's what a lot of folks wind up doing eventually).
> Instead, construct your key so that its last part is the recid of the object that is actually stored elsewhere in the record manager. Then set the value to a zero-length byte array. This is much closer to an index-style BTree...

I guess you could just as well have stored a java.lang.Long in the value field. Anyway, since your approach is easier than what I described (streams and pipes), why couldn't it be the default implementation for BTree?

> With all that said, are you seeing a performance hit from deserializing values in the BTree?

Not really. Again, I'm somewhat imagining (fantasizing?) storing very large elements (like movie clips) within JBossCache. I was curious, from a performance standpoint, how browsing keys might work. I was also wondering if I could read in partial elements, such as the title and other metadata, before the main body itself. It seems such a waste to copy everything into a single byte array and then throw half of it away.

I may end up using a file system instead for this sort of application, since it's pipe- and stream-based. What is a file system but yet another BTree?

-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Jdbm-developer mailing list
Jdb...@li...
https://lists.sourceforge.net/lists/listinfo/jdbm-developer
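The index-style layout Kevin describes in the thread (the key ends with the recid of the real object, and the value is a zero-length byte array) can be sketched as below. This is an illustration under stated assumptions, not jdbm's BTree API: a java.util.TreeMap ordered by unsigned byte comparison stands in for the BTree, IndexKey and its methods are hypothetical names, recids are assumed non-negative, and user keys are assumed not to contain the NUL character, because a zero byte separates the user key from the recid so that entries for "ab" are never confused with entries for "abc".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical index-style key: UTF-8 user key, a 0x00 separator, then an 8-byte recid. */
final class IndexKey {
    private static final byte[] EMPTY = new byte[0]; // zero-length value for every entry

    static byte[] encode(String userKey, long recid) {
        byte[] k = userKey.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(k.length + 1 + Long.BYTES)
                .put(k)
                .put((byte) 0)  // separator keeps "ab" entries apart from "abc" entries
                .putLong(recid) // big-endian, so non-negative recids sort numerically
                .array();
    }

    static long recidOf(byte[] key) {
        return ByteBuffer.wrap(key, key.length - Long.BYTES, Long.BYTES).getLong();
    }

    static String userKeyOf(byte[] key) {
        return new String(key, 0, key.length - Long.BYTES - 1, StandardCharsets.UTF_8);
    }

    /** A TreeMap ordered by unsigned byte comparison stands in for the BTree. */
    static NavigableMap<byte[], byte[]> newIndex() {
        return new TreeMap<byte[], byte[]>((a, b) -> Arrays.compareUnsigned(a, b));
    }

    static void put(NavigableMap<byte[], byte[]> index, String userKey, long recid) {
        index.put(encode(userKey, recid), EMPTY);
    }

    /** Browses keys only: every recid filed under userKey, in ascending order. */
    static List<Long> recidsFor(NavigableMap<byte[], byte[]> index, String userKey) {
        List<Long> recids = new ArrayList<>();
        for (byte[] key : index.subMap(encode(userKey, 0L), true,
                                       encode(userKey, Long.MAX_VALUE), true).keySet()) {
            recids.add(recidOf(key));
        }
        return recids;
    }
}
```

Storing a java.lang.Long in the value, as Elias suggests, would also work; keeping the recid inside the key means a single key-only browse returns it and allows several recids per user key.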