From: Kevin D. <ke...@tr...> - 2006-04-11 16:45:22
Bryan-

Yes - I've looked at the GOM implementation (and have worked extensively with similar approaches - Microsoft COM, Mozilla's cross-platform COM, etc.). I can see the utility in some situations, but for the development I do, losing the runtime type checking would be a massive problem. I'm trying to think of jdbm as a true object database, not just a way of storing key/value pairs. Certainly, I could create wrapper classes that call the GOM framework and then work in an object-oriented manner, but that is a heck of a lot of work (and boring work at that).

As a user of an OODBMS, I expect to be able to work with objects the way I normally work with objects, but with persistence capabilities. The solution I'm going for is much closer to JDO than COM. So the idea of using events and listeners is fine - but we need a truly object-oriented mechanism for driving those events. Short of code injection, aspect weaving, or some ugly notification requirement imposed by the API, I just don't see how to achieve this.

The alternative, then, is to provide some sort of reverse lookup method that either caches index info for all retrieved records or provides some sort of view into the index keys (or node locations) for each recid.

Expanding on my comment re: blink trees: we could certainly store the index pages on DATA pages if desired - but there may be advantages (locality again!) to storing the index pages in their own page thread. Given that index pages are likely to change much, much more frequently than data pages, it may make sense from a caching and free-page perspective to keep them in a separate page thread.

Likewise, in my proof of concept, I could have stored the version tables on DATA pages, but I chose not to because version tables are extremely short-lived and undergo a lot of churn. If the version tables were interspersed with the data, we'd wind up with a huge amount of fragmentation. I think (but cannot say with absolute certainty) that maintaining these data structures in a separate page thread should optimize caching behavior, allocation, etc.

- K

> Kevin,
>
> Can you expand on your comment about blink trees? Why do you see that they need to be outside of the record management API? I was hoping that some data structures persisted as records could opt out of the default concurrency control mechanisms and use consistency-based mechanisms, but that doesn't mean that we need to store them directly on pages, does it?
>
> I see index maintenance as the role of a framework layer over the record management API. Have you looked at the framework that I have implemented? [1] It handles index maintenance by catching property updates and generating events. Indices register as listeners for events and handle the removal under the old key and the insert under the new key (the old key, new key, and the object are part of the event). This framework is designed around generic data rather than explicitly declared Java fields and reflection. If you want to manage indices, and your classes have serialized fields that serve as your index keys, with setters for those fields, then you need to generate notification events from those setters and register your indices as listeners.
>
> -bryan
>
> [1] http://proto.cognitiveweb.org/projects/cweb/multiproject/cweb-generic-native/index.html
>
> From: jdb...@li... [mailto:jdb...@li...] On Behalf Of Kevin Day
> Sent: Monday, April 10, 2006 9:46 PM
> To: JDBM Developer listserv
> Subject: [Jdbm-developer] primary and secondary indexes in the record manager
>
> One thing that I've been thinking about in a background thread:
>
> We've discussed the possibility of making indexes a first-class citizen of jdbm (instead of storing a BTree in the record manager itself, on the data pages). This has lots of advantages from a concurrency perspective (we can use BLink trees). It also creates a programming paradigm that is much closer to how people are used to working with databases.
>
> This is where secondary indexes come in.
>
> My experience so far with implementing multi-index constructs in jdbm has been that keeping the indexes in sync is an absolute bugger. I've had to resort to special serializers that capture the serialized form of the keys used by a given record when that record is initially retrieved, then use those keys to remove and re-insert values during any update of the record.
>
> The problem with this is that you wind up having to deserialize the full record, then turn around and serialize keys for each index that the record is a part of. And you have to do it for every single read (it isn't possible to determine the old index key values during the update itself, because the object has already been changed).
>
> Another approach to this problem would be to maintain index metadata in a supplementary index, keyed by recid. Whenever an update is performed, the recid would be looked up, and the page and node of the object could be retrieved. This would allow for efficient updates, but changes to the main index tree (like a rebalance operation) could require a significant number of changes in the supplementary index.
>
> A compromise would be to store only the page where the node for a given record exists, then do a linear search for the recid during updates.
>
> I'm really not sure where to go with all of this, but I think it's appropriate to start talking about it. If we have a supplementary index, then it may make sense to toss the physical row location into it and ditch the translation pages entirely. Individual record lookups would require a tree search, but in 99% of my uses of jdbm this is the case already. This is closer to the primary-key index idea that Alex floated a while back.
>
> I'm not at all saying this is the way to go, but I would like to begin getting a handle on how we can maintain indexes for objects stored in jdbm.
>
> - K
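For readers following along: the listener-based index maintenance Bryan describes (setter fires an event carrying the old key, new key, and object; each registered index removes the stale entry and inserts the new one) could be sketched roughly like this. All class and method names below are hypothetical illustrations, not the actual cweb-generic-native API:

```java
import java.util.*;

// Sketch only: a key-change event carrying old key, new key, and source object.
class KeyChangeEvent {
    final Object oldKey, newKey, source;
    KeyChangeEvent(Object oldKey, Object newKey, Object source) {
        this.oldKey = oldKey; this.newKey = newKey; this.source = source;
    }
}

interface KeyChangeListener { void keyChanged(KeyChangeEvent e); }

// A toy secondary index: key -> set of objects indexed under that key.
// On notification it removes the entry under the old key and inserts
// under the new key, as described in the thread.
class ToyIndex implements KeyChangeListener {
    final Map<Object, Set<Object>> entries = new HashMap<>();
    public void keyChanged(KeyChangeEvent e) {
        if (e.oldKey != null) {
            Set<Object> s = entries.get(e.oldKey);
            if (s != null) { s.remove(e.source); if (s.isEmpty()) entries.remove(e.oldKey); }
        }
        entries.computeIfAbsent(e.newKey, k -> new HashSet<>()).add(e.source);
    }
}

// A persistent class whose setter drives the notification - this is the
// "notification requirement imposed by the API" Kevin is wary of.
class Person {
    private String name;
    private final List<KeyChangeListener> listeners = new ArrayList<>();
    void addListener(KeyChangeListener l) { listeners.add(l); }
    void setName(String newName) {
        String old = this.name;
        this.name = newName;
        for (KeyChangeListener l : listeners)
            l.keyChanged(new KeyChangeEvent(old, newName, this));
    }
    String getName() { return name; }
}
```

The friction Kevin points at is visible here: every indexed field needs a hand-written setter that knows to fire the event, which is exactly the boilerplate that code injection or aspect weaving would otherwise remove.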
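Kevin's "special serializer" workaround - remember the index key values as they existed when the record was read, because by update time the in-memory object already holds the new values - can be illustrated with a minimal sketch. The `Record`/`EmailIndex` names are invented for illustration; the real jdbm serializer hooks are not shown:

```java
import java.util.*;

// Sketch, with hypothetical names: when a record is fetched, cache the
// key value it is currently indexed under; on update, use the cached
// (old) key to remove the stale index entry before inserting the new one.
class Record {
    long recid;
    String email;   // field serving as a secondary index key
    Record(long recid, String email) { this.recid = recid; this.email = email; }
}

class EmailIndex {
    final Map<String, Long> byEmail = new HashMap<>();    // email -> recid
    final Map<Long, String> keyAtRead = new HashMap<>();  // recid -> key seen at fetch time

    Record fetch(Record stored) {
        // Capture the key as it existed at read time - this is the cost
        // Kevin complains about, paid on every single read.
        keyAtRead.put(stored.recid, stored.email);
        return stored;
    }

    void update(Record r) {
        // The in-memory object already carries the NEW key, so the old
        // key can only come from the cache populated at read time.
        String oldKey = keyAtRead.get(r.recid);
        if (oldKey != null) byEmail.remove(oldKey);
        byEmail.put(r.email, r.recid);
        keyAtRead.put(r.recid, r.email);
    }
}
```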
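The "compromise" supplementary index - store only the page where each recid's index node lives, then linearly scan that single page during an update - trades per-update scan cost for cheaper maintenance when the main tree rebalances (only moved entries need their page pointer rewritten). A toy sketch under that assumption, with all names hypothetical:

```java
import java.util.*;

// Sketch only: recid -> page id, plus a toy model of which recids sit
// on each page. An update looks up the one candidate page and scans it,
// rather than searching the whole index tree for the recid.
class PageLocator {
    final Map<Long, Integer> recidToPage = new HashMap<>();  // recid -> page id
    final Map<Integer, List<Long>> pages = new HashMap<>();  // page id -> recids on that page

    void place(long recid, int page) {
        recidToPage.put(recid, page);
        pages.computeIfAbsent(page, p -> new ArrayList<>()).add(recid);
    }

    // Linear search of the single candidate page for the recid's slot;
    // returns -1 if the recid is unknown.
    int slotOf(long recid) {
        Integer page = recidToPage.get(recid);
        if (page == null) return -1;
        List<Long> slots = pages.get(page);
        for (int i = 0; i < slots.size(); i++)
            if (slots.get(i) == recid) return i;
        return -1;
    }
}
```

With typical page fan-outs the linear scan touches only a few dozen entries, which is why the compromise is plausible; the full page-and-node variant avoids the scan but, as the thread notes, a rebalance can invalidate many node pointers at once.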