From: Kevin D. <ke...@tr...> - 2006-04-11 21:55:57
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE> <META content="MSHTML 6.00.2900.2802" name=GENERATOR></HEAD> <BODY leftMargin=1 topMargin=1 rightMargin=1><FONT face=Tahoma> <DIV><FONT face=Arial size=2>Bryan-</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT></DIV> <DIV><FONT face=Arial size=2>I don't use JDO in my projects because it is much heavier weight than is appropriate for what I need. I also disagree with their use of byte code injection, and big gnarly configuration files.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>In terms of whether indexing is a jdbm concern or not, I think I'll have to defer to Alex on this one. My understanding from past emails on the subject was that some form of indexing and constraint system was an extremely desirable feature for the core jdbm2 implementation.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>When I say that the indexes need to reside outside the record manager, I mean that the standard recman semantecs don't apply. The new recman always operates in a transaction context, updates to index data structures don't take place in a transaction context - that's why I say the indexes need to live outside of the record manager.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>Take a look at the sample class in the proof of concept code I emailed last week, and you'll see that we are *not* making insert/fetch/update calls against a transaction object. Instead, we make the calls against the record manager, which in turn determines which transactional context the operations should be performed.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>Finally, the BTree implementation for the indexes is going to be completely different than a btree that lives inside the record manager. First off, it's going to be a b-link tree. Second, it's values will always be recids. Third, browser traversal and lookup will implicitly be fetching records, based on the current transaction context. So, the index is *much* closer to the record manager itself than to the objects stored inside the record manager...</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>- K</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2> </FONT> <TABLE> <TBODY> <TR> <TD width=1 bgColor=blue><FONT face=Arial size=2></FONT></TD> <TD><FONT face=Arial size=2><FONT color=green>> Kevin,<BR><BR>My critique of JDO was more of implementation approaches, but you raise a<BR>fair point. Would JDO for jdbm answer the mail for you? I know that there<BR>are OS JDO projects -- I believe Apache hosts one for example. Could you<BR>leverage the JDO framework (reflection, etc.) but use jdbm2 as the<BR>persistence store? Or is that exactly what you are planning?<BR><BR>However, I still do not see why management of primary and secondary indices<BR>is a "jdbm" level concern. This feature can clearly be built within a<BR>framework over jdbm, so why introduce it within jdbm? It seems to me that<BR>we are talking about the separation of concerns between record management<BR>with transactions, concurrency, etc. and an oodbms with transparent access<BR>to and use of the native java fields. I see the latter as interested but<BR>cleanly separable from the former -- just a layer on top.<BR><BR>I still don't follow why the b-link nodes would exist outside of the record<BR>manager API. Perhaps we are just using the terms slightly differently. I<BR>see that we need to use distinct pages for index nodes, but not that the<BR>data structures on those pages need to be any different. It is more that<BR>the free space on those pages is not available for the physical record<BR>allocator for user objects, but only for node allocations on the btree.<BR><BR>-bryan<BR><BR>________________________________________<BR>From: <A href="mailto:jdb...@li..."><FONT color=#0000ff>jdb...@li...</FONT></A><BR><A href="mailto:jdb...@li..."><FONT color=#0000ff>[mailto:jdb...@li...]</FONT></A> On Behalf Of Kevin Day<BR>Sent: Tuesday, April 11, 2006 3:33 PM<BR>To: JDBM Developer listserv<BR>Subject: re[4]: [Jdbm-developer] primary and secondary indexes in the record<BR>manager<BR><BR>Bryan-<BR> <BR>MS COM and GOM have a lot in common, actually - but that's a conversation<BR>for another day after we ship jdbm2 :-)<BR> <BR> <BR>I think that you may have a mis-perception of how JDO works. To the actual<BR>user of the system, it's completely transparent. There is work that has to<BR>be done in an external configuration file to define the O/R mapping, and<BR>setting up the indexes in the underlying datastore, but once that's done,<BR>using JDO is almost exactly what we are talking about for jdbm - you start a<BR>transaction, you do stuff and you call commit (the biggest difference is<BR>that in JDO I don't think you have to call update() when you change an<BR>object ).<BR> <BR>With the GOM approach, the overhead of property access, etc... is not the<BR>concern for me - it's the impact on the developer and the style of coding. <BR>It certainly would be possible to use annotations and automatically generate<BR>objects that conform to the described interface, etc... But it moves far<BR>afield from the concept of POJOs.<BR> <BR>Run-time overriding (for example using AspectJ) would be another way to get<BR>at this - but you force the user to declare which calls can change the<BR>object state, etc... - it's just a ton of extra work just to get<BR>persistence.<BR> <BR> <BR> <BR>Anyway, I think that it is best if we focus on the jdbm subsystem itself at<BR>this stage - we can certainly include hooks, etc... for making the higher<BR>level stuff happen - but I'd like to focus on getting the core functionality<BR>of a highly concurrent, fully functional oodbm up and running. One of the<BR>core features of most oodbms is the ability to navigate (and query) the<BR>object store using indexes, and I think we need to seriously consider if/how<BR>to do this in jdbm. If a higher level system doesn't take advantage of<BR>those indexes, then that's fine - no harm done.<BR> <BR>The vast majority of jdbm users are using it as a persistent tree map, which<BR>implies index based access. For high concurrency, we need to offer these<BR>users map capability that is concurrent aware, and if we are going down that<BR>path, then we should at least talk about secondary indexes, because having<BR>multi-keyed maps is useful for a huge number of development situations.<BR> <BR>BTW - another reason for why index trees shouldn't reside in the regular<BR>record manager is concurrent updates - if tx1 causes the tree to re-balance,<BR>we need to have a mechanism for ensuring that tx2's logical view of the tree<BR>stays the same, but that any changes that tx2 wanted to make will still<BR>work. That's why I'm proposing a b-link structure that holds recid<BR>references, and those recids are resolved in the context of whatever<BR>transaction happens to be active. <BR> <BR>This approach (I think) implies a certain amount of garbage collection<BR>functionality, and the GC operation requires visibility to both the<BR>versioned and non-versioned logical store. This also implies that the GC<BR>must *know* which objects are index objects, which in turn implies that the<BR>indexes are in some way managed in jdbm (something akin to SQL INSERT INDEX,<BR>DROP INDEX, etc... commands).<BR> <BR>- K<BR> <BR><BR>> Kevin,<BR><BR>That is interesting feedback. I personally do not think of COM when I think<BR>about GOM. I am not actually very read in on the Microsoft architectures<BR>and have not looked at them in several years, but I do not consider them to<BR>be persistent object database systems, data extensible, etc.<BR><BR>In any case, you can get strongly typed behavior from GOM by using property<BR>and association classes to define property and link constraints. I do not<BR>find much overhead in the property access and type checking mechanisms.<BR>2/3rds of the current time for our application on a bulk load operation is<BR>the serialization of objects and laying them down on to page images. If we<BR>use a hard reference cache so that no cache evictions occur until the tx<BR>commit, then the processing time drops by 2/3rds. This is why one of my<BR>goals has been a rewrite of the transaction system and physical row<BR>allocation systems of jdbm.<BR><BR>I certainly agree that handling index updates transparently is difficult for<BR>"standard" java objects. I think that an interesting approach could be done<BR>with Java 5 annotations and a BCE package. From what I have been able to<BR>see about JDO, there are very few implementations that provide what I might<BR>call a native java persistent object system that is performance oriented.<BR>E.g., JDO achieved through an O-R mapper and stored in an RDBMS appears to<BR>be a common strategy.<BR><BR>What I have see is pretty non-transparent. You write your native java<BR>objects to be explicitly persistence aware and do things like indicate when<BR>a property value change has invalidated an index entry for an object. I<BR>take it that this solution is not satisfying for you and that you are<BR>looking for a means to achieve transparent persistence for native java?<BR><BR>What I tend to do with GOM is write "skins" which implement application<BR>interfaces and are backed by the generic data object. That gives me the<BR>application semantics for the persistent objects and strong type checking as<BR>well as interoperability across GOM implementations. I don't find this to<BR>be that much effort, but it does smack of something that would benefit from<BR>automation. You could also subclass the generic object for a specific<BR>implementation and then operate in a type safe traditional OO manner, but<BR>that locks you into a specific object hierarchy and backend.<BR><BR>I am noodling with both UML driven authoring of the class, property and<BR>association constraints and BCE based optimizations for skins over live<BR>generic objects but I do not have anything concrete to offer in either case.<BR><BR>-bryan<BR><BR>________________________________________<BR>From: <A href="mailto:jdb...@li..."><FONT color=#0000ff>jdb...@li...</FONT></A><BR><A href="mailto:jdb...@li..."><FONT color=#0000ff>[mailto:jdb...@li...]</FONT></A> On Behalf Of Kevin Day<BR>Sent: Tuesday, April 11, 2006 12:49 PM<BR>To: JDBM Developer listserv<BR>Subject: re[2]: [Jdbm-developer] primary and secondary indexes in the record<BR>manager<BR><BR>Bryan-<BR> <BR>Yes - I've looked at the GOM implementation (and have worked extensively<BR>with similar approaches - Microsoft COM, mozilla cross platform COM,<BR>etc...). I can see the utility in some situations, but for the development<BR>I do, losing the runtime type checking would be a massive problem. I'm<BR>trying to think of jdbm as a true object database, and not just a way of<BR>storing key/value pairs. Certainly, I could create wrapper classes that<BR>call the GOM framework, then work in an object oriented manner, but that is<BR>a heck of a lot of work (and boring work at that).<BR> <BR>As a user of an oodbm, I expect to be able to work with objects the way I<BR>normally work with objects, but have the persistent capabilities. The<BR>solution I'm going for is much closer to JDO than COM... So, the idea of<BR>using events and listeners is fine - but we have to have a true object<BR>oriented mechanism for driving those events. Short of code injection,<BR>aspect weaving, or some ugly notification requirement imposed by the API, I<BR>just don't see how to achieve this. The alternative, then, is to provide<BR>some sort of reverse lookup method that either caches index info for all<BR>retrieved records or provides some sort of view into the index keys (or node<BR>locations) for each recid.<BR> <BR> <BR> <BR>Expanding on my comment re: blink trees: We could certainly store the index<BR>pages on DATA pages, if desired - but there may be advantages (locality<BR>again!) to storing the index pages in their own page thread. Given that<BR>index page objects are likely to change much, much more frequently than data<BR>pages, it may make sense from a caching and free page perspective to keep<BR>them in a separate page thread.<BR> <BR>Likewise, in my proof of concept, I could have stored the version tables on<BR>DATA pages, but I chose not to because version tables are extremely short<BR>lived, and will undergo a lot of churn. If the version tables were<BR>interspersed with the data, we'd wind up with a huge amount of<BR>fragmentation. I think (but can not say with absolute certainty) that<BR>maintaining these data structures in a separate page thread should optimize<BR>caching behavior, allocation, etc...<BR> <BR>- K<BR> <BR><BR>> <BR>Kevin, <BR> <BR>Can you expand on your comment about blink trees? Why do you see that they<BR>need to be outside of the record management API? I was hoping that some<BR>data structures persisted as records could opt out of the default<BR>concurrency control mechanisms and use consistency based mechanisms, but<BR>that does not mean that we need to store them directly on pages does it? <BR> <BR>I see index maintenance as the role of a framework layer over the record<BR>management API. Have you looked at the framework that I have implemented?<BR> [1]. It handles index maintenance by catching property updates and<BR>generating events. Indices register as listeners for events and handle the<BR>removal under the old key and the insert under the new key (the old key, new<BR>key, and the object are part of the event). This framework is designed<BR>around generic data rather than explicitly declared java fields and<BR>reflection. If you want to manage indices and your classes have fields that<BR>are getting serialized, serve as your index keys, and you have setters for<BR>those fields then you need to generate notification events from those<BR>setters and have your indices registered as listeners. <BR> <BR>-bryan <BR> <BR>[1]<BR><A href="http://proto.cognitiveweb.org/projects/cweb/multiproject/cweb-generic-native"><FONT color=#0000ff>http://proto.cognitiveweb.org/projects/cweb/multiproject/cweb-generic-native</FONT></A><BR>/index.html <BR> <BR><BR><BR>From: <A href="mailto:jdb...@li..."><FONT color=#0000ff>jdb...@li...</FONT></A><BR><A href="mailto:jdb...@li..."><FONT color=#0000ff>[mailto:jdb...@li...]</FONT></A> On Behalf Of Kevin Day<BR>Sent: Monday, April 10, 2006 9:46 PM<BR>To: JDBM Developer listserv<BR>Subject: [Jdbm-developer] primary and secondary indexes in the record<BR>manager <BR> <BR><BR>One thing that I've been thinking about in a background thread: <BR><BR> <BR><BR>We've discussed the possibility of making indexes a first order citizen of<BR>jdbm (instead of storing a BTree in the record manager itself, on the data<BR>pages). This has lots of advantages from a concurrency perspective (we can<BR>use BLink trees). It also creates a programming paradigm that is much<BR>closer to how people are used to working with databases. <BR><BR> <BR><BR>This is where secondary indexes come in. <BR><BR> <BR><BR>My experience so far with implementing multi-index constructs in jdbm has<BR>been that keeping the indexes in sync is an absolute bugger. I've had to<BR>resort to special serializers that get the serialized form of the keys used<BR>by a given record when that record is initially retrieved, then use those<BR>keys to remove and re-insert values during any update of the record. <BR><BR> <BR><BR>The problem with this is that you wind up having to deserialize the full<BR>record, then turn around and serialize keys for each index that record is a<BR>part of. And you have to do it for every single read (it isn't possible to<BR>determine the index key values during the update itself, because the object<BR>itself has already been changed). <BR><BR> <BR><BR>Another approach to this problem would be to maintain index meta data in a<BR>suplimentary index, keyed by recid. Whenever an update is performed, the<BR>recid would be retrieved, and the page and node of the object could be<BR>retrieved. This would allow for efficient updates, but changes to the main<BR>index tree (like a re-balance operation) could require a significant number<BR>of changes in the siplimentary index. <BR><BR> <BR><BR>A compromise would be to store only the page where the node for a given<BR>record exists, then do a linear search for the recid during updates. <BR><BR> <BR><BR>I'm really not sure where to go with all of this, but I think it's<BR>appropriate to start talking about it. If we have a suplimentary index,<BR>then it may make sense to toss the physical row location into it and ditch<BR>the translation pages entirely. Individual record lookups would require a<BR>tree search, but in 99% of my uses of jdbm this is the case already. This<BR>is closer to the primary key index idea that Alex floated awhile back. <BR><BR> <BR><BR>I'm not at all saying this is the way to go, but I would like to begin<BR>getting a handle on how we can maintain indexes for objects stored in jdbm. <BR><BR> <BR><BR>- K <<BR><BR><BR><BR>-------------------------------------------------------<BR>This SF.Net email is sponsored by xPML, a groundbreaking scripting language<BR>that extends applications into web and mobile media. Attend the live webcast<BR>and join the prime developer group breaking into this new coding territory!<BR><A href="http://sel.as-us.falkag.net/sel?cmd"><FONT color=#0000ff>http://sel.as-us.falkag.net/sel?cmd</FONT></A><BR><BR><<BR><BR><BR><BR>-------------------------------------------------------<BR>This SF.Net email is sponsored by xPML, a groundbreaking scripting language<BR>that extends applications into web and mobile media. Attend the live webcast<BR>and join the prime developer group breaking into this new coding territory!<BR><A href="http://sel.as-us.falkag.net/sel?cmd"><FONT color=#0000ff>http://sel.as-us.falkag.net/sel?cmd</FONT></A><BR><BR><<BR></FONT></FONT></TD></TR></TBODY></TABLE></DIV></FONT></BODY></HTML> |