re[6]: [Jdbm-developer] primary and secondary indexes in the record manager

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0
 Transitional//EN">
<HTML><HEAD>
<STYLE type=text/css> P, UL, OL, DL, DIR,
 MENU, PRE { margin: 0 auto;}</STYLE>

<META content="MSHTML 6.00.2900.2802" name=GENERATOR></HEAD>
<BODY leftMargin=1 topMargin=1 rightMargin=1><FONT
 face=Tahoma>
<DIV><FONT face=Arial size=2>Bryan-</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2></FONT></DIV>
<DIV><FONT face=Arial size=2>I don't use
 JDO in my projects because it is much heavier
 weight than is appropriate for what I need.&nbsp;
 I also disagree with their use of byte code
 injection, and big gnarly configuration files.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>In terms of
 whether indexing is a jdbm concern or not,
 I think I'll have to defer to Alex on this
 one.&nbsp; My understanding from past emails
 on the subject was that some form of indexing
 and constraint system was an extremely desirable
 feature for the core jdbm2 implementation.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>When I say that
 the indexes need to reside outside the record
 manager, I mean that the standard recman
 semantecs don't apply.&nbsp; The new recman
 always operates in a transaction context,
 updates to index data structures don't take
 place in a transaction context&nbsp;- that's
 why I say the indexes need to live outside
 of the record manager.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>Take a look
 at the sample class in the proof of concept
 code I emailed last week, and you'll see
 that we are *not* making insert/fetch/update
 calls against a transaction object.&nbsp;
 Instead, we make the calls against the record
 manager, which in turn determines which transactional
 context the operations should be performed.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>Finally, the
 BTree implementation for the indexes is going
 to be completely different than a btree that
 lives inside the record manager.&nbsp; First
 off, it's going to be a b-link tree.&nbsp;
 Second, it's values will always be recids.&nbsp;
 Third, browser traversal and lookup will
 implicitly be fetching records, based on
 the current transaction context.&nbsp; So,
 the index is *much* closer to the record
 manager itself than to the objects stored
 inside the record manager...</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>- K</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>&nbsp;</FONT>
 
<TABLE>
<TBODY>
<TR>
<TD width=1 bgColor=blue><FONT face=Arial
 size=2></FONT></TD>
<TD><FONT face=Arial size=2><FONT color=green>&gt;
 Kevin,<BR><BR>My critique of JDO was more
 of implementation approaches, but you raise
 a<BR>fair point. &nbsp;Would JDO for jdbm
 answer the mail for you? &nbsp;I know that
 there<BR>are OS JDO projects -- I believe
 Apache hosts one for example. &nbsp;Could
 you<BR>leverage the JDO framework (reflection,
 etc.) but use jdbm2 as the<BR>persistence
 store? &nbsp;Or is that exactly what you
 are planning?<BR><BR>However, I still do
 not see why management of primary and secondary
 indices<BR>is a "jdbm" level concern. &nbsp;This
 feature can clearly be built within a<BR>framework
 over jdbm, so why introduce it within jdbm?
 &nbsp;It seems to me that<BR>we are talking
 about the separation of concerns between
 record management<BR>with transactions, concurrency,
 etc. and an oodbms with transparent access<BR>to
 and use of the native java fields. &nbsp;I
 see the latter as interested but<BR>cleanly
 separable from the former -- just a layer
 on top.<BR><BR>I still don't follow why the
 b-link nodes would exist outside of the record<BR>manager
 API. &nbsp;Perhaps we are just using the
 terms slightly differently. &nbsp;I<BR>see
 that we need to use distinct pages for index
 nodes, but not that the<BR>data structures
 on those pages need to be any different.
 &nbsp;It is more that<BR>the free space on
 those pages is not available for the physical
 record<BR>allocator for user objects, but
 only for node allocations on the btree.<BR><BR>-bryan<BR><BR>________________________________________<BR>From:
 <A href="mailto:jdb...@li..."><FONT
 color=#0000ff>jdb...@li...</FONT></A><BR><A
 href="mailto:jdb...@li..."><FONT
 color=#0000ff>[mailto:jdb...@li...]</FONT></A>
 On Behalf Of Kevin Day<BR>Sent: Tuesday,
 April 11, 2006 3:33 PM<BR>To: JDBM Developer
 listserv<BR>Subject: re[4]: [Jdbm-developer]
 primary and secondary indexes in the record<BR>manager<BR><BR>Bryan-<BR>&nbsp;<BR>MS
 COM and GOM have a lot in common, actually
 - but that's a conversation<BR>for another
 day after we ship jdbm2 :-)<BR>&nbsp;<BR>&nbsp;<BR>I
 think that you may have a mis-perception
 of how JDO works.&nbsp; To the actual<BR>user
 of the system, it's completely transparent.&nbsp;
 There is work that has to<BR>be done in&nbsp;an
 external configuration file&nbsp;to define
 the O/R mapping, and<BR>setting up the indexes
 in the underlying datastore, but once that's
 done,<BR>using JDO is almost exactly what
 we are talking about for jdbm - you start
 a<BR>transaction, you do stuff and you call
 commit (the biggest difference is<BR>that
 in JDO I don't think you have to call update()
 when you&nbsp;change an<BR>object&nbsp;).<BR>&nbsp;<BR>With
 the GOM approach, the overhead of property
 access, etc... is not the<BR>concern for
 me&nbsp;- it's the impact on the developer
 and the style of coding.&nbsp;<BR>It certainly
 would be possible to use annotations and
 automatically generate<BR>objects that conform
 to the described interface, etc...&nbsp;
 But it moves far<BR>afield from the concept
 of POJOs.<BR>&nbsp;<BR>Run-time overriding
 (for example using AspectJ) would be another
 way to get<BR>at this - but you force the
 user to declare which calls can change the<BR>object
 state, etc... - it's just a ton of extra
 work just to get<BR>persistence.<BR>&nbsp;<BR>&nbsp;<BR>&nbsp;<BR>Anyway,
 I think that it is best if we focus on the
 jdbm subsystem itself&nbsp;at<BR>this stage
 - we can certainly include hooks, etc...
 for making the higher<BR>level stuff happen
 - but I'd like to focus on getting the core
 functionality<BR>of a highly concurrent,
 fully functional&nbsp;oodbm up and running.&nbsp;
 One of the<BR>core features of most oodbms&nbsp;is
 the ability to navigate (and query)&nbsp;the<BR>object
 store using indexes, and I think we need
 to seriously consider if/how<BR>to do this
 in jdbm.&nbsp; If a higher level system doesn't
 take advantage of<BR>those indexes, then
 that's fine - no harm done.<BR>&nbsp;<BR>The
 vast majority of jdbm users are using it
 as a persistent tree map, which<BR>implies
 index based access.&nbsp; For high concurrency,
 we need to offer these<BR>users map capability
 that is concurrent aware, and if we are going
 down that<BR>path, then we should at least
 talk about secondary indexes, because having<BR>multi-keyed
 maps is useful for a huge number of development
 situations.<BR>&nbsp;<BR>BTW - another reason
 for why index trees shouldn't reside in the
 regular<BR>record manager is concurrent updates
 - if tx1 causes the tree to re-balance,<BR>we
 need to have a mechanism for ensuring that
 tx2's logical view of the tree<BR>stays the
 same, but that any changes that tx2 wanted
 to make will still<BR>work.&nbsp; That's
 why I'm proposing a b-link structure that
 holds recid<BR>references, and those recids
 are resolved in the context of whatever<BR>transaction
 happens to be active.&nbsp; <BR>&nbsp;<BR>This
 approach (I think) implies a certain amount
 of garbage collection<BR>functionality, and
 the GC operation requires visibility to both
 the<BR>versioned and non-versioned logical
 store.&nbsp; This also implies that the GC<BR>must
 *know* which objects are index objects, which
 in turn implies that the<BR>indexes are in
 some way managed in jdbm (something akin
 to SQL INSERT INDEX,<BR>DROP INDEX, etc...
 commands).<BR>&nbsp;<BR>- K<BR>&nbsp; <BR><BR>&gt;
 Kevin,<BR><BR>That is interesting feedback.
 &nbsp;I personally do not think of COM when
 I think<BR>about GOM. &nbsp;I am not actually
 very read in on the Microsoft architectures<BR>and
 have not looked at them in several years,
 but I do not consider them to<BR>be persistent
 object database systems, data extensible,
 etc.<BR><BR>In any case, you can get strongly
 typed behavior from GOM by using property<BR>and
 association classes to define property and
 link constraints. &nbsp;I do not<BR>find
 much overhead in the property access and
 type checking mechanisms.<BR>2/3rds of the
 current time for our application on a bulk
 load operation is<BR>the serialization of
 objects and laying them down on to page images.
 &nbsp;If we<BR>use a hard reference cache
 so that no cache evictions occur until the
 tx<BR>commit, then the processing time drops
 by 2/3rds. &nbsp;This is why one of my<BR>goals
 has been a rewrite of the transaction system
 and physical row<BR>allocation systems of
 jdbm.<BR><BR>I certainly agree that handling
 index updates transparently is difficult
 for<BR>"standard" java objects. &nbsp;I think
 that an interesting approach could be done<BR>with
 Java 5 annotations and a BCE package. &nbsp;From
 what I have been able to<BR>see about JDO,
 there are very few implementations that provide
 what I might<BR>call a native java persistent
 object system that is performance oriented.<BR>E.g.,
 JDO achieved through an O-R mapper and stored
 in an RDBMS appears to<BR>be a common strategy.<BR><BR>What
 I have see is pretty non-transparent. &nbsp;You
 write your native java<BR>objects to be explicitly
 persistence aware and do things like indicate
 when<BR>a property value change has invalidated
 an index entry for an object. &nbsp;I<BR>take
 it that this solution is not satisfying for
 you and that you are<BR>looking for a means
 to achieve transparent persistence for native
 java?<BR><BR>What I tend to do with GOM is
 write "skins" which implement application<BR>interfaces
 and are backed by the generic data object.
 &nbsp;That gives me the<BR>application semantics
 for the persistent objects and strong type
 checking as<BR>well as interoperability across
 GOM implementations. &nbsp;I don't find this
 to<BR>be that much effort, but it does smack
 of something that would benefit from<BR>automation.
 &nbsp;You could also subclass the generic
 object for a specific<BR>implementation and
 then operate in a type safe traditional OO
 manner, but<BR>that locks you into a specific
 object hierarchy and backend.<BR><BR>I am
 noodling with both UML driven authoring of
 the class, property and<BR>association constraints
 and BCE based optimizations for skins over
 live<BR>generic objects but I do not have
 anything concrete to offer in either case.<BR><BR>-bryan<BR><BR>________________________________________<BR>From:
 <A href="mailto:jdb...@li..."><FONT
 color=#0000ff>jdb...@li...</FONT></A><BR><A
 href="mailto:jdb...@li..."><FONT
 color=#0000ff>[mailto:jdb...@li...]</FONT></A>
 On Behalf Of Kevin Day<BR>Sent: Tuesday,
 April 11, 2006 12:49 PM<BR>To: JDBM Developer
 listserv<BR>Subject: re[2]: [Jdbm-developer]
 primary and secondary indexes in the record<BR>manager<BR><BR>Bryan-<BR>&nbsp;<BR>Yes
 - I've looked at the GOM implementation (and
 have worked extensively<BR>with similar approaches
 - Microsoft COM, mozilla cross platform COM,<BR>etc...).&nbsp;
 I can see the utility in some situations,
 but for the development<BR>I do, losing the
 runtime type checking would be a massive
 problem.&nbsp; I'm<BR>trying to think of
 jdbm as a true object database, and not just
 a way of<BR>storing key/value pairs.&nbsp;
 Certainly, I could create wrapper classes
 that<BR>call the GOM framework, then work
 in an object oriented manner, but that is<BR>a
 heck of a lot of work (and boring work at
 that).<BR>&nbsp;<BR>As a user of an oodbm,
 I expect to be able to work with objects
 the way I<BR>normally work with objects,
 but have the persistent capabilities.&nbsp;
 The<BR>solution I'm going for is much closer
 to JDO than COM...&nbsp; So, the idea of<BR>using
 events and listeners is fine - but we have
 to have a true object<BR>oriented mechanism
 for driving those events.&nbsp; Short of
 code injection,<BR>aspect&nbsp;weaving,&nbsp;or
 some ugly notification requirement imposed
 by the API, I<BR>just don't see how to achieve
 this.&nbsp; The alternative, then, is to
 provide<BR>some sort of reverse lookup method
 that either caches index info for all<BR>retrieved
 records or provides some sort of view into
 the index keys (or node<BR>locations) for
 each recid.<BR>&nbsp;<BR>&nbsp;<BR>&nbsp;<BR>Expanding
 on my comment re: blink trees:&nbsp; We could
 certainly store the index<BR>pages on DATA
 pages, if desired - but there may be advantages
 (locality<BR>again!) to storing the index
 pages in their own page thread.&nbsp; Given
 that<BR>index page objects are likely to
 change much, much more frequently than data<BR>pages,
 it may make sense from a caching and free
 page perspective to keep<BR>them in a separate
 page thread.<BR>&nbsp;<BR>Likewise, in my
 proof of concept, I could have stored the
 version tables on<BR>DATA pages, but I chose
 not to because version tables are extremely
 short<BR>lived, and will undergo a lot of
 churn.&nbsp; If the version tables were<BR>interspersed
 with the data, we'd wind up with a huge amount
 of<BR>fragmentation.&nbsp; I think (but can
 not say with absolute certainty) that<BR>maintaining
 these data structures in a separate page
 thread should optimize<BR>caching behavior,
 allocation, etc...<BR>&nbsp;<BR>- K<BR>&nbsp;
 <BR><BR>&gt; <BR>Kevin, <BR>&nbsp;<BR>Can
 you expand on your comment about blink trees?
 &nbsp;Why do you see that they<BR>need to
 be outside of the record management API?
 &nbsp;I was hoping that some<BR>data structures
 persisted as records could opt out of the
 default<BR>concurrency control mechanisms
 and use consistency based mechanisms, but<BR>that
 does not mean that we need to store them
 directly on pages does it? <BR>&nbsp;<BR>I
 see index maintenance as the role of a framework
 layer over the record<BR>management API.
 &nbsp;Have you looked at the framework that
 I have implemented?<BR>&nbsp;[1]. &nbsp;It
 handles index maintenance by catching property
 updates and<BR>generating events. &nbsp;Indices
 register as listeners for events and handle
 the<BR>removal under the old key and the
 insert under the new key (the old key, new<BR>key,
 and the object are part of the event). &nbsp;This
 framework is designed<BR>around generic data
 rather than explicitly declared java fields
 and<BR>reflection. &nbsp;If you want to manage
 indices and your classes have fields that<BR>are
 getting serialized, serve as your index keys,
 and you have setters for<BR>those fields
 then you need to generate notification events
 from those<BR>setters and have your indices
 registered as listeners. <BR>&nbsp;<BR>-bryan
 <BR>&nbsp;<BR>[1]<BR><A href="http://proto.cognitiveweb.org/projects/cweb/multiproject/cweb-generic-native"><FONT
 color=#0000ff>http://proto.cognitiveweb.org/projects/cweb/multiproject/cweb-generic-native</FONT></A><BR>/index.html
 <BR>&nbsp;<BR><BR><BR>From: <A href="mailto:jdb...@li..."><FONT
 color=#0000ff>jdb...@li...</FONT></A><BR><A
 href="mailto:jdb...@li..."><FONT
 color=#0000ff>[mailto:jdb...@li...]</FONT></A>
 On Behalf Of Kevin Day<BR>Sent: Monday, April
 10, 2006 9:46 PM<BR>To: JDBM Developer listserv<BR>Subject:
 [Jdbm-developer] primary and secondary indexes
 in the record<BR>manager <BR>&nbsp;<BR><BR>One
 thing that I've been thinking about in a
 background thread: <BR><BR>&nbsp;<BR><BR>We've
 discussed the possibility of making indexes
 a first order citizen of<BR>jdbm (instead
 of storing a BTree in the record manager
 itself, on the data<BR>pages). &nbsp;This
 has lots of advantages from a concurrency
 perspective (we can<BR>use BLink trees).
 &nbsp;It also creates a programming paradigm
 that is much<BR>closer to how people are
 used to working with databases. <BR><BR>&nbsp;<BR><BR>This
 is where secondary indexes come in. <BR><BR>&nbsp;<BR><BR>My
 experience so far with implementing multi-index
 constructs in jdbm has<BR>been that keeping
 the indexes in sync is an absolute bugger.
 &nbsp;I've had to<BR>resort to special serializers
 that get the serialized form of the keys
 used<BR>by a given record when that record
 is initially retrieved, then use those<BR>keys
 to remove and re-insert values during any
 update of the record. <BR><BR>&nbsp;<BR><BR>The
 problem with this is that you wind up having
 to deserialize the full<BR>record, then turn
 around and serialize keys for each index
 that record is a<BR>part of. &nbsp;And you
 have to do it for every single read (it isn't
 possible to<BR>determine the index key values
 during the update itself, because the object<BR>itself
 has already been changed). <BR><BR>&nbsp;<BR><BR>Another
 approach to this problem would be to maintain
 index meta data in a<BR>suplimentary index,
 keyed by recid. &nbsp;Whenever an update
 is performed, the<BR>recid would be retrieved,
 and the page and node of the object could
 be<BR>retrieved. &nbsp;This would allow for
 efficient updates, but changes to the main<BR>index
 tree (like a re-balance operation) could
 require a significant number<BR>of changes
 in the siplimentary index. <BR><BR>&nbsp;<BR><BR>A
 compromise would be to store only the page
 where the node for a given<BR>record exists,
 then do a linear search for the recid during
 updates. <BR><BR>&nbsp;<BR><BR>I'm really
 not sure where to go with all of this, but
 I think it's<BR>appropriate to start talking
 about it. &nbsp;If we have a suplimentary
 index,<BR>then it may make sense to toss
 the physical row location into it and ditch<BR>the
 translation pages entirely. &nbsp;Individual
 record lookups would require a<BR>tree search,
 but in 99% of my uses of jdbm this is the
 case already. &nbsp;This<BR>is closer to
 the primary key index idea that Alex floated
 awhile back. <BR><BR>&nbsp;<BR><BR>I'm not
 at all saying this is the way to go, but
 I would like to begin<BR>getting a handle
 on how we can maintain indexes for objects
 stored in jdbm. <BR><BR>&nbsp;<BR><BR>- K
 &lt;<BR><BR><BR><BR>-------------------------------------------------------<BR>This
 SF.Net email is sponsored by xPML, a groundbreaking
 scripting language<BR>that extends applications
 into web and mobile media. Attend the live
 webcast<BR>and join the prime developer group
 breaking into this new coding territory!<BR><A
 href="http://sel.as-us.falkag.net/sel?cmd"><FONT
 color=#0000ff>http://sel.as-us.falkag.net/sel?cmd</FONT></A><BR><BR>&lt;<BR><BR><BR><BR>-------------------------------------------------------<BR>This
 SF.Net email is sponsored by xPML, a groundbreaking
 scripting language<BR>that extends applications
 into web and mobile media. Attend the live
 webcast<BR>and join the prime developer group
 breaking into this new coding territory!<BR><A
 href="http://sel.as-us.falkag.net/sel?cmd"><FONT
 color=#0000ff>http://sel.as-us.falkag.net/sel?cmd</FONT></A><BR><BR>&lt;<BR></FONT></FONT></TD></TR></TBODY></TABLE></DIV></FONT></BODY></HTML>