From: Kevin D. <ke...@tr...> - 2006-04-11 16:45:22
Bryan-

Yes - I've looked at the GOM implementation (and have worked extensively with similar approaches - Microsoft COM, Mozilla's cross-platform COM, etc.). I can see the utility in some situations, but for the development I do, losing the runtime type checking would be a massive problem. I'm trying to think of jdbm as a true object database, not just a way of storing key/value pairs. Certainly, I could create wrapper classes that call the GOM framework and then work in an object-oriented manner, but that is a heck of a lot of work (and boring work at that).

As a user of an OODBMS, I expect to be able to work with objects the way I normally work with objects, but with persistence capabilities. The solution I'm going for is much closer to JDO than COM. So the idea of using events and listeners is fine - but we need a truly object-oriented mechanism for driving those events. Short of code injection, aspect weaving, or some ugly notification requirement imposed by the API, I just don't see how to achieve this.

The alternative, then, is to provide some sort of reverse lookup method that either caches index info for all retrieved records or provides some sort of view into the index keys (or node locations) for each recid.

Expanding on my comment re: blink trees: we could certainly store the index pages on DATA pages if desired - but there may be advantages (locality again!) to storing the index pages in their own page thread. Given that index pages are likely to change much, much more frequently than data pages, it may make sense from a caching and free-page perspective to keep them in a separate page thread.

Likewise, in my proof of concept, I could have stored the version tables on DATA pages, but I chose not to because version tables are extremely short-lived and undergo a lot of churn. If the version tables were interspersed with the data, we'd wind up with a huge amount of fragmentation. I think (but cannot say with absolute certainty) that maintaining these data structures in a separate page thread should optimize caching behavior, allocation, etc.

- K

> Kevin,
>
> Can you expand on your comment about blink trees? Why do you see that they need to be outside of the record management API? I was hoping that some data structures persisted as records could opt out of the default concurrency control mechanisms and use consistency-based mechanisms, but that doesn't mean that we need to store them directly on pages, does it?
>
> I see index maintenance as the role of a framework layer over the record management API. Have you looked at the framework that I have implemented? [1] It handles index maintenance by catching property updates and generating events. Indices register as listeners for events and handle the removal under the old key and the insert under the new key (the old key, new key, and the object are part of the event). This framework is designed around generic data rather than explicitly declared Java fields and reflection. If you want to manage indices, and your classes have serialized fields that serve as your index keys, with setters for those fields, then you need to generate notification events from those setters and register your indices as listeners.
>
> -bryan
>
> [1] http://proto.cognitiveweb.org/projects/cweb/multiproject/cweb-generic-native/index.html
>
> From: jdb...@li... [mailto:jdb...@li...] On Behalf Of Kevin Day
> Sent: Monday, April 10, 2006 9:46 PM
> To: JDBM Developer listserv
> Subject: [Jdbm-developer] primary and secondary indexes in the record manager
>
> One thing that I've been thinking about in a background thread:
>
> We've discussed the possibility of making indexes a first-class citizen of jdbm (instead of storing a BTree in the record manager itself, on the data pages). This has lots of advantages from a concurrency perspective (we can use BLink trees). It also creates a programming paradigm that is much closer to how people are used to working with databases.
>
> This is where secondary indexes come in.
>
> My experience so far with implementing multi-index constructs in jdbm has been that keeping the indexes in sync is an absolute bugger. I've had to resort to special serializers that capture the serialized form of the keys used by a given record when that record is initially retrieved, then use those keys to remove and re-insert values during any update of the record.
>
> The problem with this is that you wind up having to deserialize the full record, then turn around and serialize keys for each index that the record is a part of. And you have to do it for every single read (it isn't possible to determine the old index key values during the update itself, because the object has already been changed).
>
> Another approach to this problem would be to maintain index metadata in a supplementary index, keyed by recid. Whenever an update is performed, the recid would be looked up, and the page and node of the object could be retrieved. This would allow for efficient updates, but changes to the main index tree (like a rebalance operation) could require a significant number of changes in the supplementary index.
>
> A compromise would be to store only the page where the node for a given record exists, then do a linear search for the recid during updates.
>
> I'm really not sure where to go with all of this, but I think it's appropriate to start talking about it. If we have a supplementary index, then it may make sense to toss the physical row location into it and ditch the translation pages entirely. Individual record lookups would require a tree search, but in 99% of my uses of jdbm this is the case already. This is closer to the primary-key index idea that Alex floated a while back.
>
> I'm not at all saying this is the way to go, but I would like to begin getting a handle on how we can maintain indexes for objects stored in jdbm.
>
> - K
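For readers following along: the listener-based index maintenance Bryan describes (setter fires an event carrying the old key, new key, and object; each registered index removes the stale entry and inserts the new one) could be sketched roughly like this. All class and method names below are hypothetical illustrations, not the actual cweb-generic-native API:

```java
import java.util.*;

// Sketch only: a key-change event carrying old key, new key, and source object.
class KeyChangeEvent {
    final Object oldKey, newKey, source;
    KeyChangeEvent(Object oldKey, Object newKey, Object source) {
        this.oldKey = oldKey; this.newKey = newKey; this.source = source;
    }
}

interface KeyChangeListener { void keyChanged(KeyChangeEvent e); }

// A toy secondary index: key -> set of objects indexed under that key.
// On notification it removes the entry under the old key and inserts
// under the new key, as described in the thread.
class ToyIndex implements KeyChangeListener {
    final Map<Object, Set<Object>> entries = new HashMap<>();
    public void keyChanged(KeyChangeEvent e) {
        if (e.oldKey != null) {
            Set<Object> s = entries.get(e.oldKey);
            if (s != null) { s.remove(e.source); if (s.isEmpty()) entries.remove(e.oldKey); }
        }
        entries.computeIfAbsent(e.newKey, k -> new HashSet<>()).add(e.source);
    }
}

// A persistent class whose setter drives the notification - this is the
// "notification requirement imposed by the API" Kevin is wary of.
class Person {
    private String name;
    private final List<KeyChangeListener> listeners = new ArrayList<>();
    void addListener(KeyChangeListener l) { listeners.add(l); }
    void setName(String newName) {
        String old = this.name;
        this.name = newName;
        for (KeyChangeListener l : listeners)
            l.keyChanged(new KeyChangeEvent(old, newName, this));
    }
    String getName() { return name; }
}
```

The friction Kevin points at is visible here: every indexed field needs a hand-written setter that knows to fire the event, which is exactly the boilerplate that code injection or aspect weaving would otherwise remove.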
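Kevin's "special serializer" workaround - remember the index key values as they existed when the record was read, because by update time the in-memory object already holds the new values - can be illustrated with a minimal sketch. The `Record`/`EmailIndex` names are invented for illustration; the real jdbm serializer hooks are not shown:

```java
import java.util.*;

// Sketch, with hypothetical names: when a record is fetched, cache the
// key value it is currently indexed under; on update, use the cached
// (old) key to remove the stale index entry before inserting the new one.
class Record {
    long recid;
    String email;   // field serving as a secondary index key
    Record(long recid, String email) { this.recid = recid; this.email = email; }
}

class EmailIndex {
    final Map<String, Long> byEmail = new HashMap<>();    // email -> recid
    final Map<Long, String> keyAtRead = new HashMap<>();  // recid -> key seen at fetch time

    Record fetch(Record stored) {
        // Capture the key as it existed at read time - this is the cost
        // Kevin complains about, paid on every single read.
        keyAtRead.put(stored.recid, stored.email);
        return stored;
    }

    void update(Record r) {
        // The in-memory object already carries the NEW key, so the old
        // key can only come from the cache populated at read time.
        String oldKey = keyAtRead.get(r.recid);
        if (oldKey != null) byEmail.remove(oldKey);
        byEmail.put(r.email, r.recid);
        keyAtRead.put(r.recid, r.email);
    }
}
```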
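The "compromise" supplementary index - store only the page where each recid's index node lives, then linearly scan that single page during an update - trades per-update scan cost for cheaper maintenance when the main tree rebalances (only moved entries need their page pointer rewritten). A toy sketch under that assumption, with all names hypothetical:

```java
import java.util.*;

// Sketch only: recid -> page id, plus a toy model of which recids sit
// on each page. An update looks up the one candidate page and scans it,
// rather than searching the whole index tree for the recid.
class PageLocator {
    final Map<Long, Integer> recidToPage = new HashMap<>();  // recid -> page id
    final Map<Integer, List<Long>> pages = new HashMap<>();  // page id -> recids on that page

    void place(long recid, int page) {
        recidToPage.put(recid, page);
        pages.computeIfAbsent(page, p -> new ArrayList<>()).add(recid);
    }

    // Linear search of the single candidate page for the recid's slot;
    // returns -1 if the recid is unknown.
    int slotOf(long recid) {
        Integer page = recidToPage.get(recid);
        if (page == null) return -1;
        List<Long> slots = pages.get(page);
        for (int i = 0; i < slots.size(); i++)
            if (slots.get(i) == recid) return i;
        return -1;
    }
}
```

With typical page fan-outs the linear scan touches only a few dozen entries, which is why the compromise is plausible; the full page-and-node variant avoids the scan but, as the thread notes, a rebalance can invalidate many node pointers at once.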