Can you expand on your comment about B-link trees?  Why do you see that they need to live outside of the record management API?  I was hoping that some data structures persisted as records could opt out of the default concurrency control mechanisms and use consistency-based mechanisms instead, but that does not mean we need to store them directly on pages, does it?


I see index maintenance as the role of a framework layer over the record management API.  Have you looked at the framework that I have implemented?  [1].  It handles index maintenance by intercepting property updates and generating events.  Indices register as listeners for those events; each event carries the old key, the new key, and the object, so the index removes the entry under the old key and inserts it under the new key.  This framework is designed around “generic” data rather than explicitly declared Java fields and reflection.  If you want to manage indices, and your classes have fields that get serialized and serve as your index keys, and you have setters for those fields, then those setters need to fire notification events and your indices need to be registered as listeners.
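The listener pattern described above can be sketched roughly as follows.  This is a minimal illustration, not the actual API of the framework in [1]; the class names (PropertyEvent, IndexListener, SecondaryIndex, Person) are all hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of event-driven index maintenance; names are
// illustrative, not the actual framework API from [1].
class PropertyEvent {
    final Object oldKey;
    final Object newKey;
    final Object source;
    PropertyEvent(Object oldKey, Object newKey, Object source) {
        this.oldKey = oldKey;
        this.newKey = newKey;
        this.source = source;
    }
}

interface IndexListener {
    void propertyChanged(PropertyEvent e);
}

// A secondary index keeps itself in sync by listening for events:
// remove the entry under the old key, insert it under the new key.
class SecondaryIndex implements IndexListener {
    private final Map<Object, Object> entries = new HashMap<>();
    public void propertyChanged(PropertyEvent e) {
        if (e.oldKey != null) entries.remove(e.oldKey);
        if (e.newKey != null) entries.put(e.newKey, e.source);
    }
    Object lookup(Object key) {
        return entries.get(key);
    }
}

// A record whose setter fires the notification event to registered listeners.
class Person {
    private String name;
    private final List<IndexListener> listeners = new ArrayList<>();
    void addListener(IndexListener l) { listeners.add(l); }
    void setName(String newName) {
        String oldName = this.name;
        this.name = newName;
        PropertyEvent e = new PropertyEvent(oldName, newName, this);
        for (IndexListener l : listeners) l.propertyChanged(e);
    }
}
```

The point of the design is that the index never has to rediscover the old key: the setter still has both values in hand at the moment of the change, so it can hand them to every listener.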






From: [] On Behalf Of Kevin Day
Sent: Monday, April 10, 2006 9:46 PM
To: JDBM Developer listserv
Subject: [Jdbm-developer] primary and secondary indexes in the record manager


One thing that I've been thinking about in a background thread:


We've discussed the possibility of making indexes a first-class citizen of jdbm (instead of storing a BTree in the record manager itself, on the data pages).  This has lots of advantages from a concurrency perspective (we can use B-link trees).  It also creates a programming paradigm that is much closer to how people are used to working with databases.


This is where secondary indexes come in.


My experience so far with implementing multi-index constructs in jdbm has been that keeping the indexes in sync is an absolute bugger.  I've had to resort to special serializers that get the serialized form of the keys used by a given record when that record is initially retrieved, then use those keys to remove and re-insert values during any update of the record.


The problem with this is that you wind up having to deserialize the full record, then turn around and serialize keys for each index that record is a part of.  And you have to do it for every single read (it isn't possible to determine the old index key values during the update itself, because by that point the object has already been changed).
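The read-time key capture described above could look something like this.  All of the names here (Record, IndexedStore, and the single email index) are made up for illustration; the real serializer-based version operates on serialized key forms rather than an in-memory map:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: every fetch records the index key of the
// deserialized record, so a later update can remove the stale index
// entry before re-inserting under the new key.
class Record {
    String email;
    Record(String email) { this.email = email; }
}

class IndexedStore {
    private final Map<Long, Record> records = new HashMap<>();    // stands in for the record manager
    private final Map<String, Long> emailIndex = new TreeMap<>(); // secondary index: email -> recid
    private final Map<Long, String> keysSeenAtRead = new HashMap<>(); // keys captured on fetch

    void insert(long recid, Record r) {
        records.put(recid, r);
        emailIndex.put(r.email, recid);
        keysSeenAtRead.put(recid, r.email);
    }

    Record fetch(long recid) {
        Record r = records.get(recid);
        // This is the cost being described: the key must be captured
        // on every single read, whether or not an update ever follows.
        if (r != null) keysSeenAtRead.put(recid, r.email);
        return r;
    }

    void update(long recid, Record r) {
        String oldKey = keysSeenAtRead.get(recid);
        if (oldKey != null) emailIndex.remove(oldKey); // remove under the key seen at read time
        emailIndex.put(r.email, recid);                // insert under the new key
        records.put(recid, r);
        keysSeenAtRead.put(recid, r.email);
    }

    Long lookupByEmail(String email) {
        return emailIndex.get(email);
    }
}
```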


Another approach to this problem would be to maintain index metadata in a supplementary index, keyed by recid.  Whenever an update is performed, the supplementary index would be consulted by recid to find the page and node holding the object.  This would allow for efficient updates, but changes to the main index tree (like a re-balance operation) could require a significant number of changes in the supplementary index.


A compromise would be to store only the page where the node for a given record exists, then do a linear search for the recid within that page during updates.
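To make the compromise concrete, here is a rough sketch with simplified stand-in types (BTreePage, SupplementaryIndex, and the long[] entry pairs are all hypothetical): the supplementary index stores only a page id per recid, and an update resolves the exact entry with a linear scan of that one page.  Node splits within a page then cost nothing in the supplementary index; only moves between pages require updating it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for an index page holding {recid, physicalRowId} pairs.
class BTreePage {
    final List<long[]> entries = new ArrayList<>();
}

// Hypothetical supplementary index: recid -> page id only.
class SupplementaryIndex {
    private final Map<Long, Long> recidToPageId = new HashMap<>();

    void recordPlacement(long recid, long pageId) {
        recidToPageId.put(recid, pageId);
    }

    // One map lookup finds the page; a linear search within the page's
    // entries finds the record itself.
    long[] locate(long recid, Map<Long, BTreePage> pages) {
        Long pageId = recidToPageId.get(recid);
        if (pageId == null) return null;
        for (long[] entry : pages.get(pageId).entries) {
            if (entry[0] == recid) return entry;
        }
        return null;
    }
}
```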


I'm really not sure where to go with all of this, but I think it's appropriate to start talking about it.  If we have a supplementary index, then it may make sense to toss the physical row location into it and ditch the translation pages entirely.  Individual record lookups would require a tree search, but in 99% of my uses of jdbm this is the case already.  This is closer to the primary key index idea that Alex floated a while back.


I'm not at all saying this is the way to go, but I would like to begin getting a handle on how we can maintain indexes for objects stored in jdbm.


- K

_______________________________________________
Jdbm-developer mailing list