Kevin,

 

I am not sure what you mean.  I use indices in the GOM framework all the time.  The basic pattern is:

 

IGeneric g = om.makeGeneric(); // create a generic object.

ILinkSet set1 = g.getLinkSet("set1"); // a named scalable collection accessible from "g".

IGeneric x = om.makeGeneric(); x.setString("name","Bryan"); // create another object and set its 'name' property.

set1.add(x); // add members to the link set.

ILinkSetIndex ndx = set1.getIndex("name"); // obtain maintained index over the "name" property for the objects in set1.

Iterator itr = ndx.iterator(); // visit all members of "set1" in order by the value of their "name" property.

 

You have all the index access methods available on the ILinkSetIndex, e.g., get an iterator over a key range, get the number of keys in a range, etc.
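
As a rough illustration of what "iterator over a key range" and "number of keys in a range" mean, here is a minimal sketch that uses java.util.TreeMap as a stand-in for a maintained index. It does not use the ILinkSetIndex API; the class name KeyRangeSketch and the sample data are assumptions made up for this example.

```java
import java.util.Iterator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in: a TreeMap plays the role of an index over "name".
class KeyRangeSketch {

    // Build a small index: name -> record id (sample data, made up).
    static NavigableMap<String, Long> sample() {
        NavigableMap<String, Long> index = new TreeMap<>();
        index.put("Alex", 1L);
        index.put("Bryan", 2L);
        index.put("Kevin", 3L);
        index.put("Zed", 4L);
        return index;
    }

    // View of the entries whose key lies in the half-open range [from, to).
    static NavigableMap<String, Long> range(NavigableMap<String, Long> index,
                                            String from, String to) {
        return index.subMap(from, true, to, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> r = range(sample(), "B", "L");

        // Iterate the keys in the range, in key order.
        Iterator<String> itr = r.keySet().iterator();
        while (itr.hasNext()) {
            System.out.println(itr.next()); // Bryan, then Kevin
        }

        // Number of keys in the range.
        System.out.println(r.size()); // 2
    }
}
```

Note that TreeMap computes the size of a sub-map by walking its entries, so a persistent index that needs cheap range counts would have to keep extra bookkeeping.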

 

I just don’t need any more mechanism in jdbm than already exists to provide this index support.

 

-bryan

 


From: jdbm-developer-admin@lists.sourceforge.net [mailto:jdbm-developer-admin@lists.sourceforge.net] On Behalf Of Kevin Day
Sent: Friday, April 14, 2006 7:52 PM
To: JDBM Developer listserv
Subject: re[16]: [Jdbm-developer] primary and secondary indexes in the record manager

 

Bryan-

 

Bingo.  I suspect that many users will fall into this category.  I take it from your comments that you don't use indexes at all in your system (or, more accurately, the indexes are implicit in your data structures themselves).

 

This API must be backed by a solid system for automatically keeping indexes in sync with the state of the record manager, and I want to make sure that there aren't any design decisions that would prevent this from being successful.  By my thinking, the critical aspect of such a system is the ability to identify the index keys that are associated with any record.  In a field oriented system, this is trivial, but an object oriented system presents significant challenges...

 

I'm leaning towards a reverse index, keyed by recid...  each index key will have to be stored twice, but maybe that's ok...
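
A minimal sketch of the reverse-index idea, assuming in-memory maps stand in for the persistent structures. The names ReverseIndexSketch, update and lookup are hypothetical and are not part of jdbm; string keys and long recids are assumed for brevity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: in-memory maps stand in for persistent indexes.
class ReverseIndexSketch {
    // Forward index: index key -> recids of the records carrying that key.
    private final TreeMap<String, Set<Long>> forward = new TreeMap<>();
    // Reverse index: recid -> index keys currently stored for that record.
    private final Map<Long, Set<String>> reverse = new HashMap<>();

    // Called on insert or update with the keys extracted from the new state.
    void update(long recid, Set<String> newKeys) {
        Set<String> oldKeys = reverse.getOrDefault(recid, Collections.<String>emptySet());
        // The reverse index tells us which forward entries are now stale.
        for (String key : oldKeys) {
            if (!newKeys.contains(key)) {
                Set<Long> ids = forward.get(key);
                if (ids != null) {
                    ids.remove(recid);
                    if (ids.isEmpty()) {
                        forward.remove(key);
                    }
                }
            }
        }
        for (String key : newKeys) {
            forward.computeIfAbsent(key, k -> new TreeSet<>()).add(recid);
        }
        // Every key ends up stored twice: once in each index.
        reverse.put(recid, new HashSet<>(newKeys));
    }

    Set<Long> lookup(String key) {
        return forward.getOrDefault(key, Collections.<Long>emptySet());
    }

    public static void main(String[] args) {
        ReverseIndexSketch idx = new ReverseIndexSketch();
        idx.update(42L, Collections.singleton("Bryan"));
        idx.update(42L, Collections.singleton("Kevin")); // key changed
        System.out.println(idx.lookup("Bryan")); // []
        System.out.println(idx.lookup("Kevin")); // [42]
    }
}
```

The point of the second map is that update() never has to look at the old object to find the stale keys; the cost is the duplicated key storage.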

 

- K

 

 

 

 

>
Kevin,
 
The existing JDBM indexing support perfectly fits my needs, except where it is a little feature sparse.  The missing features are support for concurrency, support for traversal under concurrent modification, and efficient tracking of the number of keys in a key range.
 
I think that all we need to do is create a simple API for people who want ISAM style access [1] to their data.
 
-bryan
 
[1] http://en.wikipedia.org/wiki/ISAM
 


From: jdbm-developer-admin@lists.sourceforge.net [mailto:jdbm-developer-admin@lists.sourceforge.net] On Behalf Of Kevin Day
Sent: Thursday, April 13, 2006 8:03 PM
To: JDBM Developer listserv
Subject: re[14]: [Jdbm-developer] primary and secondary indexes in the record manager
 

Bryan-

 

Right - I view the work you are doing as something that should be layered on top of an effective indexing sub-system, and only involved in automatically firing the recman.update() call in response to changes in the objects.  Having a sub-system devoted to index management greatly simplifies this.  With an indexing sub-system, there is a strong separation of concerns between what actually defines an object change (events), and what the effect of that change should actually be (consistency checking).

 

If one of my projects requires complete transparency, then I'll probably use something like BCE (probably via AspectJ) - but the majority of my work is in web and batched transactions, where the interaction with the datastore tends to be transactional in nature - you get stuff out of the database, you do stuff with it, then you tell the database that things have changed.  Lots of really tiny iterations.

 

Anyway, in traditional RDBMSes, and in the vast majority of OODBMSes, indexing is part of the underlying feature set - and I suspect that there is a large, untapped set of users who would love a nice embeddable OODBMS, but the simple key/value pairing currently available in jdbm just doesn't meet their needs.

 

As for whether indexing could ever be done correctly by just tacking it on, I'm not so sure - there is a reason that production database solutions provide built in indexing capability - users want it :-)  I also suspect that there are important design and architectural reasons, and that is why I'm being so insistent on at least figuring out *how* we can properly do this - doing it correctly may impact architectural design decisions.  I'd rather come up with a workable plan and determine that it won't impact the architecture than find out that we can't do it correctly 6 months from now because of decisions we made today.

 

We'll pick this up next week, I'm sure :-)

 

- K

 

 

  

 
>
Hum.  I can see that, and I imagine that some people are looking for this.  And now I understand why we appear to be worlds apart on this issue.  I never use jdbm without a weak reference cache that canonicalizes references and I never use it as a cursor system, but as coherent persistence of data and index state which is synchronized with data state.  Conceptually, I am more inclined to either generic persistent objects or native persistence of java objects using base classes, event notification, or reflection and BCE to make it work.
 
Perhaps it makes sense to factor out a jdbm2 package for APIs supporting a cursor-oriented system?  This is not really relevant to any of my interests in jdbm2 (concurrency, fast transaction processing, long transaction processing, extensible framework), but I could see a fairly trivial framework layer which provided suitable APIs for cursor oriented applications.
 
-bryan
 


From: jdbm-developer-admin@lists.sourceforge.net [mailto:jdbm-developer-admin@lists.sourceforge.net] On Behalf Of Kevin Day
Sent: Wednesday, April 12, 2006 5:06 PM
To: JDBM Developer listserv
Subject: re[12]: [Jdbm-developer] primary and secondary indexes in the record manager
 

Bryan-

 

Sure - it's no different than calling fetch() in the general case with no object caching.  With no object cache, a call to fetch() returns an object that does not reflect changes that have not yet been passed to update().

 

This is actually very consistent with cursor access to most database systems.  If you make changes to a record and run a query prior to initiating an update, the changes are not reflected in the search results.

 

I'd much rather have that behavior than have to include database code in my classes.  If I absolutely needed the index changes to be reflected regardless of whether update was called, then I'd be looking at something like AspectJ to inject index update events (and I'd have the woven code calling update).

 

- K

  

 
>Kevin,

If you do not update the indices immediately within the scope of the tx in
which the attribute was modified then the indices become stale with respect
to the runtime data.  Is that acceptable?

-bryan


-----Original Message-----
From: jdbm-developer-admin@lists.sourceforge.net on behalf of Kevin Day
Sent: Wed 4/12/2006 2:33 PM
To: JDBM Developer listserv
Subject: re[10]: [Jdbm-developer] primary and secondary indexes in the
record  manager

Bryan-

My approach to the indices question is that the indexes are part of the
'updated' state of the database.  If an update call hasn't been made, then
the changes caused by that update call are not reflected in the index.  This
is consistent with cursor based system behavior.

Forcing a user to modify their core code is certainly the easiest thing to
do - but it just doesn't fly in many, many development scenarios - it
severely violates IOC, makes testing much harder, etc...  EJBs went down
this path, and there is a massive developer backlash against it right now.

If immediate index update notification is required, then using aspects and
configuring a set of method calls that result in index updates could do it -
but if we go down that path, then I would challenge the use of the update()
call being part of the recman api at all.  I think that having changes be
marked by an update() call is a reasonable trade-off in the design of the
recman, and having that be the trigger to update indexes is probably also
appropriate.
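
A minimal sketch of update() acting as the trigger for index maintenance, assuming an in-memory stand-in for the record manager. UpdateTriggeredIndexSketch, its methods and the Person class are hypothetical names for illustration, not the jdbm recman API. The key extractor is supplied from outside, so the user class carries no database code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical stand-in for a record manager with one maintained index.
class UpdateTriggeredIndexSketch<T> {

    // Plain user class: no persistence or index code inside it.
    static class Person {
        String name;
        Person(String name) { this.name = name; }
    }

    private final Map<Long, String> keyAsOfLastUpdate = new HashMap<>();
    private final TreeMap<String, Long> index = new TreeMap<>();
    private final Function<T, String> keyExtractor;
    private long nextRecid = 1;

    UpdateTriggeredIndexSketch(Function<T, String> keyExtractor) {
        this.keyExtractor = keyExtractor;
    }

    long insert(T obj) {
        long recid = nextRecid++;
        update(recid, obj);
        return recid;
    }

    // The only place where the index changes (unique keys assumed).
    void update(long recid, T obj) {
        String oldKey = keyAsOfLastUpdate.get(recid);
        if (oldKey != null) {
            index.remove(oldKey);
        }
        String newKey = keyExtractor.apply(obj);
        index.put(newKey, recid);
        keyAsOfLastUpdate.put(recid, newKey);
    }

    Long find(String key) {
        return index.get(key);
    }

    public static void main(String[] args) {
        UpdateTriggeredIndexSketch<Person> rm =
                new UpdateTriggeredIndexSketch<>(p -> p.name);
        Person p = new Person("Bryan");
        long recid = rm.insert(p);

        p.name = "Kevin";                      // changed, but update() not yet called
        System.out.println(rm.find("Kevin"));  // null
        System.out.println(rm.find("Bryan"));  // 1

        rm.update(recid, p);                   // now the index catches up
        System.out.println(rm.find("Kevin"));  // 1
    }
}
```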

- K
 
    >Kevin,

In terms of the public APIs for jdbm, I agree that they should offer
simplicity for the "common man" who just needs persistence and indices.  The
current API is well suited to this, and I would be inclined to make it even
more bare bones.  The real trick here is going to be devising the central
APIs so that we partition the different areas of concern in a flexible and
effective manner.  Creating a concrete class encompassing a good general
solution for access to persistent records and indices will not be hard, but
some applications will want to get under the hood and I want to support that
too.

With regard to the question of "aware" indices, the case that is the hardest
to support is when a user modifies an indexed field on a persistent object.
Unless you require property event notification for indexable attributes,
this is always going to go unnoticed and that will result in incoherent
indices.  What about requiring people to implement the appropriate property
change event mechanism?  That way you get the old and new property value in
the event.  That is all you really need.
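
A minimal sketch of the property change approach, using the standard java.beans.PropertyChangeSupport class. The Person class, the PropertyEventIndexSketch class and the in-memory TreeMap index are assumptions for illustration only; unique, non-null names are assumed to keep it short.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.TreeMap;

// Hypothetical sketch: an index kept coherent by property change events.
class PropertyEventIndexSketch {

    // Hypothetical persistent class that publishes changes to "name".
    static class Person {
        private final PropertyChangeSupport events = new PropertyChangeSupport(this);
        private String name;

        Person(String name) { this.name = name; }

        String getName() { return name; }

        void setName(String newName) {
            String oldName = this.name;
            this.name = newName;
            // The event carries both the old and the new value.
            events.firePropertyChange("name", oldName, newName);
        }

        void addPropertyChangeListener(PropertyChangeListener listener) {
            events.addPropertyChangeListener(listener);
        }
    }

    // Index over "name": key -> object.
    final TreeMap<String, Person> byName = new TreeMap<>();

    void add(Person p) {
        byName.put(p.getName(), p);
        // The index listens for changes and re-keys the entry immediately.
        p.addPropertyChangeListener(evt -> {
            if ("name".equals(evt.getPropertyName())) {
                byName.remove((String) evt.getOldValue());
                byName.put((String) evt.getNewValue(), p);
            }
        });
    }

    public static void main(String[] args) {
        PropertyEventIndexSketch index = new PropertyEventIndexSketch();
        Person p = new Person("Bryan");
        index.add(p);
        p.setName("Kevin");
        System.out.println(index.byName.keySet()); // [Kevin]
    }
}
```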

-bryan

________________________________________
From: jdbm-developer-admin@lists.sourceforge.net [mailto:jdbm-developer-admin@lists.sourceforge.net] On Behalf Of Kevin Day
Sent: Wednesday, April 12, 2006 1:35 PM
To: JDBM Developer listserv
Subject: re[8]: [Jdbm-developer] primary and secondary indexes in the record
manager

Bryan-

Let's take a crack at how to handle self-concurrency managed objects.  We
cannot use the regular recman api for working with these objects (BTree or
BPage), because the recman will cause the object to interact with the
standard concurrency control mechanism (MVCC coupled with either 2PL or
opportunistic).

From an architecture perspective, do we need to have the recman expose an
advanced api for allowing direct access to the logical record store
(bypassing the locking sub-system and version manager)?

My preference would be to keep the recman API for regular users as simple as
possible...




As for managing indexes, I certainly see ways of doing it (heck, I'm doing
it now in my own code) - the point of the discussion was to talk about what
options are available to us.  The options I currently see are:

1.  Require user to add behavior to their objects to support triggering of
index updates
2.  Capture the index key set of each record when it is fetched, and cache
it in the record cache
3.  Use a reverse index for mapping recids to the index key set
4.  Use a reverse index for mapping recids to the index page and slot
5.  During update, deserialize the source object (get an 'as stored' object
vs the 'as changed' object that resides in the cache at update time), then
capture the key index set based on the 'as stored' object.
6.  Require that the serialized form of the object include sufficient
information for extracting the index key set without having to deserialize
the entire object

I'm wondering if there are others.

#1 is most definitely not palatable - a tool like jdbm should not force
users to completely change their coding style or class hierarchies.
#2 works (it's what I'm doing now), but it is inefficient because we have to
deserialize the key set every time any record is fetched from the store.  It
also requires that a lot of data be kept in memory for records that probably
won't ever be updated.
#3 doubles the size of the indexes (the keys must be stored twice - once in
the index and once in the reverse index)
#4 requires a significant amount of cooperation between the index and the
reverse index - but would be more space efficient than #3
#5 will slow down inserts and updates (we basically have to deserialize the
stored byte stream before we can serialize the new version of the object),
but requires no additional space in the db, and does not impact fetch
performance
#6 requires the user to change their coding practices, but only in the
serializer, so it won't impact the core of the user's design.  This option
completely precludes the use of default java serialization.
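
To make option #5 concrete, here is a minimal sketch that assumes default Java serialization and in-memory maps standing in for the store and the index. AsStoredKeySketch, Person and update are hypothetical names, not jdbm classes. On update, the stored bytes are deserialized first to recover the key that the index currently holds.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of option #5: derive old keys from the 'as stored' bytes.
class AsStoredKeySketch {

    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        Person(String name) { this.name = name; }
    }

    private final Map<Long, byte[]> store = new HashMap<>();  // recid -> bytes
    final TreeMap<String, Long> nameIndex = new TreeMap<>();  // name -> recid

    private static byte[] serialize(Person p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static Person deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Person) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    void update(long recid, Person asChanged) {
        byte[] old = store.get(recid);
        if (old != null) {
            // Extra deserialization on every update: the cost of option #5.
            Person asStored = deserialize(old);
            nameIndex.remove(asStored.name);
        }
        nameIndex.put(asChanged.name, recid);
        store.put(recid, serialize(asChanged));
    }

    public static void main(String[] args) {
        AsStoredKeySketch db = new AsStoredKeySketch();
        Person p = new Person("Bryan");
        db.update(1L, p);
        p.name = "Kevin";      // cached object changes; stored bytes still say "Bryan"
        db.update(1L, p);
        System.out.println(db.nameIndex); // {Kevin=1}
    }
}
```

Nothing extra is kept in the store or in memory between calls, which matches the trade-off described for #5: slower updates in exchange for no additional space and no fetch-time overhead.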


I'm quite interested to see if there are any other options that you guys can
think of...


- K


 

>Kevin,

I would rather keep the existing features and promote the design to a b-link
tree.

I actually have a use case for generalized values for the btrees.  In order
to improve read performance over indexed statements some of the attributes
of the statement are redundantly persisted in the btree values.

In terms of an indexing and constraint system, I am fine with that as a
feature but I see it as layered on, separable and not critical path for a
jdbm2 initial release.  Plus I just don't see how you are going to get that
without one thing or another that you seem to find distasteful.

-bryan

<


<
    
------------------------------------------------------- This SF.Net email is
sponsored by xPML, a groundbreaking scripting language that extends
applications into web and mobile media. Attend the live webcast and join the
prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________ Jdbm-developer mailing list
Jdbm-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jdbm-developer


<
<
<

 
