From: Thompson, B. B. <BRY...@sa...> - 2006-04-14 13:15:23
|
Kevin,

The existing JDBM "indexing" support perfectly fits my needs, except where it is a little feature sparse. The missing features are support for concurrency, support for traversal under concurrent modification, and efficient tracking of the # of keys in a key range.

I think that all we need to do is create a simple API for people who want ISAM style access [1] to their data.

-bryan

[1] http://en.wikipedia.org/wiki/ISAM

_____
From: jdb...@li... [mailto:jdb...@li...] On Behalf Of Kevin Day
Sent: Thursday, April 13, 2006 8:03 PM
To: JDBM Developer listserv
Subject: re[14]: [Jdbm-developer] primary and secondary indexes in the record manager

Bryan-

Right - I view the work you are doing as something that should be layered on top of an effective indexing sub-system, and only involved in automatically firing the recman.update() call in response to changes in the objects. Having a sub-system devoted to index management greatly simplifies this. With an indexing sub-system, there is a strong separation of concerns between what actually defines an object change (events), and what the effect of that change should actually be (consistency checking).

If one of my projects requires complete transparency, then I'll probably use something like BCE (probably via AspectJ) - but the majority of my work is in web and batched transactions, where the interaction with the datastore tends to be transactional in nature - you get stuff out of the database, you do stuff with it, then you tell the database that things have changed. Lots of really tiny iterations.

Anyway, in traditional RDBMSes, and in the vast majority of OODBMSes, indexing is part of the underlying feature set - and I suspect that there is a large, untapped set of users who would love a nice embeddable OODBMS, but the simple key/value pairing currently available in jdbm just doesn't meet their needs.

As for whether indexing could ever be done correctly by just tacking it on, I'm not so sure - there is a reason that production database solutions provide built-in indexing capability - users want it :-) I also suspect that there are important design and architectural reasons, and that is why I'm being so insistent on at least figuring out *how* we can properly do this - doing it correctly may impact architectural design decisions. I'd rather come up with a workable plan and determine that it won't impact the architecture than find out that we can't do it correctly 6 months from now because of decisions we made today.

We'll pick this up next week, I'm sure :-)

- K

>
Hum. I can see that, and I imagine that some people are looking for this. And now I understand why we appear to be worlds apart on this issue. I never use jdbm without a weak reference cache that canonicalizes references and I never use it as a cursor system, but as coherent persistence of data and index state which is synchronized with data state. Conceptually, I am more inclined to either generic persistent objects or native persistence of Java objects using base classes, event notification, or reflection and BCE to make it work.

Perhaps it makes sense to factor out a jdbm2 package for APIs supporting a cursor-oriented system? This is not really relevant to any of my interests in jdbm2 (concurrency, fast transaction processing, long transaction processing, extensible framework), but I could see a fairly trivial framework layer which provided suitable APIs for cursor-oriented applications.

-bryan

From: jdb...@li... [mailto:jdb...@li...] On Behalf Of Kevin Day
Sent: Wednesday, April 12, 2006 5:06 PM
To: JDBM Developer listserv
Subject: re[12]: [Jdbm-developer] primary and secondary indexes in the record manager

Bryan-

Sure - it's no different than calling fetch() in the general case with no object caching.
With no object cache, a call to fetch() returns an object that does not reflect changes which have not yet been passed to update().

This is actually very consistent with cursor access to most database systems. If you make changes to a record and run a query prior to initiating an update, the changes are not reflected in the search results.

I'd much rather have that behavior than have to include database code in my classes. If I absolutely needed the index changes to be reflected regardless of whether update was called, then I'd be looking at something like AspectJ to inject index update events (and I'd have the woven code calling update).

- K

>
Kevin,

If you do not update the indices immediately within the scope of the tx in which the attribute was modified then the indices become stale with respect to the runtime data. Is that acceptable?

-bryan

-----Original Message-----
From: jdb...@li... on behalf of Kevin Day
Sent: Wed 4/12/2006 2:33 PM
To: JDBM Developer listserv
Subject: re[10]: [Jdbm-developer] primary and secondary indexes in the record manager

Bryan-

My approach to the index question is that the indexes are part of the 'updated' state of the database. If an update call hasn't been made, then the changes caused by that update call are not reflected in the index. This is consistent with cursor-based system behavior.

Forcing a user to modify their core code is certainly the easiest thing to do - but it just doesn't fly in many, many development scenarios - it severely violates IOC, makes testing much harder, etc... EJBs went down this path, and there is a massive developer backlash against it right now.

If immediate index update notification is required, then using aspects and configuring a set of method calls that result in index updates could do it - but if we go down that path, then I would challenge the use of the update() call being part of the recman api at all.
I think that having changes be marked by an update() call is a reasonable trade-off in the design of the recman, and having that be the trigger to update indexes is probably also appropriate.

- K

>
Kevin,

In terms of the public APIs for jdbm, I agree that they should offer simplicity for the "common man" who just needs persistence and indices. The current API is well suited to this, and I would be inclined to make it even more bare bones. The real trick here is going to be devising the central APIs so that we partition the different areas of concern in a flexible and effective manner. Creating a concrete class encompassing a good general solution for access to persistent records and indices will not be hard, but some applications will want to get under the hood and I want to support that too.

With regard to the question of "aware" indices, the case that is the hardest to support is when a user modifies an indexed field on a persistent object. Unless you require property event notification for indexable attributes, this is always going to go unnoticed and that will result in incoherent indices. What about requiring people to implement the appropriate property change event mechanism? That way you get the old and new property value in the event. That is all you really need.

-bryan

________________________________________
From: jdb...@li... [mailto:jdb...@li...] On Behalf Of Kevin Day
Sent: Wednesday, April 12, 2006 1:35 PM
To: JDBM Developer listserv
Subject: re[8]: [Jdbm-developer] primary and secondary indexes in the record manager

Bryan-

Let's take a crack at how to handle self-concurrency managed objects.
We cannot use the regular recman api for working with these objects (BTree or BPage), because the recman will cause the object to interact with the standard concurrency control mechanism (MVCC coupled with either 2PL or opportunistic).

From an architecture perspective, do we need to have the recman expose an advanced api for allowing direct access to the logical record store (bypassing the locking sub-system and version manager)?

My preference would be to keep the recman API for regular users as simple as possible...

As for managing indexes, I certainly see ways of doing it (heck, I'm doing it now in my own code) - the point of the discussion was to talk about what options are available to us. The options I currently see are:

1. Require user to add behavior to their objects to support triggering of index updates
2. Capture the index key set of each record when it is fetched, and cache it in the record cache
3. Use a reverse index for mapping recids to the index key set
4. Use a reverse index for mapping recids to the index page and slot
5. During update, deserialize the source object (get an 'as stored' object vs the 'as changed' object that resides in the cache at update time), then capture the key index set based on the 'as stored' object.
6. Require that the serialized form of the object include sufficient information for extracting the index key set without having to deserialize the entire object

I'm wondering if there are others.

#1 is most definitely not palatable - a tool like jdbm should not force users to completely change their coding style or class hierarchies.
#2 works (it's what I'm doing now), but it is inefficient because we have to deserialize the key set every time any record is fetched from the store.
It also requires that a lot of data be kept in memory for records that probably won't ever be updated.
#3 doubles the size of the indexes (the keys must be stored twice - once in the index and once in the reverse index)
#4 requires a significant amount of cooperation between the index and the reverse index - but would be more space efficient than #3
#5 will slow down inserts and updates (we basically have to deserialize the stored byte stream before we can serialize the new version of the object), but requires no additional space in the db, and does not impact fetch performance
#6 requires the user to change their coding practices, but only in the serializer, so it won't impact the core of the user's design. This option completely precludes the use of default Java serialization.

I'm quite interested to see if there are any other options that you guys can think of...

- K

>
Kevin,

I would rather keep the existing features and promote the design to a b-link tree.

I actually have a use case for generalized values for the btrees. In order to improve read performance over indexed statements some of the attributes of the statement are redundantly persisted in the btree values.

In terms of an indexing and constraint system, I am fine with that as a feature but I see it as layered on, separable and not critical path for a jdbm2 initial release. Plus I just don't see how you are going to get that without one thing or another that you seem to find distasteful.

-bryan

_______________________________________________
Jdbm-developer mailing list
Jdb...@li...
https://lists.sourceforge.net/lists/listinfo/jdbm-developer
|
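
The property change event mechanism suggested in the thread above (the event carries both the old and the new value of an indexed attribute) might look roughly like the following. This is only a sketch under stated assumptions: `Person`, `LastNameIndex` and the in-memory `TreeMap` are hypothetical stand-ins and not part of the jdbm API; only the `java.beans` classes are real. It also assumes index keys are never null.

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: index maintenance driven by property change events. */
final class PropertyEventIndexSketch {

    /** Hypothetical persistent class that announces changes to an indexed attribute. */
    static final class Person {
        private final PropertyChangeSupport events = new PropertyChangeSupport(this);
        final long recid;
        private String lastName;

        Person(long recid, String lastName) {
            this.recid = recid;
            this.lastName = lastName;
        }

        String getLastName() {
            return lastName;
        }

        void addPropertyChangeListener(PropertyChangeListener listener) {
            events.addPropertyChangeListener(listener);
        }

        void setLastName(String newName) {
            String oldName = this.lastName;
            this.lastName = newName;
            // The event carries the old and the new value of the attribute.
            events.firePropertyChange("lastName", oldName, newName);
        }
    }

    /** Hypothetical secondary index on lastName; a real one would live in a BTree. */
    static final class LastNameIndex implements PropertyChangeListener {
        private final TreeMap<String, Set<Long>> entries = new TreeMap<>();

        /** Indexes the object and subscribes to its change events. */
        void add(Person p) {
            entries.computeIfAbsent(p.getLastName(), k -> new HashSet<>()).add(p.recid);
            p.addPropertyChangeListener(this);
        }

        @Override
        public void propertyChange(PropertyChangeEvent e) {
            if (!"lastName".equals(e.getPropertyName())) {
                return;
            }
            long recid = ((Person) e.getSource()).recid;
            String oldKey = (String) e.getOldValue();
            String newKey = (String) e.getNewValue();
            // Remove the entry under the old key, then add one under the new key.
            Set<Long> oldIds = entries.get(oldKey);
            if (oldIds != null) {
                oldIds.remove(recid);
                if (oldIds.isEmpty()) {
                    entries.remove(oldKey);
                }
            }
            entries.computeIfAbsent(newKey, k -> new HashSet<>()).add(recid);
        }

        Set<Long> lookup(String lastName) {
            Set<Long> ids = entries.get(lastName);
            if (ids == null) {
                return Collections.emptySet();
            }
            return Collections.unmodifiableSet(ids);
        }
    }
}
```

The cost of this design is the one objected to in the thread: the persistent class itself has to carry the event plumbing, so the index stays coherent before update() is called only because user code was changed to cooperate.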
From: Kevin D. <ke...@tr...> - 2006-04-14 23:48:14
|
Bryan-

Bingo. I suspect that many users will fall into this category. I take it from your comments that you don't use indexes at all in your system (or, more accurately, the indexes are implicit in your data structures themselves).

This API must be backed by a solid system for automatically keeping indexes in sync with the state of the record manager, and I want to make sure that there aren't any design decisions that would prevent this from being successful. By my thinking, the critical aspect of such a system is the ability to identify the index keys that are associated with any record. In a field-oriented system, this is trivial, but an object-oriented system presents significant challenges...

I'm leaning towards a reverse index, keyed by recid... each index key will have to be stored twice, but maybe that's ok...

- K

> Kevin,
>
> The existing JDBM indexing support perfectly fits my needs, except where it is a little feature sparse. The missing features are support for concurrency, support for traversal under concurrent modification, and efficient tracking of the # of keys in a key range.
>
> I think that all we need to do is create a simple API for people who want ISAM style access [1] to their data.
>
> -bryan
>
> [1] http://en.wikipedia.org/wiki/ISAM
|
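
The reverse index keyed by recid that Kevin is leaning towards (option #3 in the list of options) could be sketched as follows. This is a minimal illustration under assumptions, not jdbm code: the class and method names are hypothetical, keys are plain strings, and both indexes are held in in-memory collections where a real implementation would presumably use persistent BTrees. The point it shows is that update() only needs the key set of the 'as changed' object, because the 'as stored' key set is read back from the reverse index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of a forward index kept in sync through a reverse index. */
final class ReverseIndexSketch {
    // Forward index: index key -> recids of the records carrying that key.
    private final TreeMap<String, Set<Long>> forward = new TreeMap<>();
    // Reverse index: recid -> keys under which the record is currently indexed.
    // This is the second copy of every key mentioned in the thread.
    private final Map<Long, Set<String>> reverse = new HashMap<>();

    /** Called on insert or update with the keys extracted from the 'as changed' object. */
    void update(long recid, Set<String> newKeys) {
        Set<String> oldKeys = reverse.getOrDefault(recid, Collections.emptySet());

        // Drop forward entries for keys the record no longer has.
        for (String key : oldKeys) {
            if (!newKeys.contains(key)) {
                Set<Long> ids = forward.get(key);
                ids.remove(recid);
                if (ids.isEmpty()) {
                    forward.remove(key);
                }
            }
        }
        // Add forward entries for keys the record has gained.
        for (String key : newKeys) {
            if (!oldKeys.contains(key)) {
                forward.computeIfAbsent(key, k -> new TreeSet<>()).add(recid);
            }
        }
        // Record the new key set so the next update can compute its own diff.
        if (newKeys.isEmpty()) {
            reverse.remove(recid);
        } else {
            reverse.put(recid, new HashSet<>(newKeys));
        }
    }

    /** Removes the record from every index it appears in. */
    void delete(long recid) {
        update(recid, Collections.emptySet());
    }

    /** Returns the recids indexed under the given key. */
    Set<Long> lookup(String key) {
        Set<Long> ids = forward.get(key);
        if (ids == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(ids);
    }
}
```

Neither the old object nor its serialized form is needed at update time, which is what separates this from options #2 and #5; the price is the doubled key storage.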