From: Kevin D. <ke...@tr...> - 2008-11-12 16:08:46
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE> <META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD> <BODY leftMargin=1 topMargin=1 rightMargin=1><FONT face=Arial size=2> <DIV>I'd be happy to review the patch, yes. Please definitely run the full unit test suite (although I'm not sure we have a good test for hitting the record manager from multiple threads - Alex - are you aware of any such test?).</DIV> <DIV> </DIV> <DIV> </DIV> <DIV>FYI - my original thinking was that this would change the API semantics b/c in order to remove the contention, you would have to remove the synchronized keyword on the method calls, thus forcing users to expliclty lock/unlock the ReadWriteLock (this would be required for achieving better transactional control). But thinking about it, I suspect you are proposing managinge the ReadWriteLock from inside the record manager calls themselves (and not exposing the lock object to the user).</DIV> <DIV> </DIV> <DIV>You will still need to remove the synchronized keyword, but now that I'm thinking it through, I can see how that would maintain API (if not strict functional) compatability.</DIV> <DIV> </DIV> <DIV>I do still have some concerns about this change causing race conditions in external code to surface (i.e. where the synchronized methds for reads and writes were preventing an existing race condition from actually occuring). This isn't a reason to not make the change - just something to be aware of.</DIV> <DIV> </DIV> <DIV>FYI - I still think you are going to run into contention on the BTree data structures. If you do decide to get into using readwrite locks for that structure, be sure you check with Bryan or myself - the BTree structure has an inherent hierarchical lock structure, and you'll want to take advantage of that instead of just implementing the locks at the top level of the tree.</DIV> <DIV> </DIV> <DIV>- K</DIV> <DIV> </DIV> <DIV>Kevin Day<BR>Trumpet, Inc.<BR><BR></DIV> <DIV> </DIV></FONT> <DIV style="FONT-SIZE: x-small; FONT-FAMILY: Tahoma"> <DIV>----------------------- <B>Original Message</B> -----------------------</DIV> <DIV> </DIV> <DIV><B>From:</B> "Elias Ross" <A href="mailto:ge...@no..."><FONT color=#0000ff><ge...@no...></FONT></A></DIV> <DIV><B>To:</B> "Kevin Day" <A href="mailto:ke...@tr..."><FONT color=#0000ff><ke...@tr...></FONT></A></DIV> <DIV><B>Cc:</B> </DIV> <DIV><B>Date:</B> Tue, 11 Nov 2008 23:24:42 -0800</DIV> <DIV><B>Subject: <U>Re: [Jdbm-developer] "Fine grained" Read/Write locking</U></B></DIV> <DIV> </DIV></DIV><FONT face=Tahoma size=2> <DIV>Not sure how it would affect the API, unless users synchronize on<BR>RecordManager externally, e.g. for record iteration purpose.<BR><BR>I'll write a program and give it a try and let you know. Would you at<BR>least have time to apply my patch?<BR><BR>On Tue, Nov 11, 2008 at 1:58 PM, Kevin Day <A href="mailto:ke...@tr..."><FONT color=#0000ff><ke...@tr...></FONT></A> wrote:<BR>> Possibly... I was actually thinking about this the other day (though I'm<BR>> out of time to work on it). The problem is that the synchronized calls on<BR>> recordmanager are there to protect the integrity of the cache and disk<BR>> files. They do not protect the integrity of the data structures<BR>> themselves. I suspect that the bulk of lock contention in jdbm is occuring<BR>> at the data structure level (reads and writes on the BTrees for example)<BR>> than at the record level. Profiling, of course, could prove me wrong. <BR>><BR>> Note that this would probably impact the API, so it would have to be very<BR>> carefully thought through.<BR>><BR>> - K<BR>><BR>> Kevin Day<BR>><BR>> ----------------------- Original Message -----------------------<BR>><BR>> From: "Elias Ross" <A href="mailto:ge...@no..."><FONT color=#0000ff><ge...@no...></FONT></A><BR>> To: <A href="mailto:jdb...@li..."><FONT color=#0000ff>jdb...@li...</FONT></A><BR>> Cc:<BR>> Date: Fri, 7 Nov 2008 13:01:33 -0800<BR>> Subject: [Jdbm-developer] "Fine grained" Read/Write locking<BR>><BR>> Would the project be interested in a contribution where the<BR>> "synchronized" blocks were replaced with more fine-grained locking,<BR>> i.e. java.util.concurrent.ReentrantReadWriteLock ?<BR>><BR>> It'd make JDBM JDK 1.5 only.<BR>><BR>> It would probably help performance quite a lot for the common many-rea ders<BR>> case.<BR>><BR>> -------------------------------------------------------------------------<BR>> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge<BR>> Build the coolest Linux based applications with Moblin SDK & win great<BR>> prizes<BR>> Grand prize is a trip for two to an Open Source event anywhere in the world<BR>> <A href="http://moblin-contest.org/redirect.php?banner_id=100&url=/"><FONT color=#0000ff>http://moblin-contest.org/redirect.php?banner_id=100&url=/</FONT></A><BR>> _______________________________________________<BR>> Jdbm-developer mailing list<BR>> <A href="mailto:Jdbm-developer@lists.s"><FONT color=#0000ff>Jdbm-developer@lists.s</FONT></A> ourceforge.net<BR>> <A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff>https://lists.sourceforge.net/lists/listinfo/jdbm-developer</FONT></A><BR>><BR>><BR>> --<BR>> This message has been sca nned for viruses and<BR>> dangerous content by MailScanner, and is<BR>> believed to be clean.<BR><BR></DIV></FONT></BODY></HTML> |
From: Elias R. <ge...@no...> - 2008-11-12 19:08:07
|
On Wed, Nov 12, 2008 at 8:08 AM, Kevin Day <ke...@tr...> wrote: > I'd be happy to review the patch, yes. Please definitely run the full unit > test suite (although I'm not sure we have a good test for hitting the record > manager from multiple threads - Alex - are you aware of any such test?). I wrote a test class to try out testing from multiple threads and at least I can use it as a basis for performance analysis. It's not a very good test at this point, to be honest, but tests multiple readers and writers firing off simultaneously. There's at least one "big win" performance fix I found: Have the serialization step work without a lock, and then only lock when accessing the storage APIs. Something I soon discovered, though, was read-write locks don't really work since read operations require as much state change as writes. In terms of cache handling, there's a lot of state that gets updated for every read. Thus, ultimately every class involved needs fine grained, independent locking to improve concurrency. Which requires locking for internal data structures, which is easy since you simply write "synchronized" wrappers for these. But in the aggregate classes themselves require a bit of thought on how to make certain state changes atomic. I'm not 100% sure I understand all of these atomic state change invariants. > I do still have some concerns about this change causing race conditions in > external code to surface (i.e. where the synchronized methds for reads and > writes were preventing an existing race condition from actually occuring). > This isn't a reason to not make the change - just something to be aware of. Agreed, and this is a compatibility concern. Still, improving concurrency is something that needs to be addressed at some point. What comes to mind as a probable concern is there will need to be some way to "lock" when iterating the tree to prevent users from writing new data in the middle of iteration. Or this has to be solved in some clever way. > FYI - I still think you are going to run into contention on the BTree data > structures. If you do decide to get into using readwrite locks for that > structure, be sure you check with Bryan or myself - the BTree structure has > an inherent hierarchical lock structure, and you'll want to take advantage > of that instead of just implementing the locks at the top level of the tree. I've been concentrating on RecordManager at the moment, but I will get to BTree eventually. If there's a way to store my changes, say on a branch, I'd be happy to do so. |
From: Kevin D. <ke...@tr...> - 2008-11-12 20:41:37
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE> <META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD> <BODY leftMargin=1 topMargin=1 rightMargin=1><FONT face=Arial size=2> <DIV>I like pulling the serialization out of the lock block (this assumes, of course, that the serializers aren't stateful - most aren't - but some are, especially Bryan's enhanced serialization system).</DIV> <DIV> </DIV> <DIV>Yeah - cache does result in state change on reads... The current architecture really wasn't designed with concurrency in mind. The theoretical work we've done so far on this has focused on what it would take to add transactional behavior to the system, and use transactions as the mechanism for relieving contention. Your approach is a nice intermediate solution - if it can be made to work without substantially altering the architecture.</DIV> <DIV> </DIV> <DIV> </DIV> <DIV>As far as contention during BTree/data structure navigation is concerned, I think that the most concurrent solution is going to be to make the tree itself aware of the cursors that are navigating it, and to actually update the state of the cursors (TupleBrowser) as the tree state changes. This is not trivial, but is going to be very important if we are going to allow reads and writes to occur simultaneously. That said, it might be sufficient to protect the tree using a readwritelock to allow multiple read traversals but only a single write. The cursor could obtain and release the read lock - you just run into the problem of telling users that they have to do an explicit release on the cursor. You also have the challenge of how to handle when the thread wants to make a change during it's iteration of the tree (you still have to update the cursor location when tree state changes, even with the readwritelock in place!).<BR></DIV> <DIV>Hard problems, eh?</DIV> <DIV> </DIV> <DIV>- K</DIV> <DIV> </DIV></FONT> <DIV style="FONT-SIZE: x-small; FONT-FAMILY: Tahoma"> <DIV>----------------------- <B>Original Message</B> -----------------------</DIV> <DIV> </DIV> <DIV><B>From:</B> "Elias Ross" <A href="mailto:ge...@no..."><FONT color=#0000ff><ge...@no...></FONT></A></DIV> <DIV><B>To:</B> "Kevin Day" <A href="mailto:ke...@tr..."><FONT color=#0000ff><ke...@tr...></FONT></A></DIV> <DIV><B>Cc:</B> "JDBM Developer listserv" <A href="mailto:jdb...@li..."><FONT color=#0000ff><jdb...@li...></FONT></A></DIV> <DIV><B>Date:</B> Wed, 12 Nov 2008 11:08:04 -0800</DIV> <DIV><B>Subject: <U>Re: [Jdbm-developer] "Fine grained" Read/Write locking</U></B></DIV> <DIV> </DIV></DIV><FONT face=Tahoma size=2> <DIV>On Wed, Nov 12, 2008 at 8:08 AM, Kevin Day <A href="mailto:ke...@tr..."><FONT color=#0000ff><ke...@tr...></FONT></A> wrote:<BR><BR>> I'd be happy to review the patch, yes. Please definitely run the full unit<BR>> test suite (although I'm not sure we have a good test for hitting the record<BR>> manager from multiple threads - Alex - are you aware of any such test?).<BR><BR>I wrote a test class to try out testing from multiple threads and at<BR>least I can use it as a basis for performance analysis. It's not a<BR>very good test at this point, to be honest, but tests multiple readers<BR>and writers firing off simultaneously.<BR><BR>There's at least one "big win" performance fix I found: Have the<BR>serialization step work without a lock, and then only lock when<BR>accessing the storage APIs.<BR><BR>Something I soon discovered, though, was read-write locks don't really<BR>work since read operations require as much state change as writes . In<BR>terms of cache handling, there's a lot of state that gets updated for<BR>every read.<BR><BR>Thus, ultimately every class involved needs fine grained, independent<BR>locking to improve concurrency. Which requires locking for internal<BR>data structures, which is easy since you simply write "synchronized"<BR>wrappers for these. But in the aggregate classes themselves require a<BR>bit of thought on how to make certain state changes atomic. I'm not<BR>100% sure I understand all of these atomic state change invariants.<BR><BR>> I do still have some concerns about this change causing race conditions in<BR>> external code to surface (i.e. where the synchronized methds for reads and<BR>> writes were preventing an existing race condition from actually occuring).<BR>> This isn't a reason to not make the change - just something to be aware of.<BR><BR>Agreed, and this is a compatibility concern. Still, improving<BR>concurrency is something that needs to be addressed at some point.<BR><BR>What comes to mind as a probable concern is there will need to be some<BR>way to "lock" when iterating the tree to prevent users from writing<BR>new data in the middle of iteration. Or this has to be solved in some<BR>clever way.<BR><BR>> FYI - I still think you are going to run into contention on the BTree data<BR>> structures. If you do decide to get into using readwrite locks for that<BR>> structure, be sure you check with Bryan or myself - the BTree structure has<BR>> an inherent hierarchical lock structure, and you'll want to take advantage<BR>> of that instead of just implementing the locks at the top level of the tree.<BR><BR>I've been concentrating on RecordManager at the moment, but I will get<BR>to BTree eventually.<BR><BR>If there's a way to store my changes, say on a branch, I'd be happy to do so.<BR><BR></DIV></FONT></BODY></HTML> |
From: Elias R. <ge...@no...> - 2008-11-14 21:41:38
|
Re: https://sourceforge.net/tracker/index.php?func=detail&aid=2281809&group_id=4155&atid=304155 I updated the concurrent patch with a small fix to RecMan and changed BTree to use read-write locks and added an explicit close() method to the Browser to release the read lock. This is a significant API and compatibility change but probably one that needs to happen at some point. I see this exception occasionally: java.lang.IllegalArgumentException: Key not found: 16 at jdbm.btree.BPage.remove(BPage.java:548) at jdbm.btree.BPage.remove(BPage.java:561) at jdbm.btree.BTree.remove0(BTree.java:436) at jdbm.btree.BTree.remove(BTree.java:418) at jdbm.btree.ConcurrentTest.delete(ConcurrentTest.java:125) at jdbm.btree.ConcurrentTest.remove(ConcurrentTest.java:167) But this happens with the current release of JDBM as well, so I don't suppose it's a regression. I haven't measured any improvement with lots of concurrent read-write but I imagine concurrent reads are faster since they won't block each other exclusively. This is probably the most important area of improvement I made. To improve performance of mutations, some some of optimistic concurrency could be used, e.g. the appropriate BPage to modify is first found under a read lock, the version is noted. The read lock is released, then a write lock is obtained for that BPage, and then a version check is made if to see if the BPage was since modified. Then the modification occurs and the lock is released. |
From: Kevin D. <ke...@tr...> - 2008-11-15 00:34:13
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE> <META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD> <BODY leftMargin=1 topMargin=1 rightMargin=1><FONT face=Arial size=2> <DIV>Just a guess: If you are seeing the Key not found exception, are you maybe accessing the BTree in a non-thread safe way in your test (i.e. you have a list of keys that you are working with, you remove a key in one thread, but do not update the list of keys that you are working with before another thread uses the list).</DIV> <DIV> </DIV> <DIV>I guess the question is: Who removed key 16 from the tree, and did they do it without the knowledge of whatever thread is also doing another remove of the key 16 from the tree? Chances are pretty good that this is a test problem and not a BTree implementation issue (especially b/c you were seeing the problem even with the synchronized blocks in place).</DIV> <DIV> </DIV> <DIV> </DIV> <DIV>I think that it's going to be very important to demonstrate the performance advantages of these changes before we roll them into HEAD - especially with the case of the tuple browser (the current API does not forbid anyone from holding onto a tuple browser for a long period of time - if they do so, the browser may not be valid anymore - but it wouldn't lock the tree).</DIV> <DIV> </DIV> <DIV>If you can come up with some performance tests that we can use to show before and after behavior, I can get the test added to SVN, we can do some performance measurements with the current HEAD build, then perhaps create a branch and get performance metrics with the patch. Sound good?</DIV> <DIV> </DIV> <DIV>- K</DIV> <DIV> </DIV> <DIV><BR> </DIV> <DIV> </DIV></FONT> <DIV style="FONT-SIZE: x-small; FONT-FAMILY: Tahoma"> <DIV>----------------------- <B>Original Message</B> -----------------------</DIV> <DIV> </DIV> <DIV><B>From:</B> "Elias Ross" <A href="mailto:ge...@no..."><FONT color=#0000ff><ge...@no...></FONT></A></DIV> <DIV><B>To:</B> "Kevin Day" <A href="mailto:ke...@tr..."><FONT color=#0000ff><ke...@tr...>,</FONT></A> "JDBM Developer listserv" <A href="mailto:jdb...@li..."><FONT color=#0000ff><jdb...@li...></FONT></A></DIV> <DIV><B>Cc:</B> </DIV> <DIV><B>Date:</B> Fri, 14 Nov 2008 13:39:34 -0800</DIV> <DIV><B>Subject: <U>Re: [Jdbm-developer] "Fine grained" Read/Write locking</U></B></DIV> <DIV> </DIV></DIV><FONT face=Tahoma size=2> <DIV>Re: <A href="https://sourceforge.net/tracker/index.php?func=detail&aid=2281809&group_id=4155&atid=304155"><FONT color=#0000ff>https://sourceforge.net/tracker/index.php?func=detail&aid=2281809&group_id=4155&atid=304155</FONT></A><BR><BR>I updated the concurrent patch with a small fix to RecMan and changed<BR>BTree to use read-write locks and added an explicit close() method to<BR>the Browser to release the read lock. This is a significant API and<BR>compatibility change but probably one that needs to happen at some<BR>point.<BR><BR>I see this exception occasionally:<BR>java.lang.IllegalArgumentException: Key not found: 16<BR> at jdbm.btree.BPage.remove(BPage.java:548)<BR> at jdbm.btree.BPage.remove(BPage.java:561)<BR> at jdbm.btree.BTree.remove0(BTree.java:436)<BR> at jdbm.btree.BTree.remove(BTree.java:418)<BR> at jdbm.btree.Concurrent Test.delete(ConcurrentTest.java:125)<BR> at jdbm.btree.ConcurrentTest.remove(ConcurrentTest.java:167)<BR><BR>But this happens with the current release of JDBM as well, so I don't<BR>suppose it's a regression.<BR><BR>I haven't measured any improvement with lots of concurrent read-write<BR>but I imagine concurrent reads are faster since they won't block each<BR>other exclusively. This is probably the most important area of<BR>improvement I made.<BR><BR>To improve performance of mutations, some some of optimistic<BR>concurrency could be used, e.g. the appropriate BPage to modify is<BR>first found under a read lock, the version is noted. The read lock is<BR>released, then a write lock is obtained for that BPage, and then a<BR>version check is made if to see if the BPage was since modified. Then<BR>the modification occurs and the lock is released.<BR><BR>-------------------------------------------------------------------------<BR>This SF.Net email is sponso red by the Moblin Your Move Developer's challenge<BR>Build the coolest Linux based applications with Moblin SDK & win great prizes<BR>Grand prize is a trip for two to an Open Source event anywhere in the world<BR><A href="http://moblin-contest.org/redirect.php?banner_id=100&url=/"><FONT color=#0000ff>http://moblin-contest.org/redirect.php?banner_id=100&url=/</FONT></A><BR>_______________________________________________<BR>Jdbm-developer mailing list<BR><A href="mailto:Jdb...@li..."><FONT color=#0000ff>Jdb...@li...</FONT></A><BR><A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff>https://lists.sourceforge.net/lists/listinfo/jdbm-developer</FONT></A><BR><BR></DIV></FONT></BODY></HTML> |
From: Elias R. <ge...@no...> - 2008-11-15 05:26:53
|
On Fri, Nov 14, 2008 at 4:34 PM, Kevin Day <ke...@tr...> wrote: > I think that it's going to be very important to demonstrate the performance > advantages of these changes before we roll them into HEAD - especially with > the case of the tuple browser (the current API does not forbid anyone from > holding onto a tuple browser for a long period of time - if they do so, the > browser may not be valid anymore - but it wouldn't lock the tree). > I added a test that does purely reads. It runs in about 19 seconds with my changes, 24 seconds without, so about 25% faster. This is on a dual core system. Admittedly, there seems to be a bug that occurs occasionally, once in 20000-30000 reads. I still need to look at that. But there you go. >From an outsider's perspective, JDBM is a project with no significant development, with a single 1.0 release that's 3 years old. So I don't really feel like begging for inclusion of my changes. |
From: Kevin D. <ke...@tr...> - 2008-11-18 22:51:56
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE> <META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD> <BODY leftMargin=1 topMargin=1 rightMargin=1><FONT face=Arial size=2> <DIV> <DIV>Right now, I haven't heard from the rest of the developers with their thoughts on this approach to locking the record manager and/or BTree. Hopefully we can kick some dust loose and get some thoughts and feedback.</DIV> <DIV> </DIV> <DIV>My opinion on this is that I like the idea of using the locks - a 25% improvement in a primarily read-oriented application is a good thing. I am not comfortable with actually making changes to the TupleBrowser. Another approach on TupleBrowser would be to introduce a new class called TransactionalTupleBrowser (or some other name) that would interact with the new locking semantics in the BTree to guarantee that while the browser is active, the tree will not be changed. I *especially* like pulling the serialization out of the sync blocks for stateless serializers.</DIV> <DIV> </DIV></DIV> <DIV> </DIV> <DIV>As an FYI regarding your question about the age of the 1.0 release, the jdbm project is considered to be 'feature complete' - it has met it's original mandate, and new bug identification is very close to 0. It is used in a lot of mature projects and systems, which is one reason that a change to API semantics is approached with some caution (though certainly not avoided - I actually really like the idea of introducing locks as an alternative to synchronization). As Bryan pointed out in another email, the cost of these locks is high (much more than an equivalent spin lock, for example) - so the change needs to be evaluated in the context of the improvement that it brings to the table. Your performance test helps to show that this change is indeed valid - not just theoretically beneficial.</DIV> <DIV> </DIV> <DIV>At this point, the project is involved in incremental improvement up to the point that a major rewrite can be considered (most likely with a change in API). Targets for the major rewrite (when we volunteers get enough time to work on it) are focused on adding multi-transaction and roll-back behavior - as well as addressing the concurrency issues with the current architecture.</DIV> <DIV> </DIV> <DIV>- K<BR></DIV> <DIV> </DIV></FONT> <DIV style="FONT-SIZE: x-small; FONT-FAMILY: Tahoma"> <DIV>----------------------- <B>Original Message</B> -----------------------</DIV> <DIV> </DIV> <DIV><B>From:</B> "Elias Ross" <A href="mailto:ge...@no..."><FONT color=#0000ff><ge...@no...></FONT></A></DIV> <DIV><B>To:</B> "Kevin Day" <A href="mailto:ke...@tr..."><FONT color=#0000ff><ke...@tr...></FONT></A></DIV> <DIV><B>Cc:</B> JDBM Developer listserv <A href="mailto:jdb...@li..."><FONT color=#0000ff><jdb...@li...></FONT></A></DIV> <DIV><B>Date:</B> Fri, 14 Nov 2008 21:26:47 -0800</DIV> <DIV><B>Subject: <U>Re: [Jdbm-developer] "Fine grained" Read/Write locking</U></B></DIV> <DIV> </DIV></DIV><FONT face=Tahoma size=2> <DIV>On Fri, Nov 14, 2008 at 4:34 PM, Kevin Day <A href="mailto:ke...@tr..."><FONT color=#0000ff><ke...@tr...></FONT></A> wrote:<BR><BR>> I think that it's going to be very important to demonstrate the performance<BR>> advantages of these changes before we roll them into HEAD - especially with<BR>> the case of the tuple browser (the current API does not forbid anyone from<BR>> holding onto a tuple browser for a long period of time - if they do so, the<BR>> browser may not be valid anymore - but it wouldn't lock the tree).<BR>><BR><BR>I added a test that does purely reads. It runs in about 19 seconds<BR>with my changes, 24 seconds without, so about 25% faster. This is on a<BR>dual core system. Admittedly, there seems to be a bug that occurs<BR>occasionally, once in 20000-30000 reads. I still need to look at that.<BR>But there you go.<BR><BR>>From an outsider's perspective, JDBM is a project with no significant<BR>development, with a single 1.0 release that's 3 years old. So I don't<BR>really feel like begging for inclusion of my changes.<BR><BR>-------------------------------------------------------------------------<BR>This SF.Net email is sponsored by the Moblin Your Move Developer's challenge<BR>Build the coolest Linux based applications with Moblin SDK & win great prizes<BR>Grand prize is a trip for two to an Open Source event anywhere in the world<BR><A href="http://moblin-contest.org/redirect.php?banner_id=100&url=/"><FONT color=#0000ff>http://moblin-contest.org/redirect.php?banner_id=100&url=/</FONT></A><BR>_______________________________________________<BR>Jdbm-developer mailing list<BR><A href="mailto:Jdb...@li..."><FONT color=#0000ff>Jdb...@li...</FONT></A><BR><A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff>https://lists.sourceforge.net/lists/listinfo/jdbm-developer</FONT></A><BR><BR></DIV></ FONT></BODY></HTML> |
From: Elias R. <ge...@no...> - 2008-11-19 18:24:28
|
On Tue, Nov 18, 2008 at 2:51 PM, Kevin Day <ke...@tr...> wrote: > At this point, the project is involved in incremental improvement up to the > point that a major rewrite can be considered (most likely with a change in > API). Targets for the major rewrite (when we volunteers get enough time to > work on it) are focused on adding multi-transaction and roll-back behavior - > as well as addressing the concurrency issues with the current architecture. Kevin, an interesting phenomenon I've observed is talented people never have enough time to work on something unless the barrier to entry is lowered. I'm guessing (not certain) that simply creating a 2.0 codestream and opening yourselves up for business will generate more progress than waiting for people to "free up." I'm not sure who's the "leader" of the JDBM group, but why not you lead, Kevin? Personally, you've done much to discourage me. Honestly, it'd be less effort if I just forked JDBM and did with it what I wanted. |
From: Kevin D. <ke...@tr...> - 2008-11-19 19:45:35
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE> <META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD> <BODY leftMargin=1 topMargin=1 rightMargin=1><FONT face=Arial size=2> <DIV>Sorry to hear that - certainly not my intent. The reason that I'm not pushing 2.0 is that I have 6 commercial products that I actively develop, and I don't actually have a need for full blown transactional management in an embedded database for any of those projects. I *did* have the need for an efficient row level compression mechanism, which is why I developed it and added it to the system in a way that didn't impact other areas of the code base, which is why I was added to the development team. Right now, I have a need for heavy PDF processing, so the time that I do have to devot to open source projects is going to contributions to the iText project. These things move in cycles, and it would be great to have someone on board with an active interest in moving 2.0 forward.</DIV> <DIV> </DIV> <DIV>I think that the challenge you may be running into is that the change you are proposing is a good one (I don't think anyone will disagree with that) - but it impacts almost the entire code base. That means that having robust and reliable testing is critical to ensure that changes made don't break existing users of the library. This is made doubly difficult because it is hard to unit test concurrency. But by the same token, that is exactly why having the tests is so critically important. Is the objection that you have to writing the tests? If so, then I think you will run into some problems with this particular team of developers - we write tests first, then we write code.</DIV> <DIV> </DIV> <DIV> </DIV> <DIV>Anyway, I hope that the two hours I just spent performing a code review of your submission results in some good ideas for you.</DIV> <DIV> </DIV> <DIV>I'm all for lowering barriers to entry, but I will say that there are a *lot* of very complicated technical decisions that have to be made to ensure that the next design is solid. You can take a look at the archives of the developer listserv to get a feel for the issues that are being discussed, and if you have opinions on great ways to implement these kinds of changes, post them and get added to the team!</DIV> <DIV> </DIV> <DIV>- K</DIV> <DIV> </DIV></FONT> <DIV style="FONT-SIZE: x-small; FONT-FAMILY: Tahoma"> <DIV>----------------------- <B>Original Message</B> -----------------------</DIV> <DIV> </DIV> <DIV><B>From:</B> "Elias Ross" <A href="mailto:ge...@no..."><FONT color=#0000ff><ge...@no...></FONT></A></DIV> <DIV><B>To:</B> "Kevin Day" <A href="mailto:ke...@tr..."><FONT color=#0000ff><ke...@tr...></FONT></A></DIV> <DIV><B>Cc:</B> JDBM Developer listserv <A href="mailto:jdb...@li..."><FONT color=#0000ff><jdb...@li...></FONT></A></DIV> <DIV><B>Date:</B> Wed, 19 Nov 2008 10:24:20 -0800</DIV> <DIV><B>Subject: <U>Re: [Jdbm-developer] "Fine grained" Read/Write locking</U></B></DIV> <DIV> </DIV></DIV><FONT face=Tahoma size=2> <DIV>On Tue, Nov 18, 2008 at 2:51 PM, Kevin Day <A href="mailto:ke...@tr..."><FONT color=#0000ff><ke...@tr...></FONT></A> wrote:<BR><BR>> At this point, the project is involved in incremental improvement up to the<BR>> point that a major rewrite can be considered (most likely with a change in<BR>> API). Targets for the major rewrite (when we volunteers get enough time to<BR>> work on it) are focused on adding multi-transaction and roll-back behavior -<BR>> as well as addressing the concurrency issues with the current architecture.<BR><BR>Kevin, an interesting phenomenon I've observed is talented people<BR>never have enough time to work on something unless the barrier to<BR>entry is lowered.<BR><BR>I'm guessing (not certain) that simply creating a 2.0 codestream and<BR>opening yourselves up for business will generate more progress than<BR>waiting for people to "free up." I'm not sure who's the "leader" of<BR>the JDBM group, but why not you lead, Kevin?<BR><BR>Personally, you've done much to discourage me. Honestly, it'd be less<BR>effort if I just forked JDBM and did with it what I wanted.<BR><BR>-------------------------------------------------------------------------<BR>This SF.Net email is sponsored by the Moblin Your Move Developer's challenge<BR>Build the coolest Linux based applications with Moblin SDK & win great prizes<BR>Grand prize is a trip for two to an Open Source event anywhere in the world<BR><A href="http://moblin-contest.org/redirect.php?banner_id=100&url=/"><FONT color=#0000ff>http://moblin-contest.org/redirect.php?banner_id=100&url=/</FONT></A><BR>_______________________________________________<BR>Jdbm-developer mailing list<BR><A href="mailto:Jdb...@li..."><FONT color=#0000ff>Jdb...@li...</FONT></A><BR><A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff>https://lists.sourceforge.net/lis ts/listinfo/jdbm-developer</FONT></A><BR><BR></DIV></FONT></BODY></HTML> |
From: Elias R. <ge...@no...> - 2008-11-21 06:53:48
|
On Wed, Nov 19, 2008 at 11:45 AM, Kevin Day <ke...@tr...> wrote: > > This is made doubly difficult because it is > hard to unit test concurrency. But by the same token, that is exactly why > having the tests is so critically important. Is the objection that you have > to writing the tests? If so, then I think you will run into some problems > with this particular team of developers - we write tests first, then we > write code. I did write a few tests classes first to test correctness, they were in the first patch, I'm sorry you missed them. I didn't produce tests for gathering performance numbers it was obvious (to me anyway) that allowing for multiple concurrent readers would scale better. > Anyway, I hope that the two hours I just spent performing a code review of > your submission results in some good ideas for you. If you don't bill me for your time, I won't bill you for mine. > I'm all for lowering barriers to entry, but I will say that there are a > *lot* of very complicated technical decisions that have to be made to ensure > that the next design is solid. You can take a look at the archives of the > developer listserv to get a feel for the issues that are being discussed, > and if you have opinions on great ways to implement these kinds of changes, > post them and get added to the team! As you said, you had a need for record compression and went ahead and added that in. Luckily for you, it didn't have to affect the API in any way. Unfortunately, I don't see any easy way out of maintaining the synchronization contract between TupleBrowser and BTree. And since I'm interested mainly in improving read performance on two- and four-way dual core processors. I don't have any ideas on how to move forward other than either you host my changes or I host my own. And let's get serious: I'm not going to go implement some of the other features such as multiple, concurrent transactions so you might bestow me the privilege of hosting my own work. |
From: Kevin D. <ke...@tr...> - 2008-11-21 19:30:35
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE> <META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD> <BODY leftMargin=1 topMargin=1 rightMargin=1><FONT face=Arial size=2> <DIV>The TupleBrowser situation can be addressed by introducing a new class that has the new semantecs, and leave the old TupleBrowser as is.</DIV> <DIV> </DIV> <DIV> </DIV> <DIV>My biggest thing here is the large number of changes at many levels in the API. All of these changes involve thread interaction, which means they are difficult to properly test.</DIV> <DIV> </DIV> <DIV>As an intermediate step, have you considered just replacing each synchronized key word with a readwritelock block? Such as:</DIV> <DIV> </DIV> <DIV>public synchronized Object fooRead(){</DIV> <DIV> </DIV> <DIV>}</DIV> <DIV> </DIV> <DIV>with:</DIV> <DIV> </DIV> <DIV>public Object fooRead(){</DIV> <DIV> readWriteLock.getReadLock().lock();</DIV> <DIV> try{</DIV> <DIV> </DIV> <DIV> </DIV> <DIV> } finally {</DIV> <DIV> readWriteLock.getReadLock().unlock();</DIV> <DIV> }</DIV> <DIV>}</DIV> <DIV> </DIV> <DIV> </DIV> <DIV>this type of change is safe to make, easy for others to visually verify/validate, and should make a big difference on the concurrency front. I would also suggest leaving out the annotations dependency at this point.</DIV> <DIV> </DIV> <DIV> </DIV> <DIV>Once that is solid, then a next pass that involves more surgically dealing with things like stateless serializers could be done. That set of changes would be fairly small and limited to one or two classes. Again, much easier to verify visually - plus it would make sense to spend the time writing the unit tests (which I expect would be difficult to write) for just that one area of the API.</DIV> <DIV> </DIV> <DIV>We may very well find that the vast majority of the performance gains come from the first pass of changes (just moving to readwritelock instead of hard synchronization), in which case changes in other areas of the API could be deferred - or evaluated in the context of the performance gains that each provides.</DIV> <DIV> </DIV> <DIV>I think this is a reasonable approach that should get you 99% of what you are looking for, while still giving the rest of the jdbm user population the confidence that we haven't accidentally introduced a difficult to find race condition.</DIV> <DIV> </DIV> <DIV> </DIV> <DIV>At this point, the code review has turned up several potential race conditions, and introducing those to the jdbm code base would not be advisable without analysis to ensure that they absolutely cannot occur. If another team member (Bryan maybe) is willing to do a code review of their own, and their conclusions conflict with mine, then we have a starting point for working things out. I have created a branch in SVN with your changes applied to promote this type of review. Hopefully they will find some flaws in my analysis.</DIV> <DIV> </DIV> <DIV>- K<BR></DIV> <DIV> </DIV></FONT> <DIV style="FONT-SIZE: x-small; FONT-FAMILY: Tahoma"> <DIV>----------------------- <B>Original Message</B> -----------------------</DIV> <DIV> </DIV> <DIV><B>From:</B> "Elias Ross" <A href="mailto:ge...@no..."><FONT color=#0000ff><ge...@no...></FONT></A></DIV> <DIV><B>To:</B> "Kevin Day" <A href="mailto:ke...@tr..."><FONT color=#0000ff><ke...@tr...></FONT></A></DIV> <DIV><B>Cc:</B> "JDBM Developer listserv" <A href="mailto:jdb...@li..."><FONT color=#0000ff><jdb...@li...></FONT></A></DIV> <DIV><B>Date:</B> Thu, 20 Nov 2008 22:53:44 -0800</DIV> <DIV><B>Subject: <U>Re: [Jdbm-developer] "Fine grained" Read/Write locking</U></B></DIV> <DIV> </DIV></DIV><FONT face=Tahoma size=2> <DIV>On Wed, Nov 19, 2008 at 11:45 AM, Kevin Day <A href="mailto:ke...@tr..."><FONT color=#0000ff><ke...@tr...></FONT></A> wrote:<BR>><BR>> This is made doubly difficult because it is<BR>> hard to unit test concurrency. But by the same token, that is exactly why<BR>> having the tests is so critically important. Is the objection that you have<BR>> to writing the tests? If so, then I think you will run into some problems<BR>> with this particular team of developers - we write tests first, then we<BR>> write code.<BR><BR>I did write a few tests classes first to test correctness, they were<BR>in the first patch, I'm sorry you missed them. I didn't produce tests<BR>for gathering performance numbers it was obvious (to me anyway) that<BR>allowing for multiple concurrent readers would scale better.<BR><BR>> Anyway, I hope that the two hours I just spent performing a code review of<BR>> your submission results in som e good ideas for you.<BR><BR>If you don't bill me for your time, I won't bill you for mine.<BR><BR>> I'm all for lowering barriers to entry, but I will say that there are a<BR>> *lot* of very complicated technical decisions that have to be made to ensure<BR>> that the next design is solid. You can take a look at the archives of the<BR>> developer listserv to get a feel for the issues that are being discussed,<BR>> and if you have opinions on great ways to implement these kinds of changes,<BR>> post them and get added to the team!<BR><BR>As you said, you had a need for record compression and went ahead and<BR>added that in. Luckily for you, it didn't have to affect the API in<BR>any way. Unfortunately, I don't see any easy way out of maintaining<BR>the synchronization contract between TupleBrowser and BTree. And since<BR>I'm interested mainly in improving read performance on two- and<BR>four-way dual core processors. I don't have any ideas on how to mo ve<BR>forward other than either you host my changes or I host my own.<BR><BR>And let's get serious: I'm not going to go implement some of the other<BR>features such as multiple, concurrent transactions so you might bestow<BR>me the privilege of hosting my own work.<BR><BR></DIV></FONT></BODY></HTML> |
From: Bryan T. <br...@sy...> - 2008-11-30 20:07:07
|
All, I've been sick, snowed by our own work, and then on thanksgiving vacation. No one is trying to shut anyone out. I would definately recommend that any concurrency changes go on in a branch since the head is in good shape right now. Here is my recommendation for achieving concurrent readers for jdbm. 1. Use a canonicalizing map with recid keys and weak reference values for the btree and its nodes and leaves. There is an implementation of such a map in jdbm (the WeakCache), but it is not enabled by default. 2. Ensure that there is a serialized block when you read in a child node/leaf so that the in memory data structure remains coherent (you don't want to have different instances of the same node or leaf in memory). What I did for higher concurrency (in bigdata) is to create a lock per child (by synchronizing on the element of an Object[] corresponding to the child that needed to be read from the store) so that contention can only arise when you have two readers that each seek to bring the same child into memory. This should work with jdbm as long as you do not allow a concurrent writer. That should get you concurrent readers, but it will not support interleaving writers with readers. The problem with (one or many) writer(s) is that B+Tree inserts and deletes can cause overflow or underflow of a node/leaf (though I am not sure if jdbm implements underflow for key deletes on the B+Tree). Both overflow and underflow operations on a B+Tree can have non-local changes. The most obvious example is an insert into a leaf can cause all of its parents to be split up to and including the root node, at which point a new root node is created and the depth of the tree is increased by one. These non-local changes will cause both concurrent readers and concurrent writers to fail in interesting ways. jdbm has absolutely no locking strategy in place to handle these non-local effect of mutation on the B+Tree. Instead it relies on all access to the B+Tree being serialized at a high level. Databases that allow this sort of thing with a B+Tree design similar to jdbm use a variety of mechanisms including lock chaining or a B-link tree (a variant of B+Tree where a pointer is kept to the new sibling when a node or leaf is split). You can not make this problem go away with only local locking, synchronized blocks, etc. If you consider the case I mention above (an insert into a leaf causes the root node to be split) you will see that ANY insert could wind up requiring a lock on the root node and you will not know whether or not you need that lock until you locate the leaf into which you are inserting (or from which you are removing) the key. Concurrency is not possible for readers and a writer (or for multiple writers) without addressing these non-local effects of mutation on the B+Tree. However, you can have concurrent readers as long as there is no writer. Since any writer would change the state of a record that the readers could then request from the store, the data structure simply can not be made coherent without either locking or transactions. If anyone is considering modifying jdbm to be a transactional database, please note that the block manager operates on 8k blocks. Entire blocks are read or written at a time. A block can hold many records (some records may span more than one block). So a transaction design for jdbm would require an exclusive write lock on the block while isolated readers would see the pre-modification state of the block. Achieving higher concurrency in jdbm at the recman layer is its own step and should have its own stress tests. Right now thread-safety is achieved by synchronized keywords on the recman API. It should be possible to re-visit the guts of the logical to physical mapping and modify them to be thread-safe. There are a number of issues that would have to be tackled including safe commit and delete (managing the persistent free list. I have no specific advice on how to approach this but it is something that Kevin and I have discussed in the past. So, in the absence of some substantive changes that introduce appropriate non-local locking constructs into jdbm, my recommendation is that the application should be responsible for serializing writers (giving them exclusive access to the BTree, including exclusive of readers). -bryan PS: Kevin, could you forward my email to the list since it will be bounced. > -----Original Message----- > From: Elias Ross [mailto:ge...@no...] > Sent: Friday, November 21, 2008 1:54 AM > To: Kevin Day > Cc: JDBM Developer listserv > Subject: Re: [Jdbm-developer] "Fine grained" Read/Write locking > > On Wed, Nov 19, 2008 at 11:45 AM, Kevin Day > <ke...@tr...> wrote: > > > > This is made doubly difficult because it is hard to unit test > > concurrency. But by the same token, that is exactly why having the > > tests is so critically important. Is the objection that > you have to > > writing the tests? If so, then I think you will run into some > > problems with this particular team of developers - we write tests > > first, then we write code. > > I did write a few tests classes first to test correctness, > they were in the first patch, I'm sorry you missed them. I > didn't produce tests for gathering performance numbers it was > obvious (to me anyway) that allowing for multiple concurrent > readers would scale better. > > > Anyway, I hope that the two hours I just spent performing a code > > review of your submission results in some good ideas for you. > > If you don't bill me for your time, I won't bill you for mine. > > > I'm all for lowering barriers to entry, but I will say that > there are > > a > > *lot* of very complicated technical decisions that have to > be made to > > ensure that the next design is solid. You can take a look at the > > archives of the developer listserv to get a feel for the > issues that > > are being discussed, and if you have opinions on great ways to > > implement these kinds of changes, post them and get added > to the team! > > As you said, you had a need for record compression and went > ahead and added that in. Luckily for you, it didn't have to > affect the API in any way. Unfortunately, I don't see any > easy way out of maintaining the synchronization contract > between TupleBrowser and BTree. And since I'm interested > mainly in improving read performance on two- and four-way > dual core processors. I don't have any ideas on how to move > forward other than either you host my changes or I host my own. > > And let's get serious: I'm not going to go implement some of > the other features such as multiple, concurrent transactions > so you might bestow me the privilege of hosting my own work. > > -------------------------------------------------------------- > ----------- > This SF.Net email is sponsored by the Moblin Your Move > Developer's challenge Build the coolest Linux based > applications with Moblin SDK & win great prizes Grand prize > is a trip for two to an Open Source event anywhere in the > world http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > Jdbm-developer mailing list > Jdb...@li... > https://lists.sourceforge.net/lists/listinfo/jdbm-developer > |
From: Kevin D. <ke...@tr...> - 2008-11-19 18:26:37
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE> <META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD> <BODY leftMargin=1 topMargin=1 rightMargin=1><FONT face=Arial size=2> <DIV>Elias-</DIV> <DIV> </DIV> <DIV>I just finished an initial review of the changes in the patch... To help with evaluation, I've created a branch in SVN called 'Concurrent' that has your patches applied. I have also added your BTree test to Trunk so we can get side-by-side performance comparisons.</DIV> <DIV> </DIV> <DIV> </DIV> <DIV>For reference, here are my test times on the BTree test:</DIV> <DIV> </DIV> <DIV>With concurrency changes: 12992 ms</DIV> <DIV>Without concurrency changes: 14891 ms</DIV> <DIV> </DIV> <DIV>So on my system that gives a delta of 1899 ms, or 13% improvement.</DIV> <DIV> </DIV> <DIV>For those of you who are following along at home, these durations are for the *random read* portion of the test only - they do not reflect the time spent populating the tree. I do not have a working profiler right now (friggin eclipse TPTP), so I'm not sure where most of the delay is coming from.</DIV> <DIV> </DIV> <DIV> </DIV> <DIV> </DIV> <DIV>I'm a bit puzzled by the fact that the total time spent deserializing during the test (per the printlns added to BaseRescordManager) was 14117 ms - most likely caused by the rounding error on the accuracy of System.currentTimeMillis()...</DIV> <DIV> </DIV> <DIV> </DIV> <DIV> </DIV> <DIV> </DIV> <DIV>Here are my comments so far on the code:</DIV> <DIV> </DIV> <DIV>1. <FONT size=2>ConcurrentBTreeReadTest.read() - this re-creates the Random object each time it is called, which results in non-random behavior. Probably not a real issue, per se, but it would be more correct to create the Random object as a thread local variable, or include it inside the Runnable itself. If the later, be careful - you are re-using the same Runnable in each thread, and this change would make the Runnable stateful.</FONT></DIV> <DIV> </DIV> <DIV>2. Inclusion of a non-functional dependency (for annotation inclusion) may obscure things a bit. We try to keep the dependency footprint small on jdbm - the other team members may have some opinions on inclusion of this particular dependency...</DIV> <DIV> </DIV> <DIV>3. As a matter of keeping the API clean, it may be advisable to expose the readWriteLock as a single entity, and access the readLock() and writeLock() members as needed (as opposed to breaking them out into separate internal variables in BTree, etc...). That's a minor thing... These should also be captured as final, and the member variable name refactored to be more meaningful (rwl, r, w are somewhat hard to read).</DIV> <DIV> </DIV> <DIV>4. In BTree and BaseRecordManager (and possibly elsewhere): I'm pretty sure that static initialization of rwl, r and w in the variable declaration block will give problems during deserialization (see <A href="http://www.javaworld.com/javaworld/jw-04-1998/jw-04-beans.html">http://www.javaworld.com/javaworld/jw-04-1998/jw-04-beans.html</A>).</DIV> <DIV> </DIV> <DIV>5. How are you using/intend to use StripedLock?</DIV> <DIV> </DIV> <DIV>6. SynchronizedLongKeyMap seems like it could be achieved using Collections.synchronizedMap(). I'm also not entirely sure why such a low level construct would be needed (I'm pretty sure we are trying to manage our synchronization at a higher level, right?)</DIV> <DIV> </DIV> <DIV>7. BaseRecordManager.<FONT size=2>disableTransactions - probably no need to worry with synchronization on this call - API docs say that this call can only be made once, at recman creation. Minor quibble, and it's probably best to leave it in for consistency sake - this comment is more to point out how that method is used.</FONT></DIV> <DIV> </DIV> <DIV>8. I noticed that synchronized is being removed from BaseRecordManager.insert() - I'm not sure that this is safe to do. It seems that there is a potential race condition between updating the logical row manager and the buffer... Same concern with delete(). I may be missing an implicit relationship between the updates to the buffers, the logical record manager and the physical record manager - but I do know that it is absolutely critical to keep those in sync with each other. Shouldn't these operations be protected with a w.lock() call? Shouldn't fetch() be protected with a r.lock() call?</DIV> <DIV> </DIV> <DIV>9. BlockIo.<FONT size=2>incrementTransactionCount - this results in two atomic operations (an increment and the set of clean). This is a race condition.</FONT></DIV> <DIV> </DIV> <DIV>10. any reason for flipping the dirty flag set to be before the array update in writeByte(), writeShort(), etc.... ?</DIV> <DIV> </DIV> <DIV>11. RecordFile - member variables <FONT size=2>transactionsDisabled, <FONT size=2>syncOnClose, debug will never change. They should really just be marked final, but b/c of the factory, we can't have these be part of the constructor (I don't think...), which eliminates final as an option.</FONT></FONT></DIV> <DIV> </DIV> <DIV>12. RecordFile.get() - it's interesting to me that this method would require addition of a synchronized block. I believe that the original intent was to have RecordFile not be threadsafe, and anyone using it would need to ensure thread safety. It doesn't seem to me that the new synchornized block is sufficient to ensure the internal integrity of the RecordFile data structures. As an example, there is an invariant that a block should only belong to one of the lists (inTxn, inUse, dirty, clean) at a time. If get() can be called in a non-thread safe manner, then I'm not certain that this invariant can be ensured. The lists themselves are synchronized, but I see a potential race when moving a node from one list to another. Comments?</DIV> <DIV> </DIV> <DIV>13. TransactionManager - I haven't traced this all the way through, but are you certain that the synchronized keyword on synchronizeLog() is sufficient to ensure that the in-memory state of the blocks is not being modified while the log is being synchronized? It we were synching from the file, it wouldn't be an issue, but that is pretty slow, so we do the sync from the in-memory representation of the changed blocks. If another thread was changing the block at the same time that the block was being written to the database, it would be a mess.</DIV> <DIV> </DIV> <DIV> </DIV> <DIV> </DIV> <DIV> </DIV> <DIV>Lots of stuff - hopefully it will provide food for thought and a starting point for others to review the code.</DIV> <DIV> </DIV> <DIV>Cheers,</DIV> <DIV> </DIV> <DIV> </DIV> <DIV> </DIV> <DIV>Kevin Day<BR><BR></DIV> <DIV> </DIV></FONT> <DIV style="FONT-SIZE: x-small; FONT-FAMILY: Tahoma"> <DIV>----------------------- <B>Original Message</B> -----------------------</DIV> <DIV> </DIV> <DIV><B>From:</B> "Elias Ross" <A href="mailto:ge...@no..."><FONT color=#0000ff><ge...@no...></FONT></A></DIV> <DIV><B>To:</B> "Kevin Day" <A href="mailto:ke...@tr..."><FONT color=#0000ff><ke...@tr...></FONT></A></DIV> <DIV><B>Cc:</B> JDBM Developer listserv <A href="mailto:jdb...@li..."><FONT color=#0000ff><jdb...@li...></FONT></A></DIV> <DIV><B>Date:</B> Fri, 14 Nov 2008 21:26:47 -0800</DIV> <DIV><B>Subject: <U>Re: [Jdbm-developer] "Fine grained" Read/Write locking</U></B></DIV> <DIV> </DIV></DIV><FONT face=Tahoma size=2> <DIV>On Fri, Nov 14, 2008 at 4:34 PM, Kevin Day <A href="mailto:ke...@tr..."><FONT color=#0000ff><ke...@tr...></FONT></A> wrote:<BR><BR>> I think that it's going to be very important to demonstrate the performance<BR>> advantages of these changes before we roll them into HEAD - especially with<BR>> the case of the tuple browser (the current API does not forbid anyone from<BR>> holding onto a tuple browser for a long period of time - if they do so, the<BR>> browser may not be valid anymore - but it wouldn't lock the tree).<BR>><BR><BR>I added a test that does purely reads. It runs in about 19 seconds<BR>with my changes, 24 seconds without, so about 25% faster. This is on a<BR>dual core system. Admittedly, there seems to be a bug that occurs<BR>occasionally, once in 20000-30000 reads. I still need to look at that.<BR>But there you go.<BR><BR>>From an outsider's perspective, JDBM is a project with no significant<BR>development, with a single 1.0 release that's 3 years old. So I don't<BR>really feel like begging for inclusion of my changes.<BR><BR>-------------------------------------------------------------------------<BR>This SF.Net email is sponsored by the Moblin Your Move Developer's challenge<BR>Build the coolest Linux based applications with Moblin SDK & win great prizes<BR>Grand prize is a trip for two to an Open Source event anywhere in the world<BR><A href="http://moblin-contest.org/redirect.php?banner_id=100&url=/"><FONT color=#0000ff>http://moblin-contest.org/redirect.php?banner_id=100&url=/</FONT></A><BR>_______________________________________________<BR>Jdbm-developer mailing list<BR><A href="mailto:Jdb...@li..."><FONT color=#0000ff>Jdb...@li...</FONT></A><BR><A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff>https://lists.sourceforge.net/lists/listinfo/jdbm-developer</FONT></A><BR><BR></DIV></ FONT></BODY></HTML> |
From: Bryan T. <br...@sy...> - 2008-11-17 16:20:01
|
FYI, in bigdata I have use read-write locks in a delegation pattern over the basic bigdata BTree. The bigdata BTree implementation is safe for concurrent readers (to my knowledge jdbm never tried to ensure that this was true). The delegation pattern allows code that does not want to use the resource locking mechanisms to allow concurrent readers or a single writer. It works fine (in my case, which is a different code base), but there is a HUGE performance penalty for small index operations. I always try to make sure that I am doing some relatively large "batch" operation on the index rather than point tests if I am going to use a read-write lock to prevent concurrent access. Bigdata uses "index procedures" for this purpose. Sort of like Java-based stored prodecures. For bigdata, you can also send them to a remote index (or index partition) for execution. Anyway, my main point is that there is a fairly heavy overhead for locking with a read-write lock and if you really have contention you want to make sure that you are maximizing the work done inside of the lock. -bryan _____ From: Kevin Day [mailto:ke...@tr...] Sent: Friday, November 14, 2008 7:34 PM To: JDBM Developer listserv Subject: Re: [Jdbm-developer] "Fine grained" Read/Write locking Just a guess: If you are seeing the Key not found exception, are you maybe accessing the BTree in a non-thread safe way in your test (i.e. you have a list of keys that you are working with, you remove a key in one thread, but do not update the list of keys that you are working with before another thread uses the list). I guess the question is: Who removed key 16 from the tree, and did they do it without the knowledge of whatever thread is also doing another remove of the key 16 from the tree? Chances are pretty good that this is a test problem and not a BTree implementation issue (especially b/c you were seeing the problem even with the synchronized blocks in place). I think that it's going to be very important to demonstrate the performance advantages of these changes before we roll them into HEAD - especially with the case of the tuple browser (the current API does not forbid anyone from holding onto a tuple browser for a long period of time - if they do so, the browser may not be valid anymore - but it wouldn't lock the tree). If you can come up with some performance tests that we can use to show before and after behavior, I can get the test added to SVN, we can do some performance measurements with the current HEAD build, then perhaps create a branch and get performance metrics with the patch. Sound good? - K ----------------------- Original Message ----------------------- From: "Elias Ross" <mailto:ge...@no...> <ge...@no...> To: "Kevin Day" <mailto:ke...@tr...> <ke...@tr...>, "JDBM Developer listserv" <mailto:jdb...@li...> <jdb...@li...> Cc: Date: Fri, 14 Nov 2008 13:39:34 -0800 Subject: Re: [Jdbm-developer] "Fine grained" Read/Write locking Re: <https://sourceforge.net/tracker/index.php?func=detail&aid=2281809&group_id= 4155&atid=304155> https://sourceforge.net/tracker/index.php?func=detail&aid=2281809&group_id=4 155&atid=304155 I updated the concurrent patch with a small fix to RecMan and changed BTree to use read-write locks and added an explicit close() method to the Browser to release the read lock. This is a significant API and compatibility change but probably one that needs to happen at some point. I see this exception occasionally: java.lang.IllegalArgumentException: Key not found: 16 at jdbm.btree.BPage.remove(BPage.java:548) at jdbm.btree.BPage.remove(BPage.java:561) at jdbm.btree.BTree.remove0(BTree.java:436) at jdbm.btree.BTree.remove(BTree.java:418) at jdbm.btree.Concurrent Test.delete(ConcurrentTest.java:125) at jdbm.btree.ConcurrentTest.remove(ConcurrentTest.java:167) But this happens with the current release of JDBM as well, so I don't suppose it's a regression. I haven't measured any improvement with lots of concurrent read-write but I imagine concurrent reads are faster since they won't block each other exclusively. This is probably the most important area of improvement I made. To improve performance of mutations, some some of optimistic concurrency could be used, e.g. the appropriate BPage to modify is first found under a read lock, the version is noted. The read lock is released, then a write lock is obtained for that BPage, and then a version check is made if to see if the BPage was since modified. Then the modification occurs and the lock is released. ------------------------------------------------------------------------- This SF.Net email is sponso red by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world <http://moblin-contest.org/redirect.php?banner_id=100&url=/> http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ Jdbm-developer mailing list <mailto:Jdb...@li...> Jdb...@li... <https://lists.sourceforge.net/lists/listinfo/jdbm-developer> https://lists.sourceforge.net/lists/listinfo/jdbm-developer |