From: 'Kevin D. ' <ke...@tr...> - 2006-02-16 23:36:26
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE> <META content="MSHTML 6.00.2900.2802" name=GENERATOR></HEAD> <BODY leftMargin=1 topMargin=1 rightMargin=1><FONT face=Tahoma> <DIV><FONT face=Arial size=2>Alex-</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>Maybe you can help - can you explain to me how MVCC could have anything to do serializability of transactions? Am I completely off-base in saying that adding read locking to MVCC would defeat the purpose?</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>I'm obviously having a massive disconnect.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>Thanks,</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>- K</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2> </FONT> <TABLE> <TBODY> <TR> <TD width=1 bgColor=blue><FONT face=Arial size=2></FONT></TD> <TD><FONT face=Arial size=2><FONT color=blue>> Kevin,<BR><BR>MVCC does provide serializability. The tradeoffs among correct<BR>concurrency control mechanisms involve performance not correctness.<BR>The problems with performance for MVCC are that, by itself, it tends<BR>to produce more transaction restarts and lower overall transaction<BR>throughput when compared to 2PL. Postgres is some sort of hybrid<BR>of 2PL and MVCC. I tried their chat group yesterday to get some<BR>insight into exactly what kind of a hybrid, but no one could say "Oh,<BR>we implemented XYZ".<BR><BR>I do not think that MVCC (or any hybrid of MVCC) is a silver bullet<BR>for performance. Neither postgres nor Oracle is a clear performance<BR>winner -- in fact I have personal experience with the one of these<BR>platforms which places it an order of magnitude slower than jdbm for<BR>our application (an semantic web store) using a solution developer by<BR>the database people themselves.<BR><BR>I recommend a closer reading of this paper. I have found later articles<BR>which use analytic techniques to examine the performance of a variety of<BR>concurrency control mechanisms, but none which benchmarks real databases<BR>explicitly in terms of this issue. Perhaps we could look at some of the<BR>RDBMS benchmark literature for some insight there?<BR><BR>Enjoy your vacation,<BR><BR>-bryan<BR><BR>-----Original Message-----<BR>From: <A href="mailto:jdb...@li..."><FONT color=#0000ff>jdb...@li...</FONT></A><BR>To: JDBM Developer listserv<BR>Sent: 2/16/2006 4:48 PM<BR>Subject: re[2]: [Jdbm-developer] MC-DataSafe <BR><BR>Bryan-<BR><BR>Can you please provide a description of how MVCC (with non-serialized<BR>processing of transactions) would cause either of the two lost data/bad<BR>data examples described in the paper?<BR><BR>The way I read your email, it sounds like you are implying that MVCC<BR>can't provide transaction isolation or atomic updates, when it<BR>absolutely does. MVCC just operates optimistically in these situations.<BR>Under MVCC in the first example, an abort would occur. In the second<BR>example under MVCC, the report created in T2 would be correct, and T1<BR>would commit without problems. In either case, nothing gets left on the<BR>floor...<BR><BR>The massive advantage of MVCC comes in the second example: If the<BR>report required a significant amount of time to complete (it was reading<BR>a huge number of rows), then serializing the transactions as described<BR>in the paper will cause massive slow downs for the poor sucker who is<BR>trying to execute T1 (and kill concurrency). In MVCC, the update would<BR>happen immediately and return, even if T2 takes a long time to complete.<BR>This absolutely maximizes concurrency, and is perfectly safe.<BR><BR><BR>I'm going to need some serious convincing with concrete examples before<BR>I'm willing to favor serializable transactions over MVCC's approach. My<BR>concern at this stage is that much of the literature on these topics<BR>either doesn't account for the radical departure from the norm that MVCC<BR>is, or it is incorrect in it's description of how MVCC actually operates<BR>in practice.<BR><BR>I actually find it humorous that the concurrency control paper we are<BR>talking about here says that "the problem for nondistributed DMBSs is<BR>well understood... and one approach called two-phase locking has been<BR>accepted as a standard solution." That may have been true in 1981 when<BR>the paper was published, but the good folks at PostgreSQL and others<BR>have pretty much blown that out of the water.<BR><BR>This means that any broad claims made in these older papers have to be<BR>taken with a considerable grain of salt, and evaluated in the context of<BR>the new development of MVCC. Bernstein and Goodman are certainly going<BR>to say that 2 phase locking is the only way to go - they weren't even<BR>aware that a better approach was possible.<BR><BR>Anyway - I'm going to be gone until the 28th - hopefully you and Alex<BR>can continue the discussion with vigour while I'm gone!<BR><BR>Cheers,<BR><BR>- K<BR><BR><A href="http://www-static.cc.gatech.edu/classes/AY2003/cs8803i_fall/ConcurrencyC"><FONT color=#0000ff>http://www-static.cc.gatech.edu/classes/AY2003/cs8803i_fall/ConcurrencyC</FONT></A><BR>ontrol.pdf<BR><A href="http://www-static.cc.gatech.edu/classes/AY2003/cs8803i_fall/Concurrency"><FONT color=#0000ff><http://www-static.cc.gatech.edu/classes/AY2003/cs8803i_fall/Concurrency</FONT></A><BR>Control.pdf> <BR> <BR> > I have to say that I favor serializable transactions -- at<BR>least in the<BR>abstract. Non-serializable transactions can leave things on the floor.<BR>If you do not have a concrete application you don't know what that might<BR>be. Two classic examples of problems with not enforcing<BR>serializablility<BR>are given at the start of the paper, e.g., a lost deposit into a bank<BR>account.<BR><BR>-bryan<BR><BR>-----Original Message-----<BR>From: Alex Boisvert <A href="mailto:boi...@in..."><FONT color=#0000ff><mailto:boi...@in...></FONT></A><BR><A href="mailto:boi...@in..."><FONT color=#0000ff>[mailto:boi...@in...]</FONT></A><BR>Sent: Thu 2/16/2006 11:55 AM<BR>To: Thompson, Bryan B.<BR>Cc: 'Kevin Day '; <A href="mailto:jdb...@li..."><FONT color=#0000ff><mailto:jdb...@li...></FONT></A><BR><A href="mailto:jdb...@li..."><FONT color=#0000ff>'jdb...@li...</FONT></A> '; '''JDBM<BR>Developer listserv ' ' '<BR>Subject: Re: [Jdbm-developer] MC-DataSafe (in Persistent Store Inte<BR>rface).<BR><BR><BR>I agree with figure 11 if you want to guarantee a completely<BR>serializable history. However, you can relax the isolation with MVCC to<BR>allow such ordering to happen, which won't guarantee complete<BR>serializability but in most cases will result in a consistent outcome.<BR><BR>The need to balance concurrency and consistency is satisfied by most<BR>databases by offering different levels of isolation (read uncomitted,<BR>read committed, repeatable read, serializable, ...)<BR><BR>alex<BR><BR><BR>Thompson, Bryan B. wrote:<BR><BR>>I think that we probably need to be much more concrete. However MVCC<BR>>does NOT permit a write to create a version which would pre-date a<BR>>version which had already been read by another tx. Intuitively such a<BR>>write would mean that the read had accessed the incorrect version.<BR>This<BR>>is the subject of figure 11 in [1]. Are we disagreeing about the<BR>meaning<BR>>of figure 11?<BR>><BR>>-bryan<BR>> <BR>><BR><BR><BR><BR>-------------------------------------------------------<BR>This SF.net email is sponsored by: Splunk Inc. Do you grep through log<BR>files<BR>for problems? Stop! Download the new AJAX search engine that makes<BR>searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!<BR><BR><A href="http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=12164"><FONT color=#0000ff><http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=12164</FONT></A><BR>2><BR><A href="http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642"><FONT color=#0000ff>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642</FONT></A><BR>_______________________________________________<BR>Jdbm-developer mailing list<BR><A href="mailto:Jdb...@li..."><FONT color=#0000ff><mailto:Jdb...@li...></FONT></A><BR><A href="mailto:Jdb...@li..."><FONT color=#0000ff>Jdb...@li...</FONT></A><BR><A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff><https://lists.sourceforge.net/lists/listinfo/jdbm-developer></FONT></A><BR><A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff>https://lists.sourceforge.net/lists/listinfo/jdbm-developer</FONT></A><BR><BR><<BR> <BR>------------------------------------------------------- This SF.net<BR>email is sponsored by: Splunk Inc. Do you grep through log files for<BR>problems? Stop! Download the new AJAX search engine that makes searching<BR>your log files as easy as surfing the web. DOWNLOAD SPLUNK!<BR><A href="http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642"><FONT color=#0000ff>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642</FONT></A><BR>_______________________________________________ Jdbm-developer mailing<BR>list <A href="mailto:Jdb...@li..."><FONT color=#0000ff>Jdb...@li...</FONT></A><BR><A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff>https://lists.sourceforge.net/lists/listinfo/jdbm-developer</FONT></A><BR><BR><<BR></FONT></FONT></TD></TR></TBODY></TABLE></DIV></FONT></BODY></HTML> |
From: Alex B. <boi...@in...> - 2006-02-17 00:19:23
|
I think it would look similar to what industrial databases are doing today: -read uncommitted -read committed -repeatable read -serializable This is more about relaxing MVCC than locks. You can stack locking on top of these isolation levels for custom levels of conflict allowance or prevention. alex Thompson, Bryan B. wrote: >Alex, > >If you are going to relax conflict analysis, what is that going >to look like under the "proposed" scheme? Release of some locks >before all locks have been acquired? > >-bryan > >-----Original Message----- >From: jdb...@li... >To: 'Kevin Day ' >Cc: 'JDBM Developer listserv ' >Sent: 2/16/2006 6:45 PM >Subject: Re: [Jdbm-developer] MC-DataSafe > > >Serializability is more about how strict you are when doing conflict >analysis. > >To have total ordering (aka serializability) you must enforce that the >read and write set of a transaction are completely disjoint from the >write set of any transaction committed after the transaction started. > >alex > > >'Kevin Day ' wrote: > > > >>Alex- >> >>Maybe you can help - can you explain to me how MVCC could have >>anything to do serializability of transactions? Am I completely >>off-base in saying that adding read locking to MVCC would defeat the >>purpose? >> >>I'm obviously having a massive disconnect. >> >>Thanks, >> >>- K >> >> >> >> >> > Kevin, >> >>MVCC does provide serializability. The tradeoffs among correct >>concurrency control mechanisms involve performance not correctness. >>The problems with performance for MVCC are that, by itself, it tends >>to produce more transaction restarts and lower overall transaction >>throughput when compared to 2PL. Postgres is some sort of hybrid >>of 2PL and MVCC. I tried their chat group yesterday to get some >>insight into exactly what kind of a hybrid, but no one could say "Oh, >>we implemented XYZ". >> >>I do not think that MVCC (or any hybrid of MVCC) is a silver bullet >>for performance. Neither postgres nor Oracle is a clear performance >>winner -- in fact I have personal experience with the one of these >>platforms which places it an order of magnitude slower than jdbm for >>our application (an semantic web store) using a solution developer by >>the database people themselves. >> >>I recommend a closer reading of this paper. I have found later >> >> >articles > > >>which use analytic techniques to examine the performance of a variety >> >> >of > > >>concurrency control mechanisms, but none which benchmarks real >> >> >databases > > >>explicitly in terms of this issue. Perhaps we could look at some of >> >> >the > > >>RDBMS benchmark literature for some insight there? >> >>Enjoy your vacation, >> >>-bryan >> >>-----Original Message----- >>From: jdb...@li... >><mailto:jdb...@li...> >>To: JDBM Developer listserv >>Sent: 2/16/2006 4:48 PM >>Subject: re[2]: [Jdbm-developer] MC-DataSafe >> >>Bryan- >> >>Can you please provide a description of how MVCC (with non-serialized >>processing of transactions) would cause either of the two lost >> >> >data/bad > > >>data examples described in the paper? >> >>The way I read your email, it sounds like you are implying that MVCC >>can't provide transaction isolation or atomic updates, when it >>absolutely does. MVCC just operates optimistically in these >> >> >situations. > > >>Under MVCC in the first example, an abort would occur. In the second >>example under MVCC, the report created in T2 would be correct, and T1 >>would commit without problems. In either case, nothing gets left on >> >> >the > > >>floor... >> >>The massive advantage of MVCC comes in the second example: If the >>report required a significant amount of time to complete (it was >> >> >reading > > >>a huge number of rows), then serializing the transactions as described >>in the paper will cause massive slow downs for the poor sucker who is >>trying to execute T1 (and kill concurrency). In MVCC, the update >> >> >would > > >>happen immediately and return, even if T2 takes a long time to >> >> >complete. > > >>This absolutely maximizes concurrency, and is perfectly safe. >> >> >>I'm going to need some serious convincing with concrete examples >> >> >before > > >>I'm willing to favor serializable transactions over MVCC's approach. >> >> >My > > >>concern at this stage is that much of the literature on these topics >>either doesn't account for the radical departure from the norm that >> >> >MVCC > > >>is, or it is incorrect in it's description of how MVCC actually >> >> >operates > > >>in practice. >> >>I actually find it humorous that the concurrency control paper we are >>talking about here says that "the problem for nondistributed DMBSs is >>well understood... and one approach called two-phase locking has been >>accepted as a standard solution." That may have been true in 1981 >> >> >when > > >>the paper was published, but the good folks at PostgreSQL and others >>have pretty much blown that out of the water. >> >>This means that any broad claims made in these older papers have to be >>taken with a considerable grain of salt, and evaluated in the context >> >> >of > > >>the new development of MVCC. Bernstein and Goodman are certainly >> >> >going > > >>to say that 2 phase locking is the only way to go - they weren't even >>aware that a better approach was possible. >> >>Anyway - I'm going to be gone until the 28th - hopefully you and Alex >>can continue the discussion with vigour while I'm gone! >> >>Cheers, >> >>- K >> >> >> > > > >------------------------------------------------------- >This SF.net email is sponsored by: Splunk Inc. Do you grep through log >files >for problems? Stop! Download the new AJAX search engine that makes >searching your log files as easy as surfing the web. DOWNLOAD SPLUNK! >http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642 >_______________________________________________ >Jdbm-developer mailing list >Jdb...@li... >https://lists.sourceforge.net/lists/listinfo/jdbm-developer > > |
From: 'Kevin D. ' <ke...@tr...> - 2006-02-17 00:34:16
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE> <META content="MSHTML 6.00.2900.2802" name=GENERATOR></HEAD> <BODY leftMargin=1 topMargin=1 rightMargin=1><FONT face=Tahoma> <DIV><FONT face=Arial size=2>Bryan-</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT></DIV> <DIV><FONT face=Arial size=2>OK - I'll get my act together on this over the week.</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>Cheers!</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>- K</FONT></DIV> <DIV><FONT face=Arial size=2> </FONT> <TABLE> <TBODY> <TR> <TD width=1 bgColor=blue><FONT face=Arial size=2></FONT></TD> <TD><FONT face=Arial size=2><FONT color=blue>> That is why using 2PL with MVCC is so interesting. It allows you<BR>to induce a total ordering (by assigning timestamps) based on the<BR>locks that are actually acquired. If there is a totally ordering,<BR>then all is good. If not, then one or more transactions will have<BR>to be aborted and retried (to have serializability).<BR><BR>-bryan<BR><BR>-----Original Message-----<BR>From: <A href="mailto:jdb...@li..."><FONT color=#0000ff>jdb...@li...</FONT></A><BR>To: 'JDBM Developer listserv '<BR>Sent: 2/16/2006 6:55 PM<BR>Subject: re[2]: [Jdbm-developer] MC-DataSafe<BR><BR>Alex-<BR><BR>Does that imply that enforcing serializability would defeat the purpose<BR>of MVCC, or are these totally separate issues? I guess if you had<BR>sufficient information before any of the transactions were started you<BR>could come up with an optimal strategy - but that seems like its way<BR>outside the realm of practical implementation for a general purpose<BR>database...<BR><BR>- K<BR> <BR> > <BR>Serializability is more about how strict you are when doing conflict<BR>analysis.<BR><BR>To have total ordering (aka serializability) you must enforce that the<BR>read and write set of a transaction are completely disjoint from the<BR>write set of any transaction committed after the transaction started.<BR><BR>alex<BR><BR><BR>'Kevin Day ' wrote:<BR><BR>> Alex-<BR>> <BR>> Maybe you can help - can you explain to me how MVCC could have<BR>> anything to do serializability of transactions? Am I completely<BR>> off-base in saying that adding read locking to MVCC would defeat the<BR>> purpose?<BR>> <BR>> I'm obviously having a massive disconnect.<BR>> <BR>> Thanks,<BR>> <BR>> - K<BR>> <BR>> <BR>> <BR>><BR>> > Kevin,<BR>><BR>> MVCC does provide serializability. The tradeoffs among correct<BR>> concurrency control mechanisms involve performance not correctness.<BR>> The problems with performance for MVCC are that, by itself, it tends<BR>> to produce more transaction restarts and lower overall transaction<BR>> throughput when compared to 2PL. Postgres is some sort of hybrid<BR>> of 2PL and MVCC. I tried their chat group yesterday to get some<BR>> insight into exactly what kind of a hybrid, but no one could say "Oh,<BR>> we implemented XYZ".<BR>><BR>> I do not think that MVCC (or any hybrid of MVCC) is a silver bullet<BR>> for performance. Neither postgres nor Oracle is a clear performance<BR>> winner -- in fact I have personal experience with the one of these<BR>> platforms which places it an order of magnitude slower than jdbm for<BR>> our application (an semantic web store) using a solution developer by<BR>> the database people themselves.<BR>><BR>> I recommend a closer reading of this paper. I have found later<BR>articles<BR>> which use analytic techniques to examine the performance of a variety<BR>of<BR>> concurrency control mechanisms, but none which benchmarks real<BR>databases<BR>> explicitly in terms of this issue. Perhaps we could look at some of<BR>the<BR>> RDBMS benchmark literature for some insight there?<BR>><BR>> Enjoy your vacation,<BR>><BR>> -bryan<BR>><BR>> -----Original Message-----<BR>> From: <A href="mailto:jdb...@li..."><FONT color=#0000ff><mailto:jdb...@li...></FONT></A><BR><A href="mailto:jdb...@li..."><FONT color=#0000ff>jdb...@li...</FONT></A><BR>> <A href="mailto:jdb...@li..."><FONT color=#0000ff><mailto:jdb...@li...></FONT></A><BR><A href="mailto:jdb...@li..."><FONT color=#0000ff><mailto:jdb...@li...></FONT></A><BR>> To: JDBM Developer listserv<BR>> Sent: 2/16/2006 4:48 PM<BR>> Subject: re[2]: [Jdbm-developer] MC-DataSafe<BR>><BR>> Bryan-<BR>><BR>> Can you please provide a description of how MVCC (with non-serialized<BR>> processing of transactions) would cause either of the two lost<BR>data/bad<BR>> data examples described in the paper?<BR>><BR>> The way I read your email, it sounds like you are implying that MVCC<BR>> can't provide transaction isolation or atomic updates, when it<BR>> absolutely does. MVCC just operates optimistically in these<BR>situations.<BR>> Under MVCC in the first example, an abort would occur. In the second<BR>> example under MVCC, the report created in T2 would be correct, and T1<BR>> would commit without problems. In either case, nothing gets left on<BR>the<BR>> floor...<BR>><BR>> The massive advantage of MVCC comes in the second example: If the<BR>> report required a significant amount of time to complete (it was<BR>reading<BR>> a huge number of rows), then serializing the transactions as described<BR>> in the paper will cause massive slow downs for the poor sucker who is<BR>> trying to execute T1 (and kill concurrency). In MVCC, the update<BR>would<BR>> happen immediately and return, even if T2 takes a long time to<BR>complete.<BR>> This absolutely maximizes concurrency, and is perfectly safe.<BR>><BR>><BR>> I'm going to need some serious convincing with concrete examples<BR>before<BR>> I'm willing to favor serializable transactions over MVCC's approach.<BR>My<BR>> concern at this stage is that much of the literature on these topics<BR>> either doesn't account for the radical departure from the norm that<BR>MVCC<BR>> is, or it is incorrect in it's description of how MVCC actually<BR>operates<BR>> in practice.<BR>><BR>> I actually find it humorous that the concurrency control paper we are<BR>> talking about here says that "the problem for nondistributed DMBSs is<BR>> well understood... and one approach called two-phase locking has been<BR>> accepted as a standard solution." That may have been true in 1981<BR>when<BR>> the paper was published, but the good folks at PostgreSQL and others<BR>> have pretty much blown that out of the water.<BR>><BR>> This means that any broad claims made in these older papers have to be<BR>> taken with a considerable grain of salt, and evaluated in the context<BR>of<BR>> the new development of MVCC. Bernstein and Goodman are certainly<BR>going<BR>> to say that 2 phase locking is the only way to go - they weren't even<BR>> aware that a better approach was possible.<BR>><BR>> Anyway - I'm going to be gone until the 28th - hopefully you and Alex<BR>> can continue the discussion with vigour while I'm gone!<BR>><BR>> Cheers,<BR>><BR>> - K<BR>><BR><BR><BR><BR>-------------------------------------------------------<BR>This SF.net email is sponsored by: Splunk Inc. Do you grep through log<BR>files<BR>for problems? Stop! Download the new AJAX search engine that makes<BR>searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!<BR><BR><A href="http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=12164"><FONT color=#0000ff><http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=12164</FONT></A><BR>2><BR><A href="http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642"><FONT color=#0000ff>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642</FONT></A><BR>_______________________________________________<BR>Jdbm-developer mailing list<BR><A href="mailto:Jdb...@li..."><FONT color=#0000ff><mailto:Jdb...@li...></FONT></A><BR><A href="mailto:Jdb...@li..."><FONT color=#0000ff>Jdb...@li...</FONT></A><BR><A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff><https://lists.sourceforge.net/lists/listinfo/jdbm-developer></FONT></A><BR><A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff>https://lists.sourceforge.net/lists/listinfo/jdbm-developer</FONT></A><BR><BR><<BR> <BR>------------------------------------------------------- This SF.net<BR>email is sponsored by: Splunk Inc. Do you grep through log files for<BR>problems? Stop! Download the new AJAX search engine that makes searching<BR>your log files as easy as surfing the web. DOWNLOAD SPLUNK!<BR><A href="http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642"><FONT color=#0000ff>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642</FONT></A><BR>_______________________________________________ Jdbm-developer mailing<BR>list <A href="mailto:Jdb...@li..."><FONT color=#0000ff>Jdb...@li...</FONT></A><BR><A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff>https://lists.sourceforge.net/lists/listinfo/jdbm-developer</FONT></A><BR><BR><<BR></FONT></FONT></TD></TR></TBODY></TABLE></DIV></FONT></BODY></HTML> |
From: Thompson, B. B. <BRY...@sa...> - 2006-02-17 13:12:43
|
Alex, Ok. I am still getting my feet in this space and I need to think more about how the different kinds of inconsistencies could be permitted in a hybrid 2PL + MVCC strategy. 2PL seems straightforward -- and maybe MVCC is as well. I am thinking of the implementation in terms of 2PL hierarchical locking [1] support for rw synchronization plus MVCC support for ww synchronization (per [2], section 5.3). What I think I need to do is work through the interaction of those synchronization mechanisms in more depth, maybe do some modeling to support that, and generate some ideas for how this might interface with jdbm and DBCache. If you have a design in mind I am all ears. Regardless it seems that we could incrementally improve jdbm right now by: 1. integrating DBCache for VLR Tx support (but without concurrency). 2. integrating the b-link support into the b+tree to support index traversal under concurrent modification. -bryan PS: Gray laid out those isolation levels in [1], together with heirarchical locking. [1] J.N. Gray, R.A. Lorie, G.R. Putzolu, I.L. Traiger. Granularity of Locks and Degrees of Consistency in a Shared Data Base. http://www.seas.upenn.edu/~zives/cis650/papers/granularity-locks.pdf [2] Bernstein, P. A. and Goodman, N. 1981. Concurrency Control in Distributed Database Systems. ACM Comput. Surv. 13, 2 (Jun. 1981), 185-221. DOI= http://doi.acm.org/10.1145/356842.356846 http://www-static.cc.gatech.edu/classes/AY2003/cs8803i_fall/ConcurrencyContr ol.pdf -----Original Message----- From: Alex Boisvert To: Thompson, Bryan B. Cc: 'jdb...@li... '; ''Kevin Day ' '; ''JDBM Developer listserv ' ' Sent: 2/16/2006 7:18 PM Subject: Re: [Jdbm-developer] MC-DataSafe I think it would look similar to what industrial databases are doing today: -read uncommitted -read committed -repeatable read -serializable This is more about relaxing MVCC than locks. You can stack locking on top of these isolation levels for custom levels of conflict allowance or prevention. alex Thompson, Bryan B. wrote: >Alex, > >If you are going to relax conflict analysis, what is that going >to look like under the "proposed" scheme? Release of some locks >before all locks have been acquired? > >-bryan > >-----Original Message----- >From: jdb...@li... >To: 'Kevin Day ' >Cc: 'JDBM Developer listserv ' >Sent: 2/16/2006 6:45 PM >Subject: Re: [Jdbm-developer] MC-DataSafe > > >Serializability is more about how strict you are when doing conflict >analysis. > >To have total ordering (aka serializability) you must enforce that the >read and write set of a transaction are completely disjoint from the >write set of any transaction committed after the transaction started. > >alex > > >'Kevin Day ' wrote: > > > >>Alex- >> >>Maybe you can help - can you explain to me how MVCC could have >>anything to do serializability of transactions? Am I completely >>off-base in saying that adding read locking to MVCC would defeat the >>purpose? >> >>I'm obviously having a massive disconnect. >> >>Thanks, >> >>- K >> >> >> >> >> > Kevin, >> >>MVCC does provide serializability. The tradeoffs among correct >>concurrency control mechanisms involve performance not correctness. >>The problems with performance for MVCC are that, by itself, it tends >>to produce more transaction restarts and lower overall transaction >>throughput when compared to 2PL. Postgres is some sort of hybrid >>of 2PL and MVCC. I tried their chat group yesterday to get some >>insight into exactly what kind of a hybrid, but no one could say "Oh, >>we implemented XYZ". >> >>I do not think that MVCC (or any hybrid of MVCC) is a silver bullet >>for performance. Neither postgres nor Oracle is a clear performance >>winner -- in fact I have personal experience with the one of these >>platforms which places it an order of magnitude slower than jdbm for >>our application (an semantic web store) using a solution developer by >>the database people themselves. >> >>I recommend a closer reading of this paper. I have found later >> >> >articles > > >>which use analytic techniques to examine the performance of a variety >> >> >of > > >>concurrency control mechanisms, but none which benchmarks real >> >> >databases > > >>explicitly in terms of this issue. Perhaps we could look at some of >> >> >the > > >>RDBMS benchmark literature for some insight there? >> >>Enjoy your vacation, >> >>-bryan >> >>-----Original Message----- >>From: jdb...@li... >><mailto:jdb...@li...> >>To: JDBM Developer listserv >>Sent: 2/16/2006 4:48 PM >>Subject: re[2]: [Jdbm-developer] MC-DataSafe >> >>Bryan- >> >>Can you please provide a description of how MVCC (with non-serialized >>processing of transactions) would cause either of the two lost >> >> >data/bad > > >>data examples described in the paper? >> >>The way I read your email, it sounds like you are implying that MVCC >>can't provide transaction isolation or atomic updates, when it >>absolutely does. MVCC just operates optimistically in these >> >> >situations. > > >>Under MVCC in the first example, an abort would occur. In the second >>example under MVCC, the report created in T2 would be correct, and T1 >>would commit without problems. In either case, nothing gets left on >> >> >the > > >>floor... >> >>The massive advantage of MVCC comes in the second example: If the >>report required a significant amount of time to complete (it was >> >> >reading > > >>a huge number of rows), then serializing the transactions as described >>in the paper will cause massive slow downs for the poor sucker who is >>trying to execute T1 (and kill concurrency). In MVCC, the update >> >> >would > > >>happen immediately and return, even if T2 takes a long time to >> >> >complete. > > >>This absolutely maximizes concurrency, and is perfectly safe. >> >> >>I'm going to need some serious convincing with concrete examples >> >> >before > > >>I'm willing to favor serializable transactions over MVCC's approach. >> >> >My > > >>concern at this stage is that much of the literature on these topics >>either doesn't account for the radical departure from the norm that >> >> >MVCC > > >>is, or it is incorrect in it's description of how MVCC actually >> >> >operates > > >>in practice. >> >>I actually find it humorous that the concurrency control paper we are >>talking about here says that "the problem for nondistributed DMBSs is >>well understood... and one approach called two-phase locking has been >>accepted as a standard solution." That may have been true in 1981 >> >> >when > > >>the paper was published, but the good folks at PostgreSQL and others >>have pretty much blown that out of the water. >> >>This means that any broad claims made in these older papers have to be >>taken with a considerable grain of salt, and evaluated in the context >> >> >of > > >>the new development of MVCC. Bernstein and Goodman are certainly >> >> >going > > >>to say that 2 phase locking is the only way to go - they weren't even >>aware that a better approach was possible. >> >>Anyway - I'm going to be gone until the 28th - hopefully you and Alex >>can continue the discussion with vigour while I'm gone! >> >>Cheers, >> >>- K >> >> >> > > > >------------------------------------------------------- >This SF.net email is sponsored by: Splunk Inc. Do you grep through log >files >for problems? Stop! Download the new AJAX search engine that makes >searching your log files as easy as surfing the web. DOWNLOAD SPLUNK! >http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=12164 2 >_______________________________________________ >Jdbm-developer mailing list >Jdb...@li... >https://lists.sourceforge.net/lists/listinfo/jdbm-developer > > |
From: Alex B. <boi...@in...> - 2006-02-17 21:37:16
|
I'm a little concerned about adding MVCC to DBCache after the fact. But then again I agree that moving from the current JDBM architecture to DBCache is more incremental. I could help with the implementation of B-link in parallel to the DBCache work. alex Thompson, Bryan B. wrote: >Regardless it seems that we could incrementally improve jdbm right >now by: > >1. integrating DBCache for VLR Tx support (but without concurrency). > >2. integrating the b-link support into the b+tree to support index >traversal under concurrent modification. > >-bryan > > |
From: Thompson, B. B. <BRY...@sa...> - 2006-02-18 00:41:10
|
Alex, I don't know how this will all play out either. I see the concurrency control as functioning over logical records and some higher level data structures (indices or application data structures). DBCache needs to make sure that different transactions do not overwrite the same region of a page since it must enforce the distinction between original page images and working copies (of a page or a region on a page) being modified by a tx and also distinguish between the original and the updated page image (after the commit). jdbm certainly delays putting serialized objects onto pages, which helps. A log structured store would help by focusing on appending rather than in place updated. Once a write is performed on a page image in DBCache it seems that other writers must be blocked for the (sub)page region which was updated. Also, once a tx commits, the original state for the changed parts of the page image are no longer available. I am beginning to believe that the DBCache semantics (transaction page image semantics) are what they are and that MVCC is purely in terms of the record level semantics. Perhaps the only way to make it work out cleanly is to NOT update pages containing live records (i.e., using a log structured store) and then to periodically purge old versions and recycle the pages from which those versions were collected. I can see how we could create more efficient record allocation strategies using this approach as well. -bryan -----Original Message----- From: Alex Boisvert To: Thompson, Bryan B. Cc: ''jdb...@li... ' '; '''Kevin Day ' ' '; '''JDBM Developer listserv ' ' ' Sent: 2/17/2006 4:36 PM Subject: Re: [Jdbm-developer] MC-DataSafe I'm a little concerned about adding MVCC to DBCache after the fact. But then again I agree that moving from the current JDBM architecture to DBCache is more incremental. I could help with the implementation of B-link in parallel to the DBCache work. alex Thompson, Bryan B. wrote: >Regardless it seems that we could incrementally improve jdbm right >now by: > >1. integrating DBCache for VLR Tx support (but without concurrency). > >2. integrating the b-link support into the b+tree to support index >traversal under concurrent modification. > >-bryan > > |
From: Alex B. <boi...@in...> - 2006-02-18 18:29:54
|
Thompson, Bryan B. wrote: >I am beginning to believe that the DBCache semantics (transaction page >image semantics) are what they are and that MVCC is purely in terms of >the record level semantics. Perhaps the only way to make it work out >cleanly is to NOT update pages containing live records (i.e., using a >log structured store) and then to periodically purge old versions and >recycle the pages from which those versions were collected. I can see >how we could create more efficient record allocation strategies using >this approach as well. > > Yes, I have a feeling that it's going to play out towards a log-structured approach (appending and recycling) as well. alex |
From: Alex B. <boi...@in...> - 2006-02-16 23:46:16
|
Serializability is more about how strict you are when doing conflict analysis. To have total ordering (aka serializability) you must enforce that the read and write set of a transaction are completely disjoint from the write set of any transaction committed after the transaction started. alex 'Kevin Day ' wrote: > Alex- > > Maybe you can help - can you explain to me how MVCC could have > anything to do serializability of transactions? Am I completely > off-base in saying that adding read locking to MVCC would defeat the > purpose? > > I'm obviously having a massive disconnect. > > Thanks, > > - K > > > > > > Kevin, > > MVCC does provide serializability. The tradeoffs among correct > concurrency control mechanisms involve performance not correctness. > The problems with performance for MVCC are that, by itself, it tends > to produce more transaction restarts and lower overall transaction > throughput when compared to 2PL. Postgres is some sort of hybrid > of 2PL and MVCC. I tried their chat group yesterday to get some > insight into exactly what kind of a hybrid, but no one could say "Oh, > we implemented XYZ". > > I do not think that MVCC (or any hybrid of MVCC) is a silver bullet > for performance. Neither postgres nor Oracle is a clear performance > winner -- in fact I have personal experience with the one of these > platforms which places it an order of magnitude slower than jdbm for > our application (an semantic web store) using a solution developer by > the database people themselves. > > I recommend a closer reading of this paper. I have found later articles > which use analytic techniques to examine the performance of a variety of > concurrency control mechanisms, but none which benchmarks real databases > explicitly in terms of this issue. Perhaps we could look at some of the > RDBMS benchmark literature for some insight there? > > Enjoy your vacation, > > -bryan > > -----Original Message----- > From: jdb...@li... > <mailto:jdb...@li...> > To: JDBM Developer listserv > Sent: 2/16/2006 4:48 PM > Subject: re[2]: [Jdbm-developer] MC-DataSafe > > Bryan- > > Can you please provide a description of how MVCC (with non-serialized > processing of transactions) would cause either of the two lost data/bad > data examples described in the paper? > > The way I read your email, it sounds like you are implying that MVCC > can't provide transaction isolation or atomic updates, when it > absolutely does. MVCC just operates optimistically in these situations. > Under MVCC in the first example, an abort would occur. In the second > example under MVCC, the report created in T2 would be correct, and T1 > would commit without problems. In either case, nothing gets left on the > floor... > > The massive advantage of MVCC comes in the second example: If the > report required a significant amount of time to complete (it was reading > a huge number of rows), then serializing the transactions as described > in the paper will cause massive slow downs for the poor sucker who is > trying to execute T1 (and kill concurrency). In MVCC, the update would > happen immediately and return, even if T2 takes a long time to complete. > This absolutely maximizes concurrency, and is perfectly safe. > > > I'm going to need some serious convincing with concrete examples before > I'm willing to favor serializable transactions over MVCC's approach. My > concern at this stage is that much of the literature on these topics > either doesn't account for the radical departure from the norm that MVCC > is, or it is incorrect in it's description of how MVCC actually operates > in practice. > > I actually find it humorous that the concurrency control paper we are > talking about here says that "the problem for nondistributed DMBSs is > well understood... and one approach called two-phase locking has been > accepted as a standard solution." That may have been true in 1981 when > the paper was published, but the good folks at PostgreSQL and others > have pretty much blown that out of the water. > > This means that any broad claims made in these older papers have to be > taken with a considerable grain of salt, and evaluated in the context of > the new development of MVCC. Bernstein and Goodman are certainly going > to say that 2 phase locking is the only way to go - they weren't even > aware that a better approach was possible. > > Anyway - I'm going to be gone until the 28th - hopefully you and Alex > can continue the discussion with vigour while I'm gone! > > Cheers, > > - K > |
From: 'Kevin D. ' <ke...@tr...> - 2006-02-16 23:51:36
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE> <META content="MSHTML 6.00.2900.2802" name=GENERATOR></HEAD> <BODY leftMargin=1 topMargin=1 rightMargin=1><FONT face=Tahoma> <DIV><FONT face=Arial size=2>Alex-</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2></FONT></DIV> <DIV><FONT face=Arial size=2>Does that imply that enforcing serializability would defeat the purpose of MVCC, or are these totally separate issues? I guess if you had sufficient information before any of the transactions were started you could come up with an optimal strategy - but that seems like its way outside the realm of practical implementation for a general purpose database...</FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <DIV><FONT face=Arial size=2>- K</FONT></DIV> <DIV><FONT face=Arial size=2> </FONT> <TABLE> <TBODY> <TR> <TD width=1 bgColor=blue><FONT face=Arial size=2></FONT></TD> <TD><FONT face=Arial size=2><FONT color=red>> <BR>Serializability is more about how strict you are when doing conflict<BR>analysis.<BR><BR>To have total ordering (aka serializability) you must enforce that the<BR>read and write set of a transaction are completely disjoint from the<BR>write set of any transaction committed after the transaction started.<BR><BR>alex<BR><BR><BR>'Kevin Day ' wrote:<BR><BR>> Alex-<BR>> <BR>> Maybe you can help - can you explain to me how MVCC could have<BR>> anything to do serializability of transactions? Am I completely<BR>> off-base in saying that adding read locking to MVCC would defeat the<BR>> purpose?<BR>> <BR>> I'm obviously having a massive disconnect.<BR>> <BR>> Thanks,<BR>> <BR>> - K<BR>> <BR>> <BR>> <BR>><BR>> > Kevin,<BR>><BR>> MVCC does provide serializability. The tradeoffs among correct<BR>> concurrency control mechanisms involve performance not correctness.<BR>> The problems with performance for MVCC are that, by itself, it tends<BR>> to produce more transaction restarts and lower overall transaction<BR>> throughput when compared to 2PL. Postgres is some sort of hybrid<BR>> of 2PL and MVCC. I tried their chat group yesterday to get some<BR>> insight into exactly what kind of a hybrid, but no one could say "Oh,<BR>> we implemented XYZ".<BR>><BR>> I do not think that MVCC (or any hybrid of MVCC) is a silver bullet<BR>> for performance. Neither postgres nor Oracle is a clear performance<BR>> winner -- in fact I have personal experience with the one of these<BR>> platforms which places it an order of magnitude slower than jdbm for<BR>> our application (an semantic web store) using a solution developer by<BR>> the database people themselves.<BR>><BR>> I recommend a closer reading of this paper. I have found later articles<BR>> which use analytic techniques to examine the performance of a variety of<BR>> concurrency control mechanisms, but none which benchmarks real databases<BR>> explicitly in terms of this issue. Perhaps we could look at some of the<BR>> RDBMS benchmark literature for some insight there?<BR>><BR>> Enjoy your vacation,<BR>><BR>> -bryan<BR>><BR>> -----Original Message-----<BR>> From: <A href="mailto:jdb...@li..."><FONT color=#0000ff>jdb...@li...</FONT></A><BR>> <A href="mailto:jdb...@li..."><FONT color=#0000ff><mailto:jdb...@li...></FONT></A><BR>> To: JDBM Developer listserv<BR>> Sent: 2/16/2006 4:48 PM<BR>> Subject: re[2]: [Jdbm-developer] MC-DataSafe<BR>><BR>> Bryan-<BR>><BR>> Can you please provide a description of how MVCC (with non-serialized<BR>> processing of transactions) would cause either of the two lost data/bad<BR>> data examples described in the paper?<BR>><BR>> The way I read your email, it sounds like you are implying that MVCC<BR>> can't provide transaction isolation or atomic updates, when it<BR>> absolutely does. MVCC just operates optimistically in these situations.<BR>> Under MVCC in the first example, an abort would occur. In the second<BR>> example under MVCC, the report created in T2 would be correct, and T1<BR>> would commit without problems. In either case, nothing gets left on the<BR>> floor...<BR>><BR>> The massive advantage of MVCC comes in the second example: If the<BR>> report required a significant amount of time to complete (it was reading<BR>> a huge number of rows), then serializing the transactions as described<BR>> in the paper will cause massive slow downs for the poor sucker who is<BR>> trying to execute T1 (and kill concurrency). In MVCC, the update would<BR>> happen immediately and return, even if T2 takes a long time to complete.<BR>> This absolutely maximizes concurrency, and is perfectly safe.<BR>><BR>><BR>> I'm going to need some serious convincing with concrete examples before<BR>> I'm willing to favor serializable transactions over MVCC's approach. My<BR>> concern at this stage is that much of the literature on these topics<BR>> either doesn't account for the radical departure from the norm that MVCC<BR>> is, or it is incorrect in it's description of how MVCC actually operates<BR>> in practice.<BR>><BR>> I actually find it humorous that the concurrency control paper we are<BR>> talking about here says that "the problem for nondistributed DMBSs is<BR>> well understood... and one approach called two-phase locking has been<BR>> accepted as a standard solution." That may have been true in 1981 when<BR>> the paper was published, but the good folks at PostgreSQL and others<BR>> have pretty much blown that out of the water.<BR>><BR>> This means that any broad claims made in these older papers have to be<BR>> taken with a considerable grain of salt, and evaluated in the context of<BR>> the new development of MVCC. Bernstein and Goodman are certainly going<BR>> to say that 2 phase locking is the only way to go - they weren't even<BR>> aware that a better approach was possible.<BR>><BR>> Anyway - I'm going to be gone until the 28th - hopefully you and Alex<BR>> can continue the discussion with vigour while I'm gone!<BR>><BR>> Cheers,<BR>><BR>> - K<BR>><BR><BR><BR><BR>-------------------------------------------------------<BR>This SF.net email is sponsored by: Splunk Inc. Do you grep through log files<BR>for problems? Stop! Download the new AJAX search engine that makes<BR>searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!<BR><A href="http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642"><FONT color=#0000ff>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642</FONT></A><BR>_______________________________________________<BR>Jdbm-developer mailing list<BR><A href="mailto:Jdb...@li..."><FONT color=#0000ff>Jdb...@li...</FONT></A><BR><A href="https://lists.sourceforge.net/lists/listinfo/jdbm-developer"><FONT color=#0000ff>https://lists.sourceforge.net/lists/listinfo/jdbm-developer</FONT></A><BR><BR><<BR></FONT></FONT></TD></TR></TBODY></TABLE></DIV></FONT></BODY></HTML> |
From: Alex B. <boi...@in...> - 2006-02-17 00:08:13
|
Total ordering doesn't defeat the purpose of MVCC, you still get the benefit that a lot of transaction patterns can run concurrently without conflicting. In fact, optimistic locking provides the widest permissible range of non-conflicting transactions to execute concurrently. However, when transaction are potentially conflicting and there's contention for resources, pessimistic locking has proven to yield higher performance. If you know the read and write set of transactions in advance, it becomes trivial to order them for near-optimal performance. But in reality, you don't know in advance. That's why pessimistic locking comes into play. It's an easy way to obtain the read+write sets as you go so you can reduce the number of unnecessary rollbacks and maximize usage of non-contended resources. alex 'Kevin Day ' wrote: > Alex- > > Does that imply that enforcing serializability would defeat the > purpose of MVCC, or are these totally separate issues? I guess if you > had sufficient information before any of the transactions were started > you could come up with an optimal strategy - but that seems like its > way outside the realm of practical implementation for a general > purpose database... > > - K > > > > > Serializability is more about how strict you are when doing conflict > analysis. > > To have total ordering (aka serializability) you must enforce that the > read and write set of a transaction are completely disjoint from the > write set of any transaction committed after the transaction started. > > alex > |