Thread: [Prevayler-discussion] Prevayler and thread-safety
Brought to you by:
jsampson,
klauswuestefeld
From: Sergey D. <ser...@gm...> - 2009-01-25 10:19:51
Moving our discussion to the mail list...

> Hi,
>
> I'm considering using Prevayler for a web application. Could you answer a question about Prevayler and thread-safety?
>
> --------------------------------------------------------------
>
> Prevayler guarantees that all the writes (through its transactions) are synchronized. But what about reads?
>
> Is it right that dirty reads are possible if no explicit synchronization is used (in user code)?
>
> Are they possible if a business object is read as:
>
> // get the 3rd account
> Account account = ((Bank) prevayler.prevalentSystem()).getAccounts().get(2);
>
> ?
>
> If so, what synchronization strategies are good for user code?
>
> (Consider a business object A that contains a collection of business objects Bs.)
>
> * using a synchronized collection (of Bs inside of A), for example from the java.util.concurrent package?
> * synchronizing collection reads outside transactions with the collection writes inside transactions, for example using "synchronized (collection)" around reads and writes?
>
> --------------------------------------------------------------
>
> Cheers, Sergey

Klaus:

> Hi Sergey,
>
> Take a look at the javadoc for Prevayler.execute(Query)
>
> See you, Klaus.
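The two strategies in Sergey's question can be sketched in plain Java. This is a minimal sketch under stated assumptions: the Account, ConcurrentBank and LockedBank classes are hypothetical, and nothing here is Prevayler API. The first class relies on a java.util.concurrent collection, which makes a single read such as get(2) safe but does nothing for the mutable Account objects inside it; the second guards a plain ArrayList with synchronized (accounts) on both the write side (inside transactions) and the read side (outside them), which is what a compound read needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical business object, used only to illustrate the two strategies.
final class Account {
    private final String id;
    private long balance; // mutable state, guarded by this Account's own monitor

    Account(String id, long balance) { this.id = id; this.balance = balance; }
    String id() { return id; }
    synchronized long balance() { return balance; }
    synchronized void deposit(long amount) { balance += amount; }
}

// Strategy 1: a concurrent collection. A single get(2) is safe without locking,
// but this protects only the list structure, not the Accounts inside it.
final class ConcurrentBank {
    private final List<Account> accounts = new CopyOnWriteArrayList<>();

    void open(Account a) { accounts.add(a); }   // called from a transaction
    Account third() { return accounts.get(2); } // called outside transactions
}

// Strategy 2: one monitor (the list itself) guards writes made inside
// transactions and reads made outside them, including compound reads.
final class LockedBank {
    private final List<Account> accounts = new ArrayList<>();

    void open(Account a) {
        synchronized (accounts) { accounts.add(a); }
    }

    // A check-then-act read: without the lock, size() and get() could
    // observe different versions of the list.
    Account thirdOrNull() {
        synchronized (accounts) {
            return accounts.size() > 2 ? accounts.get(2) : null;
        }
    }
}
```

Either way, the Account objects themselves still need protection of their own, which is why Account synchronizes its balance accessors in this sketch.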
From: Sergey D. <ser...@gm...> - 2009-01-25 10:39:28
Attachments:
src.zip
Sorry for the mess; here is my answer in plain text:

First of all, thanks for your answer! I wrote a simple PoC for sensitive queries, and it clearly shows that dirty reads are still possible. That happens because JMatch does not make deep copies of the matched objects; it only copies the references to them. So if a user gets anything other than atomic values or immutable objects, he can observe dirty reads.

I attach my PoC (testing code).
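A single-threaded sketch of the effect Sergey describes, assuming a hypothetical Item class and a shallowResult method standing in for any query that copies references rather than objects (this is not JMatch code). The returned list is new, but its elements are the live objects, so a later change made by a transaction shows through. With real threads the situation is worse, since the reader may also see a half-applied change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable business object.
final class Item {
    String name;
    Item(String name) { this.name = name; }
}

final class ShallowCopyDemo {
    // What a reference-copying query effectively returns: a new list whose
    // elements are the very same objects that live in the prevalent system.
    static List<Item> shallowResult(List<Item> live) {
        return new ArrayList<>(live);
    }

    public static void main(String[] args) {
        List<Item> live = new ArrayList<>();
        live.add(new Item("draft"));

        List<Item> result = shallowResult(live); // the "query" finishes here

        // A later transaction mutates the live object...
        live.get(0).name = "published";

        // ...and the caller's "copy" sees it, because only the list was copied.
        System.out.println(result.get(0).name);           // prints "published"
        System.out.println(result.get(0) == live.get(0)); // prints "true"
    }
}
```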
From: William P. <wi...@sc...> - 2009-01-25 11:08:41
Hi, Sergey.

You are right that access like your examples is indeed unsynchronized. Klaus is right that if you want to do synchronized reads, you execute queries. This is the method to use:

http://docs.rakeshv.org/java/prevayler/org/prevayler/Prevayler.html#execute(org.prevayler.Query)

The simple way to think of it is that Prevayler provides transactional isolation by executing commands one at a time. So if you need a query that is isolated from all writes, package it up as a command object and feed it to the Prevayler object to execute.

Does that help?

William

------------------------------------------------------------------------------
This SF.net email is sponsored by:
SourcForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
_______________________________________________
To unsubscribe go to the end of this page: http://lists.sourceforge.net/lists/listinfo/prevayler-discussion
_______________________________________________
"Databases in Memoriam" -- http://www.prevayler.org
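The "one command at a time" idea can be illustrated with a toy executor. This is a simplified, hypothetical stand-in rather than Prevayler's implementation: SimpleTransaction, SimpleQuery, SerialExecutor and Bank are illustrative names, and Prevayler's own Transaction and Query interfaces have different signatures. Because transactions and queries take the same lock, a query never observes a transaction halfway through.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified command interfaces (not Prevayler's own).
interface SimpleTransaction<S> { void executeOn(S system); }
interface SimpleQuery<S, R> { R query(S system); }

// Toy executor: one lock for everything, so commands run one at a time.
final class SerialExecutor<S> {
    private final S system;
    private final Object lock = new Object();

    SerialExecutor(S system) { this.system = system; }

    void executeTransaction(SimpleTransaction<S> t) {
        synchronized (lock) { t.executeOn(system); }
    }

    <R> R executeQuery(SimpleQuery<S, R> q) {
        synchronized (lock) { return q.query(system); }
    }
}

// Hypothetical prevalent system.
final class Bank {
    final List<Long> balances = new ArrayList<>();
}
```

Isolation ends when executeQuery returns. A query that sums the balances and returns a Long leaks nothing mutable; one that returned b.balances itself would hand the caller a live list, which is exactly the problem raised in the rest of this thread.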
From: Sergey D. <ser...@gm...> - 2009-01-25 12:35:17
Hi William,

My point is:

1. A sensitive query is indeed synchronized with commands (Transactions).
2. The code that accesses the query result is not synchronized with Transactions.

So after a user safely executes a sensitive query, she is going to unsafely access the query results (unless they are atomic values or immutable objects).

So the solution is either
1. to implement deep object cloning in JMatch (I hope I did not miss it if it is already there :) ), or
2. to wrap object accesses in additional "synchronized (workingObject)" blocks (or use java.util.concurrent features).

Are you proposing to
3. move complex read-only logic into separate Transactions?

Is there anything suitable in JMatch that I missed?

Cheers, Sergey
From: Klaus W. <kla...@gm...> - 2009-01-25 13:43:43
> 2. The code that accesses query result is not synchronized with Transactions

Why not?

Can you not treat every HTTP POST as a transaction and every HTTP GET as a sensitive query?

You can do that in a single point in your code and then forget all about transactions and queries. Logically, you will be inside a web app in RAM that never crashes.

See you, Klaus.
From: Sergey D. <ser...@gm...> - 2009-01-25 13:53:36
Klaus, could you clarify? I don't quite understand your explanation.

You can run my example code on a multiprocessor system to see that it's quite possible for a reading thread to observe inconsistent results, even though it takes the result from "execute(sensitiveQuery)".
From: Klaus W. <kla...@gm...> - 2009-01-25 14:05:16
> > Can you not treat every http POST as a transaction and every http GET
> > as a sensitive query?
> Klaus, could you clarify? I don't quite understand your explanation.

Wrap Prevayler around your entire web app, not only your business logic.

So now every piece of code you execute is either a Transaction (from HTTP POSTs) or a synchronized query (from HTTP GETs). There is no more accessing business-object code from "outside", because there is no more "outside".

See you, Klaus.
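Klaus's suggestion can be sketched as a single front controller. Assumptions: the method/key/value parameters stand in for a real servlet request, Guestbook is a made-up prevalent system, and a plain lock stands in for Prevayler executing one command at a time. Since the whole request, including building the response body, runs inside one command, no business-object access happens "outside"; returning a String also lets the caller do the network write after the lock is released.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical prevalent system for a tiny web app.
final class Guestbook {
    final Map<String, String> entries = new HashMap<>();
}

// Single entry point: every request runs entirely inside one command.
final class PrevalentFrontController {
    private final Guestbook system = new Guestbook();
    private final Object lock = new Object(); // stands in for Prevayler's one-at-a-time execution

    // method/key/value stand in for a real HTTP request (hypothetical shape).
    String handle(String method, String key, String value) {
        synchronized (lock) {
            if ("POST".equals(method)) {
                // Would be a Transaction in Prevayler: journaled, then applied.
                system.entries.put(key, value);
                return "stored " + key;
            } else if ("GET".equals(method)) {
                // Would be a sensitive query: read-only, isolated from writes.
                String v = system.entries.get(key);
                return v == null ? "not found" : key + "=" + v;
            }
            throw new IllegalArgumentException("unsupported method: " + method);
        }
    }
}
```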
From: William P. <wi...@sc...> - 2009-01-25 17:31:57
Klaus Wuestefeld wrote:
> Wrap Prevayer around your entire web app, not only your business logic.
>
> So now every code you execute is either a Transaction (from http
> POSTs) or a synchronized query (from http GETs). There is no more
> accessing business object code from "outside" because there is no more
> "outside".

For those trying this in practice, make sure your output is properly buffered. Otherwise one user on a slow connection will hang your system from time to time. :-)

Klaus, this discussion makes me wonder: does Prevayler execute multiple simultaneous queries in parallel? Writes, I'm sure, are one at a time. But given that most web apps are read-heavy, and given the ever-increasing number of cores available, it would make sense to do your GETs in simultaneous batches once you reach a certain level of load.

William
From: Sergey D. <ser...@gm...> - 2009-01-25 17:35:02
Klaus, this does not solve the "dirty reads" problem. It just lowers its likelihood, because there is less time between the end of the query and the reading of its results.

Also, it can decrease performance on a multiprocessor system, because every HTTP POST request blocks other POST requests from the very start. That means that 7 other processor cores wait for a single POST request to finish.

I'm totally OK with writing "synchronized" clauses; I just thought somebody had experience with preventing dirty reads and could suggest a safe and efficient pattern.
From: William P. <wi...@sc...> - 2009-01-25 13:09:49
Hi, Sergey. Thanks for explaining further.

I've never used JMatch, so I can't speak to that.

Looking through the Prevayler code I've worked on, I see four patterns in our query response objects:

* returning immutable domain objects (say 30% of queries)
* building data transfer objects (40%, generally as display-layer objects)
* building meaningful result objects (20%)
* not caring (10%)

I was expecting to find some deep copies, but didn't see any.

In the "not caring" case, it's because displaying updates would be either harmless or beneficial.

I suspect that's not much help to you, but it's the best I've got myself. Perhaps others more familiar with JMatch will have better advice.

William
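A sketch of the data-transfer-object pattern from William's list, using hypothetical Customer and CustomerView classes. The idea is to call CustomerView.of inside a query, so the copy is taken while transactions are excluded; afterwards the view holds only final fields, a String, and a copied, unmodifiable list, so later transactions cannot change what the caller sees.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical mutable domain object living in the prevalent system.
final class Customer {
    String name;
    final List<String> orderIds = new ArrayList<>();
    Customer(String name) { this.name = name; }
}

// Display-layer DTO: every field is final and holds only immutable data.
final class CustomerView {
    final String name;
    final List<String> orderIds;

    CustomerView(String name, List<String> orderIds) {
        this.name = name;
        // Copy, then wrap: later changes to the domain list are not visible here.
        this.orderIds = Collections.unmodifiableList(new ArrayList<>(orderIds));
    }

    // Meant to be called inside a query, while writes are excluded.
    static CustomerView of(Customer c) {
        return new CustomerView(c.name, c.orderIds);
    }
}
```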
From: Sergey D. <ser...@gm...> - 2009-01-25 13:53:55
Hi William,

Thank you a lot for your detailed answer.

As for "real-world" cases, I suppose these dirty reads can be observed in rare cases under high concurrent load. So it's still OK for a lot of applications. However, it can lead to hard-to-catch bugs in more delicate applications.
From: William P. <wi...@sc...> - 2009-01-25 17:54:35
Sergey Didenko wrote:
> Hi William,
>
> Thank you a lot for your detailed answer.
>
> As for the "real-world" cases I suppose these dirty-reads can be
> observed in rare cases under high concurrent load. So it's still ok
> for a lot of applications. However that can lead to hard-to-catch bugs
> for more delicate applications.

Yeah, the 10% or so where we passed out mutable domain objects were things where we expected the changes were fine.

To take a made-up example, consider a simple web forum, where the MemberProfile object might have some strings like display name, location, and favorite quote. If somebody updated all three fields at once, it's possible that another user looking at that profile might see only one of them updated on that viewing. But nobody cares, so in this case it's not worth copying the data while in the query.

Naturally, this isn't something you'd do most of the time; as you say, it can lead to subtle bugs.

William
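One cheap way to make William's MemberProfile case consistent without copying in every query, sketched with hypothetical ProfileData and MemberProfile classes: keep the fields that change together in one immutable value and replace it with a single write to a volatile field. A reader that calls snapshot() once and uses only the returned object sees either the old three values or the new three, never a mix. This assumes updates still arrive through serialized transactions and that one reference swap captures the whole update.

```java
// Immutable snapshot of the fields that change together (hypothetical).
final class ProfileData {
    final String displayName;
    final String location;
    final String favoriteQuote;

    ProfileData(String displayName, String location, String favoriteQuote) {
        this.displayName = displayName;
        this.location = location;
        this.favoriteQuote = favoriteQuote;
    }
}

final class MemberProfile {
    // volatile makes the newest snapshot visible to other threads; the final
    // fields in ProfileData mean a reader never sees a partly built one.
    private volatile ProfileData data;

    MemberProfile(ProfileData initial) { this.data = initial; }

    // Called from a transaction: one write replaces all three fields at once.
    void update(String displayName, String location, String favoriteQuote) {
        data = new ProfileData(displayName, location, favoriteQuote);
    }

    // Safe to call outside a query: read the reference once, then use only it.
    ProfileData snapshot() { return data; }
}
```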
From: Justin T. S. <ju...@kr...> - 2009-01-25 20:06:59
Replying to the whole thread...

Sergey, yes, everything you've said is true. Unfortunately, the demos included with Prevayler are not thread-safe, so they're no help here.

I almost always use Query and make sure it returns an immutable result. And I do advocate your #3: "Move complex read-only logic into separate Transactions." Well, Queries rather than Transactions. The point is, I think that just doing "CRUD"-style Queries and Transactions with complex logic on the outside is the wrong way to go; it kind of misses the point of Prevayler. Your business logic should be in your business objects, which should be in your prevalent system.

The options William described are perfect. I can't really endorse the "not caring" case because it's easy to go just a little too far: it *might* be okay if all the fields in question are strings or primitives (except longs and doubles, whose unsynchronized reads and writes are not guaranteed to be atomic), but the problem with unsynchronized access to shared state is that you really don't know what you're going to get. In general, you might see things that synchronized access would *never* see (not just out-of-date values). For example, if you try to read from a HashMap in one thread while it's being updated in another thread, you could easily get a NullPointerException or even go into an infinite loop.

As for running multiple queries concurrently, Prevayler doesn't currently (as of 2.3) support that, but probably will soon since it gets requested so often. (I did implement it on the java5_experiment branch.) However, even so, I wouldn't want a Query to actually be doing output; it should still really only be accessing the prevalent system and getting out as quickly as possible. I do occasionally go with the style of rendering the HTTP response within a Query; however, I would do that by writing to a StringBuilder and returning that from the Query, not by writing out to the client directly.

I once did implement something very close to what Klaus described, wrapping each HTTP request in a Prevayler transaction or query. (We did it at NewEdu, where William and I worked together.) We saw it as the first step in adding Prevayler to a system that had no persistence at all yet. We were able to drop Prevayler into the system in a few days, most of which time was testing. For total correctness, we actually started by wrapping *all* requests in Transactions, just in case some GETs did modify the system. Then we gradually converted most of them to Queries for performance. Over time, we ended up factoring various parts of the code out of those Transactions and Queries, as they didn't *quite* make sense to be exactly the same layer of code as the requests coming in.

Cheers, Justin
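Justin's rendering style can be sketched as follows, with hypothetical Forum and ForumPages classes and a plain lock standing in for Prevayler's query synchronization. The page is built into a StringBuilder inside the query and only the finished String leaves it; the write to the (possibly slow) client happens afterwards, outside the lock.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical prevalent system; its logic stays on the business object.
final class Forum {
    private final List<String> titles = new ArrayList<>();

    void post(String title) { titles.add(title); }

    // Live list: safe to iterate only while writes are excluded (inside a query).
    List<String> titles() { return titles; }
}

final class ForumPages {
    private final Forum forum = new Forum();
    private final Object lock = new Object(); // stand-in for Prevayler's serialization

    void transaction(Consumer<Forum> t) {
        synchronized (lock) { t.accept(forum); }
    }

    <R> R query(Function<Forum, R> q) {
        synchronized (lock) { return q.apply(forum); }
    }

    // Render into a StringBuilder while isolated; only an immutable String escapes.
    // (HTML escaping is omitted to keep the sketch short.)
    String renderIndex() {
        return query(f -> {
            StringBuilder html = new StringBuilder("<ul>");
            for (String title : f.titles()) {
                html.append("<li>").append(title).append("</li>");
            }
            return html.append("</ul>").toString();
        });
    }

    // The potentially slow write to the client happens after the lock is released.
    void serveIndex(Writer client) throws IOException {
        String page = renderIndex();
        client.write(page);
        client.flush();
    }
}
```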
From: William P. <wi...@sc...> - 2009-01-25 20:26:43
Sergey Didenko wrote:
>> Wrap Prevayer around your entire web app, not only your business logic.
>>
>> So now every code you execute is either a Transaction (from http
>> POSTs) or a synchronized query (from http GETs). There is no more
>> accessing business object code from "outside" because there is no more
>> "outside".
>>
> Klaus, this does not solve "dirty-reads" problem. It just lowers its
> possibility, because there is less time between query ends and query
> results read.
>

How do you mean? It would seem to me that if you do all of your request handling, from initial parameter processing to final output buffer writes, inside a query or a transaction, then there is no time for dirty reads.

> Also it can decrease performance on multi-processor system, because
> every http POST request blocks other POST requests from the very
> start. That means that 7 other processor cores wait for a single POST
> request to finish.

That is definitely a problem in theory, but for a lot of applications, it may not matter much. Typical web applications are very read-heavy. A prevalent system gives you such a performance boost compared with a database-backed system that the global write lock could still be much more efficient. When you start to push the boundaries of that, you could invest in finer-grained locking. Or you could start to distribute your app across multiple machines.

Unless you're pretty sure that your load will plateau at a level where a global write lock is insufficient but you won't need multiple machines, it may be a better use of development time to skip the fine-grained locking and go right for splitting your app up. Either way, going with a global write lock would buy you a lot of time with pretty low development overhead. For a typical consumer traffic mix, I'm sure you could get 10m dynamic pageviews/month like that on a single commodity server, and I wouldn't be surprised if you topped 50m.

William
From: Sergey D. <ser...@gm...> - 2009-01-25 21:07:42
Thanks for the explanations, guys!

Now I see that you propose to extend net.sourceforge.javamatch.query.MatchQuery for every sensitive query that is more complex than the standard javamatch queries, and to return only atomic values, deep object copies, or immutable objects.

Also, I see why you think that can be preferable to fine-grained locking: using one global lock scheme can speed up development.

BTW, it would be really good to put this info on the site. Unfortunately, Prevayler has very low exposure in the programming community.
From: Sergey D. <ser...@gm...> - 2009-01-27 21:21:10
Guys, I am studying Prevayler further and see other cases where a false feeling of (thread-)safety can occur.

Consider TransactionWithQuery. There is no warning, either in the API or in the documentation, that returning a mutable business object can lead to a dirty read in a multithreaded application. Maybe it would be good to have a special class that makes a deep clone of the result; maybe it would be enough to write a warning in the javadoc. Maybe it suffices to write a special article on the site, like the one for the "baptism problem".

I hope I will come up with good ideas about this later, while studying and using Prevayler. However, I want to draw your attention to this problem.

Cheers, Sergey
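A sketch of the safer shape for a transaction that also returns a result. TxWithQuery below is a hypothetical, simplified analogue, not Prevayler's TransactionWithQuery interface (whose real signature differs), and Cart and AddItem are illustrative names. The point is where the copy happens: inside the transaction, while writes are excluded, rather than handing back the live list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical analogue of a transaction that also returns a result.
interface TxWithQuery<S, R> { R executeAndQuery(S system); }

// Hypothetical prevalent system.
final class Cart {
    final List<String> items = new ArrayList<>();
}

final class AddItem implements TxWithQuery<Cart, List<String>> {
    private final String item;
    AddItem(String item) { this.item = item; }

    @Override
    public List<String> executeAndQuery(Cart cart) {
        cart.items.add(item);
        // Risky alternative: "return cart.items;" hands the caller the live list,
        // which later transactions keep mutating from other threads.
        // Safer: an unmodifiable snapshot taken while writes are still excluded.
        return Collections.unmodifiableList(new ArrayList<>(cart.items));
    }
}
```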
From: Klaus W. <kla...@gm...> - 2009-01-27 23:10:40
Our hand-holding responsibilities only go so far.

The baptism problem is something introduced by serialization, so we document it.

The fact that multithreaded code is tricky is not introduced by Prevayler. The issues would still exist even if you had an invulnerable VM in RAM, without Prevayler.

The question you have to ask yourself before using Prevayler is: "If I had an invulnerable VM, would that be cool? Would I be capable of using it? Or am I too DBMS-atrophied?"

With great power comes great responsibility. :P

See you, Klaus.
From: <mi...@ik...> - 2009-01-28 19:39:16
Hi.

If you are using multiple threads in your application, you have a responsibility to synchronize carefully. This is nothing new. At least Prevayler helps you with the updates. But reading shared data in concurrent threads may require locks of some kind.

I still think Prevayler enables something not even possible with many other frameworks: actually using the "original" object, for good and bad. The performance, simplicity and small code overhead are simply unrivaled in my eyes.

I fully understand your concern about how to successfully handle concurrent access to the data model. It is no simple problem. I can tell you that we've considered different solutions in (one of) my current projects, and have come to the conclusion that we should deep clone most results from our queries.

We actually started with the idea of maximizing the use of immutable objects, but it is a pain having immutable objects in the data model if they are not "naturally" immutable or "almost primitives". Multiple references to an immutable object are impractical when it needs updating, etc. So my recommendation is: use immutable objects for "leaf" objects, and for objects that actually shouldn't change. (Kind of obvious if you think about it.) Do not use immutable objects for things like users, preferences, etc.

We deep clone our objects using serialization, because we're lazy. It's a very simple way to make deep clones, and (often) one that people feel comfortable with and whose side effects they know.

Another way to avoid exposing the data model outside of Prevayler is to have custom-built responses for queries (I find this a bit "unproductive"), or to wrap everything inside Prevayler (in which case you are in practice single-threaded).

Good luck!

/Mikael
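Mikael's "deep clone via serialization" approach can be sketched with standard java.io serialization. DeepCopy and UserPrefs are hypothetical names; everything reachable from the copied object must be Serializable, transient fields come back with default values, and the cost grows with the size of the object graph. The copy would be taken inside the query, so the graph is read while transactions are excluded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

final class DeepCopy {
    private DeepCopy() {}

    // Deep-clones an object graph by serializing it to memory and reading it back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T of(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            } // closing flushes everything into 'bytes'
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deep copy failed", e);
        }
    }
}

// Hypothetical domain object.
final class UserPrefs implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> favorites = new ArrayList<>();
}
```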
From: Klaus W. <kla...@gm...> - 2009-01-28 22:11:58
Nice comments :)

> or
> wrap everything inside Prevayler (in which case you are in practice
> single-threaded).

Actually, you can have queries running in parallel with Prevayler by using a system-wide read/write lock for queries/transactions. Justin and I have already independently implemented that in experimental future versions of Prevayler. We just have to back-port it. Justin can do it in 3 mins, I in about 47.

Then you just have to make sure you locally synchronize the lazy inits/evals, which is pretty simple.

See you, Klaus.
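A toy version of the read/write-lock scheme Klaus describes, not the experimental Prevayler code he mentions: ReadWriteExecutor and Catalog are hypothetical names. Queries share the read lock and may run in parallel; transactions take the write lock and run alone. Because several queries can now be inside Catalog at once, its lazily cached total is guarded by the object's own monitor, which is the kind of local synchronization of lazy inits that Klaus refers to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.Function;

// Toy stand-in: queries share the read lock, transactions take the write lock.
final class ReadWriteExecutor<S> {
    private final S system;
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    ReadWriteExecutor(S system) { this.system = system; }

    <R> R query(Function<S, R> q) {
        rw.readLock().lock();
        try { return q.apply(system); } finally { rw.readLock().unlock(); }
    }

    void transaction(Consumer<S> t) {
        rw.writeLock().lock();
        try { t.accept(system); } finally { rw.writeLock().unlock(); }
    }
}

// Hypothetical business object with a lazily computed value. Two parallel
// queries may both trigger the lazy init, so it is synchronized locally.
final class Catalog {
    private final List<Integer> prices = new ArrayList<>();
    private Integer cachedTotal; // guarded by this

    synchronized void addPrice(int p) { prices.add(p); cachedTotal = null; }

    synchronized int total() {
        if (cachedTotal == null) {
            int sum = 0;
            for (int p : prices) sum += p;
            cachedTotal = sum;
        }
        return cachedTotal;
    }
}
```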
From: Sergey D. <ser...@gm...> - 2009-01-29 07:23:49
Thanks, Mikael!

Klaus, my main point is to make all these considerations explicit, so that people don't have to study the Prevayler and JMatch code deeply to write their prevalent (safe) multithreaded applications.
From: Klaus W. <kla...@gm...> - 2009-01-29 15:07:42
OK