Thread: [SQLObject] collision detection strategy
SQLObject is a Python ORM.
Brought to you by:
ianbicking,
phd
From: Bud P. B. <bu...@si...> - 2003-05-20 22:33:58
|
Had an idea for collision detection (see http://www.agiledata.org/essays/concurrencyControl.html) and it seems quite a bit simpler to what is described in the above essay. Can someone comment on whether the following is sound? * The shadow data of an object includes a timestamp (or better GUID) * every INSERT and UPDATE query changes this timestamp. * The (in memory) python object keeps the timestamp information from the last select, insert, or update query. * When the state of an object is written to the dbms (UPDATE query), collision detection is necessary to make sure noone else modified the object since... * this can be done by the following query: UPDATE table SET col1=val1, ..., colN = valN, timeStamp = <newTimeStamp> WHERE id=<id> AND timeStamp = <oldTimeStamp> This works only if none else has modified the record (since otherwise, timeStamp is not equal <oldTimeStamp>). With the DB-API, it seems the way to determine whether the update was successful or a collision prevented the database from being changed is to look at the rowcount attribute of the cursor: 1 means success, 0 means collision. I checked this with psycopg and it seems to work. I suppose it would also be sound to wrap this in a transaction... Does anyone see any problems with this approach? many thanks --b /----------------------------------------------------------------- | Bud P. Bruegger, Ph.D. | Sistema (www.sistema.it) | Via U. Bassi, 54 | 58100 Grosseto, Italy | +39-0564-411682 (voice and fax) \----------------------------------------------------------------- |
From: Matt G. <ma...@po...> - 2003-05-20 23:07:16
|
I've used this technique (optimistic locking) before and it works very well. It's the only way for web work. The only thing I would say is that a timestamp is not the ideal indicator of change. It might be unlikely, but it is possible for a database record to be read and updated without the timestamp's value changing. In fact, some database timestamps (Oracle 8i, IIRC) only have a 1 second accuracy which increases the chance of missed collisions considerably. An incrementing integer version number is possibly a better choice. The only problem with this is when the number gets to big for the column - PostgreSQL, for instance, considers it an error. Cheers, Matt On Tue, 2003-05-20 at 23:29, Bud P. Bruegger wrote: > Had an idea for collision detection (see > http://www.agiledata.org/essays/concurrencyControl.html) and it seems > quite a bit simpler to what is described in the above essay. Can > someone comment on whether the following is sound? > > * The shadow data of an object includes a timestamp (or better GUID) > > * every INSERT and UPDATE query changes this timestamp. > > * The (in memory) python object keeps the timestamp information from > the last select, insert, or update query. > > * When the state of an object is written to the dbms (UPDATE query), > collision detection is necessary to make sure noone else modified > the object since... > > * this can be done by the following query: > UPDATE table SET > col1=val1, > ..., > colN = valN, > timeStamp = <newTimeStamp> > WHERE id=<id> > AND timeStamp = <oldTimeStamp> > > This works only if none else has modified the record (since otherwise, > timeStamp is not equal <oldTimeStamp>). > > With the DB-API, it seems the way to determine whether the update was > successful or a collision prevented the database from being changed is > to look at the rowcount attribute of the cursor: 1 means success, 0 > means collision. > > I checked this with psycopg and it seems to work. > > I suppose it would also be sound to wrap this in a transaction... > > Does anyone see any problems with this approach? > > many thanks > --b > > > > /----------------------------------------------------------------- > | Bud P. Bruegger, Ph.D. > | Sistema (www.sistema.it) > | Via U. Bassi, 54 > | 58100 Grosseto, Italy > | +39-0564-411682 (voice and fax) > \----------------------------------------------------------------- > > > ------------------------------------------------------- > This SF.net email is sponsored by: ObjectStore. > If flattening out C++ or Java code to make your application fit in a > relational database is painful, don't do it! Check out ObjectStore. > Now part of Progress Software. http://www.objectstore.net/sourceforge > _______________________________________________ > sqlobject-discuss mailing list > sql...@li... > https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss |
From: Luke O. <lu...@me...> - 2003-05-20 23:12:38
|
Bud - Your example appears to me to be optimistic row-level locking, straight forward. At first glance it seems like a reasonable implementation, although the big question with optimistic/detection schemes is how do you recover? I generally think this needs to be at the programmer's discretion, both for recovery method (typically overwrite or exception in my experience) and for scope (for X transaction, is attribute-level reasonable?) The other thought I'm having right now is that SQLObject seems to simplify some of this if you're using cached objects, because now there is only one possible object, so it's all on the python side to deal with locking as the developer sees fit. but I haven't entirely thought this through. Also, transactions or otherwise, it would seem that with cached objects it would be possible to support setting and managing what level of locking is appropriate on a class(table) or instance(row) or attribute level basis. My brain is already fried, so i won't try to think this out anymore. - Luke |
From: Matt G. <ma...@po...> - 2003-05-20 23:25:58
|
On Tue, 2003-05-20 at 23:57, Luke Opperman wrote: > Bud - <snip> > The other thought I'm having right now is that SQLObject seems to simplify some > of this if you're using cached objects, because now there is only one possible > object, so it's all on the python side to deal with locking as the developer > sees fit. but I haven't entirely thought this through. This assumes that **all** updates are going through SQLObject in a single process. The lock check should definitely happen in the database - it's the only part of the system that truly knows. Obviously, if you've got another process bypassing SQLObject's caching then you've got other problems anyway ;-). > > Also, transactions or otherwise, it would seem that with cached objects it > would be possible to support setting and managing what level of locking is > appropriate on a class(table) or instance(row) or attribute level basis. > > My brain is already fried, so i won't try to think this out anymore. > > - Luke > > > > ------------------------------------------------------- > This SF.net email is sponsored by: ObjectStore. > If flattening out C++ or Java code to make your application fit in a > relational database is painful, don't do it! Check out ObjectStore. > Now part of Progress Software. http://www.objectstore.net/sourceforge > _______________________________________________ > sqlobject-discuss mailing list > sql...@li... > https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss |
From: Bud P. B. <bu...@si...> - 2003-05-20 23:33:18
|
Thanks, Luke. and here some reactions from an equally fried bains.. On Tue, 20 May 2003 17:57:52 -0500 Luke Opperman <lu...@me...> wrote: > Bud - > > Your example appears to me to be optimistic row-level locking, straight > forward. At first glance it seems like a reasonable implementation, although > the big question with optimistic/detection schemes is how do you recover? I > generally think this needs to be at the programmer's discretion, both for > recovery method (typically overwrite or exception in my experience) and for > scope (for X transaction, is attribute-level reasonable?) Definitely a difficult question! I will first opt for simple exception... > The other thought I'm having right now is that SQLObject seems to simplify > some of this if you're using cached objects, because now there is only one > possible object, so it's all on the python side to deal with locking as the > developer sees fit. but I haven't entirely thought this through. This is true if there is a single cache, ie. a single interpreter. But I'll likely run in a multi-process and possibly multi-machine environment... Keeping a cache coherent in a distributed setting is probably a non-trivial task.. One thing I've thought of is to funnel all write activity into a single process... Not sure how difficult this is.. But I think I'll keep it simple for starters.. > Also, transactions or otherwise, it would seem that with cached objects it > would be possible to support setting and managing what level of locking is > appropriate on a class(table) or instance(row) or attribute level basis. I have looked at the concurrancy control doc of PostgreSQL and am quite impressed. If I wasn't too tired to understand, they decide on the "separation level" per transaction (not per table or instance...). cheers -b |
From: Bud P. B. <bu...@si...> - 2003-05-20 23:23:12
|
Thanks for the feedback! Actually, in my ongoing implementation, I have used a GUID generator that I found in the Python Cookbook. It uses time (up to the second???) but also mixes in the IP of the machine and a random number. That way, problems with lack of time synch across machines, second-only time, and similar can be avoided. Do you see any need to take out a lock for the update query? I suppse it is highly unlikely (in my app, at least), that two updates on the same row happen at the same time... --b On 21 May 2003 00:06:58 +0100 Matt Goodall <ma...@po...> wrote: > I've used this technique (optimistic locking) before and it works very > well. It's the only way for web work. > > The only thing I would say is that a timestamp is not the ideal > indicator of change. It might be unlikely, but it is possible for a > database record to be read and updated without the timestamp's value > changing. In fact, some database timestamps (Oracle 8i, IIRC) only have > a 1 second accuracy which increases the chance of missed collisions > considerably. > > An incrementing integer version number is possibly a better choice. The > only problem with this is when the number gets to big for the column - > PostgreSQL, for instance, considers it an error. > > Cheers, Matt > > On Tue, 2003-05-20 at 23:29, Bud P. Bruegger wrote: > > Had an idea for collision detection (see > > http://www.agiledata.org/essays/concurrencyControl.html) and it seems > > quite a bit simpler to what is described in the above essay. Can > > someone comment on whether the following is sound? > > > > * The shadow data of an object includes a timestamp (or better GUID) > > > > * every INSERT and UPDATE query changes this timestamp. > > > > * The (in memory) python object keeps the timestamp information from > > the last select, insert, or update query. > > > > * When the state of an object is written to the dbms (UPDATE query), > > collision detection is necessary to make sure noone else modified > > the object since... > > > > * this can be done by the following query: > > UPDATE table SET > > col1=val1, > > ..., > > colN = valN, > > timeStamp = <newTimeStamp> > > WHERE id=<id> > > AND timeStamp = <oldTimeStamp> > > > > This works only if none else has modified the record (since otherwise, > > timeStamp is not equal <oldTimeStamp>). > > > > With the DB-API, it seems the way to determine whether the update was > > successful or a collision prevented the database from being changed is > > to look at the rowcount attribute of the cursor: 1 means success, 0 > > means collision. > > > > I checked this with psycopg and it seems to work. > > > > I suppose it would also be sound to wrap this in a transaction... > > > > Does anyone see any problems with this approach? > > > > many thanks > > --b > > > > > > > > /----------------------------------------------------------------- > > | Bud P. Bruegger, Ph.D. > > | Sistema (www.sistema.it) > > | Via U. Bassi, 54 > > | 58100 Grosseto, Italy > > | +39-0564-411682 (voice and fax) > > \----------------------------------------------------------------- > > > > > > ------------------------------------------------------- > > This SF.net email is sponsored by: ObjectStore. > > If flattening out C++ or Java code to make your application fit in a > > relational database is painful, don't do it! Check out ObjectStore. > > Now part of Progress Software. http://www.objectstore.net/sourceforge > > _______________________________________________ > > sqlobject-discuss mailing list > > sql...@li... > > https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss > > > > ------------------------------------------------------- > This SF.net email is sponsored by: ObjectStore. > If flattening out C++ or Java code to make your application fit in a > relational database is painful, don't do it! Check out ObjectStore. > Now part of Progress Software. http://www.objectstore.net/sourceforge > _______________________________________________ > sqlobject-discuss mailing list > sql...@li... > https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss > /----------------------------------------------------------------- | Bud P. Bruegger, Ph.D. | Sistema (www.sistema.it) | Via U. Bassi, 54 | 58100 Grosseto, Italy | +39-0564-411682 (voice and fax) \----------------------------------------------------------------- |
From: Matt G. <ma...@po...> - 2003-05-20 23:34:20
|
On Wed, 2003-05-21 at 00:22, Bud P. Bruegger wrote: > Thanks for the feedback! > > Actually, in my ongoing implementation, I have used a GUID generator that I > found in the Python Cookbook. It uses time (up to the second???) but > also mixes in the IP of the machine and a random number. That way, > problems with lack of time synch across machines, second-only time, > and similar can be avoided. Sounds like a reasonable solution. You'd have to be really unlucky to get problems now - 2 users, both using a web site from behind the same masquerading firewall and are both given the same random number ;-). > > > Do you see any need to take out a lock for the update query? I suppse > it is highly unlikely (in my app, at least), that two updates on the > same row happen at the same time... Can you explain what you mean by "take out a lock"? > > --b > > On 21 May 2003 00:06:58 +0100 > Matt Goodall <ma...@po...> wrote: > > > I've used this technique (optimistic locking) before and it works very > > well. It's the only way for web work. > > > > The only thing I would say is that a timestamp is not the ideal > > indicator of change. It might be unlikely, but it is possible for a > > database record to be read and updated without the timestamp's value > > changing. In fact, some database timestamps (Oracle 8i, IIRC) only have > > a 1 second accuracy which increases the chance of missed collisions > > considerably. > > > > An incrementing integer version number is possibly a better choice. The > > only problem with this is when the number gets to big for the column - > > PostgreSQL, for instance, considers it an error. > > > > Cheers, Matt > > > > On Tue, 2003-05-20 at 23:29, Bud P. Bruegger wrote: > > > Had an idea for collision detection (see > > > http://www.agiledata.org/essays/concurrencyControl.html) and it seems > > > quite a bit simpler to what is described in the above essay. Can > > > someone comment on whether the following is sound? > > > > > > * The shadow data of an object includes a timestamp (or better GUID) > > > > > > * every INSERT and UPDATE query changes this timestamp. > > > > > > * The (in memory) python object keeps the timestamp information from > > > the last select, insert, or update query. > > > > > > * When the state of an object is written to the dbms (UPDATE query), > > > collision detection is necessary to make sure noone else modified > > > the object since... > > > > > > * this can be done by the following query: > > > UPDATE table SET > > > col1=val1, > > > ..., > > > colN = valN, > > > timeStamp = <newTimeStamp> > > > WHERE id=<id> > > > AND timeStamp = <oldTimeStamp> > > > > > > This works only if none else has modified the record (since otherwise, > > > timeStamp is not equal <oldTimeStamp>). > > > > > > With the DB-API, it seems the way to determine whether the update was > > > successful or a collision prevented the database from being changed is > > > to look at the rowcount attribute of the cursor: 1 means success, 0 > > > means collision. > > > > > > I checked this with psycopg and it seems to work. > > > > > > I suppose it would also be sound to wrap this in a transaction... > > > > > > Does anyone see any problems with this approach? > > > > > > many thanks > > > --b > > > > > > > > > > > > /----------------------------------------------------------------- > > > | Bud P. Bruegger, Ph.D. > > > | Sistema (www.sistema.it) > > > | Via U. Bassi, 54 > > > | 58100 Grosseto, Italy > > > | +39-0564-411682 (voice and fax) > > > \----------------------------------------------------------------- > > > > > > > > > ------------------------------------------------------- > > > This SF.net email is sponsored by: ObjectStore. > > > If flattening out C++ or Java code to make your application fit in a > > > relational database is painful, don't do it! Check out ObjectStore. > > > Now part of Progress Software. http://www.objectstore.net/sourceforge > > > _______________________________________________ > > > sqlobject-discuss mailing list > > > sql...@li... > > > https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss > > > > > > > > ------------------------------------------------------- > > This SF.net email is sponsored by: ObjectStore. > > If flattening out C++ or Java code to make your application fit in a > > relational database is painful, don't do it! Check out ObjectStore. > > Now part of Progress Software. http://www.objectstore.net/sourceforge > > _______________________________________________ > > sqlobject-discuss mailing list > > sql...@li... > > https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss > > > > > /----------------------------------------------------------------- > | Bud P. Bruegger, Ph.D. > | Sistema (www.sistema.it) > | Via U. Bassi, 54 > | 58100 Grosseto, Italy > | +39-0564-411682 (voice and fax) > \----------------------------------------------------------------- |
From: Bud P. B. <bu...@si...> - 2003-05-21 07:27:52
|
> Can you explain what you mean by "take out a lock"? I suppose the correct term would be acquire a lock. This idea comes from the diagram for optimistic locking at http://www.agiledata.org/essays/concurrencyControl.html#OptimisticLocking. I also looked into the Postgres documentation and it states: <quote src=http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=explicit-locking.html#LOCKING-ROWS> In addition to table-level locks, there are row-level locks. A row-level lock on a specific row is automatically acquired when the row is updated (or deleted or marked for update). The lock is held until the transaction commits or rolls back. Row-level locks don't affect data querying; they block writers to the same row only. To acquire a row-level lock on a row without actually modifying the row, select the row with SELECT FOR UPDATE . Note that once a particular row-level lock is acquired, the transaction may update the row multiple times without fear of conflicts. </quote> So I guess that at least in Postgres, row locks are acquired automatically... I see little probability for problems when updating a single object (table row) since it is a single (atomic??) SQL command. Trouble is more likely if in a transaction, several related (as in 1:many or many:many relationships) objects need to be updated in a transaction. It seems Postgres takes care of this too since it locks for the duration of the whole transaction... Hmmmm... but it locks the row... So if two transactions try to modify a set of objects at the same time, both may lock rows but deadlock each other??? Hmm... I'm not good at this kind of stuff...seems difficult... --b |
From: Ian B. <ia...@co...> - 2003-05-20 23:40:37
|
On Tue, 2003-05-20 at 17:57, Luke Opperman wrote: > The other thought I'm having right now is that SQLObject seems to simplify some > of this if you're using cached objects, because now there is only one possible > object, so it's all on the python side to deal with locking as the developer > sees fit. but I haven't entirely thought this through. Yes, in the case where there's one multi-threaded process that's in total control of the database, SQLObject makes it possible to do safe locking on the Python side. Programming in Webware (i.e., a multi-threaded application server), this covers a large portion of my usage. I had some various locks in there and thought of more, but now I feel like the programmer needs to be explicit about their locking, because there's no automated way to do it. For instance, with the timestamp, what is the object's starting point? Let's say you do: obj = Something(someID) # op A x = obj.a # op B y = obj.a # op C obj.a = x + 10 # op D What is the concurrency expectation? I.e., when is does optimistic locking fail? All of A, B, and C are potential times when the timestamp could be fetched from the database. With caching and multithreading, it's even more complicated (though I think if you want to do this locking you'll have to either not use threading, or use per-thread instances). So if we have to be explicit about locking, how do we express that? How do you say what you are locking? Columns, rows, tables? All are valid options, and none particularly more difficult than the others. Ian |
From: Matt G. <ma...@po...> - 2003-05-21 00:20:18
|
On Wed, 2003-05-21 at 00:41, Ian Bicking wrote: > On Tue, 2003-05-20 at 17:57, Luke Opperman wrote: > > The other thought I'm having right now is that SQLObject seems to simplify some > > of this if you're using cached objects, because now there is only one possible > > object, so it's all on the python side to deal with locking as the developer > > sees fit. but I haven't entirely thought this through. > > Yes, in the case where there's one multi-threaded process that's in > total control of the database, SQLObject makes it possible to do safe > locking on the Python side. Programming in Webware (i.e., a > multi-threaded application server), this covers a large portion of my > usage. > > I had some various locks in there and thought of more, but now I feel > like the programmer needs to be explicit about their locking, because > there's no automated way to do it. Unless the check is done during the update then there is no way to guarantee that the optimistic lock is honoured. Unless, that is, the object's row is locked using "select ... for update" (or something similar) just before the update. Of course, having a locking attribute with a small chance of error is better than no locking at all. > > For instance, with the timestamp, what is the object's starting point? > Let's say you do: > > obj = Something(someID) # op A > x = obj.a # op B > y = obj.a # op C > obj.a = x + 10 # op D > > What is the concurrency expectation? I.e., when is does optimistic > locking fail? All of A, B, and C are potential times when the timestamp > could be fetched from the database. I may be missing something obvious but why would reading the timestamp ever make an opt lock fail? The only time it matters is when the database is updated isn't it? > With caching and multithreading, > it's even more complicated (though I think if you want to do this > locking you'll have to either not use threading, or use per-thread > instances). > > So if we have to be explicit about locking, how do we express that? How > do you say what you are locking? Columns, rows, tables? All are valid > options, and none particularly more difficult than the others. Agh! Not table locking, please! The MySQL manual is the only place that sort of talk is acceptable ;-). I would imagine optimistic locking would take place at the object level which generally means row level in the database. - Matt |
From: Ian B. <ia...@co...> - 2003-05-21 01:33:47
|
On Tue, 2003-05-20 at 19:20, Matt Goodall wrote: > > For instance, with the timestamp, what is the object's starting point? > > Let's say you do: > > > > obj = Something(someID) # op A > > x = obj.a # op B > > y = obj.a # op C > > obj.a = x + 10 # op D > > > > What is the concurrency expectation? I.e., when is does optimistic > > locking fail? All of A, B, and C are potential times when the timestamp > > could be fetched from the database. > > I may be missing something obvious but why would reading the timestamp > ever make an opt lock fail? The only time it matters is when the > database is updated isn't it? It's a question of: supposing someone changes the row between A and D, should it fail? B and D? C and D? It could be implemented any of those ways (well, between B and D would require more code than I show there), or in an entirely different way. Using optimistic locking, it kind of boils down to when the timestamp was last retrieved from the database. As for what you want, any of those options is a valid programming decision. Maybe obj.a is dependent on other attributes, or other objects entirely, so A and D is right. Maybe you just want to make sure you've incremented it by 10, so B and D is right. Maybe you do your own consistency checking of x and y, so C to D is all that's necessary. Some sort of annotation is necessary. > > So if we have to be explicit about locking, how do we express that? How > > do you say what you are locking? Columns, rows, tables? All are valid > > options, and none particularly more difficult than the others. > > Agh! Not table locking, please! The MySQL manual is the only place that > sort of talk is acceptable ;-). > > I would imagine optimistic locking would take place at the object level > which generally means row level in the database. Given a locking interface, it shouldn't be hard to implement other locking (no one will force you to use it :). Locking on timestamps or other IDs is the most difficult, so if we do that there's no reason we can't do the other locking too. Ian |
From: Bud P. B. <bu...@si...> - 2003-05-21 07:38:55
|
On 21 May 2003 01:20:02 +0100 Matt Goodall <ma...@po...> wrote: > Unless the check is done during the update then there is no way to > guarantee that the optimistic lock is honoured. Unless, that is, the > object's row is locked using "select ... for update" (or something > similar) just before the update. It seems that in the case of a single process application (possibly multi-threaded), the locking could move from the dbms to the middle-layer. But this is not valid for all cases and to me it seems that (optimistic) locking in the dbms is easier and more general... At least in web apps, select .. for update seems to be problematic since a client may die away after locking without ever committing a change.... So there must be timeouts, administration, etc. Ugh... Also, if one would select for update just before committing a change, the client app could still die in the meantime (even if it's shorter) and at that point one basically has nothing more than optimistic locking... So it seems (as you have said for web apps) that optimistic locking is the only way.. --b |