From: Michael P. <mic...@gm...> - 2010-12-22 02:12:43
On Wed, Dec 22, 2010 at 10:23 AM, Mason Sharp <mas...@en...> wrote:

> After "make clean; make", things look better.

Thanks for taking the time to check that.

> I found another issue though. Still, you can go ahead and commit this since
> it is close, in order to make merging easier.

I'll do it, thanks.

> If the coordinator tries to commit the prepared transactions, if it sends
> commit prepared to one of the nodes, then is killed before it can send to
> the other, if I restart the coordinator, I see the data from one of the
> nodes only (GTM closed the transaction), which is not atomic. The second
> data node is still alive and was the entire time.

That is true: if the Coordinator crashes, GTM closes all the backends of transactions that it considers open. In the case of an implicit COMMIT, even if we prepare/commit on the nodes, the transaction is still seen as open on GTM.

> I fear we may have to treat implicit transactions similar to explicit
> transactions. (BTW, do we handle explicit properly for these similar cases,
> too?) If we stick with performance short cuts it is hard to be reliably
> atomic. (Again, I will take the blame for trying to speed things up.
> Perhaps we can have it as a configuration option if people have a lot of
> implicit 2PC going on and understand the risks.)

Yeah, I think so. A GUC parameter would do the trick, but I'd like to discuss that more before deciding anything.

> Anyway, the transaction would remain open, but it would have to be resolved
> somehow.
>
> If we had a "transaction clean up" thread in GTM, it could note the
> transaction information and periodically try and connect to the registered
> nodes and resolve according to the rules we have talked about. (Again, some
> of this code could be in some of the recovery tools you are writing, too).
> The nice thing about doing something like this is we can automate things as
> much as possible and not require DBA intervention; if a non-GTM component
> goes down and comes up again, things will resolve by themselves. I suppose
> if it is GTM itself that went down, once it rebuilds state properly, this
> same mechanism could be called at the end of GTM recovery and resolve the
> outstanding issues.

That is more or less what we are planning to do with the utility that will have to check the remaining 2PC transactions after a Coordinator crash. This utility would be kicked off by the monitoring agent when it notices a Coordinator crash.
This feature needs two things:
1) fix for EXECUTE DIRECT
2) extension of 2PC table (patch already written but not realigned with latest 2PC code)

> I think we need to walk through every step in the commit sequence and kill
> an involved process and verify that we have a consistent view of the
> database afterward, and that we have the ability/tools to resolve it.
>
> This code requires careful testing.

That's true; playing with 2PC like this could easily lead to unexpected issues.

--
Michael Paquier
http://michaelpq.users.sourceforge.net
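
For illustration only, here is a minimal sketch of the kind of decision a clean-up thread or recovery utility could make for one transaction left in doubt after a Coordinator crash. It is not code from the project, and the rule it encodes is an assumption (the usual 2PC recovery rule: once any node has committed, the remaining prepared nodes must commit), not necessarily the rules referred to in the thread. The names `NodeState`, `resolve_in_doubt` and `pending_nodes` are hypothetical.

```python
from enum import Enum


class NodeState(Enum):
    """State of one transaction on one node (hypothetical model)."""
    PREPARED = "prepared"    # PREPARE done, outcome not yet applied
    COMMITTED = "committed"  # COMMIT PREPARED already applied
    ABORTED = "aborted"      # rolled back, or never prepared


def resolve_in_doubt(states):
    """Decide what to do with nodes still in PREPARED state.

    `states` maps a node name to its NodeState for a single transaction.
    Returns "commit", "rollback" or "undecided".
    Assumed rule, not taken from the thread:
      - any node committed -> commit the remaining prepared nodes
      - any node aborted   -> roll back the remaining prepared nodes
      - all still prepared -> undecided (needs the Coordinator/GTM record)
    """
    seen = set(states.values())
    if NodeState.COMMITTED in seen and NodeState.ABORTED in seen:
        # Atomicity is already broken; this needs manual intervention.
        raise ValueError("inconsistent outcome: committed and aborted nodes")
    if NodeState.COMMITTED in seen:
        return "commit"
    if NodeState.ABORTED in seen:
        return "rollback"
    return "undecided"


def pending_nodes(states):
    """Nodes on which the decision still has to be applied."""
    return sorted(n for n, s in states.items() if s is NodeState.PREPARED)
```

In the crash scenario described above (one node already committed, the other still prepared), `resolve_in_doubt({"datanode1": NodeState.COMMITTED, "datanode2": NodeState.PREPARED})` returns `"commit"`, and `pending_nodes` reports `datanode2` as the node that still needs COMMIT PREPARED.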