From: Mason S. <mas...@en...> - 2010-12-22 01:23:35
On 12/21/10 3:33 AM, Michael Paquier wrote:
>
>
> Could you give me more details about this crash?
>

After "make clean; make", things look better.

I found another issue though. Still, you can go ahead and commit this since it is close, in order to make merging easier.

Suppose the coordinator tries to commit the prepared transactions: it sends commit prepared to one of the nodes, then is killed before it can send to the other. If I restart the coordinator, I see the data from one of the nodes only (GTM closed the transaction), which is not atomic. The second data node is still alive and was the entire time.

I fear we may have to treat implicit transactions similar to explicit transactions. (BTW, do we handle explicit properly for these similar cases, too?) If we stick with performance shortcuts, it is hard to be reliably atomic. (Again, I will take the blame for trying to speed things up. Perhaps we can have it as a configuration option if people have a lot of implicit 2PC going on and understand the risks.)

Anyway, the transaction would remain open, but it would have to be resolved somehow.

If we had a "transaction clean up" thread in GTM, it could note the transaction information and periodically try to connect to the registered nodes and resolve according to the rules we have talked about. (Again, some of this code could be in some of the recovery tools you are writing, too.) The nice thing about doing something like this is that we can automate things as much as possible and not require DBA intervention; if a non-GTM component goes down and comes up again, things will resolve by themselves.

I suppose if it is GTM itself that went down, once it rebuilds state properly, this same mechanism could be called at the end of GTM recovery to resolve the outstanding issues.
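To make the clean-up idea concrete, here is a rough Python sketch of one resolution pass, replaying the failure described above. It is only an illustration, not Postgres-XC code: the Node class, coordinator_commit() and resolve() are hypothetical names, and the rule used (if any node has already committed, commit the rest; if any has aborted, roll back the rest; if a node is unreachable, retry later) is an assumed stand-in for the rules discussed on the list.

```python
from enum import Enum


class State(Enum):
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class Node:
    """Hypothetical stand-in for a data node holding one prepared transaction."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.state = State.PREPARED

    def commit_prepared(self):
        if self.state is State.PREPARED:
            self.state = State.COMMITTED

    def rollback_prepared(self):
        if self.state is State.PREPARED:
            self.state = State.ABORTED


def coordinator_commit(nodes, crash_after):
    """Send commit prepared to each node in order; simulate the coordinator
    being killed once crash_after nodes have been told to commit."""
    for sent, node in enumerate(nodes):
        if sent == crash_after:
            return  # simulated coordinator death
        node.commit_prepared()


def resolve(nodes):
    """One pass of the assumed clean-up rule over a single transaction.

    Returns the final State if the transaction was resolved, or None if the
    pass has to be retried later (a node is down, or nothing is decided yet).
    """
    if not all(node.up for node in nodes):
        return None  # cannot see every node's state; try again next period
    states = {node.state for node in nodes}
    if State.COMMITTED in states and State.ABORTED in states:
        raise RuntimeError("mixed outcome; needs manual intervention")
    if State.COMMITTED in states:
        for node in nodes:
            node.commit_prepared()
        return State.COMMITTED
    if State.ABORTED in states:
        for node in nodes:
            node.rollback_prepared()
        return State.ABORTED
    return None  # all still prepared: the decision has to come from GTM


# Replay the scenario: coordinator dies after the first commit prepared.
dn1, dn2 = Node("dn1"), Node("dn2")
coordinator_commit([dn1, dn2], crash_after=1)
outcome = resolve([dn1, dn2])
print(dn1.state, dn2.state, outcome)
```

The sketch deliberately leaves the "all nodes still prepared" case undecided, since that outcome would have to come from whatever GTM recorded for the transaction.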
I think we need to walk through every step in the commit sequence, kill an involved process at each one, and verify that we have a consistent view of the database afterward and that we have the ability/tools to resolve it. This code requires careful testing.

Thanks,

Mason

> --
> Michael Paquier
> http://michaelpq.users.sourceforge.net
>

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company