From: Mason S. <mas...@en...> - 2010-12-14 23:26:41
> Hi all,
>
> Here is the fix I propose based on the idea I proposed in a previous mail.
> If a prepared transaction, partially committed, is aborted, this patch
> gathers the handles to nodes where an error occurred and saves them on
> GTM.
>
> The prepared transaction partially committed is kept alive on GTM, so
> other transactions cannot see the partially committed results.
> To complete the commit of the prepared transaction partially
> committed, it is necessary to issue a COMMIT PREPARED 'gid'.
> Once this command is issued, transaction will finish its commit properly.
>
> Mason, this solves the problem you saw when you made your tests.
> It also respects the rule that a 2PC transaction partially committed
> has to be committed.
>

Just took a brief look so far. Seems better.

I understand that recovery and HA is in development and things are being done to lay the groundwork and improve, and that with this patch we are not trying to yet handle any and every situation. What happens if the coordinator fails before it can update GTM though?

Also, I did a test and got this:

WARNING: unexpected EOF on datanode connection
WARNING: Connection to Datanode 1 has unexpected state 1 and will be dropped
ERROR: Could not commit prepared transaction implicitely
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

#0 0x907afe42 in kill$UNIX2003 ()
#1 0x9082223a in raise ()
#2 0x9082e679 in abort ()
#3 0x003917ce in ExceptionalCondition (conditionName=0x433f6c "!(((proc->xid) != ((TransactionId) 0)))", errorType=0x3ecfd4 "FailedAssertion", fileName=0x433f50 "procarray.c", lineNumber=283) at assert.c:57
#4 0x00280916 in ProcArrayEndTransaction (proc=0x41cca70, latestXid=1018) at procarray.c:283
#5 0x0005905c in AbortTransaction () at xact.c:2525
#6 0x00059a6e in AbortCurrentTransaction () at xact.c:3001
#7 0x00059b10 in AbortCurrentTransactionOnce () at xact.c:3094
#8 0x0029c8d6 in PostgresMain (argc=4, argv=0x1002ff8, username=0x1002fc8 "masonsharp") at postgres.c:3622
#9 0x0025851c in BackendRun (port=0x7016f0) at postmaster.c:3607
#10 0x00257883 in BackendStartup (port=0x7016f0) at postmaster.c:3216
#11 0x002542b5 in ServerLoop () at postmaster.c:1445
#12 0x002538c1 in PostmasterMain (argc=5, argv=0x7005a0) at postmaster.c:1098
#13 0x001cf2f1 in main (argc=5, argv=0x7005a0) at main.c:188

I did the same test as before. I killed a data node after it received a COMMIT PREPARED message.

I think we should be able to continue. The good news is that I should not see partially committed data, which I do not.

But if I try and manually commit it from a new connection to the coordinator:

mds=# COMMIT PREPARED 'T1018';
ERROR: Could not get GID data from GTM

Maybe GTM removed this info when the coordinator disconnected? (Or maybe implicit transactions are only associated with a certain connection?)

I can see the transaction on one data node, but not the other.

Ideally we would come up with a scheme where if the coordinator session does not notify GTM, we can somehow recover. Maybe this is my fault- I believe I advocated avoiding the extra work for implicit 2PC in the name of performance. :-)

We can think about what to do in the short term, and how to handle in the long term.

In the short term, your approach may be good enough once debugged, since it is a relatively rare case.

Long term we could think about a thread that runs on GTM and wakes up every 30 or 60 seconds or so (configurable), collects implicit transactions from the nodes (extension to pg_prepared_xacts required?) and if it sees that the XID does not have an associated live connection, knows that something went awry. It then sees if it committed on any of the nodes. If not, rollback all, if it did on at least one, commit on all. If one of the data nodes is down, it won't do anything, perhaps log a warning. This would avoid user intervention, and would be pretty cool.

Some of this code you may already have been working on for recovery and we could reuse here.

Regards,

Mason

> Thanks,
>
> --
> Michael Paquier
> http://michaelpq.users.sourceforge.net
>

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
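
P.S. To make the long-term idea above a bit more concrete, here is a rough sketch of just the decision rule the periodic GTM thread would apply to one implicit transaction. It is not taken from the patch or from any existing code: the names (NodeState, Action, decide) are made up for illustration, and it assumes the thread has already worked out, per data node, whether the transaction is still prepared, already committed, or the node is unreachable. How that state is collected (for example through an extended pg_prepared_xacts) is left open.

```python
from enum import Enum


class NodeState(Enum):
    """Hypothetical per-node view of one implicit 2PC transaction."""
    PREPARED = "prepared"    # still waiting for COMMIT PREPARED
    COMMITTED = "committed"  # COMMIT PREPARED already went through here
    DOWN = "down"            # node could not be reached


class Action(Enum):
    """Hypothetical outcomes of one pass of the recovery thread."""
    SKIP = "session still alive, leave it alone"
    WAIT = "a node is down, do nothing and log a warning"
    COMMIT_ALL = "commit on all nodes"
    ROLLBACK_ALL = "roll back on all nodes"


def decide(node_states, has_live_connection):
    """Pick the action for one transaction.

    node_states maps a node name to its NodeState;
    has_live_connection says whether the XID still has a live session.
    """
    if has_live_connection:
        # Nothing went awry yet: the owning session may still finish.
        return Action.SKIP
    states = list(node_states.values())
    if NodeState.DOWN in states:
        # Cannot see the whole picture, so do not touch anything.
        return Action.WAIT
    if NodeState.COMMITTED in states:
        # Committed on at least one node: it has to commit everywhere.
        return Action.COMMIT_ALL
    # Not committed anywhere: safe to roll back everywhere.
    return Action.ROLLBACK_ALL


if __name__ == "__main__":
    # The case from my test: one node committed, the other still prepared.
    example = {"dn1": NodeState.COMMITTED, "dn2": NodeState.PREPARED}
    print(decide(example, has_live_connection=False).value)
```

The check for a down node comes before the commit check on purpose, so that the thread never acts on a partial view of the cluster.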