From: Mason S. <mas...@en...> - 2010-12-12 19:37:22
On 12/6/10 12:32 AM, Michael Paquier wrote:
>
> I changed deeply the algorithm to avoid code duplication for implicit 2PC.
> With the patch attached, Coordinator is prepared only if 2
> Coordinators at least are involved in a transaction (DDL case).
> If only one Coordinator is involved in transaction or if transaction
> does not contain any DDL, transaction is prepared on the involved
> nodes only.
>
> To sum up:
> 1) for DDL transaction (more than 1 Coordinator and more than 1
> Datanode involved in a transaction)
> - prepare on Coordinator (2PC file written)
> - prepare on Nodes (2PC file written)
> - Commit prepared on Coordinator
> - Commit prepared on Datanodes
> 2) If no Coordinator, or only one Coordinator is involved in a transaction
> - prepare on nodes
> - commit on Coordinator
> - Commit on Datanodes
>
> Note: I didn't put calls to implicit prepare functions in a separate
> functions because modification of CommitTransaction() are really light.
>

I reviewed, and I thought it looked good, except for a possible issue with committing.

I wanted to test what happened with implicit transactions when there was a failure.

I executed this in one session:

mds1=# begin;
BEGIN
mds1=# insert into mds1 values (1,1);
INSERT 0 1
mds1=# insert into mds1 values (2,2);
INSERT 0 1
mds1=# commit;

Before committing, I fired up gdb for a coordinator session and a data node session.

On one of the data nodes, when the COMMIT PREPARED was received, I killed the backend to see what would happen. On the Coordinator I saw this:

WARNING: unexpected EOF on datanode connection
WARNING: Connection to Datanode 1 has unexpected state 1 and will be dropped
WARNING: Connection to Datanode 2 has unexpected state 1 and will be dropped
ERROR: Could not commit prepared transaction implicitely
PANIC: cannot abort transaction 10312, it was already committed
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I am not sure we should be aborting 10312, since it was committed on one of the nodes. It corresponds to the original prepared transaction. We also do not want a panic to happen.

Next, I started a new coordinator session:

mds1=# select * from mds1;
 col1 | col2
------+------
    2 |    2
(1 row)

I only see one of the rows. I thought, well, ok, we cannot undo a commit, and the other one must commit eventually. I was able to continue working normally:

mds1=# insert into mds1 values (3,3);
INSERT 0 1
mds1=# insert into mds1 values (4,4);
INSERT 0 1
mds1=# insert into mds1 values (5,5);
INSERT 0 1
mds1=# insert into mds1 values (6,6);
INSERT 0 1
mds1=# select xmin,* from mds1;
 xmin  | col1 | col2
-------+------+------
 10420 |    4 |    4
 10422 |    6 |    6
 10312 |    2 |    2
 10415 |    3 |    3
 10421 |    5 |    5
(5 rows)

Note xmin keeps increasing because we closed the transaction on GTM at the "finish:" label. This may or may not be ok.

Meanwhile, on the failed data node:

mds1=# select * from pg_prepared_xacts;
WARNING: Do not have a GTM snapshot available
WARNING: Do not have a GTM snapshot available
 transaction |  gid   |           prepared            |   owner    | database
-------------+--------+-------------------------------+------------+----------
       10312 | T10312 | 2010-12-12 12:04:30.946287-05 | xxxxxx     | mds1
(1 row)

The transaction id is 10312. Normally this would still appear in snapshots, but we close it on GTM.

What should we do?

- We could leave it as is. In the future we may have an XC monitoring process that occasionally looks for possible 2PC anomalies and sends an alert so that a DBA can resolve them.

- We could instead choose not to close out the transaction on GTM, so that the xid is still in snapshots. We could test if the rows are viewable or not.
  This could result in other side effects, but without further testing, I am guessing this may be similar to when an existing statement is running and cannot see a previously committed transaction that is open in its snapshot.

So, I am thinking this is probably the preferable option (keeping it open on GTM until committed on all nodes), but we should test it.

In any event, we should also fix the panic. It may be that we had a similar problem in the existing code before this patch, although I did some testing a few months back with Pavan's crash test patch and things seemed stable.

Also, we might want to check that explicit 2PC also handles this OK.

Thanks,

Mason

> Regards,
>
> --
> Michael Paquier
> http://michaelpq.users.sourceforge.net
>

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
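
To make the second option above concrete (keeping the xid open on GTM until it is committed on all nodes), here is a rough Python model of the failure scenario from the test. It is only an illustrative sketch and not Postgres-XC code: the `Node` and `Gtm` classes and the `implicit_commit` function are hypothetical names invented for this example, and the real logic would live in the Coordinator's commit path. The model assumes a failed COMMIT PREPARED leaves the transaction prepared on that node, as seen in pg_prepared_xacts above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Datanode (assumption, not XC code)."""
    name: str
    fail_on_commit: bool = False
    prepared: set = field(default_factory=set)
    committed: set = field(default_factory=set)

    def prepare(self, xid: int) -> None:
        self.prepared.add(xid)

    def commit_prepared(self, xid: int) -> bool:
        # Returning False models a backend killed during COMMIT PREPARED:
        # the transaction stays prepared on this node.
        if self.fail_on_commit:
            return False
        self.prepared.discard(xid)
        self.committed.add(xid)
        return True


@dataclass
class Gtm:
    """Hypothetical stand-in for GTM: xids that still appear in snapshots."""
    open_xids: set = field(default_factory=set)

    def begin(self, xid: int) -> None:
        self.open_xids.add(xid)

    def close(self, xid: int) -> None:
        self.open_xids.discard(xid)


def implicit_commit(xid: int, nodes: list, gtm: Gtm) -> list:
    """Prepare on every node, then commit; return nodes still holding xid."""
    gtm.begin(xid)
    for node in nodes:
        node.prepare(xid)
    pending = [n for n in nodes if not n.commit_prepared(xid)]
    # Close the xid on GTM only once every node has committed.
    if not pending:
        gtm.close(xid)
    # Deliberately no abort here: another node may already have committed.
    return pending


# Replay of the scenario in this mail: one of two data nodes dies on commit.
nodes = [Node("dn1", fail_on_commit=True), Node("dn2")]
gtm = Gtm()
pending = implicit_commit(10312, nodes, gtm)
print([n.name for n in pending], 10312 in gtm.open_xids)
```

In this model the xid stays in `gtm.open_xids` while dn1 still holds the prepared transaction, so later snapshots would continue to treat 10312 as in progress. Whether the real system behaves acceptably in that state is exactly what would need testing.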