From: Michael P. <mic...@gm...> - 2010-11-30 08:27:19
Attachments:
implicit2pc5.patch
|
Hi all,

Please see attached a patch that corrects 2PC (two-phase commit) in the case of an implicit 2PC. In the current HEAD, when a transaction involving several nodes in a write operation commits, the commit is done in the following order:
1) Prepare on Datanodes
2) Commit on Datanodes
3) Commit on Coordinator
4) Commit on GTM
The problem is that the commit on the Coordinator has to be done first to protect data consistency. With the attached patch, a commit is done in the following order:
1) Prepare on Coordinator (flush a 2PC file if DDL is involved)
2) Prepare on the Datanodes involved in a write operation
3) Commit the prepared transaction on the Coordinator
4) Commit the prepared transaction on the Datanodes
5) Commit on GTM
In case of a problem on the Coordinator, the transaction can be rolled back on the nodes, protecting data visibility and consistency. (A sketch of this ordering follows below.)

There is also a small improvement: in the current HEAD, it is necessary to go to GTM twice to commit globally the transaction ID (GXID) used for Prepare and the GXID used for Commit. With this patch, GTM is contacted only once and commits both GXIDs at the same time.

Regards,
--
Michael Paquier
http://michaelpq.users.sourceforge.net
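PS: For illustration, here is a minimal standalone sketch of the new ordering. Every function here is a stand-in written for this mail, not an actual routine from the patch:

    /* Stand-ins modeling the messages sent to each component. */
    #include <stdbool.h>
    #include <stdio.h>

    static bool prepare_on_coordinator(void)  { puts("PREPARE on Coordinator (2PC file if DDL)"); return true; }
    static bool prepare_on_datanodes(void)    { puts("PREPARE on involved Datanodes"); return true; }
    static void commit_prepared_on_coordinator(void) { puts("COMMIT PREPARED on Coordinator"); }
    static void commit_prepared_on_datanodes(void)   { puts("COMMIT PREPARED on Datanodes"); }
    static void commit_on_gtm(void)           { puts("COMMIT on GTM (both GXIDs in one call)"); }
    static void rollback_on_nodes(void)       { puts("ROLLBACK on all nodes"); }

    int main(void)
    {
        /* 1-2) Prepare phase: a failure here can still be rolled
         * back everywhere, keeping data consistent. */
        if (!prepare_on_coordinator() || !prepare_on_datanodes())
        {
            rollback_on_nodes();
            return 1;
        }
        /* 3-4) Commit phase: Coordinator first. */
        commit_prepared_on_coordinator();
        commit_prepared_on_datanodes();
        /* 5) A single round trip to GTM commits both GXIDs. */
        commit_on_gtm();
        return 0;
    }
|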
From: Pavan D. <pav...@gm...> - 2010-12-02 10:17:48
|
On Tue, Nov 30, 2010 at 1:57 PM, Michael Paquier <mic...@gm...> wrote:
> Hi all,
>
> Please see attached a patch that corrects 2PC (two-phase commit) in the case of an implicit 2PC.

I think we should try to minimize the changes to CommitTransaction. Why not use PrepareTransaction() to prepare the transaction instead of duplicating that code inside CommitTransaction()? Also, it would be nice if you could move the new code into a separate function and call that, something like AtEOXact_PGXC().

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB
http://www.enterprisedb.com |
From: Michael P. <mic...@gm...> - 2010-12-06 05:32:25
Attachments:
implicit2pc6.patch
|
> I think we should try to minimize the changes to CommitTransaction. Why not use PrepareTransaction() to prepare the transaction instead of duplicating that code inside CommitTransaction()? Also, it would be nice if you could move the new code into a separate function and call that, something like AtEOXact_PGXC().

Hi all,

I made deep changes to the algorithm to avoid code duplication for the implicit 2PC. With the attached patch, the Coordinator is prepared only if at least 2 Coordinators are involved in the transaction (the DDL case). If only one Coordinator is involved, or if the transaction does not contain any DDL, the transaction is prepared on the involved nodes only.

To sum up (see the sketch below):
1) For a DDL transaction (more than 1 Coordinator and more than 1 Datanode involved):
- Prepare on Coordinator (2PC file written)
- Prepare on nodes (2PC files written)
- Commit prepared on Coordinator
- Commit prepared on Datanodes
2) If no Coordinator, or only one Coordinator, is involved in the transaction:
- Prepare on nodes
- Commit on Coordinator
- Commit on Datanodes

Note: I didn't put the calls to the implicit prepare functions into a separate function because the modifications to CommitTransaction() are really light.

Regards,
--
Michael Paquier
http://michaelpq.users.sourceforge.net
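PS: The case split can be sketched like this (standalone illustration only; the predicate and the stubbed calls are stand-ins, not the names used in the patch):

    #include <stdbool.h>
    #include <stdio.h>

    static void run(const char *step) { puts(step); }

    /* Stand-in for the decision made at commit time. */
    static void implicit_commit(bool contains_ddl, int coordinators_involved)
    {
        if (contains_ddl && coordinators_involved > 1)
        {
            /* Case 1: the Coordinator itself must be recoverable,
             * so it is prepared too and a 2PC file is flushed. */
            run("PREPARE on Coordinator (2PC file)");
            run("PREPARE on nodes (2PC files)");
            run("COMMIT PREPARED on Coordinator");
            run("COMMIT PREPARED on Datanodes");
        }
        else
        {
            /* Case 2: nothing to recover locally, so the Coordinator
             * commits directly and skips the extra prepare. */
            run("PREPARE on nodes");
            run("COMMIT on Coordinator");
            run("COMMIT PREPARED on Datanodes");
        }
    }

    int main(void)
    {
        implicit_commit(true, 2);   /* DDL case */
        implicit_commit(false, 1);  /* plain write case */
        return 0;
    }
|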
From: Michael P. <mic...@gm...> - 2010-12-08 07:24:49
Attachments:
implicit2pc6_extend_pg_prepared_xacts.patch
|
Continuing on the modifications for 2PC, I finished a patch extending the 2PC file and the 2PC xact data protocol. With the attached patch, the 2PC data contains the following information:
- whether the 2PC is implicit or explicit
- whether the prepared transaction contained DDL or not (if yes, it means that the transaction has also been prepared on Coordinators)
- the Coordinator number from which the 2PC has been issued
- the list of nodes where the transaction has been prepared. In the case of a transaction prepared only on Coordinators, the list of nodes is set to "n" (the case of sequence transactions)

This 2PC information is sent down to the nodes when an implicit or explicit prepare is made, and only to the necessary nodes. (A sketch of the record layout follows below.) The patch also contains an extension of the view pg_prepared_xacts, to be able to get the extended 2PC information from the catalog as well as the usual 2PC data.

If you want to run tests, you have to apply implicit2pc6.patch on HEAD first, and then apply implicit2pc6_extend_pg_prepared_xacts.patch. I forgot to say that, as pg_prepared_xacts is a catalog view, you have to connect to a node directly to get the 2PC information. The information could also be obtained with EXECUTE DIRECT, but currently this functionality is broken.

In the tests I did, of course I checked that the views were OK, but I also checked that recovery of prepared transactions was properly done. If you want to try, kill a postgres process with SIGQUIT (kill -3) and relaunch it. The 2PC data will be recovered correctly from the 2PC files. You can check by launching "select * from pg_prepared_xacts;" before and after stopping the postgres instance.

Regards,
Michael
--
Michael Paquier
http://michaelpq.users.sourceforge.net
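PS: Conceptually, the extended 2PC record now carries something like the following (the field names here are only illustrative, not the actual struct from the patch):

    #include <stdbool.h>

    #define MAX_NODES 64

    /* Illustrative layout of the extended 2PC data that gets written
     * to the 2PC state file and sent down with the xact protocol. */
    typedef struct TwoPhaseXactInfo
    {
        bool is_implicit;          /* implicit or explicit 2PC */
        bool contains_ddl;         /* if true, also prepared on Coordinators */
        int  origin_coordinator;   /* Coordinator number that issued the 2PC */
        int  num_nodes;            /* number of nodes where it was prepared;
                                    * 0 ("n") for Coordinator-only cases such
                                    * as sequence transactions */
        int  node_list[MAX_NODES]; /* nodes where the xact was prepared */
    } TwoPhaseXactInfo;
|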
From: Mason S. <mas...@en...> - 2010-12-12 19:37:22
|
On 12/6/10 12:32 AM, Michael Paquier wrote:
> I made deep changes to the algorithm to avoid code duplication for the implicit 2PC. With the attached patch, the Coordinator is prepared only if at least 2 Coordinators are involved in the transaction (the DDL case). If only one Coordinator is involved, or if the transaction does not contain any DDL, the transaction is prepared on the involved nodes only.
> [...]
> Note: I didn't put the calls to the implicit prepare functions into a separate function because the modifications to CommitTransaction() are really light.

I reviewed, and I thought it looked good, except for a possible issue with committing.

I wanted to test what happened with implicit transactions when there was a failure. I executed this in one session:

mds1=# begin;
BEGIN
mds1=# insert into mds1 values (1,1);
INSERT 0 1
mds1=# insert into mds1 values (2,2);
INSERT 0 1
mds1=# commit;

Before committing, I fired up gdb for a coordinator session and a data node session. On one of the data nodes, when the COMMIT PREPARED was received, I killed the backend to see what would happen. On the Coordinator I saw this:

WARNING: unexpected EOF on datanode connection
WARNING: Connection to Datanode 1 has unexpected state 1 and will be dropped
WARNING: Connection to Datanode 2 has unexpected state 1 and will be dropped
ERROR: Could not commit prepared transaction implicitely
PANIC: cannot abort transaction 10312, it was already committed
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I am not sure we should be aborting 10312, since it was committed on one of the nodes. It corresponds to the original prepared transaction. We also do not want a panic to happen.

Next, I started a new coordinator session:

mds1=# select * from mds1;
 col1 | col2
------+------
    2 |    2
(1 row)

I only see one of the rows. I thought, well, ok, we cannot undo a commit, and the other one must commit eventually. I was able to continue working normally:

mds1=# insert into mds1 values (3,3);
INSERT 0 1
mds1=# insert into mds1 values (4,4);
INSERT 0 1
mds1=# insert into mds1 values (5,5);
INSERT 0 1
mds1=# insert into mds1 values (6,6);
INSERT 0 1

mds1=# select xmin,* from mds1;
 xmin  | col1 | col2
-------+------+------
 10420 |    4 |    4
 10422 |    6 |    6
 10312 |    2 |    2
 10415 |    3 |    3
 10421 |    5 |    5
(5 rows)

Note xmin keeps increasing because we closed the transaction on GTM at the "finish:" label. This may or may not be ok.

Meanwhile, on the failed data node:

mds1=# select * from pg_prepared_xacts;
WARNING: Do not have a GTM snapshot available
WARNING: Do not have a GTM snapshot available
 transaction |  gid   |           prepared            |   owner    | database
-------------+--------+-------------------------------+------------+----------
       10312 | T10312 | 2010-12-12 12:04:30.946287-05 | xxxxxx     | mds1
(1 row)

The transaction id is 10312. Normally this would still appear in snapshots, but we close it on GTM.

What should we do?

- We could leave it as is. We may in the future have an XC monitoring process look for possible 2PC anomalies occasionally and send an alert, so that they could be resolved by a DBA.

- We could instead choose not to close out the transaction on GTM, so that the xid is still in snapshots. We could test whether the rows are viewable or not. This could result in other side effects, but without further testing, I am guessing this may be similar to when an existing statement is running and cannot see a previously committed transaction that is open in its snapshot. So, I am thinking this is probably the preferable option (keeping it open on GTM until committed on all nodes), but we should test it. In any event, we should also fix the panic.

It may be that we had a similar problem in the existing code before this patch, although I did some testing a few months back with Pavan's crash test patch and things seemed stable. Also, we might want to check that explicit 2PC also handles this OK.

Thanks,

Mason

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2010-12-13 02:28:34
|
> I reviewed, and I thought it looked good, except for a possible issue with committing.
>
> I wanted to test what happened with implicit transactions when there was a failure. I executed this in one session:
> [...]
> On one of the data nodes, when the COMMIT PREPARED was received, I killed the backend to see what would happen. On the Coordinator I saw this:
>
> WARNING: unexpected EOF on datanode connection
> WARNING: Connection to Datanode 1 has unexpected state 1 and will be dropped
> WARNING: Connection to Datanode 2 has unexpected state 1 and will be dropped
> ERROR: Could not commit prepared transaction implicitely
> PANIC: cannot abort transaction 10312, it was already committed
> [...]
>
> I am not sure we should be aborting 10312, since it was committed on one of the nodes. It corresponds to the original prepared transaction. We also do not want a panic to happen.

This has to be corrected. If a PANIC happens on Coordinators each time a Datanode crashes, a simple node crash would mess up the whole cluster. It is a real problem I think.

> Next, I started a new coordinator session:
> [...]
> I only see one of the rows. I thought, well, ok, we cannot undo a commit, and the other one must commit eventually. I was able to continue working normally.
> [...]
> Note xmin keeps increasing because we closed the transaction on GTM at the "finish:" label. This may or may not be ok.

This should be OK, no?

> Meanwhile, on the failed data node:
> [...]
> The transaction id is 10312. Normally this would still appear in snapshots, but we close it on GTM.
>
> What should we do?
>
> - We could leave it as is. We may in the future have an XC monitoring process look for possible 2PC anomalies occasionally and send an alert, so that they could be resolved by a DBA.

I was thinking about an external utility that could clean up partially committed or prepared transactions when a node crash happens. This is a part of HA, so I think the only thing that should be corrected now is the way errors are managed in the case of a prepared transaction partially committed on the nodes. A PANIC is not acceptable for this case.

> - We could instead choose not to close out the transaction on GTM, so that the xid is still in snapshots. We could test whether the rows are viewable or not. This could result in other side effects, but without further testing, I am guessing this may be similar to when an existing statement is running and cannot see a previously committed transaction that is open in its snapshot. So, I am thinking this is probably the preferable option (keeping it open on GTM until committed on all nodes), but we should test it. In any event, we should also fix the panic.

If we leave the transaction open on GTM, how do we know the GXID that has been used for Commit (different from the one that has been used for PREPARE, as I recall)? If we do a Commit prepared on the remaining node that crashed, we have to commit the former PREPARE GXID, the former COMMIT PREPARED GXID, and also the GXID that is used to issue the new COMMIT PREPARED on the remaining node. It is easy to get the GXID used for the former PREPARE and for the new COMMIT PREPARED, but there is no real way yet to get back the GXID used for the former COMMIT PREPARED.

I would see two ways to correct that:
1) Save the former COMMIT PREPARED GXID on GTM, but this would really impact performance.
2) Save the COMMIT PREPARED GXID on the Coordinator and leave the GXACT open on the Coordinator (this would be the best solution, but the transaction has already been committed on the Coordinator).

That's why I think we should close the transaction on GTM, and a monitoring agent would be in charge of committing on the remaining nodes that crashed if a partial COMMIT has been done.

Btw, it is a complicated point, so others' opinions are completely welcome.

Regards,
--
Michael Paquier
http://michaelpq.users.sourceforge.net |
From: Mason S. <mas...@en...> - 2010-12-13 15:04:14
|
On 12/12/10 9:28 PM, Michael Paquier wrote:
> This has to be corrected. If a PANIC happens on Coordinators each time a Datanode crashes, a simple node crash would mess up the whole cluster. It is a real problem I think.

Yes.

>> Note xmin keeps increasing because we closed the transaction on GTM at the "finish:" label. This may or may not be ok.
>
> This should be OK, no?

Not necessarily.

> I was thinking about an external utility that could clean up partially committed or prepared transactions when a node crash happens. This is a part of HA, so I think the only thing that should be corrected now is the way errors are managed in the case of a prepared transaction partially committed on the nodes. A PANIC is not acceptable for this case.
>
> If we leave the transaction open on GTM, how do we know the GXID that has been used for Commit (different from the one that has been used for PREPARE, as I recall)?

We can test the behavior to see if it is ok to close this one out; otherwise, we have more work to do...

> If we do a Commit prepared on the remaining node that crashed, we have to commit the former PREPARE GXID, the former COMMIT PREPARED GXID, and also the GXID that is used to issue the new COMMIT PREPARED on the remaining node. It is easy to get the GXID used for the former PREPARE and for the new COMMIT PREPARED, but there is no real way yet to get back the GXID used for the former COMMIT PREPARED.
> I would see two ways to correct that:
> 1) Save the former COMMIT PREPARED GXID on GTM, but this would really impact performance.
> 2) Save the COMMIT PREPARED GXID on the Coordinator and leave the GXACT open on the Coordinator (this would be the best solution, but the transaction has already been committed on the Coordinator).

I think we need to research the effects of this and see how the system behaves if the partially failed commit prepared GXID is closed. I suppose it could cause a problem with viewing pg_prepared_xacts. We don't want the hint bits to get updated... well, the first XID will be lower, so the lower open xmin should keep this from having the tuple frozen.

> That's why I think we should close the transaction on GTM, and a monitoring agent would be in charge of committing on the remaining nodes that crashed if a partial COMMIT has been done.

From the test above, the node is still active and the query after the transaction is returning partial results. It should be an all-or-nothing operation. If we close the transaction on GTM, then it means that Postgres-XC is not atomic. I think it is important to be ACID compliant.

I think we should fix the panic, then test how the system behaves if, even though the transaction is committed on one node, we keep the transaction open. The XID will appear in all the snapshots and the row should not be viewable, and we can make sure that vacuum is also ok (it should be). If it works ok, then I think we should keep the transaction open on GTM until all components have committed.

> Btw, it is a complicated point, so others' opinions are completely welcome.

Yes.

Thanks,

Mason

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2010-12-14 00:59:53
|
>>> Note xmin keeps increasing because we closed the transaction on GTM at the "finish:" label. This may or may not be ok.
>>
>> This should be OK, no?
>
> Not necessarily.

I see: the transaction has been only partially committed, so the xmin should keep the value of the oldest GXID (in this case, the one that has not been completely committed).

>> If we leave the transaction open on GTM, how do we know the GXID that has been used for Commit (different from the one that has been used for PREPARE, as I recall)?
>
> We can test the behavior to see if it is ok to close this one out; otherwise, we have more work to do...

OK, I see, so we would not commit the transaction on GTM...

With the current patch, we can know whether an implicit 2PC is used thanks to the CommitTransactionID field I added in GlobalTransactionData for the implicit 2PC. If this value is set, it means that the transaction has been committed on the Coordinator and that this Coordinator is using an implicit 2PC. This value being set also means that the nodes are partially committed or completely prepared.

Here is my proposition. When an ABORT happens and CommitTransactionID is set, we do not commit the transaction ID used for PREPARE, but we do commit CommitTransactionID (no effect on visibility). On the other hand, we register the transaction as still prepared on GTM when the abort happens. This could be done with the API used for explicit 2PC. Then, if there is a conflict, the DBA or a monitoring tool could use the explicit 2PC to finish the commit of the partially committed transaction. (A sketch of this abort path follows below.) This could do the trick. What do you think about that?

> I think we should fix the panic, then test how the system behaves if, even though the transaction is committed on one node, we keep the transaction open. The XID will appear in all the snapshots and the row should not be viewable, and we can make sure that vacuum is also ok (it should be). If it works ok, then I think we should keep the transaction open on GTM until all components have committed.

The PANIC can be easily fixed. Without testing, I would say that the system may be OK, as the transaction ID is still kept alive in snapshots. With that, the transaction is seen as alive in the cluster.

--
Michael Paquier
http://michaelpq.users.sourceforge.net
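PS: In pseudo-code, the proposition looks like this (a minimal standalone sketch; the types and calls are stand-ins written for this mail, with 0 standing for an unset GXID):

    #include <stdio.h>

    typedef unsigned int GXID;

    /* Stand-ins for the messages that would be sent to GTM. */
    static void gtm_commit_gxid(GXID g)       { printf("commit GXID %u on GTM\n", g); }
    static void gtm_register_prepared(GXID g) { printf("keep GXID %u prepared on GTM\n", g); }

    static void abort_with_implicit_2pc(GXID prepare_gxid, GXID commit_gxid)
    {
        /* commit_gxid being set means: implicit 2PC was used, the
         * Coordinator has already committed, and the nodes are
         * partially committed or completely prepared. */
        if (commit_gxid != 0)
        {
            /* Closing the COMMIT PREPARED GXID has no effect on
             * visibility: no row carries it as xmin or xmax. */
            gtm_commit_gxid(commit_gxid);
            /* The PREPARE GXID stays registered as prepared, so
             * snapshots still see the transaction as running and its
             * rows stay invisible until COMMIT PREPARED 'gid' is run. */
            gtm_register_prepared(prepare_gxid);
        }
        /* ... then the usual local abort processing, without PANIC. */
    }

    int main(void)
    {
        abort_with_implicit_2pc(10312, 10313);
        return 0;
    }
|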
From: Koichi S. <koi...@gm...> - 2010-12-14 01:15:11
|
Hi, please see inline...

2010/12/13 Mason Sharp <mas...@en...>:
> [...]
>> mds1=# insert into mds1 values (3,3);
>> INSERT 0 1
>> mds1=# insert into mds1 values (4,4);
>> INSERT 0 1
>> mds1=# insert into mds1 values (5,5);
>> INSERT 0 1
>> mds1=# insert into mds1 values (6,6);
>> INSERT 0 1

Were these statements run as a transaction block, or did they run as "autocommit" statements?

>> mds1=# select xmin,* from mds1;
>>  xmin  | col1 | col2
>> -------+------+------
>>  10420 |    4 |    4
>>  10422 |    6 |    6
>>  10312 |    2 |    2
>>  10415 |    3 |    3
>>  10421 |    5 |    5
>> (5 rows)
>>
>> Note xmin keeps increasing because we closed the transaction on GTM at the "finish:" label. This may or may not be ok.
>
> This should be OK, no?

If the above statements ran in "autocommit" mode, each statement ran as a separate transaction. Xmin just indicates the GXID which "created" the row. To determine whether the row is visible or not, we have to visit the CLOG (if the GXID is not "frozen") and the list of live transactions, to see if the creator is running, committed or aborted. Then we can determine if a given row should be visible or not. Therefore, if the creator transaction is left just "PREPARED", the creator transaction's information will remain in PgProc and it is regarded as "running", thus the row should be regarded as "invisible" from other transactions. A similar consideration should be made for the "xmax" value of the row, in the case of an "update" or "delete" statement.

Hope it helps.
---
Koichi Suzuki
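PS: A rough standalone sketch of that visibility rule, with the CLOG and PgProc lookups replaced by stand-in functions (this is only an illustration, not the actual code):

    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned int GXID;

    /* Stand-in: true if the GXID is still in the live-transaction list
     * (PgProc), which includes transactions left in PREPARED state.
     * Here it models the stuck transaction 10312 from Mason's test. */
    static bool gxid_is_running(GXID gxid)  { return gxid == 10312; }

    /* Stand-in for the CLOG lookup: true if the GXID committed. */
    static bool gxid_committed(GXID gxid)   { return gxid != 10312; }

    /* Visibility of a row based on its creator; the same logic would
     * apply to xmax for rows touched by UPDATE or DELETE. */
    static bool row_visible(GXID xmin)
    {
        /* A PREPARED creator is regarded as running: invisible. */
        if (gxid_is_running(xmin))
            return false;
        /* Otherwise, visible only if the creator committed. */
        return gxid_committed(xmin);
    }

    int main(void)
    {
        printf("row with xmin 10312 visible? %d\n", row_visible(10312));
        return 0;
    }
|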
From: Michael P. <mic...@gm...> - 2010-12-14 08:07:59
Attachments:
implicit2pc7.patch
|
Hi all,

Here is the fix I propose, based on the idea from my previous mail. If a prepared transaction that is partially committed gets aborted, this patch gathers the handles to the nodes where an error occurred and saves them on GTM.

The partially committed prepared transaction is kept alive on GTM, so other transactions cannot see the partially committed results. To complete the commit of the partially committed prepared transaction, it is necessary to issue a COMMIT PREPARED 'gid'. Once this command is issued, the transaction will finish its commit properly.

Mason, this solves the problem you saw when you made your tests. It also respects the rule that a 2PC transaction partially committed has to be committed.

Thanks,
--
Michael Paquier
http://michaelpq.users.sourceforge.net |
From: Mason S. <mas...@en...> - 2010-12-14 23:26:41
|
> Hi all,
>
> Here is the fix I propose, based on the idea from my previous mail. If a prepared transaction that is partially committed gets aborted, this patch gathers the handles to the nodes where an error occurred and saves them on GTM.
>
> The partially committed prepared transaction is kept alive on GTM, so other transactions cannot see the partially committed results. To complete the commit of the partially committed prepared transaction, it is necessary to issue a COMMIT PREPARED 'gid'. Once this command is issued, the transaction will finish its commit properly.
>
> Mason, this solves the problem you saw when you made your tests. It also respects the rule that a 2PC transaction partially committed has to be committed.

Just took a brief look so far. Seems better.

I understand that recovery and HA are in development, that things are being done to lay the groundwork and improve, and that with this patch we are not yet trying to handle any and every situation. What happens if the coordinator fails before it can update GTM though?

Also, I did a test and got this:

WARNING: unexpected EOF on datanode connection
WARNING: Connection to Datanode 1 has unexpected state 1 and will be dropped
ERROR: Could not commit prepared transaction implicitely
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

#0 0x907afe42 in kill$UNIX2003 ()
#1 0x9082223a in raise ()
#2 0x9082e679 in abort ()
#3 0x003917ce in ExceptionalCondition (conditionName=0x433f6c "!(((proc->xid) != ((TransactionId) 0)))", errorType=0x3ecfd4 "FailedAssertion", fileName=0x433f50 "procarray.c", lineNumber=283) at assert.c:57
#4 0x00280916 in ProcArrayEndTransaction (proc=0x41cca70, latestXid=1018) at procarray.c:283
#5 0x0005905c in AbortTransaction () at xact.c:2525
#6 0x00059a6e in AbortCurrentTransaction () at xact.c:3001
#7 0x00059b10 in AbortCurrentTransactionOnce () at xact.c:3094
#8 0x0029c8d6 in PostgresMain (argc=4, argv=0x1002ff8, username=0x1002fc8 "masonsharp") at postgres.c:3622
#9 0x0025851c in BackendRun (port=0x7016f0) at postmaster.c:3607
#10 0x00257883 in BackendStartup (port=0x7016f0) at postmaster.c:3216
#11 0x002542b5 in ServerLoop () at postmaster.c:1445
#12 0x002538c1 in PostmasterMain (argc=5, argv=0x7005a0) at postmaster.c:1098
#13 0x001cf2f1 in main (argc=5, argv=0x7005a0) at main.c:188

I did the same test as before. I killed a data node after it received a COMMIT PREPARED message. I think we should be able to continue.

The good news is that I should not see partially committed data, which I do not. But if I try to manually commit it from a new connection to the coordinator:

mds=# COMMIT PREPARED 'T1018';
ERROR: Could not get GID data from GTM

Maybe GTM removed this info when the coordinator disconnected? (Or maybe implicit transactions are only associated with a certain connection?) I can see the transaction on one data node, but not the other.

Ideally we would come up with a scheme where, if the coordinator session does not notify GTM, we can somehow recover. Maybe this is my fault - I believe I advocated avoiding the extra work for implicit 2PC in the name of performance. :-)

We can think about what to do in the short term, and how to handle it in the long term. In the short term, your approach may be good enough once debugged, since it is a relatively rare case.

Long term, we could think about a thread that runs on GTM and wakes up every 30 or 60 seconds or so (configurable), collects implicit transactions from the nodes (extension to pg_prepared_xacts required?) and, if it sees that an XID does not have an associated live connection, knows that something went awry. It then sees whether the transaction committed on any of the nodes. If not, roll back on all; if it did on at least one, commit on all. If one of the data nodes is down, it won't do anything, perhaps log a warning. This would avoid user intervention, and would be pretty cool. Some of this code you may already have been working on for the recovery tools, and we could reuse it here. (A sketch of this resolution rule follows below.)

Regards,

Mason

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
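PS: The resolution rule for such a cleanup thread could look like this (a purely hypothetical, standalone sketch; none of these names exist in GTM today):

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_NODES 8

    /* State a cleanup thread could collect for one orphaned implicit
     * 2PC transaction (no live coordinator session attached to it). */
    typedef struct OrphanXact
    {
        unsigned int gxid;
        int          num_nodes;
        bool         node_up[MAX_NODES];        /* node reachable? */
        bool         committed_on[MAX_NODES];   /* COMMIT PREPARED done? */
    } OrphanXact;

    static void resolve(OrphanXact *x)
    {
        bool committed_somewhere = false;

        for (int i = 0; i < x->num_nodes; i++)
        {
            if (!x->node_up[i])
            {
                /* A participant is down: defer, just log a warning. */
                printf("WARNING: node %d down, deferring xact %u\n", i, x->gxid);
                return;
            }
            if (x->committed_on[i])
                committed_somewhere = true;
        }

        /* All or nothing: if any node already committed, finish the
         * commit everywhere; otherwise it is safe to roll back. */
        for (int i = 0; i < x->num_nodes; i++)
            if (!x->committed_on[i])
                printf("%s PREPARED on node %d for xact %u\n",
                       committed_somewhere ? "COMMIT" : "ROLLBACK",
                       i, x->gxid);
    }

    int main(void)
    {
        OrphanXact x = { 1018, 2, { true, true }, { true, false } };
        resolve(&x);   /* finishes the commit on node 1 */
        return 0;
    }
|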
From: Michael P. <mic...@gm...> - 2010-12-15 02:37:12
Attachments:
implicit2pc8.patch
|
> Just took a brief look so far. Seems better.
>
> I understand that recovery and HA are in development, that things are being done to lay the groundwork and improve, and that with this patch we are not yet trying to handle any and every situation. What happens if the coordinator fails before it can update GTM though?

In this case the information is not saved on GTM. For a Coordinator crash, I was thinking of an external utility, associated with the monitoring agent, in charge of analyzing the prepared transactions of the crashed Coordinator. This utility would analyze in the cluster the prepared transactions of the crashed Coordinator, and decide automatically which ones to abort or commit depending on each transaction's situation.

For this purpose, it is essential to extend the 2PC information sent to the nodes (Datanodes of course, but Coordinators included in the DDL case). The patch extending the 2PC information on nodes is also on this thread (a patch based on version 6 of the implicit 2PC patch). In this case I believe it is not necessary to save any info on GTM, as the extended 2PC information alone would be enough to analyze the 2PC transactions of the crashed Coordinator.

> Also, I did a test and got this:
>
> WARNING: unexpected EOF on datanode connection
> WARNING: Connection to Datanode 1 has unexpected state 1 and will be dropped
> ERROR: Could not commit prepared transaction implicitely
> [...]
> #3 0x003917ce in ExceptionalCondition (conditionName=0x433f6c "!(((proc->xid) != ((TransactionId) 0)))", errorType=0x3ecfd4 "FailedAssertion", fileName=0x433f50 "procarray.c", lineNumber=283) at assert.c:57
> #4 0x00280916 in ProcArrayEndTransaction (proc=0x41cca70, latestXid=1018) at procarray.c:283
> #5 0x0005905c in AbortTransaction () at xact.c:2525
> [...]

I suppose you enabled assertions when doing this test. The Coordinator was complaining that its transaction ID in PGProc was not correct. That is indeed true, as in the tested case the transaction had already been committed on the Coordinator.

> I did the same test as before. I killed a data node after it received a COMMIT PREPARED message. I think we should be able to continue.
>
> The good news is that I should not see partially committed data, which I do not. But if I try to manually commit it from a new connection to the coordinator:
>
> mds=# COMMIT PREPARED 'T1018';
> ERROR: Could not get GID data from GTM
>
> Maybe GTM removed this info when the coordinator disconnected? (Or maybe implicit transactions are only associated with a certain connection?)

Yes, it was removed when your Coordinator instance crashed.

> I can see the transaction on one data node, but not the other.
>
> Ideally we would come up with a scheme where, if the coordinator session does not notify GTM, we can somehow recover. [...]
>
> Long term, we could think about a thread that runs on GTM and wakes up every 30 or 60 seconds or so (configurable), collects implicit transactions from the nodes [...]. This would avoid user intervention, and would be pretty cool. Some of this code you may already have been working on for the recovery tools, and we could reuse it here.

This is a nice idea. It depends of course on one thing: whether we decide to base the HA features on a monitoring agent only, or whether XC should be able to run on its own (or even allow both modes).

--
Michael Paquier
http://michaelpq.users.sourceforge.net |
From: Mason S. <mas...@en...> - 2010-12-20 15:35:17
|
On 12/14/10 9:37 PM, Michael Paquier wrote:
> In this case the information is not saved on GTM. For a Coordinator crash, I was thinking of an external utility, associated with the monitoring agent, in charge of analyzing the prepared transactions of the crashed Coordinator. [...]
>
> I suppose you enabled assertions when doing this test. The Coordinator was complaining that its transaction ID in PGProc was not correct. That is indeed true, as in the tested case the transaction had already been committed on the Coordinator.

I tried out the latest patch and it still crashes the coordinator.

#0 pgxc_node_implicit_commit_prepared (prepare_xid=924, commit_xid=925, pgxc_handles=0x1042c0c, gid=0xbfffef4f "T924", is_commit=1 '\001') at execRemote.c:1826
1826 int co_conn_count = pgxc_handles->co_conn_count;
(gdb) bt
#0 pgxc_node_implicit_commit_prepared (prepare_xid=924, commit_xid=925, pgxc_handles=0x1042c0c, gid=0xbfffef4f "T924", is_commit=1 '\001') at execRemote.c:1826
#1 0x001c2b0d in PGXCNodeImplicitCommitPrepared (prepare_xid=924, commit_xid=925, gid=0xbfffef4f "T924", is_commit=1 '\001') at execRemote.c:1775
#2 0x0005845f in CommitTransaction () at xact.c:2013
#3 0x0005948f in CommitTransactionCommand () at xact.c:2746
#4 0x0029a6d7 in finish_xact_command () at postgres.c:2437
#5 0x002980d2 in exec_simple_query (query_string=0x103481c "commit;") at postgres.c:1070
#6 0x0029ccbb in PostgresMain (argc=4, argv=0x1002ff8, username=0x1002fc8 "masonsharp") at postgres.c:3766
#7 0x0025848c in BackendRun (port=0x7016f0) at postmaster.c:3607
#8 0x002577f3 in BackendStartup (port=0x7016f0) at postmaster.c:3216
#9 0x00254225 in ServerLoop () at postmaster.c:1445
#10 0x00253831 in PostmasterMain (argc=5, argv=0x7005a0) at postmaster.c:1098
#11 0x001cf261 in main (argc=5, argv=0x7005a0) at main.c:188

pgxc_handles looks ok though. Does it work ok in your environment?

> This is a nice idea. It depends of course on one thing: whether we decide to base the HA features on a monitoring agent only, or whether XC should be able to run on its own (or even allow both modes).

We can think about it... It could be separate from GTM, part of a monitoring process.

Mason

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2010-12-21 08:34:05
|
Sorry for my late reply; please see my answers inline.

> #0 pgxc_node_implicit_commit_prepared (prepare_xid=924, commit_xid=925, pgxc_handles=0x1042c0c, gid=0xbfffef4f "T924", is_commit=1 '\001') at execRemote.c:1826
> 1826 int co_conn_count = pgxc_handles->co_conn_count;
> [...]
>
> pgxc_handles looks ok though. Does it work ok in your environment?

It looks like it crashed when reading the coordinator connection count from pgxc_handles. I made a couple of tests in my environment and it worked well, with assertions enabled. By a couple of tests, I mean a sequence creation, a few inserts on single and multiple nodes, and a DDL run. Everything went fine. We already saw in the past that not all problems are reproducible in the environments we use for tests. Could you give me more details about this crash?

--
Michael Paquier
http://michaelpq.users.sourceforge.net |
From: Mason S. <mas...@en...> - 2010-12-22 01:23:35
|
On 12/21/10 3:33 AM, Michael Paquier wrote:
> Could you give me more details about this crash?

After "make clean; make", things look better. I found another issue though. Still, you can go ahead and commit this since it is close, in order to make merging easier.

If the coordinator, while committing the prepared transactions, sends COMMIT PREPARED to one of the nodes and is then killed before it can send it to the other, then when I restart the coordinator I see the data from one of the nodes only (GTM closed the transaction), which is not atomic. The second data node was alive the entire time.

I fear we may have to treat implicit transactions similarly to explicit transactions. (BTW, do we handle explicit 2PC properly for these similar cases, too?) If we stick with performance shortcuts, it is hard to be reliably atomic. (Again, I will take the blame for trying to speed things up. Perhaps we can have it as a configuration option if people have a lot of implicit 2PC going on and understand the risks.)

Anyway, the transaction would remain open, but it would have to be resolved somehow. If we had a "transaction clean up" thread in GTM, it could note the transaction information and periodically try to connect to the registered nodes and resolve according to the rules we have talked about. (Again, some of this code could be in some of the recovery tools you are writing, too.) The nice thing about doing something like this is that we can automate things as much as possible and not require DBA intervention; if a non-GTM component goes down and comes up again, things will resolve by themselves. I suppose if it is GTM itself that went down, once it rebuilds its state properly, this same mechanism could be called at the end of GTM recovery to resolve the outstanding issues.

I think we need to walk through every step in the commit sequence, kill an involved process, and verify that we have a consistent view of the database afterward, and that we have the ability/tools to resolve it. This code requires careful testing.

Thanks,

Mason

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2010-12-22 02:12:43
|
On Wed, Dec 22, 2010 at 10:23 AM, Mason Sharp <mas...@en...> wrote:
> After "make clean; make", things look better.

Thanks for taking the time to check that.

> I found another issue though. Still, you can go ahead and commit this since it is close, in order to make merging easier.

I'll do it, thanks.

> If the coordinator, while committing the prepared transactions, sends COMMIT PREPARED to one of the nodes and is then killed before it can send it to the other, then when I restart the coordinator I see the data from one of the nodes only (GTM closed the transaction), which is not atomic. The second data node was alive the entire time.

That is true: if a Coordinator crashes, GTM closes all the transactions of its backends that it still considers open. In the case of an implicit COMMIT, even if we prepare/commit on the nodes, the transaction is still seen as open on GTM.

> I fear we may have to treat implicit transactions similarly to explicit transactions. (BTW, do we handle explicit 2PC properly for these similar cases, too?) If we stick with performance shortcuts, it is hard to be reliably atomic. (Again, I will take the blame for trying to speed things up. Perhaps we can have it as a configuration option if people have a lot of implicit 2PC going on and understand the risks.)

Yeah, I think so. A GUC parameter would do the trick, but I'd like to discuss that more before deciding anything. (A sketch of what such a switch could control follows below.)

> Anyway, the transaction would remain open, but it would have to be resolved somehow.
>
> If we had a "transaction clean up" thread in GTM, it could note the transaction information and periodically try to connect to the registered nodes and resolve according to the rules we have talked about. [...] I suppose if it is GTM itself that went down, once it rebuilds its state properly, this same mechanism could be called at the end of GTM recovery to resolve the outstanding issues.

That is more or less what we are planning to do with the utility that will check the remaining 2PC transactions after a Coordinator crash. This utility would be kicked off by the monitoring agent when it notices a Coordinator crash. This feature needs two things:
1) a fix for EXECUTE DIRECT
2) the extension of the 2PC table (patch already written, but not yet realigned with the latest 2PC code)

> I think we need to walk through every step in the commit sequence, kill an involved process, and verify that we have a consistent view of the database afterward, and that we have the ability/tools to resolve it.
>
> This code requires careful testing.

That's true; this code could easily lead to unexpected issues when playing with 2PC.

--
Michael Paquier
http://michaelpq.users.sourceforge.net
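PS: Just to make the idea concrete, the switch could select between the fast path and a strict path like this (a purely hypothetical sketch; the parameter name is a placeholder and nothing is decided):

    #include <stdbool.h>
    #include <stdio.h>

    /* Would become a GUC; when true, the GXID stays open on GTM until
     * every node has acknowledged its COMMIT PREPARED. */
    static bool strict_implicit_2pc = true;

    static void finish_on_gtm(unsigned int gxid, bool all_nodes_committed)
    {
        if (all_nodes_committed || !strict_implicit_2pc)
        {
            /* Fast path: close the GXID on GTM immediately. */
            printf("closing GXID %u on GTM\n", gxid);
        }
        else
        {
            /* Strict path: snapshots keep seeing the xact as running,
             * preserving atomicity until a cleanup process resolves it. */
            printf("keeping GXID %u open on GTM\n", gxid);
        }
    }

    int main(void)
    {
        finish_on_gtm(1018, false);
        return 0;
    }
|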