From: Mason S. <mas...@en...> - 2010-12-13 15:04:14
On 12/12/10 9:28 PM, Michael Paquier wrote:
>> I reviewed, and I thought it looked good, except for a possible
>> issue with committing.
>>
>> I wanted to test what happened with implicit transactions when
>> there was a failure.
>>
>> I executed this in one session:
>>
>> mds1=# begin;
>> BEGIN
>> mds1=# insert into mds1 values (1,1);
>> INSERT 0 1
>> mds1=# insert into mds1 values (2,2);
>> INSERT 0 1
>> mds1=# commit;
>>
>> Before committing, I fired up gdb for a coordinator session and a
>> data node session.
>>
>> On one of the data nodes, when the COMMIT PREPARED was received, I
>> killed the backend to see what would happen. On the Coordinator I
>> saw this:
>>
>> WARNING: unexpected EOF on datanode connection
>> WARNING: Connection to Datanode 1 has unexpected state 1 and will
>> be dropped
>> WARNING: Connection to Datanode 2 has unexpected state 1 and will
>> be dropped
>> ERROR: Could not commit prepared transaction implicitely
>> PANIC: cannot abort transaction 10312, it was already committed
>> server closed the connection unexpectedly
>> 	This probably means the server terminated abnormally
>> 	before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>>
>> I am not sure we should be aborting 10312, since it was committed
>> on one of the nodes. It corresponds to the original prepared
>> transaction. We also do not want a panic to happen.
>
> This has to be corrected.
> If a PANIC happens on Coordinators each time a Datanode crashes, a
> simple node crash would mess up the whole cluster.
> It is a real problem I think.

Yes.

>> Next, I started a new coordinator session:
>>
>> mds1=# select * from mds1;
>>  col1 | col2
>> ------+------
>>     2 |    2
>> (1 row)
>>
>> I only see one of the rows. I thought, well, ok, we cannot undo a
>> commit, and the other one must commit eventually.
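The failure mode above can be reduced to a toy model. A minimal sketch, assuming the rule both of us seem to agree on: once COMMIT PREPARED has succeeded on any node, the global transaction is past its point of no return, so a node failure during the commit phase must leave the survivor in doubt for later resolution, never trigger an abort (hence the PANIC). The names (Node, commit_2pc) are illustrative, not actual XC code:

```python
class NodeCrashed(Exception):
    pass

class Node:
    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.fail_on_commit = fail_on_commit
        self.committed = False

    def commit_prepared(self, gid):
        if self.fail_on_commit:
            raise NodeCrashed(self.name)
        self.committed = True

def commit_2pc(nodes, gid):
    """Issue COMMIT PREPARED everywhere; return nodes left in doubt."""
    in_doubt = []
    for node in nodes:
        try:
            node.commit_prepared(gid)
        except NodeCrashed:
            # Too late to abort: another node may already be committed.
            # Record the survivor for later resolution instead of failing.
            in_doubt.append(node)
    return in_doubt

nodes = [Node("datanode1"), Node("datanode2", fail_on_commit=True)]
pending = commit_2pc(nodes, "T10312")
# datanode1 is committed; datanode2 stays in doubt for recovery
```

In the scenario above, the Coordinator instead tried to abort 10312 after the partial commit, which is exactly the transition this model forbids.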
>> I was able to continue working normally:
>>
>> mds1=# insert into mds1 values (3,3);
>> INSERT 0 1
>> mds1=# insert into mds1 values (4,4);
>> INSERT 0 1
>> mds1=# insert into mds1 values (5,5);
>> INSERT 0 1
>> mds1=# insert into mds1 values (6,6);
>> INSERT 0 1
>>
>> mds1=# select xmin,* from mds1;
>>  xmin  | col1 | col2
>> -------+------+------
>>  10420 |    4 |    4
>>  10422 |    6 |    6
>>  10312 |    2 |    2
>>  10415 |    3 |    3
>>  10421 |    5 |    5
>> (5 rows)
>>
>> Note xmin keeps increasing because we closed the transaction on
>> GTM at the "finish:" label. This may or may not be ok.
>
> This should be OK, no?

Not necessarily.

>> Meanwhile, on the failed data node:
>>
>> mds1=# select * from pg_prepared_xacts;
>> WARNING: Do not have a GTM snapshot available
>> WARNING: Do not have a GTM snapshot available
>>  transaction |  gid   |           prepared            |   owner    | database
>> -------------+--------+-------------------------------+------------+----------
>>        10312 | T10312 | 2010-12-12 12:04:30.946287-05 | xxxxxx     | mds1
>> (1 row)
>>
>> The transaction id is 10312. Normally this would still appear in
>> snapshots, but we close it on GTM.
>>
>> What should we do?
>>
>> - We could leave things as they are. In the future, an XC monitoring
>> process could look for possible 2PC anomalies occasionally and send
>> an alert so that they could be resolved by a DBA.
>
> I was thinking about an external utility that could clean up partially
> committed or prepared transactions when a node crash happens.
> This is a part of HA, so I think the only thing that should be
> corrected now is the way errors are managed in the case of a partially
> committed prepared transaction on nodes.
> A PANIC is not acceptable for this case.

>> - We could instead choose not to close out the transaction on GTM, so
>> that the xid is still in snapshots. We could test if the rows are
>> viewable or not.
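This is the crux of the visibility question. A minimal MVCC visibility check, sketching why closing the GXID on GTM matters: a row whose xmin is carried in a snapshot's running set (or lies at or beyond its xmax) is invisible. While GTM keeps 10312 open, every new snapshot treats it as running and the half-committed row stays hidden; once it is closed on GTM, the row becomes visible on whichever node did commit, which is the partial result seen above. The snapshot structure here is an illustration, not GTM's actual representation:

```python
def visible(row_xmin, snapshot):
    """snapshot = (xmin, xmax, running_xids); True if the row's inserting
    transaction appears committed to this snapshot."""
    snap_xmin, snap_xmax, running = snapshot
    if row_xmin >= snap_xmax:
        return False        # inserter started after the snapshot was taken
    return row_xmin not in running

# If GTM still reported 10312 as open, new snapshots would carry it
# in the running set and the partially committed row would stay hidden:
open_snapshot = (10312, 10423, {10312})
assert visible(10312, open_snapshot) is False

# Current behaviour (10312 closed on GTM): the partial commit shows up.
closed_snapshot = (10415, 10423, set())
assert visible(10312, closed_snapshot) is True
```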
>> This could result in other side effects, but without further
>> testing, I am guessing this may be similar to when an existing
>> statement is running and cannot see a previously committed
>> transaction that is open in its snapshot. So, I am thinking this is
>> probably the preferable option (keeping it open on GTM until
>> committed on all nodes), but we should test it. In any event, we
>> should also fix the panic.
>
> If we leave the transaction open on GTM, how do we know the GXID that
> was used for the COMMIT (different from the one that was used for the
> PREPARE, as I recall)?

We can test the behavior to see if it is OK to close this one out;
otherwise, we have more work to do...

> If we do a COMMIT PREPARED on the remaining node that crashed, we
> have to commit the former PREPARE GXID, the former COMMIT PREPARED
> GXID, and also the GXID that is used to issue the new COMMIT PREPARED
> on the remaining node.
> It is easy to get the GXID used for the former PREPARE and the new
> COMMIT PREPARED, but there is no real way yet to get back the GXID
> used for the former COMMIT PREPARED.
> I see two ways to correct that:
> 1) Save the former COMMIT PREPARED GXID in GTM, but this would really
> impact performance.
> 2) Save the COMMIT PREPARED GXID on the Coordinator and leave the
> GXACT open on the Coordinator (this would be the best solution, but
> the transaction has already been committed on the Coordinator).

I think we need to research the effects of this and see how the system
behaves if the partially failed COMMIT PREPARED GXID is closed. I
suppose it could cause a problem with viewing pg_prepared_xacts. We
don't want the hint bits to get updated... well, the first XID will be
lower, so the lower open xmin should keep the tuple from being frozen.

> That's why I think we should close the transaction on GTM, and a
> monitoring agent would be in charge of committing on the remaining
> crashed nodes if a partial COMMIT has been done.
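To make option 2 concrete: the bookkeeping problem is just remembering, per gid, every GXID involved, i.e. the PREPARE GXID, the original COMMIT PREPARED GXID, and any GXID used for a re-issued COMMIT PREPARED, so that all of them can be closed on GTM once the last node commits. A minimal sketch; every name here (GxactLog, the phase labels) is hypothetical, since nothing like this exists in the code yet:

```python
class GxactLog:
    """Coordinator-side record of all GXIDs tied to one global gid."""

    def __init__(self):
        self._by_gid = {}

    def record(self, gid, phase, gxid):
        # phase is a free-form label: "prepare", "commit", "commit_retry", ...
        self._by_gid.setdefault(gid, {})[phase] = gxid

    def gxids_to_close(self, gid):
        """All GXIDs that GTM must close once the gid is committed
        on every node."""
        return sorted(self._by_gid[gid].values())

log = GxactLog()
log.record("T10312", "prepare", 10312)
log.record("T10312", "commit", 10313)        # original COMMIT PREPARED
log.record("T10312", "commit_retry", 10423)  # re-issued on the crashed node
# log.gxids_to_close("T10312") -> [10312, 10313, 10423]
```

The performance objection to option 1 is that this table would have to live on GTM and be updated on every commit; keeping it on the Coordinator (option 2) confines the cost to transactions that actually used 2PC.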
From the above, the node is still active and the query after the
transaction is returning partial results. It should be an
all-or-nothing operation. If we close the transaction on GTM, then it
means that Postgres-XC is not atomic, and I think it is important to
be ACID compliant.

I think we should fix the panic, then test how the system behaves if
we keep the transaction open even though it is committed on one node.
The XID will appear in all the snapshots, the row should not be
viewable, and we can make sure that vacuum is also OK (it should be).
If that works, then I think we should keep the transaction open on GTM
until all components have committed.

> Btw, it is a complicated point, so others' opinions are completely
> welcome.

Yes.

Thanks,

Mason

> Regards,
>
> --
> Michael Paquier
> http://michaelpq.users.sourceforge.net

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company