Re: [Postgres-xc-developers] Patch correcting implicit 2-phase commit

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

>
> I reviewed, and I thought it looked good, except for a possible issue with
> committing.
>
> I wanted to test what happened with implicit transactions when there was a
> failure.
>
> I executed this in one session:
>
> mds1=# begin;
> BEGIN
> mds1=# insert into mds1 values (1,1);
> INSERT 0 1
> mds1=# insert into mds1 values (2,2);
> INSERT 0 1
> mds1=# commit;
>
> Before committing, I fired up gdb for a coordinator session and a data node
> session.
>
> On one of the data nodes, when the COMMIT PREPARED was received, I killed
> the backend to see what would happen. On the Coordinator I saw this:
>
>
> WARNING:  unexpected EOF on datanode connection
> WARNING:  Connection to Datanode 1 has unexpected state 1 and will be
> dropped
> WARNING:  Connection to Datanode 2 has unexpected state 1 and will be
> dropped
>
> ERROR:  Could not commit prepared transaction implicitely
> PANIC:  cannot abort transaction 10312, it was already committed
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> I am not sure we should be aborting 10312, since it was committed on one of
> the nodes. It corresponds to the original prepared transaction. We also do
> not want a panic to happen.
>
This has to be corrected.
If a PANIC happens on Coordinators each time a Datanode crashes, a simple
node crash would mess up the whole cluster.
It is a real problem I think.

> Next, I started a new coordinator session:
>
> mds1=# select * from mds1;
>  col1 | col2
> ------+------
>     2 |    2
> (1 row)
>
>
> I only see one of the rows. I thought, well, ok, we cannot undo a commit,
> and the other one must commit eventually.  I was able to continue working
> normally:
>
> mds1=# insert into mds1 values (3,3);
> INSERT 0 1
> mds1=# insert into mds1 values (4,4);
> INSERT 0 1
> mds1=# insert into mds1 values (5,5);
> INSERT 0 1
> mds1=# insert into mds1 values (6,6);
> INSERT 0 1
>
> mds1=# select xmin,* from mds1;
>  xmin  | col1 | col2
> -------+------+------
>  10420 |    4 |    4
>  10422 |    6 |    6
>  10312 |    2 |    2
>  10415 |    3 |    3
>  10421 |    5 |    5
> (5 rows)
>
>
> Note xmin keeps increasing because we closed the transaction on GTM at the
> "finish:" label. This may or may not be ok.
>
This should be OK, no?

>
> Meanwhile, on the failed data node:
>
> mds1=# select * from pg_prepared_xacts;
> WARNING:  Do not have a GTM snapshot available
> WARNING:  Do not have a GTM snapshot available
>  transaction |  gid   |           prepared            |   owner    |
> database
>
> -------------+--------+-------------------------------+------------+----------
>        10312 | T10312 | 2010-12-12 12:04:30.946287-05 | xxxxxx | mds1
> (1 row)
>
> The transaction id is 10312. Normally this would still appear in snapshots,
> but we close it on GTM.
>
> What should we do?
>
> - We could leave as is. We may in the future have an XC monitoring process
> look for possible 2PC anomalies occasionally and send an alert so that they
> could be resolved by a DBA.
>
I was thinking about an external utility that could clean up partially
committed or prepared transactions when a node crash happens.
This is a part of HA, so I think the only thing that should be corrected now
is the way errors are managed in the case of a partially committed prepared
transaction on nodes.
A PANIC is not acceptable for this case.

> - We could instead choose not close out the transaction on GTM, so that the
> xid is still in snapshots. We could test if the rows are viewable or not.
> This could result in other side effects, but without further testing, I am
> guessing this may be similar to when an existing statement is running and
> cannot see a previously committed transaction that is open in its snapshot.
> So, I am thinking this is probably the preferable option (keeping it open on
> GTM until committed on all nodes), but we should test it. In any event, we
> should also fix the panic.
>

If we let it open the transaction open on GTM, how do we know the GXID that
has been used for Commit (different from the one that has been used for
PREPARE as I recall)?
If we do a Commit prepare on the remaining node that crashed, we have to
commit the former PREPARE GXID, the former COMMIT PREPARED GXID and also the
GXID that is used to issue the new COMMIT PREPARED on the remaining node.
It is easy to get the GXID used for former PREPARE and new COMMIT PREPARED.
But there is no real way yet to get back the GXID used for the former COMMIT
PREPARE.
I would see two ways to correct that:
1) Save the former COMMIT PREPARED GXID in GTM, but this would really impact
performance.
2) Save the COMMIT PREPARED GXID on Coordinator and let the GXACT open on
Coordinator (would be the best solution, but the transaction has already
been committed on Coordinator).

That's why I think the transaction should be to close the transaction on
GTM, and a monitoring agent would be in charge to commit on the remaining
nodes that crashed if a partial COMMIT has been done.

Btw, it is a complicated point, so other's opinion is completely welcome.

Regards,

-- 
Michael Paquier
http://michaelpq.users.sourceforge.net