From: Mason S. <mas...@en...> - 2010-12-12 19:37:22
On 12/6/10 12:32 AM, Michael Paquier wrote:
>
> I changed the algorithm substantially to avoid code duplication for
> implicit 2PC.
> With the attached patch, the Coordinator is prepared only if at least
> two Coordinators are involved in a transaction (the DDL case).
> If only one Coordinator is involved in the transaction, or if the
> transaction does not contain any DDL, the transaction is prepared on
> the involved nodes only.
>
> To sum up:
> 1) for DDL transaction (more than 1 Coordinator and more than 1
> Datanode involved in a transaction)
> - prepare on Coordinator (2PC file written)
> - prepare on Nodes (2PC file written)
> - Commit prepared on Coordinator
> - Commit prepared on Datanodes
> 2) If no Coordinator, or only one Coordinator is involved in a transaction
> - prepare on nodes
> - commit on Coordinator
> - Commit on Datanodes
>
> Note: I didn't put the calls to the implicit prepare functions in a
> separate function because the modifications to CommitTransaction() are
> really light.
>
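For reference, the ordering in the summary above can be written as a small decision function. This is only a sketch of the summary as I read it, not code from the patch: the name `implicit_2pc_steps` and the step labels are made up for illustration.

```python
def implicit_2pc_steps(coordinators_involved):
    """Return the ordered steps from the summary above.

    Hypothetical helper: the function name and the step labels are
    illustrative only and do not correspond to anything in the patch.
    """
    if coordinators_involved >= 2:
        # DDL case: the Coordinator is prepared as well (2PC file written).
        return [
            "PREPARE on Coordinator",
            "PREPARE on nodes",
            "COMMIT PREPARED on Coordinator",
            "COMMIT PREPARED on Datanodes",
        ]
    # No Coordinator, or only one, involved: only the involved nodes
    # are prepared.
    return [
        "PREPARE on nodes",
        "COMMIT on Coordinator",
        "COMMIT on Datanodes",
    ]
```

The failure tested below happens during the last step, after the earlier steps have already succeeded.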
I reviewed the patch and it looks good, except for a possible issue
with committing.
I wanted to test what happens with implicit transactions when there is
a failure.
I executed this in one session:
mds1=# begin;
BEGIN
mds1=# insert into mds1 values (1,1);
INSERT 0 1
mds1=# insert into mds1 values (2,2);
INSERT 0 1
mds1=# commit;
Before committing, I fired up gdb for a coordinator session and a data
node session.
On one of the data nodes, when the COMMIT PREPARED was received, I
killed the backend to see what would happen. On the Coordinator I saw this:
WARNING: unexpected EOF on datanode connection
WARNING: Connection to Datanode 1 has unexpected state 1 and will be dropped
WARNING: Connection to Datanode 2 has unexpected state 1 and will be dropped
ERROR: Could not commit prepared transaction implicitely
PANIC: cannot abort transaction 10312, it was already committed
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
I am not sure we should be aborting 10312, since it was committed on one
of the nodes. It corresponds to the original prepared transaction. We
also do not want a panic to happen.
Next, I started a new coordinator session:
mds1=# select * from mds1;
col1 | col2
------+------
2 | 2
(1 row)
I only see one of the rows. I thought, well, ok, we cannot undo a
commit, and the other one must commit eventually. I was able to
continue working normally:
mds1=# insert into mds1 values (3,3);
INSERT 0 1
mds1=# insert into mds1 values (4,4);
INSERT 0 1
mds1=# insert into mds1 values (5,5);
INSERT 0 1
mds1=# insert into mds1 values (6,6);
INSERT 0 1
mds1=# select xmin,* from mds1;
xmin | col1 | col2
-------+------+------
10420 | 4 | 4
10422 | 6 | 6
10312 | 2 | 2
10415 | 3 | 3
10421 | 5 | 5
(5 rows)
Note xmin keeps increasing because we closed the transaction on GTM at
the "finish:" label. This may or may not be ok.
Meanwhile, on the failed data node:
mds1=# select * from pg_prepared_xacts;
WARNING: Do not have a GTM snapshot available
WARNING: Do not have a GTM snapshot available
transaction | gid | prepared | owner | database
-------------+--------+-------------------------------+------------+----------
10312 | T10312 | 2010-12-12 12:04:30.946287-05 | xxxxxx | mds1
(1 row)
The transaction id is 10312. Normally this would still appear in
snapshots, but we close it on GTM.
What should we do?
- We could leave as is. We may in the future have an XC monitoring
process look for possible 2PC anomalies occasionally and send an alert
so that they could be resolved by a DBA.
- We could instead choose not to close out the transaction on GTM, so that
the xid is still in snapshots. We could test if the rows are viewable or
not. This could result in other side effects, but without further
testing, I am guessing this may be similar to when an existing statement
is running and cannot see a previously committed transaction that is
open in its snapshot. So, I am thinking this is probably the preferable
option (keeping it open on GTM until committed on all nodes), but we
should test it. In any event, we should also fix the panic.
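A toy model of the second option (keep the xid open on GTM until every node has confirmed COMMIT PREPARED) might look like the following. Everything here is hypothetical: `ToyGTM`, `commit_prepared_everywhere`, `row_visible`, and the node names in the usage note are invented for illustration and are not XC or GTM APIs. The visibility rule is deliberately simplified to "a row is hidden while its xmin is still open in the snapshot".

```python
class ToyGTM:
    """Toy stand-in for GTM: it only tracks which xids are still open."""

    def __init__(self):
        self.open_xids = set()

    def begin(self, xid):
        self.open_xids.add(xid)

    def close(self, xid):
        self.open_xids.discard(xid)

    def snapshot(self):
        # A snapshot here is just the set of xids open at this moment.
        return frozenset(self.open_xids)


def commit_prepared_everywhere(gtm, xid, nodes, commit_on_node):
    """Try COMMIT PREPARED on each node; close xid on GTM only if all succeed.

    commit_on_node(node, xid) is a caller-supplied callable (an assumption)
    that raises ConnectionError when the node cannot confirm the commit.
    Returns the nodes that still need the commit. Nothing is aborted here,
    since some nodes may already have committed.
    """
    pending = []
    for node in nodes:
        try:
            commit_on_node(node, xid)
        except ConnectionError:
            pending.append(node)
    if not pending:
        gtm.close(xid)
    return pending


def row_visible(row_xmin, snapshot):
    """Simplified rule: a row is hidden while its xmin is open in the snapshot."""
    return row_xmin not in snapshot
```

In this model, if `commit_on_node` fails for one node, xid 10312 stays in `gtm.snapshot()` and `row_visible(10312, ...)` is false everywhere, so no session would see only one of the two rows. Once a retry succeeds on the remaining node, the xid is closed and both rows become visible together. Whether the real system behaves this way is exactly what would need testing.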
It may be that we had a similar problem in the existing code before this
patch, although I did some testing a few months back with Pavan's crash
test patch and things seemed stable. Also, we might want to check that
explicit 2PC also handles this OK.
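As for the monitoring idea in the first option, the core check could be as simple as flagging entries in pg_prepared_xacts that have been sitting there too long. A minimal sketch, assuming the rows have already been fetched from each node as (transaction, gid, prepared) tuples; the function name `stale_prepared_xacts` and the 5-minute threshold are arbitrary choices for illustration, not anything that exists in XC.

```python
from datetime import datetime, timedelta


def stale_prepared_xacts(rows, now, max_age=timedelta(minutes=5)):
    """Return (xid, gid) pairs for prepared transactions older than max_age.

    rows: iterable of (xid, gid, prepared_at) tuples, matching the
          transaction, gid and prepared columns of pg_prepared_xacts.
    now, prepared_at: datetime values in the same time zone convention.
    The default max_age is an arbitrary assumption.
    """
    return [
        (xid, gid)
        for xid, gid, prepared_at in rows
        if now - prepared_at > max_age
    ]
```

A monitoring process would run this against each node periodically and alert a DBA for anything returned, such as the T10312 entry shown above if it were still present well after the commit.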
Thanks,
Mason
> Regards,
>
> --
> Michael Paquier
> http://michaelpq.users.sourceforge.net
--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company