From: Aaron J. <aja...@re...> - 2014-06-03 07:20:38
I worked backwards through the problem and found the underlying cause. When a data coordinator is paired with a GTM proxy, it sends its message to the GTM proxy, which adds some data to the payload and forwards it to the GTM. Here is what I saw on the wire.
The message captured between the data coordinator and the GTM proxy was as follows:
430000003d000000270000001064656d6f2e7075626c69632e666f6f0001000000000000000100000000000000ffffffffffffff7f010000000000000000
The message captured between the GTM proxy and the GTM was as follows:
430000000a00000000002746000000080000003b
The payload is badly truncated. The problem is in GTMProxy_ProxyCommand, specifically the two calls to pq_getmsgunreadlen() in the argument list of gtmpqPutnchar(). The code assumes that both calls are evaluated before anything else, but C does not specify the order in which function arguments are evaluated. The Intel compiler calls pq_getmsgbytes(), which advances the message cursor, and only then makes the second call to pq_getmsgunreadlen(). That second call returns zero, so the forwarded payload is truncated. I've attached a patch to fix the issue.
--- postgres-xc-1.2.1-orig/src/gtm/proxy/proxy_main.c 2014-04-03 05:18:38.000000000 +0000
+++ postgres-xc-1.2.1/src/gtm/proxy/proxy_main.c 2014-06-03 07:14:58.451411000 +0000
@@ -2390,6 +2390,7 @@
GTMProxy_CommandInfo *cmdinfo;
GTMProxy_ThreadInfo *thrinfo = GetMyThreadInfo;
GTM_ProxyMsgHeader proxyhdr;
+ size_t msgunreadlen = pq_getmsgunreadlen(message);
proxyhdr.ph_conid = conninfo->con_id;
@@ -2397,8 +2398,8 @@
if (gtmpqPutMsgStart('C', true, gtm_conn) ||
gtmpqPutnchar((char *)&proxyhdr, sizeof (GTM_ProxyMsgHeader), gtm_conn) ||
gtmpqPutInt(mtype, sizeof (GTM_MessageType), gtm_conn) ||
- gtmpqPutnchar(pq_getmsgbytes(message, pq_getmsgunreadlen(message)),
- pq_getmsgunreadlen(message), gtm_conn))
+ gtmpqPutnchar(pq_getmsgbytes(message, msgunreadlen),
+ msgunreadlen, gtm_conn))
elog(ERROR, "Error proxing data");
/*
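To illustrate why the patch reads the length once, here is a minimal standalone sketch of the same pattern. It does not use the Postgres-XC sources: DemoMsg, demo_unread_len, demo_get_bytes and demo_put are hypothetical stand-ins for the message buffer, pq_getmsgunreadlen(), pq_getmsgbytes() and gtmpqPutnchar(), and are assumed to behave like a simple cursor-based reader.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a message buffer with a read cursor. */
typedef struct {
    const char *data;
    size_t len;
    size_t cursor;
} DemoMsg;

/* Bytes not yet consumed (plays the role of pq_getmsgunreadlen). */
static size_t demo_unread_len(const DemoMsg *m)
{
    return m->len - m->cursor;
}

/* Return a pointer to the next n bytes and advance the cursor
 * (plays the role of pq_getmsgbytes). */
static const char *demo_get_bytes(DemoMsg *m, size_t n)
{
    assert(n <= demo_unread_len(m));
    const char *p = m->data + m->cursor;
    m->cursor += n;
    return p;
}

/* Copy n bytes to out and report how many were written
 * (plays the role of gtmpqPutnchar). */
static size_t demo_put(char *out, const char *src, size_t n)
{
    memcpy(out, src, n);
    return n;
}

/* Fragile: the last argument may be evaluated before or after
 * demo_get_bytes() has advanced the cursor, so the length passed to
 * demo_put() is either the full unread length or zero, depending on
 * the compiler. */
static size_t forward_fragile(DemoMsg *m, char *out)
{
    return demo_put(out,
                    demo_get_bytes(m, demo_unread_len(m)),
                    demo_unread_len(m));
}

/* Robust: read the length once, before the cursor moves, and reuse it. */
static size_t forward_fixed(DemoMsg *m, char *out)
{
    size_t n = demo_unread_len(m);
    return demo_put(out, demo_get_bytes(m, n), n);
}
```

With a five-byte message, forward_fixed always forwards five bytes. forward_fragile forwards either five or zero, and which one you get is up to the compiler.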
Aaron
________________________________
From: Aaron Jackson [aja...@re...]
Sent: Monday, June 02, 2014 4:11 PM
To: pos...@li...
Subject: [Postgres-xc-general] Unable to create sequences
I tried to create a table as follows ...
CREATE TABLE Schema.TableFoo(
SomeId serial NOT NULL,
ForeignId int NOT NULL,
...
) WITH (OIDS = FALSE);
The server returned the following...
ERROR: GTM error, could not create sequence
I looked at the server logs for the gtm_proxy and found nothing, so I went to the GTM logs.
LOCATION: pq_copymsgbytes, pqformat.c:554
1:140488486782720:2014-06-02 21:10:58.870 UTC -WARNING: No transaction handle for gxid: 0
LOCATION: GTM_GXIDToHandle, gtm_txn.c:163
1:140488486782720:2014-06-02 21:10:58.870 UTC -WARNING: Invalid transaction handle: -1
LOCATION: GTM_HandleToTransactionInfo, gtm_txn.c:213
1:140488486782720:2014-06-02 21:10:58.870 UTC -ERROR: Failed to get a snapshot
LOCATION: ProcessGetSnapshotCommandMulti, gtm_snap.c:420
1:140488478390016:2014-06-02 21:10:58.871 UTC -ERROR: insufficient data left in message
LOCATION: pq_copymsgbytes, pqformat.c:554
1:140488486782720:2014-06-02 21:10:58.871 UTC -ERROR: insufficient data left in message
LOCATION: pq_copymsgbytes, pqformat.c:554
I'm definitely confused here. This cluster has been running fine for several days, and now the GTM is failing. I restarted the GTM and the proxies (using gtm_ctl to stop and restart each instance). Nothing changed: the GTM continues to fail and will not create the sequence.
Any ideas?
Aaron