Postgres-XC

Brought to you by: ahsanhadi, amitdkhan, ashutoshbapat, gabbasb, and 3 others

GTM-Standby

Authors:

Problem
Another bug
Yet still...
At last

GTM-Standby Again

serialize/deserialize
Memory context
Protocol Message Handling

Review of Backup to the Standby

How to fix (8th June, 2011)

Other Improvement (8th June, 2011)

Problem

With the HA_Support branch, GTM-Proxy successfully registers itself with the GTM, but the datanode fails to register.

Format of the registration message is as follows:

   if (gtmpqPutMsgStart('C', true, conn) ||
       gtmpqPutInt(MSG_NODE_REGISTER, sizeof (GTM_MessageType), conn) ||
       gtmpqPutnchar((char *)&type, sizeof(GTM_PGXCNodeType), conn) ||
       gtmpqPutnchar((char *)&nodenum, sizeof(GTM_PGXCNodeId), conn) ||
       gtmpqPutInt(strlen(host), sizeof (GTM_StrLen), conn) ||
       gtmpqPutnchar(host, strlen(host), conn) ||
       gtmpqPutnchar((char *)&port, sizeof(GTM_PGXCNodePort), conn) ||
       gtmpqPutnchar((char *)&proxynum, sizeof(GTM_PGXCNodeId), conn) ||
       gtmpqPutInt(strlen(datafolder), sizeof (GTM_StrLen), conn) ||
       gtmpqPutnchar(datafolder, strlen(datafolder), conn) ||
       gtmpqPutInt(status, sizeof(GTM_PGXCNodeStatus), conn))
       goto send_failed;

Compared with the GTM without standby support, two fields were added:

1) host, including its length indicator, 2) status.

Then, in GTM-Proxy, this message is handled by the function ProcessPGXCNodeCommand(). Unlike the original version, it then tries to convert the IP address of the peer (datanode/coordinator) into a host name using getaddrinfo(). Somehow, the host information sent with the above command is not consumed in GTM-Proxy.

This needs a more detailed look.

Another bug

node_get_local_addr() needs to initialize its return value. The return value is stored into the caller's area, and if it is not initialized properly, the caller may (depending on what its own variable happened to contain) regard this as an error.
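A minimal sketch of the pattern, assuming the status is reported through an out-parameter. The function name, signature, and error value below are hypothetical stand-ins and are not taken from the GTM source; the point is only that the status is set at entry, so the caller never reads whatever its own variable contained before the call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a helper like node_get_local_addr(); the real
 * signature may differ. The status is reported through *rc.
 */
static char *local_addr_sketch(const char *addr, char *buf, size_t buflen, int *rc)
{
    assert(rc != NULL);
    *rc = 0;    /* set first: every return path leaves a defined value */

    if (addr == NULL || buf == NULL || strlen(addr) >= buflen)
    {
        *rc = -1;    /* explicit failure, never left to chance */
        return NULL;
    }
    memcpy(buf, addr, strlen(addr) + 1);    /* copy including the NUL */
    return buf;
}
```

With this shape, a caller whose status variable held garbage before the call still sees 0 on success.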

Yet still...

Somehow, the length of the host name embedded in the MSG_NODE_REGISTER message is not sent to GTM-ACT correctly.

At last

Yes, there was a fault in GTM-Proxy. I found that GTM-Proxy did not receive the members of the MSG_NODE_REGISTER message in the correct order and did not proxy them to GTM in the correct order. I fixed all this, and GTM-Proxy now works fine.
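The ordering problem can be illustrated with a small sketch of length-prefixed fields in a flat buffer. The MsgCursor type and the put_/get_ helpers are hypothetical and only loosely mirror gtmpqPutInt()/gtmpqPutnchar(); they are not the GTM API. Because each string is preceded by its length, a reader that takes the fields in a different order than the writer interprets string bytes as a length, or the reverse.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over a flat message buffer; not a GTM type. */
typedef struct
{
    unsigned char *buf;
    size_t         len;   /* total bytes available */
    size_t         pos;   /* next byte to read or write */
} MsgCursor;

/* Append a 32-bit integer in network byte order. Returns 0 on success. */
static int put_int32(MsgCursor *c, uint32_t v)
{
    uint32_t n = htonl(v);

    assert(c != NULL && c->pos <= c->len);
    if (c->len - c->pos < sizeof n)
        return -1;
    memcpy(c->buf + c->pos, &n, sizeof n);
    c->pos += sizeof n;
    return 0;
}

/* Append a string as <length><bytes>, without the trailing NUL. */
static int put_string(MsgCursor *c, const char *s)
{
    size_t n = strlen(s);

    if (put_int32(c, (uint32_t) n) != 0 || c->len - c->pos < n)
        return -1;
    memcpy(c->buf + c->pos, s, n);
    c->pos += n;
    return 0;
}

/* Read a 32-bit integer written by put_int32(). */
static int get_int32(MsgCursor *c, uint32_t *v)
{
    uint32_t n;

    assert(c != NULL && c->pos <= c->len);
    if (c->len - c->pos < sizeof n)
        return -1;
    memcpy(&n, c->buf + c->pos, sizeof n);
    c->pos += sizeof n;
    *v = ntohl(n);
    return 0;
}

/* Read a length-prefixed string into out and NUL-terminate it. */
static int get_string(MsgCursor *c, char *out, size_t outsz)
{
    uint32_t n;

    if (get_int32(c, &n) != 0 || n >= outsz || c->len - c->pos < n)
        return -1;
    memcpy(out, c->buf + c->pos, n);
    out[n] = '\0';
    c->pos += n;
    return 0;
}
```

A proxy that forwards such a message has to call the get_ helpers in exactly the order the sender called the put_ helpers, and then re-emit the fields in that same order.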

GTM-Standby Again

I hoped that GTM-Standby would then work fine. It didn't. GTM-Standby crashed with a core dump. The crash is caused by dump_transactioninfo_elog(), which prints the backup received from GTM-ACT to the log.

I did not think this was only a bug in this function; it might be caused by a message being sent or parsed incorrectly by GTM or GTM-Standby. I examined the response to the MSG_TXN_GXID_LIST message. In my test environment, the gti_thread_id value is parsed as 140380929935104. Because this is supposed to be the thread ID in GTM-ACT/GTM-Proxy, this number is quite unusual. I should go through the GTM-ACT code that receives this message, parses it, and constructs the reply, and then compare it with the parsing done at GTM-Standby.

Yes, there was a wrong implementation in gtm_serialize.c and gtm_serialize_debug.c. In gtm_serialize.c, sn_xip is treated as an integer. In fact, it is the address of a GlobalTransactionId array whose number of elements is given by sn_xcnt. In gtm_serialize_debug.c, on the other hand, sn_xip is treated as a pointer to GlobalTransactionId. Because an address in GTM-ACT was exported to GTM-Standby, this caused the error. ---> code fixed for the test.

I also found that coordcount and datanodecount can be zero, in which case the current code calls malloc() with size zero, which returns some address that is later passed to free(). This address is "readable" and it can be harmful too. --> code fixed for the test.
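One way to avoid the zero-size allocation is to map a zero count to NULL explicitly. This is only a sketch of that idea, not the fix that was actually committed; NodeIdSketch and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t NodeIdSketch;   /* hypothetical stand-in for the node ID type */

/* Allocate a zero-filled array of node IDs; a zero count yields NULL, never malloc(0). */
static NodeIdSketch *alloc_node_ids(size_t count)
{
    if (count == 0)
        return NULL;
    return calloc(count, sizeof(NodeIdSketch));
}

/* Release an array from alloc_node_ids(); free(NULL) is a no-op in standard C. */
static void release_node_ids(NodeIdSketch *ids, size_t count)
{
    assert(count > 0 || ids == NULL);   /* an empty list must not carry a pointer */
    free(ids);
}
```

Code that reads the array then only needs to loop over the count, and an empty list never carries a pointer that looks usable.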

serialize/deserialize

Serialization/deserialization of transaction information was not correct either. It includes the snapshot of the live transactions, which is essentially an array of the live transactions' GXIDs. The implementation just sent the "address" of the snapshot in GTM-ACT, which causes a serious problem in GTM-Standby.

So GTM-ACT should send the length and all the GXIDs in the transaction structure.
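A minimal sketch of such a wire layout, together with the matching read side. The types and function names are hypothetical (SnapshotSketch only mimics the sn_xcnt/sn_xip pair), a 32-bit GXID is assumed, and plain malloc() stands in for whatever allocator the receiving process really uses.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t GxidSketch;   /* stand-in for GlobalTransactionId (assumed 32-bit) */

typedef struct
{
    uint32_t    xcnt;   /* number of entries in xip (cf. sn_xcnt) */
    GxidSketch *xip;    /* array of live GXIDs (cf. sn_xip) */
} SnapshotSketch;

/* Write the count and every GXID, never the pointer. Returns bytes written, 0 on error. */
static size_t snapshot_serialize(const SnapshotSketch *s, unsigned char *buf, size_t buflen)
{
    size_t   need = sizeof(uint32_t) * ((size_t) s->xcnt + 1);
    uint32_t n;

    assert(s->xcnt == 0 || s->xip != NULL);
    if (buflen < need)
        return 0;
    n = htonl(s->xcnt);
    memcpy(buf, &n, sizeof n);
    for (uint32_t i = 0; i < s->xcnt; i++)
    {
        n = htonl(s->xip[i]);
        memcpy(buf + sizeof n * ((size_t) i + 1), &n, sizeof n);
    }
    return need;
}

/* Rebuild the array in receiver-owned memory. Returns bytes consumed, 0 on error. */
static size_t snapshot_deserialize(SnapshotSketch *out, const unsigned char *buf, size_t buflen)
{
    uint32_t n;

    if (buflen < sizeof n)
        return 0;
    memcpy(&n, buf, sizeof n);
    out->xcnt = ntohl(n);
    out->xip = NULL;
    if ((buflen - sizeof n) / sizeof n < out->xcnt)
        return 0;                      /* truncated message */
    if (out->xcnt > 0)                 /* no zero-size allocation */
    {
        out->xip = malloc(sizeof(GxidSketch) * out->xcnt);
        if (out->xip == NULL)
            return 0;
    }
    for (uint32_t i = 0; i < out->xcnt; i++)
    {
        memcpy(&n, buf + sizeof n * ((size_t) i + 1), sizeof n);
        out->xip[i] = ntohl(n);
    }
    return sizeof n * ((size_t) out->xcnt + 1);
}
```

The receiver ends up with its own copy of the array, so nothing in the result refers to an address that is only meaningful in the sending process.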

GTM-Standby should then parse this and allocate sufficient memory to accommodate all the GXIDs. Because this (practically) occurs only in the GTM/GTM-Standby process, it seems that we can simply use "palloc" for GTM. Here's another problem.

GTM and GTM-Standby exchange their status backup information, including the current transaction status. This is done by the gtm/client submodule, which may run in both the Postgres and the GTM context. So the gtm/client submodule embedded in a Postgres process may have to refer to GTM's memory context, which is not simple.

You may notice that this cannot be solved by providing separate header files or by renaming memory allocation functions, as is commonly done in PostgreSQL. Another way is to "encapsulate" the memory allocation functions using function pointers.

Both the Postgres and the GTM process can provide a global variable named, say, Gen_Alloc, which includes pointers to the memory allocation functions suitable for each process. The former is based upon multi-processing and PGProc. The latter is based upon pthread. Both provide memory contexts, where all the memory allocated in a specific memory context is automatically freed when the memory context is no longer in use.

Gen_Alloc may contain the necessary function entries, say, alloc, realloc, alloc0 and free, which point to the real functions for the appropriate memory allocation context.

A common header file can be provided as follows (maybe in gtm/palloc.h and utils/palloc.h):

 typedef struct Gen_Alloc
 {
    void * (* alloc) (MemoryContext, size_t);   /* allocate in the given context */
    void * (* realloc) (void *, size_t);        /* resize an existing chunk */
    void   (* free) (void *);                   /* release a chunk */
    void * (* alloc0) (MemoryContext, size_t);  /* like alloc, but zero-filled */
 } Gen_Alloc;
 //
 extern Gen_Alloc genAlloc_class;
 //
 #define genAlloc(x)      genAlloc_class.alloc(CurrentMemoryContext, x)
 #define genRealloc(x, y) genAlloc_class.realloc(x, y)
 #define genFree(x)       genAlloc_class.free(x)
 #define genAlloc0(x)     genAlloc_class.alloc0(CurrentMemoryContext, x)

We may need additional functions in this table, and we should pay close attention to how to resolve allocation functions that are actually supplied as macros.

And each implementation may be:

GTM context: (maybe in mcxt.c)

 Gen_Alloc genAlloc_class = 
        {MemoryContextAlloc, repalloc, pfree, MemoryContextAllocZero};

Postgres context: (maybe in gtm.c)

 Gen_Alloc genAlloc_class =
        {MemoryContextAlloc, repalloc, pfree, MemoryContextAllocZero};

At present, they will only be used in gtm/client submodules, especially gtm_serialize.c and gtm_serialize_debug.c. We may want to extend the use of this methodology in the future.
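To show how shared code would use such a table, here is a self-contained sketch that can be compiled outside both Postgres and GTM. It assumes malloc-backed entries in place of the real context-aware functions; MemoryContextSketch, plain_alloc, plain_alloc0 and copy_gxids are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef void *MemoryContextSketch;   /* stand-in for MemoryContext */

typedef struct Gen_Alloc
{
    void *(*alloc) (MemoryContextSketch, size_t);
    void *(*realloc) (void *, size_t);
    void  (*free) (void *);
    void *(*alloc0) (MemoryContextSketch, size_t);
} Gen_Alloc;

/* malloc-backed entries; a real process would plug in its own context-aware functions. */
static void *plain_alloc(MemoryContextSketch cxt, size_t n)
{
    (void) cxt;   /* the sketch has no real memory contexts */
    return malloc(n);
}

static void *plain_alloc0(MemoryContextSketch cxt, size_t n)
{
    (void) cxt;
    return calloc(1, n);
}

static MemoryContextSketch CurrentMemoryContext = NULL;

/* Initializer order must follow the member order of the struct. */
static Gen_Alloc genAlloc_class = { plain_alloc, realloc, free, plain_alloc0 };

#define genAlloc(x)      genAlloc_class.alloc(CurrentMemoryContext, x)
#define genRealloc(x, y) genAlloc_class.realloc(x, y)
#define genFree(x)       genAlloc_class.free(x)
#define genAlloc0(x)     genAlloc_class.alloc0(CurrentMemoryContext, x)

/* Shared code: it never names malloc or palloc directly, only the table. */
static uint32_t *copy_gxids(const uint32_t *src, size_t n)
{
    uint32_t *dst;

    assert(src != NULL && n > 0);
    dst = genAlloc(n * sizeof *dst);
    if (dst != NULL)
        memcpy(dst, src, n * sizeof *dst);
    return dst;
}
```

The same copy_gxids() source could then be linked into either process, with only the definition of genAlloc_class differing between them.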

Memory context

mcxt.c in both the backend and gtm, as well as common.c in pgxc_clean, needed modification to provide memory allocation virtualization.

Take a look at src/include/gen_alloc.h for details.

Protocol Message Handling

The following errors were found in protocol message handling among GTM-Proxy, GTM, and GTM-Standby.

Node registration: in the node registration message, the host name was optional, depending on the type of the node being registered. This was misleading, so the item was made mandatory.
Snapshot serialization/deserialization: only the address of the GXIDs in the snapshot in GTM-ACT was copied to GTM-Standby, which is wrong. It was modified to include all the GXIDs in the array. Deserialization has to allocate memory in a different context, so the "memory context" modification above was needed to correct this.
Coordinators/datanodes involved in transactions: these were wrong too. Only the addresses in GTM-ACT were transferred to GTM-Standby. It was modified to include all the nodes involved instead. In this case, "memory context" virtualization was needed too.

There's another issue in transaction serialization. The pthread ID of GTM-ACT is also copied to the standby and the value looks incorrect, although this may not have a bad impact later.

Review of Backup to the Standby

The current code just proxies requests from the backends (or GTM-Proxy) to the standby. In GXID assignment, because of its parallel nature, there is no guarantee that a transaction is given the same GXID in both GTM and GTM-Standby. Instead, GTM should send the assigned GXID to the standby.

Also, at present, GTM's backup is done after the response is sent to the backends, so we can lose a COMMIT/ABORT in critical cases. Because a COMMIT/ABORT report is sent from the backend once it has been handled internally, GTM should first back up to the standby and then send the response to the backend. When reconnected, backends that have not received a reply will re-issue the request. If the transaction has already been removed from the standby, the standby can simply ignore the request and report that it is done.

In this manner, we can also solve the last-message issue: we don't have to check whether the backends received the last response.
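The ordering argument can be sketched with two in-memory transaction tables. Everything here is hypothetical (the types, the function names, and the fail_before_reply switch that simulates a failure between backup and reply); it only models the order of the steps, not the real GTM data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OPEN 16

/* Hypothetical table of open transactions, one per GTM instance. */
typedef struct
{
    uint32_t open[MAX_OPEN];
    size_t   nopen;
} TxnTableSketch;

static bool txn_begin(TxnTableSketch *t, uint32_t gxid)
{
    if (t->nopen == MAX_OPEN)
        return false;
    t->open[t->nopen++] = gxid;
    return true;
}

/* Remove gxid if it is open. Returns false when it was already gone. */
static bool txn_finish(TxnTableSketch *t, uint32_t gxid)
{
    for (size_t i = 0; i < t->nopen; i++)
    {
        if (t->open[i] == gxid)
        {
            t->open[i] = t->open[--t->nopen];   /* fill the hole with the last entry */
            return true;
        }
    }
    return false;
}

typedef struct
{
    TxnTableSketch active;
    TxnTableSketch standby;
    int            replies_sent;
} GtmPairSketch;

/* Backup-first COMMIT: the standby is updated before the backend sees a reply. */
static bool commit_backup_first(GtmPairSketch *p, uint32_t gxid, bool fail_before_reply)
{
    assert(p != NULL);
    txn_finish(&p->standby, gxid);   /* 1. back up to the standby */
    txn_finish(&p->active, gxid);    /* 2. apply on the active GTM */
    if (fail_before_reply)
        return false;                /* 3. the reply never reaches the backend */
    p->replies_sent++;
    return true;
}

/* Re-issued COMMIT on the standby: an unknown GXID is reported as done. */
static bool standby_reissued_commit(TxnTableSketch *standby, uint32_t gxid)
{
    (void) txn_finish(standby, gxid);
    return true;
}
```

If the failure hits after step 1, the backend has no reply and re-issues the COMMIT, and the standby, which no longer has the transaction, reports it as done.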

The snapshot need not be backed up, because it is essentially calculated from the current status. Also, none of the backup requests needs a response. It would be better if GTM checked for a response occasionally to detect standby failure. We may be able to add "flags" to the message to indicate whether a "response" is needed. These flags can be used for other purposes too (maybe 32 bits in network format?).
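A possible shape for such a flags word, as a sketch only. The bit assignment MSGFLAG_NEED_RESPONSE and the helper names are assumptions made for this illustration; the text above only suggests a 32-bit value in network byte order.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSGFLAG_NEED_RESPONSE 0x00000001u   /* hypothetical bit assignment */

/* Store the flags word in network byte order. */
static void put_flags(unsigned char out[4], uint32_t flags)
{
    uint32_t n = htonl(flags);

    assert(out != NULL);
    memcpy(out, &n, sizeof n);
}

/* Load the flags word back into host byte order. */
static uint32_t get_flags(const unsigned char in[4])
{
    uint32_t n;

    assert(in != NULL);
    memcpy(&n, in, sizeof n);
    return ntohl(n);
}

/* True when the sender asked for a reply to this message. */
static bool needs_response(uint32_t flags)
{
    return (flags & MSGFLAG_NEED_RESPONSE) != 0;
}
```

GTM could then set the bit on an occasional backup message to probe the standby and leave it clear on all others; the remaining 31 bits stay free for other purposes.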

How to fix (8th June, 2011)

Create a new command to back up transaction activities.
Send the transaction handle and GXID so that the standby can reproduce the internal GXID management structure precisely.
Do not require any reply back to GTM.

Other Improvement (8th June, 2011)

Create new backup commands for the other operations, none of which is accompanied by a response back to GTM.
Establish infrastructure to start a backup at any time.
Use a backup-first strategy so that no actions are lost.
Test the last-assigned GXID at the time of reconnect. If GTM-Proxy requests a GXID for a backend whose GXID has been assigned and not finished, the old assigned GXID will be discarded.
The protocol from GTM-Proxy to GTM should include some backend identification (extend the proxy header?).
Fix bugs in logging backup information.
Refactor logging.
