From: Zhou L. <zh...@gm...> - 2014-11-17 09:54:28
I found some problems with gtm_proxy after restarting GTM. My environment is RHEL 7 with PGXC 1.2.

1. gtm_proxy does not handle SIGPIPE, so when GTM goes down gtm_proxy sometimes goes down as well.

Steps to reproduce:

    PGXC stop gtm master
    Stop GTM master
    waiting for server to shut down.... done
    server stopped
    PGXC
    PGXC monitor all
    Not running: gtm master
    Running: gtm proxy gtm_pxy1
    Running: gtm proxy gtm_pxy2
    Running: coordinator master coord1
    Running: coordinator master coord2
    Running: datanode master datanode1
    Running: datanode master datanode2
    PGXC
    PGXC monitor all
    Not running: gtm master
    Not running: gtm proxy gtm_pxy1
    Not running: gtm proxy gtm_pxy2
    Running: coordinator master coord1
    Running: coordinator master coord2
    Running: datanode master datanode1
    Running: datanode master datanode2

-----------------------------------------------------

    Program received signal SIGPIPE, Broken pipe.
    0x00007ff3cf3aa75b in send () from /lib64/libpthread.so.0
    (gdb) bt
    #0 0x00007ff3cf3aa75b in send () from /lib64/libpthread.so.0
    #1 0x0000000000403c31 in gtmpqSendSome (conn=0x7ff3c80028f0, len=21) at fe-misc.c:753
    #2 0x0000000000403daf in gtmpqFlush (conn=0x7ff3c80028f0) at fe-misc.c:848
    #3 0x000000000042331c in GTMProxy_ThreadMain (argp=0x1ccbb60) at proxy_main.c:1526
    #4 0x00000000004281db in GTMProxy_ThreadMainWrapper (argp=0x1ccbb60) at proxy_thread.c:319
    #5 0x00007ff3cf3a3df3 in start_thread () from /lib64/libpthread.so.0
    #6 0x00007ff3cf0d13dd in clone () from /lib64/libc.so.6

2. I tried using pqsignal(SIGPIPE, SIG_IGN) to ignore SIGPIPE, but then another problem occurs.

Steps:

    PGXC stop gtm master
    PGXC start gtm master
    // because GTM was restarted, gtm_proxy will fail to process unregister
    // commands from coordinators or datanodes
    PGXC stop coordinator master coord1
    // when coord1 stops, it sends the unregister command 'C' and then 'X' to gtm_proxy
    PGXC start coordinator master coord1
    PGXC Psql postgres
    Selected coord1.
    psql (PGXC , based on PG 9.3.4)
    Type "help" for help.

    postgres=# \d
    WARNING: Xid is invalid.
    WARNING: Xid is invalid.
    WARNING: Xid is invalid.
    ERROR: GTM error, could not obtain snapshot XID = 0

--------------------------------------

    static void
    ProcessPGXCNodeCommand(GTMProxy_ConnectionInfo *conninfo, GTM_Conn *gtm_conn,
                           GTM_MessageType mtype, StringInfo message)
    {
        ....
        /* Unregister Node also on Proxy */
        if (Recovery_PGXCNodeUnregister(cmd_data.cd_reg.type,
                                        cmd_data.cd_reg.nodename,
                                        false,
                                        conninfo->con_port->sock))
        {
            // gtm_proxy is processing the unregister command 'C'; if
            // Recovery_PGXCNodeUnregister fails, ERROR leads to siglongjmp
            ereport(ERROR, (EINVAL, errmsg("Failed to Unregister node")));
        }
        ...
    }

After this siglongjmp, gtm_proxy goes on processing the old unregister command 'C'. Recovery_PGXCNodeUnregister still fails, so it calls siglongjmp again, and this cycle repeats. As a result gtm_proxy cannot handle other connections, and it blocks the coordinators and datanodes from getting XIDs.

3. If unregistering the proxy fails because GTM is restarting, the same siglongjmp happens and gtm_proxy does not stop properly:

    /*
     * Unregister Proxy on GTM
     */
    static void
    UnregisterProxy(void)
    {
        ...
    failed:
        elog(ERROR, "can not Unregister Proxy on GTM"); // ERROR leads to siglongjmp
    }

These scenarios are rare, but can anyone fix the problem?

--
Thanks
Zhou Liang