From: Zhou L. <zh...@gm...> - 2014-11-17 09:54:28
I found some problems with gtm_proxy after restarting GTM. My environment
is RHEL 7 and PGXC 1.2.
1. gtm_proxy does not handle SIGPIPE, so when GTM goes down, gtm_proxy
sometimes goes down as well.
Steps to reproduce:
PGXC stop gtm master
Stop GTM master
waiting for server to shut down.... done
server stopped
PGXC
PGXC monitor all
Not running: gtm master
Running: gtm proxy gtm_pxy1
Running: gtm proxy gtm_pxy2
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master datanode1
Running: datanode master datanode2
PGXC
PGXC monitor all
Not running: gtm master
Not running: gtm proxy gtm_pxy1
Not running: gtm proxy gtm_pxy2
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master datanode1
Running: datanode master datanode2
-----------------------------------------------------
Program received signal SIGPIPE, Broken pipe.
0x00007ff3cf3aa75b in send () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007ff3cf3aa75b in send () from /lib64/libpthread.so.0
#1 0x0000000000403c31 in gtmpqSendSome (conn=0x7ff3c80028f0, len=21) at fe-misc.c:753
#2 0x0000000000403daf in gtmpqFlush (conn=0x7ff3c80028f0) at fe-misc.c:848
#3 0x000000000042331c in GTMProxy_ThreadMain (argp=0x1ccbb60) at proxy_main.c:1526
#4 0x00000000004281db in GTMProxy_ThreadMainWrapper (argp=0x1ccbb60) at proxy_thread.c:319
#5 0x00007ff3cf3a3df3 in start_thread () from /lib64/libpthread.so.0
#6 0x00007ff3cf0d13dd in clone () from /lib64/libc.so.6
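
The backtrace shows send() raising SIGPIPE once the peer is gone. As a rough illustration of what ignoring the signal changes, here is a minimal standalone sketch (not gtm_proxy code; the function name send_to_closed_peer is made up for this example, and a Unix socket pair stands in for the proxy-to-GTM connection). With SIGPIPE ignored, send() should fail with EPIPE instead of killing the process:

```c
#include <assert.h>   /* only needed for the caller-side check shown below */
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send one byte to a peer that has already closed its end.
 * Returns the errno reported by send(), 0 if send() unexpectedly
 * succeeded, or -1 if the setup itself failed.
 */
static int send_to_closed_peer(void)
{
    int fds[2];
    ssize_t n;
    int saved_errno;

    /* Ignore SIGPIPE process-wide so a broken pipe becomes an error code. */
    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        return -1;

    /* A connected socket pair stands in for the proxy <-> GTM connection. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    close(fds[1]);                 /* the peer goes away, like GTM stopping */

    n = send(fds[0], "x", 1, 0);   /* would raise SIGPIPE if not ignored */
    saved_errno = (n < 0) ? errno : 0;

    close(fds[0]);
    return saved_errno;
}
```

A caller can then check the result, for example with assert(send_to_closed_peer() == EPIPE), and treat EPIPE as "connection lost" rather than crashing. On Linux, passing MSG_NOSIGNAL to send() is another way to get the same error without changing the process-wide signal disposition.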
2. I tried using pqsignal(SIGPIPE, SIG_IGN) to ignore SIGPIPE, but then
another problem occurs. Steps:
PGXC stop gtm master
PGXC start gtm master // because GTM was restarted, gtm_proxy will fail to
// process the unregister command from a coordinator or datanode
PGXC stop coordinator master coord1 // when coord1 stops, it sends the
// unregister command 'C', then 'X', to gtm_proxy
PGXC start coordinator master coord1
PGXC Psql postgres
Selected coord1.
psql (PGXC , based on PG 9.3.4)
Type "help" for help.
postgres=# \d
WARNING: Xid is invalid.
WARNING: Xid is invalid.
WARNING: Xid is invalid.
ERROR: GTM error, could not obtain snapshot XID = 0
--------------------------------------
static void
ProcessPGXCNodeCommand(GTMProxy_ConnectionInfo *conninfo, GTM_Conn
*gtm_conn,
GTM_MessageType mtype, StringInfo message)
{ ....
/* Unregister Node also on Proxy */
if (Recovery_PGXCNodeUnregister(cmd_data.cd_reg.type,
cmd_data.cd_reg.nodename,
false,
conninfo->con_port->sock))
{
ereport(ERROR,
(EINVAL,
errmsg("Failed to Unregister node")));
/* gtm_proxy is processing the unregister command 'C' here; if
 * Recovery_PGXCNodeUnregister fails, ERROR leads to siglongjmp. */
}
...
}
After this siglongjmp, gtm_proxy goes back to processing the same old
unregister command 'C'. Recovery_PGXCNodeUnregister fails again, which leads
to another siglongjmp, and this cycle repeats indefinitely. As a result,
gtm_proxy cannot handle other connections, and coordinators and datanodes are
blocked from getting an XID.
3. If unregistering the proxy fails because GTM is restarting, the same
siglongjmp happens and gtm_proxy does not stop properly, as follows:
/*
* Unregister Proxy on GTM
*/
static void
UnregisterProxy(void)
{ ...
failed:
elog(ERROR, "can not Unregister Proxy on GTM"); /* ERROR leads to siglongjmp */
}
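
One possible direction for the shutdown case, sketched here with made-up names (try_unregister_proxy and shutdown_proxy are not gtm_proxy functions), is to report the failure through a return value and a warning so that the rest of the shutdown still runs. This is only an assumption about how a fix could look, not a tested patch:

```c
#include <assert.h>   /* only needed for the caller-side checks shown below */
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for sending the unregister message to GTM.
 * Returns false when GTM cannot be reached.
 */
static bool try_unregister_proxy(bool gtm_reachable)
{
    return gtm_reachable;
}

/*
 * Shutdown path that logs a warning and continues instead of raising an
 * error. Returns 0 after a clean unregister, or 1 if unregistering failed
 * but the shutdown still ran to completion.
 */
static int shutdown_proxy(bool gtm_reachable)
{
    int status = 0;

    if (!try_unregister_proxy(gtm_reachable))
    {
        fprintf(stderr,
                "WARNING: could not unregister proxy on GTM, continuing shutdown\n");
        status = 1;
    }

    /* The remaining cleanup would go here; it is reached in both cases. */
    return status;
}
```

Here shutdown_proxy(false) returns 1 and shutdown_proxy(true) returns 0, and in both cases control reaches the end of the function instead of jumping back into the main loop.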
These scenarios are rare, but can anyone fix the problem?
--
Thanks
Zhou Liang