From: Himpich, S. <Ste...@se...> - 2013-07-24 17:16:12
|
Hi there!

My current setup consists of 6 servers:

GTM1
 - with gtm master
GTM2
 - with gtm standby
 - with gtm_proxy
DB01
 - with coordinator1 (master)
 - with datanode1 (master)
DB02
 - with coordinator2 (master)
 - with datanode2 (master)
dbback01
 - with datanode2_slave
dbback02
 - with datanode1_slave

Setup using pgxc_ctl init all works fine.

I simulate a server crash by shutting down:
 GTM2
 DB02
 dbback02

Then (via pgxc_ctl) I remove:
 gtm slave
 gtm_proxy
 datanode2
 datanode1_slave

Unfortunately, this does not work if a gtm slave is involved. The GTM still thinks there should be a gtm standby and times out trying to contact it.

1:140702615140096:2013-07-24 19:00:01.841 UTC -LOG: Failed to establish a connection with GTM standby. - 0x1dfd918
LOCATION: gtm_standby_connect_to_standby_int, gtm_standby.c:396
1:140702615140096:2013-07-24 19:00:55.893 UTC -LOG: Failed to establish a connection with GTM standby. - 0x1dfd918
LOCATION: gtm_standby_connect_to_standby_int, gtm_standby.c:396
1:140702615140096:2013-07-24 19:00:58.893 UTC -LOG: Failed to establish a connection with GTM standby. - 0x1dfd918
LOCATION: gtm_standby_connect_to_standby_int, gtm_standby.c:396
1:140702615140096:2013-07-24 19:01:01.893 UTC -LOG: Failed to establish a connection with GTM standby. - 0x1dfd918
LOCATION: gtm_standby_connect_to_standby_int, gtm_standby.c:396

monitor all output when everything was fine:

pgxc_ctl(32409):1307241419_00 PGXC monitor all
pgxc_ctl(32409):1307241419_00 Running: gtm master
pgxc_ctl(32409):1307241419_00 Running: gtm slave
pgxc_ctl(32409):1307241419_00 Running: gtm proxy gtm97-proxy
pgxc_ctl(32409):1307241419_00 Running: coordinator master coorddbms181
pgxc_ctl(32409):1307241419_00 Running: coordinator master coorddbms197
pgxc_ctl(32409):1307241419_00 Running: datanode master datadbms181
pgxc_ctl(32409):1307241419_00 Running: datanode slave datadbms181
pgxc_ctl(32409):1307241419_00 Running: datanode master datadbms197
pgxc_ctl(32409):1307241419_00 Running: datanode slave datadbms197

monitor all output after (intentional) crash of server*2:

pgxc_ctl(32409):1307241436_36 PGXC monitor all
pgxc_ctl(32409):1307241437_35 Running: gtm master
pgxc_ctl(32409):1307241437_38 Not running: gtm slave
pgxc_ctl(32409):1307241437_41 Not running: gtm proxy gtm97-proxy
pgxc_ctl(32409):1307241437_41 Running: coordinator master coorddbms181
pgxc_ctl(32409):1307241437_59 Not running: coordinator master coorddbms197
pgxc_ctl(32409):1307241437_59 Running: datanode master datadbms181
pgxc_ctl(32409):1307241438_17 Not running: datanode slave datadbms181
pgxc_ctl(32409):1307241438_20 Not running: datanode master datadbms197
pgxc_ctl(32409):1307241438_20 Running: datanode slave datadbms197

removal of gtm slave:

pgxc_ctl(32409):1307241500_38 PGXC remove gtm slave
pgxc_ctl(32409):1307241500_41 Removing gtm slave.
pgxc_ctl(32409):1307241500_41 Done.
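
For anyone who wants to compare monitor all listings without reading them line by line, the check can be scripted. The following is only a minimal sketch in Python and is not part of pgxc_ctl: the function names parse_monitor and stopped are made up for illustration, and it assumes every status line ends in "Running: <component>" or "Not running: <component>", as in the output quoted above.

```python
import re

# Matches the tail of a pgxc_ctl "monitor all" status line as quoted in this post.
# Assumption: the format is "<prefix> Running: <component>" or
# "<prefix> Not running: <component>".
STATUS_RE = re.compile(r"(Not running|Running): (.+?)\s*$")


def parse_monitor(text):
    """Map each component name to True (running) or False (not running)."""
    status = {}
    for line in text.splitlines():
        match = STATUS_RE.search(line)
        if match:
            status[match.group(2)] = match.group(1) == "Running"
    return status


def stopped(text):
    """Return the sorted names of all components reported as not running."""
    return sorted(name for name, up in parse_monitor(text).items() if not up)


# Output copied from the "after crash" listing in this post.
AFTER_CRASH = """\
pgxc_ctl(32409):1307241436_36 PGXC monitor all
pgxc_ctl(32409):1307241437_35 Running: gtm master
pgxc_ctl(32409):1307241437_38 Not running: gtm slave
pgxc_ctl(32409):1307241437_41 Not running: gtm proxy gtm97-proxy
pgxc_ctl(32409):1307241437_41 Running: coordinator master coorddbms181
pgxc_ctl(32409):1307241437_59 Not running: coordinator master coorddbms197
pgxc_ctl(32409):1307241437_59 Running: datanode master datadbms181
pgxc_ctl(32409):1307241438_17 Not running: datanode slave datadbms181
pgxc_ctl(32409):1307241438_20 Not running: datanode master datadbms197
pgxc_ctl(32409):1307241438_20 Running: datanode slave datadbms197
"""

if __name__ == "__main__":
    for name in stopped(AFTER_CRASH):
        print("needs removal or failover:", name)
```

The script only reports what pgxc_ctl itself sees. It says nothing about what the GTM master has recorded internally about its standby, which is the part that appears to be stale here.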

[removal of remaining "not running" parts, failover of slave datanode, etc.]

pgxc_ctl(32409):1307241540_40 PGXC monitor all
pgxc_ctl(32409):1307241540_42 Running: gtm master
pgxc_ctl(32409):1307241540_42 Running: coordinator master coorddbms181
pgxc_ctl(32409):1307241540_42 Running: datanode master datadbms181
pgxc_ctl(32409):1307241540_42 Running: datanode master datadbms197

But, as the GTM log shows, the gtm master still thinks it has a slave and tries (forever) to contact it. Restarting the whole (remaining) cluster doesn't help either.

The same setup without a gtm slave works fine, but I need the slave in case 'server*1' crashes.

Any thoughts on this? Is there any logging I could supply?

Regards,
Stefan
|