Hi Folks,
We were able to deploy a DB cluster on the FreeBSD platform in HA mode (4 DB nodes and 4 connector nodes, including 1 GTM master, 1 GTM slave and 2 GTM proxies), and the system runs fine. When we fail the connector node (the GTM master), failover to the GTM slave succeeds, but the cluster then stops serving DB queries and throws the following error:
ERROR: Failed to get pooled connections
Even a restart/reboot of the DB cluster does not help, although every node's status is Running. On the GTM master node (which moved from slave to master state on failover), a few of the pooler-port TCP connections are moving to the TIME_WAIT state.
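For reference, the TIME_WAIT sockets can be seen with a plain socket listing on that node (pooler port 15921 is taken from the configuration below):

    netstat -an | grep 15921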
The cluster state is:
su -l pgxl -c "/mps/db_pgxl/bin/pgxc_ctl 'monitor all'"
/usr/bin/bash
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Reading configuration using /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash --home /var/mps/db_pgxl/pgxl/pgxc_ctl_home --configuration /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl.conf
Finished to read configuration.
PGXC_CTL START *
Current directory: /var/mps/db_pgxl/pgxl/pgxc_ctl_home
Running: gtm master
Running: gtm proxy gtm2
Running: coordinator master coord1
Running: coordinator master coord3
Running: datanode master datanode1
Running: datanode master datanode2
Any help is appreciated.
Thanks,
Srinivas
select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+---------------+----------------+------------------+------------
coord1 | C | 11921 | 10.102.31.167 | f | f | 1885696643
coord3 | C | 11921 | 10.102.31.206 | f | f | 1638403545
datanode2 | D | 12921 | 10.102.31.202 | f | f | -905831925
datanode1 | D | 12921 | 10.102.31.171 | t | f | 888802358
The coord1 IP is not updated, even though pgxc_ctl.conf has been updated with the failover IP as below:
coordMasterServers=( 10.102.31.168 none 10.102.31.206 )
coordPorts=( 11921 -1 11921 )
poolerPorts=( 15921 -1 15921 )
coordMasterDirs=( /var/mps/db_pgxl/data/auto/nodes/coord none /var/mps/db_pgxl/data/auto/nodes/coord )
coordSlaveServers=( none none none )
coordSlavePorts=( none none 11921 )
coordSlavePoolerPorts=( none none 15921 )
coordSlaveDirs=( none none none )
FreeBSD is not a supported platform at this stage. But you're welcome to test and submit patches to fix things.
Regarding this issue, did you use the pgxc_ctl failover command to fail over the coordinator? Can you attach the output of that command? You could possibly fix this by manually running ALTER NODE to change the coordinator's IP address in pgxc_node.
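For example, from one of the surviving coordinators, something along these lines (adjust names, hosts and ports to match your configuration):

    /mps/db_pgxl/bin/psql -U mpsroot mpsdb -p 11921 -h 10.102.31.206 -c "ALTER NODE coord1 WITH (HOST = '10.102.31.168', PORT = 11921)"
    /mps/db_pgxl/bin/psql -U mpsroot mpsdb -p 11921 -h 10.102.31.206 -c "SELECT pgxc_pool_reload()"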
Thanks,
Pavan
Hi Pavan,
Thanks for your response in spite of your busy schedule. Can you please guide me on how to fix these issues?
Here is my complete story from scratch again. Looking forward to your guidance.
I have reconfigured the setup and tested again. After failing over the GTM master, and even after a restart of the cluster, I still get the following error when executing a query:
ERROR: Failed to get pooled connections
The steps I performed during the test:
PG XL Initial configuration:
=====================================================
GTM Master and coordinator(coord1) master IP: 10.102.31.167
GTM Slave and coordinator(coord1) slave IP: 10.102.31.168
Coordinator(coord2,coord3) Master with GTM_Proxy : 10.102.31.203, 10.102.31.206 --> No Slaves configured for these
Datanode(datanode1,datanode2) Master IP: 10.102.31.170, 10.102.31.201
Datanode(datanode1,datanode2) Slave IP: 10.102.31.171, 10.102.31.202
I removed the running slave datanodes and coordinators with the pgxc_ctl tool via a script. I have observed that if I shut down a node and then execute pgxc_ctl commands, those commands hang until the node is brought up again. Is this the expected behaviour?
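For reference, the removals were done with pgxc_ctl commands along these lines (node names from the setup above; the exact syntax may differ by version):

    /mps/db_pgxl/bin/pgxc_ctl "remove datanode slave datanode1 clean"
    /mps/db_pgxl/bin/pgxc_ctl "remove datanode slave datanode2 clean"
    /mps/db_pgxl/bin/pgxc_ctl "remove coordinator slave coord1 clean"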
Before failover of the GTM master (10.102.31.167), the pgxc_node state was as below:
===========================================================================
/mps/db_pgxl/bin/psql -U mpsroot mpsdb -p 11921 -h 10.102.31.167
psql (PGXL 9.2.0, based on PG 9.2.4 (Postgres-XL 9.2.0))
Type "help" for help.
mpsdb=> select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+---------------+----------------+------------------+------------
coord1 | C | 11921 | 10.102.31.167 | f | f | 1885696643
datanode1 | D | 12921 | 10.102.31.170 | t | f | 888802358
datanode2 | D | 12921 | 10.102.31.202 | f | f | -905831925
(3 rows)
Here are the outputs of the GTM and coordinator failover commands:
===========================================================================================
bash-2.05b# su -l pgxl -c "sh /mps/failover_gtm.sh coord1"
Script to failover GTM and Coordinator.
`/mps/db_pgxl/bin/pgxc_ctl "failover gtm " \'
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Reading configuration using /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash --home /var/mps/db_pgxl/pgxl/pgxc_ctl_home --configuration /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl.conf
Finished to read configuration.
/usr/bin/bash
* PGXC_CTL START **
Current directory: /var/mps/db_pgxl/pgxl/pgxc_ctl_home
Failover gtm
Running "gtm_ctl promote -Z gtm -D /var/mps/db_pgxl/data/auto/nodes/gtm"
Actual Command: ssh pgxl@10.102.31.168 "( gtm_ctl promote -Z gtm -D /var/mps/db_pgxl/data/auto/nodes/gtm ) > /tmp/ns-mgmt-system_STDOUT_24874_0 2>&1" < /dev/null > /dev/null 2>&1
Bring remote stdout: scp pgxl@10.102.31.168:/tmp/ns-mgmt-system_STDOUT_24874_0 /tmp/STDOUT_24874_1 > /dev/null 2>&1
Updating gtm.conf at 10.102.31.168:/var/mps/db_pgxl/data/auto/nodes/gtm
Actual Command: ssh pgxl@10.102.31.168 "( cat >> /var/mps/db_pgxl/data/auto/nodes/gtm/gtm.conf ) > /tmp/ns-mgmt-system_STDOUT_24874_3 2>&1" < /tmp/STDIN_24874_2 > /dev/null 2>&1
Bring remote stdout: scp pgxl@10.102.31.168:/tmp/ns-mgmt-system_STDOUT_24874_3 /tmp/STDOUT_24874_4 > /dev/null 2>&1
`/mps/db_pgxl/bin/pgxc_ctl "reconnect gtm_proxy all " \'
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Reading configuration using /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash --home /var/mps/db_pgxl/pgxl/pgxc_ctl_home --configuration /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl.conf
Finished to read configuration.
/usr/bin/bash
* PGXC_CTL START **
Current directory: /var/mps/db_pgxl/pgxl/pgxc_ctl_home
ERROR: gtm proxy is not configured.
`/mps/db_pgxl/bin/pgxc_ctl "stop gtm_proxy all " \'
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Reading configuration using /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash --home /var/mps/db_pgxl/pgxl/pgxc_ctl_home --configuration /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl.conf
Finished to read configuration.
/usr/bin/bash
* PGXC_CTL START **
Current directory: /var/mps/db_pgxl/pgxl/pgxc_ctl_home
Stopping all the gtm proxies.
ERROR: GTM Proxy is not configured.
`/mps/db_pgxl/bin/pgxc_ctl "start gtm_proxy all " \'
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Reading configuration using /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash --home /var/mps/db_pgxl/pgxl/pgxc_ctl_home --configuration /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl.conf
Finished to read configuration.
/usr/bin/bash
* PGXC_CTL START **
Current directory: /var/mps/db_pgxl/pgxl/pgxc_ctl_home
Starting all the gtm proxies.
ERROR: GTM Proxy is not configured.
`/mps/db_pgxl/bin/pgxc_ctl "failover coordinator coord1 " \'
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Reading configuration using /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash --home /var/mps/db_pgxl/pgxl/pgxc_ctl_home --configuration /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl.conf
Finished to read configuration.
ERROR: Failed to get pooled connections
ERROR: Failed to get pooled connections
/usr/bin/bash
* PGXC_CTL START **
Current directory: /var/mps/db_pgxl/pgxl/pgxc_ctl_home
Failover coordinators.
Failover the coordinator coord1.
Failover coordinator coord1 using GTM itself
Actual Command: ssh pgxl@10.102.31.168 "( pg_ctl promote -Z coordinator -D /var/mps/db_pgxl/data/auto/nodes/coord ) > /tmp/ns-mgmt-system_STDOUT_24987_0 2>&1" < /dev/null > /dev/null 2>&1
Bring remote stdout: scp pgxl@10.102.31.168:/tmp/ns-mgmt-system_STDOUT_24987_0 /tmp/STDOUT_24987_1 > /dev/null 2>&1
pg_ctl: cannot promote server; server is not in standby mode
Actual Command: ssh pgxl@10.102.31.168 "( pg_ctl restart -Z coordinator -D /var/mps/db_pgxl/data/auto/nodes/coord -w -o -i; sleep 1 ) > /tmp/ns-mgmt-system_STDOUT_24987_2 2>&1" < /dev/null > /dev/null 2>&1
Bring remote stdout: scp pgxl@10.102.31.168:/tmp/ns-mgmt-system_STDOUT_24987_2 /tmp/STDOUT_24987_3 > /dev/null 2>&1
ALTER NODE
pgxc_pool_reload
t
(1 row)
EXECUTE DIRECT
pgxc_pool_reload
t
(1 row)
Done.
----------- failover GTM and coordinator done -------
Failover happened, but I found that pgxc_node was not synced with what was on the old GTM master 10.102.31.167. To fix this, I used ALTER/DROP NODE commands through the current GTM master, 10.102.31.168.
======================================================================
On the GTM slave, which has now taken over as GTM master, I executed ALTER NODE as you recommended:
/mps/db_pgxl/bin/psql -U mpsroot mpsdb -p 11921 -h 10.102.31.168
psql (PGXL 9.2.0, based on PG 9.2.4 (Postgres-XL 9.2.0))
Type "help" for help.
mpsdb=> select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+---------------+----------------+------------------+-------------
coord2 | C | 11921 | 10.102.31.203 | f | f | -1197102633
coord3 | C | 11921 | 10.102.31.206 | f | f | 1638403545
datanode1 | D | 12921 | 10.102.31.170 | t | f | 888802358
datanode2 | D | 12921 | 10.102.31.201 | f | f | -905831925
coord1 | C | 11921 | 10.102.31.168 | f | f | 1885696643
(5 rows)
mpsdb=# drop node coord2;
DROP NODE
mpsdb=# drop node coord3;
DROP NODE
mpsdb=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+---------------+----------------+------------------+------------
datanode1 | D | 12921 | 10.102.31.170 | t | f | 888802358
datanode2 | D | 12921 | 10.102.31.201 | f | f | -905831925
coord1 | C | 11921 | 10.102.31.168 | f | f | 1885696643
mpsdb=# ALTER NODE datanode2 WITH (HOST='10.102.31.202', PORT=12921);
ALTER NODE
mpsdb=# select * from db_cluster ;
ERROR: Failed to get pooled connections
mpsdb=# select * from db_cluster ;
ERROR: Failed to get pooled connections
mpsdb=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+---------------+----------------+------------------+------------
datanode1 | D | 12921 | 10.102.31.170 | t | f | 888802358
coord1 | C | 11921 | 10.102.31.168 | f | f | 1885696643
datanode2 | D | 12921 | 10.102.31.202 | f | f | -905831925
(3 rows)
mpsdb=# SELECT pgxc_pool_reload();
pgxc_pool_reload
t
(1 row)
mpsdb=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+---------------+----------------+------------------+------------
datanode1 | D | 12921 | 10.102.31.170 | t | f | 888802358
coord1 | C | 11921 | 10.102.31.168 | f | f | 1885696643
datanode2 | D | 12921 | 10.102.31.202 | f | f | -905831925
(3 rows)
mpsdb=# select * from db_cluster ;
ERROR: Failed to get pooled connections
Thanks,
Srinivas
On Wed, Aug 3, 2016 at 11:08 AM, srinivasa reddy <ksreddy543@users.sf.net> wrote:
Gosh, you are using 9.2? That's obsolete, hasn't received any attention in the last 2 years, and is not even supported. I would suggest downloading 9.5r1.2 from http://www.postgres-xl.org/download/ and trying again.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi Pavan,
As per your suggestion, I have upgraded to 9.5r1.2 and tested the scenarios on the FreeBSD 8.4 platform. No issues were observed when I fail over the individual master and slave datanodes, but the problem appears when I fail over the GTM node together with the coordinator.
DB Setup details
DBNodeName | master IP | Slave IP
datanode1 | 10.102.31.170 | 10.102.31.171
datanode2 | 10.102.31.201 | 10.102.31.202
coord1 | 10.102.31.167 | 10.102.31.168
I have executed the following script to fail over to the slave GTM and coordinator:
# Capture each pgxc_ctl step's output so it can be echoed back.
n=$(/mps/db_pgxl/bin/pgxc_ctl "failover gtm")
echo "$n"
n=$(/mps/db_pgxl/bin/pgxc_ctl "reconnect gtm_proxy all")
echo "$n"
n=$(/mps/db_pgxl/bin/pgxc_ctl "stop gtm_proxy all")
echo "$n"
n=$(/mps/db_pgxl/bin/pgxc_ctl "start gtm_proxy all")
echo "$n"
# $1 is the coordinator to fail over (e.g. coord1)
n=$(/mps/db_pgxl/bin/pgxc_ctl "failover coordinator $1")
echo "$n"
echo "----------- failover GTM and coordinator done -------"
After the script execution, the 'monitor all' command hangs while listing the datanode status:
su -l pgxl -c "/mps/db_pgxl/bin/pgxc_ctl 'monitor all'"
/usr/bin/bash
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash.
Reading configuration using /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl_bash --home /var/mps/db_pgxl/pgxl/pgxc_ctl_home --configuration /var/mps/db_pgxl/pgxl/pgxc_ctl_home/pgxc_ctl.conf
Finished reading configuration.
* PGXC_CTL START **
Current directory: /var/mps/db_pgxl/pgxl/pgxc_ctl_home
Running: gtm master
Running: coordinator master coord1
# Other actions tried:
Rebooted all the database nodes; every node's status is Running.
# Connected to the DB through the coordinator master (which was earlier the slave node hosting the GTM and coordinator):
/mps/db_pgsql/bin/psql -U postgres mpsdb -p 11921 -h 10.102.31.168
psql (9.4.5, server 9.5.3 (Postgres-XL 9.5r1.2))
WARNING: psql major version 9.4, server major version 9.5.
Some psql features might not work.
Type "help" for help.
mpsdb=# select * from db_nodes;
ERROR: Failed to get pooled connections
HINT: This may happen because one or more nodes are currently unreachable, either because of node or network failure.
Its also possible that the target node may have hit the connection limit or the pooler is configured with low connections.
Please check if all nodes are running fine and also review max_connections and max_pool_size configuration parameters
mpsdb=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+---------------+----------------+------------------+------------
datanode1 | D | 12921 | 10.102.31.170 | t | f | 888802358
datanode2 | D | 12921 | 10.102.31.202 | f | f | -905831925
coord1 | C | 11921 | 10.102.31.168 | f | f | 1885696643
(3 rows)
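Per the hint above, the pooler limits can also be checked from the coordinator, with something like the following (illustrative only; SHOW reads local settings, so it does not need pooled connections):

    /mps/db_pgsql/bin/psql -U postgres mpsdb -p 11921 -h 10.102.31.168 -c "SHOW max_pool_size"
    /mps/db_pgsql/bin/psql -U postgres mpsdb -p 11921 -h 10.102.31.168 -c "SHOW max_connections"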
Tried connecting to the datanodes through psql and saw the following error:
mpsdb=# \q
bash-2.05b# /mps/db_pgsql/bin/psql -U postgres mpsdb -p 12921 -h 10.102.31.170
psql: FATAL: Could not obtain a transaction ID from GTM. The GTM might have failed or lost connectivity
bash-2.05b# /mps/db_pgsql/bin/psql -U postgres mpsdb -p 12921 -h 10.102.31.202
psql: FATAL: Could not obtain a transaction ID from GTM. The GTM might have failed or lost connectivity
Could you please suggest some pointers on how to proceed?
Thanks,
Srinivas
I have found the issue. On GTM failover, the gtm_host setting in each datanode's postgresql.conf is not updated to the GTM standby's IP. Is this a known bug?
As a workaround, we are updating the gtm_host information through a script.
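Roughly, the workaround does the following on each datanode (a sketch only; the datanode data directory path is an assumption here, and on some versions a reload may be enough instead of a restart):

    #!/bin/sh
    # Point every datanode at the promoted GTM and restart it so it reconnects.
    NEW_GTM_HOST=10.102.31.168
    DATANODES="10.102.31.170 10.102.31.202"
    DNDIR=/var/mps/db_pgxl/data/auto/nodes/datanode   # assumed datanode data directory

    for host in $DATANODES; do
        # rewrite gtm_host in postgresql.conf on the datanode (FreeBSD sed needs -i '')
        ssh pgxl@$host "sed -i '' \"s|^#*gtm_host.*|gtm_host = '$NEW_GTM_HOST'|\" $DNDIR/postgresql.conf"
        # restart the datanode so it picks up the new GTM host
        ssh pgxl@$host "pg_ctl restart -Z datanode -D $DNDIR -w"
    done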