From: ZhangJulian <jul...@ou...> - 2014-02-08 02:14:01
Hi Mason,

Yes, if I can help, I would be honored to join such a development effort. I have been trying to build an HA environment for PostgreSQL, and then for Postgres-XC, but have not managed it so far. My OS is RHEL 6.4; do you have any documents or web links you could share with me?

I have another idea which may not be a good one, but I would still like to post it here and ask for your advice. The HA component does not seem to need to be installed on every machine. We could develop a separate module, much like GTM, which I will call XCMon for now; it would be installed on the servers that host GTM and the GTM slave:

    Machine1 (GTM, XCMon)  --->  Machine2 (GTM slave, XCMon slave)

XCMon would monitor all the components by periodically sending SELECT 1+1 to all Coordinators and all Datanodes, and by trying to get a snapshot from GTM and all GTM proxies. The XCMon slave would watch the XCMon master and be ready to take control once the master fails (a rough sketch of such a polling loop is in the postscript below). Further on, we could even fold the XCMon functions into GTM and make deployment even easier.

Thanks,
Julian
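P.S. To make the XCMon idea a little more concrete, here is a very rough sketch of the kind of polling loop I have in mind. It is only an illustration, assuming Python with psycopg2; the host names, ports, user and polling interval are placeholders I made up, and the GTM/GTM proxy snapshot check is left out because it speaks the GTM protocol rather than SQL.

    # xcmon_probe.py -- illustrative sketch only; hosts/ports/user are placeholders.
    import time
    import psycopg2

    # Components XCMon would watch: (name, host, port).
    COORDINATORS = [("coord1", "machine3", 5432), ("coord2", "machine4", 5432)]
    DATANODES = [("dn1", "machine5", 15432), ("dn2", "machine6", 15432)]

    def is_alive(host, port):
        """Return True if the node answers a trivial query within a short timeout."""
        try:
            conn = psycopg2.connect(host=host, port=port, dbname="postgres",
                                    user="xcmon", connect_timeout=3)
            try:
                with conn.cursor() as cur:
                    cur.execute("SELECT 1+1")
                    cur.fetchone()
                return True
            finally:
                conn.close()
        except psycopg2.Error:
            return False

    def probe_once():
        """One XCMon pass over all Coordinators and Datanodes."""
        return [name for name, host, port in COORDINATORS + DATANODES
                if not is_alive(host, port)]

    if __name__ == "__main__":
        while True:
            down = probe_once()
            if down:
                # Here the real XCMon would start its failover logic:
                # restart the component, promote a slave, update pgxc_node, ...
                print("unreachable components:", down)
            time.sleep(10)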
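A second rough sketch, this time for the failed-Coordinator handling described in my earlier mail quoted below: once the surviving Coordinators agree that one Coordinator is gone, each of them would drop it from the pgxc_node catalog and reload its pooler. This assumes Postgres-XC's DROP NODE statement and pgxc_pool_reload() function; the node names and connection details are again placeholders, and restarting or promoting GTM or a Datanode slave would have to happen outside SQL (for example with gtm_ctl or pg_ctl).

    # drop_failed_coordinator.py -- illustrative sketch only.
    import psycopg2

    SURVIVING_COORDINATORS = [("machine3", 5432), ("machine4", 5432)]
    FAILED_NODE_NAME = "coord2"  # must be a trusted node name (used as an identifier)

    def drop_node_everywhere(failed_node):
        """Ask every surviving Coordinator to forget the failed Coordinator."""
        for host, port in SURVIVING_COORDINATORS:
            conn = psycopg2.connect(host=host, port=port, dbname="postgres",
                                    user="xcmon", connect_timeout=3)
            conn.autocommit = True
            try:
                with conn.cursor() as cur:
                    # Remove the failed Coordinator from the pgxc_node catalog ...
                    cur.execute("DROP NODE {}".format(failed_node))
                    # ... and make the connection pooler pick up the new node list.
                    cur.execute("SELECT pgxc_pool_reload()")
            finally:
                conn.close()

    if __name__ == "__main__":
        drop_node_everywhere(FAILED_NODE_NAME)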
Date: Fri, 7 Feb 2014 10:49:23 -0500
Subject: Re: [Postgres-xc-developers] What is best choice to set up a PGXC HA environment?
From: ms...@tr...
To: jul...@ou...
CC: his...@la...; koi...@gm...; pos...@li...

On Fri, Feb 7, 2014 at 2:39 AM, ZhangJulian <jul...@ou...> wrote:

Hi Hisada,

It is great to know the resource agent will be released within a few months. Thank you for your work, and I am glad to be one of your first batch of users.

About the feature of Postgres-XC internal HA, I think it is an attractive feature from a user's perspective; you had mentioned it has some advantages, such as not needing to install other external tools. Just now I read the Greenplum Admin Guide, and it seems that Greenplum has internal HA support through a process named ftsprobe.

I was thinking each Coordinator would fork one more process at start-up, alongside the autovacuum/bgwriter processes, and this new process would do all the work that Pacemaker does. When GTM is down, each Coordinator will notice it when it fetches snapshots from GTM; it can then talk with the other Coordinators and negotiate to restart the GTM master or promote the GTM slave to master. But I am not sure how to send a RESTART GTM or PROMOTE GTM SLAVE command from a Coordinator process; maybe the PROMOTE command could be replaced by an API call into the GTM component.

When one Coordinator is down, the other Coordinators will find it when they execute a DDL (or each Coordinator could send SELECT 1+1 to the other Coordinators periodically to verify that they are all alive); the surviving Coordinators can then decide to remove the failed Coordinator from the pgxc_node catalog.

When one Datanode is down, a Coordinator will notice it when it sends a remote query to that Datanode, or it could also send SELECT 1+1 to each Datanode periodically. Then all the Coordinators would negotiate to promote the Datanode slave to master.

But maybe this is not the better solution if Pacemaker turns out to be easier to use? For example, we could develop a Postgres-XC/Pacemaker glue layer which fetches the whole cluster configuration from Postgres-XC and then configures Pacemaker automatically...

These are all good thoughts, and somewhat along the lines of what I have been thinking as well. We have been using Corosync/Pacemaker for quite some time. It works, but in hindsight I wish we had put the effort into an internal solution. While the current setup works, we have spent a lot of time tweaking and maintaining it. In the past we have seen unnecessarily aggressive failovers, for example. Also, it takes some resources, and it does not like to manage too many components at once. In our case, we like to have two replicas of each Datanode on the other servers that hold masters.

Making node membership more flexible and getting the components to agree on when to fail over is likely the better long-term solution. There would be more upfront effort, but easier installation and less management and maintenance in the long term. Let me know if you have the time to collaborate on such a development effort if we undertake this at some point. Our other product, TED (unrelated to Postgres-XC), manages failover internally and works well, including automatic recovery of downed nodes. We can perhaps draw on lessons there, too.

--
Mason Sharp
TransLattice - http://www.translattice.com
Distributed and Clustered Database Solutions