Re: [Postgres-xc-developers] What is best choice to set up a PGXC HA environment?


Hi Hisada,

It is great to know that the resource agent will be released within a few months. Thank you for your work; I am glad to be among your first batch of users.

About PGXC internal HA: I think it is an attractive feature from a user's perspective. As you mentioned, it has advantages such as no need to install external tools. I just read the Greenplum Admin Guide, and it seems that GP has internal HA support through a process named ftsprobe.

I was thinking that each Coordinator would fork one more process at startup, alongside the autovacuum/bgwriter processes, and that this new process would do all the work that Pacemaker does.

When the GTM is down, each Coordinator will notice when it fetches snapshots from the GTM. It will then talk with the other Coordinators and negotiate either to restart the GTM master or to promote the GTM slave to master. But I am not sure how to send a RESTART GTM or PROMOTE GTM SLAVE command from a Coordinator process. Maybe the PROMOTE command could be replaced by an API call to the GTM component.
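To make the negotiation step concrete, here is a minimal sketch of one possible rule: promote the GTM slave only when a strict majority of Coordinators report that they cannot reach the GTM master. The names `GtmAction` and `decide_gtm_action` are made up for illustration, and how the reports are collected and how the promotion is actually triggered are left open, as in the text above.

```python
from enum import Enum
from typing import Dict


class GtmAction(Enum):
    """Hypothetical outcomes of the Coordinators' negotiation."""
    NONE = "none"
    PROMOTE_SLAVE = "promote_slave"


def decide_gtm_action(reports: Dict[str, bool]) -> GtmAction:
    """Decide what to do about the GTM from per-Coordinator reports.

    `reports` maps a Coordinator name to True if that Coordinator can
    still reach the GTM master, False otherwise (an assumed format).
    """
    if not reports:
        # No information at all: do nothing rather than guess.
        return GtmAction.NONE
    unreachable = sum(1 for ok in reports.values() if not ok)
    # Require a strict majority so that one Coordinator with a broken
    # network link cannot trigger a failover by itself.
    if unreachable * 2 > len(reports):
        return GtmAction.PROMOTE_SLAVE
    return GtmAction.NONE
```

A strict majority is only one possible policy; it is used here because it avoids a failover when a single Coordinator is cut off from the GTM.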

When one coordinator is down, the other coordinators will find the failed coordinator when they execute a DDL (or each coordinator could periodically send SELECT 1+1 to the other coordinators to verify that they are all alive). The surviving coordinators can then decide to remove the failed coordinator from pgxc_nodes.

When one datanode is down, the coordinator will know when it sends a REMOTE QUERY to that datanode, or it can periodically send SELECT 1+1 to each datanode. Then all the coordinators will negotiate to promote the datanode slave to master.
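The periodic SELECT 1+1 check for coordinators and datanodes could be sketched roughly as below. This is only an illustration: `HeartbeatMonitor`, `probe` and `max_misses` are hypothetical names, and `probe` stands in for whatever actually runs SELECT 1+1 against a node and returns True on success. Requiring several consecutive misses before declaring a node failed is my assumption, to avoid reacting to a single slow reply.

```python
from typing import Callable, Dict, Iterable, List


class HeartbeatMonitor:
    """Track consecutive failed probes per node (illustrative only)."""

    def __init__(self, nodes: Iterable[str],
                 probe: Callable[[str], bool], max_misses: int = 3):
        self.probe = probe            # e.g. runs "SELECT 1+1" on the node
        self.max_misses = max_misses  # misses needed to declare failure
        self.misses: Dict[str, int] = {name: 0 for name in nodes}

    def tick(self) -> List[str]:
        """Probe every node once.

        Returns the nodes whose consecutive-miss count reached
        max_misses on this tick, so each failure is reported once.
        """
        newly_failed = []
        for name in self.misses:
            try:
                alive = self.probe(name)
            except Exception:
                # A connection error counts as a missed heartbeat.
                alive = False
            if alive:
                self.misses[name] = 0
            else:
                self.misses[name] += 1
                if self.misses[name] == self.max_misses:
                    newly_failed.append(name)
        return newly_failed
```

The heartbeat process would call `tick()` on a timer and hand any returned node names to the negotiation step.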

But maybe this is not the better solution if Pacemaker can be made easier to use? For example, we could develop a PGXC-Pacemaker glue layer that fetches the whole cluster configuration from PGXC and then configures Pacemaker automatically.

Thanks
Julian

> From: his...@la...
> To: koi...@gm...; jul...@ou...
> CC: pos...@li...
> Subject: RE: [Postgres-xc-developers] What is best choice to set up a PGXC HA environment?
> Date: Tue, 4 Feb 2014 11:23:05 +0900
> 
> Hi, Julian, 
> 
> > > I am thinking 3 choices as below:
> > >
> > > 1. Pacemaker and Corosync.
> > > I have little experience with Linux HA; after one week I still cannot
> > > install them successfully, including Pacemaker/Corosync/crmsh/resource
> > > agent.
> > > Some websites mention that Pacemaker/Corosync can help PGXC to
> > > build an HA infrastructure, but I cannot find a comprehensive guide to
> > > do it. There are many more components in PGXC than in PG, so I think I
> > > should learn how to build it based on PG first.
> > 
> > I know separate XC project to provide Pacemaker/Corosync resource
> > agent for XC.   Please let me push them to provide info.
> 
> We are planning to release a resource agent for pacemaker/heartbeat within a
> few months.
> The basic idea is to manage Master-Slave pairs of Datanode, Coordinator and
> GTM on each server with pacemaker.
> Hopefully this could be one of the solutions for the HA feature in XC.
> 
> > > 2. Zookeeper
> > > It seems that Zookeeper has the ability to build an HA solution for
> > > PGXC, with functions similar to Pacemaker, but I would have to
> > > develop the heartbeat function for Zookeeper to
> > > start/stop/monitor/failover PGXC. And I do not know if my understanding
> > > is right.
> > 
> > Sorry, I'm not familiar with Zookeeper.
> >
> > > 3. PGXC supports HA internally.
> > > Because the pgxc_nodes table in the coordinator already holds some
> > > information about the cluster, it could be enhanced to store the
> > > Master/Slave relations. It is replicated between all coordinators,
> > > so it could be used as a CRM (Cluster Resource Management, like
> > > Pacemaker) component.
> > > The coordinator connects to the datanodes/GTM/other coordinators in
> > > its regular work, so the heartbeat function exists naturally. Even when
> > > the database is idle, the coordinator can send a simple query such as
> > > "select 1+1" to the datanodes as heartbeat ticks.
> > > What needs to be done is this: the coordinator starts a new process at
> > > startup, and the new process acts as a heartbeat/resource_agent to
> > > monitor the cluster status and restart/fail over once a component
> > > fails.
> 
> How about monitoring coordinator and GTM? Do you have any idea?
> 
> > > In my initial understanding, Choice 3 is better than Choice 2, which is
> > > better than Choice 1. But in terms of development effort, the order is
> > > reversed: Choice 1 is easily achieved based on the existing code.
> 
> What do we mean by better?
> My requirements are as follows:
> 
> Availability:
>  - Shorter failure detection time
>  - Shorter downtime at failover / switchover
> 
> Node management usability:
>  - Slave nodes can be managed in the XC cluster just like Master nodes
>  - Node monitoring and management is possible from psql
>  - No need to install / configure external tools: pacemaker / corosync
> 
> What else?
> 
> Regards,
> 
> Hisada
> 
> > > I would very much appreciate it if you could share your advice with me.
> > 
> > Yes, I do agree with this solution.   I'd like to have this as a part
> > of XC release 1.3.
> 
> 
> 
> > PGXC internal HA should be integrated with other monitoring features such
> > as server hardware, power and network.
> > 
> > It will be exciting to begin this discussion in this mailing list.
> > 
> > Regards;
> > ---
> > Koichi Suzuki
> > 
> > >
> > > Thanks
> > > Julian
> > >
> > >
> > >
> > > Postgres-xc-developers mailing list
> > > Pos...@li...
> > > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
> > >
> > 
>