Thread: [Postgres-xc-general] Postgres-xc High Availability/Failover concerns

[Postgres-xc-general] Postgres-xc High Availability/Failover concerns

From: Phil S. <phi...@ne...> - 2012-04-30 20:38:06

Hi all ...

After running some preliminary testing on our postgres-xc dev 
sandbox, I've come across some major concerns with High Availability and 
Failover.  I am hoping someone will be able to provide some insight on 
how to set up seamless high-available failover in a postgres-xc environment.


Here are the details:
--------------------

In the postgres-xc dev sandbox, a test table with the default 
round-robin distribution was created to test simultaneous DML SQL 
running on both db VMs.

With the DML SQL running, the coordinator and datanode on one of the db 
VMs were brought down (NB the host was not shut down, just the 
postgres-xc processes on the VM).  After a very short time (let's say a 
maximum of 2 minutes), the DML SQL script running on the remaining active 
db VM started failing with the following error:
         ERROR:  Failed to get pooled connections

Our application requires 0 single points of failure.  If one of the db 
VMs goes down, the application needs the ability to automatically detect 
this outage and use the remaining active db VMs in the postgres-xc cluster.
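
To make that requirement concrete, here is a minimal sketch of the application-side part, assuming the application holds a list of coordinator addresses and tries them in order. The helper name connect_to_any, the hostnames db1/db2 and the port are hypothetical, and the connect callable stands in for whatever driver call the application really uses.

```python
from typing import Callable, Iterable, Tuple, Type, TypeVar

Conn = TypeVar("Conn")
Address = Tuple[str, int]


class NoCoordinatorAvailable(Exception):
    """Raised when no coordinator in the list accepted a connection."""


def connect_to_any(
    coordinators: Iterable[Address],
    connect: Callable[[str, int], Conn],
    retryable: Tuple[Type[BaseException], ...] = (OSError,),
) -> Tuple[Address, Conn]:
    """Return (address, connection) for the first coordinator that answers.

    `connect` is supplied by the caller (hypothetical here), so this sketch
    does not depend on any particular database driver.
    """
    errors = []
    for host, port in coordinators:
        try:
            return (host, port), connect(host, port)
        except retryable as exc:
            # Remember why this coordinator was skipped, then try the next one.
            errors.append(f"{host}:{port}: {exc}")
    raise NoCoordinatorAvailable("; ".join(errors))
```

This only covers picking a live coordinator. It does not help when a datanode is down: the surviving coordinator still needs every datanode that holds part of the table, which is where the "Failed to get pooled connections" error comes from.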

Basically, I need to know how to set up the postgres-xc cluster so that 
the applications can continue to run DML SQL when one of the db VMs goes 
down.  If this is not supported, what would need to be put in place to 
allow seamless, highly available failover for a postgres-xc cluster?

Thanks in advance,
Phil.

Re: [Postgres-xc-general] Postgres-xc High Availability/Failover concerns

From: Ashutosh B. <ash...@en...> - 2012-05-02 07:28:20

Hi Phil,
HA support for XC depends on redundancy for each of the components of
XC. Coordinators themselves can provide redundancy (except when you
want to run DDL), and for datanodes, streaming replication can be used
to set up a slave server for each datanode, which can provide HA in
case that datanode fails. GTM-proxy provides redundancy for GTM.




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company

Re: [Postgres-xc-general] Postgres-xc High Availability/Failover concerns

From: Koichi S. <koi...@gm...> - 2012-05-02 09:04:52

Phil;

Thank you very much for the report.

This is expected behavior.   For an HA arrangement, each datanode
needs its own backup using PG WAL shipping.   To enforce cluster-wide
data integrity, I advise using synchronous replication.    Because
coordinators are involved in DDL, it is better to have a backup for
each coordinator as well.   In this case, when a coordinator goes
down, you can unregister the failed coordinator with the DROP NODE
statement so that it is no longer involved.   Failback is a bit
complicated in this case and does not fit in this e-mail, sorry.
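
As a rough illustration of the DROP NODE step, the sketch below only builds the statements an operator might run on each surviving coordinator; it does not connect to anything. It assumes node DDL has to be issued on every coordinator separately and that pgxc_pool_reload() is available to refresh the pooler afterwards; both should be checked against the documentation of the release in use. The function name plan_unregister and the node names are hypothetical.

```python
import re
from typing import Dict, List

# Conservative pattern for an unquoted SQL identifier (assumption: node
# names in this cluster are plain identifiers).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def plan_unregister(failed: str, surviving: List[str]) -> Dict[str, List[str]]:
    """Map each surviving coordinator to the statements to run on it."""
    if not _IDENT.match(failed):
        raise ValueError(f"not a plain identifier: {failed!r}")
    if failed in surviving:
        raise ValueError("failed node is listed as surviving")
    statements = [
        f"DROP NODE {failed};",        # unregister the failed coordinator
        "SELECT pgxc_pool_reload();",  # assumption: refreshes pooler node info
    ]
    # Each coordinator gets its own copy of the statement list.
    return {name: list(statements) for name in surviving}
```

For example, plan_unregister("coord2", ["coord1"]) yields the two statements to run on coord1.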

Please note that each datanode has its own data and is not a simple
replica of others.

As with vanilla PostgreSQL, each component's backup has to be
configured and controlled by HA software for automatic failover.
Because GTM is specific to XC, it has a dedicated backup called
GTM-standby, which can fail over without any transaction loss.   This
needs to be controlled by some HA middleware too.

I'm going to cover this issue in the Postgres-XC tutorial at the
upcoming PGCon 2012 in Ottawa.

Best Regards;
----------
Koichi Suzuki

