
Synchronous streaming replication

Michael Paquier

Basics

This part covers the background necessary to set up a synchronously replicated server with PostgreSQL. Before touching any settings, it is essential to set up 2 data directories, one for the master and one for the slave.
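A minimal sketch of such a layout, assuming the installation lives in $HOME/bin/postgres and the two data directories are simply called master and slave (names chosen only for illustration):

    # illustrative layout: bin/, archive/ and the two data directories
    cd $HOME/bin/postgres
    mkdir -p archive
    ./bin/initdb -D master
    # the slave starts as a copy of the master's data directory, taken here
    # while the master is still stopped (see also the extra notes below)
    cp -r master slave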

Master settings

  • postgresql.conf

    wal_level = hot_standby
    archive_mode = on
    archive_command = 'cp -i %p $HOME/bin/postgres/archive/%f'
    max_wal_senders = 10

Note that archive_command depends on where you want to store the WAL archives.

  • pg_hba.conf, necessary to allow the slave to contact the master in replication mode

    host replication michael 127.0.0.1/32 trust
    host replication michael ::1/128 trust

The master's port is 5432 in this case, the default value.
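Note that the settings above give streaming replication, not yet synchronous replication; with PostgreSQL 9.1 the master additionally needs synchronous_standby_names, and the slave's primary_conninfo (see below) has to carry a matching application_name. A minimal sketch, the name slave1 being arbitrary:

    # master, postgresql.conf
    synchronous_standby_names = 'slave1'

    # slave, recovery.conf
    primary_conninfo = 'host=localhost port=5432 application_name=slave1'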

Slave settings

  • Copy the master configuration file postgresql.conf, then modify the following lines to make it a slave.

    hot_standby = on
    port = 5433

  • recovery.conf, a sample of which is available in share/ as recovery.conf.sample

    standby_mode = on
    primary_conninfo = 'host=localhost port=5432'
    restore_command = 'cp -i $HOME/bin/postgres/archive/%f %p'

restore_command should fetch archives from the same place where archive_command in postgresql.conf stores them.

Fallback to slave

If the master dies, the following lines permit an easy failover to the slave.

 echo "standby_mode = off" >> slave/recovery.conf
 echo "port = 5432" >> slave/postgresql.conf
 ./bin/pg_ctl -D slave restart

This basically just changes the slave's port to the former port of the master and makes the slave behave as a master node by setting standby_mode to off. Once the restart is completed, recovery.conf is renamed to recovery.done to avoid recovery being run several times.
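Note that, starting with PostgreSQL 9.1, the standby can also be promoted directly instead of editing standby_mode; the port change in postgresql.conf is still needed in that case. A sketch:

 ./bin/pg_ctl -D slave promote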

Extra notes

 pg_start_backup('backup')

This marks the start of an on-line base backup on the master. It makes slave creation easier because the master's data directory can be copied to build the slave while the master keeps running.

 pg_stop_backup()

This marks the end of the backup once the copy of the data directory is done.
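Putting the two functions together, here is a sketch of how the slave's data directory could be built from a running master, reusing the illustrative layout and ports above:

 # mark the start of an on-line base backup on the master
 ./bin/psql -p 5432 -c "SELECT pg_start_backup('backup')" postgres
 # copy the master's data directory while it keeps running
 rsync -a master/ slave/
 # mark the end of the backup
 ./bin/psql -p 5432 -c "SELECT pg_stop_backup()" postgres
 # the slave must not reuse the master's pid file
 rm -f slave/postmaster.pid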

Synchronous streaming replication for Postgres-XC

This part specifies how synchronous streaming replication will be implemented in Postgres-XC. Even if the functionalities already implemented in PostgreSQL 9.1 are pretty stable, some extensions related to node communication control have to be designed to get a real HA solution.

Some background

  • Postgres-XC is a multi-master database cluster based on PostgreSQL.
  • It is built around a unique global transaction manager which consistently feeds transaction IDs and snapshots to each node in the cluster.
  • There are 2 kinds of nodes: Coordinators and Datanodes.
    • Coordinators hold information about the other nodes. A Coordinator is able to communicate with other Coordinators and with Datanodes through a connection pooler. This connection pooler saves all the connection parameters of the nodes (host name, port number) so as to distribute connections, built from libpq connection strings, depending on database name and user name. Coordinators also hold all the catalog information, primarily the distribution information of each table in the database. With this information, a Coordinator is able to push down SQL queries to the correct Datanodes and then merge the results sent back to the application, according to a new plan type called RemoteQuery. A Coordinator does not hold table data, and the data contained in each Coordinator is the same, so one Coordinator is the clone of another one.
    • Datanodes behave more or less like a normal PostgreSQL instance. They hold the database data. What has been added is an interface permitting Datanodes to receive transaction IDs, snapshots and timestamp values from Coordinators instead of requesting them locally.

Limitations

  • Postgres-XC does not yet support tuple relocation from one node to another (it is impossible, for instance, to update a column that holds the distribution key), so this design is limited to the case where the cluster has a fixed number of master nodes (for Datanodes and Coordinators (?)).
  • Adding or deleting a Datanode on the fly is not considered here. By that, it is meant that the cluster configuration is not changed in a way that modifies the number of nodes and the data distribution.
  • With those assumptions, what remains is a cluster of fixed size.
  • This specification is based on PostgreSQL 9.1, but the design is thought to take into account as many replication features as possible.

Specifications

Here is the list of functionalities that will be added to support synchronous streaming replication. Most of them concern node management, master/slave identification and the slave promotion system.

  • In no way are Postgres-XC nodes thought to be able to make system calls, for instance to kick off a slave initdb or something like this. NO WAY!

Catalog addition

A catalog called pgxc_nodes will be added with the following columns:

  • node type
  • node ID number
  • node host name
  • node port number
  • node immediate master ID

This table has a primary key constraint on node ID number and node type. "node immediate master ID" is the ID of the node that a slave uses to connect to a master or to another slave (case of cascading replication, not implemented in PostgreSQL 9.1 though). This catalog is created on Coordinators.
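Purely as an illustration (this is not the actual catalog definition), those columns could map to something like:

 CREATE TABLE pgxc_nodes (
     node_type      "char",   -- 'C' for Coordinator, 'D' for Datanode (illustrative values)
     node_id        integer,  -- node ID number
     node_host      text,     -- node host name
     node_port      integer,  -- node port number
     node_master_id integer,  -- node immediate master ID, 0 for a master node
     PRIMARY KEY (node_type, node_id)
 );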

Cluster startup

  • As a limitation, all the configuration files of Postgres-XC Coordinator nodes only contain master Coordinator numbers.

With that, the catalog table pgxc_nodes is initialized only with data from the master nodes. In the case of a master node, "node immediate master ID" is filled with 0.

  • Once the cluster is up with a fixed number of nodes, the administrator has the possibility to update the pgxc_nodes catalog with slaves that are already running and already have their configuration files set correctly to connect to the wanted node.
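For illustration, with the catalog sketch given earlier, a freshly started cluster of 2 master Coordinators and 2 master Datanodes could be reflected by rows like these (host and port values purely illustrative):

 INSERT INTO pgxc_nodes VALUES ('C', 1, 'localhost', 5432,  0);
 INSERT INTO pgxc_nodes VALUES ('C', 2, 'localhost', 5452,  0);
 INSERT INTO pgxc_nodes VALUES ('D', 1, 'localhost', 15432, 0);
 INSERT INTO pgxc_nodes VALUES ('D', 2, 'localhost', 15433, 0);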

Adding a slave node after cluster start up

The following SQL syntax is envisioned:

 REGISTER [COORDINATOR | DATANODE] SLAVE
 IDENTIFIED BY ID $id_num
 WITH CONNECTION ('$host1:$port1',...,'$hostN:$portN')
 ON MASTER $master_number.

If only 1 host/port couple is specified, the same values are applied for all the Coordinators. In case multiple host/port values are specified, their number has to correspond to the number of Coordinators in the cluster. This SQL is sent to all the Coordinators.
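For example, registering a slave with ID 5 for the master Datanode 2, the slave being reachable at localhost:15533, could look like this (values purely illustrative):

 REGISTER DATANODE SLAVE
 IDENTIFIED BY ID 5
 WITH CONNECTION ('localhost:15533')
 ON MASTER 2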

Pooler modification

The pooler has to be modified on the following points:

  • At the initialization phase, the pooler fills in the catalog table pgxc_nodes with the parameter values found in postgresql.conf.
  • The pooler saves in shared memory the information related to master nodes.
  • Addition of a boolean parameter when getting connections for a transaction.
    • If the transaction is read-only, connections to both master and slave are possible.
    • Connections are distributed with a round-robin algorithm.
    • If the transaction first requested read-only connections but launches a DML on the way, the pooler is contacted again to get master connections instead of the slave connections currently in use.

SQL interface

