From: Stefan L. <ar...@er...> - 2013-10-06 14:20:12
On 10/5/2013 5:19 PM, Michael Paquier wrote:
> On Sat, Oct 5, 2013 at 9:00 PM, Stefan Lekov <ar...@er...> wrote:
>> Hello, I'm new to the Postgres-XC project. In fact I am still
>> considering whether I should install it and try it as a replacement
>> for my current database clusters (those are based on MySQL and its
>> binary-log replication).
> Have you considered PostgreSQL as a potential solution before
> Postgres-XC? Why do you especially need XC?

I have used PostgreSQL in the past, I am using it at the moment (for
other projects), and I'd like to continue using it in the future. My
current requirements include a multi-master replicated database
cluster, both for redundancy and for possible scalability. While one
PostgreSQL server will cope with any load I can throw at it for the
near future, that might not be the case in a year or two. As for the
redundancy part - I am familiar with PostgreSQL's warm-standby
capabilities, but I am looking for something more robust. Because of
the multi-master requirement, I am investigating whether Postgres-XC
or pgpool-II can deliver such a system. I could migrate to a single
PostgreSQL server, but I am not keen on solving the replication
dilemma on-the-fly once the system is already running on Postgres - I
prefer having something that works as expected right from the start.

>> Before actually starting the installation of Postgres-XC I would
>> like to know what the procedure is for restarting nodes. I have
>> already read a few documents/mails about restoring or resyncing a
>> failed datanode, but these documents do not answer my simple
>> question: what should the procedure be for rebooting servers? For
>> example, I have a kernel update pending (for security reasons) - I
>> install the new kernel, but then I have to reboot the whole machine.
>> Theoretically all nodes (both coordinators and datanodes) are
>> running on different physical servers or VMs. In a perfect scenario
>> I would like to keep the system in production while I restart the
>> servers one by one. However, I am not sure what the effect of
>> rebooting the servers one by one would be.
> If a node is restarted or facing an outage, all the transactions it
> needs to be involved in will simply fail. In the case of a
> Coordinator, this has an effect only for DDL. For Datanodes, this has
> an effect for DDL, but also for DML and SELECT if the node is needed
> for the transaction.

There would be no DDL during these operations. I can limit the queries
to DML only.

>> For the purpose of example, let me have four datanodes: A, B, C, D.
>> All servers are synced and operating as expected.
>> 1) Upgrade A, reboot A
>> 2) INSERT/UPDATE/DELETE queries
>> 3) A boots up and is successfully started
>> 4) INSERT/UPDATE/DELETE queries
>> 5) Upgrade B, reboot B
>> ...
>> As for the Coordinator nodes: how are those affected by temporarily
>> stopping and restarting the Postgres-XC related services? What
>> should the load balancer in front of these servers be, in order to
>> both load-balance and fail over if one of the Coordinators is
>> offline, either due to a failed server or due to rebooting servers?
> DDLs won't work. Applications will use one access point. In this case
> there are no problems for your application: connect to the other
> Coordinators to execute queries as long as they are not DDLs.

What system, application or method would you recommend for performing
the load-balancing/fail-over of connections to the Coordinators?

>> I have no problem with the relatively heavy operation of a full
>> restore of a datanode in the event of a failed server.
>> Such a restoration operation can be properly scheduled and executed;
>> however, I am interested in how Postgres-XC would react to the
>> simple operation of restarting a server, for whatever reason.
> As mentioned above, transactions that will need it will simply fail.
> You could always fail over to a slave for the outage period if
> necessary.

Correct me if I'm wrong: all data (read: databases, schemas, tables,
etc.) would be replicated to all datanodes, so before host A goes down
all servers would have the same dataset. This way no transaction
should fail due to the missing datanode A. While A has been booting
up, several transactions will have passed (since such a restart is an
operation I can schedule, I do it at a time when we have low to no
load on our systems, so the transaction count is relatively low). My
question is how to bring A back to having "the same dataset" as the
rest of the datanodes before I can continue with the next
host/datanode?

Regards,
Stefan Lekov
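P.S. To make my "same dataset on every datanode" assumption concrete:
as far as I understand the Postgres-XC documentation, a table is only
copied to every datanode when it is created with DISTRIBUTE BY
REPLICATION; the default is to spread rows across datanodes by hash.
So I am assuming a schema along these lines (table and column names
are just placeholders of mine):

```sql
-- Replicated: every datanode holds a full copy of the table, so a
-- single datanode being down for a reboot should not lose any data.
CREATE TABLE accounts (
    id      serial PRIMARY KEY,
    balance numeric NOT NULL
) DISTRIBUTE BY REPLICATION;

-- For contrast, the default distribution splits rows across the
-- datanodes, so here an offline datanode would make part of the
-- table unreachable:
CREATE TABLE events (
    id      serial,
    payload text
) DISTRIBUTE BY HASH (id);
```

If that reading is wrong and hash-distributed tables are involved, I
suppose my whole reboot-one-node-at-a-time plan changes accordingly.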