From: Michael P. <mic...@gm...> - 2012-08-10 11:40:30
On Fri, Aug 10, 2012 at 7:37 PM, Aris Setyawan <ari...@gm...> wrote:
>> Please understand that (mainly because of administrative reasons)
>
> What kind of administrative reasons did you mean here?

Company policy. Sakata-san works for the NTT research laboratory, and they are doing some internal research work. Because of company policy, they can only partially disclose information about their work.

> If the public is unable to join the development process, then this HA/RA
> will easily become a commercial product once it has been stabilized by
> community testing.
>
> E.g.: Bizgres, C-Store. They ended up as "dead" open-source projects.
>
> I think we should develop our own HA, which everyone can participate in.
> That way, we can have "HA" of the development process itself.

Yes, it is important to have such discussions, and I think it will be very helpful for everybody. However, I am not sure that the core team can be deeply involved in creating packaging solutions involving Pacemaker or the like, but we will of course provide support and add the core features necessary to build a wonderful HA world.

By the way, since this thread is a good place to discuss it: you should know that the HA team needs some support. There are a couple of issues that need to be fixed quickly, as they impact more than just the NTT team. They are not directly related to the core work, but some limitations currently in the XC code make HA management harder than it should be. I have basically spotted 3 issues that I am going to work on, and hopefully fix next week, that will help the HA work:

1) It is not possible to run a query on a slave Coordinator, which makes it impossible to check whether a slave Coordinator is alive; it is also not possible to run pg_dump on it, even with hot_standby on. What happens is that we try to obtain a transaction ID in an XC-specific code path when the session tries to fetch a new snapshot, but I think we should get that from WAL directly.
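As an aside, until issue (1) is fixed a monitoring agent cannot use a SQL-level health check against a slave Coordinator. A minimal stopgap, sketched below under the assumption that TCP-level reachability is good enough for the agent, is to probe whether the postmaster is still accepting connections on the node's port. This helper is purely illustrative and not part of XC; the host name and port in the usage note are made up.

```python
import socket

def postmaster_is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if something accepts a TCP connection on (host, port).

    This only proves that a process is listening on the port; it cannot
    replace a real SQL-level liveness check, which issue (1) currently
    makes impossible on a slave Coordinator.
    """
    try:
        # create_connection performs the full TCP handshake, then we
        # close the socket immediately via the context manager.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, etc.
        return False
```

A monitoring script could then call, for example, `postmaster_is_listening("coord1-slave.example.com", 5432)` and raise an alert on False, while keeping in mind that a hung backend would still pass this check.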
2) When connecting directly to a master Datanode, we do not get a snapshot from GTM as we should. Hence, a WARNING is sent back to the client and an incorrect local snapshot is used for the operation. I have already blocked write operations on a Datanode when an application connects to it directly.

3) Incorrect snapshot data is taken on slave nodes, whether Datanode or Coordinator. I believe the snapshot needs to be taken directly from WAL data. Please note that issues 1 and 3 are related.

These things are not complicated, but they need to be taken care of.

Regards,
--
Michael Paquier
http://michael.otacoo.com