From: Michael P. <mic...@gm...> - 2012-08-10 11:40:30
On Fri, Aug 10, 2012 at 7:37 PM, Aris Setyawan <ari...@gm...> wrote:
>> Please understand that (mainly because of administrative reasons)
>
> What kind of administrative reasons did you mean here?

Company policy. Sakata-san works for the NTT research laboratory, and they are doing some internal research work. Because of company policy, they can only partially disclose information about their work.

> If the public is unable to join the development process, then this HA/RA
> will easily become a commercial product once it has been stabilized by
> community testing.
>
> E.g.: Bizgres, C-Store. They ended up as "dead" open-source projects.
>
> I think we should develop our own HA, which everyone can participate in.
> That way, we can have "HA" of the development process itself.

Yes, it is important to have such discussions, and I think it will be very helpful for everybody. However, I am not sure that the core team can be deeply involved in creating packaging solutions involving Pacemaker or the like, but we will of course provide support and add the core features necessary to build a wonderful HA world.

By the way, since this thread is a good place to discuss it: you should know that the HA team needs some support. There are a couple of issues that need to be fixed quickly, as they impact more than just the NTT team. They are not directly related to the core work, but some limitations currently in the XC code make HA management harder than it should be. I have basically spotted 3 issues that I am going to work on, and hopefully fix next week, that will help the HA work:

1) It is not possible to run a query on a slave Coordinator, which makes it impossible to check whether a slave Coordinator is alive; it is also not possible to run pg_dump on it, even with hot_standby on. What happens is that we try to obtain a transaction ID in an XC-specific code path when the session tries to fetch a new snapshot, but I think we should get that from WAL directly.
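As an aside, until issue (1) is fixed a monitoring agent cannot use a SQL-level health check against a slave Coordinator. A minimal stopgap, sketched below under the assumption that TCP-level reachability is good enough for the agent, is to probe whether the postmaster is still accepting connections on the node's port. This helper is purely illustrative and not part of XC; the host name and port in the usage note are made up.

```python
import socket

def postmaster_is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if something accepts a TCP connection on (host, port).

    This only proves that a process is listening on the port; it cannot
    replace a real SQL-level liveness check, which issue (1) currently
    makes impossible on a slave Coordinator.
    """
    try:
        # create_connection performs the full TCP handshake, then we
        # close the socket immediately via the context manager.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, etc.
        return False
```

A monitoring script could then call, for example, `postmaster_is_listening("coord1-slave.example.com", 5432)` and raise an alert on False, while keeping in mind that a hung backend would still pass this check.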
2) When connecting directly to a master Datanode, we do not get a snapshot from GTM as we should. Hence, a WARNING is sent back to the client and an incorrect local snapshot is used for the operation. I have already blocked write operations on a Datanode when an application connects to it directly.

3) Incorrect snapshot data is taken on slave nodes, whether Datanode or Coordinator. I believe the snapshot needs to be taken directly from WAL data. Please note that issues 1 and 3 are related.

These things are not complicated, but they need to be taken care of.

Regards,
--
Michael Paquier
http://michael.otacoo.com