From: Michael P. <mic...@gm...> - 2012-07-30 13:21:16
On Mon, Jul 30, 2012 at 7:03 PM, Vladimir Stavrinov <vst...@gm...> wrote:

> On Fri, Jul 20, 2012 at 06:18:22PM +0900, Michael Paquier wrote:
>
> > Like PostgreSQL, you can attach a slave node to a Datanode and then
> > perform a failover on it. After the master node fails for one reason
> > or another, you will need to promote the slave waiting behind.
> > Something like "pg_ctl promote -D $DN_FOLDER" is enough. This is for
> > the Datanode side. Then what you need to do is update the node
> > catalogs on each Coordinator to allow them to redirect to the newly
> > promoted node. Let's suppose that the node that failed was called
> > datanodeN (master and slave need the same node name). In order to do
> > that, issue:
> >
> >     ALTER NODE datanodeN WITH (HOST = '$NEW_IP', PORT = $NEW_PORT);
> >     SELECT pgxc_pool_reload();
> >
> > Do that on each Coordinator; the promoted slave will then be visible
> > to each Coordinator and will be part of the cluster.
>
> If you don't do this every day, there are chances you will make an
> error. How much time does it take in this case? As I wrote above, this
> is not XC's own HA feature, but rather external cluster infrastructure.
> As such, it is better to use the above-mentioned tandem of DRBD +
> Corosync + Pacemaker; at least it gets failover automated.

I do not mean that such operations should be performed manually; it was just to illustrate how to do it. Like PostgreSQL, XC provides the user with the necessary interface to perform failover and HA operations easily and externally. The architect is then free to use whatever HA utilities he wishes for any HA operation. In your case, a layer based on Pacemaker would work. However, XC needs to be able to adapt to a maximum number of HA applications and monitoring utilities, and the current interface fills this goal.
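To make the failover steps quoted above concrete: once "pg_ctl promote" has been run on the slave's host, an external HA tool would run something like the following on each Coordinator (a sketch only; the node name, host, and port are illustrative):

```sql
-- Run on EACH Coordinator (e.g. via psql) after the slave behind
-- datanode1 has been promoted with: pg_ctl promote -D $DN_FOLDER

-- Point the Coordinator's node catalog at the promoted node
-- (host and port here are illustrative values).
ALTER NODE datanode1 WITH (HOST = '192.168.1.51', PORT = 15432);

-- Refresh the connection pool so the change takes effect.
SELECT pgxc_pool_reload();

-- Optionally verify the catalog now points at the new master.
SELECT node_name, node_host, node_port FROM pgxc_node;
```

A tool like Pacemaker would typically wrap these two statements in its failover resource agent, looping over all Coordinators.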
> > In 1.0 you can do that with this kind of thing (say you want to
> > remove data from datanodeN):
> >
> >     CREATE TABLE new_table TO NODE (datanode1, ..., datanode(N-1),
> >         datanode(N+1), ..., datanodeP) AS SELECT * FROM old_table;
> >     DROP TABLE old_table;
> >     ALTER TABLE new_table RENAME TO old_table;
> >
> > Once you are sure that the Datanode you want to remove holds no
> > unique data (replicated data does not matter), perform a DROP NODE on
> > each Coordinator, then pgxc_pool_reload(), and the node will be
> > removed correctly.
>
> Looks fine! But what if there are thousands of such tables to be
> relocated (a real case)? And as I see it, to do the opposite operation,
> i.e. adding a Datanode, we need to use this CREATE/DROP/RENAME TABLE
> technique again? It doesn't look like HA.

In 1.0, yes. And this is only necessary for hash/modulo/round-robin tables.

> > Please note that I am working on a patch able to do such stuff
> > automatically... Will be committed soon.
>
> That is hopeful news.

The patch is already committed in the master branch, so you can do it with a simple command.

> > > DISTRIBUTE BY REPLICATION) XC itself has neither HA nor LB (at
> > > least for writes) capabilities?
> >
> > Basically it has both; I know some guys who are already building an
> > HA/LB solution based on that...
>
> What do you mean?

I mean:
- HA: XC provides the necessary interface to allow external tools to perform operations, just as for PostgreSQL.
- LB: There is automatic load balancing between Datanodes and Coordinators by design. Load balancing at the Coordinator level has to be managed by an external tool.
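For reference, the relocation recipe quoted earlier in this message might look like this in full, for a table distributed over four Datanodes where datanode3 is being retired (a sketch; the table and node names are illustrative):

```sql
-- Rebuild the table on the remaining nodes and swap it in place
-- (needed only for hash/modulo/round-robin tables in XC 1.0).
CREATE TABLE orders_tmp TO NODE (datanode1, datanode2, datanode4)
    AS SELECT * FROM orders;
DROP TABLE orders;
ALTER TABLE orders_tmp RENAME TO orders;

-- Once datanode3 holds no unique data, remove it from the cluster.
-- Run these two statements on EACH Coordinator.
DROP NODE datanode3;
SELECT pgxc_pool_reload();
```

With thousands of tables this loop would have to be scripted per table, which is exactly the pain point the patch mentioned above addresses.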
> As we saw above, HA is external and LB is a question of either read or
> write. Yes, we have only one variant of such a solution: when all
> tables are replicated, we have "internal" HA and "read" LB. But such a
> solution is implemented in many other technologies apart from XC. As
> far as I understand, the main feature of XC is what is named
> "write-scalable, synchronous, symmetric multi-master".
>
> OK, I still hope I made the right decision in choosing XC as a cluster
> solution. But now, summarizing the problems discussed, I have a further
> question: why did you implement distribution types at the table level?

In a cluster, what is important for good performance is to limit the amount of data exchanged between nodes. In order to accomplish that, you need control over table joins. In XC, maximizing performance simply means shipping as many joins as possible to the remote nodes, reducing the amount of data exchanged between nodes by that much. There are multiple ways to control data joins, like caching the data at the Coordinator level for reuse, which is what pgpool-II does. But in that case, how do you manage prepared plans or write operations? This is hardly compatible with multi-master. Hence, the control is given at the table level, which is why distribution is managed like this.

> It is very complex to use and is not transparent. For example, when you
> need to install a third-party application, you need to revise all of
> its SQL scripts to add DISTRIBUTE BY clauses if you don't want the
> defaults. What do you think about implementing different Datanode types
> (instead of table types), i.e. "distributed" and "replicated" nodes?

Well, the only extension that XC adds is that, and it allows you to get read and/or write scalability in a multi-master symmetric cluster, so that's a good deal!

-- 
Michael Paquier
http://michael.otacoo.com