From: Abbas B. <abb...@en...> - 2013-04-19 20:22:56
|
Hi, Here are the proposed steps to remove a node from the cluster. Removing an existing coordinator ========================== Assume a two coordinator cluster, COORD_1 & COORD_2 Suppose we want to remove COORD2 for any reason. 1. Stop the coordinator to be removed. In our example we need to stop COORD_2. 2. Connect to any of the coordinators except the one to be removed. In our example assuming COORD_1 is running on port 5432, the following command would connect to COORD_1 psql postgres -p 5432 3. Drop the coordinator to be removed. For example to drop coordinator COORD_2 DROP NODE COORD_2; 4. Update the connection information cached in pool. SELECT pgxc_pool_reload(); COORD_2 is now removed from the cluster & COORD_1 would work as if COORD_2 never existed. CAUTION : If COORD_2 is still running and clients are connected to it, any queries issued would create inconsistencies in the cluster. Please note that there is no need to block DDLs because either way DDLs will fail after step 1 and before step 4. Removing an existing datanode ========================= Assume a two coordinator cluster, COORD_1 & COORD_2 with three datanodes DATA_NODE_1, DATA_NODE_2 & DATA_NODE_3 Suppose we want to remove DATA_NODE_3 for any reason. Further assume there is a table named rr_abc distributed in round robin fashion and has rows on all the three datanodes. 1. Block DMLs so that during step 2, while we are shifting data from the datanode to be removed some one could have an insert process inserting data in the same. Here we will need to add a system function similar to pgxc_lock_for_backup. This is a to do item. 2. Transfer the data from the datanode to be removed to the rest of the datanodes for all the tables in all the databases. For example to shift data of the table rr_abc to the rest of the nodes we can use command ALTER TABLE rr_abc DELETE NODE (DATA_NODE_3); 3. Confirm that there is no data left on the datanode to be removed. For example to confirm that there is no data left on DATA_NODE_3 select c.pcrelid from pgxc_class c, pgxc_node n where n.node_name = 'DATA_NODE_3' and n.oid = ANY (c.nodeoids); 4. Stop the datanode server to be removed. Now any SELECTs that involve the datanode to be removed would start failing and DMLs have already been blocked, so essentially the cluster would work only partially. 5. Connect to any of the coordinators. In our example assuming COORD_1 is running on port 5432, the following command would connect to COORD_1 psql postgres -p 5432 6. Drop the datanode to be removed. For example to drop datanode DATA_NODE_3 use command DROP NODE DATA_NODE_3; 7. Update the connection information cached in pool. SELECT pgxc_pool_reload(); 8. Repeat steps 5,6 & 7 for all the coordinators in the cluster. 9. UN-Block DMLs DATA_NODE_3 is now removed from the cluster. Comments are welcome. -- *Abbas* Architect Ph: 92.334.5100153 Skype ID: gabbasb www.enterprisedb.co <http://www.enterprisedb.com/>m<http://www.enterprisedb.com/> * Follow us on Twitter* @EnterpriseDB Visit EnterpriseDB for tutorials, webinars, whitepapers<http://www.enterprisedb.com/resources-community>and more<http://www.enterprisedb.com/resources-community> |