Re: [Postgres-xc-developers] Proposed steps to remove a node

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

On Mon, Apr 22, 2013 at 2:30 PM, 鈴木 幸市 <ko...@in...> wrote:

> In this case, after ALTER TABLE REDISTRIBUTE is issued against tab1, yes,
> DML to tab2 may target to the datanode being removed.   However, in this
> case, because tab1 has been redistributed, no DML to tab1 will target to
> tab1.
>
> After then when ALTER TABLE REDISTRIBUTE is issued against tab2, all the
> data are redistributed and after then no DML to tab2 will be targetted to
> the datanode to remove.
>
> We have discussed this issue about a year ago if we should exclude and
> redistribute all the tables before removing datanode and concluded that
> this should be DBA's responsibility to exclude the datanode to be removed
> from all the distribution, manually or automatic.
>
> As much as DBA issues ALTER TABLE correctly to all the tables, there's no
> chance to bring XC status into inconsistent status so I don't think we need
> DML blocking.
>
> Any other inputs?
>

I was not taking into account the fact that after tab1 has been
redistributed any DMLs will not target the removed node. I agree we do not
need any DML blocking. I will commit the updated steps in the repository.

> ---
> Koichi Suzuki
>
>
>
> On 2013/04/22, at 16:42, Abbas Butt <abb...@en...> wrote:
>
> Consider this case:
> Assume two tables in a database: tab1 and tab2, assume both are
> distributed by round robin.
> Assume a client C1 is connected to the cluster and is running a loop to
> insert rows to tab1.
> Assume administrator connects to the cluster and issues ALTER TABLE
> REDISTRIBUTE for table tab2. The moment this alter finishes, assume C1
> starts inserting rows to tab2, while administrator issues ALTER TABLE
> REDISTRIBUTE for table tab1, thinking tab2 is clear.
> Since the administrator has to issue ALTER TABLE REDISTRIBUTE table by
> table for all tables in all databases, client C1 would always have a chance
> to insert more rows in the table for which administrator has already issued
> ALTER TABLE REDISTRIBUTE.
>
> For this reason we need DML blocking.
>
> Comments/Suggestions are welcome.
>
> On Mon, Apr 22, 2013 at 11:15 AM, 鈴木 幸市 <ko...@in...> wrote:
>
>> Sorry Abbas,  I have a question/comment on removing a datanode.
>>
>> Before DBA would like to remove a datanode, he/she must run ALTER TABLE
>> to save all the data in the node to others.   Therefore, I'm not sure if we
>> need another means to lock DML.
>>
>> I understand TMP table will be an issue.   If a TMP table is created over
>> multiple nodes, it is under 2PC control which is not allowed so far.   So
>> what we can do is to create TMP table on a particular node only.   If it is
>> the removing datanode, operation will fail and I think it can keep whole
>> cluster in a consistent status.
>>
>> Yes, it is all DBA's responsibility to make sure that no data are left in
>> the datanode.   We have a means to make it sure as you submitted.
>>
>> So again, I'm not yet sure if DML blocking is still needed.
>>
>> Regards;
>>  ---
>> Koichi Suzuki
>>
>>
>>
>> On 2013/04/20, at 5:22, Abbas Butt <abb...@en...> wrote:
>>
>> Hi,
>>
>> Here are the proposed steps to remove a node from the cluster.
>>
>> Removing an existing coordinator
>> ==========================
>>
>> Assume a two coordinator cluster, COORD_1 & COORD_2
>> Suppose we want to remove COORD2 for any reason.
>>
>> 1. Stop the coordinator to be removed.
>>    In our example we need to stop COORD_2.
>>
>> 2. Connect to any of the coordinators except the one to be removed.
>>    In our example assuming COORD_1 is running on port 5432,
>>    the following command would connect to COORD_1
>>
>>    psql postgres -p 5432
>>
>> 3. Drop the coordinator to be removed.
>>    For example to drop coordinator COORD_2
>>
>>    DROP NODE COORD_2;
>>
>> 4. Update the connection information cached in pool.
>>
>>    SELECT pgxc_pool_reload();
>>
>> COORD_2 is now removed from the cluster & COORD_1 would work as if
>> COORD_2 never existed.
>>
>> CAUTION : If COORD_2 is still running and clients are connected to it,
>> any queries issued would create inconsistencies in the cluster.
>>
>> Please note that there is no need to block DDLs because either way DDLs
>> will fail after step 1 and before step 4.
>>
>>
>>
>>
>> Removing an existing datanode
>> =========================
>>
>> Assume a two coordinator cluster, COORD_1 & COORD_2
>> with three datanodes DATA_NODE_1, DATA_NODE_2 & DATA_NODE_3
>>
>> Suppose we want to remove DATA_NODE_3 for any reason.
>>
>> Further assume there is a table named rr_abc distributed in round robin
>> fashion
>> and has rows on all the three datanodes.
>>
>> 1. Block DMLs so that during step 2, while we are shifting data from
>>    the datanode to be removed some one could have an insert process
>>    inserting data in the same.
>>
>>    Here we will need to add a system function similar to
>> pgxc_lock_for_backup.
>>    This is a to do item.
>>
>> 2. Transfer the data from the datanode to be removed to the rest of the
>> datanodes for all the tables in all the databases.
>>    For example to shift data of the table rr_abc to the
>>    rest of the nodes we can use command
>>
>>    ALTER TABLE rr_abc DELETE NODE (DATA_NODE_3);
>>
>> 3. Confirm that there is no data left on the datanode to be removed.
>>    For example to confirm that there is no data left on DATA_NODE_3
>>
>>    select c.pcrelid from pgxc_class c, pgxc_node n where
>>    n.node_name = 'DATA_NODE_3' and n.oid = ANY (c.nodeoids);
>>
>> 4. Stop the datanode server to be removed.
>>    Now any SELECTs that involve the datanode to be removed would start
>> failing
>>    and DMLs have already been blocked, so essentially the cluster would
>> work
>>    only partially.
>>
>> 5. Connect to any of the coordinators.
>>    In our example assuming COORD_1 is running on port 5432,
>>    the following command would connect to COORD_1
>>
>>    psql postgres -p 5432
>>
>> 6. Drop the datanode to be removed.
>>    For example to drop datanode DATA_NODE_3 use command
>>
>>    DROP NODE DATA_NODE_3;
>>
>> 7. Update the connection information cached in pool.
>>
>>    SELECT pgxc_pool_reload();
>>
>> 8. Repeat steps 5,6 & 7 for all the coordinators in the cluster.
>>
>> 9. UN-Block DMLs
>>
>> DATA_NODE_3 is now removed from the cluster.
>>
>>
>> Comments are welcome.
>>
>> --
>> *Abbas*
>>  Architect
>>
>> Ph: 92.334.5100153
>>  Skype ID: gabbasb
>> www.enterprisedb.co <http://www.enterprisedb.com/>m<http://www.enterprisedb.com/>
>> *
>> Follow us on Twitter*
>> @EnterpriseDB
>>
>> Visit EnterpriseDB for tutorials, webinars, whitepapers<http://www.enterprisedb.com/resources-community>and more<http://www.enterprisedb.com/resources-community>
>> ------------------------------------------------------------------------------
>> Precog is a next-generation analytics platform capable of advanced
>> analytics on semi-structured data. The platform includes APIs for building
>> apps and a phenomenal toolset for data science. Developers can use
>> our toolset for easy data analysis & visualization. Get a free account!
>>
>> http://www2.precog.com/precogplatform/slashdotnewsletter_______________________________________________
>> Postgres-xc-developers mailing list
>> Pos...@li...
>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>
>>
>>
>
>
> --
> --
> *Abbas*
>  Architect
>
> Ph: 92.334.5100153
>  Skype ID: gabbasb
> www.enterprisedb.co <http://www.enterprisedb.com/>m<http://www.enterprisedb.com/>
> *
> Follow us on Twitter*
> @EnterpriseDB
>
> Visit EnterpriseDB for tutorials, webinars, whitepapers<http://www.enterprisedb.com/resources-community>and more<http://www.enterprisedb.com/resources-community>
>
>
>

-- 
-- 
*Abbas*
 Architect

Ph: 92.334.5100153
 Skype ID: gabbasb
www.enterprisedb.co
<http://www.enterprisedb.com/>m<http://www.enterprisedb.com/>
*
Follow us on Twitter*
@EnterpriseDB

Visit EnterpriseDB for tutorials, webinars,
whitepapers<http://www.enterprisedb.com/resources-community>and
more<http://www.enterprisedb.com/resources-community>