Thread: [Postgres-xc-general] How distribution of tables works in PGXC?

Brought to you by: ahsanhadi, amitdkhan, ashutoshbapat, gabbasb, and 3 others

postgres-xc-general

[Postgres-xc-general] How distribution of tables works in PGXC?

From: amul s. <sul...@ya...> - 2013-08-07 05:56:47

Hello ALL,
 
I need little help, to migrated Postgres
database to the PGXC.
 
While creating database, I want to best
possible way of relations should distribute and/or replicated, Primary aim to
achieve to optimum performance.
 
But we need to consider following
constrain mention in documentation. 
 
1. In distributed tables, UNIQUE
constraints must include the distribution column of the table. 
2. in UNIQUE constraint, distribution
column must be included in PRIMARY KEY. Other restrictions apply to PRIMARY KEY
too.
3. REFERENCE must be the distribution
column while creating Foreign key in relation. In this case, we cannot add
PRIMARY KEY to other column in relation because PRIMARY KEY must be the
distribution column as well.
4. With REFERENCES integrity needs to be
the distribution column, it is Postgres-XC's restriction that we cannot specify
more than one columns with REFERENCES constraint.
5. Not allowed Modifying distribution
columns definition and Modifying distribution column values
 
I am not pretty clear about all those
point, so need your help.
My questions are as follow:
 
Is distribution on table is possible in
following condition ?
        1.
relation(table) has Primary key and Foreign key(REFERENCE column) and both are
different?
        2.
If primary key is on ( col1, col2) and Foreign key is only on (col2) is this
O.K, if PRIMARY key is used as distribution column. 
        3.
If relation has more than one Foreign key (on col1 and col2) 
 
I think it not possible distribute table
in such condition, in this case is this best way to distributed by ROUND ROBIN
or REPLECATE relations?
 
Thoughts?  Comments?
 
Thank you.
 
Regards
Amul Sul

Re: [Postgres-xc-general] How distribution of tables works in PGXC?

From: Koichi S. <koi...@gm...> - 2013-08-07 06:21:08

Hello Amul;


Welcome to Postgres-XC community.


2013/8/7 amul sul <sul...@ya...>

> Hello ALL,****
>
> I need little help, to migrated Postgres database to the PGXC.****
>
> While creating database, I want to best possible way of relations should
> distribute and/or replicated, Primary aim to achieve to optimum performance.
>

Some description on table distribution/replication will be found in PGCon
tutorial material available at our Wiki page
http://postgresxc.wikia.com/wiki/Postgres-XC_Wiki.


> ****
>
> But we need to consider following constrain mention in documentation. ****
>
> 1. In distributed tables, UNIQUE constraints must include the distribution
> column of the table.
>

Yes, because we cannot enforce unique constraints in this case unless we do
not have global index.   It is not practical to visit all the datanodes to
enforce this.


> ****
> 2. in UNIQUE constraint, distribution column must be included in PRIMARY
> KEY. Other restrictions apply to PRIMARY KEY too.
>

This is by the same reason as 1.


> ****
> 3. REFERENCE must be the distribution column while creating Foreign key in
> relation. In this case, we cannot add PRIMARY KEY to other column in
> relation because PRIMARY KEY must be the distribution column as well.
>

This is also by the same reason as 1.  We may be able to support this
without such global object management if referred column is a primary key
or set of referred column includes a primary key.


> ****
> 4. With REFERENCES integrity needs to be the distribution column, it is
> Postgres-XC's restriction that we cannot specify more than one columns with
> REFERENCES constraint.
>

See above.


>
****
> 5. Not allowed Modifying distribution columns definition and Modifying
> distribution column values
>

When distribution column value is altered, we need to move this row to
somewhere else.  It is not simple so far to track this to handle following
update statement correctly.

****
>
> I am not pretty clear about all those point, so need your help.****
> My questions are as follow:****
>
> Is distribution on table is possible in following condition ?****
>         1. relation(table) has Primary key and Foreign key(REFERENCE
> column) and both are different?****
>         2. If primary key is on ( col1, col2) and Foreign key is only on
> (col2) is this O.K, if PRIMARY key is used as distribution column. ****
>         3. If relation has more than one Foreign key (on col1 and col2) **
> **
>
> I think it not possible distribute table in such condition, in this case
> is this best way to distributed by ROUND ROBIN or REPLECATE relations?
>

Round robin is not a good idea.   Make referred table "replicated" will fix
the above problems.

Hope they help.

Regards;
---
Koichi Suzuki



> ****
>
> Thoughts?  Comments?****
>
> Thank you.****
>
> Regards****
> Amul Sul****
>
>
> ------------------------------------------------------------------------------
> Get 100% visibility into Java/.NET code with AppDynamics Lite!
> It's a free troubleshooting tool designed for production.
> Get down to code-level detail for bottlenecks, with <2% overhead.
> Download for free and get started troubleshooting in minutes.
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
> _______________________________________________
> Postgres-xc-general mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>
>

Re: [Postgres-xc-general] How distribution of tables works in PGXC?

From: Ashutosh B. <ash...@en...> - 2013-08-07 06:25:46

Hi Amul,
Please check out this video http://www.youtube.com/watch?v=g_9CYat8nkY. The
video explains various impacts the distribution/replication has in XC.


On Wed, Aug 7, 2013 at 11:50 AM, Koichi Suzuki <koi...@gm...>wrote:

> Hello Amul;
>
>
> Welcome to Postgres-XC community.
>
>
> 2013/8/7 amul sul <sul...@ya...>
>
>> Hello ALL,****
>>
>> I need little help, to migrated Postgres database to the PGXC.****
>>
>> While creating database, I want to best possible way of relations should
>> distribute and/or replicated, Primary aim to achieve to optimum performance.
>>
>
> Some description on table distribution/replication will be found in PGCon
> tutorial material available at our Wiki page
> http://postgresxc.wikia.com/wiki/Postgres-XC_Wiki.
>
>
>> ****
>>
>> But we need to consider following constrain mention in documentation. ***
>> *
>>
>> 1. In distributed tables, UNIQUE constraints must include the
>> distribution column of the table.
>>
>
> Yes, because we cannot enforce unique constraints in this case unless we
> do not have global index.   It is not practical to visit all the datanodes
> to enforce this.
>
>
>>  ****
>> 2. in UNIQUE constraint, distribution column must be included in PRIMARY
>> KEY. Other restrictions apply to PRIMARY KEY too.
>>
>
> This is by the same reason as 1.
>
>
>> ****
>> 3. REFERENCE must be the distribution column while creating Foreign key
>> in relation. In this case, we cannot add PRIMARY KEY to other column in
>> relation because PRIMARY KEY must be the distribution column as well.
>>
>
> This is also by the same reason as 1.  We may be able to support this
> without such global object management if referred column is a primary key
> or set of referred column includes a primary key.
>
>
>> ****
>> 4. With REFERENCES integrity needs to be the distribution column, it is
>> Postgres-XC's restriction that we cannot specify more than one columns with
>> REFERENCES constraint.
>>
>
> See above.
>
>
>>
> ****
>> 5. Not allowed Modifying distribution columns definition and Modifying
>> distribution column values
>>
>
> When distribution column value is altered, we need to move this row to
> somewhere else.  It is not simple so far to track this to handle following
> update statement correctly.
>
> ****
>>
>> I am not pretty clear about all those point, so need your help.****
>> My questions are as follow:****
>>
>> Is distribution on table is possible in following condition ?****
>>         1. relation(table) has Primary key and Foreign key(REFERENCE
>> column) and both are different?****
>>         2. If primary key is on ( col1, col2) and Foreign key is only on
>> (col2) is this O.K, if PRIMARY key is used as distribution column. ****
>>         3. If relation has more than one Foreign key (on col1 and col2) *
>> ***
>>
>> I think it not possible distribute table in such condition, in this case
>> is this best way to distributed by ROUND ROBIN or REPLECATE relations?
>>
>
> Round robin is not a good idea.   Make referred table "replicated" will
> fix the above problems.
>
> Hope they help.
>
> Regards;
> ---
> Koichi Suzuki
>
>
>
>> ****
>>
>> Thoughts?  Comments?****
>>
>> Thank you.****
>>
>> Regards****
>>  Amul Sul****
>>
>>
>> ------------------------------------------------------------------------------
>> Get 100% visibility into Java/.NET code with AppDynamics Lite!
>> It's a free troubleshooting tool designed for production.
>> Get down to code-level detail for bottlenecks, with <2% overhead.
>> Download for free and get started troubleshooting in minutes.
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Postgres-xc-general mailing list
>> Pos...@li...
>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>>
>>
>
>
> ------------------------------------------------------------------------------
> Get 100% visibility into Java/.NET code with AppDynamics Lite!
> It's a free troubleshooting tool designed for production.
> Get down to code-level detail for bottlenecks, with <2% overhead.
> Download for free and get started troubleshooting in minutes.
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
> _______________________________________________
> Postgres-xc-general mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>
>


-- 
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company

Re: [Postgres-xc-general] How distribution of tables works in PGXC?

From: Michael P. <mic...@gm...> - 2013-08-07 06:39:01

Your questions are already answered, here are just some additions.

On Wed, Aug 7, 2013 at 2:56 PM, amul sul <sul...@ya...> wrote:
> 5. Not allowed Modifying distribution columns definition
This is true in 1.0, not in 1.1 and above. It is possible to change
the distribution type of a table with ALTER TABLE in the upcoming 1.1.
Syntax is the same as CREATE TABLE.
http://postgres-xc.sourceforge.net/docs/1_1_beta/sql-altertable.html

> 5bis. Modifying distribution column values
This needs tuple relocation... Don't count on that.

>
> I am not pretty clear about all those point, so need your help.
> My questions are as follow:
>
> Is distribution on table is possible in following condition ?
>         1. relation(table) has Primary key and Foreign key(REFERENCE column)
> and both are different?
In this case, you can distribute the child table on if column and
table have the same data type, and their data is located on the same
nodes such as they perfectly map. If this is not the case, you need to
replicate the parent table referenced.

>         2. If primary key is on ( col1, col2) and Foreign key is only on
> (col2) is this O.K, if PRIMARY key is used as distribution column.
If parent is distributed with col2, you could be able to enforce the
constraint evaluation to Datanodes if child is also distributed with
col2, and that the data of parent and child is located on the same
datanodes.
In other cases, replicate the parent.

>         3. If relation has more than one Foreign key (on col1 and col2)
As long as the parent is distributed with one column included in the
foreign key, it is fine to distribute the child table with the same
column. Child and parent would need once again to have their data
located on the same nodes.

In case 2&3, I am wondering if XC algorithm is smart enough to allow
table creation though in those cases... Comments on that?

> I think it not possible distribute table in such condition, in this case is
> this best way to distributed by ROUND ROBIN or REPLECATE relations?
Avoid ROUND ROBIN if you want to have control on the data distributed.
-- 
Michael

Re: [Postgres-xc-general] How distribution of tables works in PGXC?

From: amul s. <sul...@ya...> - 2013-08-08 00:51:46

Dear Michael-sir

Thanks to you and Suzuki-san, Bapat-sir for their quick reply.

Here just want to clear Parent and child relation, 
Do you mean, child table which contain Foreign key and Parent table is which going to be referred. 
The following eg. PRODUCTS table will be Parent and  ORDERS table is child.

I hope, this is right,

CREATE TABLE products ( 
product_no integer PRIMARY KEY, 
name text, 
price numeric
);

CREATE TABLE orders ( 
order_id integer,
product_no integer REFERENCES products (product_no), 
quantity integer
) 

As can i see, more general way is the referred table(here products) should replicated only when we unable distribute it (using its product_no) due to other limitations, will definitely follow in such case.  

My simple understanding is, when we going to add any row in ORDERS table, we are going to look respective entry is present in PRODUCTS table.
so we need to find it using it unique key in global distribution or locally.

That's  why here we needed PRODUCTS table should distributed on its PRODUCT_NO column.
and no worry about ORDERS table distribution.

Is this right?

But document at http://postgres-xc.sourceforge.net/docs/1_0_3/ddl-constraints.html

------------------------------------------------------------------------------
CREATE TABLE orders ( order_id integer, product_no integer REFERENCES products (product_no), quantity integer
) DISTRIBUTE BY HASH(product_no);
Note: The following description applies only to Postgres-XC
Please note that column with REFERENCE must be the distribution column. In this case, we cannot add PRIMARY KEY to order_id because PRIMARY KEY must be the distribution column as well. This limitation is introduced because constraints are enforced only locally in each Datanode, which will be resolved in the future.
------------------------------------------------------------------------------

This focusing to distribute ORDERS on its reference column (here product_no).

Does this required? 
because in my case if my query is inserting row in ORDERS table, will always try to search refereed table i.e PRODUCTS.
so why should distribute ORDERS table with REFERENCE column.

Lets say there will be third table which referring ORDER on its order_id, in this case its good idea to distribute it on order_id.
Finally picture will look like 

PRODUCTS --> distribute using product_no
ORDERS -->  distribute using order_id
THIRD_TABLE --> depends on requirement.

Can i do this way, do i missing something? 

>>Please check out this video http://www.youtube.com/watch?v=g_9CYat8nkY (Preview) . The video explains various impacts the distribution/replication has in XC.
Very informative presentation. Thank you :)

>>>Welcome to Postgres-XC community.
Thank you Suzuki-san.

>>Hi Amul,

Thanks and Regards,
Amul Sul

Re: [Postgres-xc-general] How distribution of tables works in PGXC?

From: Koichi S. <koi...@gm...> - 2013-08-08 02:23:43

Hello;


2013/8/8 amul sul <sul...@ya...>

> Dear Michael-sir
>
> Thanks to you and Suzuki-san, Bapat-sir for their quick reply.
>
> Here just want to clear Parent and child relation,
> Do you mean, child table which contain Foreign key and Parent table is
> which going to be referred.
> The following eg. PRODUCTS table will be Parent and  ORDERS table is child.
>
> I hope, this is right,
>
> CREATE TABLE products (
> product_no integer PRIMARY KEY,
> name text,
> price numeric
> );
>

This is correct.   The table will be distributed by hash using the primary
key.


>
> CREATE TABLE orders (
> order_id integer,
> product_no integer REFERENCES products (product_no),
> quantity integer
> )
>

This is not supported yet.   Because product table row referred from orders
table can be located  at other datanode.

To make this work, you can distribute product table as replicated.   This
works.   Because product table is more stable than order table, impact to
the whole throughput may not be significant.

Regards;
---
Koichi Suzuki


> As can i see, more general way is the referred table(here products) should
> replicated only when we unable distribute it (using its product_no) due to
> other limitations, will definitely follow in such case.
>
> My simple understanding is, when we going to add any row in ORDERS table,
> we are going to look respective entry is present in PRODUCTS table.
> so we need to find it using it unique key in global distribution or
> locally.
>
> That's  why here we needed PRODUCTS table should distributed on its
> PRODUCT_NO column.
> and no worry about ORDERS table distribution.
>
> Is this right?
>
> But document at
> http://postgres-xc.sourceforge.net/docs/1_0_3/ddl-constraints.html
>
>
> ------------------------------------------------------------------------------
> CREATE TABLE orders ( order_id integer, product_no integer REFERENCES
> products (product_no), quantity integer
> ) DISTRIBUTE BY HASH(product_no);
> Note: The following description applies only to Postgres-XC
> Please note that column with REFERENCE must be the distribution column. In
> this case, we cannot add PRIMARY KEY to order_id because PRIMARY KEY must
> be the distribution column as well. This limitation is introduced because
> constraints are enforced only locally in each Datanode, which will be
> resolved in the future.
>
> ------------------------------------------------------------------------------
>
> This focusing to distribute ORDERS on its reference column
> (here product_no).
>
> Does this required?
> because in my case if my query is inserting row in ORDERS table, will
> always try to search refereed table i.e PRODUCTS.
> so why should distribute ORDERS table with REFERENCE column.
>
> Lets say there will be third table which referring ORDER on its order_id,
> in this case its good idea to distribute it on order_id.
> Finally picture will look like
>
> PRODUCTS --> distribute using product_no
> ORDERS -->  distribute using order_id
> THIRD_TABLE --> depends on requirement.
>
> Can i do this way, do i missing something?
>
> >>Please check out this video http://www.youtube.com/watch?v=g_9CYat8nkY (Preview) .
> The video explains various impacts the distribution/replication has in XC.
> Very informative presentation. Thank you :)
>
> >>>Welcome to Postgres-XC community.
> Thank you Suzuki-san.
>
> >>Hi Amul,
>
> Thanks and Regards,
> Amul Sul
>
>
> ------------------------------------------------------------------------------
> Get 100% visibility into Java/.NET code with AppDynamics Lite!
> It's a free troubleshooting tool designed for production.
> Get down to code-level detail for bottlenecks, with <2% overhead.
> Download for free and get started troubleshooting in minutes.
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
> _______________________________________________
> Postgres-xc-general mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>

Re: [Postgres-xc-general] How distribution of tables works in PGXC?

From: amul s. <sul...@ya...> - 2013-08-08 04:12:07

Hello Suzuki-san,

>This is not supported yet.   Because product table row referred from orders table can be located  at other datanode.

>To make this work, you can distribute product table as replicated.   This works.   Because product table is more stable than order table, >impact to the whole throughput may not be significant.

But in my whole database contain, multiple table with reference key.
Do i need to replicated all referred table? 
is this good scenario for achieving good performance compare to Single Postgres instance? 

Thanks and Regards,
Amul Sul