Thread: [Postgres-xc-general] modifying values at the datanode

Brought to you by: ahsanhadi, amitdkhan, ashutoshbapat, gabbasb, and 3 others

postgres-xc-general

[Postgres-xc-general] modifying values at the datanode

From: Sandeep G. <gup...@gm...> - 2013-10-04 17:58:31

Hi,

 I understand that the datanodes are read only and that updates/insert can
happen at co-ordinator. Also, it does not allow modification of column over
which the records are distributed.

However, in case I know what I am doing, it there anyway possible to modify
the values directly at datanodes. The modifications are not over column
over which distribution happens.

Thanks.
Sandeep

Re: [Postgres-xc-general] modifying values at the datanode

From: Michael P. <mic...@gm...> - 2013-10-05 10:27:21

On Sat, Oct 5, 2013 at 2:58 AM, Sandeep Gupta <gup...@gm...> wrote:
>  I understand that the datanodes are read only and that updates/insert can
> happen at coordinator.
You got it.

> Also, it does not allow modification of column over which the records are distributed.
Hum no, 1.1 allows ALTER TABLE that you can use to change the
distribution type of a table.

> However, in case I know what I am doing, it there anyway possible to modify
> the values directly at datanodes.
> The modifications are not over column over which distribution happens.
If you mean by connecting directly to the Datanodes, no. You would
break data consistency if table is replicated by the way by doing
that. Let the Coordinator planner do the job and choose the remote
nodes for you.

There have been discussion to merge Coordinators and Datanodes
together though. This would allow what you say, with a simpler cluster
design.
-- 
Michael

Re: [Postgres-xc-general] modifying values at the datanode

From: Sandeep G. <gup...@gm...> - 2013-10-05 14:14:26

Thanks Michael. I understand. The only issue is that we have an update
query as

update T set T.a = -1 from A where A.x = T.x


Both A and T and distributed by x column. The problem is that coordinator
first does the join and then
calls update several times at each datanode. This is turning out to be too
slow. Would have
been better if the entire query was shipped to the datanodes.

Thanks.
Sandeep



On Sat, Oct 5, 2013 at 6:27 AM, Michael Paquier
<mic...@gm...>wrote:

> On Sat, Oct 5, 2013 at 2:58 AM, Sandeep Gupta <gup...@gm...>
> wrote:
> >  I understand that the datanodes are read only and that updates/insert
> can
> > happen at coordinator.
> You got it.
>
> > Also, it does not allow modification of column over which the records
> are distributed.
> Hum no, 1.1 allows ALTER TABLE that you can use to change the
> distribution type of a table.
>
> > However, in case I know what I am doing, it there anyway possible to
> modify
> > the values directly at datanodes.
> > The modifications are not over column over which distribution happens.
> If you mean by connecting directly to the Datanodes, no. You would
> break data consistency if table is replicated by the way by doing
> that. Let the Coordinator planner do the job and choose the remote
> nodes for you.
>
> There have been discussion to merge Coordinators and Datanodes
> together though. This would allow what you say, with a simpler cluster
> design.
> --
> Michael
>

Re: [Postgres-xc-general] modifying values at the datanode

From: Michael P. <mic...@gm...> - 2013-10-05 14:26:59

On Sat, Oct 5, 2013 at 11:14 PM, Sandeep Gupta <gup...@gm...> wrote:
> Thanks Michael. I understand. The only issue is that we have an update
> query as
>
> update T set T.a = -1 from A where A.x = T.x
>
>
> Both A and T and distributed by x column. The problem is that coordinator
> first does the join and then
> calls update several times at each datanode. This is turning out to be too
> slow. Would have
> been better if the entire query was shipped to the datanodes.
Hum?! Logically, I would imagine that if A and T are distributed by x
this WHERE clause should be pushed down as the SET clause is a
constant. However perhaps UPDATE FROM does not have an explicit
support... Could you provide the version number and an EXPLAIN VERBOSE
output?

What if you put the where join in a subquery or a WITH clause? Like
that for example:
update T set T.a = -1 where A.x = (select A.x from A,T where A.x = T.x);
-- 
Michael

Re: [Postgres-xc-general] modifying values at the datanode

From: Sandeep G. <gup...@gm...> - 2013-10-05 14:48:09

Hi Michael,

 Sure. I am using pgxc v.1.
For the query explain verbose update person set intervened = 84 from d1_sum
WHERE person.pid=d1_sum.pid;


Query Plan:

 Update on public.person  (cost=0.00..0.00 rows=1000 width=24)
   Node/s: datanode1
   Node expr: person.pid
   Remote query: UPDATE ONLY public.person SET intervened = $3 WHERE
((person.ctid = $4) AND (person.xc_node_id = $5))
   ->  Data Node Scan on "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000
width=24)
         Output: person.pid, person.persons, 84, person.pid, person.ctid,
person.xc_node_id, d1_sum.ctid
         Node/s: datanode1
         Remote query: SELECT l.a_1, l.a_2, l.a_3, l.a_4, r.a_1 FROM
((SELECT person.pid, person.persons, person.ctid, person.xc_node_i
d FROM ONLY public.person WHERE true) l(a_1, a_2, a_3, a_4) JOIN (SELECT
d1_sum.ctid, d1_sum.pid FROM ONLY public.d1_sum WHERE true) r(
a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2)
(8 rows)

For the second style

explain verbose update person set intervened = 84 where person.pid =
(select d1_sum.pid from d1_sum,person WHERE person.pid=d1_sum.pid);

 Update on public.person  (cost=0.00..0.00 rows=1000 width=18)
   Node/s: datanode1
   Node expr: public.person.pid
   Remote query: UPDATE ONLY public.person SET intervened = $3 WHERE
((person.ctid = $4) AND (person.xc_node_id = $5))
   InitPlan 1 (returns $0)
     ->  Data Node Scan on "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00
rows=1000 width=4)
           Output: d1_sum.pid
           Node/s: datanode1
           Remote query: SELECT l.a_1 FROM ((SELECT d1_sum.pid FROM ONLY
public.d1_sum WHERE true) l(a_1) JOIN (SELECT person.pid FROM
ONLY public.person WHERE true) r(a_1) ON (true)) WHERE (l.a_1 = r.a_1)
   ->  Data Node Scan on person "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00
rows=1000 width=18)
         Output: public.person.pid, public.person.persons, 84,
public.person.pid, public.person.ctid, public.person.xc_node_id
         Node/s: datanode1
         Remote query: SELECT pid, persons, ctid, xc_node_id FROM ONLY
public.person WHERE true
         Coordinator quals: (public.person.pid = $0)


In both the scenarios the planner breaks it into two parts: update and
join. The results of join is pulled up at the coordinator and then shipped
one by one for update.

Thanks for taking a look.

-Sandeep





On Sat, Oct 5, 2013 at 10:26 AM, Michael Paquier
<mic...@gm...>wrote:

> On Sat, Oct 5, 2013 at 11:14 PM, Sandeep Gupta <gup...@gm...>
> wrote:
> > Thanks Michael. I understand. The only issue is that we have an update
> > query as
> >
> > update T set T.a = -1 from A where A.x = T.x
> >
> >
> > Both A and T and distributed by x column. The problem is that coordinator
> > first does the join and then
> > calls update several times at each datanode. This is turning out to be
> too
> > slow. Would have
> > been better if the entire query was shipped to the datanodes.
> Hum?! Logically, I would imagine that if A and T are distributed by x
> this WHERE clause should be pushed down as the SET clause is a
> constant. However perhaps UPDATE FROM does not have an explicit
> support... Could you provide the version number and an EXPLAIN VERBOSE
> output?
>
> What if you put the where join in a subquery or a WITH clause? Like
> that for example:
> update T set T.a = -1 where A.x = (select A.x from A,T where A.x = T.x);
> --
> Michael
>

Re: [Postgres-xc-general] modifying values at the datanode

From: Michael P. <mic...@gm...> - 2013-10-05 21:49:59

On Sat, Oct 5, 2013 at 11:48 PM, Sandeep Gupta <gup...@gm...> wrote:
>
> Hi Michael,
>
>  Sure. I am using pgxc v.1.
1.0 or 1.1?

> For the query explain verbose update person set intervened = 84 from d1_sum
> WHERE person.pid=d1_sum.pid;
>
>
> Query Plan:
>
>  Update on public.person  (cost=0.00..0.00 rows=1000 width=24)
>    Node/s: datanode1
>    Node expr: person.pid
>    Remote query: UPDATE ONLY public.person SET intervened = $3 WHERE
> ((person.ctid = $4) AND (person.xc_node_id = $5))
>    ->  Data Node Scan on "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000
> width=24)
>          Output: person.pid, person.persons, 84, person.pid, person.ctid,
> person.xc_node_id, d1_sum.ctid
>          Node/s: datanode1
>          Remote query: SELECT l.a_1, l.a_2, l.a_3, l.a_4, r.a_1 FROM
> ((SELECT person.pid, person.persons, person.ctid, person.xc_node_i
> d FROM ONLY public.person WHERE true) l(a_1, a_2, a_3, a_4) JOIN (SELECT
> d1_sum.ctid, d1_sum.pid FROM ONLY public.d1_sum WHERE true) r(
> a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2)
> (8 rows)
>
> For the second style
>
> explain verbose update person set intervened = 84 where person.pid = (select
> d1_sum.pid from d1_sum,person WHERE person.pid=d1_sum.pid);
>
>  Update on public.person  (cost=0.00..0.00 rows=1000 width=18)
>    Node/s: datanode1
>    Node expr: public.person.pid
>    Remote query: UPDATE ONLY public.person SET intervened = $3 WHERE
> ((person.ctid = $4) AND (person.xc_node_id = $5))
>    InitPlan 1 (returns $0)
>      ->  Data Node Scan on "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00
> rows=1000 width=4)
>            Output: d1_sum.pid
>            Node/s: datanode1
>            Remote query: SELECT l.a_1 FROM ((SELECT d1_sum.pid FROM ONLY
> public.d1_sum WHERE true) l(a_1) JOIN (SELECT person.pid FROM
> ONLY public.person WHERE true) r(a_1) ON (true)) WHERE (l.a_1 = r.a_1)
>    ->  Data Node Scan on person "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00
> rows=1000 width=18)
>          Output: public.person.pid, public.person.persons, 84,
> public.person.pid, public.person.ctid, public.person.xc_node_id
>          Node/s: datanode1
>          Remote query: SELECT pid, persons, ctid, xc_node_id FROM ONLY
> public.person WHERE true
>          Coordinator quals: (public.person.pid = $0)
>
>
> In both the scenarios the planner breaks it into two parts: update and join.
> The results of join is pulled up at the coordinator and then shipped one by
> one for update.
Indeed you are right. It seems that FROM clause support in UPDATE is
limited? Others, comments on that? I thought that there has been some
work done in the area.
-- 
Michael

Re: [Postgres-xc-general] modifying values at the datanode

From: Ashutosh B. <ash...@en...> - 2013-10-07 03:52:39

On Fri, Oct 4, 2013 at 11:28 PM, Sandeep Gupta <gup...@gm...>wrote:

> Hi,
>
>  I understand that the datanodes are read only and that updates/insert can
> happen at co-ordinator. Also, it does not allow modification of column over
> which the records are distributed.
>

Users should not connect to the datanodes directly, that might break
consistency. All the accesses to the data should happen through the
coordinator.


>
> However, in case I know what I am doing, it there anyway possible to
> modify the values directly at datanodes. The modifications are not over
> column over which distribution happens.
>
> Thanks.
> Sandeep
>
>
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
> from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
> _______________________________________________
> Postgres-xc-general mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [Postgres-xc-general] modifying values at the datanode

From: Ashutosh B. <ash...@en...> - 2013-10-07 03:54:04

On Sat, Oct 5, 2013 at 7:44 PM, Sandeep Gupta <gup...@gm...>wrote:

> Thanks Michael. I understand. The only issue is that we have an update
> query as
>
> update T set T.a = -1 from A where A.x = T.x
>
>
> Both A and T and distributed by x column. The problem is that coordinator
> first does the join and then
> calls update several times at each datanode. This is turning out to be too
> slow. Would have
> been better if the entire query was shipped to the datanodes.
>
>
Right now there is no way to ship a DML with more than one relation
involved there. But that's something, I have been thinking about. If you
have developer resources and can produce a patch. I can help.


> Thanks.
> Sandeep
>
>
>
> On Sat, Oct 5, 2013 at 6:27 AM, Michael Paquier <mic...@gm...
> > wrote:
>
>> On Sat, Oct 5, 2013 at 2:58 AM, Sandeep Gupta <gup...@gm...>
>> wrote:
>> >  I understand that the datanodes are read only and that updates/insert
>> can
>> > happen at coordinator.
>> You got it.
>>
>> > Also, it does not allow modification of column over which the records
>> are distributed.
>> Hum no, 1.1 allows ALTER TABLE that you can use to change the
>> distribution type of a table.
>>
>> > However, in case I know what I am doing, it there anyway possible to
>> modify
>> > the values directly at datanodes.
>> > The modifications are not over column over which distribution happens.
>> If you mean by connecting directly to the Datanodes, no. You would
>> break data consistency if table is replicated by the way by doing
>> that. Let the Coordinator planner do the job and choose the remote
>> nodes for you.
>>
>> There have been discussion to merge Coordinators and Datanodes
>> together though. This would allow what you say, with a simpler cluster
>> design.
>> --
>> Michael
>>
>
>
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
> from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
> _______________________________________________
> Postgres-xc-general mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [Postgres-xc-general] modifying values at the datanode

From: Sandeep G. <gup...@gm...> - 2013-10-07 15:40:35

Hi Ashutosh,

 Thanks for the note.  I cannot commit right away. However, whenever you
have some time can you mention relevant portions in the codebase where
changes have to be made. I have a general understanding of the execution
engine. I will take a look see if it feasible for me.

Thanks.
Sandeep



On Sun, Oct 6, 2013 at 11:53 PM, Ashutosh Bapat <
ash...@en...> wrote:

>
>
>
> On Sat, Oct 5, 2013 at 7:44 PM, Sandeep Gupta <gup...@gm...>wrote:
>
>> Thanks Michael. I understand. The only issue is that we have an update
>> query as
>>
>> update T set T.a = -1 from A where A.x = T.x
>>
>>
>> Both A and T and distributed by x column. The problem is that coordinator
>> first does the join and then
>> calls update several times at each datanode. This is turning out to be
>> too slow. Would have
>> been better if the entire query was shipped to the datanodes.
>>
>>
> Right now there is no way to ship a DML with more than one relation
> involved there. But that's something, I have been thinking about. If you
> have developer resources and can produce a patch. I can help.
>
>
>> Thanks.
>> Sandeep
>>
>>
>>
>> On Sat, Oct 5, 2013 at 6:27 AM, Michael Paquier <
>> mic...@gm...> wrote:
>>
>>> On Sat, Oct 5, 2013 at 2:58 AM, Sandeep Gupta <gup...@gm...>
>>> wrote:
>>> >  I understand that the datanodes are read only and that updates/insert
>>> can
>>> > happen at coordinator.
>>> You got it.
>>>
>>> > Also, it does not allow modification of column over which the records
>>> are distributed.
>>> Hum no, 1.1 allows ALTER TABLE that you can use to change the
>>> distribution type of a table.
>>>
>>> > However, in case I know what I am doing, it there anyway possible to
>>> modify
>>> > the values directly at datanodes.
>>> > The modifications are not over column over which distribution happens.
>>> If you mean by connecting directly to the Datanodes, no. You would
>>> break data consistency if table is replicated by the way by doing
>>> that. Let the Coordinator planner do the job and choose the remote
>>> nodes for you.
>>>
>>> There have been discussion to merge Coordinators and Datanodes
>>> together though. This would allow what you say, with a simpler cluster
>>> design.
>>> --
>>> Michael
>>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> October Webinars: Code for Performance
>> Free Intel webinars can help you accelerate application performance.
>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
>> from
>> the latest Intel processors and coprocessors. See abstracts and register >
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Postgres-xc-general mailing list
>> Pos...@li...
>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>>
>>
>
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>

Re: [Postgres-xc-general] modifying values at the datanode

From: Ashutosh B. <ash...@en...> - 2013-10-08 04:04:25

The changes have to be in pgxc_make_modifytable() to construct a shippable
DML out of the scan plan and the modification plan. Changes will be needed
in ExecUpdate/Delete/Insert for execution of this shippable DML. Also,
there will be changes needed in pgxc_shippability_walker() to improve FQS
technique for the same.


On Mon, Oct 7, 2013 at 9:10 PM, Sandeep Gupta <gup...@gm...>wrote:

> Hi Ashutosh,
>
>  Thanks for the note.  I cannot commit right away. However, whenever you
> have some time can you mention relevant portions in the codebase where
> changes have to be made. I have a general understanding of the execution
> engine. I will take a look see if it feasible for me.
>
> Thanks.
> Sandeep
>
>
>
> On Sun, Oct 6, 2013 at 11:53 PM, Ashutosh Bapat <
> ash...@en...> wrote:
>
>>
>>
>>
>> On Sat, Oct 5, 2013 at 7:44 PM, Sandeep Gupta <gup...@gm...>wrote:
>>
>>> Thanks Michael. I understand. The only issue is that we have an update
>>> query as
>>>
>>> update T set T.a = -1 from A where A.x = T.x
>>>
>>>
>>> Both A and T and distributed by x column. The problem is that
>>> coordinator first does the join and then
>>> calls update several times at each datanode. This is turning out to be
>>> too slow. Would have
>>> been better if the entire query was shipped to the datanodes.
>>>
>>>
>> Right now there is no way to ship a DML with more than one relation
>> involved there. But that's something, I have been thinking about. If you
>> have developer resources and can produce a patch. I can help.
>>
>>
>>> Thanks.
>>> Sandeep
>>>
>>>
>>>
>>> On Sat, Oct 5, 2013 at 6:27 AM, Michael Paquier <
>>> mic...@gm...> wrote:
>>>
>>>> On Sat, Oct 5, 2013 at 2:58 AM, Sandeep Gupta <gup...@gm...>
>>>> wrote:
>>>> >  I understand that the datanodes are read only and that
>>>> updates/insert can
>>>> > happen at coordinator.
>>>> You got it.
>>>>
>>>> > Also, it does not allow modification of column over which the records
>>>> are distributed.
>>>> Hum no, 1.1 allows ALTER TABLE that you can use to change the
>>>> distribution type of a table.
>>>>
>>>> > However, in case I know what I am doing, it there anyway possible to
>>>> modify
>>>> > the values directly at datanodes.
>>>> > The modifications are not over column over which distribution happens.
>>>> If you mean by connecting directly to the Datanodes, no. You would
>>>> break data consistency if table is replicated by the way by doing
>>>> that. Let the Coordinator planner do the job and choose the remote
>>>> nodes for you.
>>>>
>>>> There have been discussion to merge Coordinators and Datanodes
>>>> together though. This would allow what you say, with a simpler cluster
>>>> design.
>>>> --
>>>> Michael
>>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> October Webinars: Code for Performance
>>> Free Intel webinars can help you accelerate application performance.
>>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
>>> from
>>> the latest Intel processors and coprocessors. See abstracts and register
>>> >
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Postgres-xc-general mailing list
>>> Pos...@li...
>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>>>
>>>
>>
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EnterpriseDB Corporation
>> The Postgres Database Company
>>
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company