|
From: Sandeep G. <gup...@gm...> - 2013-10-04 17:58:31
|
Hi, I understand that the datanodes are read only and that updates/insert can happen at co-ordinator. Also, it does not allow modification of column over which the records are distributed. However, in case I know what I am doing, it there anyway possible to modify the values directly at datanodes. The modifications are not over column over which distribution happens. Thanks. Sandeep |
|
From: Michael P. <mic...@gm...> - 2013-10-05 10:27:21
|
On Sat, Oct 5, 2013 at 2:58 AM, Sandeep Gupta <gup...@gm...> wrote: > I understand that the datanodes are read only and that updates/insert can > happen at coordinator. You got it. > Also, it does not allow modification of column over which the records are distributed. Hum no, 1.1 allows ALTER TABLE that you can use to change the distribution type of a table. > However, in case I know what I am doing, it there anyway possible to modify > the values directly at datanodes. > The modifications are not over column over which distribution happens. If you mean by connecting directly to the Datanodes, no. You would break data consistency if table is replicated by the way by doing that. Let the Coordinator planner do the job and choose the remote nodes for you. There have been discussion to merge Coordinators and Datanodes together though. This would allow what you say, with a simpler cluster design. -- Michael |
|
From: Sandeep G. <gup...@gm...> - 2013-10-05 14:14:26
|
Thanks Michael. I understand. The only issue is that we have an update query as update T set T.a = -1 from A where A.x = T.x Both A and T and distributed by x column. The problem is that coordinator first does the join and then calls update several times at each datanode. This is turning out to be too slow. Would have been better if the entire query was shipped to the datanodes. Thanks. Sandeep On Sat, Oct 5, 2013 at 6:27 AM, Michael Paquier <mic...@gm...>wrote: > On Sat, Oct 5, 2013 at 2:58 AM, Sandeep Gupta <gup...@gm...> > wrote: > > I understand that the datanodes are read only and that updates/insert > can > > happen at coordinator. > You got it. > > > Also, it does not allow modification of column over which the records > are distributed. > Hum no, 1.1 allows ALTER TABLE that you can use to change the > distribution type of a table. > > > However, in case I know what I am doing, it there anyway possible to > modify > > the values directly at datanodes. > > The modifications are not over column over which distribution happens. > If you mean by connecting directly to the Datanodes, no. You would > break data consistency if table is replicated by the way by doing > that. Let the Coordinator planner do the job and choose the remote > nodes for you. > > There have been discussion to merge Coordinators and Datanodes > together though. This would allow what you say, with a simpler cluster > design. > -- > Michael > |
|
From: Michael P. <mic...@gm...> - 2013-10-05 14:26:59
|
On Sat, Oct 5, 2013 at 11:14 PM, Sandeep Gupta <gup...@gm...> wrote: > Thanks Michael. I understand. The only issue is that we have an update > query as > > update T set T.a = -1 from A where A.x = T.x > > > Both A and T and distributed by x column. The problem is that coordinator > first does the join and then > calls update several times at each datanode. This is turning out to be too > slow. Would have > been better if the entire query was shipped to the datanodes. Hum?! Logically, I would imagine that if A and T are distributed by x this WHERE clause should be pushed down as the SET clause is a constant. However perhaps UPDATE FROM does not have an explicit support... Could you provide the version number and an EXPLAIN VERBOSE output? What if you put the where join in a subquery or a WITH clause? Like that for example: update T set T.a = -1 where A.x = (select A.x from A,T where A.x = T.x); -- Michael |
|
From: Sandeep G. <gup...@gm...> - 2013-10-05 14:48:09
|
Hi Michael,
Sure. I am using pgxc v.1.
For the query explain verbose update person set intervened = 84 from d1_sum
WHERE person.pid=d1_sum.pid;
Query Plan:
Update on public.person (cost=0.00..0.00 rows=1000 width=24)
Node/s: datanode1
Node expr: person.pid
Remote query: UPDATE ONLY public.person SET intervened = $3 WHERE
((person.ctid = $4) AND (person.xc_node_id = $5))
-> Data Node Scan on "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000
width=24)
Output: person.pid, person.persons, 84, person.pid, person.ctid,
person.xc_node_id, d1_sum.ctid
Node/s: datanode1
Remote query: SELECT l.a_1, l.a_2, l.a_3, l.a_4, r.a_1 FROM
((SELECT person.pid, person.persons, person.ctid, person.xc_node_i
d FROM ONLY public.person WHERE true) l(a_1, a_2, a_3, a_4) JOIN (SELECT
d1_sum.ctid, d1_sum.pid FROM ONLY public.d1_sum WHERE true) r(
a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2)
(8 rows)
For the second style
explain verbose update person set intervened = 84 where person.pid =
(select d1_sum.pid from d1_sum,person WHERE person.pid=d1_sum.pid);
Update on public.person (cost=0.00..0.00 rows=1000 width=18)
Node/s: datanode1
Node expr: public.person.pid
Remote query: UPDATE ONLY public.person SET intervened = $3 WHERE
((person.ctid = $4) AND (person.xc_node_id = $5))
InitPlan 1 (returns $0)
-> Data Node Scan on "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00
rows=1000 width=4)
Output: d1_sum.pid
Node/s: datanode1
Remote query: SELECT l.a_1 FROM ((SELECT d1_sum.pid FROM ONLY
public.d1_sum WHERE true) l(a_1) JOIN (SELECT person.pid FROM
ONLY public.person WHERE true) r(a_1) ON (true)) WHERE (l.a_1 = r.a_1)
-> Data Node Scan on person "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00
rows=1000 width=18)
Output: public.person.pid, public.person.persons, 84,
public.person.pid, public.person.ctid, public.person.xc_node_id
Node/s: datanode1
Remote query: SELECT pid, persons, ctid, xc_node_id FROM ONLY
public.person WHERE true
Coordinator quals: (public.person.pid = $0)
In both the scenarios the planner breaks it into two parts: update and
join. The results of join is pulled up at the coordinator and then shipped
one by one for update.
Thanks for taking a look.
-Sandeep
On Sat, Oct 5, 2013 at 10:26 AM, Michael Paquier
<mic...@gm...>wrote:
> On Sat, Oct 5, 2013 at 11:14 PM, Sandeep Gupta <gup...@gm...>
> wrote:
> > Thanks Michael. I understand. The only issue is that we have an update
> > query as
> >
> > update T set T.a = -1 from A where A.x = T.x
> >
> >
> > Both A and T and distributed by x column. The problem is that coordinator
> > first does the join and then
> > calls update several times at each datanode. This is turning out to be
> too
> > slow. Would have
> > been better if the entire query was shipped to the datanodes.
> Hum?! Logically, I would imagine that if A and T are distributed by x
> this WHERE clause should be pushed down as the SET clause is a
> constant. However perhaps UPDATE FROM does not have an explicit
> support... Could you provide the version number and an EXPLAIN VERBOSE
> output?
>
> What if you put the where join in a subquery or a WITH clause? Like
> that for example:
> update T set T.a = -1 where A.x = (select A.x from A,T where A.x = T.x);
> --
> Michael
>
|
|
From: Michael P. <mic...@gm...> - 2013-10-05 21:49:59
|
On Sat, Oct 5, 2013 at 11:48 PM, Sandeep Gupta <gup...@gm...> wrote: > > Hi Michael, > > Sure. I am using pgxc v.1. 1.0 or 1.1? > For the query explain verbose update person set intervened = 84 from d1_sum > WHERE person.pid=d1_sum.pid; > > > Query Plan: > > Update on public.person (cost=0.00..0.00 rows=1000 width=24) > Node/s: datanode1 > Node expr: person.pid > Remote query: UPDATE ONLY public.person SET intervened = $3 WHERE > ((person.ctid = $4) AND (person.xc_node_id = $5)) > -> Data Node Scan on "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 > width=24) > Output: person.pid, person.persons, 84, person.pid, person.ctid, > person.xc_node_id, d1_sum.ctid > Node/s: datanode1 > Remote query: SELECT l.a_1, l.a_2, l.a_3, l.a_4, r.a_1 FROM > ((SELECT person.pid, person.persons, person.ctid, person.xc_node_i > d FROM ONLY public.person WHERE true) l(a_1, a_2, a_3, a_4) JOIN (SELECT > d1_sum.ctid, d1_sum.pid FROM ONLY public.d1_sum WHERE true) r( > a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2) > (8 rows) > > For the second style > > explain verbose update person set intervened = 84 where person.pid = (select > d1_sum.pid from d1_sum,person WHERE person.pid=d1_sum.pid); > > Update on public.person (cost=0.00..0.00 rows=1000 width=18) > Node/s: datanode1 > Node expr: public.person.pid > Remote query: UPDATE ONLY public.person SET intervened = $3 WHERE > ((person.ctid = $4) AND (person.xc_node_id = $5)) > InitPlan 1 (returns $0) > -> Data Node Scan on "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 > rows=1000 width=4) > Output: d1_sum.pid > Node/s: datanode1 > Remote query: SELECT l.a_1 FROM ((SELECT d1_sum.pid FROM ONLY > public.d1_sum WHERE true) l(a_1) JOIN (SELECT person.pid FROM > ONLY public.person WHERE true) r(a_1) ON (true)) WHERE (l.a_1 = r.a_1) > -> Data Node Scan on person "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 > rows=1000 width=18) > Output: public.person.pid, public.person.persons, 84, > public.person.pid, public.person.ctid, public.person.xc_node_id > Node/s: datanode1 > Remote query: SELECT pid, persons, ctid, xc_node_id FROM ONLY > public.person WHERE true > Coordinator quals: (public.person.pid = $0) > > > In both the scenarios the planner breaks it into two parts: update and join. > The results of join is pulled up at the coordinator and then shipped one by > one for update. Indeed you are right. It seems that FROM clause support in UPDATE is limited? Others, comments on that? I thought that there has been some work done in the area. -- Michael |
|
From: Ashutosh B. <ash...@en...> - 2013-10-07 03:52:39
|
On Fri, Oct 4, 2013 at 11:28 PM, Sandeep Gupta <gup...@gm...>wrote: > Hi, > > I understand that the datanodes are read only and that updates/insert can > happen at co-ordinator. Also, it does not allow modification of column over > which the records are distributed. > Users should not connect to the datanodes directly, that might break consistency. All the accesses to the data should happen through the coordinator. > > However, in case I know what I am doing, it there anyway possible to > modify the values directly at datanodes. The modifications are not over > column over which distribution happens. > > Thanks. > Sandeep > > > > ------------------------------------------------------------------------------ > October Webinars: Code for Performance > Free Intel webinars can help you accelerate application performance. > Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most > from > the latest Intel processors and coprocessors. See abstracts and register > > http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company |
|
From: Ashutosh B. <ash...@en...> - 2013-10-07 03:54:04
|
On Sat, Oct 5, 2013 at 7:44 PM, Sandeep Gupta <gup...@gm...>wrote: > Thanks Michael. I understand. The only issue is that we have an update > query as > > update T set T.a = -1 from A where A.x = T.x > > > Both A and T and distributed by x column. The problem is that coordinator > first does the join and then > calls update several times at each datanode. This is turning out to be too > slow. Would have > been better if the entire query was shipped to the datanodes. > > Right now there is no way to ship a DML with more than one relation involved there. But that's something, I have been thinking about. If you have developer resources and can produce a patch. I can help. > Thanks. > Sandeep > > > > On Sat, Oct 5, 2013 at 6:27 AM, Michael Paquier <mic...@gm... > > wrote: > >> On Sat, Oct 5, 2013 at 2:58 AM, Sandeep Gupta <gup...@gm...> >> wrote: >> > I understand that the datanodes are read only and that updates/insert >> can >> > happen at coordinator. >> You got it. >> >> > Also, it does not allow modification of column over which the records >> are distributed. >> Hum no, 1.1 allows ALTER TABLE that you can use to change the >> distribution type of a table. >> >> > However, in case I know what I am doing, it there anyway possible to >> modify >> > the values directly at datanodes. >> > The modifications are not over column over which distribution happens. >> If you mean by connecting directly to the Datanodes, no. You would >> break data consistency if table is replicated by the way by doing >> that. Let the Coordinator planner do the job and choose the remote >> nodes for you. >> >> There have been discussion to merge Coordinators and Datanodes >> together though. This would allow what you say, with a simpler cluster >> design. >> -- >> Michael >> > > > > ------------------------------------------------------------------------------ > October Webinars: Code for Performance > Free Intel webinars can help you accelerate application performance. > Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most > from > the latest Intel processors and coprocessors. See abstracts and register > > http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company |
|
From: Sandeep G. <gup...@gm...> - 2013-10-07 15:40:35
|
Hi Ashutosh, Thanks for the note. I cannot commit right away. However, whenever you have some time can you mention relevant portions in the codebase where changes have to be made. I have a general understanding of the execution engine. I will take a look see if it feasible for me. Thanks. Sandeep On Sun, Oct 6, 2013 at 11:53 PM, Ashutosh Bapat < ash...@en...> wrote: > > > > On Sat, Oct 5, 2013 at 7:44 PM, Sandeep Gupta <gup...@gm...>wrote: > >> Thanks Michael. I understand. The only issue is that we have an update >> query as >> >> update T set T.a = -1 from A where A.x = T.x >> >> >> Both A and T and distributed by x column. The problem is that coordinator >> first does the join and then >> calls update several times at each datanode. This is turning out to be >> too slow. Would have >> been better if the entire query was shipped to the datanodes. >> >> > Right now there is no way to ship a DML with more than one relation > involved there. But that's something, I have been thinking about. If you > have developer resources and can produce a patch. I can help. > > >> Thanks. >> Sandeep >> >> >> >> On Sat, Oct 5, 2013 at 6:27 AM, Michael Paquier < >> mic...@gm...> wrote: >> >>> On Sat, Oct 5, 2013 at 2:58 AM, Sandeep Gupta <gup...@gm...> >>> wrote: >>> > I understand that the datanodes are read only and that updates/insert >>> can >>> > happen at coordinator. >>> You got it. >>> >>> > Also, it does not allow modification of column over which the records >>> are distributed. >>> Hum no, 1.1 allows ALTER TABLE that you can use to change the >>> distribution type of a table. >>> >>> > However, in case I know what I am doing, it there anyway possible to >>> modify >>> > the values directly at datanodes. >>> > The modifications are not over column over which distribution happens. >>> If you mean by connecting directly to the Datanodes, no. You would >>> break data consistency if table is replicated by the way by doing >>> that. Let the Coordinator planner do the job and choose the remote >>> nodes for you. >>> >>> There have been discussion to merge Coordinators and Datanodes >>> together though. This would allow what you say, with a simpler cluster >>> design. >>> -- >>> Michael >>> >> >> >> >> ------------------------------------------------------------------------------ >> October Webinars: Code for Performance >> Free Intel webinars can help you accelerate application performance. >> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most >> from >> the latest Intel processors and coprocessors. See abstracts and register > >> >> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk >> _______________________________________________ >> Postgres-xc-general mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >> >> > > > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company > |
|
From: Ashutosh B. <ash...@en...> - 2013-10-08 04:04:25
|
The changes have to be in pgxc_make_modifytable() to construct a shippable DML out of the scan plan and the modification plan. Changes will be needed in ExecUpdate/Delete/Insert for execution of this shippable DML. Also, there will be changes needed in pgxc_shippability_walker() to improve FQS technique for the same. On Mon, Oct 7, 2013 at 9:10 PM, Sandeep Gupta <gup...@gm...>wrote: > Hi Ashutosh, > > Thanks for the note. I cannot commit right away. However, whenever you > have some time can you mention relevant portions in the codebase where > changes have to be made. I have a general understanding of the execution > engine. I will take a look see if it feasible for me. > > Thanks. > Sandeep > > > > On Sun, Oct 6, 2013 at 11:53 PM, Ashutosh Bapat < > ash...@en...> wrote: > >> >> >> >> On Sat, Oct 5, 2013 at 7:44 PM, Sandeep Gupta <gup...@gm...>wrote: >> >>> Thanks Michael. I understand. The only issue is that we have an update >>> query as >>> >>> update T set T.a = -1 from A where A.x = T.x >>> >>> >>> Both A and T and distributed by x column. The problem is that >>> coordinator first does the join and then >>> calls update several times at each datanode. This is turning out to be >>> too slow. Would have >>> been better if the entire query was shipped to the datanodes. >>> >>> >> Right now there is no way to ship a DML with more than one relation >> involved there. But that's something, I have been thinking about. If you >> have developer resources and can produce a patch. I can help. >> >> >>> Thanks. >>> Sandeep >>> >>> >>> >>> On Sat, Oct 5, 2013 at 6:27 AM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> On Sat, Oct 5, 2013 at 2:58 AM, Sandeep Gupta <gup...@gm...> >>>> wrote: >>>> > I understand that the datanodes are read only and that >>>> updates/insert can >>>> > happen at coordinator. >>>> You got it. >>>> >>>> > Also, it does not allow modification of column over which the records >>>> are distributed. >>>> Hum no, 1.1 allows ALTER TABLE that you can use to change the >>>> distribution type of a table. >>>> >>>> > However, in case I know what I am doing, it there anyway possible to >>>> modify >>>> > the values directly at datanodes. >>>> > The modifications are not over column over which distribution happens. >>>> If you mean by connecting directly to the Datanodes, no. You would >>>> break data consistency if table is replicated by the way by doing >>>> that. Let the Coordinator planner do the job and choose the remote >>>> nodes for you. >>>> >>>> There have been discussion to merge Coordinators and Datanodes >>>> together though. This would allow what you say, with a simpler cluster >>>> design. >>>> -- >>>> Michael >>>> >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> October Webinars: Code for Performance >>> Free Intel webinars can help you accelerate application performance. >>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most >>> from >>> the latest Intel processors and coprocessors. See abstracts and register >>> > >>> >>> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk >>> _______________________________________________ >>> Postgres-xc-general mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >>> >>> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EnterpriseDB Corporation >> The Postgres Database Company >> > > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company |