|
From: Sandeep G. <gup...@gm...> - 2013-10-05 14:48:09
|
Hi Michael,
Sure. I am using pgxc v.1.
For the query explain verbose update person set intervened = 84 from d1_sum
WHERE person.pid=d1_sum.pid;
Query Plan:
Update on public.person (cost=0.00..0.00 rows=1000 width=24)
Node/s: datanode1
Node expr: person.pid
Remote query: UPDATE ONLY public.person SET intervened = $3 WHERE
((person.ctid = $4) AND (person.xc_node_id = $5))
-> Data Node Scan on "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000
width=24)
Output: person.pid, person.persons, 84, person.pid, person.ctid,
person.xc_node_id, d1_sum.ctid
Node/s: datanode1
Remote query: SELECT l.a_1, l.a_2, l.a_3, l.a_4, r.a_1 FROM
((SELECT person.pid, person.persons, person.ctid, person.xc_node_i
d FROM ONLY public.person WHERE true) l(a_1, a_2, a_3, a_4) JOIN (SELECT
d1_sum.ctid, d1_sum.pid FROM ONLY public.d1_sum WHERE true) r(
a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2)
(8 rows)
For the second style
explain verbose update person set intervened = 84 where person.pid =
(select d1_sum.pid from d1_sum,person WHERE person.pid=d1_sum.pid);
Update on public.person (cost=0.00..0.00 rows=1000 width=18)
Node/s: datanode1
Node expr: public.person.pid
Remote query: UPDATE ONLY public.person SET intervened = $3 WHERE
((person.ctid = $4) AND (person.xc_node_id = $5))
InitPlan 1 (returns $0)
-> Data Node Scan on "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00
rows=1000 width=4)
Output: d1_sum.pid
Node/s: datanode1
Remote query: SELECT l.a_1 FROM ((SELECT d1_sum.pid FROM ONLY
public.d1_sum WHERE true) l(a_1) JOIN (SELECT person.pid FROM
ONLY public.person WHERE true) r(a_1) ON (true)) WHERE (l.a_1 = r.a_1)
-> Data Node Scan on person "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00
rows=1000 width=18)
Output: public.person.pid, public.person.persons, 84,
public.person.pid, public.person.ctid, public.person.xc_node_id
Node/s: datanode1
Remote query: SELECT pid, persons, ctid, xc_node_id FROM ONLY
public.person WHERE true
Coordinator quals: (public.person.pid = $0)
In both the scenarios the planner breaks it into two parts: update and
join. The results of join is pulled up at the coordinator and then shipped
one by one for update.
Thanks for taking a look.
-Sandeep
On Sat, Oct 5, 2013 at 10:26 AM, Michael Paquier
<mic...@gm...>wrote:
> On Sat, Oct 5, 2013 at 11:14 PM, Sandeep Gupta <gup...@gm...>
> wrote:
> > Thanks Michael. I understand. The only issue is that we have an update
> > query as
> >
> > update T set T.a = -1 from A where A.x = T.x
> >
> >
> > Both A and T and distributed by x column. The problem is that coordinator
> > first does the join and then
> > calls update several times at each datanode. This is turning out to be
> too
> > slow. Would have
> > been better if the entire query was shipped to the datanodes.
> Hum?! Logically, I would imagine that if A and T are distributed by x
> this WHERE clause should be pushed down as the SET clause is a
> constant. However perhaps UPDATE FROM does not have an explicit
> support... Could you provide the version number and an EXPLAIN VERBOSE
> output?
>
> What if you put the where join in a subquery or a WITH clause? Like
> that for example:
> update T set T.a = -1 where A.x = (select A.x from A,T where A.x = T.x);
> --
> Michael
>
|