|
From: Sandeep G. <gup...@gm...> - 2013-10-17 12:11:36
|
I maybe missing something. Is it the case that postgres or pgxc doesn't support streaming aggregates? That would allow aggregation over sorted streams to produce a aggregated stream that is also sorted. However, I am not too sure about this. -Sandeep On 10/17/2013 07:13 AM, Ashutosh Bapat wrote: > That's not possible right now. For grouping, either the grouped input > from the datanode gets shuffled at the coordinator or ordered on the > grouping column. Thus even if we get the ordered intput from datanode > on a particular column that order is disturbed at the coordinator > because of grouping. > > > On Thu, Oct 17, 2013 at 4:37 PM, Sandeep Gupta > <gup...@gm... <mailto:gup...@gm...>> wrote: > > How about a plan where the datanodes perform the sort as well and the > coordinator performs a sorted merge? > > Are such plans not part of the query planner? > > -Sandeep > > > On 10/17/2013 07:04 AM, Ashutosh Bapat wrote: >> There is GROUP BY clause that needs to be evaluated before the >> result can be ordered. Thus GROUP BY is sent to the datanode but >> not ORDER BY. >> >> >> On Thu, Oct 17, 2013 at 4:31 PM, Sandeep Gupta >> <gup...@gm... <mailto:gup...@gm...>> wrote: >> >> Hi Ashutosh, >> >> >> Attached below is the query and the corresponding query >> plan. I am using version 1.1. >> Thanks for taking a look at this. >> >> -Sandeep >> >> SELECT exposed_time effectedDate, ROUND(COUNT(a.pid)/10) COUNT FROM public.vt_demography_info_xc d, public.ses_vt_20130805_xc a WHERE d.pid=a.pid AND d.countyid='50015' AND d.age BETWEEN 5 AND 18 AND d.gender=1 GROUP BY exposed_time ORDER BY exposed_time; >> >> >> >> >> QUERY PLAN (Coordinator) >> >> Sort (cost=10000000005.03..10000000005.03 rows=1 width=8) >> Output: a.exposed_time, (round(((count((count(a.pid))) / 10))::double precision)) >> Sort Key: a.exposed_time >> -> HashAggregate (cost=5.00..5.02 rows=1 width=8) >> Output: a.exposed_time, round(((count((count(a.pid))) / 10))::double precision) >> -> Data Node Scan on "__REMOTE_GROUP_QUERY__" (cost=0.00..0.00 rows=1000 width=8) >> Output: a.exposed_time, (count(a.pid)) >> Node/s: datanode1, datanode10, datanode11, datanode12, datanode13, datanode14, datanode15, datanode16, datanode2, datanode3, datanode4, datanode5, datanode6, datanode7, datanode8, datanode9 >> Remote query: SELECT r.a_1, count(r.a_2) FROM ((SELECT d.pid FROM ONLY public.vt_demography_info_xc d WHERE ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))) l(a_1) JOIN (SELECT a.exposed_time, a.pid FROM ONLY public.ses_vt_20130805_xc a WHERE ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))) r(a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2) GROUP BY 1 >> (9 rows) >> >> QUERY PLAN (Datanode) >> >> GroupAggregate (cost=0.00..47862.29 rows=225 width=8) >> Output: a.exposed_time, round(((count(a.pid) / 10))::double precision) >> -> Nested Loop (cost=0.00..47856.05 rows=460 width=8) >> Output: a.exposed_time, a.pid >> -> Index Scan using et_ses on public.ses_vt_20130805_xc a (cost=0.00..7283.10 rows=129583 width=8) >> Output: a.pid, a.rep, a.exposed_time, a.infectious_time, a.recovered_time >> Index Cond: ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031)) >> -> Index Scan using pid_demo on public.vt_demography_info_xc d (cost=0.00..0.30 rows=1 width=4) >> Output: d.pid, d.hid, d.age, d.gender, d.zipode, d.blockgroupid, d.longitude, d.lattitude, d.county, d.countyid >> Index Cond: (d.pid = a.pid) >> Filter: ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1)) >> (11 rows) >> >> >> >> >> On Thu, Oct 17, 2013 at 12:12 AM, Ashutosh Bapat >> <ash...@en... >> <mailto:ash...@en...>> wrote: >> >> Sandeep, >> It would be nice if you mention the version of XC in your >> mail. Sort push down is available from 1.1 onwards. If >> you do not see sort getting pushed down in 1.1, please >> report detailed definitions of the tables, query and the >> EXPLAIN output. >> >> >> On Thu, Oct 17, 2013 at 1:09 AM, Sandeep Gupta >> <gup...@gm... >> <mailto:gup...@gm...>> wrote: >> >> Hi, >> >> In an another query that requires the result to be >> aggregated and ordered by a field (lets say timeo) >> the query planner currently pulls the results and >> then performs a sort with hash aggregate. >> >> The table at the datanodes are clustered by timeo. I >> was wondering if it possible >> for query planner to push down the order by clause at >> the datanode and then perform >> sort-merge aggregate at the coordinator. Surely, that >> would be a better query plan. >> >> We have tried enable_sort=off etc. but that doesn't >> work. >> >> Thanks. >> Sandeep >> >> >> >> ------------------------------------------------------------------------------ >> October Webinars: Code for Performance >> Free Intel webinars can help you accelerate >> application performance. >> Explore tips for MPI, OpenMP, advanced profiling, and >> more. Get the most from >> the latest Intel processors and coprocessors. See >> abstracts and register > >> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk >> _______________________________________________ >> Postgres-xc-general mailing list >> Pos...@li... >> <mailto:Pos...@li...> >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EnterpriseDB Corporation >> The Postgres Database Company >> >> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EnterpriseDB Corporation >> The Postgres Database Company > > > > > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company |