|
From: Ashutosh B. <ash...@en...> - 2013-10-17 12:19:24
|
That sounds a good idea. Can you please provide a patch? On Thu, Oct 17, 2013 at 5:41 PM, Sandeep Gupta <gup...@gm...>wrote: > I maybe missing something. Is it the case that postgres or pgxc doesn't > support > streaming aggregates? That would allow aggregation over sorted streams to > produce a > aggregated stream that is also sorted. > However, I am not too sure about this. > > -Sandeep > > > > > On 10/17/2013 07:13 AM, Ashutosh Bapat wrote: > > That's not possible right now. For grouping, either the grouped input from > the datanode gets shuffled at the coordinator or ordered on the grouping > column. Thus even if we get the ordered intput from datanode on a > particular column that order is disturbed at the coordinator because of > grouping. > > > On Thu, Oct 17, 2013 at 4:37 PM, Sandeep Gupta <gup...@gm...>wrote: > >> How about a plan where the datanodes perform the sort as well and the >> coordinator performs a sorted merge? >> >> Are such plans not part of the query planner? >> >> -Sandeep >> >> >> On 10/17/2013 07:04 AM, Ashutosh Bapat wrote: >> >> There is GROUP BY clause that needs to be evaluated before the result can >> be ordered. Thus GROUP BY is sent to the datanode but not ORDER BY. >> >> >> On Thu, Oct 17, 2013 at 4:31 PM, Sandeep Gupta <gup...@gm...>wrote: >> >>> Hi Ashutosh, >>> >>> >>> Attached below is the query and the corresponding query plan. I am >>> using version 1.1. >>> Thanks for taking a look at this. >>> >>> -Sandeep >>> >>> SELECT exposed_time effectedDate, ROUND(COUNT(a.pid)/10) COUNT FROM public.vt_demography_info_xc d, public.ses_vt_20130805_xc a WHERE d.pid=a.pid AND d.countyid='50015' AND d.age BETWEEN 5 AND 18 AND d.gender=1 GROUP BY exposed_time ORDER BY exposed_time; >>> >>> >>> >>> >>> >>> QUERY PLAN (Coordinator) >>> >>> Sort (cost=10000000005.03..10000000005.03 rows=1 width=8) >>> Output: a.exposed_time, (round(((count((count(a.pid))) / 10))::double precision)) >>> Sort Key: a.exposed_time >>> -> HashAggregate (cost=5.00..5.02 rows=1 width=8) >>> Output: a.exposed_time, round(((count((count(a.pid))) / 10))::double precision) >>> -> Data Node Scan on "__REMOTE_GROUP_QUERY__" (cost=0.00..0.00 rows=1000 width=8) >>> Output: a.exposed_time, (count(a.pid)) >>> Node/s: datanode1, datanode10, datanode11, datanode12, datanode13, datanode14, datanode15, datanode16, datanode2, datanode3, datanode4, datanode5, datanode6, datanode7, datanode8, datanode9 >>> Remote query: SELECT r.a_1, count(r.a_2) FROM ((SELECT d.pid FROM ONLY public.vt_demography_info_xc d WHERE ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))) l(a_1) JOIN (SELECT a.exposed_time, a.pid FROM ONLY public.ses_vt_20130805_xc a WHERE ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))) r(a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2) GROUP BY 1 >>> (9 rows) >>> >>> QUERY PLAN (Datanode) >>> >>> GroupAggregate (cost=0.00..47862.29 rows=225 width=8) >>> Output: a.exposed_time, round(((count(a.pid) / 10))::double precision) >>> -> Nested Loop (cost=0.00..47856.05 rows=460 width=8) >>> Output: a.exposed_time, a.pid >>> -> Index Scan using et_ses on public.ses_vt_20130805_xc a (cost=0.00..7283.10 rows=129583 width=8) >>> Output: a.pid, a.rep, a.exposed_time, a.infectious_time, a.recovered_time >>> Index Cond: ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031)) >>> -> Index Scan using pid_demo on public.vt_demography_info_xc d (cost=0.00..0.30 rows=1 width=4) >>> Output: d.pid, d.hid, d.age, d.gender, d.zipode, d.blockgroupid, d.longitude, d.lattitude, d.county, d.countyid >>> Index Cond: (d.pid = a.pid) >>> Filter: ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1)) >>> (11 rows) >>> >>> >>> >>> >>> On Thu, Oct 17, 2013 at 12:12 AM, Ashutosh Bapat < >>> ash...@en...> wrote: >>> >>>> Sandeep, >>>> It would be nice if you mention the version of XC in your mail. Sort >>>> push down is available from 1.1 onwards. If you do not see sort getting >>>> pushed down in 1.1, please report detailed definitions of the tables, query >>>> and the EXPLAIN output. >>>> >>>> >>>> On Thu, Oct 17, 2013 at 1:09 AM, Sandeep Gupta < >>>> gup...@gm...> wrote: >>>> >>>>> Hi, >>>>> >>>>> In an another query that requires the result to be aggregated and >>>>> ordered by a field (lets say timeo) >>>>> the query planner currently pulls the results and then performs a >>>>> sort with hash aggregate. >>>>> >>>>> The table at the datanodes are clustered by timeo. I was wondering if >>>>> it possible >>>>> for query planner to push down the order by clause at the datanode and >>>>> then perform >>>>> sort-merge aggregate at the coordinator. Surely, that would be a >>>>> better query plan. >>>>> >>>>> We have tried enable_sort=off etc. but that doesn't work. >>>>> >>>>> Thanks. >>>>> Sandeep >>>>> >>>>> >>>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> October Webinars: Code for Performance >>>>> Free Intel webinars can help you accelerate application performance. >>>>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the >>>>> most from >>>>> the latest Intel processors and coprocessors. See abstracts and >>>>> register > >>>>> >>>>> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk >>>>> _______________________________________________ >>>>> Postgres-xc-general mailing list >>>>> Pos...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >>>>> >>>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EnterpriseDB Corporation >>>> The Postgres Database Company >>>> >>> >>> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EnterpriseDB Corporation >> The Postgres Database Company >> >> >> > > > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company > > > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company |