From: Koichi S. <koi...@gm...> - 2013-08-07 08:28:07
Hi;

2013/8/6 Ashutosh Bapat <ash...@en...>

> Hi All,
> I worked on this (patch attached) but I didn't see any performance improvement. In fact, to my surprise the performance actually dropped.
>
> On a true cluster, with 5 coordinators and 5 datanodes, I saw these numbers with 1000000 rows in a table with two integer columns. The times are average times required for 100 runs of the same query, returning whole rows ordered by one of the columns.
> No ORDER BY push-down - 1640 ms
> ORDER BY push-down using Sort plan - 1100 ms (this is the current implementation)
> ORDER BY push-down using MergeAppend plan - 1170 ms
>
> The assumption behind trying this out was explained in my earlier mail in this thread. But it looks like we do not get any improvement.
>
> One of the possible reasons, I think, is the involvement of multiple RemoteQuery nodes in the MergeAppend plan. While executing a MergeAppend plan, ExecProcNode() is called on the underlying nodes one after the other, and for every RemoteQuery node in the MergeAppend plan it waits till that node emits a row. This means that we do not execute the queries on the datanodes in parallel when executing through a MergeAppend plan, unlike the Sort plan, where there is only one RemoteQuery node involved.
>
> In order to get the full advantage of the MergeAppend plan (and keep the code clean), I think we need to query the datanodes in parallel somewhere outside ExecProcNode(). Thoughts? Suggestions?

Can we divide MergeAppend execution into two stages, one that just issues the queries and another that then begins to receive the rows? Because this is a change in the core, we should submit the idea, the use case and the code to the PG community.

Regards;
---
Koichi Suzuki
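A minimal standalone sketch of the two-stage idea, using plain libpq rather than XC's internal connection handling (the connection strings, table and query below are placeholders, not XC code): stage one dispatches the already-built ORDER BY query to every datanode before any row is requested, so the datanodes produce their sorted runs concurrently; stage two is the only place rows are pulled.

/* Not XC code: a libpq-only model.  Connection strings, table and
 * query are placeholders. */
#include <stdio.h>
#include <libpq-fe.h>

#define NNODES 2

int main(void)
{
    const char *conninfo[NNODES] = {
        "host=datanode1 dbname=postgres",
        "host=datanode2 dbname=postgres"
    };
    const char *sql = "SELECT a, b FROM t ORDER BY a"; /* ORDER BY already pushed down */
    PGconn     *conn[NNODES];
    int         i;

    /* Stage 1: dispatch the query to every datanode up front.
     * PQsendQuery() returns as soon as the query is on the wire, so all
     * datanodes start producing their sorted runs concurrently. */
    for (i = 0; i < NNODES; i++)
    {
        conn[i] = PQconnectdb(conninfo[i]);
        if (PQstatus(conn[i]) != CONNECTION_OK ||
            !PQsendQuery(conn[i], sql))
        {
            fprintf(stderr, "node %d: %s", i, PQerrorMessage(conn[i]));
            return 1;
        }
        PQsetSingleRowMode(conn[i]);    /* stream rows instead of buffering */
    }

    /* Stage 2: only now pull rows.  A MergeAppend-style executor would
     * merge these streams by sort key; here they are simply drained to
     * show that no query is dispatched inside the pull loop. */
    for (i = 0; i < NNODES; i++)
    {
        PGresult *res;

        while ((res = PQgetResult(conn[i])) != NULL)
        {
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
                printf("node %d: %s\n", i, PQgetvalue(res, 0, 0));
            PQclear(res);
        }
        PQfinish(conn[i]);
    }
    return 0;
}

The point is only the ordering of operations: with all queries sent before the first pull, the wait for the first row from each node overlaps the work on every other node, which is what the serial per-child ExecProcNode() calls in MergeAppend currently prevent.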
> On Fri, May 24, 2013 at 1:30 AM, Ashutosh Bapat <ash...@en...> wrote:
>
>> Hi All,
>> While listening to Robert's presentation on query planning gone wrong, I heard (for the first time) about the MergeAppend plan. It looks like we can do something better with the way we are pushing down the ORDER BY clauses and doing the merge at the coordinator. Right now we are using a Sort node, and in ExecSort we start merging the rows. Instead, I think we should be using a MergeAppend node. The only bad thing about this approach is that a MergeAppend node expects a separate plan for each run of the sorted data. To do this, we need to replicate the RemoteQuery node as many times as there are nodes and put those copies in the subplan list of the MergeAppend plan. I think that's going to solve the problem with materialisation (see mail thread "Using remote sorting for merge-join") and thus improve performance. It also takes away the need to have xc_node_id in the tuple slot of the tuplestore, obviously simplifying the code.
>>
>> Any comments?
>>
>> Created tracker item 3613815 <https://sourceforge.net/tracker/?func=detail&aid=3613815&group_id=311227&atid=1310232> and assigned it to me.
>>
>> On Mon, Dec 3, 2012 at 3:44 AM, Ashutosh Bapat <ash...@en...> wrote:
>>
>>> On Fri, Nov 30, 2012 at 9:37 AM, Michael Paquier <mic...@gm...> wrote:
>>>
>>>> Hi Ashutosh,
>>>>
>>>> Thanks for the patch.
>>>> I found for the time being the following problems:
>>>> 1) The patch has some whitespace errors:
>>>> /home/michael/download/xc_sort.patch:16: trailing whitespace.
>>>> /home/michael/download/xc_sort.patch:34: trailing whitespace.
>>>> /home/michael/download/xc_sort.patch:307: trailing whitespace.
>>>>     return local_plan_left;
>>>> /home/michael/download/xc_sort.patch:356: trailing whitespace.
>>>> /home/michael/download/xc_sort.patch:360: trailing whitespace.
>>>
>>> Fixed.
>>>
>>>> error: patch failed: src/include/pgxc/execRemote.h:95
>>>> 2) The patch does not apply cleanly, even when using "patch -p1"-like commands.
>>>
>>> When I sent the patch (more than a week before you replied) it was based on HEAD, but there have been some commits since then which created the conflicts. The one attached should apply.
>>>
>>>> 3) No documentation about the new option enable_remotesort.
>>>> 4) enable_remotesort is not listed in postgresql.conf.
>>>> Please provide both documentation and a reference in postgresql.conf.sample for transparency of the code. As an open source project it is important to classify all the options available, for both the users and the developers.
>>>
>>> Added, although I still don't approve of it myself.
>>>
>>>> 5) What is cruelly missing from this patch are numbers. By how much does this feature improve performance?
>>>> Simple tests like an ORDER BY on a table with millions of rows would be enough, but as this is a performance feature, you should provide data about how much those cases are improved.
>>>> 6) A complaint that I make for each planner-related patch in this project: this patch has a size of 200KB, 65% of which is updates to the EXPLAIN output of the XC query planning regression tests. This makes the commit tree unreadable and the evolution of the regression tests really hard to follow, because they need to be modified each time a planner optimization is introduced. We should move to a set of tests which is far more code-modification proof!
>>>
>>> This is planner code, so it's going to affect the plans it creates. In the test files we add EXPLAIN (with the proper switches) to check 1. whether the plans come out as expected at that stage of development, 2. whether the clauses expected to be pushed down are indeed being pushed down, and 3. whether the queries to be sent to the datanodes are being crafted correctly. For every planner change, all three of the above change for the operation being planned, and hence the expected-output changes are unavoidable. In fact, looking at how the expected output changes gives you an idea of how correctly (or wrongly) the planner changes were made. Those diffs, if they go wrong, are a great help to me in finding problems earlier. During a merge as well, if those tests fail because of EXPLAIN output, that's a way to catch unwanted planner changes early enough and correct them. Sorry for the huge change, but it's unavoidable. The test cases are intentionally sensitive to (planner) code modification.
>>>
>>>> Regards,
>>>>
>>>> On Mon, Nov 26, 2012 at 8:42 PM, Ashutosh Bapat <ash...@en...> wrote:
>>>>
>>>>> Hi All,
>>>>> PFA a patch which enables pushing down the ORDER BY clause, i.e. the sorting, to the Datanode(s) through standard_planner().
>>>>>
>>>>> This patch leverages the already existing merge sort mechanism for RemoteQuery, but now outside RemoteQuery, using Sort plans.
>>>>>
>>>>> For a Sort plan it checks whether all the expressions on which the result needs to be sorted are shippable and whether the underlying tree allows the ORDER BY to be pushed to the Datanodes. If so, it includes the ORDER BY clause in the query to be sent to the Datanodes. If the number of Datanodes participating in the RemoteQuery is not 1, it adds a covering Sort node to merge the results from the Datanodes; otherwise, there is no covering Sort plan.
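As a rough standalone model of that last paragraph (all names below are invented for illustration and do not exist in the XC source): the ORDER BY is shipped only when every sort expression is shippable, and a coordinator-side merge is kept only when more than one datanode participates.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

/* Hypothetical model only: none of these names exist in the XC source. */
typedef struct SortKey
{
    const char *expr;       /* e.g. a column name                  */
    bool        shippable;  /* can the datanode evaluate/sort it?  */
} SortKey;

/*
 * Build the query sent to the datanodes and report whether the
 * coordinator still needs a Sort/merge step on top of the remote scan.
 */
static bool
need_coordinator_sort(const char *base_query,
                      const SortKey *keys, int nkeys,
                      int ndatanodes,
                      char *remote_query, size_t len)
{
    int i;

    /* Any non-shippable sort expression: send the plain query and
     * sort the whole result on the coordinator. */
    for (i = 0; i < nkeys; i++)
    {
        if (!keys[i].shippable)
        {
            snprintf(remote_query, len, "%s", base_query);
            return true;
        }
    }

    /* All keys shippable: append ORDER BY so each datanode returns a
     * sorted run. */
    snprintf(remote_query, len, "%s ORDER BY ", base_query);
    for (i = 0; i < nkeys; i++)
    {
        strncat(remote_query, keys[i].expr, len - strlen(remote_query) - 1);
        if (i < nkeys - 1)
            strncat(remote_query, ", ", len - strlen(remote_query) - 1);
    }

    /* One participating datanode: its stream is already fully sorted.
     * Several datanodes: their sorted runs must still be merged above. */
    return ndatanodes > 1;
}

int main(void)
{
    SortKey keys[] = { { "a", true } };
    char    q[256];
    bool    merge = need_coordinator_sort("SELECT a, b FROM t", keys, 1, 5,
                                          q, sizeof(q));

    printf("remote query: %s\n", q);
    printf("covering sort/merge on coordinator: %s\n", merge ? "yes" : "no");
    return 0;
}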
>>>>> The merge sort mechanism treats every Datanode as a tape and prefetches the results from the Datanode corresponding to the dwindling tape. This is enabled by setting the node from which to pull the next row before calling ExecProcNode on the RemoteQueryState.
>>>>>
>>>>> The same mechanism is used for a Sort plan node under Group or Agg nodes.
>>>>>
>>>>> The patch adds the test xc_sort.sql for testing this functionality. It also removes the now useless members sort and tuplesortstate from RemoteQuery and RemoteQueryState respectively.
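The tape behaviour described in the quoted patch description can be shown in isolation: each datanode is one pre-sorted tape, and after a row is emitted only the tape it came from is asked for its next row. The sketch below models the tapes as in-memory arrays; it is not XC code.

#include <stdio.h>
#include <limits.h>

#define NTAPES 3

/* Each "tape" stands for one datanode returning rows already sorted by
 * the pushed-down ORDER BY key.  Values are hard-coded for the demo;
 * INT_MAX marks end-of-tape. */
static const int tape_rows[NTAPES][4] = {
    { 1, 4, 7, INT_MAX },
    { 2, 3, 9, INT_MAX },
    { 5, 6, 8, INT_MAX }
};

int main(void)
{
    int pos[NTAPES] = { 0, 0, 0 };   /* next unread row on each tape */
    int head[NTAPES];                /* the prefetched row per tape  */
    int t;

    /* Prime every tape once: fetch the first row from each datanode. */
    for (t = 0; t < NTAPES; t++)
        head[t] = tape_rows[t][pos[t]++];

    for (;;)
    {
        /* Pick the tape whose prefetched row is smallest ... */
        int best = 0;
        for (t = 1; t < NTAPES; t++)
            if (head[t] < head[best])
                best = t;
        if (head[best] == INT_MAX)
            break;                   /* all tapes exhausted */

        printf("emit %d from tape %d\n", head[best], best);

        /* ... and refill only that tape: this is the "dwindling tape"
         * prefetch -- no other datanode is touched for this step. */
        head[best] = tape_rows[best][pos[best]++];
    }
    return 0;
}

The "refill only the winning tape" step is the counterpart of what the quoted mail describes as setting, on the RemoteQueryState, which node to pull the next row from before the next ExecProcNode call.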