From: Koichi S. <koi...@gm...> - 2013-08-08 06:12:06
|
Can it be optional so that planner can choose if ExecInitRemoteQuery just issue a query or wait for the first row to come? Then it is a planner issue which to choose. 2013/8/7 Ashutosh Bapat <ash...@en...> > > > > On Wed, Aug 7, 2013 at 1:57 PM, Koichi Suzuki <koi...@gm...>wrote: > >> Hi; >> >> >> 2013/8/6 Ashutosh Bapat <ash...@en...> >> >>> Hi All, >>> I worked on this (patch attached) but I didn't see any performance >>> improvement. In fact, to my surprise the performance actually dropped. >>> >>> On a true cluster, with 5 coordinators and 5 datanodes, I saw these >>> numbers with 1000000 rows in a table with two integer columns. The times >>> are average times required for 100 runs of the same query, returning whole >>> rows ordered by one of the columns. >>> No ORDER BY push-down - 1640 ms >>> ORDER BY push-down using Sort plan - 1100 ms (this is the current >>> implementation) >>> ORDER BY Push-down using MergeAppend plan - 1170 ms >>> >>> The assumption behind trying this out was explained in my earlier mail >>> in this thread. But it looks like we do not get any improvement. >>> >>> One of the possible reasons, I think, is involvement of multiple >>> RemoteQuery nodes in MergeAppend plan. While executing MergeAppend plan, >>> the ExecProcNode() is called on the underlying node, one after the other. >>> For every RemoteQuery node in MergeAppend plan it waits till that node >>> emits a row. This means that we do not execute the queries on datanodes in >>> parallel when executing through MergeAppend plan unlike Sort plan where >>> there is only one RemoteQuery node involved. >>> >>> In order to get full advantage of MergeAppend plan (and keep the code >>> clean), I think we need to query the datanodes in parallel somewhere >>> outside ExecProcNode(). Thoughts? Suggestions? >>> >> >> Can we divide MergeAppend execution into two stage, one is just to issue >> a query and then begin to receive rows. Because this is a change in the >> core, we should submit the idea, the usecase and the code to PG community. >> >> > This can be done if we move "issueing query" inside ExecInitRemoteQuery, > but I am not sure if that can be applied in all the cases. This wouldn't > require any change in PG. > > >> Regards; >> --- >> Koichi Suzuki >> >> >>> >>> On Fri, May 24, 2013 at 1:30 AM, Ashutosh Bapat < >>> ash...@en...> wrote: >>> >>>> Hi All, >>>> While I hear Robert's presentation on query planning go wrong, I heard >>>> (first time) about MergeAppend plan. It looks like, we have something >>>> better to do with the way we are pushing the ORDER BY clauses and doing >>>> merge at the coordinator. Right now we are using Sort node and in ExecSort >>>> we start merging the rows. Instead, I think we should be using MergeAppend >>>> node. The only bad thing about this approach is MergeAppend node expects >>>> separate plans for each run of the sorted data. To do this, we need to >>>> replicate the RemoteQuery node as many times as there are nodes and put >>>> them in the list to the MergeAppend plan. I think that's going to solve the >>>> problem with materialisation (see mail thread "Using remote sorting for >>>> merge-join") and thus improve performance. It also takes away the need to >>>> have xc_node_id in the tupleslot of tuplestore. Obviously, simplifying the >>>> code. >>>> >>>> Any comments? >>>> >>>> Created 3613815 >>>> <https://sourceforge.net/tracker/?func=detail&aid=3613815&group_id=311227&atid=1310232>and >>>> assigned to me. >>>> <https://sourceforge.net/tracker/?func=detail&aid=3613815&group_id=311227&atid=1310232> >>>> >>>> On Mon, Dec 3, 2012 at 3:44 AM, Ashutosh Bapat < >>>> ash...@en...> wrote: >>>> >>>>> On Fri, Nov 30, 2012 at 9:37 AM, Michael Paquier >>>>> <mic...@gm...> wrote: >>>>> > Hi Ashutosh, >>>>> > >>>>> > Thanks for the patch. >>>>> > I found for the time being the following problems: >>>>> > 1) The patch has some whitespaces >>>>> > /home/michael/download/xc_sort.patch:16: trailing whitespace. >>>>> > >>>>> > /home/michael/download/xc_sort.patch:34: trailing whitespace. >>>>> > >>>>> > /home/michael/download/xc_sort.patch:307: trailing whitespace. >>>>> > return local_plan_left; >>>>> > /home/michael/download/xc_sort.patch:356: trailing whitespace. >>>>> > >>>>> > /home/michael/download/xc_sort.patch:360: trailing whitespace. >>>>> > >>>>> >>>>> Fixed. >>>>> >>>>> > error: patch failed: src/include/pgxc/execRemote.h:95 >>>>> > 2) the patch does not apply cleanly even by using "patch -p1"-like >>>>> commands >>>>> >>>>> When I sent the patch (more than a week before you replied) it was >>>>> based on the HEAD, but then there were some commits which created the >>>>> conflicts. The one attached should apply. >>>>> >>>>> > 3) No documentation about the new option enable_remotesort. >>>>> > 4) enable_remotesort is not listed in postgresql.conf. >>>>> > Please provide both documentation and reference in >>>>> postgresql.conf.sample >>>>> > for transparency of the code. As an open source project it is >>>>> important to >>>>> > classify all the options available, for both the user and the >>>>> developers. >>>>> >>>>> Added although, I still don't approve of it myself. >>>>> >>>>> > 5) What is cruelly missing with this patch are numbers. By how much >>>>> this >>>>> > feature improves performance? >>>>> > Simple tests like a ORDER BY on a table with millions of rows would >>>>> be >>>>> > enough, but as this is a performance feature, you should provide >>>>> data about >>>>> > how much those cases are improved. >>>>> > 6) A complain that I say for each planner-related patch in this >>>>> project.... >>>>> > This patch has a size of 200KB, 65% being updates on the EXPLAIN >>>>> queries of >>>>> > the XC query planning regression tests. This makes the commit tree >>>>> > unreadable and the evolution of the regression tests realy hard to >>>>> follow >>>>> > because they need to be modified each time a planner optimization is >>>>> > introduced. We should move to a set of tests which is far more >>>>> > code-modification proof! >>>>> >>>>> This is planner code, so it's going to affect the plans that it >>>>> creates. In the test files we add EXPLAIN (with proper switches) to >>>>> check 1. whether the plans are coming as expected at that stage of >>>>> development, 2. Whether the clauses expected to be pushed down are >>>>> being indeed pushed down 3. the queries to be sent to the datanodes >>>>> are being crafted correctly. For every planner change, all the three >>>>> things above change, for the operation being planned and hence the >>>>> expected output changes are unavoidable. In fact, if you look at how >>>>> the expected output changing you get a idea of how correctly/wrong the >>>>> planner changes are made. Those diffs if go wrong, are great help to >>>>> me in finding problems earlier. During merge as well, if those tests >>>>> fail because of EXPLAIN outputs, that's a way to unwanted planner >>>>> changes early enough and correct them. Sorry, for the huge change, but >>>>> it's unavoidable. The testcases are intentionally prone to (planner) >>>>> code modification. >>>>> >>>>> > >>>>> > Regards, >>>>> > >>>>> > On Mon, Nov 26, 2012 at 8:42 PM, Ashutosh Bapat >>>>> > <ash...@en...> wrote: >>>>> >> >>>>> >> Hi All, >>>>> >> PFA patch which enables pushing down the ORDER BY clause or sorting >>>>> to the >>>>> >> Datanode/s through standard_planner(). >>>>> >> >>>>> >> This patch leverages the already existing merge sort mechanism for >>>>> >> RemoteQuery but now outside RemoteQuery using Sort plans. >>>>> >> >>>>> >> For a Sort plan it checks if all the expressions on which the >>>>> result needs >>>>> >> to be sorted are shippable and that the underlying tree allows to >>>>> push ORDER >>>>> >> BY to the Datanodes. If so, it includes the ORDER BY clause in the >>>>> query to >>>>> >> be sent to the Datanodes. If the number of Datanodes which >>>>> participate in >>>>> >> RemoteQuery is not 1, it adds a covering Sort node to merge the >>>>> results from >>>>> >> the Datanodes, otherwise, there is no covering Sort plan. >>>>> >> >>>>> >> The merge sort mechanism treats every Datanode as a tape and >>>>> prefetches >>>>> >> the results from the Datanode corresponding to a dwindling tape. >>>>> This is >>>>> >> enabled by setting the node from where to pull the next row before >>>>> calling >>>>> >> ExecProcNode on RemoteQueryState. >>>>> >> >>>>> >> The same mechanism is being used for Sort plan node under Group or >>>>> Agg >>>>> >> nodes. >>>>> >> >>>>> >> The patch adds test xc_sort.sql for testing this functionality. It >>>>> also >>>>> >> removes the now useless members sort and tuplesortstate from >>>>> RemoteQuery and >>>>> >> RemoteQueryState resp. >>>>> >> -- >>>>> >> Best Wishes, >>>>> >> Ashutosh Bapat >>>>> >> EntepriseDB Corporation >>>>> >> The Enterprise Postgres Company >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> ------------------------------------------------------------------------------ >>>>> >> Monitor your physical, virtual and cloud infrastructure from a >>>>> single >>>>> >> web console. Get in-depth insight into apps, servers, databases, >>>>> vmware, >>>>> >> SAP, cloud infrastructure, etc. Download 30-day Free Trial. >>>>> >> Pricing starts from $795 for 25 servers or applications! >>>>> >> http://p.sf.net/sfu/zoho_dev2dev_nov >>>>> >> _______________________________________________ >>>>> >> Postgres-xc-developers mailing list >>>>> >> Pos...@li... >>>>> >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>> >> >>>>> > >>>>> > >>>>> > >>>>> > -- >>>>> > Michael Paquier >>>>> > http://michael.otacoo.com >>>>> >>>>> >>>>> >>>>> -- >>>>> Best Wishes, >>>>> Ashutosh Bapat >>>>> EntepriseDB Corporation >>>>> The Enterprise Postgres Company >>>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Postgres Database Company >>>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Postgres Database Company >>> >>> >>> ------------------------------------------------------------------------------ >>> Get your SQL database under version control now! >>> Version control is standard for application code, but databases havent >>> caught up. So what steps can you take to put your SQL databases under >>> version control? Why should you start doing it? Read more to find out. >>> >>> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk >>> >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>> >> > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Postgres Database Company > |