From: Ashutosh B. <ash...@en...> - 2013-08-08 06:24:51
|
Yes, that's a possibility. Although, we need a robust executor interface, so that this separation would work without any problem. But, unfortunately, current RemoteQuery executor is fragile. On Thu, Aug 8, 2013 at 11:41 AM, Koichi Suzuki <koi...@gm...>wrote: > Can it be optional so that planner can choose if ExecInitRemoteQuery just > issue a query or wait for the first row to come? Then it is a planner > issue which to choose. > > > 2013/8/7 Ashutosh Bapat <ash...@en...> > >> >> >> >> On Wed, Aug 7, 2013 at 1:57 PM, Koichi Suzuki <koi...@gm...>wrote: >> >>> Hi; >>> >>> >>> 2013/8/6 Ashutosh Bapat <ash...@en...> >>> >>>> Hi All, >>>> I worked on this (patch attached) but I didn't see any performance >>>> improvement. In fact, to my surprise the performance actually dropped. >>>> >>>> On a true cluster, with 5 coordinators and 5 datanodes, I saw these >>>> numbers with 1000000 rows in a table with two integer columns. The times >>>> are average times required for 100 runs of the same query, returning whole >>>> rows ordered by one of the columns. >>>> No ORDER BY push-down - 1640 ms >>>> ORDER BY push-down using Sort plan - 1100 ms (this is the current >>>> implementation) >>>> ORDER BY Push-down using MergeAppend plan - 1170 ms >>>> >>>> The assumption behind trying this out was explained in my earlier mail >>>> in this thread. But it looks like we do not get any improvement. >>>> >>>> One of the possible reasons, I think, is involvement of multiple >>>> RemoteQuery nodes in MergeAppend plan. While executing MergeAppend plan, >>>> the ExecProcNode() is called on the underlying node, one after the other. >>>> For every RemoteQuery node in MergeAppend plan it waits till that node >>>> emits a row. This means that we do not execute the queries on datanodes in >>>> parallel when executing through MergeAppend plan unlike Sort plan where >>>> there is only one RemoteQuery node involved. >>>> >>>> In order to get full advantage of MergeAppend plan (and keep the code >>>> clean), I think we need to query the datanodes in parallel somewhere >>>> outside ExecProcNode(). Thoughts? Suggestions? >>>> >>> >>> Can we divide MergeAppend execution into two stage, one is just to issue >>> a query and then begin to receive rows. Because this is a change in the >>> core, we should submit the idea, the usecase and the code to PG community. >>> >>> >> This can be done if we move "issueing query" inside ExecInitRemoteQuery, >> but I am not sure if that can be applied in all the cases. This wouldn't >> require any change in PG. >> >> >>> Regards; >>> --- >>> Koichi Suzuki >>> >>> >>>> >>>> On Fri, May 24, 2013 at 1:30 AM, Ashutosh Bapat < >>>> ash...@en...> wrote: >>>> >>>>> Hi All, >>>>> While I hear Robert's presentation on query planning go wrong, I heard >>>>> (first time) about MergeAppend plan. It looks like, we have something >>>>> better to do with the way we are pushing the ORDER BY clauses and doing >>>>> merge at the coordinator. Right now we are using Sort node and in ExecSort >>>>> we start merging the rows. Instead, I think we should be using MergeAppend >>>>> node. The only bad thing about this approach is MergeAppend node expects >>>>> separate plans for each run of the sorted data. To do this, we need to >>>>> replicate the RemoteQuery node as many times as there are nodes and put >>>>> them in the list to the MergeAppend plan. I think that's going to solve the >>>>> problem with materialisation (see mail thread "Using remote sorting for >>>>> merge-join") and thus improve performance. It also takes away the need to >>>>> have xc_node_id in the tupleslot of tuplestore. Obviously, simplifying the >>>>> code. >>>>> >>>>> Any comments? >>>>> >>>>> Created 3613815 >>>>> <https://sourceforge.net/tracker/?func=detail&aid=3613815&group_id=311227&atid=1310232>and >>>>> assigned to me. >>>>> <https://sourceforge.net/tracker/?func=detail&aid=3613815&group_id=311227&atid=1310232> >>>>> >>>>> On Mon, Dec 3, 2012 at 3:44 AM, Ashutosh Bapat < >>>>> ash...@en...> wrote: >>>>> >>>>>> On Fri, Nov 30, 2012 at 9:37 AM, Michael Paquier >>>>>> <mic...@gm...> wrote: >>>>>> > Hi Ashutosh, >>>>>> > >>>>>> > Thanks for the patch. >>>>>> > I found for the time being the following problems: >>>>>> > 1) The patch has some whitespaces >>>>>> > /home/michael/download/xc_sort.patch:16: trailing whitespace. >>>>>> > >>>>>> > /home/michael/download/xc_sort.patch:34: trailing whitespace. >>>>>> > >>>>>> > /home/michael/download/xc_sort.patch:307: trailing whitespace. >>>>>> > return local_plan_left; >>>>>> > /home/michael/download/xc_sort.patch:356: trailing whitespace. >>>>>> > >>>>>> > /home/michael/download/xc_sort.patch:360: trailing whitespace. >>>>>> > >>>>>> >>>>>> Fixed. >>>>>> >>>>>> > error: patch failed: src/include/pgxc/execRemote.h:95 >>>>>> > 2) the patch does not apply cleanly even by using "patch -p1"-like >>>>>> commands >>>>>> >>>>>> When I sent the patch (more than a week before you replied) it was >>>>>> based on the HEAD, but then there were some commits which created the >>>>>> conflicts. The one attached should apply. >>>>>> >>>>>> > 3) No documentation about the new option enable_remotesort. >>>>>> > 4) enable_remotesort is not listed in postgresql.conf. >>>>>> > Please provide both documentation and reference in >>>>>> postgresql.conf.sample >>>>>> > for transparency of the code. As an open source project it is >>>>>> important to >>>>>> > classify all the options available, for both the user and the >>>>>> developers. >>>>>> >>>>>> Added although, I still don't approve of it myself. >>>>>> >>>>>> > 5) What is cruelly missing with this patch are numbers. By how much >>>>>> this >>>>>> > feature improves performance? >>>>>> > Simple tests like a ORDER BY on a table with millions of rows would >>>>>> be >>>>>> > enough, but as this is a performance feature, you should provide >>>>>> data about >>>>>> > how much those cases are improved. >>>>>> > 6) A complain that I say for each planner-related patch in this >>>>>> project.... >>>>>> > This patch has a size of 200KB, 65% being updates on the EXPLAIN >>>>>> queries of >>>>>> > the XC query planning regression tests. This makes the commit tree >>>>>> > unreadable and the evolution of the regression tests realy hard to >>>>>> follow >>>>>> > because they need to be modified each time a planner optimization is >>>>>> > introduced. We should move to a set of tests which is far more >>>>>> > code-modification proof! >>>>>> >>>>>> This is planner code, so it's going to affect the plans that it >>>>>> creates. In the test files we add EXPLAIN (with proper switches) to >>>>>> check 1. whether the plans are coming as expected at that stage of >>>>>> development, 2. Whether the clauses expected to be pushed down are >>>>>> being indeed pushed down 3. the queries to be sent to the datanodes >>>>>> are being crafted correctly. For every planner change, all the three >>>>>> things above change, for the operation being planned and hence the >>>>>> expected output changes are unavoidable. In fact, if you look at how >>>>>> the expected output changing you get a idea of how correctly/wrong the >>>>>> planner changes are made. Those diffs if go wrong, are great help to >>>>>> me in finding problems earlier. During merge as well, if those tests >>>>>> fail because of EXPLAIN outputs, that's a way to unwanted planner >>>>>> changes early enough and correct them. Sorry, for the huge change, but >>>>>> it's unavoidable. The testcases are intentionally prone to (planner) >>>>>> code modification. >>>>>> >>>>>> > >>>>>> > Regards, >>>>>> > >>>>>> > On Mon, Nov 26, 2012 at 8:42 PM, Ashutosh Bapat >>>>>> > <ash...@en...> wrote: >>>>>> >> >>>>>> >> Hi All, >>>>>> >> PFA patch which enables pushing down the ORDER BY clause or >>>>>> sorting to the >>>>>> >> Datanode/s through standard_planner(). >>>>>> >> >>>>>> >> This patch leverages the already existing merge sort mechanism for >>>>>> >> RemoteQuery but now outside RemoteQuery using Sort plans. >>>>>> >> >>>>>> >> For a Sort plan it checks if all the expressions on which the >>>>>> result needs >>>>>> >> to be sorted are shippable and that the underlying tree allows to >>>>>> push ORDER >>>>>> >> BY to the Datanodes. If so, it includes the ORDER BY clause in the >>>>>> query to >>>>>> >> be sent to the Datanodes. If the number of Datanodes which >>>>>> participate in >>>>>> >> RemoteQuery is not 1, it adds a covering Sort node to merge the >>>>>> results from >>>>>> >> the Datanodes, otherwise, there is no covering Sort plan. >>>>>> >> >>>>>> >> The merge sort mechanism treats every Datanode as a tape and >>>>>> prefetches >>>>>> >> the results from the Datanode corresponding to a dwindling tape. >>>>>> This is >>>>>> >> enabled by setting the node from where to pull the next row before >>>>>> calling >>>>>> >> ExecProcNode on RemoteQueryState. >>>>>> >> >>>>>> >> The same mechanism is being used for Sort plan node under Group or >>>>>> Agg >>>>>> >> nodes. >>>>>> >> >>>>>> >> The patch adds test xc_sort.sql for testing this functionality. It >>>>>> also >>>>>> >> removes the now useless members sort and tuplesortstate from >>>>>> RemoteQuery and >>>>>> >> RemoteQueryState resp. >>>>>> >> -- >>>>>> >> Best Wishes, >>>>>> >> Ashutosh Bapat >>>>>> >> EntepriseDB Corporation >>>>>> >> The Enterprise Postgres Company >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> ------------------------------------------------------------------------------ >>>>>> >> Monitor your physical, virtual and cloud infrastructure from a >>>>>> single >>>>>> >> web console. Get in-depth insight into apps, servers, databases, >>>>>> vmware, >>>>>> >> SAP, cloud infrastructure, etc. Download 30-day Free Trial. >>>>>> >> Pricing starts from $795 for 25 servers or applications! >>>>>> >> http://p.sf.net/sfu/zoho_dev2dev_nov >>>>>> >> _______________________________________________ >>>>>> >> Postgres-xc-developers mailing list >>>>>> >> Pos...@li... >>>>>> >> >>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>>> >> >>>>>> > >>>>>> > >>>>>> > >>>>>> > -- >>>>>> > Michael Paquier >>>>>> > http://michael.otacoo.com >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Best Wishes, >>>>>> Ashutosh Bapat >>>>>> EntepriseDB Corporation >>>>>> The Enterprise Postgres Company >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Best Wishes, >>>>> Ashutosh Bapat >>>>> EntepriseDB Corporation >>>>> The Postgres Database Company >>>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Postgres Database Company >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> Get your SQL database under version control now! >>>> Version control is standard for application code, but databases havent >>>> caught up. So what steps can you take to put your SQL databases under >>>> version control? Why should you start doing it? Read more to find out. >>>> >>>> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk >>>> >>>> _______________________________________________ >>>> Postgres-xc-developers mailing list >>>> Pos...@li... >>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>> >>>> >>> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Postgres Database Company >> > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Postgres Database Company |