Re: [Postgres-xc-developers] Pushing down ORDER BY clauses or Sorting

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Can it be optional so that planner can choose if ExecInitRemoteQuery just
issue a query or wait for the first row to come?  Then it is a planner
issue which to choose.

2013/8/7 Ashutosh Bapat <ash...@en...>

>
>
>
> On Wed, Aug 7, 2013 at 1:57 PM, Koichi Suzuki <koi...@gm...>wrote:
>
>> Hi;
>>
>>
>> 2013/8/6 Ashutosh Bapat <ash...@en...>
>>
>>> Hi All,
>>> I worked on this (patch attached) but I didn't see any performance
>>> improvement. In fact, to my surprise the performance actually dropped.
>>>
>>> On a true cluster, with 5 coordinators and 5 datanodes, I saw these
>>> numbers with 1000000 rows in a table with two integer columns. The times
>>> are average times required for 100 runs of the same query, returning whole
>>> rows ordered by one of the columns.
>>> No ORDER BY push-down - 1640 ms
>>> ORDER BY push-down using Sort plan - 1100 ms (this is the current
>>> implementation)
>>> ORDER BY Push-down using MergeAppend plan - 1170 ms
>>>
>>> The assumption behind trying this out was explained in my earlier mail
>>> in this thread. But it looks like we do not get any improvement.
>>>
>>> One of the possible reasons, I think, is involvement of multiple
>>> RemoteQuery nodes in MergeAppend plan. While executing MergeAppend plan,
>>> the ExecProcNode() is called on the underlying node, one after the other.
>>> For every RemoteQuery node in MergeAppend plan it waits till that node
>>> emits a row. This means that we do not execute the queries on datanodes in
>>> parallel when executing through MergeAppend plan unlike Sort plan where
>>> there is only one RemoteQuery node involved.
>>>
>>> In order to get full advantage of MergeAppend plan (and keep the code
>>> clean), I think we need to query the datanodes in parallel somewhere
>>> outside ExecProcNode(). Thoughts? Suggestions?
>>>
>>
>> Can we divide MergeAppend execution into two stage, one is just to issue
>> a query and then begin to receive rows.   Because this is a change in the
>> core, we should submit the idea, the usecase and the code to PG community.
>>
>>
> This can be done if we move "issueing query" inside ExecInitRemoteQuery,
> but I am not sure if that can be applied in all the cases. This wouldn't
> require any change in PG.
>
>
>> Regards;
>> ---
>> Koichi Suzuki
>>
>>
>>>
>>> On Fri, May 24, 2013 at 1:30 AM, Ashutosh Bapat <
>>> ash...@en...> wrote:
>>>
>>>> Hi All,
>>>> While I hear Robert's presentation on query planning go wrong, I heard
>>>> (first time) about MergeAppend plan. It looks like, we have something
>>>> better to do with the way we are pushing the ORDER BY clauses and doing
>>>> merge at the coordinator. Right now we are using Sort node and in ExecSort
>>>> we start merging the rows. Instead, I think we should be using MergeAppend
>>>> node. The only bad thing about this approach is MergeAppend node expects
>>>> separate plans for each run of the sorted data. To do this, we need to
>>>> replicate the RemoteQuery node as many times as there are nodes and put
>>>> them in the list to the MergeAppend plan. I think that's going to solve the
>>>> problem with materialisation (see mail thread "Using remote sorting for
>>>> merge-join") and thus improve performance. It also takes away the need to
>>>> have xc_node_id in the tupleslot of tuplestore. Obviously, simplifying the
>>>> code.
>>>>
>>>> Any comments?
>>>>
>>>> Created 3613815
>>>> <https://sourceforge.net/tracker/?func=detail&aid=3613815&group_id=311227&atid=1310232>and
>>>> assigned to me.
>>>> <https://sourceforge.net/tracker/?func=detail&aid=3613815&group_id=311227&atid=1310232>
>>>>
>>>> On Mon, Dec 3, 2012 at 3:44 AM, Ashutosh Bapat <
>>>> ash...@en...> wrote:
>>>>
>>>>> On Fri, Nov 30, 2012 at 9:37 AM, Michael Paquier
>>>>> <mic...@gm...> wrote:
>>>>> > Hi Ashutosh,
>>>>> >
>>>>> > Thanks for the patch.
>>>>> > I found for the time being the following problems:
>>>>> > 1) The patch has some whitespaces
>>>>> > /home/michael/download/xc_sort.patch:16: trailing whitespace.
>>>>> >
>>>>> > /home/michael/download/xc_sort.patch:34: trailing whitespace.
>>>>> >
>>>>> > /home/michael/download/xc_sort.patch:307: trailing whitespace.
>>>>> >         return local_plan_left;
>>>>> > /home/michael/download/xc_sort.patch:356: trailing whitespace.
>>>>> >
>>>>> > /home/michael/download/xc_sort.patch:360: trailing whitespace.
>>>>> >
>>>>>
>>>>> Fixed.
>>>>>
>>>>> > error: patch failed: src/include/pgxc/execRemote.h:95
>>>>> > 2) the patch does not apply cleanly even by using "patch -p1"-like
>>>>> commands
>>>>>
>>>>> When I sent the patch (more than a week before you replied) it was
>>>>> based on the HEAD, but then there were some commits which created the
>>>>> conflicts. The one attached should apply.
>>>>>
>>>>> > 3) No documentation about the new option enable_remotesort.
>>>>> > 4) enable_remotesort is not listed in postgresql.conf.
>>>>> > Please provide both documentation and reference in
>>>>> postgresql.conf.sample
>>>>> > for transparency of the code. As an open source project it is
>>>>> important to
>>>>> > classify all the options available, for both the user and the
>>>>> developers.
>>>>>
>>>>> Added although, I still don't approve of it myself.
>>>>>
>>>>> > 5) What is cruelly missing with this patch are numbers. By how much
>>>>> this
>>>>> > feature improves performance?
>>>>> > Simple tests like a ORDER BY on a table with millions of rows would
>>>>> be
>>>>> > enough, but as this is a performance feature, you should provide
>>>>> data about
>>>>> > how much those cases are improved.
>>>>> > 6) A complain that I say for each planner-related patch in this
>>>>> project....
>>>>> > This patch has a size of 200KB, 65% being updates on the EXPLAIN
>>>>> queries of
>>>>> > the XC query planning regression tests. This makes the commit tree
>>>>> > unreadable and the evolution of the regression tests realy hard to
>>>>> follow
>>>>> > because they need to be modified each time a planner optimization is
>>>>> > introduced. We should move to a set of tests which is far more
>>>>> > code-modification proof!
>>>>>
>>>>> This is planner code, so it's going to affect the plans that it
>>>>> creates. In the test files we add EXPLAIN (with proper switches) to
>>>>> check 1. whether the plans are coming as expected at that stage of
>>>>> development, 2. Whether the clauses expected to be pushed down are
>>>>> being indeed pushed down 3. the queries to be sent to the datanodes
>>>>> are being crafted correctly. For every planner change, all the three
>>>>> things above change, for the operation being planned and hence the
>>>>> expected output changes are unavoidable. In fact, if you look at how
>>>>> the expected output changing you get a idea of how correctly/wrong the
>>>>> planner changes are made. Those diffs if go wrong, are great help to
>>>>> me in finding problems earlier. During merge as well, if those tests
>>>>> fail because of EXPLAIN outputs, that's a way to unwanted planner
>>>>> changes early enough and correct them. Sorry, for the huge change, but
>>>>> it's unavoidable. The testcases are intentionally prone to (planner)
>>>>> code modification.
>>>>>
>>>>> >
>>>>> > Regards,
>>>>> >
>>>>> > On Mon, Nov 26, 2012 at 8:42 PM, Ashutosh Bapat
>>>>> > <ash...@en...> wrote:
>>>>> >>
>>>>> >> Hi All,
>>>>> >> PFA patch which enables pushing down the ORDER BY clause or sorting
>>>>> to the
>>>>> >> Datanode/s through standard_planner().
>>>>> >>
>>>>> >> This patch leverages the already existing merge sort mechanism for
>>>>> >> RemoteQuery but now outside RemoteQuery using Sort plans.
>>>>> >>
>>>>> >> For a Sort plan it checks if all the expressions on which the
>>>>> result needs
>>>>> >> to be sorted are shippable and that the underlying tree allows to
>>>>> push ORDER
>>>>> >> BY to the Datanodes. If so, it includes the ORDER BY clause in the
>>>>> query to
>>>>> >> be sent to the Datanodes. If the number of Datanodes which
>>>>> participate in
>>>>> >> RemoteQuery is not 1, it adds a covering Sort node to merge the
>>>>> results from
>>>>> >> the Datanodes, otherwise, there is no covering Sort plan.
>>>>> >>
>>>>> >> The merge sort mechanism treats every Datanode as a tape and
>>>>> prefetches
>>>>> >> the results from the Datanode corresponding to a dwindling tape.
>>>>> This is
>>>>> >> enabled by setting the node from where to pull the next row before
>>>>> calling
>>>>> >> ExecProcNode on RemoteQueryState.
>>>>> >>
>>>>> >> The same mechanism is being used for Sort plan node under Group or
>>>>> Agg
>>>>> >> nodes.
>>>>> >>
>>>>> >> The patch adds test xc_sort.sql for testing this functionality. It
>>>>> also
>>>>> >> removes the now useless members sort and tuplesortstate from
>>>>> RemoteQuery and
>>>>> >> RemoteQueryState resp.
>>>>> >> --
>>>>> >> Best Wishes,
>>>>> >> Ashutosh Bapat
>>>>> >> EntepriseDB Corporation
>>>>> >> The Enterprise Postgres Company
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> ------------------------------------------------------------------------------
>>>>> >> Monitor your physical, virtual and cloud infrastructure from a
>>>>> single
>>>>> >> web console. Get in-depth insight into apps, servers, databases,
>>>>> vmware,
>>>>> >> SAP, cloud infrastructure, etc. Download 30-day Free Trial.
>>>>> >> Pricing starts from $795 for 25 servers or applications!
>>>>> >> http://p.sf.net/sfu/zoho_dev2dev_nov
>>>>> >> _______________________________________________
>>>>> >> Postgres-xc-developers mailing list
>>>>> >> Pos...@li...
>>>>> >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>>> >>
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Michael Paquier
>>>>> > http://michael.otacoo.com
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Wishes,
>>>>> Ashutosh Bapat
>>>>> EntepriseDB Corporation
>>>>> The Enterprise Postgres Company
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Wishes,
>>>> Ashutosh Bapat
>>>> EntepriseDB Corporation
>>>> The Postgres Database Company
>>>>
>>>
>>>
>>>
>>> --
>>> Best Wishes,
>>> Ashutosh Bapat
>>> EntepriseDB Corporation
>>> The Postgres Database Company
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Get your SQL database under version control now!
>>> Version control is standard for application code, but databases havent
>>> caught up. So what steps can you take to put your SQL databases under
>>> version control? Why should you start doing it? Read more to find out.
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
>>>
>>> _______________________________________________
>>> Postgres-xc-developers mailing list
>>> Pos...@li...
>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>
>>>
>>
>
>
> --
> Best Wishes,
> Ashutosh Bapat
> EntepriseDB Corporation
> The Postgres Database Company
>