From: Koichi S. <koi...@gm...> - 2013-08-07 08:28:07
Hi;

2013/8/6 Ashutosh Bapat <ash...@en...>

> Hi All,
> I worked on this (patch attached) but I didn't see any performance improvement. In fact, to my surprise the performance actually dropped.
>
> On a true cluster, with 5 coordinators and 5 datanodes, I saw these numbers with 1000000 rows in a table with two integer columns. The times are average times required for 100 runs of the same query, returning whole rows ordered by one of the columns.
> No ORDER BY push-down - 1640 ms
> ORDER BY push-down using Sort plan - 1100 ms (this is the current implementation)
> ORDER BY push-down using MergeAppend plan - 1170 ms
>
> The assumption behind trying this out was explained in my earlier mail in this thread. But it looks like we do not get any improvement.
>
> One of the possible reasons, I think, is the involvement of multiple RemoteQuery nodes in the MergeAppend plan. While executing a MergeAppend plan, ExecProcNode() is called on the underlying nodes one after the other, and for every RemoteQuery node in the MergeAppend plan it waits till that node emits a row. This means that we do not execute the queries on the datanodes in parallel when executing through a MergeAppend plan, unlike the Sort plan, where there is only one RemoteQuery node involved.
>
> In order to get the full advantage of the MergeAppend plan (and keep the code clean), I think we need to query the datanodes in parallel somewhere outside ExecProcNode(). Thoughts? Suggestions?

Can we divide MergeAppend execution into two stages, one that just issues the queries and another that then begins to receive the rows? Because this is a change in the core, we should submit the idea, the use case and the code to the PG community.

Regards;
---
Koichi Suzuki
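A minimal standalone sketch of the two-stage idea, using plain libpq rather than XC's internal connection handling (the connection strings, table and query below are placeholders, not XC code): stage one dispatches the already-built ORDER BY query to every datanode before any row is requested, so the datanodes produce their sorted runs concurrently; stage two is the only place rows are pulled.

/* Not XC code: a libpq-only model.  Connection strings, table and
 * query are placeholders. */
#include <stdio.h>
#include <libpq-fe.h>

#define NNODES 2

int main(void)
{
    const char *conninfo[NNODES] = {
        "host=datanode1 dbname=postgres",
        "host=datanode2 dbname=postgres"
    };
    const char *sql = "SELECT a, b FROM t ORDER BY a"; /* ORDER BY already pushed down */
    PGconn     *conn[NNODES];
    int         i;

    /* Stage 1: dispatch the query to every datanode up front.
     * PQsendQuery() returns as soon as the query is on the wire, so all
     * datanodes start producing their sorted runs concurrently. */
    for (i = 0; i < NNODES; i++)
    {
        conn[i] = PQconnectdb(conninfo[i]);
        if (PQstatus(conn[i]) != CONNECTION_OK ||
            !PQsendQuery(conn[i], sql))
        {
            fprintf(stderr, "node %d: %s", i, PQerrorMessage(conn[i]));
            return 1;
        }
        PQsetSingleRowMode(conn[i]);    /* stream rows instead of buffering */
    }

    /* Stage 2: only now pull rows.  A MergeAppend-style executor would
     * merge these streams by sort key; here they are simply drained to
     * show that no query is dispatched inside the pull loop. */
    for (i = 0; i < NNODES; i++)
    {
        PGresult *res;

        while ((res = PQgetResult(conn[i])) != NULL)
        {
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
                printf("node %d: %s\n", i, PQgetvalue(res, 0, 0));
            PQclear(res);
        }
        PQfinish(conn[i]);
    }
    return 0;
}

The point is only the ordering of operations: with all queries sent before the first pull, the wait for the first row from each node overlaps the work on every other node, which is what the serial per-child ExecProcNode() calls in MergeAppend currently prevent.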
> On Fri, May 24, 2013 at 1:30 AM, Ashutosh Bapat <ash...@en...> wrote:
>
>> Hi All,
>> While listening to Robert's presentation on query planning gone wrong, I heard (for the first time) about the MergeAppend plan. It looks like we can do something better with the way we are pushing down the ORDER BY clauses and doing the merge at the coordinator. Right now we are using a Sort node, and in ExecSort we start merging the rows. Instead, I think we should be using a MergeAppend node. The only bad thing about this approach is that a MergeAppend node expects a separate plan for each run of the sorted data. To do this, we need to replicate the RemoteQuery node as many times as there are nodes and put those copies in the subplan list of the MergeAppend plan. I think that's going to solve the problem with materialisation (see mail thread "Using remote sorting for merge-join") and thus improve performance. It also takes away the need to have xc_node_id in the tuple slot of the tuplestore, obviously simplifying the code.
>>
>> Any comments?
>>
>> Created tracker item 3613815 <https://sourceforge.net/tracker/?func=detail&aid=3613815&group_id=311227&atid=1310232> and assigned it to me.
>>
>> On Mon, Dec 3, 2012 at 3:44 AM, Ashutosh Bapat <ash...@en...> wrote:
>>
>>> On Fri, Nov 30, 2012 at 9:37 AM, Michael Paquier <mic...@gm...> wrote:
>>>
>>>> Hi Ashutosh,
>>>>
>>>> Thanks for the patch.
>>>> I found for the time being the following problems:
>>>> 1) The patch has some whitespace errors:
>>>> /home/michael/download/xc_sort.patch:16: trailing whitespace.
>>>> /home/michael/download/xc_sort.patch:34: trailing whitespace.
>>>> /home/michael/download/xc_sort.patch:307: trailing whitespace.
>>>>     return local_plan_left;
>>>> /home/michael/download/xc_sort.patch:356: trailing whitespace.
>>>> /home/michael/download/xc_sort.patch:360: trailing whitespace.
>>>
>>> Fixed.
>>>
>>>> error: patch failed: src/include/pgxc/execRemote.h:95
>>>> 2) The patch does not apply cleanly, even when using "patch -p1"-like commands.
>>>
>>> When I sent the patch (more than a week before you replied) it was based on HEAD, but there have been some commits since then which created the conflicts. The one attached should apply.
>>>
>>>> 3) No documentation about the new option enable_remotesort.
>>>> 4) enable_remotesort is not listed in postgresql.conf.
>>>> Please provide both documentation and a reference in postgresql.conf.sample for transparency of the code. As an open source project it is important to classify all the options available, for both the users and the developers.
>>>
>>> Added, although I still don't approve of it myself.
>>>
>>>> 5) What is cruelly missing from this patch are numbers. By how much does this feature improve performance?
>>>> Simple tests like an ORDER BY on a table with millions of rows would be enough, but as this is a performance feature, you should provide data about how much those cases are improved.
>>>> 6) A complaint that I make for each planner-related patch in this project: this patch has a size of 200KB, 65% of which is updates to the EXPLAIN output of the XC query planning regression tests. This makes the commit tree unreadable and the evolution of the regression tests really hard to follow, because they need to be modified each time a planner optimization is introduced. We should move to a set of tests which is far more code-modification proof!
>>>
>>> This is planner code, so it's going to affect the plans it creates. In the test files we add EXPLAIN (with the proper switches) to check 1. whether the plans come out as expected at that stage of development, 2. whether the clauses expected to be pushed down are indeed being pushed down, and 3. whether the queries to be sent to the datanodes are being crafted correctly. For every planner change, all three of the above change for the operation being planned, and hence the expected-output changes are unavoidable. In fact, looking at how the expected output changes gives you an idea of how correctly (or wrongly) the planner changes were made. Those diffs, if they go wrong, are a great help to me in finding problems earlier. During a merge as well, if those tests fail because of EXPLAIN output, that's a way to catch unwanted planner changes early enough and correct them. Sorry for the huge change, but it's unavoidable. The test cases are intentionally sensitive to (planner) code modification.
>>>
>>>> Regards,
>>>>
>>>> On Mon, Nov 26, 2012 at 8:42 PM, Ashutosh Bapat <ash...@en...> wrote:
>>>>
>>>>> Hi All,
>>>>> PFA a patch which enables pushing down the ORDER BY clause, i.e. the sorting, to the Datanode(s) through standard_planner().
>>>>>
>>>>> This patch leverages the already existing merge sort mechanism for RemoteQuery, but now outside RemoteQuery, using Sort plans.
>>>>>
>>>>> For a Sort plan it checks whether all the expressions on which the result needs to be sorted are shippable and whether the underlying tree allows the ORDER BY to be pushed to the Datanodes. If so, it includes the ORDER BY clause in the query to be sent to the Datanodes. If the number of Datanodes participating in the RemoteQuery is not 1, it adds a covering Sort node to merge the results from the Datanodes; otherwise, there is no covering Sort plan.
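As a rough standalone model of that last paragraph (all names below are invented for illustration and do not exist in the XC source): the ORDER BY is shipped only when every sort expression is shippable, and a coordinator-side merge is kept only when more than one datanode participates.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

/* Hypothetical model only: none of these names exist in the XC source. */
typedef struct SortKey
{
    const char *expr;       /* e.g. a column name                  */
    bool        shippable;  /* can the datanode evaluate/sort it?  */
} SortKey;

/*
 * Build the query sent to the datanodes and report whether the
 * coordinator still needs a Sort/merge step on top of the remote scan.
 */
static bool
need_coordinator_sort(const char *base_query,
                      const SortKey *keys, int nkeys,
                      int ndatanodes,
                      char *remote_query, size_t len)
{
    int i;

    /* Any non-shippable sort expression: send the plain query and
     * sort the whole result on the coordinator. */
    for (i = 0; i < nkeys; i++)
    {
        if (!keys[i].shippable)
        {
            snprintf(remote_query, len, "%s", base_query);
            return true;
        }
    }

    /* All keys shippable: append ORDER BY so each datanode returns a
     * sorted run. */
    snprintf(remote_query, len, "%s ORDER BY ", base_query);
    for (i = 0; i < nkeys; i++)
    {
        strncat(remote_query, keys[i].expr, len - strlen(remote_query) - 1);
        if (i < nkeys - 1)
            strncat(remote_query, ", ", len - strlen(remote_query) - 1);
    }

    /* One participating datanode: its stream is already fully sorted.
     * Several datanodes: their sorted runs must still be merged above. */
    return ndatanodes > 1;
}

int main(void)
{
    SortKey keys[] = { { "a", true } };
    char    q[256];
    bool    merge = need_coordinator_sort("SELECT a, b FROM t", keys, 1, 5,
                                          q, sizeof(q));

    printf("remote query: %s\n", q);
    printf("covering sort/merge on coordinator: %s\n", merge ? "yes" : "no");
    return 0;
}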
>>>>> The merge sort mechanism treats every Datanode as a tape and prefetches the results from the Datanode corresponding to the dwindling tape. This is enabled by setting the node from which to pull the next row before calling ExecProcNode on the RemoteQueryState.
>>>>>
>>>>> The same mechanism is used for a Sort plan node under Group or Agg nodes.
>>>>>
>>>>> The patch adds the test xc_sort.sql for testing this functionality. It also removes the now useless members sort and tuplesortstate from RemoteQuery and RemoteQueryState respectively.
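The tape behaviour described in the quoted patch description can be shown in isolation: each datanode is one pre-sorted tape, and after a row is emitted only the tape it came from is asked for its next row. The sketch below models the tapes as in-memory arrays; it is not XC code.

#include <stdio.h>
#include <limits.h>

#define NTAPES 3

/* Each "tape" stands for one datanode returning rows already sorted by
 * the pushed-down ORDER BY key.  Values are hard-coded for the demo;
 * INT_MAX marks end-of-tape. */
static const int tape_rows[NTAPES][4] = {
    { 1, 4, 7, INT_MAX },
    { 2, 3, 9, INT_MAX },
    { 5, 6, 8, INT_MAX }
};

int main(void)
{
    int pos[NTAPES] = { 0, 0, 0 };   /* next unread row on each tape */
    int head[NTAPES];                /* the prefetched row per tape  */
    int t;

    /* Prime every tape once: fetch the first row from each datanode. */
    for (t = 0; t < NTAPES; t++)
        head[t] = tape_rows[t][pos[t]++];

    for (;;)
    {
        /* Pick the tape whose prefetched row is smallest ... */
        int best = 0;
        for (t = 1; t < NTAPES; t++)
            if (head[t] < head[best])
                best = t;
        if (head[best] == INT_MAX)
            break;                   /* all tapes exhausted */

        printf("emit %d from tape %d\n", head[best], best);

        /* ... and refill only that tape: this is the "dwindling tape"
         * prefetch -- no other datanode is touched for this step. */
        head[best] = tape_rows[best][pos[best]++];
    }
    return 0;
}

The "refill only the winning tape" step is the counterpart of what the quoted mail describes as setting, on the RemoteQueryState, which node to pull the next row from before the next ExecProcNode call.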