Re: [Postgres-xc-developers] Pushing down ORDER BY clauses or Sorting

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Yes, that's a possibility. Although, we need a robust executor interface,
so that this separation would work without any problem. But, unfortunately,
current RemoteQuery executor is fragile.

On Thu, Aug 8, 2013 at 11:41 AM, Koichi Suzuki <koi...@gm...>wrote:

> Can it be optional so that planner can choose if ExecInitRemoteQuery just
> issue a query or wait for the first row to come?  Then it is a planner
> issue which to choose.
>
>
> 2013/8/7 Ashutosh Bapat <ash...@en...>
>
>>
>>
>>
>> On Wed, Aug 7, 2013 at 1:57 PM, Koichi Suzuki <koi...@gm...>wrote:
>>
>>> Hi;
>>>
>>>
>>> 2013/8/6 Ashutosh Bapat <ash...@en...>
>>>
>>>> Hi All,
>>>> I worked on this (patch attached) but I didn't see any performance
>>>> improvement. In fact, to my surprise the performance actually dropped.
>>>>
>>>> On a true cluster, with 5 coordinators and 5 datanodes, I saw these
>>>> numbers with 1000000 rows in a table with two integer columns. The times
>>>> are average times required for 100 runs of the same query, returning whole
>>>> rows ordered by one of the columns.
>>>> No ORDER BY push-down - 1640 ms
>>>> ORDER BY push-down using Sort plan - 1100 ms (this is the current
>>>> implementation)
>>>> ORDER BY Push-down using MergeAppend plan - 1170 ms
>>>>
>>>> The assumption behind trying this out was explained in my earlier mail
>>>> in this thread. But it looks like we do not get any improvement.
>>>>
>>>> One of the possible reasons, I think, is involvement of multiple
>>>> RemoteQuery nodes in MergeAppend plan. While executing MergeAppend plan,
>>>> the ExecProcNode() is called on the underlying node, one after the other.
>>>> For every RemoteQuery node in MergeAppend plan it waits till that node
>>>> emits a row. This means that we do not execute the queries on datanodes in
>>>> parallel when executing through MergeAppend plan unlike Sort plan where
>>>> there is only one RemoteQuery node involved.
>>>>
>>>> In order to get full advantage of MergeAppend plan (and keep the code
>>>> clean), I think we need to query the datanodes in parallel somewhere
>>>> outside ExecProcNode(). Thoughts? Suggestions?
>>>>
>>>
>>> Can we divide MergeAppend execution into two stage, one is just to issue
>>> a query and then begin to receive rows.   Because this is a change in the
>>> core, we should submit the idea, the usecase and the code to PG community.
>>>
>>>
>> This can be done if we move "issueing query" inside ExecInitRemoteQuery,
>> but I am not sure if that can be applied in all the cases. This wouldn't
>> require any change in PG.
>>
>>
>>> Regards;
>>> ---
>>> Koichi Suzuki
>>>
>>>
>>>>
>>>> On Fri, May 24, 2013 at 1:30 AM, Ashutosh Bapat <
>>>> ash...@en...> wrote:
>>>>
>>>>> Hi All,
>>>>> While I hear Robert's presentation on query planning go wrong, I heard
>>>>> (first time) about MergeAppend plan. It looks like, we have something
>>>>> better to do with the way we are pushing the ORDER BY clauses and doing
>>>>> merge at the coordinator. Right now we are using Sort node and in ExecSort
>>>>> we start merging the rows. Instead, I think we should be using MergeAppend
>>>>> node. The only bad thing about this approach is MergeAppend node expects
>>>>> separate plans for each run of the sorted data. To do this, we need to
>>>>> replicate the RemoteQuery node as many times as there are nodes and put
>>>>> them in the list to the MergeAppend plan. I think that's going to solve the
>>>>> problem with materialisation (see mail thread "Using remote sorting for
>>>>> merge-join") and thus improve performance. It also takes away the need to
>>>>> have xc_node_id in the tupleslot of tuplestore. Obviously, simplifying the
>>>>> code.
>>>>>
>>>>> Any comments?
>>>>>
>>>>> Created 3613815
>>>>> <https://sourceforge.net/tracker/?func=detail&aid=3613815&group_id=311227&atid=1310232>and
>>>>> assigned to me.
>>>>> <https://sourceforge.net/tracker/?func=detail&aid=3613815&group_id=311227&atid=1310232>
>>>>>
>>>>> On Mon, Dec 3, 2012 at 3:44 AM, Ashutosh Bapat <
>>>>> ash...@en...> wrote:
>>>>>
>>>>>> On Fri, Nov 30, 2012 at 9:37 AM, Michael Paquier
>>>>>> <mic...@gm...> wrote:
>>>>>> > Hi Ashutosh,
>>>>>> >
>>>>>> > Thanks for the patch.
>>>>>> > I found for the time being the following problems:
>>>>>> > 1) The patch has some whitespaces
>>>>>> > /home/michael/download/xc_sort.patch:16: trailing whitespace.
>>>>>> >
>>>>>> > /home/michael/download/xc_sort.patch:34: trailing whitespace.
>>>>>> >
>>>>>> > /home/michael/download/xc_sort.patch:307: trailing whitespace.
>>>>>> >         return local_plan_left;
>>>>>> > /home/michael/download/xc_sort.patch:356: trailing whitespace.
>>>>>> >
>>>>>> > /home/michael/download/xc_sort.patch:360: trailing whitespace.
>>>>>> >
>>>>>>
>>>>>> Fixed.
>>>>>>
>>>>>> > error: patch failed: src/include/pgxc/execRemote.h:95
>>>>>> > 2) the patch does not apply cleanly even by using "patch -p1"-like
>>>>>> commands
>>>>>>
>>>>>> When I sent the patch (more than a week before you replied) it was
>>>>>> based on the HEAD, but then there were some commits which created the
>>>>>> conflicts. The one attached should apply.
>>>>>>
>>>>>> > 3) No documentation about the new option enable_remotesort.
>>>>>> > 4) enable_remotesort is not listed in postgresql.conf.
>>>>>> > Please provide both documentation and reference in
>>>>>> postgresql.conf.sample
>>>>>> > for transparency of the code. As an open source project it is
>>>>>> important to
>>>>>> > classify all the options available, for both the user and the
>>>>>> developers.
>>>>>>
>>>>>> Added although, I still don't approve of it myself.
>>>>>>
>>>>>> > 5) What is cruelly missing with this patch are numbers. By how much
>>>>>> this
>>>>>> > feature improves performance?
>>>>>> > Simple tests like a ORDER BY on a table with millions of rows would
>>>>>> be
>>>>>> > enough, but as this is a performance feature, you should provide
>>>>>> data about
>>>>>> > how much those cases are improved.
>>>>>> > 6) A complain that I say for each planner-related patch in this
>>>>>> project....
>>>>>> > This patch has a size of 200KB, 65% being updates on the EXPLAIN
>>>>>> queries of
>>>>>> > the XC query planning regression tests. This makes the commit tree
>>>>>> > unreadable and the evolution of the regression tests realy hard to
>>>>>> follow
>>>>>> > because they need to be modified each time a planner optimization is
>>>>>> > introduced. We should move to a set of tests which is far more
>>>>>> > code-modification proof!
>>>>>>
>>>>>> This is planner code, so it's going to affect the plans that it
>>>>>> creates. In the test files we add EXPLAIN (with proper switches) to
>>>>>> check 1. whether the plans are coming as expected at that stage of
>>>>>> development, 2. Whether the clauses expected to be pushed down are
>>>>>> being indeed pushed down 3. the queries to be sent to the datanodes
>>>>>> are being crafted correctly. For every planner change, all the three
>>>>>> things above change, for the operation being planned and hence the
>>>>>> expected output changes are unavoidable. In fact, if you look at how
>>>>>> the expected output changing you get a idea of how correctly/wrong the
>>>>>> planner changes are made. Those diffs if go wrong, are great help to
>>>>>> me in finding problems earlier. During merge as well, if those tests
>>>>>> fail because of EXPLAIN outputs, that's a way to unwanted planner
>>>>>> changes early enough and correct them. Sorry, for the huge change, but
>>>>>> it's unavoidable. The testcases are intentionally prone to (planner)
>>>>>> code modification.
>>>>>>
>>>>>> >
>>>>>> > Regards,
>>>>>> >
>>>>>> > On Mon, Nov 26, 2012 at 8:42 PM, Ashutosh Bapat
>>>>>> > <ash...@en...> wrote:
>>>>>> >>
>>>>>> >> Hi All,
>>>>>> >> PFA patch which enables pushing down the ORDER BY clause or
>>>>>> sorting to the
>>>>>> >> Datanode/s through standard_planner().
>>>>>> >>
>>>>>> >> This patch leverages the already existing merge sort mechanism for
>>>>>> >> RemoteQuery but now outside RemoteQuery using Sort plans.
>>>>>> >>
>>>>>> >> For a Sort plan it checks if all the expressions on which the
>>>>>> result needs
>>>>>> >> to be sorted are shippable and that the underlying tree allows to
>>>>>> push ORDER
>>>>>> >> BY to the Datanodes. If so, it includes the ORDER BY clause in the
>>>>>> query to
>>>>>> >> be sent to the Datanodes. If the number of Datanodes which
>>>>>> participate in
>>>>>> >> RemoteQuery is not 1, it adds a covering Sort node to merge the
>>>>>> results from
>>>>>> >> the Datanodes, otherwise, there is no covering Sort plan.
>>>>>> >>
>>>>>> >> The merge sort mechanism treats every Datanode as a tape and
>>>>>> prefetches
>>>>>> >> the results from the Datanode corresponding to a dwindling tape.
>>>>>> This is
>>>>>> >> enabled by setting the node from where to pull the next row before
>>>>>> calling
>>>>>> >> ExecProcNode on RemoteQueryState.
>>>>>> >>
>>>>>> >> The same mechanism is being used for Sort plan node under Group or
>>>>>> Agg
>>>>>> >> nodes.
>>>>>> >>
>>>>>> >> The patch adds test xc_sort.sql for testing this functionality. It
>>>>>> also
>>>>>> >> removes the now useless members sort and tuplesortstate from
>>>>>> RemoteQuery and
>>>>>> >> RemoteQueryState resp.
>>>>>> >> --
>>>>>> >> Best Wishes,
>>>>>> >> Ashutosh Bapat
>>>>>> >> EntepriseDB Corporation
>>>>>> >> The Enterprise Postgres Company
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> ------------------------------------------------------------------------------
>>>>>> >> Monitor your physical, virtual and cloud infrastructure from a
>>>>>> single
>>>>>> >> web console. Get in-depth insight into apps, servers, databases,
>>>>>> vmware,
>>>>>> >> SAP, cloud infrastructure, etc. Download 30-day Free Trial.
>>>>>> >> Pricing starts from $795 for 25 servers or applications!
>>>>>> >> http://p.sf.net/sfu/zoho_dev2dev_nov
>>>>>> >> _______________________________________________
>>>>>> >> Postgres-xc-developers mailing list
>>>>>> >> Pos...@li...
>>>>>> >>
>>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Michael Paquier
>>>>>> > http://michael.otacoo.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Wishes,
>>>>>> Ashutosh Bapat
>>>>>> EntepriseDB Corporation
>>>>>> The Enterprise Postgres Company
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Wishes,
>>>>> Ashutosh Bapat
>>>>> EntepriseDB Corporation
>>>>> The Postgres Database Company
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Wishes,
>>>> Ashutosh Bapat
>>>> EntepriseDB Corporation
>>>> The Postgres Database Company
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Get your SQL database under version control now!
>>>> Version control is standard for application code, but databases havent
>>>> caught up. So what steps can you take to put your SQL databases under
>>>> version control? Why should you start doing it? Read more to find out.
>>>>
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
>>>>
>>>> _______________________________________________
>>>> Postgres-xc-developers mailing list
>>>> Pos...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>>
>>>>
>>>
>>
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EntepriseDB Corporation
>> The Postgres Database Company
>>
>
>

-- 
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company