From: Ashutosh B. <ash...@en...> - 2013-05-23 20:00:44
|
Hi All, While I hear Robert's presentation on query planning go wrong, I heard (first time) about MergeAppend plan. It looks like, we have something better to do with the way we are pushing the ORDER BY clauses and doing merge at the coordinator. Right now we are using Sort node and in ExecSort we start merging the rows. Instead, I think we should be using MergeAppend node. The only bad thing about this approach is MergeAppend node expects separate plans for each run of the sorted data. To do this, we need to replicate the RemoteQuery node as many times as there are nodes and put them in the list to the MergeAppend plan. I think that's going to solve the problem with materialisation (see mail thread "Using remote sorting for merge-join") and thus improve performance. It also takes away the need to have xc_node_id in the tupleslot of tuplestore. Obviously, simplifying the code. Any comments? Created 3613815 <https://sourceforge.net/tracker/?func=detail&aid=3613815&group_id=311227&atid=1310232>and assigned to me. <https://sourceforge.net/tracker/?func=detail&aid=3613815&group_id=311227&atid=1310232> On Mon, Dec 3, 2012 at 3:44 AM, Ashutosh Bapat < ash...@en...> wrote: > On Fri, Nov 30, 2012 at 9:37 AM, Michael Paquier > <mic...@gm...> wrote: > > Hi Ashutosh, > > > > Thanks for the patch. > > I found for the time being the following problems: > > 1) The patch has some whitespaces > > /home/michael/download/xc_sort.patch:16: trailing whitespace. > > > > /home/michael/download/xc_sort.patch:34: trailing whitespace. > > > > /home/michael/download/xc_sort.patch:307: trailing whitespace. > > return local_plan_left; > > /home/michael/download/xc_sort.patch:356: trailing whitespace. > > > > /home/michael/download/xc_sort.patch:360: trailing whitespace. > > > > Fixed. > > > error: patch failed: src/include/pgxc/execRemote.h:95 > > 2) the patch does not apply cleanly even by using "patch -p1"-like > commands > > When I sent the patch (more than a week before you replied) it was > based on the HEAD, but then there were some commits which created the > conflicts. The one attached should apply. > > > 3) No documentation about the new option enable_remotesort. > > 4) enable_remotesort is not listed in postgresql.conf. > > Please provide both documentation and reference in postgresql.conf.sample > > for transparency of the code. As an open source project it is important > to > > classify all the options available, for both the user and the developers. > > Added although, I still don't approve of it myself. > > > 5) What is cruelly missing with this patch are numbers. By how much this > > feature improves performance? > > Simple tests like a ORDER BY on a table with millions of rows would be > > enough, but as this is a performance feature, you should provide data > about > > how much those cases are improved. > > 6) A complain that I say for each planner-related patch in this > project.... > > This patch has a size of 200KB, 65% being updates on the EXPLAIN queries > of > > the XC query planning regression tests. This makes the commit tree > > unreadable and the evolution of the regression tests realy hard to follow > > because they need to be modified each time a planner optimization is > > introduced. We should move to a set of tests which is far more > > code-modification proof! > > This is planner code, so it's going to affect the plans that it > creates. In the test files we add EXPLAIN (with proper switches) to > check 1. whether the plans are coming as expected at that stage of > development, 2. Whether the clauses expected to be pushed down are > being indeed pushed down 3. the queries to be sent to the datanodes > are being crafted correctly. For every planner change, all the three > things above change, for the operation being planned and hence the > expected output changes are unavoidable. In fact, if you look at how > the expected output changing you get a idea of how correctly/wrong the > planner changes are made. Those diffs if go wrong, are great help to > me in finding problems earlier. During merge as well, if those tests > fail because of EXPLAIN outputs, that's a way to unwanted planner > changes early enough and correct them. Sorry, for the huge change, but > it's unavoidable. The testcases are intentionally prone to (planner) > code modification. > > > > > Regards, > > > > On Mon, Nov 26, 2012 at 8:42 PM, Ashutosh Bapat > > <ash...@en...> wrote: > >> > >> Hi All, > >> PFA patch which enables pushing down the ORDER BY clause or sorting to > the > >> Datanode/s through standard_planner(). > >> > >> This patch leverages the already existing merge sort mechanism for > >> RemoteQuery but now outside RemoteQuery using Sort plans. > >> > >> For a Sort plan it checks if all the expressions on which the result > needs > >> to be sorted are shippable and that the underlying tree allows to push > ORDER > >> BY to the Datanodes. If so, it includes the ORDER BY clause in the > query to > >> be sent to the Datanodes. If the number of Datanodes which participate > in > >> RemoteQuery is not 1, it adds a covering Sort node to merge the results > from > >> the Datanodes, otherwise, there is no covering Sort plan. > >> > >> The merge sort mechanism treats every Datanode as a tape and prefetches > >> the results from the Datanode corresponding to a dwindling tape. This is > >> enabled by setting the node from where to pull the next row before > calling > >> ExecProcNode on RemoteQueryState. > >> > >> The same mechanism is being used for Sort plan node under Group or Agg > >> nodes. > >> > >> The patch adds test xc_sort.sql for testing this functionality. It also > >> removes the now useless members sort and tuplesortstate from > RemoteQuery and > >> RemoteQueryState resp. > >> -- > >> Best Wishes, > >> Ashutosh Bapat > >> EntepriseDB Corporation > >> The Enterprise Postgres Company > >> > >> > >> > >> > ------------------------------------------------------------------------------ > >> Monitor your physical, virtual and cloud infrastructure from a single > >> web console. Get in-depth insight into apps, servers, databases, vmware, > >> SAP, cloud infrastructure, etc. Download 30-day Free Trial. > >> Pricing starts from $795 for 25 servers or applications! > >> http://p.sf.net/sfu/zoho_dev2dev_nov > >> _______________________________________________ > >> Postgres-xc-developers mailing list > >> Pos...@li... > >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >> > > > > > > > > -- > > Michael Paquier > > http://michael.otacoo.com > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Postgres Database Company |