From: Manikandan S. <ma...@vt...> - 2015-06-04 06:42:04
Yes, both the tables are distributed by 'id'.
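
(For reference, a sketch of the DDL; the column types here are
hypothetical, only the DISTRIBUTE BY clause matters:

    -- column types are illustrative; only the distribution clause matters
    CREATE TABLE Table1 (id integer, col1 integer, col2 integer)
        DISTRIBUTE BY HASH (id);
    CREATE TABLE Table2 (id integer, time integer)
        DISTRIBUTE BY HASH (id);
)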
On Thu, 2015-06-04 at 12:08 +0530, Ashutosh Bapat wrote:
>
>
> On Thu, Jun 4, 2015 at 6:38 AM, Abbas Butt
> <abb...@en...> wrote:
>
>
> On Wed, Jun 3, 2015 at 5:10 PM, Manikandan Soundarapandian
> <ma...@vt...> wrote:
> Hi Abbas,
>
> I just looked at the plan for the query that I am
> running. This is the query:
>
>     SELECT d.col1 RVALUE, ROUND(COUNT(d.col2)/10) COUNT
>     FROM Table1 d, Table2 i
>     WHERE d.id = i.id
>       AND i.time BETWEEN TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY') + 1 - 365)
>                      AND TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY'))
>     GROUP BY d.col1
>     ORDER BY d.col1 DESC;
>
> And here is the query plan for a configuration with 4
> datanodes and 1 coordinator:
>
>     GroupAggregate  (cost=0.07..0.10 rows=1 width=56)
>       ->  Sort  (cost=0.07..0.08 rows=1 width=56)
>             Sort Key: d.col1
>             ->  Hash Join  (cost=0.01..0.06 rows=1 width=56)
>                   Hash Cond: (d.id = i.id)
>
>
> Are both the tables distributed by id?
>
> The clauses with mutable functions like current_date are being pushed
> down to the individual relations. That prevents the join from being
> pushed down to the datanodes even when id is the distribution key. I
> think that should be fixed: such clauses should be moved back up to
> the Join node.
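>
> (As an illustration of a workaround I would expect to help until
> that is fixed, not something I have tested on your setup: if the
> date bounds are computed once on the client and passed in as
> constants, no mutable function is left in the qual, so nothing
> should stop the whole join from being shipped. A sketch, using
> hypothetical psql variables for the precomputed bounds:
>
>     -- placeholder values; compute the real bounds from CURRENT_DATE client-side
>     \set lower_bound 5249
>     \set upper_bound 5613
>     SELECT d.col1 RVALUE, ROUND(COUNT(d.col2)/10) COUNT
>     FROM Table1 d, Table2 i
>     WHERE d.id = i.id
>       AND i.time BETWEEN :lower_bound AND :upper_bound
>     GROUP BY d.col1
>     ORDER BY d.col1 DESC;
> )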
>
>
>                   ->  Data Node Scan on Table1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=56)
>                         Node/s: datanode_c2_d1, datanode_c3_d1, datanode_c4_d1, datanode_c5_d1
>                   ->  Hash  (cost=0.00..0.00 rows=1000 width=32)
>                         ->  Data Node Scan on Table2 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=32)
>                               Node/s: datanode_c2_d1, datanode_c3_d1, datanode_c4_d1, datanode_c5_d1
>                               Coordinator quals: (((i.time)::double precision <= trunc(((('now'::cstring)::date - to_date('01 Jan 2000'::text, 'DD Mon YYYY'::text)))::double precision)) AND ((i.time)::double precision >= trunc(((((('now'::cstring)::date - to_date('01 Jan 2000'::text, 'DD Mon YYYY'::text)) + 1) - 365))::double precision)))
>
>
> Table1 contains 22 million records and Table2 contains
> 10 million records. This query took almost 35 minutes to
> return results, and about one third of that time was spent
> in FetchTuple. Do you see something wrong with the query
> plan?
>
>
>
> Can you provide the output of
>
> \d+ Table1
> \d+ Table2
>
> I am interested in the distribution column of each table.
>
> Also, please provide the output of EXPLAIN VERBOSE on the
> query.
>
>
>
>
> Since the multiplexed IO isn't the real bottleneck
> here, my proposal is to change how the TupleStore at
> the coordinator is populated and to introduce
> parallelism there. Is this the right direction? Please
> let me know your thoughts.
>
> Thanks
> Mani
>
>
> On Wed, 2015-06-03 at 01:49 -0700, Abbas Butt wrote:
> >
> >
> > On Tue, Jun 2, 2015 at 10:17 PM, Manikandan
> > Soundarapandian <ma...@vt...> wrote:
> > Hi Abbas,
> >
> > Thanks for the reply. I understand how pgxc_node_receive
> > works to get incoming data from the datanodes.
> > Yes, I have run experiments to study the Postgres-XC
> > system in a setup with 12 datanodes and a coordinator,
> > and performed join, group by and order by operations on
> > two tables with around 20 million and 10 million records
> > respectively. I have found that around 20-30% of the
> > time is spent in the FetchTuple method, which reads one
> > datarow at a time into the provided tupleslot from the
> > combiner's buffer, which is in turn filled by the
> > pgxc_node_receive method.
> >
> > The XC planner pushes parts of query processing down to
> > the datanodes to speed things up.
> >
> > In the queries you tried, did the query planner
> > perform this optimization?
> >
> > If not, then you can gain an advantage by improving the
> > planner to generate optimized plans that avoid
> > unnecessary materialization at the coordinator.
> >
> > This will bring the FetchTuple time down, since there
> > will be fewer tuples to fetch.
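> >
> > (A quick way to check, as a sketch only; the table and
> > column names here are placeholders:
> >
> >     EXPLAIN VERBOSE
> >     SELECT col1, count(*) FROM some_table GROUP BY col1;
> >
> > If the plan is a single Data Node Scan whose remote query
> > already contains the aggregate and GROUP BY, the work was
> > shipped to the datanodes; if you instead see a
> > GroupAggregate above the Data Node Scans, or Coordinator
> > quals attached to them, that part of the work is being
> > done at the coordinator.)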
> >
> >
> > If the planner is already producing optimized plans,
> > then what's your proposal?
> >
> >
> >
> >
> > Thanks
> >
> > On Wed, 2015-06-03 at 09:45 +0500, Abbas Butt wrote:
> > >
> > > On Wed, Jun 3, 2015 at 4:11 AM, Manikandan Soundarapandian
> > > <ma...@vt...> wrote:
> > > Hi,
> > >
> > > I am a graduate student working on my research in
> > > parallel databases. I would like to know how the
> > > Postgres-XC coordinator works. I understand that the
> > > datanodes run the query in parallel and the results
> > > are collected by the coordinator, which runs any
> > > further computation that is required, or simply
> > > returns the output to the client that requested the
> > > query. I would like to know whether the coordinator
> > > does this data collection from the datanodes in a
> > > sequential fashion.
> > >
> > >
> > > The coordinator uses multiplexed IO, using select on
> > > all fds of the datanodes. For more details please see
> > > the pgxc_node_receive function in pgxcnode.c. The loop
> > > for reading data on all set fds is sequential, but the
> > > coordinator does not wait for data from the datanode to
> > > which it sent the query first.
> > >
> > > For example, let's consider that we want to run the
> > > following query on table table_x, which is hash
> > > distributed among 10 datanodes:
> > >
> > >     select count(*) from table_x;
> > >
> > > Each datanode will run the query and give its local
> > > count, and the coordinator has to collect the
> > > individual counts and come up with the final count
> > > before sending the output. Is the data collection
> > > process at the coordinator done in a sequential
> > > fashion? I am actually looking to introduce some kind
> > > of parallelism in this data collection, if it is
> > > sequential, and do performance studies. Please clarify.
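> > >
> > > (To make the split of work concrete, here is a sketch
> > > only; the per-node counts shown are made-up numbers:
> > >
> > >     -- each of the 10 datanodes runs the pushed-down query locally:
> > >     select count(*) from table_x;
> > >
> > >     -- the coordinator then conceptually computes, over the collected
> > >     -- per-node results (three of the ten shown, values invented):
> > >     SELECT sum(partial_count)
> > >     FROM (VALUES (2000000), (1800000), (2200000)) AS t(partial_count);
> > >
> > > My question is whether those per-node results are
> > > collected one datanode at a time.)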
> > >
> > >
> > > To improve the performance of any system, first find
> > > the bottleneck, and then work to widen it. Have you
> > > done any study of Postgres-XC to find where the
> > > performance bottleneck is?
> > >
> > >
> > >
> > > --
> > > Thanks
> > > Mani
> > > Department of Computer Science
> > > Virginia Tech
> > >
> > >
>
>
>
>
>
> --
> Abbas
> Architect
>
> Ph: 92.334.5100153
> Skype ID: gabbasb
> www.enterprisedb.com
>
> Follow us on Twitter @EnterpriseDB
>
> Visit EnterpriseDB for tutorials, webinars, whitepapers and more
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Postgres-xc-developers mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>
>
>
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>