From: Abbas B. <abb...@en...> - 2015-06-04 01:08:44
On Wed, Jun 3, 2015 at 5:10 PM, Manikandan Soundarapandian <ma...@vt...> wrote:

> Hi Abbas,
>
> I just looked at the plan for the query that I am running. This is the query:
>
>   SELECT d.col1 RVALUE, ROUND(COUNT(d.col2)/10) COUNT
>   FROM Table1 d, Table2 i
>   WHERE d.id = i.id
>     AND i.time BETWEEN TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY') + 1 - 365)
>                    AND TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY'))
>   GROUP BY d.col1
>   ORDER BY d.col1 DESC;
>
> And this is the query plan for a configuration with 4 datanodes and 1 coordinator:
>
>   GroupAggregate  (cost=0.07..0.10 rows=1 width=56)
>     ->  Sort  (cost=0.07..0.08 rows=1 width=56)
>           Sort Key: d.col1
>           ->  Hash Join  (cost=0.01..0.06 rows=1 width=56)
>                 Hash Cond: (d.id = i.id)
>                 ->  Data Node Scan on Table1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=56)
>                       Node/s: datanode_c2_d1, datanode_c3_d1, datanode_c4_d1, datanode_c5_d1
>                 ->  Hash  (cost=0.00..0.00 rows=1000 width=32)
>                       ->  Data Node Scan on Table2 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=32)
>                             Node/s: datanode_c2_d1, datanode_c3_d1, datanode_c4_d1, datanode_c5_d1
>                             Coordinator quals: (((i.time)::double precision <= trunc(((('now'::cstring)::date - to_date('01 Jan 2000'::text, 'DD Mon YYYY'::text)))::double precision)) AND ((i.time)::double precision >= trunc(((((('now'::cstring)::date - to_date('01 Jan 2000'::text, 'DD Mon YYYY'::text)) + 1) - 365))::double precision)))
>
> Table1 contains 22 million records and Table2 contains 10 million records. This query took several minutes (almost 35) to return results, and one third of that time was spent in FetchTuple. Do you see something wrong with the query plan?

Can you provide the output of

  \d+ Table1
  \d+ Table2

I am interested in the distribution column of the tables. Also please provide the output of EXPLAIN VERBOSE on the query.

> Since the multiplexed IO isn't the real bottleneck here, my proposal would be to change the way the TupleStore at the coordinator is populated and introduce parallelism there. Is this the right direction? Please let me know your thoughts.
>
> Thanks
> Mani
>
>
> On Wed, 2015-06-03 at 01:49 -0700, Abbas Butt wrote:
>
> > On Tue, Jun 2, 2015 at 10:17 PM, Manikandan Soundarapandian <ma...@vt...> wrote:
> >
> > > Hi Abbas,
> > >
> > > Thanks for the reply. I understand how pgxc_node_receive works to get incoming data from the datanodes.
> > > Yes, I have run experiments to study the Postgres-XC system in a setup with 12 datanodes and a coordinator, performing join, group-by and order-by operations on two tables with around 20 million and 10 million records respectively. I have found that around 20-30% of the time is spent in the FetchTuple method, which reads one datarow at a time into the provided tuple slot from the combiner's buffer, which is in turn filled by the pgxc_node_receive method.
> >
> > The XC planner pushes parts of query processing to the datanodes to speed things up.
> > In the queries you tried, did the query planner perform this optimization?
> > If not, then you can gain an advantage by improving the planner to generate optimized plans that avoid unnecessary materialization at the coordinator.
> > This will bring the FetchTuple time down, since there will be fewer tuples to fetch.
> >
> > If the planner is already producing optimized plans, then what's your proposal?
> > > Thanks
> > >
> > > On Wed, 2015-06-03 at 09:45 +0500, Abbas Butt wrote:
> > >
> > > > On Wed, Jun 3, 2015 at 4:11 AM, Manikandan Soundarapandian <ma...@vt...> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am a graduate student working on my research in parallel databases. I would like to know how the Postgres-XC coordinator works. I understand that the datanodes run the query in parallel and the results are collected by the coordinator, which runs any further computation that is required, or just provides the output to the client that requested the query. I would like to know whether the coordinator does this data collection from the datanodes in a sequential fashion.
> > > >
> > > > The coordinator uses multiplexed IO using select on all fds of datanodes. For more details please see the pgxc_node_receive function in pgxcnode.c. The loop for reading data on all set fds is sequential, but the coordinator does not wait for data from the datanode to which the coordinator had sent the query first.
> > > >
> > > > > For example, let's consider that we want to run this query on table table_x, which is hash distributed among 10 datanodes:
> > > > >
> > > > >   select count(*) from table_x;
> > > > >
> > > > > Each datanode will run the query and give its local count, and the coordinator has to collect the individual counts and come up with the final count before sending the output. Is the data collection process at the coordinator done in a sequential fashion? I am actually looking to introduce some kind of parallelism in this data collection if it is sequential and do performance studies. Please clarify.
> > > >
> > > > To improve performance of any system, first study the bottleneck, and target to widen that. Have you done any study of Postgres-XC to find where the performance bottleneck is?
> > > >
> > > > > --
> > > > > Thanks
> > > > > Mani
> > > > > Department of Computer Science
> > > > > Virginia Tech

--
Abbas
Architect

Ph: 92.334.5100153
Skype ID: gabbasb
www.enterprisedb.com

Follow us on Twitter
@EnterpriseDB

Visit EnterpriseDB for tutorials, webinars, whitepapers and more
http://www.enterprisedb.com/resources-community
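
For anyone reproducing this on their own cluster: the plan Mani posted lists the i.time range test under "Coordinator quals", which suggests the filter, the join and the aggregation are all being evaluated at the coordinator rather than pushed to the datanodes, consistent with so much time going into FetchTuple. Below is a minimal sketch of the diagnostics Abbas asks for, written out against the names used in the thread (Table1, Table2 and the columns come straight from Mani's query):

  -- Distribution of each table; the distribution column determines whether
  -- the join on d.id = i.id can be shipped to the datanodes.
  \d+ Table1
  \d+ Table2

  -- Full plan, including the expressions the coordinator evaluates itself:
  EXPLAIN VERBOSE
  SELECT d.col1 RVALUE, ROUND(COUNT(d.col2)/10) COUNT
  FROM Table1 d, Table2 i
  WHERE d.id = i.id
    AND i.time BETWEEN TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY') + 1 - 365)
                   AND TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY'))
  GROUP BY d.col1
  ORDER BY d.col1 DESC;

A follow-up experiment worth trying (an assumption about the planner's shippability rules, not something confirmed in the thread) is to compute the two date bounds once, substitute them as plain constants in the BETWEEN clause, and re-run EXPLAIN VERBOSE to see whether the predicate then moves out of "Coordinator quals" and into the remote query sent to the datanodes.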
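
Earlier in the thread Abbas describes the coordinator's receive path as "multiplexed IO using select on all fds of datanodes" (pgxc_node_receive in pgxcnode.c). The sketch below illustrates that general select()-based technique only; it is not the Postgres-XC code, and the names node_fds, drain_node and receive_from_nodes are made up for the example:

  /*
   * Sketch of select()-based multiplexing over several datanode sockets:
   * wait on all descriptors at once, then drain whichever are readable.
   * Illustrative only; not the actual pgxc_node_receive implementation.
   */
  #include <stdio.h>
  #include <sys/select.h>
  #include <unistd.h>

  /* Hypothetical per-node handler: consume whatever bytes are available. */
  void
  drain_node(int fd)
  {
      char    buf[8192];
      ssize_t n = read(fd, buf, sizeof(buf));

      if (n > 0)
          printf("fd %d: received %zd bytes\n", fd, n);
  }

  /* Wait for data on any datanode connection and drain the ready ones. */
  int
  receive_from_nodes(const int node_fds[], int nnodes)
  {
      fd_set readfds;
      int    maxfd = -1;

      FD_ZERO(&readfds);
      for (int i = 0; i < nnodes; i++)
      {
          FD_SET(node_fds[i], &readfds);
          if (node_fds[i] > maxfd)
              maxfd = node_fds[i];
      }

      /* Block until at least one datanode has data for us. */
      if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
          return -1;

      /*
       * This loop is sequential, but it only reads descriptors that already
       * have data, so a slow datanode does not hold up the others.
       */
      for (int i = 0; i < nnodes; i++)
          if (FD_ISSET(node_fds[i], &readfds))
              drain_node(node_fds[i]);

      return 0;
  }

As Abbas notes, the sequential part here is only the drain loop over already-ready descriptors; Mani's measurements point instead at the per-row cost of FetchTuple at the coordinator, which is why the discussion turns to shipping more of the query to the datanodes so that fewer tuples have to be fetched and materialized at the coordinator.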