From: Abbas B. <abb...@en...> - 2015-06-04 08:53:41
On Wed, Jun 3, 2015 at 8:57 PM, Manikandan Soundarapandian <ma...@vt...> wrote:

> Hi, please see below.
>
> \d+ Table1
>                               Table "Table1"
>  Column |         Type         | Modifiers | Storage  | Stats target | Description
> --------+----------------------+-----------+----------+--------------+-------------
>  id     | numeric              |           | main     |         1000 |
>  age    | numeric              |           | main     |         1000 |
>  gender | numeric              |           | main     |         1000 |
>  col1   | character varying(5) |           | extended |         1000 |
> Indexes:
>     "index_age_table1" btree (age)
>     "index_col1_table1" btree (col1)
>     "index_gender_table1" btree (gender)
>     "index_id_table1" btree (id)
> Has OIDs: no
> Distribute By: HASH(id)
> Location Nodes: ALL DATANODES
>
> \d+ Table2
>                        Table "Table2"
>  Column |  Type   | Modifiers | Storage | Stats target | Description
> --------+---------+-----------+---------+--------------+-------------
>  id     | numeric |           | main    |         1000 |
>  rep    | numeric |           | main    |         1000 |
>  time   | numeric |           | main    |         1000 |
> Indexes:
>     "index_time_table2" btree (time)
>     "index_id_table2" btree (id)
>     "index_rep_table2" btree (rep)
> Has OIDs: no
> Distribute By: HASH(id)
> Location Nodes: ALL DATANODES
>
> EXPLAIN VERBOSE of the query:
>
> GroupAggregate  (cost=0.07..0.10 rows=1 width=56)
>   Output: d.col1, round(((count(d.id) / 10))::double precision)
>   ->  Sort  (cost=0.07..0.08 rows=1 width=56)
>         Output: d.col1, d.id
>         Sort Key: d.col1
>         ->  Hash Join  (cost=0.01..0.06 rows=1 width=56)
>               Output: d.col1, d.id
>               Hash Cond: (d.id = i.id)
>               ->  Data Node Scan on Table1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=56)
>                     Output: d.col1, d.id
>                     Node/s: datanode_c2_d1, datanode_c3_d1, datanode_c4_d1, datanode_c5_d1
>                     Remote query: SELECT col1, id FROM ONLY Table1 d WHERE true
>               ->  Hash  (cost=0.00..0.00 rows=1000 width=32)
>                     Output: i.id
>                     ->  Data Node Scan on Table2 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=32)
>                           Output: i.id
>                           Node/s: datanode_c2_d1, datanode_c3_d1, datanode_c4_d1, datanode_c5_d1
>                           Remote query: SELECT id, time FROM ONLY Table2 i WHERE true
>                           Coordinator quals: (((i.time)::double precision <= trunc(((('now'::cstring)::date - to_date('01 Jan 2000'::text, 'DD Mon YYYY'::text)))::double precision)) AND ((i.time)::double precision >= trunc(((((('now'::cstring)::date - to_date('01 Jan 2000'::text, 'DD Mon YYYY'::text)) + 1) - 365))::double precision)))
>
> Also, one weird thing I noticed: if I don't use the trunc() date manipulation in the query and instead give valid numbers directly in the BETWEEN clause, like 'BETWEEN 5266 AND 5632', I get a different query plan and the query executes in just a few seconds.

The reason is that in this case the join gets pushed down to the datanodes, so the major computation is done in parallel and only the qualifying results are transferred to the coordinator. As Ashutosh has suggested, you can look into revising the planning of mutable functions in this case to get better performance. Another option is to rewrite the query in the following manner:

1. Get the current date in a variable curDate.
2. Compute TRUNC(curDate - TO_DATE('01 Jan 2000','DD Mon YYYY') + 1 - 365) and save it in a variable startDate.
3. Compute TRUNC(curDate - TO_DATE('01 Jan 2000','DD Mon YYYY')) and save the result in a variable endDate.
4. Rewrite your query using the already computed values.

You can use a libpq-based C program to do this.

> The following is the query planner output when I give the numbers directly in the BETWEEN clause.
> GroupAggregate  (cost=49.83..57.35 rows=1 width=56)
>   Output: d.col1, round(((count((count(d.id))) / 10))::double precision)
>   ->  Sort  (cost=49.83..52.33 rows=1000 width=56)
>         Output: d.col1, (count(d.id))
>         Sort Key: d.col1
>         ->  Data Node Scan on "__REMOTE_GROUP_QUERY__"  (cost=0.00..0.00 rows=1000 width=56)
>               Output: d.col1, (count(d.id))
>               Node/s: datanode_c2_d1, datanode_c3_d1, datanode_c4_d1, datanode_c5_d1
>               Remote query: SELECT l.a_1, count(l.a_2) FROM ((SELECT d.col1, d.id FROM ONLY Table1 d WHERE true) l(a_1, a_2) JOIN (SELECT i.id FROM ONLY Table2 i WHERE ((i.time >= 5266::numeric) AND (i.time <= 5632::numeric))) r(a_1) ON (true)) WHERE (l.a_2 = r.a_1) GROUP BY 1 ORDER BY 1 DESC
>
> But my requirement is with the trunc() date manipulation.
>
> Thanks
>
> On Wed, 2015-06-03 at 18:08 -0700, Abbas Butt wrote:
>
> > On Wed, Jun 3, 2015 at 5:10 PM, Manikandan Soundarapandian <ma...@vt...> wrote:
> >
> > > Hi Abbas,
> > >
> > > I just looked at the planner output for the query that I am running.
> > > This is the query that I am running:
> > >
> > > SELECT d.col1 RVALUE, ROUND(COUNT(d.col2)/10) COUNT
> > > FROM Table1 d, Table2 i
> > > WHERE d.id = i.id
> > >   AND i.time BETWEEN TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY') + 1 - 365)
> > >                  AND TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY'))
> > > GROUP BY d.col1 ORDER BY d.col1 DESC;
> > >
> > > And the query plan for a configuration with 4 datanodes and 1 coordinator:
> > >
> > > GroupAggregate  (cost=0.07..0.10 rows=1 width=56)
> > >   ->  Sort  (cost=0.07..0.08 rows=1 width=56)
> > >         Sort Key: d.col1
> > >         ->  Hash Join  (cost=0.01..0.06 rows=1 width=56)
> > >               Hash Cond: (d.id = i.id)
> > >               ->  Data Node Scan on Table1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=56)
> > >                     Node/s: datanode_c2_d1, datanode_c3_d1, datanode_c4_d1, datanode_c5_d1
> > >               ->  Hash  (cost=0.00..0.00 rows=1000 width=32)
> > >                     ->  Data Node Scan on Table2 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=32)
> > >                           Node/s: datanode_c2_d1, datanode_c3_d1, datanode_c4_d1, datanode_c5_d1
> > >                           Coordinator quals: (((i.time)::double precision <= trunc(((('now'::cstring)::date - to_date('01 Jan 2000'::text, 'DD Mon YYYY'::text)))::double precision)) AND ((i.time)::double precision >= trunc(((((('now'::cstring)::date - to_date('01 Jan 2000'::text, 'DD Mon YYYY'::text)) + 1) - 365))::double precision)))
> > >
> > > Table1 contains 22 million records and Table2 contains 10 million records. This query took several minutes (almost 35) to get results, and one third of that time is spent in FetchTuple. Do you see something wrong with the query plan?
> >
> > Can you provide the output of
> >
> > \d+ Table1
> > \d+ Table2
> >
> > I am interested in the distribution column of the tables.
> >
> > Also please provide the output of EXPLAIN VERBOSE on the query.
> > > Since the multiplexed IO isn't the real bottleneck here, my proposal would be to change the way the TupleStore at the coordinator is populated and introduce parallelism there. Is this the right direction? Please let me know your thoughts.
> > >
> > > Thanks
> > > Mani
> > >
> > > On Wed, 2015-06-03 at 01:49 -0700, Abbas Butt wrote:
> > >
> > > > On Tue, Jun 2, 2015 at 10:17 PM, Manikandan Soundarapandian <ma...@vt...> wrote:
> > > >
> > > > > Hi Abbas,
> > > > >
> > > > > Thanks for the reply. I understand how pgxc_node_receive works to read incoming data from the datanodes.
> > > > > Yes, I have run experiments to study the Postgres-XC system in a setup with 12 datanodes and a coordinator, performing join, group-by and order-by operations on two tables with around 20 million and 10 million records respectively. I found that around 20-30% of the time is spent in the FetchTuple method, which reads one datarow at a time into the provided tuple slot from the combiner's buffer, which is in turn filled by the pgxc_node_receive method.
> > > >
> > > > The XC planner pushes parts of query processing down to the datanodes to speed things up.
> > > > In the queries you tried, did the query planner perform this optimization?
> > > > If not, then you can gain an advantage by improving the planner to generate optimized plans that avoid unnecessary materialization at the coordinator.
> > > > This will bring the FetchTuple time down, since there will be fewer tuples to fetch.
> > > >
> > > > If the planner is already producing optimized plans, then what's your proposal?
> > > >
> > > > > Thanks
> > > > >
> > > > > On Wed, 2015-06-03 at 09:45 +0500, Abbas Butt wrote:
> > > > >
> > > > > > On Wed, Jun 3, 2015 at 4:11 AM, Manikandan Soundarapandian <ma...@vt...> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I am a graduate student working on my research in parallel databases. I would like to know how the Postgres-XC coordinator works.
> > > > > > > I understand that the datanodes run the query in parallel and the results are collected by the coordinator, which runs any further computation that is required, or just provides the output to the client that requested the query. I would like to know whether the coordinator does this data collection from the datanodes in a sequential fashion.
> > > > > >
> > > > > > The coordinator uses multiplexed IO, using select on all the fds of the datanodes. For more details please see the pgxc_node_receive function in pgxcnode.c. The loop for reading data on all set fds is sequential, but the coordinator does not wait for data from the datanode to which it had sent the query first.
> > > > > >
> > > > > > > For example, let's consider we want to run this query on a table table_x which is hash distributed among 10 datanodes:
> > > > > > > select count(*) from table_x;
> > > > > > > Each datanode will run the query and give its local count, and the coordinator has to collect the individual counts and come up with the final count before sending the output. Is the data collection process at the coordinator done in a sequential fashion? I am actually looking to introduce some kind of parallelism in this data collection, if it is sequential, and do performance studies. Please clarify.
> > > > > >
> > > > > > To improve the performance of any system, first study the bottleneck and target widening it. Have you done any study of Postgres-XC to find where the performance bottleneck is?
> > > > > >
> > > > > > > --
> > > > > > > Thanks
> > > > > > > Mani
> > > > > > > Department of Computer Science
> > > > > > > Virginia Tech
> >
> > ------------------------------------------------------------------------------
> > _______________________________________________
> > Postgres-xc-developers mailing list
> > Pos...@li...
> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers

--
Abbas
Architect

Ph: 92.334.5100153
Skype ID: gabbasb
www.enterprisedb.com

Follow us on Twitter
@EnterpriseDB

Visit EnterpriseDB for tutorials, webinars, whitepapers and more