From: Ashutosh B. <ash...@en...> - 2015-06-04 06:38:24
On Thu, Jun 4, 2015 at 6:38 AM, Abbas Butt <abb...@en...> wrote:
>
> On Wed, Jun 3, 2015 at 5:10 PM, Manikandan Soundarapandian <ma...@vt...> wrote:
>>
>> Hi Abbas,
>>
>> I just looked at the plan for the query that I am running. This is the query:
>>
>> SELECT d.col1 RVALUE, ROUND(COUNT(d.col2)/10) COUNT FROM Table1 d,
>> Table2 i WHERE d.id=i.id AND i.time BETWEEN
>> TRUNC(CURRENT_DATE-TO_DATE('01 Jan 2000','DD Mon YYYY')+1-365) AND
>> TRUNC(CURRENT_DATE-TO_DATE('01 Jan 2000','DD Mon YYYY')) GROUP BY d.col1
>> ORDER BY d.col1 DESC;
>>
>> And the query plan for a configuration with 4 datanodes and 1 coordinator:
>>
>> GroupAggregate  (cost=0.07..0.10 rows=1 width=56)
>>   ->  Sort  (cost=0.07..0.08 rows=1 width=56)
>>         Sort Key: d.col1
>>         ->  Hash Join  (cost=0.01..0.06 rows=1 width=56)
>>               Hash Cond: (d.id = i.id)

Are both the tables distributed by id? The clauses with mutable functions like current_date are being pushed down to the individual relations. That prevents the join from being pushed down to the datanodes if id is the distribution key. I think that should be fixed: the clause movement should be reversed, back up to the Join node.
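[A rough way to test this hypothesis, as a sketch only: the DISTRIBUTE BY clause is Postgres-XC DDL, the column types are guesses, and the parameterized rewrite assumes the date bounds are computed once up front so that the remaining time qual contains no mutable function.]

```sql
-- If both tables are hash-distributed on the join column, e.g.:
--   CREATE TABLE Table1 (id int, col1 int, col2 int) DISTRIBUTE BY HASH (id);
--   CREATE TABLE Table2 (id int, time numeric)       DISTRIBUTE BY HASH (id);
-- then the equi-join on id can in principle be shipped whole to the datanodes.

-- Step 1: evaluate the CURRENT_DATE arithmetic once, outside the main query:
SELECT TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY') + 1 - 365) AS lo,
       TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY'))           AS hi;

-- Step 2: run the join with plain parameters, so the time qual is no longer
-- a "Coordinator qual" candidate for being evaluated above the remote scans:
PREPARE q(numeric, numeric) AS
SELECT d.col1 AS rvalue, ROUND(COUNT(d.col2)/10) AS count
FROM Table1 d JOIN Table2 i ON d.id = i.id
WHERE i.time BETWEEN $1 AND $2
GROUP BY d.col1
ORDER BY d.col1 DESC;
-- EXECUTE q(<lo>, <hi>);
```

If the resulting plan shows a single remote query per datanode instead of a coordinator-side Hash Join, the mutable-function clause placement is indeed what blocks the pushdown.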
>>               ->  Data Node Scan on Table1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=56)
>>                     Node/s: datanode_c2_d1, datanode_c3_d1, datanode_c4_d1, datanode_c5_d1
>>               ->  Hash  (cost=0.00..0.00 rows=1000 width=32)
>>                     ->  Data Node Scan on Table2 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=32)
>>                           Node/s: datanode_c2_d1, datanode_c3_d1, datanode_c4_d1, datanode_c5_d1
>>                           Coordinator quals: (((i.time)::double precision <= trunc(((('now'::cstring)::date - to_date('01 Jan 2000'::text, 'DD Mon YYYY'::text)))::double precision)) AND ((i.time)::double precision >= trunc(((((('now'::cstring)::date - to_date('01 Jan 2000'::text, 'DD Mon YYYY'::text)) + 1) - 365))::double precision)))
>>
>> Table1 contains 22 million records and Table2 contains 10 million records.
>> This query took several minutes (almost 35) to get results, and one third
>> of that time is spent in FetchTuple. Do you see something wrong with the
>> query plan?
>
> Can you provide the output of
> \d+ Table1
> \d+ Table2
> I am interested in the distribution column of the tables.
>
> Also please provide the output of EXPLAIN VERBOSE on the query.
>
>> Since the multiplexed IO isn't the real bottleneck here, my proposal
>> would be to change the way the TupleStore at the coordinator is
>> populated and introduce parallelism there. Is this the right direction?
>> Please let me know your thoughts.
>>
>> Thanks
>> Mani
>>
>> On Wed, 2015-06-03 at 01:49 -0700, Abbas Butt wrote:
>>
>> On Tue, Jun 2, 2015 at 10:17 PM, Manikandan Soundarapandian <ma...@vt...> wrote:
>>
>> Hi Abbas,
>>
>> Thanks for the reply. I understand how pgxc_node_receive works to
>> get incoming data from the datanodes.
>> Yes, I have made experiments to study the Postgres-XC system in a setup
>> with 12 datanodes and a coordinator, and performed join, group by and
>> order by operations on two tables with around 20 million and 10 million
>> records respectively. I have found that around 20-30% of the time is
>> spent in the FetchTuple method, which reads into the provided tuple slot
>> one datarow at a time from the combiner's buffer, which is in turn filled
>> by the pgxc_node_receive method.
>>
>> The XC planner pushes parts of query processing to the datanodes to speed
>> things up.
>>
>> In the queries you tried, did the query planner perform this optimization?
>>
>> If not, then you can gain an advantage by improving the planner to generate
>> optimized plans that avoid unnecessary materialization at the coordinator.
>>
>> This will bring the FetchTuple time down, since there will be fewer
>> tuples to fetch.
>>
>> If the planner is already producing optimized plans, then what's your proposal?
>>
>> Thanks
>>
>> On Wed, 2015-06-03 at 09:45 +0500, Abbas Butt wrote:
>> >
>> > On Wed, Jun 3, 2015 at 4:11 AM, Manikandan Soundarapandian <ma...@vt...> wrote:
>> > Hi,
>> >
>> > I am a graduate student working on my research in parallel
>> > databases. I would like to know how the Postgres-XC
>> > coordinator works. I understand that the datanodes run the
>> > query in parallel and the results are collected by the
>> > coordinator, which runs any further computation that is required,
>> > or just provides the output to the client that requested the
>> > query. I would like to know whether the coordinator does this
>> > data collection from the datanodes in a sequential fashion.
>> >
>> > The coordinator uses multiplexed IO, using select on all fds of the
>> > datanodes. For more details please see the pgxc_node_receive function in
>> > pgxcnode.c.
>> > The loop for reading data on all set fds is sequential,
>> > but the coordinator does not wait for data from the datanode to which
>> > the coordinator had sent the query first.
>> >
>> > For example, let's consider that we want to run a query on table
>> > table_x, which is hash distributed among 10 datanodes:
>> > select count(*) from table_x;
>> > Each datanode will run the query and give its local count,
>> > and the coordinator has to collect the individual counts and
>> > come up with the final count before sending the output. Is the
>> > data collection process at the coordinator done in a
>> > sequential fashion? I am actually looking to introduce some
>> > kind of parallelism in this data collection if it is
>> > sequential, and do performance studies. Please clarify.
>> >
>> > To improve the performance of any system, first study the bottleneck, and
>> > target widening that. Have you done any study of Postgres-XC to find
>> > where the performance bottleneck is?
>> >
>> > --
>> > Thanks
>> > Mani
>> > Department of Computer Science
>> > Virginia Tech
>> >
>> ------------------------------------------------------------------------------
>> >
>> > _______________________________________________
>> > Postgres-xc-developers mailing list
>> > Pos...@li...
>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>> >
>> > --
>> > Abbas
>> > Architect
>> >
>> > Ph: 92.334.5100153
>> > Skype ID: gabbasb
>> > www.enterprisedb.com
>> >
>> > Follow us on Twitter
>> > @EnterpriseDB
>> >
>> > Visit EnterpriseDB for tutorials, webinars, whitepapers and more

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
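[A side note on the count(*) example discussed above: conceptually the coordinator combines per-datanode partial aggregates. The combine step can be illustrated in plain SQL as a sketch; the per-node partial counts below are hypothetical values, and Postgres-XC performs this combination internally in its executor, not via a rewritten query.]

```sql
-- Each datanode computes a partial count over its hash partition:
--   SELECT count(*) AS partial_count FROM table_x;   -- runs on each node
-- Suppose the 10 datanodes return these partial counts (hypothetical):
WITH datanode_partials(partial_count) AS (
    VALUES (102), (98), (101), (99), (100), (97), (103), (100), (98), (102)
)
-- The coordinator's combine step is then equivalent to:
SELECT sum(partial_count) AS total_count FROM datanode_partials;
-- total_count = 1000
```

The combine step itself is trivially cheap; as noted in the thread, the cost worth studying is how fast the partial results can be drained from the datanode connections.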