Re: [Postgres-xc-developers] Postgres-xc coordinator operation

Hi Abbas,

Thanks for the reply. I now understand how pgxc_node_receive collects
incoming data from the datanodes.

Yes, I have run experiments on a Postgres-XC setup with 12 datanodes and
one coordinator, performing join, GROUP BY and ORDER BY operations on two
tables with around 20 million and 10 million records respectively. I
found that around 20-30% of the time is spent in the 'FetchTuple'
method, which reads one data row at a time from the combiner's buffer
into the provided tuple slot. That buffer is in turn filled by
pgxc_node_receive.
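
To make sure I have the collection model right, here is a rough sketch of
select-based multiplexing as described in the reply quoted below. It is a
toy Python model, not Postgres-XC code: collect_counts and demo are
hypothetical names, socket pairs stand in for datanode connections, and
each "datanode" sends its local count as text and then closes its end.

```python
import select
import socket


def collect_counts(socks):
    """Sum one integer count per socket, reading in arrival order."""
    pending = set(socks)
    buffers = {s: b"" for s in socks}
    total = 0
    while pending:
        # Block until at least one connection has data (or has closed).
        readable, _, _ = select.select(list(pending), [], [])
        # The loop over ready sockets is sequential, but it only visits
        # sockets that are ready, so no single datanode blocks the rest.
        for s in readable:
            chunk = s.recv(4096)
            if chunk:
                buffers[s] += chunk
            else:
                # Peer closed its end: the buffered message is complete.
                total += int(buffers[s].decode())
                pending.remove(s)
    return total


def demo():
    # Three simulated datanodes, each holding a local count(*) result.
    pairs = [socket.socketpair() for _ in range(3)]
    local_counts = [100, 250, 75]
    # Reply in reverse order to show that arrival order does not matter.
    for (coord_end, node_end), n in reversed(list(zip(pairs, local_counts))):
        node_end.sendall(str(n).encode())
        node_end.close()
    total = collect_counts([coord_end for coord_end, _ in pairs])
    for coord_end, _ in pairs:
        coord_end.close()
    return total


if __name__ == "__main__":
    print(demo())
```

In this model the waiting is multiplexed, while the per-row work done
after the data arrives is still a single sequential loop.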

Thanks

On Wed, 2015-06-03 at 09:45 +0500, Abbas Butt wrote:
> 
> 
> On Wed, Jun 3, 2015 at 4:11 AM, Manikandan Soundarapandian
> <ma...@vt...> wrote:
>         Hi,
>         
>         
>         I am a graduate student working on my research in parallel
>         databases. I would like to know how the postgres-xc
>         coordinator works. I understand that the datanodes run the
>         query in parallel and the results are collected by the
>         coordinator which runs any more computation that is required
>         or just provides the output to the client that requested the
>         query. I would like to know whether the coordinator does this
>         data collection from datanodes in a sequential fashion?
> 
> 
> The coordinator uses multiplexed I/O, calling select on the fds of all
> datanodes. For more details please see the pgxc_node_receive function
> in pgxcnode.c. The loop that reads data from the set fds is sequential,
> but the coordinator does not block waiting for data from the datanode
> to which it sent the query first.
>  
>         For example, say we want to run the following query on a table
>         table_x that is hash-distributed among 10 datanodes:
>         select count(*) from table_x;
>         Each datanode will run the query and return its local count,
>         and the coordinator has to collect the individual counts and
>         compute the final count before sending the output. Is the
>         data collection process at the coordinator done in a
>         sequential fashion? I am looking to introduce some kind of
>         parallelism into this data collection if it is sequential,
>         and to do performance studies. Please clarify.
> 
> 
> To improve the performance of any system, first find the bottleneck
> and aim to widen it. Have you done any study of Postgres-XC to find
> where its performance bottleneck is?
>  
>         
>         
>         -- 
>         Thanks
>         Mani
>         Department of Computer Science
>         Virginia Tech
>         
> 
> 
> 
> 
> -- 
> Abbas
> Architect