From: Manikandan S. <ma...@vt...> - 2015-06-03 05:18:11
Hi Abbas,

Thanks for the reply. I understand how pgxc_node_receive works to get
incoming data from the datanodes. Yes, I have run experiments to study
the Postgres-XC system in a setup with 12 datanodes and a coordinator,
performing join, group by and order by operations on two tables with
around 20 million and 10 million records respectively. I found that
around 20-30% of the time is spent in the 'FetchTuple' method, which
reads one datarow at a time into the provided tuple slot from the
combiner's buffer, which is in turn filled by the pgxc_node_receive
method.

Thanks

On Wed, 2015-06-03 at 09:45 +0500, Abbas Butt wrote:
> On Wed, Jun 3, 2015 at 4:11 AM, Manikandan Soundarapandian
> <ma...@vt...> wrote:
> > Hi,
> >
> > I am a graduate student working on my research in parallel
> > databases. I would like to know how the Postgres-XC coordinator
> > works. I understand that the datanodes run the query in parallel
> > and the results are collected by the coordinator, which performs
> > any further computation that is required or just returns the
> > output to the client that issued the query. I would like to know
> > whether the coordinator does this data collection from the
> > datanodes in a sequential fashion?
>
> The coordinator uses multiplexed IO using select on all fds of
> datanodes. For more details please see the pgxc_node_receive function
> in pgxcnode.c. The loop for reading data on all set fds is sequential,
> but the coordinator does not wait for data from the datanode to which
> the coordinator had sent the query first.
>
> > For example, let's consider we want to run the query on table
> > table_x, which is hash distributed among 10 datanodes:
> >     select count(*) from table_x;
> > Each datanode will run the query and return its local count, and
> > the coordinator has to collect the individual counts and come up
> > with the final count before sending the output. Is the data
> > collection process at the coordinator done in a sequential fashion?
> > I am actually looking to introduce some kind of parallelism in
> > this data collection if it is sequential and do performance
> > studies. Please clarify.
>
> To improve the performance of any system, first study the bottleneck
> and aim to widen it. Have you done any study of Postgres-XC to find
> where the performance bottleneck is?
>
> > --
> > Thanks
> > Mani
> > Department of Computer Science
> > Virginia Tech
>
> --
> Abbas
> Architect
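
For readers following the thread, here is a rough sketch of the
select-based pattern discussed above, applied to the count(*) example.
It is written in Python rather than the C of pgxcnode.c and is not taken
from Postgres-XC. The names fake_datanode, collect_counts and run_demo
are hypothetical, socket pairs stand in for datanode connections, and
the "one integer, then close" protocol is an assumption made only for
this illustration.

```python
import select
import socket
import threading


def fake_datanode(sock, local_count):
    # Hypothetical stand-in for a datanode: send the local count(*) and close.
    sock.sendall(f"{local_count}\n".encode())
    sock.close()


def collect_counts(conns):
    # Assumed protocol (this sketch only): each connection delivers one
    # decimal integer and then closes.
    pending = {c.fileno(): c for c in conns}
    buffers = {fd: b"" for fd in pending}
    total = 0
    while pending:
        # Block until at least one connection has data (or EOF) available,
        # regardless of the order in which the queries were sent.
        readable, _, _ = select.select(list(pending.values()), [], [])
        # The ready connections are then read one after another.
        for conn in readable:
            fd = conn.fileno()
            chunk = conn.recv(4096)
            if chunk:
                buffers[fd] += chunk
                continue
            # EOF: this node is done, so fold its partial count into the total.
            total += int(buffers.pop(fd).decode().strip())
            del pending[fd]
            conn.close()
    return total


def run_demo(local_counts):
    # One socket pair and one thread per simulated datanode.
    coordinator_ends, threads = [], []
    for count in local_counts:
        coord_end, node_end = socket.socketpair()
        coordinator_ends.append(coord_end)
        t = threading.Thread(target=fake_datanode, args=(node_end, count))
        t.start()
        threads.append(t)
    total = collect_counts(coordinator_ends)
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(run_demo([3, 5, 7]))  # prints 15
```

In this sketch the waiting is shared across all connections, while the
reads on ready connections happen one at a time in a single thread,
which matches the "multiplexed but sequential loop" description in the
reply. Whether that loop is the actual bottleneck is a separate
question that only profiling can answer.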