From: Abbas B. <abb...@en...> - 2015-06-03 08:49:53
|
On Tue, Jun 2, 2015 at 10:17 PM, Manikandan Soundarapandian <ma...@vt...> wrote: > Hi Abbas, > > Thanks for the reply. I understand how pgxc_node_receive is working to > get incoming data from the datanodes. > Yes, I have made experiments to study the postgres-xc system in a setup > with 12 datanodes and a coordinator and performed join, groupby and > order by operations on two tables with around 20 million and 10 million > records respectively. I have found that around 20-30% of the time is > spent in the 'FetchTuple' method which reads into the provided tupleslot > one datarow at a time from the combiner's buffer which is in turn filled > by the pgxc_node_receive method. > XC planner pushes parts of query processing to datanodes to speed things up. In the queries you tried, did the query planner perform this optimization? If not then you can gain advantage by improving the planner to generate optimized plans that avoid UN-necessary materialization at the coordinator. This will bring the FetchTuple time down since there will be less tuples to fetch. If the planner is already producing optimized plans, then what's your proposal? > Thanks > > On Wed, 2015-06-03 at 09:45 +0500, Abbas Butt wrote: > > > > > > On Wed, Jun 3, 2015 at 4:11 AM, Manikandan Soundarapandian > > <ma...@vt...> wrote: > > Hi, > > > > > > I am a graduate student working on my research in parallel > > databases. I would like to know how the postgres-xc > > coordinator works. I understand that the datanodes run the > > query in parallel and the results are collected by the > > coordinator which runs any more computation that is required > > or just provides the output to the client that requested the > > query. I would like to know whether the coordinator does this > > data collection from datanodes in a sequential fashion? > > > > > > The coordinator uses multiplexed IO using select on all fds of > > datanodes. For more details please see pgxc_node_receive function in > > pgxcnode.c. The loop for reading data on all set fds is sequential, > > but the coordinator does not wait for data from the datanode to which > > the coordinator had sent the query first. > > > > For example, lets consider we want to run the query on table > > table_x which is hash distributed among 10 datanodes, > > select count(*) from table_x; > > Each datanode will run the query and give their local counts > > and the coordinator has to collect the individual counts and > > come up with the final count before sending the output. Is the > > data collection process at the coordinator done in a > > sequential fashion? I am actually looking to introduce some > > kind of parallelism in this data collection if it is > > sequential and do performance studies. Please clarify. > > > > > > To improve performance of any system, first study the bottleneck, and > > target to widen that. Have you done any study of Postgres-XC to find > > where the performance bottleneck is? > > > > > > > > -- > > Thanks > > Mani > > Department of Computer Science > > Virginia Tech > > > > > ------------------------------------------------------------------------------ > > > > _______________________________________________ > > Postgres-xc-developers mailing list > > Pos...@li... > > > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > > > > > > > > > > -- > > -- > > Abbas > > Architect > > > > > > Ph: 92.334.5100153 > > > > Skype ID: gabbasb > > > > www.enterprisedb.com > > > > Follow us on Twitter > > @EnterpriseDB > > > > Visit EnterpriseDB for tutorials, webinars, whitepapers and more > > > > > -- -- *Abbas* Architect Ph: 92.334.5100153 Skype ID: gabbasb www.enterprisedb.co <http://www.enterprisedb.com/>m <http://www.enterprisedb.com/> *Follow us on Twitter* @EnterpriseDB Visit EnterpriseDB for tutorials, webinars, whitepapers <http://www.enterprisedb.com/resources-community> and more <http://www.enterprisedb.com/resources-community> |