From: Abbas B. <abb...@en...> - 2015-06-04 01:08:44
On Wed, Jun 3, 2015 at 5:10 PM, Manikandan Soundarapandian <ma...@vt...> wrote:

> Hi Abbas,
>
> I just looked at the plan for the query that I am running. This is the query:
>
>   SELECT d.col1 RVALUE, ROUND(COUNT(d.col2)/10) COUNT
>   FROM Table1 d, Table2 i
>   WHERE d.id = i.id
>     AND i.time BETWEEN TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY') + 1 - 365)
>                    AND TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY'))
>   GROUP BY d.col1
>   ORDER BY d.col1 DESC;
>
> And this is the query plan for a configuration with 4 datanodes and 1 coordinator:
>
>   GroupAggregate  (cost=0.07..0.10 rows=1 width=56)
>     ->  Sort  (cost=0.07..0.08 rows=1 width=56)
>           Sort Key: d.col1
>           ->  Hash Join  (cost=0.01..0.06 rows=1 width=56)
>                 Hash Cond: (d.id = i.id)
>                 ->  Data Node Scan on Table1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=56)
>                       Node/s: datanode_c2_d1, datanode_c3_d1, datanode_c4_d1, datanode_c5_d1
>                 ->  Hash  (cost=0.00..0.00 rows=1000 width=32)
>                       ->  Data Node Scan on Table2 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=32)
>                             Node/s: datanode_c2_d1, datanode_c3_d1, datanode_c4_d1, datanode_c5_d1
>                             Coordinator quals: (((i.time)::double precision <= trunc(((('now'::cstring)::date - to_date('01 Jan 2000'::text, 'DD Mon YYYY'::text)))::double precision)) AND ((i.time)::double precision >= trunc(((((('now'::cstring)::date - to_date('01 Jan 2000'::text, 'DD Mon YYYY'::text)) + 1) - 365))::double precision)))
>
> Table1 contains 22 million records and Table2 contains 10 million records. This query took several minutes (almost 35) to return results, and one third of that time was spent in FetchTuple. Do you see something wrong with the query plan?

Can you provide the output of

  \d+ Table1
  \d+ Table2

I am interested in the distribution column of the tables. Also please provide the output of EXPLAIN VERBOSE on the query.

> Since the multiplexed IO isn't the real bottleneck here, my proposal would be to change the way the TupleStore at the coordinator is populated and introduce parallelism there. Is this the right direction? Please let me know your thoughts.
>
> Thanks
> Mani
>
>
> On Wed, 2015-06-03 at 01:49 -0700, Abbas Butt wrote:
>
> > On Tue, Jun 2, 2015 at 10:17 PM, Manikandan Soundarapandian <ma...@vt...> wrote:
> >
> > > Hi Abbas,
> > >
> > > Thanks for the reply. I understand how pgxc_node_receive works to get incoming data from the datanodes.
> > > Yes, I have run experiments to study the Postgres-XC system in a setup with 12 datanodes and a coordinator, performing join, group-by and order-by operations on two tables with around 20 million and 10 million records respectively. I have found that around 20-30% of the time is spent in the FetchTuple method, which reads one datarow at a time into the provided tuple slot from the combiner's buffer, which is in turn filled by the pgxc_node_receive method.
> >
> > The XC planner pushes parts of query processing to the datanodes to speed things up.
> > In the queries you tried, did the query planner perform this optimization?
> > If not, then you can gain an advantage by improving the planner to generate optimized plans that avoid unnecessary materialization at the coordinator.
> > This will bring the FetchTuple time down, since there will be fewer tuples to fetch.
> >
> > If the planner is already producing optimized plans, then what's your proposal?
> > > Thanks
> > >
> > > On Wed, 2015-06-03 at 09:45 +0500, Abbas Butt wrote:
> > >
> > > > On Wed, Jun 3, 2015 at 4:11 AM, Manikandan Soundarapandian <ma...@vt...> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am a graduate student working on my research in parallel databases. I would like to know how the Postgres-XC coordinator works. I understand that the datanodes run the query in parallel and the results are collected by the coordinator, which runs any further computation that is required, or just provides the output to the client that requested the query. I would like to know whether the coordinator does this data collection from the datanodes in a sequential fashion.
> > > >
> > > > The coordinator uses multiplexed IO using select on all fds of datanodes. For more details please see the pgxc_node_receive function in pgxcnode.c. The loop for reading data on all set fds is sequential, but the coordinator does not wait for data from the datanode to which the coordinator had sent the query first.
> > > >
> > > > > For example, let's consider that we want to run this query on table table_x, which is hash distributed among 10 datanodes:
> > > > >
> > > > >   select count(*) from table_x;
> > > > >
> > > > > Each datanode will run the query and give its local count, and the coordinator has to collect the individual counts and come up with the final count before sending the output. Is the data collection process at the coordinator done in a sequential fashion? I am actually looking to introduce some kind of parallelism in this data collection if it is sequential and do performance studies. Please clarify.
> > > >
> > > > To improve performance of any system, first study the bottleneck, and target to widen that. Have you done any study of Postgres-XC to find where the performance bottleneck is?
> > > >
> > > > > --
> > > > > Thanks
> > > > > Mani
> > > > > Department of Computer Science
> > > > > Virginia Tech

--
Abbas
Architect

Ph: 92.334.5100153
Skype ID: gabbasb
www.enterprisedb.com

Follow us on Twitter
@EnterpriseDB

Visit EnterpriseDB for tutorials, webinars, whitepapers and more
http://www.enterprisedb.com/resources-community
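
For anyone reproducing this on their own cluster: the plan Mani posted lists the i.time range test under "Coordinator quals", which suggests the filter, the join and the aggregation are all being evaluated at the coordinator rather than pushed to the datanodes, consistent with so much time going into FetchTuple. Below is a minimal sketch of the diagnostics Abbas asks for, written out against the names used in the thread (Table1, Table2 and the columns come straight from Mani's query):

  -- Distribution of each table; the distribution column determines whether
  -- the join on d.id = i.id can be shipped to the datanodes.
  \d+ Table1
  \d+ Table2

  -- Full plan, including the expressions the coordinator evaluates itself:
  EXPLAIN VERBOSE
  SELECT d.col1 RVALUE, ROUND(COUNT(d.col2)/10) COUNT
  FROM Table1 d, Table2 i
  WHERE d.id = i.id
    AND i.time BETWEEN TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY') + 1 - 365)
                   AND TRUNC(CURRENT_DATE - TO_DATE('01 Jan 2000','DD Mon YYYY'))
  GROUP BY d.col1
  ORDER BY d.col1 DESC;

A follow-up experiment worth trying (an assumption about the planner's shippability rules, not something confirmed in the thread) is to compute the two date bounds once, substitute them as plain constants in the BETWEEN clause, and re-run EXPLAIN VERBOSE to see whether the predicate then moves out of "Coordinator quals" and into the remote query sent to the datanodes.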
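
Earlier in the thread Abbas describes the coordinator's receive path as "multiplexed IO using select on all fds of datanodes" (pgxc_node_receive in pgxcnode.c). The sketch below illustrates that general select()-based technique only; it is not the Postgres-XC code, and the names node_fds, drain_node and receive_from_nodes are made up for the example:

  /*
   * Sketch of select()-based multiplexing over several datanode sockets:
   * wait on all descriptors at once, then drain whichever are readable.
   * Illustrative only; not the actual pgxc_node_receive implementation.
   */
  #include <stdio.h>
  #include <sys/select.h>
  #include <unistd.h>

  /* Hypothetical per-node handler: consume whatever bytes are available. */
  void
  drain_node(int fd)
  {
      char    buf[8192];
      ssize_t n = read(fd, buf, sizeof(buf));

      if (n > 0)
          printf("fd %d: received %zd bytes\n", fd, n);
  }

  /* Wait for data on any datanode connection and drain the ready ones. */
  int
  receive_from_nodes(const int node_fds[], int nnodes)
  {
      fd_set readfds;
      int    maxfd = -1;

      FD_ZERO(&readfds);
      for (int i = 0; i < nnodes; i++)
      {
          FD_SET(node_fds[i], &readfds);
          if (node_fds[i] > maxfd)
              maxfd = node_fds[i];
      }

      /* Block until at least one datanode has data for us. */
      if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
          return -1;

      /*
       * This loop is sequential, but it only reads descriptors that already
       * have data, so a slow datanode does not hold up the others.
       */
      for (int i = 0; i < nnodes; i++)
          if (FD_ISSET(node_fds[i], &readfds))
              drain_node(node_fds[i]);

      return 0;
  }

As Abbas notes, the sequential part here is only the drain loop over already-ready descriptors; Mani's measurements point instead at the per-row cost of FetchTuple at the coordinator, which is why the discussion turns to shipping more of the query to the datanodes so that fewer tuples have to be fetched and materialized at the coordinator.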