From: Amit K. <ami...@en...> - 2013-03-18 10:03:27
|
On 8 March 2013 14:00, Ashutosh Bapat <ash...@en...>wrote: > Hi Amit, > Please find my replies inlined, > > > >> I think the logic of shippability of outer joins is flawless. Didn't >> find any holes. Patch comments below : >> >> ------- >> >> In case of distributed equi-join case, why is >> IsExecNodesColumnDistributed() used instead of >> IsExecNodesDistributedByValue() ? We want to always rule out the round >> robin case, no ? I can see that pgxc_find_dist_equijoin_qual() will >> always fail for round robin tables because they won't have any distrib >> columns, but still , just curious ... >> >> > It keeps open the possibility that we will be able to ship equi-join if we > can somehow infer that the rows from both the sides of join, participating > in the result of join are collocated. > > >> ------- >> >> * PGXC_TODO: What do we do when baselocatortype is >> * LOCATOR_TYPE_DISTRIBUTED? It could be anything HASH >> distributed or >> * MODULO distributed. In that case, having equi-join >> doesn't work >> * really, because same value from different relation >> will go to >> * different node. >> >> The above comment says that it does not work if one of the tables is >> distributed by hash and other table is distributed by modulo. But the >> code is actually checking the baselocatortype also, so I guess it >> works correctly after all ? I did not get what is the TODO here. Or >> does it mean this ? : >> For (t1_hash join t2_hash on ...) tj1 join (t1_mod join t2_mod on ...) >> tj2 on tj1.col1 = tj2.col4 >> the merged nodes for tj1 will have LOCATOR_TYPE_DISTRIBUTED, and the >> merged nodes for tj2 will also be LOCATOR_TYPE_DISTRIBUTED, and so tj1 >> join tj2 would be wrongly marked shippable even though they should not >> be shippable because of the mix of hash and modulo ? >> >> > That's correct. This should be taken care by my second patch up for > review. I think with that patch, we won't need LOCATOR_TYPE_DISTRIBUTED. > While reviewing that patch, can you please also review if this is true. > > >> ------- >> >> Is pgxc_is_expr_shippable(equi_join_expr) necessary ? Won't this qual >> be examined in is_query_shippable() walker ? >> > > This code will get executed in standard_planner() as well, so it's > possible that some of the join quals will be shippable and some are not. > While this is fine for an inner join, we want to make sure the a qual which > implies collocation of rows is shippable. This check is more from future > extension perspective than anything else. > > Ok. Understood all the comments above. > >> -------- >> >> If both tables reside on a single datanode, every join case should be >> shippable, which doesn't seem to be happening : >> postgres=# create table tab2 (id2 int, v varchar) distribute by >> replication to node (datanode_1); >> postgres=# create table tab1 (id1 int, v varchar) to node (datanode_1); >> postgres=# explain select * from (tab1 full outer join tab2 on id1 = id2 >> ) ; >> QUERY PLAN >> >> ------------------------------------------------------------------------------------------------- >> Hash Full Join (cost=0.12..0.26 rows=10 width=72) >> Hash Cond: (tab1.id1 = tab2.id2) >> -> Data Node Scan on tab1 "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 >> rows=1000 width=36) >> Node/s: datanode_1 >> -> Hash (cost=0.00..0.00 rows=1000 width=36) >> -> Data Node Scan on tab2 "_REMOTE_TABLE_QUERY_" >> (cost=0.00..0.00 rows=1000 width=36) >> Node/s: datanode_1 >> >> Probably you need to take out the following statement out of the >> distributed case and apply it as a general rule: >> /* If there is only single node, try merging the nodes */ >> if (list_length(inner_en->nodeList) == 1 && >> list_length(outer_en->nodeList) == 1) >> merge_nodes = true; >> >> > I am thinking about this and actually thought that we should mark a single > node ExecNodes as REPLICATED, so that it doesn't need any special handling. > What do you think? > I am concerned about loss of information that the underlying table is actually distributed. Also, there is a function IsReturningDMLOnReplicatedTable() which is using this information, although not sure how much it's making use of that information. I leave that to you for deciding which option to choose. I personally feel it's always good to be explicit while checking for this condition. > > >> >> >>> -- >> >>> Best Wishes, >> >>> Ashutosh Bapat >> >>> EntepriseDB Corporation >> >>> The Enterprise Postgres Company >> >> >> >> >> >> >> >> >> >> -- >> >> Best Wishes, >> >> Ashutosh Bapat >> >> EntepriseDB Corporation >> >> The Enterprise Postgres Company >> > >> > >> > >> > >> > -- >> > Best Wishes, >> > Ashutosh Bapat >> > EntepriseDB Corporation >> > The Enterprise Postgres Company >> > >> > >> ------------------------------------------------------------------------------ >> > Free Next-Gen Firewall Hardware Offer >> > Buy your Sophos next-gen firewall before the end March 2013 >> > and get the hardware for free! Learn more. >> > http://p.sf.net/sfu/sophos-d2d-feb >> > _______________________________________________ >> > Postgres-xc-developers mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> > >> > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > |