From: Ashutosh B. <ash...@en...> - 2013-03-18 10:12:02
|
Ok, I think it's better to leave distributed as distributed and handle each separately. On Mon, Mar 18, 2013 at 2:02 PM, Amit Khandekar < ami...@en...> wrote: > > > On 8 March 2013 14:00, Ashutosh Bapat <ash...@en...>wrote: > >> Hi Amit, >> Please find my replies inlined, >> >> >> >>> I think the logic of shippability of outer joins is flawless. Didn't >>> find any holes. Patch comments below : >>> >>> ------- >>> >>> In case of distributed equi-join case, why is >>> IsExecNodesColumnDistributed() used instead of >>> IsExecNodesDistributedByValue() ? We want to always rule out the round >>> robin case, no ? I can see that pgxc_find_dist_equijoin_qual() will >>> always fail for round robin tables because they won't have any distrib >>> columns, but still , just curious ... >>> >>> >> It keeps open the possibility that we will be able to ship equi-join if >> we can somehow infer that the rows from both the sides of join, >> participating in the result of join are collocated. >> >> >>> ------- >>> >>> * PGXC_TODO: What do we do when baselocatortype is >>> * LOCATOR_TYPE_DISTRIBUTED? It could be anything HASH >>> distributed or >>> * MODULO distributed. In that case, having equi-join >>> doesn't work >>> * really, because same value from different relation >>> will go to >>> * different node. >>> >>> The above comment says that it does not work if one of the tables is >>> distributed by hash and other table is distributed by modulo. But the >>> code is actually checking the baselocatortype also, so I guess it >>> works correctly after all ? I did not get what is the TODO here. Or >>> does it mean this ? : >>> For (t1_hash join t2_hash on ...) tj1 join (t1_mod join t2_mod on ...) >>> tj2 on tj1.col1 = tj2.col4 >>> the merged nodes for tj1 will have LOCATOR_TYPE_DISTRIBUTED, and the >>> merged nodes for tj2 will also be LOCATOR_TYPE_DISTRIBUTED, and so tj1 >>> join tj2 would be wrongly marked shippable even though they should not >>> be shippable because of the mix of hash and modulo ? >>> >>> >> That's correct. This should be taken care by my second patch up for >> review. I think with that patch, we won't need LOCATOR_TYPE_DISTRIBUTED. >> While reviewing that patch, can you please also review if this is true. >> >> >>> ------- >>> >>> Is pgxc_is_expr_shippable(equi_join_expr) necessary ? Won't this qual >>> be examined in is_query_shippable() walker ? >>> >> >> This code will get executed in standard_planner() as well, so it's >> possible that some of the join quals will be shippable and some are not. >> While this is fine for an inner join, we want to make sure the a qual which >> implies collocation of rows is shippable. This check is more from future >> extension perspective than anything else. >> >> > > Ok. Understood all the comments above. > > >> >>> -------- >>> >>> If both tables reside on a single datanode, every join case should be >>> shippable, which doesn't seem to be happening : >>> postgres=# create table tab2 (id2 int, v varchar) distribute by >>> replication to node (datanode_1); >>> postgres=# create table tab1 (id1 int, v varchar) to node (datanode_1); >>> postgres=# explain select * from (tab1 full outer join tab2 on id1 = id2 >>> ) ; >>> QUERY PLAN >>> >>> ------------------------------------------------------------------------------------------------- >>> Hash Full Join (cost=0.12..0.26 rows=10 width=72) >>> Hash Cond: (tab1.id1 = tab2.id2) >>> -> Data Node Scan on tab1 "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 >>> rows=1000 width=36) >>> Node/s: datanode_1 >>> -> Hash (cost=0.00..0.00 rows=1000 width=36) >>> -> Data Node Scan on tab2 "_REMOTE_TABLE_QUERY_" >>> (cost=0.00..0.00 rows=1000 width=36) >>> Node/s: datanode_1 >>> >>> Probably you need to take out the following statement out of the >>> distributed case and apply it as a general rule: >>> /* If there is only single node, try merging the nodes */ >>> if (list_length(inner_en->nodeList) == 1 && >>> list_length(outer_en->nodeList) == 1) >>> merge_nodes = true; >>> >>> >> I am thinking about this and actually thought that we should mark a >> single node ExecNodes as REPLICATED, so that it doesn't need any special >> handling. What do you think? >> > > I am concerned about loss of information that the underlying table is > actually distributed. Also, there is a function > IsReturningDMLOnReplicatedTable() which is using this information, although > not sure how much it's making use of that information. I leave that to you > for deciding which option to choose. I personally feel it's always good to > be explicit while checking for this condition. > > >> >> >>> >>> >>> -- >>> >>> Best Wishes, >>> >>> Ashutosh Bapat >>> >>> EntepriseDB Corporation >>> >>> The Enterprise Postgres Company >>> >> >>> >> >>> >> >>> >> >>> >> -- >>> >> Best Wishes, >>> >> Ashutosh Bapat >>> >> EntepriseDB Corporation >>> >> The Enterprise Postgres Company >>> > >>> > >>> > >>> > >>> > -- >>> > Best Wishes, >>> > Ashutosh Bapat >>> > EntepriseDB Corporation >>> > The Enterprise Postgres Company >>> > >>> > >>> ------------------------------------------------------------------------------ >>> > Free Next-Gen Firewall Hardware Offer >>> > Buy your Sophos next-gen firewall before the end March 2013 >>> > and get the hardware for free! Learn more. >>> > http://p.sf.net/sfu/sophos-d2d-feb >>> > _______________________________________________ >>> > Postgres-xc-developers mailing list >>> > Pos...@li... >>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> > >>> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |