Re: [Postgres-xc-developers] shipping outer joins

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Ok, I think it's better to leave distributed as distributed and handle each
separately.

On Mon, Mar 18, 2013 at 2:02 PM, Amit Khandekar <
ami...@en...> wrote:

>
>
> On 8 March 2013 14:00, Ashutosh Bapat <ash...@en...>wrote:
>
>> Hi Amit,
>> Please find my replies inlined,
>>
>>
>>
>>> I think the logic of shippability of outer joins is flawless. Didn't
>>> find any holes. Patch comments below :
>>>
>>> -------
>>>
>>> In case of distributed equi-join case, why is
>>> IsExecNodesColumnDistributed() used instead of
>>> IsExecNodesDistributedByValue() ? We want to always rule out the round
>>> robin case, no ? I can see that pgxc_find_dist_equijoin_qual() will
>>> always fail for round robin tables because they won't have any distrib
>>> columns, but still , just curious ...
>>>
>>>
>> It keeps open the possibility that we will be able to ship equi-join if
>> we can somehow infer that the rows from both the sides of join,
>> participating in the result of join are collocated.
>>
>>
>>> -------
>>>
>>>                  * PGXC_TODO: What do we do when baselocatortype is
>>>                  * LOCATOR_TYPE_DISTRIBUTED? It could be anything HASH
>>> distributed or
>>>                  * MODULO distributed. In that case, having equi-join
>>> doesn't work
>>>                  * really, because same value from different relation
>>> will go to
>>>                  * different node.
>>>
>>> The above comment says that it does not work if one of the tables is
>>> distributed by hash and other table is distributed by modulo. But the
>>> code is actually checking the baselocatortype also, so I guess it
>>> works correctly after all ? I did not get what is the TODO here. Or
>>> does it mean this ? :
>>> For (t1_hash join t2_hash on ...) tj1 join (t1_mod join t2_mod on ...)
>>> tj2 on tj1.col1 = tj2.col4
>>> the merged nodes for tj1 will have  LOCATOR_TYPE_DISTRIBUTED, and the
>>> merged nodes for tj2 will also be LOCATOR_TYPE_DISTRIBUTED, and so tj1
>>> join tj2 would be wrongly marked shippable even though they should not
>>> be shippable because of the mix of hash and modulo ?
>>>
>>>
>> That's correct. This should be taken care by my second patch up for
>> review. I think with that patch, we won't need LOCATOR_TYPE_DISTRIBUTED.
>> While reviewing that patch, can you please also review if this is true.
>>
>>
>>> -------
>>>
>>> Is pgxc_is_expr_shippable(equi_join_expr) necessary ? Won't this qual
>>> be examined in is_query_shippable() walker ?
>>>
>>
>> This code will get executed in standard_planner() as well, so it's
>> possible that some of the join quals will be shippable and some are not.
>> While this is fine for an inner join, we want to make sure the a qual which
>> implies collocation of rows is shippable. This check is more from future
>> extension perspective than anything else.
>>
>>
>
> Ok. Understood all the comments above.
>
>
>>
>>> --------
>>>
>>> If both tables reside on a single datanode, every join case should be
>>> shippable, which doesn't seem to be happening :
>>> postgres=# create table tab2 (id2 int, v varchar) distribute by
>>> replication to node (datanode_1);
>>> postgres=# create table tab1 (id1 int, v varchar) to node (datanode_1);
>>> postgres=# explain select * from (tab1 full outer join tab2 on id1 = id2
>>> ) ;
>>>                                            QUERY PLAN
>>>
>>> -------------------------------------------------------------------------------------------------
>>>  Hash Full Join  (cost=0.12..0.26 rows=10 width=72)
>>>    Hash Cond: (tab1.id1 = tab2.id2)
>>>    ->  Data Node Scan on tab1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00
>>> rows=1000 width=36)
>>>          Node/s: datanode_1
>>>    ->  Hash  (cost=0.00..0.00 rows=1000 width=36)
>>>          ->  Data Node Scan on tab2 "_REMOTE_TABLE_QUERY_"
>>> (cost=0.00..0.00 rows=1000 width=36)
>>>                Node/s: datanode_1
>>>
>>> Probably you need to take out the following statement out of the
>>> distributed case and apply it as a general rule:
>>>                 /* If there is only single node, try merging the nodes */
>>>                 if (list_length(inner_en->nodeList) == 1 &&
>>>                         list_length(outer_en->nodeList) == 1)
>>>                         merge_nodes = true;
>>>
>>>
>> I am thinking about this and actually thought that we should mark a
>> single node ExecNodes as REPLICATED, so that it doesn't need any special
>> handling. What do you think?
>>
>
> I am concerned about loss of information that the underlying table is
> actually distributed. Also, there is a function
> IsReturningDMLOnReplicatedTable() which is using this information, although
> not sure how much it's making use of that information. I leave that to you
> for deciding which option to choose. I personally feel it's always good to
> be explicit while checking for this condition.
>
>
>>
>>
>>>
>>> >>> --
>>> >>> Best Wishes,
>>> >>> Ashutosh Bapat
>>> >>> EntepriseDB Corporation
>>> >>> The Enterprise Postgres Company
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Best Wishes,
>>> >> Ashutosh Bapat
>>> >> EntepriseDB Corporation
>>> >> The Enterprise Postgres Company
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Best Wishes,
>>> > Ashutosh Bapat
>>> > EntepriseDB Corporation
>>> > The Enterprise Postgres Company
>>> >
>>> >
>>> ------------------------------------------------------------------------------
>>> > Free Next-Gen Firewall Hardware Offer
>>> > Buy your Sophos next-gen firewall before the end March 2013
>>> > and get the hardware for free! Learn more.
>>> > http://p.sf.net/sfu/sophos-d2d-feb
>>> > _______________________________________________
>>> > Postgres-xc-developers mailing list
>>> > Pos...@li...
>>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>> >
>>>
>>
>>
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EntepriseDB Corporation
>> The Enterprise Postgres Company
>>
>
>

-- 
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company