From: Koichi S. <koi...@gm...> - 2013-03-26 05:41:18
Understood the situation. Bulk row transfer between coordinator and datanode is another piece of infrastructure we need for sure. This will fit a 10G network (we need to use giant packets to use its bandwidth).

Regards;
----------
Koichi Suzuki


2013/3/26 Ashutosh Bapat <ash...@en...>:
>
>
> On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki <koi...@gm...> wrote:
>>
>> One thing we should think about for option 1 is:
>>
>> When the result is huge, applications have to wait a long time until they
>> get the first row. Because this option may need disk writes, total
>> resource consumption will be larger.
>>
>
> Yes, I am aware of this fact. Please read the next paragraph and you will
> see that the current situation is no better.
>
>>
>> I'm wondering if we can use a "cursor" at the database so that we can read
>> each tape more simply, I mean, leave each query node open and read the
>> next row from any query node.
>>
>
> We do that right now. But because of such a simulated cursor (it's not a
> cursor per se; we just fetch the required result from the connection as the
> demand arises while merging runs), we observe the following:
>
> If the plan has multiple RemoteQuery nodes (as there will be in the case of
> a merge join), we assign the same connection to these nodes. Before this
> assignment, the result from the previous connection is materialised at the
> coordinator. This means that when we get a huge result from the datanode,
> it will be materialised at a higher cost than materialising it on tape,
> since this materialisation happens in a linked list, which is not
> optimized. We need to share a connection between more than one RemoteQuery
> node because the same transaction cannot work on two connections to the
> same server. It is not only a performance problem: the code has become ugly
> because of this approach. At various places in the executor we have special
> handling for sorting, which needs to be maintained.
>
> Instead, if we materialise all the results on tape and then proceed with
> step D5 of Knuth's algorithm for polyphase merge sort, the code will be
> much simpler and we won't lose much performance. In fact, we might be able
> to leverage fetching bulk data on the connection, which can be materialised
> on tape in bulk.
>
>>
>> Regards;
>> ----------
>> Koichi Suzuki
>>
>>
>> 2013/3/25 Ashutosh Bapat <ash...@en...>:
>> > Hi All,
>> > I am working on using remote sorting for merge joins. The idea is that,
>> > while using a merge join at the coordinator, we get the data sorted from
>> > the datanodes: for replicated relations, we can get all the rows sorted,
>> > and for distributed tables we have to get sorted runs which can be
>> > merged at the coordinator. For a merge join the sorted inner relation
>> > needs to be randomly accessible. For replicated relations this can be
>> > achieved by materialising the result. But for distributed relations, we
>> > do not materialise the sorted result at the coordinator; we compute the
>> > sorted result by merging the sorted results from the individual nodes on
>> > the fly. For distributed relations, the connections to the datanodes
>> > themselves are used as logical tapes (which provide the sorted runs).
>> > The final result is computed on the fly by choosing the smallest or
>> > greatest row (as required) from the connections.
>> >
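A minimal, self-contained sketch of the on-the-fly merge described above, with in-memory arrays standing in for the sorted streams that the datanode connections deliver; the Run type and helper names are illustrative only, not the Postgres-XC executor API:

/*
 * Each "run" stands in for one datanode connection that already delivers
 * its rows in sorted order.  The coordinator produces the next output row
 * by picking the smallest head row across all runs, so the globally sorted
 * result is never materialised anywhere.
 */
#include <stdio.h>

typedef struct Run
{
    const int  *rows;   /* sorted rows from one datanode */
    int         nrows;
    int         pos;    /* next row to hand out */
} Run;

/* Return the index of the run with the smallest head row, or -1 if all are exhausted. */
static int
pick_smallest(Run *runs, int nruns)
{
    int best = -1;

    for (int i = 0; i < nruns; i++)
    {
        if (runs[i].pos >= runs[i].nrows)
            continue;                       /* this run is exhausted */
        if (best < 0 ||
            runs[i].rows[runs[i].pos] < runs[best].rows[runs[best].pos])
            best = i;
    }
    return best;
}

int
main(void)
{
    /* Three sorted runs, as three datanodes of a distributed table would return them. */
    const int   n1[] = {1, 4, 7};
    const int   n2[] = {2, 5, 8};
    const int   n3[] = {3, 6, 9};
    Run         runs[] = {{n1, 3, 0}, {n2, 3, 0}, {n3, 3, 0}};
    int         best;

    /* Emit the globally sorted result row by row. */
    while ((best = pick_smallest(runs, 3)) >= 0)
        printf("%d\n", runs[best].rows[runs[best].pos++]);

    return 0;
}

Each call to pick_smallest() is a linear scan over the run heads, which is exactly the "choose the smallest row from the connections" step; with many datanodes a binary heap would be the natural replacement.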
>> > For a Sort node the materialised result can reside in memory (if it
>> > fits there) or on one of the logical tapes used for the merge sort. So,
>> > in order to provide random access to the sorted result, we need to
>> > materialise the result either in memory or on a logical tape. In-memory
>> > materialisation is not easily possible since, in the case of distributed
>> > relations, we have already resorted to a tape-based sort, and to
>> > materialise the result on tape, there is no logical tape available in
>> > the current algorithm. To make it work, there are the following possible
>> > ways (see the sketch at the end of this message):
>> >
>> > 1. When random access is required, materialise the sorted runs from the
>> > individual nodes onto tapes (one tape for each node) and then merge them
>> > onto one extra tape, which can be used for materialisation.
>> > 2. Use a mix of connections and logical tapes in the same tape set.
>> > Merge the sorted runs from the connections onto a logical tape in the
>> > same logical tape set.
>> >
>> > While the second one looks attractive from a performance perspective (it
>> > saves writing to and reading from the tape), it would make the merge
>> > code ugly by mixing tape kinds. The read calls for a connection and a
>> > logical tape are different, and we would need both on the logical tape
>> > where the final result is materialized. So I am thinking of going with
>> > 1; in fact, to have the same code handle remote sort, we could use 1 in
>> > all cases (whether or not materialization is required).
>> >
>> > Had the original authors of the remote sort code thought about this
>> > materialization? Anything they can share on this topic? Any comment?
>> > --
>> > Best Wishes,
>> > Ashutosh Bapat
>> > EnterpriseDB Corporation
>> > The Enterprise Postgres Company
>> >
>> > _______________________________________________
>> > Postgres-xc-developers mailing list
>> > Pos...@li...
>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Enterprise Postgres Company
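A minimal sketch of option 1 above, with temporary files standing in for logical tapes: each datanode's sorted run is first materialised on its own tape, the tapes are merged onto one extra tape, and that final tape can be rewound and rescanned, which is the random access a merge join needs. The file-based tapes and helper names are illustrative only, not the actual tuplesort/logical tape code:

/*
 * Option 1: materialise each connection's sorted run on its own tape,
 * merge the tapes onto one extra tape, then rescan the merged tape.
 */
#include <stdio.h>

/* Merge nruns sorted tapes of ints onto the result tape. */
static void
merge_tapes(FILE **tapes, int nruns, FILE *result)
{
    int     head[16];       /* current head row of each tape (demo limit) */
    int     live[16];       /* 1 while the tape still has rows */

    for (int i = 0; i < nruns; i++)
        live[i] = (fscanf(tapes[i], "%d", &head[i]) == 1);

    for (;;)
    {
        int best = -1;

        for (int i = 0; i < nruns; i++)
            if (live[i] && (best < 0 || head[i] < head[best]))
                best = i;
        if (best < 0)
            break;          /* every tape exhausted */

        fprintf(result, "%d\n", head[best]);
        live[best] = (fscanf(tapes[best], "%d", &head[best]) == 1);
    }
}

int
main(void)
{
    /* Pretend these are the sorted runs fetched in bulk from three datanodes. */
    const int   runs[3][3] = {{1, 4, 7}, {2, 5, 8}, {3, 6, 9}};
    FILE       *tapes[3];
    FILE       *result = tmpfile();

    /* Step 1: materialise each run on its own tape. */
    for (int i = 0; i < 3; i++)
    {
        tapes[i] = tmpfile();
        for (int j = 0; j < 3; j++)
            fprintf(tapes[i], "%d\n", runs[i][j]);
        rewind(tapes[i]);
    }

    /* Step 2: merge the tapes onto one extra tape. */
    merge_tapes(tapes, 3, result);

    /* Step 3: the merged tape supports rescans; read it twice to show that. */
    for (int pass = 0; pass < 2; pass++)
    {
        int v;

        rewind(result);
        while (fscanf(result, "%d", &v) == 1)
            printf("%d ", v);
        printf("\n");
    }
    return 0;
}

The extra write and read of the final tape is the cost option 1 pays; in exchange, the merge code only ever deals with one kind of tape, which is the simplification argued for in the thread above.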