Re: [Postgres-xc-developers] Using remote sorting for merge-join


On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki
<koi...@gm...> wrote:

> One thing we should think about for option 1 is:
>
> When the number of result rows is huge, applications have to wait a long
> time until they get the first row. Because this option may need disk
> writes, total resource consumption will be larger.
>
>
Yes, I am aware of this fact. Please read the next paragraph and you will
see that the current situation is no better.

> I'm wondering if we can use a "cursor" at the database so that we can read
> each tape more simply, I mean, leave each query node open and read the
> next row from any query node.
>
>
We do that right now. But because of this simulated cursor (it's not a
cursor per se; we just fetch the required result from the connection as
the demand arises while merging runs), we observe the following.

If the plan has multiple RemoteQuery nodes (as there will be in the case of
a merge join), we assign the same connection to these nodes. Before this
assignment, the result pending on that connection from the previous node is
materialised at the coordinator. This means that when we get a huge result
from the datanode, it will be materialised, which costs more than
materialising it on tape, because this materialisation happens in a linked
list, which is not optimized. We need to share a connection between more
than one RemoteQuery node because the same transaction cannot work on two
connections to the same server. Apart from performance, the code has become
ugly because of this approach: at various places in the executor, we have
special handling for sorting, which needs to be maintained.
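A minimal sketch of the connection sharing described above, assuming a toy model where a datanode result is a Python iterator and the coordinator-side buffer is a deque (the original uses a linked list). `SharedConnection` and `RemoteQueryNode` are hypothetical names, not the Postgres-XC structures; the point is only that starting a second node on the same connection forces the first node's unread rows into coordinator memory.

```python
from collections import deque
from typing import Iterator, Optional


class SharedConnection:
    """Hypothetical: one datanode connection shared by several RemoteQuery nodes."""

    def __init__(self) -> None:
        self.owner: Optional["RemoteQueryNode"] = None


class RemoteQueryNode:
    """Hypothetical RemoteQuery node reading its result from a shared connection."""

    def __init__(self, conn: SharedConnection, rows: list) -> None:
        self.conn = conn
        self.rows = rows                    # stands in for this node's query result
        self.stream: Optional[Iterator] = None
        self.buffer: deque = deque()        # rows materialised at the coordinator

    def start(self) -> None:
        prev = self.conn.owner
        if prev is not None and prev is not self and prev.stream is not None:
            # The connection can carry only one pending result, so drain
            # everything the previous node has not read yet into its buffer.
            prev.buffer.extend(prev.stream)
            prev.stream = None
        self.conn.owner = self
        self.stream = iter(self.rows)

    def next_row(self):
        # Serve materialised rows first, then read from the connection.
        if self.buffer:
            return self.buffer.popleft()
        if self.stream is not None:
            return next(self.stream, None)
        return None
```

If node A has read one row of a three-row result when node B starts, A's remaining two rows end up in A's buffer; with a huge result, that buffer is the cost being described.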

Instead, if we materialise all the results on tape and then proceed with
step D5 of Knuth's algorithm for polyphase merge sort, the code will be much
simpler and we won't lose much performance. In fact, we might be able to
fetch data over the connection in bulk and write it to tape in bulk.
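A minimal sketch of the bulk spooling idea, assuming a connection can be modelled as an iterator of rows and a logical tape as an append-only list. `fetch_batch`, `Tape` and `spool_run` are hypothetical names, and `batch_size` is an illustrative parameter, not an existing Postgres-XC setting.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fetch_batch(conn: Iterator, batch_size: int) -> list:
    """Hypothetical stand-in for fetching up to batch_size rows in one round trip."""
    return list(islice(conn, batch_size))


class Tape:
    """Toy logical tape: written sequentially in blocks, then read back."""

    def __init__(self) -> None:
        self._rows: List = []
        self.block_writes = 0               # number of bulk writes performed

    def write_block(self, rows: list) -> None:
        self._rows.extend(rows)
        self.block_writes += 1

    def rewind(self) -> Iterator:
        return iter(self._rows)


def spool_run(conn: Iterable, batch_size: int = 1000) -> Tape:
    """Copy one datanode's sorted run onto its own tape, one batch at a time."""
    it = iter(conn)
    tape = Tape()
    while True:
        batch = fetch_batch(it, batch_size)
        if not batch:
            break
        tape.write_block(batch)
    return tape
```

With 10 rows and a batch size of 4, the run lands on the tape in three block writes; once every run is spooled, the merge phase only ever reads tapes.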

> Regards;
> ----------
> Koichi Suzuki
>
>
> 2013/3/25 Ashutosh Bapat <ash...@en...>:
> > Hi All,
> > I am working on using remote sorting for merge joins. The idea is that,
> > when using merge join at the coordinator, we get the data sorted from the
> > datanodes; for replicated relations, we can get all the rows sorted, and
> > for distributed tables we have to get sorted runs, which can be merged at
> > the coordinator. For merge join the sorted inner relation needs to be
> > randomly accessible. For replicated relations this can be achieved by
> > materialising the result. But for distributed relations, we do not
> > materialise the sorted result at the coordinator; instead we compute the
> > sorted result by merging the sorted results from individual nodes on the
> > fly. For distributed relations, the connections to the datanodes
> > themselves are used as logical tapes (which provide the sorted runs). The
> > final result is computed on the fly by choosing the smallest or greatest
> > row (as required) from the connections.
> >
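The on-the-fly merge described above amounts to a k-way merge over the connections. A minimal sketch, assuming each connection is modelled as a Python iterable of integer sort keys already sorted by the datanode (ascending, or descending when `descending=True`); `merge_connections` is a hypothetical name. It also shows why this gives no random access: the result is a one-pass generator that cannot be rewound for a merge join's inner side.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def merge_connections(conns: List[Iterable[int]], descending: bool = False) -> Iterator[int]:
    """Merge sorted runs, one per datanode connection, without materialising them.

    A row is fetched from a connection only after that connection's previous
    row has been returned, so each connection acts as a logical tape.
    """
    sign = -1 if descending else 1          # negate keys to pick the greatest row
    iters = [iter(c) for c in conns]
    heap: List[Tuple[int, int, int]] = []
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heap.append((sign * first, idx, first))
    heapq.heapify(heap)
    while heap:
        _, idx, row = heapq.heappop(heap)   # smallest (or greatest) current row
        yield row
        nxt = next(iters[idx], None)        # refill from the same connection
        if nxt is not None:
            heapq.heappush(heap, (sign * nxt, idx, nxt))
```

Once the generator is exhausted it yields nothing more, which is why random access needs the result to be materialised somewhere.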
> > For a Sort node the materialised result can reside in memory (if it fits
> > there) or on one of the logical tapes used for merge sort. So, in order to
> > provide random access to the sorted result, we need to materialise the
> > result either in memory or on a logical tape. In-memory materialisation
> > is not easily possible, since for distributed relations we have already
> > resorted to tape-based sort, and to materialise the result on tape, there
> > is no logical tape available in the current algorithm. To make it work,
> > there are the following possible ways:
> >
> > 1. When random access is required, materialise the sorted runs from
> >    individual nodes onto tapes (one tape for each node) and then merge
> >    them onto one extra tape, which can be used for materialisation.
> > 2. Use a mix of connections and logical tapes in the same tape set. Merge
> >    the sorted runs from the connections onto a logical tape in the same
> >    logical tape set.
> >
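A minimal sketch of option 1, assuming the same toy model (integer rows, Python lists as tapes). `RandomAccessTape` and `remote_sort_option1` are hypothetical names, and the `mark`/`restore`/`rescan` methods are only loosely modelled on what a merge join needs from its inner input, not on the actual executor API.

```python
import heapq
from typing import Iterable, List, Optional


class RandomAccessTape:
    """Toy stand-in for the extra tape holding the final merged result."""

    def __init__(self, rows: List[int]) -> None:
        self._rows = rows
        self._pos = 0
        self._mark = 0

    def next_row(self) -> Optional[int]:
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def mark(self) -> None:
        # Remember the position of the row the next call will return.
        self._mark = self._pos

    def restore(self) -> None:
        # Go back to the marked position (merge join's mark/restore pattern).
        self._pos = self._mark

    def rescan(self) -> None:
        self._pos = 0


def remote_sort_option1(conns: List[Iterable[int]]) -> RandomAccessTape:
    # Step 1: materialise each datanode's sorted run onto its own tape.
    node_tapes = [list(c) for c in conns]
    # Step 2: merge the per-node tapes onto one extra tape.
    merged = list(heapq.merge(*node_tapes))
    return RandomAccessTape(merged)
```

Nothing can be returned until every connection has been read to the end, which is the time-to-first-row and extra-I/O cost raised in the reply; in exchange, the final tape can be re-read freely.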
> > While the second one looks attractive from a performance perspective (it
> > saves writing to and reading from the tape), it would make the merge code
> > ugly by using mixed tapes. The read calls for a connection and a logical
> > tape are different, and we will need both in the logical tape set where
> > the final result is materialized. So, I am thinking of going with 1; in
> > fact, to have the same code handle remote sort, I would use 1 in all
> > cases (whether or not materialization is required).
> >
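To make the "different read calls" point concrete, here is a minimal sketch of what option 2 would require: both kinds of tape hidden behind one read interface so the merge can write straight onto a logical tape in the same set. `TapeSource`, `ConnectionTape`, `LogicalTape` and `merge_into_tape` are hypothetical names assuming integer rows, not Postgres-XC code.

```python
import heapq
from abc import ABC, abstractmethod
from typing import Iterator, List, Optional


class TapeSource(ABC):
    """Common read interface option 2 would need over both kinds of tape."""

    @abstractmethod
    def read(self) -> Optional[int]:
        ...


class ConnectionTape(TapeSource):
    """Hypothetical wrapper that reads the next row from a datanode connection."""

    def __init__(self, rows: Iterator[int]) -> None:
        self._rows = rows

    def read(self) -> Optional[int]:
        return next(self._rows, None)


class LogicalTape(TapeSource):
    """Toy logical tape: written sequentially, then read from the start."""

    def __init__(self) -> None:
        self._rows: List[int] = []
        self._pos = 0

    def write(self, row: int) -> None:
        self._rows.append(row)

    def read(self) -> Optional[int]:
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row


def merge_into_tape(sources: List[TapeSource], out: LogicalTape) -> None:
    """Merge sorted sources directly onto `out`, skipping per-node tapes."""
    heap = []
    for idx, src in enumerate(sources):
        row = src.read()
        if row is not None:
            heap.append((row, idx))
    heapq.heapify(heap)
    while heap:
        row, idx = heapq.heappop(heap)
        out.write(row)
        nxt = sources[idx].read()
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx))
```

Rows travel from the connection to the final tape only once, but every tape-handling routine must now go through the shared interface, which is the added complexity the message prefers to avoid.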
> > Had the original authors of the remote sort code thought about this
> > materialization? Anything they can share on this topic?
> > Any comments?
> > --
> > Best Wishes,
> > Ashutosh Bapat
> > EnterpriseDB Corporation
> > The Enterprise Postgres Company
> >
> >
> > _______________________________________________
> > Postgres-xc-developers mailing list
> > Pos...@li...
> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
> >
>

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company