From: Koichi S. <koi...@gm...> - 2013-04-01 06:02:25
Thanks. Then a 90% improvement means about 53% of the original duration, while 50% means about 67% of it. The number of queries completed in a given duration is 190 vs. 150, a difference of 40. Considering the resources needed, it may be okay to begin with materialization. Any other inputs?
----------
Koichi Suzuki

2013/4/1 Ashutosh Bapat <ash...@en...>
>
> On Mon, Apr 1, 2013 at 10:59 AM, Koichi Suzuki <koi...@gm...> wrote:
>
>> I understand that materializing everything makes the code clearer and the implementation simpler and better structured.
>>
>> What do you mean by x% improvement? Does a 90% improvement mean the total duration is 10% of the original?
>>
> x% improvement means the duration reduces to 100/(100+x) of the non-pushdown scenario. Or, in simpler words, we see (100+x) queries being completed by the pushdown approach in the same time in which the non-pushdown approach completes 100 queries.
>
>> ----------
>> Koichi Suzuki
>>
>> 2013/3/29 Ashutosh Bapat <ash...@en...>
>>
>>> Hi All,
>>> I measured the scale-up for both approaches: a. using datanode connections as tapes (the existing one); b. materialising the result on tapes before merging (the approach I proposed). For 1M rows and 5 coordinators, I have found that approach (a) gives a 90% improvement whereas approach (b) gives a 50% improvement. Although the difference is significant, I feel that approach (b) is much cleaner than approach (a) and doesn't have a large footprint compared to the PG code, and it takes care of all the cases, like 1. materialising the sorted result and 2. handling any number of datanode connections without memory overrun. It's possible to improve it further if we avoid materialisation of the datanode result in a tuplestore.
>>>
>>> Patch attached for reference.
>>>
>>> On Tue, Mar 26, 2013 at 10:38 AM, Ashutosh Bapat <ash...@en...> wrote:
>>>
>>>> On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki <koi...@gm...> wrote:
>>>>
>>>>> One thing we should think about for option 1 is:
>>>>>
>>>>> When the result is huge, applications have to wait a long time until they get the first row. Because this option may need disk writes, total resource consumption will be larger.
>>>>>
>>>> Yes, I am aware of this fact. Please read the next paragraph and you will see that the current situation is no better.
>>>>
>>>>> I'm wondering if we can use a "cursor" at the database so that we can read each tape more simply; I mean, leave each query node open and read the next row from any query node.
>>>>>
>>>> We do that right now. But because of such a simulated cursor (it's not a cursor per se; we just fetch the required result from the connection as the demand arises while merging runs), we observe the following:
>>>>
>>>> If the plan has multiple remote query nodes (as there will be in the case of a merge join), we assign the same connection to these nodes. Before this assignment, the result from the previous connection is materialised at the coordinator. This means that when we get a huge result from a datanode, it will be materialised (at more cost than materialising it on a tape, since this materialisation happens in a linked list, which is not optimized). We need to share a connection between more than one RemoteQuery node because the same transaction cannot work on two connections to the same server. Not only performance, but the code has become ugly because of this approach.
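For reference, the on-the-fly merge over connections described above boils down to repeatedly picking the smallest head row across the per-datanode sorted streams; rows can be returned as soon as they are fetched, but nothing is retained for random access. A self-contained sketch follows (plain arrays stand in for the datanode connections; the types and names below are made up for illustration, not the actual XC executor code):

/*
 * Sketch of approach (a): treat each datanode's sorted result stream as a
 * tape and merge on the fly by repeatedly taking the smallest head row.
 * Plain arrays stand in for datanode connections; this is not the actual
 * XC executor code.
 */
#include <stdio.h>

#define NUM_NODES 3

typedef struct
{
    const int  *rows;           /* sorted run arriving from one datanode */
    int         nrows;
    int         pos;            /* next row to fetch from this "connection" */
} NodeStream;

/* Return the index of the stream with the smallest head row, or -1 if all are exhausted. */
static int
pick_min_stream(NodeStream *streams, int nstreams)
{
    int         best = -1;
    int         i;

    for (i = 0; i < nstreams; i++)
    {
        if (streams[i].pos >= streams[i].nrows)
            continue;           /* this connection has no more rows */
        if (best < 0 ||
            streams[i].rows[streams[i].pos] < streams[best].rows[streams[best].pos])
            best = i;
    }
    return best;
}

int
main(void)
{
    const int   run1[] = {1, 4, 9};
    const int   run2[] = {2, 3, 8};
    const int   run3[] = {5, 6, 7};
    NodeStream  streams[NUM_NODES] = {
        {run1, 3, 0}, {run2, 3, 0}, {run3, 3, 0}
    };
    int         i;

    /* Emit the globally sorted result without materialising it anywhere. */
    while ((i = pick_min_stream(streams, NUM_NODES)) >= 0)
        printf("%d\n", streams[i].rows[streams[i].pos++]);

    return 0;
}

Approach (b) would first write each stream out to a tape and then run the same kind of merge over the tapes instead of over the live connections.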
>>>> At various places in the executor, we have special handling for sorting, which needs to be maintained.
>>>>
>>>> Instead, if we materialise all the results on tapes and then proceed with step D5 of Knuth's algorithm for polyphase merge sort, the code will be much simpler and we won't lose much performance. In fact, we might be able to leverage fetching bulk data on the connection, which can be materialised on the tape in bulk.
>>>>
>>>>> Regards,
>>>>> ----------
>>>>> Koichi Suzuki
>>>>>
>>>>> 2013/3/25 Ashutosh Bapat <ash...@en...>:
>>>>> > Hi All,
>>>>> > I am working on using remote sorting for merge joins. The idea is that while using a merge join at the coordinator, we get the data sorted from the datanodes; for replicated relations we can get all the rows sorted, and for distributed tables we have to get sorted runs which can be merged at the coordinator. For a merge join, the sorted inner relation needs to be randomly accessible. For replicated relations this can be achieved by materialising the result. But for distributed relations, we do not materialise the sorted result at the coordinator; we compute the sorted result by merging the sorted results from the individual nodes on the fly. For distributed relations, the connections to the datanodes themselves are used as logical tapes (which provide the sorted runs). The final result is computed on the fly by choosing the smallest or greatest row (as required) from the connections.
>>>>> >
>>>>> > For a Sort node, the materialised result can reside in memory (if it fits there) or on one of the logical tapes used for the merge sort. So, in order to provide random access to the sorted result, we need to materialise the result either in memory or on a logical tape. In-memory materialisation is not easily possible, since for distributed relations we have already resorted to a tape-based sort, and to materialise the result on a tape, there is no logical tape available in the current algorithm. To make it work, there are the following possible ways:
>>>>> >
>>>>> > 1. When random access is required, materialise the sorted runs from the individual nodes onto tapes (one tape for each node) and then merge them onto one extra tape, which can be used for the materialisation.
>>>>> > 2. Use a mix of connections and logical tapes in the same tape set: merge the sorted runs from the connections onto a logical tape in the same logical tape set.
>>>>> >
>>>>> > While the second one looks attractive from a performance perspective (it saves writing to and reading from the tape), it would make the merge code ugly by mixing tapes. The read calls for a connection and for a logical tape are different, and we would need both for the logical tape where the final result is materialised. So I am thinking of going with 1; in fact, to have the same code handle remote sort, we could use 1 in all cases (whether or not materialisation is required).
>>>>> >
>>>>> > Had the original authors of the remote sort code thought about this materialisation? Anything they can share on this topic? Any comments?
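For reference, option 1 above amounts to: write each node's sorted run onto its own tape, then merge all the runs onto one extra tape, and that final tape is what gets rewound and re-read whenever the merge join needs to rescan its inner side. A self-contained sketch with toy in-memory tapes follows (the Tape type and helper names are illustrative only, not the tuplesort/logtape code):

/*
 * Sketch of option 1: materialise each datanode's sorted run onto its own
 * tape, then merge all runs onto one extra tape.  That final tape can be
 * rewound and re-read, which is what the merge join needs from its inner
 * side.  The Tape type and helpers are toy in-memory stand-ins.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    int    *rows;
    int     nrows;
    int     readpos;
} Tape;

static void
tape_write(Tape *t, int row)
{
    t->rows = realloc(t->rows, (t->nrows + 1) * sizeof(int));
    t->rows[t->nrows++] = row;
}

static void
tape_rewind(Tape *t)
{
    t->readpos = 0;             /* random access: restart reading from the top */
}

/* Merge already-sorted input tapes onto the output tape. */
static void
merge_runs(Tape *runs, int nruns, Tape *out)
{
    for (;;)
    {
        int     best = -1;
        int     i;

        for (i = 0; i < nruns; i++)
        {
            if (runs[i].readpos >= runs[i].nrows)
                continue;
            if (best < 0 ||
                runs[i].rows[runs[i].readpos] < runs[best].rows[runs[best].readpos])
                best = i;
        }
        if (best < 0)
            break;
        tape_write(out, runs[best].rows[runs[best].readpos++]);
    }
}

int
main(void)
{
    Tape    runs[2] = {{NULL, 0, 0}, {NULL, 0, 0}};
    Tape    result = {NULL, 0, 0};

    /* Step 1: materialise each node's sorted run onto its own tape. */
    tape_write(&runs[0], 1);
    tape_write(&runs[0], 5);
    tape_write(&runs[1], 2);
    tape_write(&runs[1], 3);

    /* Step 2: merge the runs onto the one extra tape. */
    merge_runs(runs, 2, &result);

    /* The result tape can now be rescanned as often as the join needs. */
    tape_rewind(&result);
    while (result.readpos < result.nrows)
        printf("%d\n", result.rows[result.readpos++]);

    return 0;
}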
>>>>> > --
>>>>> > Best Wishes,
>>>>> > Ashutosh Bapat
>>>>> > EnterpriseDB Corporation
>>>>> > The Enterprise Postgres Company
>>>>> >
>>>>> > _______________________________________________
>>>>> > Postgres-xc-developers mailing list
>>>>> > Pos...@li...
>>>>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers