From: Koichi S. <koi...@gm...> - 2013-04-01 06:02:25
Thanks. Then a 90% improvement means about 53% of the original duration, while 50% means about 67% of it. The number of queries completed in a given duration is 190 vs. 150, a difference of 40. Considering the resources needed, it may be okay to begin with materialization. Any other inputs?
----------
Koichi Suzuki

2013/4/1 Ashutosh Bapat <ash...@en...>
>
> On Mon, Apr 1, 2013 at 10:59 AM, Koichi Suzuki <koi...@gm...> wrote:
>
>> I understand that materializing everything makes the code clearer and the implementation simpler and better structured.
>>
>> What do you mean by x% improvement? Does a 90% improvement mean the total duration is 10% of the original?
>>
> x% improvement means the duration reduces to 100/(100+x) of the non-pushdown scenario. Or, in simpler words, we see (100+x) queries being completed by the pushdown approach in the same time in which the non-pushdown approach completes 100 queries.
>
>> ----------
>> Koichi Suzuki
>>
>> 2013/3/29 Ashutosh Bapat <ash...@en...>
>>
>>> Hi All,
>>> I measured the scale-up for both approaches: a. using datanode connections as tapes (the existing one); b. materialising the result on tapes before merging (the approach I proposed). For 1M rows and 5 coordinators, I have found that approach (a) gives a 90% improvement whereas approach (b) gives a 50% improvement. Although the difference is significant, I feel that approach (b) is much cleaner than approach (a) and doesn't have a large footprint compared to the PG code, and it takes care of all the cases, like 1. materialising the sorted result and 2. handling any number of datanode connections without memory overrun. It's possible to improve it further if we avoid materialisation of the datanode result in a tuplestore.
>>>
>>> Patch attached for reference.
>>>
>>> On Tue, Mar 26, 2013 at 10:38 AM, Ashutosh Bapat <ash...@en...> wrote:
>>>
>>>> On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki <koi...@gm...> wrote:
>>>>
>>>>> One thing we should think about for option 1 is:
>>>>>
>>>>> When the result is huge, applications have to wait a long time until they get the first row. Because this option may need disk writes, total resource consumption will be larger.
>>>>>
>>>> Yes, I am aware of this fact. Please read the next paragraph and you will see that the current situation is no better.
>>>>
>>>>> I'm wondering if we can use a "cursor" at the database so that we can read each tape more simply; I mean, leave each query node open and read the next row from any query node.
>>>>>
>>>> We do that right now. But because of such a simulated cursor (it's not a cursor per se; we just fetch the required result from the connection as the demand arises while merging runs), we observe the following:
>>>>
>>>> If the plan has multiple remote query nodes (as there will be in the case of a merge join), we assign the same connection to these nodes. Before this assignment, the result from the previous connection is materialised at the coordinator. This means that when we get a huge result from a datanode, it will be materialised (at more cost than materialising it on a tape, since this materialisation happens in a linked list, which is not optimized). We need to share a connection between more than one RemoteQuery node because the same transaction cannot work on two connections to the same server. Not only performance, but the code has become ugly because of this approach.
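For reference, the on-the-fly merge over connections described above boils down to repeatedly picking the smallest head row across the per-datanode sorted streams; rows can be returned as soon as they are fetched, but nothing is retained for random access. A self-contained sketch follows (plain arrays stand in for the datanode connections; the types and names below are made up for illustration, not the actual XC executor code):

/*
 * Sketch of approach (a): treat each datanode's sorted result stream as a
 * tape and merge on the fly by repeatedly taking the smallest head row.
 * Plain arrays stand in for datanode connections; this is not the actual
 * XC executor code.
 */
#include <stdio.h>

#define NUM_NODES 3

typedef struct
{
    const int  *rows;           /* sorted run arriving from one datanode */
    int         nrows;
    int         pos;            /* next row to fetch from this "connection" */
} NodeStream;

/* Return the index of the stream with the smallest head row, or -1 if all are exhausted. */
static int
pick_min_stream(NodeStream *streams, int nstreams)
{
    int         best = -1;
    int         i;

    for (i = 0; i < nstreams; i++)
    {
        if (streams[i].pos >= streams[i].nrows)
            continue;           /* this connection has no more rows */
        if (best < 0 ||
            streams[i].rows[streams[i].pos] < streams[best].rows[streams[best].pos])
            best = i;
    }
    return best;
}

int
main(void)
{
    const int   run1[] = {1, 4, 9};
    const int   run2[] = {2, 3, 8};
    const int   run3[] = {5, 6, 7};
    NodeStream  streams[NUM_NODES] = {
        {run1, 3, 0}, {run2, 3, 0}, {run3, 3, 0}
    };
    int         i;

    /* Emit the globally sorted result without materialising it anywhere. */
    while ((i = pick_min_stream(streams, NUM_NODES)) >= 0)
        printf("%d\n", streams[i].rows[streams[i].pos++]);

    return 0;
}

Approach (b) would first write each stream out to a tape and then run the same kind of merge over the tapes instead of over the live connections.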
>>>> At various places in the executor, we have special handling for sorting, which needs to be maintained.
>>>>
>>>> Instead, if we materialise all the results on tapes and then proceed with step D5 of Knuth's algorithm for polyphase merge sort, the code will be much simpler and we won't lose much performance. In fact, we might be able to leverage fetching bulk data on the connection, which can be materialised on the tape in bulk.
>>>>
>>>>> Regards,
>>>>> ----------
>>>>> Koichi Suzuki
>>>>>
>>>>> 2013/3/25 Ashutosh Bapat <ash...@en...>:
>>>>> > Hi All,
>>>>> > I am working on using remote sorting for merge joins. The idea is that while using a merge join at the coordinator, we get the data sorted from the datanodes; for replicated relations we can get all the rows sorted, and for distributed tables we have to get sorted runs which can be merged at the coordinator. For a merge join, the sorted inner relation needs to be randomly accessible. For replicated relations this can be achieved by materialising the result. But for distributed relations, we do not materialise the sorted result at the coordinator; we compute the sorted result by merging the sorted results from the individual nodes on the fly. For distributed relations, the connections to the datanodes themselves are used as logical tapes (which provide the sorted runs). The final result is computed on the fly by choosing the smallest or greatest row (as required) from the connections.
>>>>> >
>>>>> > For a Sort node, the materialised result can reside in memory (if it fits there) or on one of the logical tapes used for the merge sort. So, in order to provide random access to the sorted result, we need to materialise the result either in memory or on a logical tape. In-memory materialisation is not easily possible, since for distributed relations we have already resorted to a tape-based sort, and to materialise the result on a tape, there is no logical tape available in the current algorithm. To make it work, there are the following possible ways:
>>>>> >
>>>>> > 1. When random access is required, materialise the sorted runs from the individual nodes onto tapes (one tape for each node) and then merge them onto one extra tape, which can be used for the materialisation.
>>>>> > 2. Use a mix of connections and logical tapes in the same tape set: merge the sorted runs from the connections onto a logical tape in the same logical tape set.
>>>>> >
>>>>> > While the second one looks attractive from a performance perspective (it saves writing to and reading from the tape), it would make the merge code ugly by mixing tapes. The read calls for a connection and for a logical tape are different, and we would need both for the logical tape where the final result is materialised. So I am thinking of going with 1; in fact, to have the same code handle remote sort, we could use 1 in all cases (whether or not materialisation is required).
>>>>> >
>>>>> > Had the original authors of the remote sort code thought about this materialisation? Anything they can share on this topic? Any comments?
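For reference, option 1 above amounts to: write each node's sorted run onto its own tape, then merge all the runs onto one extra tape, and that final tape is what gets rewound and re-read whenever the merge join needs to rescan its inner side. A self-contained sketch with toy in-memory tapes follows (the Tape type and helper names are illustrative only, not the tuplesort/logtape code):

/*
 * Sketch of option 1: materialise each datanode's sorted run onto its own
 * tape, then merge all runs onto one extra tape.  That final tape can be
 * rewound and re-read, which is what the merge join needs from its inner
 * side.  The Tape type and helpers are toy in-memory stand-ins.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    int    *rows;
    int     nrows;
    int     readpos;
} Tape;

static void
tape_write(Tape *t, int row)
{
    t->rows = realloc(t->rows, (t->nrows + 1) * sizeof(int));
    t->rows[t->nrows++] = row;
}

static void
tape_rewind(Tape *t)
{
    t->readpos = 0;             /* random access: restart reading from the top */
}

/* Merge already-sorted input tapes onto the output tape. */
static void
merge_runs(Tape *runs, int nruns, Tape *out)
{
    for (;;)
    {
        int     best = -1;
        int     i;

        for (i = 0; i < nruns; i++)
        {
            if (runs[i].readpos >= runs[i].nrows)
                continue;
            if (best < 0 ||
                runs[i].rows[runs[i].readpos] < runs[best].rows[runs[best].readpos])
                best = i;
        }
        if (best < 0)
            break;
        tape_write(out, runs[best].rows[runs[best].readpos++]);
    }
}

int
main(void)
{
    Tape    runs[2] = {{NULL, 0, 0}, {NULL, 0, 0}};
    Tape    result = {NULL, 0, 0};

    /* Step 1: materialise each node's sorted run onto its own tape. */
    tape_write(&runs[0], 1);
    tape_write(&runs[0], 5);
    tape_write(&runs[1], 2);
    tape_write(&runs[1], 3);

    /* Step 2: merge the runs onto the one extra tape. */
    merge_runs(runs, 2, &result);

    /* The result tape can now be rescanned as often as the join needs. */
    tape_rewind(&result);
    while (result.readpos < result.nrows)
        printf("%d\n", result.rows[result.readpos++]);

    return 0;
}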
>>>>> > --
>>>>> > Best Wishes,
>>>>> > Ashutosh Bapat
>>>>> > EnterpriseDB Corporation
>>>>> > The Enterprise Postgres Company
>>>>> >
>>>>> > _______________________________________________
>>>>> > Postgres-xc-developers mailing list
>>>>> > Pos...@li...
>>>>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers