From: Ashutosh B. <ash...@en...> - 2013-04-01 05:48:28
On Mon, Apr 1, 2013 at 10:59 AM, Koichi Suzuki <koi...@gm...> wrote:

> I understand materialize everything makes code clearer and implementation
> becomes simpler and better structured.
>
> What do you mean by x% improvement? Does 90% improvement mean the total
> duration is 10% of the original?
>

x% improvement means the duration reduces to 100/(100+x) of the non-pushdown
duration. In simpler words, the pushdown approach completes (100+x) queries in
the time in which the non-pushdown approach completes 100 queries. So a 90%
improvement means the duration drops to 100/190, roughly 53% of the original,
not 10%.

> ----------
> Koichi Suzuki
>
>
> 2013/3/29 Ashutosh Bapat <ash...@en...>
>
>> Hi All,
>> I measured the scale-up for both approaches: (a) using datanode
>> connections as tapes (the existing one) and (b) materialising the results
>> on tapes before merging (the approach I proposed). For 1M rows and 5
>> coordinators I found that approach (a) gives a 90% improvement whereas
>> approach (b) gives a 50% improvement. Although the difference is
>> significant, I feel that approach (b) is much cleaner than approach (a);
>> it doesn't have a large footprint compared to the PG code and it takes
>> care of all the cases: 1. materialising the sorted result, and 2. handling
>> any number of datanode connections without memory overrun. It can be
>> improved further if we avoid materialising the datanode results in a
>> tuplestore.
>>
>> Patch attached for reference.
>>
>> On Tue, Mar 26, 2013 at 10:38 AM, Ashutosh Bapat <
>> ash...@en...> wrote:
>>
>>>
>>>
>>> On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki <
>>> koi...@gm...> wrote:
>>>
>>>> One thing we should think about for option 1 is:
>>>>
>>>> When the result is huge, applications have to wait a long time until
>>>> they get the first row. Because this option may need disk writes, total
>>>> resource consumption will be larger.
>>>>
>>>>
>>> Yes, I am aware of this fact. Please read the next paragraph and you
>>> will see that the current situation is no better.
>>>
>>>
>>>> I'm wondering if we can use a "cursor" at the database so that we can
>>>> read each tape more simply; I mean, leave each query node open and read
>>>> the next row from any query node.
>>>>
>>>>
>>> We do that right now. But because of such a simulated cursor (it's not a
>>> cursor per se; we just fetch the required result from the connection as
>>> the demand arises while merging runs), we observe the following.
>>>
>>> If the plan has multiple RemoteQuery nodes (as there will be in the case
>>> of a merge join), we assign the same connection to these nodes. Before
>>> this assignment, the result from the previous connection is materialised
>>> at the coordinator. This means that when we get a huge result from the
>>> datanode, it will be materialised in a linked list, which is not
>>> optimized and will cost more than materialising it on a tape. We need to
>>> share a connection between more than one RemoteQuery node because the
>>> same transaction cannot work on two connections to the same server. It is
>>> not only a performance issue: the code has become ugly because of this
>>> approach. At various places in the executor we have special handling for
>>> sorting, which needs to be maintained.
>>>
>>> Instead, if we materialise all the results on tape and then proceed with
>>> step D5 in Knuth's algorithm for polyphase merge sort, the code will be
>>> much simpler and we won't lose much performance. In fact, we might be
>>> able to leverage fetching bulk data on the connection, which can be
>>> materialised on tape in bulk.
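
To make the two approaches concrete, below is a minimal, self-contained C
sketch of the on-the-fly merge that approach (a) performs: each datanode
connection is treated as a pre-sorted run, and the coordinator repeatedly
emits the smallest head row. This is not the Postgres-XC executor code; the
SortedSource type, the pick_smallest helper, and the integer rows are invented
stand-ins for connections returning pre-sorted partitions. Approach (b) would
first write each such run to a tape (tuplestore) and then run the same kind of
merge over the tapes, as in step D5 of Knuth's polyphase merge.

/*
 * Simplified stand-in for approach (a): merge pre-sorted runs on the fly
 * by always taking the smallest head row.  Each SortedSource plays the
 * role of one datanode connection used as a logical tape.
 */
#include <stdio.h>

typedef struct SortedSource
{
    const int *rows;            /* rows already sorted by the datanode */
    int        nrows;
    int        pos;             /* next row to hand to the merge */
} SortedSource;

/* Return the index of the source with the smallest head row, or -1 if all are exhausted. */
static int
pick_smallest(SortedSource *src, int nsrc)
{
    int best = -1;

    for (int i = 0; i < nsrc; i++)
    {
        if (src[i].pos >= src[i].nrows)
            continue;
        if (best < 0 || src[i].rows[src[i].pos] < src[best].rows[src[best].pos])
            best = i;
    }
    return best;
}

int
main(void)
{
    /* Three hypothetical datanodes, each returning its partition pre-sorted. */
    static const int node1[] = {1, 4, 9};
    static const int node2[] = {2, 3, 8};
    static const int node3[] = {5, 6, 7};
    SortedSource src[] = {{node1, 3, 0}, {node2, 3, 0}, {node3, 3, 0}};
    int nsrc = sizeof(src) / sizeof(src[0]);
    int i;

    /* Rows stream out as soon as every source has produced its head row;
     * nothing is materialised at the coordinator. */
    while ((i = pick_smallest(src, nsrc)) >= 0)
        printf("%d ", src[i].rows[src[i].pos++]);
    printf("\n");               /* prints: 1 2 3 4 5 6 7 8 9 */

    return 0;
}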
>>>
>>>
>>>> Regards;
>>>> ----------
>>>> Koichi Suzuki
>>>>
>>>>
>>>> 2013/3/25 Ashutosh Bapat <ash...@en...>:
>>>> > Hi All,
>>>> > I am working on using remote sorting for merge joins. The idea is that
>>>> > while using a merge join at the coordinator, we get the data sorted
>>>> > from the datanodes; for replicated relations we can get all the rows
>>>> > sorted, and for distributed tables we have to get sorted runs which can
>>>> > be merged at the coordinator. For a merge join the sorted inner
>>>> > relation needs to be randomly accessible. For replicated relations this
>>>> > can be achieved by materialising the result. But for distributed
>>>> > relations we do not materialise the sorted result at the coordinator;
>>>> > we compute the sorted result by merging the sorted results from the
>>>> > individual nodes on the fly. For distributed relations, the connections
>>>> > to the datanodes themselves are used as logical tapes (which provide
>>>> > the sorted runs). The final result is computed on the fly by choosing
>>>> > the smallest or greatest row (as required) from the connections.
>>>> >
>>>> > For a Sort node the materialised result can reside in memory (if it
>>>> > fits there) or on one of the logical tapes used for the merge sort. So,
>>>> > in order to provide random access to the sorted result, we need to
>>>> > materialise the result either in memory or on a logical tape. In-memory
>>>> > materialisation is not easily possible since, for distributed
>>>> > relations, we have already resorted to a tape-based sort, and to
>>>> > materialise the result on tape there is no logical tape available in
>>>> > the current algorithm. To make it work, there are the following
>>>> > possible ways:
>>>> >
>>>> > 1. When random access is required, materialise the sorted runs from the
>>>> > individual nodes onto tapes (one tape per node) and then merge them
>>>> > onto one extra tape, which can be used for materialisation.
>>>> > 2. Use a mix of connections and logical tapes in the same tape set.
>>>> > Merge the sorted runs from the connections onto a logical tape in the
>>>> > same logical tape set.
>>>> >
>>>> > While the second one looks attractive from a performance perspective
>>>> > (it saves writing to and reading from the tape), it would make the
>>>> > merge code ugly by mixing tape kinds: the read calls for a connection
>>>> > and for a logical tape are different, and we would need both on the
>>>> > logical tape where the final result is materialised. So I am thinking
>>>> > of going with 1; in fact, to have the same code handle remote sort, use
>>>> > 1 in all cases (whether or not materialisation is required).
>>>> >
>>>> > Had the original authors of the remote sort code thought about this
>>>> > materialisation? Anything they can share on this topic?
>>>> > Any comment?
>>>> > --
>>>> > Best Wishes,
>>>> > Ashutosh Bapat
>>>> > EntepriseDB Corporation
>>>> > The Enterprise Postgres Company
>>>> >
>>>> > ------------------------------------------------------------------------------
>>>> > Everyone hates slow websites. So do we.
>>>> > Make your web apps faster with AppDynamics
>>>> > Download AppDynamics Lite for free today:
>>>> > http://p.sf.net/sfu/appdyn_d2d_mar
>>>> > _______________________________________________
>>>> > Postgres-xc-developers mailing list
>>>> > Pos...@li...
>>>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>> >
>>>>
>>>
>>>
>>> --
>>> Best Wishes,
>>> Ashutosh Bapat
>>> EntepriseDB Corporation
>>> The Enterprise Postgres Company
>>>
>>
>>
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EntepriseDB Corporation
>> The Enterprise Postgres Company
>>
>
>
--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company
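
On the random-access point raised in the original mail: a merge join needs to
mark and restore positions on its sorted inner input, which is what forces
option 1 to merge the per-node runs onto one extra tape instead of streaming
them straight off the connections. The sketch below is not the actual
implementation; the MaterializedTape type and its functions are invented for
illustration, and a plain in-memory array stands in for a logical tape or
tuplestore that would spill to disk. It only shows the mark/restore behaviour
such a materialised result can provide.

/*
 * Simplified stand-in for option 1: the merged, sorted result has been
 * written onto one extra "tape", so the consumer (a merge join inner
 * side) can mark a position and later restore it.
 */
#include <stdio.h>

typedef struct MaterializedTape
{
    const int *rows;            /* merged, sorted result on the tape */
    int        nrows;
    int        pos;             /* current read position */
    int        mark;            /* position saved by tape_mark() */
} MaterializedTape;

static void tape_mark(MaterializedTape *t)    { t->mark = t->pos; }
static void tape_restore(MaterializedTape *t) { t->pos = t->mark; }

static int
tape_next(MaterializedTape *t, int *row)
{
    if (t->pos >= t->nrows)
        return 0;               /* end of the sorted result */
    *row = t->rows[t->pos++];
    return 1;
}

int
main(void)
{
    /* Assume the per-node sorted runs were already merged onto this tape. */
    static const int merged[] = {1, 2, 2, 3, 5};
    MaterializedTape tape = {merged, 5, 0, 0};
    int row;

    tape_next(&tape, &row);     /* consume 1 */
    tape_mark(&tape);           /* mark before a group of equal inner keys */
    tape_next(&tape, &row);     /* read 2 */
    tape_next(&tape, &row);     /* read 2 */
    tape_restore(&tape);        /* rescan the marked group for the next outer row */
    tape_next(&tape, &row);
    printf("after restore, next row = %d\n", row);   /* prints 2 */

    return 0;
}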