From: Koichi S. <koi...@gm...> - 2013-04-01 05:29:42
I understand that materialising everything makes the code clearer and the
implementation simpler and better structured. What do you mean by x%
improvement? Does 90% improvement mean the total duration is 10% of the
original?
----------
Koichi Suzuki

2013/3/29 Ashutosh Bapat <ash...@en...>
> Hi All,
> I measured the scale-up for both approaches: a. using datanode connections
> as tapes (the existing one), b. materialising the result on tapes before
> merging (the approach I proposed). For 1M rows and 5 coordinators, I have
> found that approach (a) gives 90% improvement whereas approach (b) gives
> 50% improvement. Although the difference is significant, I feel that
> approach (b) is much cleaner than approach (a), doesn't have a large
> footprint compared to the PG code, and takes care of all the cases, like
> 1. materialising the sorted result and 2. handling any number of datanode
> connections without memory overrun. It's possible to improve it further if
> we avoid materialisation of the datanode result in a tuplestore.
>
> Patch attached for reference.
>
> On Tue, Mar 26, 2013 at 10:38 AM, Ashutosh Bapat
> <ash...@en...> wrote:
>>
>> On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki <koi...@gm...> wrote:
>>
>>> One thing we should think about for option 1 is:
>>>
>>> When the result is huge, applications have to wait a long time until
>>> they get the first row. Because this option may need disk writes, total
>>> resource consumption will be larger.
>>
>> Yes, I am aware of this fact. Please read the next paragraph and you will
>> see that the current situation is no better.
>>
>>> I'm wondering if we can use a "cursor" at the database so that we can
>>> read each tape more simply; I mean, leave each query node open and read
>>> the next row from any query node.
>>
>> We do that right now. But because of such a simulated cursor (it's not a
>> cursor per se; we just fetch the required result from the connection as
>> the demand arises while merging runs), we observe the following things.
>>
>> If the plan has multiple RemoteQuery nodes (as there will be in case of a
>> merge join), we assign the same connection to these nodes. Before this
>> assignment, the result from the previous connection is materialised at
>> the coordinator. This means that when we get a huge result from the
>> datanode, it will be materialised, which costs more than materialising it
>> on tape, since this materialisation happens in a linked list, which is
>> not optimised. We need to share a connection between more than one
>> RemoteQuery node because the same transaction cannot work on two
>> connections to the same server. Not only performance: the code has become
>> ugly because of this approach. At various places in the executor we have
>> special handling for sorting, which needs to be maintained.
>>
>> Instead, if we materialise all the results on tape and then proceed with
>> step D5 in Knuth's algorithm for polyphase merge sort, the code will be
>> much simpler and we won't lose much performance. In fact, we might be
>> able to leverage fetching bulk data on the connection, which can be
>> materialised on tape in bulk.
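To make that concrete, here is a minimal standalone sketch of the
materialise-everything-then-merge idea (approach (b) above): drain each
sorted run onto its own local tape, then merge the tapes onto one extra
output tape that can be rewound for random access. Everything in it is
simulated for illustration (the int rows, the tmpfile() tapes, the
fixed-size runs); it is not the actual XC logical-tape or tuplestore code.

/*
 * Sketch of approach (b): drain each (simulated) datanode run onto its
 * own local tape, then merge the tapes onto one output tape.  The output
 * tape supports random access via fseek(), which is what the merge-join
 * inner side needs.
 */
#include <stdio.h>

#define NCONN 3
#define NROWS 4

static const int conn_rows[NCONN][NROWS] = {
    {1, 4, 7, 10},      /* sorted run "streamed" by datanode 0 */
    {2, 5, 8, 11},      /* datanode 1 */
    {3, 6, 9, 12}       /* datanode 2 */
};

int main(void)
{
    FILE *tape[NCONN];
    FILE *out = tmpfile();      /* the extra tape holding the merged result */
    int   head[NCONN];
    int   live[NCONN];
    int   val;
    int   i, j;

    /* Step 1: drain each "connection" onto its own local tape. */
    for (i = 0; i < NCONN; i++)
    {
        tape[i] = tmpfile();
        for (j = 0; j < NROWS; j++)
            fwrite(&conn_rows[i][j], sizeof(int), 1, tape[i]);
        rewind(tape[i]);
        live[i] = (fread(&head[i], sizeof(int), 1, tape[i]) == 1);
    }

    /* Step 2: merge the tapes onto the output tape (smallest head wins). */
    for (;;)
    {
        int best = -1;

        for (i = 0; i < NCONN; i++)
            if (live[i] && (best < 0 || head[i] < head[best]))
                best = i;
        if (best < 0)
            break;
        fwrite(&head[best], sizeof(int), 1, out);
        live[best] = (fread(&head[best], sizeof(int), 1, tape[best]) == 1);
    }

    /* Step 3: the merged result is now randomly accessible, e.g. row 5. */
    fseek(out, 5L * sizeof(int), SEEK_SET);
    if (fread(&val, sizeof(int), 1, out) == 1)
        printf("row 5 of the merged result = %d\n", val);

    for (i = 0; i < NCONN; i++)
        fclose(tape[i]);
    fclose(out);
    return 0;
}

The price is one extra write and read of every row, but the merged result
now lives on a rewindable tape, so random access no longer depends on the
connections being re-readable.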
>>> Regards;
>>> ----------
>>> Koichi Suzuki
>>>
>>> 2013/3/25 Ashutosh Bapat <ash...@en...>:
>>> > Hi All,
>>> > I am working on using remote sorting for merge joins. The idea is:
>>> > while using a merge join at the coordinator, get the data sorted from
>>> > the datanodes; for replicated relations we can get all the rows
>>> > sorted, and for distributed tables we have to get sorted runs which
>>> > can be merged at the coordinator. For a merge join the sorted inner
>>> > relation needs to be randomly accessible. For replicated relations
>>> > this can be achieved by materialising the result. But for distributed
>>> > relations we do not materialise the sorted result at the coordinator;
>>> > we compute the sorted result by merging the sorted results from the
>>> > individual nodes on the fly. For distributed relations, the
>>> > connections to the datanodes themselves are used as logical tapes
>>> > (which provide the sorted runs). The final result is computed on the
>>> > fly by choosing the smallest or greatest row (as required) from the
>>> > connections.
>>> >
>>> > For a Sort node the materialised result can reside in memory (if it
>>> > fits there) or on one of the logical tapes used for the merge sort.
>>> > So, in order to provide random access to the sorted result, we need
>>> > to materialise the result either in memory or on a logical tape.
>>> > In-memory materialisation is not easily possible since, in the case
>>> > of distributed relations, we have already resorted to a tape-based
>>> > sort, and to materialise the result on tape there is no logical tape
>>> > available in the current algorithm. To make it work, there are the
>>> > following possible ways:
>>> >
>>> > 1. When random access is required, materialise the sorted runs from
>>> > the individual nodes onto tapes (one tape for each node) and then
>>> > merge them onto one extra tape, which can be used for
>>> > materialisation.
>>> > 2. Use a mix of connections and logical tapes in the same tape set.
>>> > Merge the sorted runs from the connections onto a logical tape in
>>> > the same logical tape set.
>>> >
>>> > While the second one looks attractive from a performance perspective
>>> > (it saves writing to and reading from the tape), it would make the
>>> > merge code ugly by using mixed tapes. The read calls for a connection
>>> > and for a logical tape are different, and we will need both on the
>>> > logical tape where the final result is materialised. So I am thinking
>>> > of going with 1; in fact, to have the same code handle remote sort,
>>> > use 1 in all cases (whether or not materialisation is required).
>>> >
>>> > Had the original authors of the remote sort code thought about this
>>> > materialisation? Anything they can share on this topic?
>>> > Any comment?
>>> > --
>>> > Best Wishes,
>>> > Ashutosh Bapat
>>> > EnterpriseDB Corporation
>>> > The Enterprise Postgres Company
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EnterpriseDB Corporation
>> The Enterprise Postgres Company
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Enterprise Postgres Company
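For comparison, here is a minimal standalone sketch of the existing
approach (a) described in the original mail, where the connections
themselves act as the logical tapes and the merge pulls the next row from
a connection only when its current head has been consumed. SimConnection
and fetch_next_row() are stand-ins invented for the sketch, not XC APIs.

/*
 * Sketch of approach (a): merge on the fly directly from the (simulated)
 * datanode connections, without writing anything locally.
 */
#include <stdio.h>
#include <stdbool.h>

#define NCONN 3

typedef struct SimConnection
{
    const int *rows;    /* sorted run the datanode would stream back */
    int        nrows;
    int        pos;
} SimConnection;

/* Pull the next row off the "connection"; false when it is exhausted. */
static bool
fetch_next_row(SimConnection *conn, int *row)
{
    if (conn->pos >= conn->nrows)
        return false;
    *row = conn->rows[conn->pos++];
    return true;
}

int main(void)
{
    static const int run0[] = {1, 4, 7, 10};
    static const int run1[] = {2, 5, 8, 11};
    static const int run2[] = {3, 6, 9, 12};
    SimConnection conn[NCONN] = {
        {run0, 4, 0}, {run1, 4, 0}, {run2, 4, 0}
    };
    int  head[NCONN];
    bool live[NCONN];
    int  i;

    for (i = 0; i < NCONN; i++)
        live[i] = fetch_next_row(&conn[i], &head[i]);

    /*
     * Merge on the fly: emit the smallest head, then refill only from the
     * connection that supplied it.  No local tape is ever written.
     */
    for (;;)
    {
        int best = -1;

        for (i = 0; i < NCONN; i++)
            if (live[i] && (best < 0 || head[i] < head[best]))
                best = i;
        if (best < 0)
            break;
        printf("%d ", head[best]);
        live[best] = fetch_next_row(&conn[best], &head[best]);
    }
    printf("\n");
    return 0;
}

Nothing is written locally, so the merged stream cannot be rewound; that is
exactly why random access for the merge-join inner relation forces the
extra materialisation discussed in the thread.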