From: Ashutosh B. <ash...@en...> - 2013-04-18 03:43:38
|
Did you measure the performance? On Thu, Apr 18, 2013 at 9:02 AM, Abbas Butt <abb...@en...>wrote: > > > On Thu, Apr 18, 2013 at 1:07 AM, Abbas Butt <abb...@en...>wrote: > >> Hi, >> Here is the review of the patch. >> >> Overall the patch is good to go. I have reviewed the code and found some >> minor errors, which I corrected and have attached the revised patch with >> the mail. >> >> I have tested both the cases when the sort happens in memory and when it >> happens using disk and found both working. >> >> I agree that the approach used in the patch is cleaner and has smaller >> footprint. >> >> I have corrected some white space errors and an unintentional change in >> function set_dbcleanup_callback >> git apply /home/edb/Desktop/MergeSort/xc_sort.patch >> /home/edb/Desktop/MergeSort/xc_sort.patch:539: trailing whitespace. >> void *fparams; >> /home/edb/Desktop/MergeSort/xc_sort.patch:1012: trailing whitespace. >> >> /home/edb/Desktop/MergeSort/xc_sort.patch:1018: trailing whitespace. >> >> /home/edb/Desktop/MergeSort/xc_sort.patch:1087: trailing whitespace. >> /* >> /home/edb/Desktop/MergeSort/xc_sort.patch:1228: trailing whitespace. >> size_t len, Oid msgnode_oid, >> warning: 5 lines add whitespace errors. >> >> I am leaving a query running for tonight which would sort 10M rows of a >> distributed table and would return top 100 of them. I would report its >> outcome tomorrow morning. >> > > It worked, here is the test case > > 1. create table test1 (id integer primary key , padding text); > 2. Load 10M rows > 3. select id from test1 order by 1 limit 100 > > > > >> >> Best Regards >> >> >> On Mon, Apr 1, 2013 at 11:02 AM, Koichi Suzuki <koi...@gm... >> > wrote: >> >>> Thanks. Then 90% improvement means about 53% of the duration, while 50% >>> means 67% of it. Number of queries in a given duration is 190 vs. 150, >>> difference is 40. >>> >>> Considering the needed resource, it may be okay to begin with >>> materialization. >>> >>> Any other inputs? >>> ---------- >>> Koichi Suzuki >>> >>> >>> 2013/4/1 Ashutosh Bapat <ash...@en...> >>> >>>> >>>> >>>> On Mon, Apr 1, 2013 at 10:59 AM, Koichi Suzuki < >>>> koi...@gm...> wrote: >>>> >>>>> I understand materialize everything makes code clearer and >>>>> implementation becomes simpler and better structured. >>>>> >>>>> What do you mean by x% improvement? Does 90% improvement mean the >>>>> total duration is 10% of the original? >>>>> >>>> x% improvement means, duration reduces to 100/(100+x) as compared to >>>> the non-pushdown scenario. Or in simpler words, we see (100+x) queries >>>> being completed by pushdown approach in the same time in which nonpushdown >>>> approach completes 100 queries. >>>> >>>>> ---------- >>>>> Koichi Suzuki >>>>> >>>>> >>>>> 2013/3/29 Ashutosh Bapat <ash...@en...> >>>>> >>>>>> Hi All, >>>>>> I measured the scale up for both approaches - a. using datanode >>>>>> connections as tapes (existing one) b. materialising result on tapes before >>>>>> merging (the approach I proposed). For 1M rows, 5 coordinators I have found >>>>>> that approach (a) gives 90% improvement whereas approach (b) gives 50% >>>>>> improvement. Although the difference is significant, I feel that approach >>>>>> (b) is much cleaner than approach (a) and doesn't have large footprint >>>>>> compared to PG code and it takes care of all the cases like 1. >>>>>> materialising sorted result, 2. takes care of any number of datanode >>>>>> connections without memory overrun. It's possible to improve it further if >>>>>> we avoid materialisation of datanode result in tuplestore. >>>>>> >>>>>> Patch attached for reference. >>>>>> >>>>>> On Tue, Mar 26, 2013 at 10:38 AM, Ashutosh Bapat < >>>>>> ash...@en...> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki < >>>>>>> koi...@gm...> wrote: >>>>>>> >>>>>>>> On thing we should think for option 1 is: >>>>>>>> >>>>>>>> When a number of the result is huge, applications has to wait long >>>>>>>> time until they get the first row. Because this option may need >>>>>>>> disk >>>>>>>> write, total resource consumption will be larger. >>>>>>>> >>>>>>>> >>>>>>> Yes, I am aware of this fact. Please read the next paragraph and you >>>>>>> will see that the current situation is no better. >>>>>>> >>>>>>> >>>>>>>> I'm wondering if we can use "cursor" at database so that we can read >>>>>>>> each tape more simply, I mean, to leave each query node open and >>>>>>>> read >>>>>>>> next row from any query node. >>>>>>>> >>>>>>>> >>>>>>> We do that right now. But because of such a simulated cursor (it's >>>>>>> not cursor per say, but we just fetch the required result from connection >>>>>>> as the demand arises in merging runs), we observer following things >>>>>>> >>>>>>> If the plan has multiple remote query nodes (as there will be in >>>>>>> case of merge join), we assign the same connection to these nodes. Before >>>>>>> this assignment, the result from the previous connection is materialised at >>>>>>> the coordinator. This means that, when we will get huge result from the >>>>>>> datanode, it will be materialised (which will have the more cost as >>>>>>> materialising it on tape, as this materialisation happens in a linked list, >>>>>>> which is not optimized). We need to share connection between more than one >>>>>>> RemoteQuery node because same transaction can not work on two connections >>>>>>> to same server. Not only performance, but the code has become ugly because >>>>>>> of this approach. At various places in executor, we have special handling >>>>>>> for sorting, which needs to be maintained. >>>>>>> >>>>>>> Instead if we materialise all the result on tape and then proceed >>>>>>> with step D5 in Knuth's algorithm for polyphase merge sort, the code will >>>>>>> be much simpler and we won't loose much performance. In fact, we might be >>>>>>> able to leverage fetching bulk data on connection which can be materialised >>>>>>> on tape in bulk. >>>>>>> >>>>>>> >>>>>>>> Regards; >>>>>>>> ---------- >>>>>>>> Koichi Suzuki >>>>>>>> >>>>>>>> >>>>>>>> 2013/3/25 Ashutosh Bapat <ash...@en...>: >>>>>>>> > Hi All, >>>>>>>> > I am working on using remote sorting for merge joins. The idea is >>>>>>>> while >>>>>>>> > using merge join at the coordinator, get the data sorted from the >>>>>>>> datanodes; >>>>>>>> > for replicated relations, we can get all the rows sorted and for >>>>>>>> distributed >>>>>>>> > tables we have to get sorted runs which can be merged at the >>>>>>>> coordinator. >>>>>>>> > For merge join the sorted inner relation needs to be randomly >>>>>>>> accessible. >>>>>>>> > For replicated relations this can be achieved by materialising >>>>>>>> the result. >>>>>>>> > But for distributed relations, we do not materialise the sorted >>>>>>>> result at >>>>>>>> > coordinator but compute the sorted result by merging the sorted >>>>>>>> results from >>>>>>>> > individual nodes on the fly. For distributed relations, the >>>>>>>> connection to >>>>>>>> > the datanodes themselves are used as logical tapes (which provide >>>>>>>> the sorted >>>>>>>> > runs). The final result is computed on the fly by choosing the >>>>>>>> smallest or >>>>>>>> > greatest row (as required) from the connections. >>>>>>>> > >>>>>>>> > For a Sort node the materialised result can reside in memory (if >>>>>>>> it fits >>>>>>>> > there) or on one of the logical tapes used for merge sort. So, in >>>>>>>> order to >>>>>>>> > provide random access to the sorted result, we need to >>>>>>>> materialise the >>>>>>>> > result either in the memory or on the logical tape. In-memory >>>>>>>> > materialisation is not easily possible since we have already >>>>>>>> resorted for >>>>>>>> > tape based sort, in case of distributed relations and to >>>>>>>> materialise the >>>>>>>> > result on tape, there is no logical tape available in current >>>>>>>> algorithm. To >>>>>>>> > make it work, there are following possible ways >>>>>>>> > >>>>>>>> > 1. When random access is required, materialise the sorted runs >>>>>>>> from >>>>>>>> > individual nodes onto tapes (one tape for each node) and then >>>>>>>> merge them on >>>>>>>> > one extra tape, which can be used for materialisation. >>>>>>>> > 2. Use a mix of connections and logical tape in the same tape >>>>>>>> set. Merge the >>>>>>>> > sorted runs from connections on a logical tape in the same >>>>>>>> logical tape set. >>>>>>>> > >>>>>>>> > While the second one looks attractive from performance >>>>>>>> perspective (it saves >>>>>>>> > writing and reading from the tape), it would make the merge code >>>>>>>> ugly by >>>>>>>> > using mixed tapes. The read calls for connection and logical tape >>>>>>>> are >>>>>>>> > different and we will need both on the logical tape where the >>>>>>>> final result >>>>>>>> > is materialized. So, I am thinking of going with 1, in fact, to >>>>>>>> have same >>>>>>>> > code to handle remote sort, use 1 in all cases (whether or not >>>>>>>> > materialization is required). >>>>>>>> > >>>>>>>> > Had original authors of remote sort code thought about this >>>>>>>> materialization? >>>>>>>> > Anything they can share on this topic? >>>>>>>> > Any comment? >>>>>>>> > -- >>>>>>>> > Best Wishes, >>>>>>>> > Ashutosh Bapat >>>>>>>> > EntepriseDB Corporation >>>>>>>> > The Enterprise Postgres Company >>>>>>>> > >>>>>>>> > >>>>>>>> ------------------------------------------------------------------------------ >>>>>>>> > Everyone hates slow websites. So do we. >>>>>>>> > Make your web apps faster with AppDynamics >>>>>>>> > Download AppDynamics Lite for free today: >>>>>>>> > http://p.sf.net/sfu/appdyn_d2d_mar >>>>>>>> > _______________________________________________ >>>>>>>> > Postgres-xc-developers mailing list >>>>>>>> > Pos...@li... >>>>>>>> > >>>>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>>>>> > >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Best Wishes, >>>>>>> Ashutosh Bapat >>>>>>> EntepriseDB Corporation >>>>>>> The Enterprise Postgres Company >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Best Wishes, >>>>>> Ashutosh Bapat >>>>>> EntepriseDB Corporation >>>>>> The Enterprise Postgres Company >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> Own the Future-Intel® Level Up Game Demo Contest 2013 >>> Rise to greatness in Intel's independent game demo contest. >>> Compete for recognition, cash, and the chance to get your game >>> on Steam. $5K grand prize plus 10 genre and skill prizes. >>> Submit your demo by 6/6/13. http://p.sf.net/sfu/intel_levelupd2d >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |