From: Ashutosh B. <ash...@en...> - 2013-04-18 11:38:52
Did you do it on a true cluster or by running all the instances on the same machine? The latter would degrade the performance.

On Thu, Apr 18, 2013 at 4:38 PM, Abbas Butt <abb...@en...> wrote:

> On Thu, Apr 18, 2013 at 8:43 AM, Ashutosh Bapat <ash...@en...> wrote:
>
>> Did you measure the performance?
>
> I tried, but I was getting very strange numbers. It took some hours but reported
>
> Time: 365649.353 ms
>
> which comes out to be some 6 minutes; I am not sure why.
>
>> On Thu, Apr 18, 2013 at 9:02 AM, Abbas Butt <abb...@en...> wrote:
>>
>>> On Thu, Apr 18, 2013 at 1:07 AM, Abbas Butt <abb...@en...> wrote:
>>>
>>>> Hi,
>>>> Here is the review of the patch.
>>>>
>>>> Overall the patch is good to go. I have reviewed the code and found some minor errors, which I corrected and have attached the revised patch with this mail.
>>>>
>>>> I have tested both the cases, when the sort happens in memory and when it happens using disk, and found both working.
>>>>
>>>> I agree that the approach used in the patch is cleaner and has a smaller footprint.
>>>>
>>>> I have corrected some whitespace errors and an unintentional change in function set_dbcleanup_callback:
>>>>
>>>> git apply /home/edb/Desktop/MergeSort/xc_sort.patch
>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:539: trailing whitespace.
>>>> void *fparams;
>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1012: trailing whitespace.
>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1018: trailing whitespace.
>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1087: trailing whitespace.
>>>> /*
>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1228: trailing whitespace.
>>>> size_t len, Oid msgnode_oid,
>>>> warning: 5 lines add whitespace errors.
>>>>
>>>> I am leaving a query running for tonight which would sort 10M rows of a distributed table and return the top 100 of them. I will report its outcome tomorrow morning.
>>>
>>> It worked. Here is the test case:
>>>
>>> 1. create table test1 (id integer primary key, padding text);
>>> 2. Load 10M rows
>>> 3. select id from test1 order by 1 limit 100
>>>
>>>> Best Regards
>>>>
>>>> On Mon, Apr 1, 2013 at 11:02 AM, Koichi Suzuki <koi...@gm...> wrote:
>>>>
>>>>> Thanks. Then 90% improvement means about 53% of the duration, while 50% means 67% of it. The number of queries in a given duration is 190 vs. 150; the difference is 40.
>>>>>
>>>>> Considering the needed resources, it may be okay to begin with materialization.
>>>>>
>>>>> Any other inputs?
>>>>>
>>>>> 2013/4/1 Ashutosh Bapat <ash...@en...>:
>>>>>
>>>>>> On Mon, Apr 1, 2013 at 10:59 AM, Koichi Suzuki <koi...@gm...> wrote:
>>>>>>
>>>>>>> I understand that materializing everything makes the code clearer and the implementation simpler and better structured.
>>>>>>>
>>>>>>> What do you mean by x% improvement? Does 90% improvement mean the total duration is 10% of the original?
>>>>>>
>>>>>> x% improvement means the duration reduces to 100/(100+x) of the non-pushdown scenario. Or, in simpler words, we see (100+x) queries being completed by the pushdown approach in the same time in which the non-pushdown approach completes 100 queries.
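Worked out, this relation is what produces the 53% and 67% figures Koichi quotes above:

$$ t_{\text{pushdown}} = \frac{100}{100+x}\; t_{\text{non-pushdown}}, \qquad x = 90 \Rightarrow \tfrac{100}{190} \approx 0.53, \qquad x = 50 \Rightarrow \tfrac{100}{150} \approx 0.67 $$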
>>>>>>> 2013/3/29 Ashutosh Bapat <ash...@en...>:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>> I measured the scale-up for both approaches: (a) using datanode connections as tapes (the existing one) and (b) materialising the result on tapes before merging (the approach I proposed). For 1M rows and 5 coordinators I have found that approach (a) gives 90% improvement whereas approach (b) gives 50% improvement. Although the difference is significant, I feel that approach (b) is much cleaner than approach (a) and doesn't have a large footprint compared to the PG code, and it takes care of all the cases, like 1. materialising the sorted result, and 2. handling any number of datanode connections without memory overrun. It is possible to improve it further if we avoid materialisation of the datanode result in a tuplestore.
>>>>>>>>
>>>>>>>> Patch attached for reference.
>>>>>>>>
>>>>>>>> On Tue, Mar 26, 2013 at 10:38 AM, Ashutosh Bapat <ash...@en...> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki <koi...@gm...> wrote:
>>>>>>>>>
>>>>>>>>>> One thing we should think about for option 1 is: when the result is huge, applications have to wait a long time until they get the first row. Because this option may need disk writes, the total resource consumption will be larger.
>>>>>>>>>
>>>>>>>>> Yes, I am aware of this fact. Please read the next paragraph and you will see that the current situation is no better.
>>>>>>>>>
>>>>>>>>>> I'm wondering if we can use a "cursor" at the database so that we can read each tape more simply, I mean, leave each query node open and read the next row from any query node.
>>>>>>>>>
>>>>>>>>> We do that right now. But because of such a simulated cursor (it is not a cursor per se; we just fetch the required result from the connection as the demand arises while merging runs), we observe the following.
>>>>>>>>>
>>>>>>>>> If the plan has multiple RemoteQuery nodes (as there will be in case of a merge join), we assign the same connection to these nodes. Before this assignment, the result from the previous connection is materialised at the coordinator. This means that when we get a huge result from the datanode, it is materialised at the coordinator (which costs at least as much as materialising it on tape, since this materialisation happens in a linked list, which is not optimized). We need to share a connection between more than one RemoteQuery node because the same transaction cannot work on two connections to the same server. It is not only a performance problem: the code has become ugly because of this approach. At various places in the executor we have special handling for sorting, which needs to be maintained.
>>>>>>>>>
>>>>>>>>> Instead, if we materialise all the results on tape and then proceed with step D5 of Knuth's algorithm for polyphase merge sort, the code will be much simpler and we won't lose much performance. In fact, we might be able to leverage fetching bulk data on a connection, which can then be materialised on tape in bulk.
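For readers following the design discussion: the on-the-fly merge that the existing approach (a) performs, repeatedly picking the smallest head row among the sorted streams returned by the datanode connections, can be sketched as below. This is only an illustrative, self-contained example: the names (Source, sift_down) are made up and plain integer arrays stand in for the per-datanode sorted runs; it is not the Postgres-XC or tuplesort code itself.

    /*
     * Illustrative sketch only: a heap-based k-way merge over pre-sorted
     * integer arrays, standing in for the sorted streams coming back from
     * datanode connections.  Names and data are invented for the example.
     */
    #include <stdio.h>

    typedef struct Source {
        const int *rows;   /* sorted run from one "datanode" */
        int        nrows;
        int        pos;    /* next unread row */
    } Source;

    /* heap of source indexes, ordered by the next row of each source */
    static void sift_down(Source *src, int *heap, int n, int i)
    {
        for (;;)
        {
            int smallest = i, l = 2 * i + 1, r = 2 * i + 2;

            if (l < n && src[heap[l]].rows[src[heap[l]].pos] <
                         src[heap[smallest]].rows[src[heap[smallest]].pos])
                smallest = l;
            if (r < n && src[heap[r]].rows[src[heap[r]].pos] <
                         src[heap[smallest]].rows[src[heap[smallest]].pos])
                smallest = r;
            if (smallest == i)
                break;
            { int t = heap[i]; heap[i] = heap[smallest]; heap[smallest] = t; }
            i = smallest;
        }
    }

    int main(void)
    {
        /* three pre-sorted runs, as if returned by three datanodes */
        int r1[] = {1, 4, 9}, r2[] = {2, 3, 10}, r3[] = {5, 6, 7};
        Source src[3] = {{r1, 3, 0}, {r2, 3, 0}, {r3, 3, 0}};
        int heap[3] = {0, 1, 2};
        int nheap = 3;

        /* build the heap over the head rows */
        for (int i = nheap / 2 - 1; i >= 0; i--)
            sift_down(src, heap, nheap, i);

        /* repeatedly emit the smallest head row, as the on-the-fly merge does */
        while (nheap > 0)
        {
            Source *s = &src[heap[0]];

            printf("%d\n", s->rows[s->pos++]);
            if (s->pos == s->nrows)          /* run exhausted: drop it */
                heap[0] = heap[--nheap];
            sift_down(src, heap, nheap, 0);
        }
        return 0;
    }

With k datanode connections the heap makes each output row cost O(log k) comparisons, which is why the streaming approach can hand back the first rows without waiting for the whole result.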
>>>>>>>>>> 2013/3/25 Ashutosh Bapat <ash...@en...>:
>>>>>>>>>>
>>>>>>>>>> > Hi All,
>>>>>>>>>> > I am working on using remote sorting for merge joins. The idea is: while using a merge join at the coordinator, get the data sorted from the datanodes; for replicated relations we can get all the rows sorted, and for distributed tables we have to get sorted runs which can be merged at the coordinator. For a merge join the sorted inner relation needs to be randomly accessible. For replicated relations this can be achieved by materialising the result. But for distributed relations we do not materialise the sorted result at the coordinator; we compute the sorted result by merging the sorted results from the individual nodes on the fly. For distributed relations, the connections to the datanodes themselves are used as logical tapes (which provide the sorted runs). The final result is computed on the fly by choosing the smallest or greatest row (as required) from the connections.
>>>>>>>>>> >
>>>>>>>>>> > For a Sort node the materialised result can reside in memory (if it fits there) or on one of the logical tapes used for the merge sort. So, in order to provide random access to the sorted result, we need to materialise the result either in memory or on a logical tape. In-memory materialisation is not easily possible, since for distributed relations we have already resorted to tape-based sort, and to materialise the result on tape there is no logical tape available in the current algorithm. To make it work, there are the following possible ways:
>>>>>>>>>> >
>>>>>>>>>> > 1. When random access is required, materialise the sorted runs from the individual nodes onto tapes (one tape for each node) and then merge them onto one extra tape, which can be used for materialisation.
>>>>>>>>>> > 2. Use a mix of connections and logical tapes in the same tape set. Merge the sorted runs from the connections onto a logical tape in the same logical tape set.
>>>>>>>>>> >
>>>>>>>>>> > While the second one looks attractive from a performance perspective (it saves writing to and reading from the tape), it would make the merge code ugly by using mixed tapes. The read calls for a connection and for a logical tape are different, and we would need both on the logical tape where the final result is materialised. So I am thinking of going with 1; in fact, to have the same code handle remote sort, use 1 in all cases (whether or not materialisation is required).
>>>>>>>>>> >
>>>>>>>>>> > Had the original authors of the remote sort code thought about this materialisation? Anything they can share on this topic? Any comment?
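Option 1 above, spooling each node's sorted run onto its own tape and then merging the tapes onto one extra tape that the merge join can rewind, can likewise be sketched in a few lines. Again this is only an illustrative example under simplifying assumptions (integer rows, tmpfile()-backed "tapes", a plain linear scan instead of the polyphase-merge machinery); it is not the actual implementation.

    /*
     * Illustrative sketch only: materialise each sorted run on its own
     * temporary file (one "tape" per datanode), merge the tapes onto one
     * extra tape, and rewind that tape for random access to the sorted
     * result, as the inner side of a merge join requires.
     */
    #include <stdio.h>

    #define NRUNS 3

    int main(void)
    {
        int runs[NRUNS][3] = {{1, 4, 9}, {2, 3, 10}, {5, 6, 7}};
        FILE *tape[NRUNS], *result = tmpfile();
        int head[NRUNS];     /* current head row of each tape */
        int live[NRUNS];     /* 1 while the tape still has rows */

        /* step 1: spool each sorted run onto its own tape */
        for (int i = 0; i < NRUNS; i++)
        {
            tape[i] = tmpfile();
            fwrite(runs[i], sizeof(int), 3, tape[i]);
            rewind(tape[i]);
            live[i] = (fread(&head[i], sizeof(int), 1, tape[i]) == 1);
        }

        /* step 2: merge the tapes onto one extra tape (a linear scan here;
         * the real code would use the polyphase-merge machinery) */
        for (;;)
        {
            int best = -1;
            for (int i = 0; i < NRUNS; i++)
                if (live[i] && (best < 0 || head[i] < head[best]))
                    best = i;
            if (best < 0)
                break;
            fwrite(&head[best], sizeof(int), 1, result);
            live[best] = (fread(&head[best], sizeof(int), 1, tape[best]) == 1);
        }

        /* step 3: the result tape now supports random access; rewind and
         * re-read it, which is what a merge join's inner side needs */
        rewind(result);
        int v;
        while (fread(&v, sizeof(int), 1, result) == 1)
            printf("%d\n", v);
        return 0;
    }

The cost of this option is the extra write and read of the whole result before the first row can be returned, which is exactly the trade-off discussed in the thread above.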
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company