From: Ashutosh B. <ash...@en...> - 2013-04-19 03:42:58
|
Then the perf is expected. There is too much IO on the same machine and too much context switch. On Thu, Apr 18, 2013 at 7:35 PM, Abbas Butt <abb...@en...>wrote: > All instances on the same machine. > > > On Thu, Apr 18, 2013 at 4:38 PM, Ashutosh Bapat < > ash...@en...> wrote: > >> Did you do it on true cluster or by running all instances on same >> machine? The later would degrade the performance. >> >> >> On Thu, Apr 18, 2013 at 4:38 PM, Abbas Butt <abb...@en...>wrote: >> >>> >>> >>> On Thu, Apr 18, 2013 at 8:43 AM, Ashutosh Bapat < >>> ash...@en...> wrote: >>> >>>> Did you measure the performance? >>>> >>> >>> I tried but I was getting very strange numbers , It took some hours but >>> reported >>> >>> Time: 365649.353 ms >>> >>> which comes out to be some 6 minutes, I am not sure why. >>> >>> >>>> >>>> >>>> On Thu, Apr 18, 2013 at 9:02 AM, Abbas Butt < >>>> abb...@en...> wrote: >>>> >>>>> >>>>> >>>>> On Thu, Apr 18, 2013 at 1:07 AM, Abbas Butt < >>>>> abb...@en...> wrote: >>>>> >>>>>> Hi, >>>>>> Here is the review of the patch. >>>>>> >>>>>> Overall the patch is good to go. I have reviewed the code and found >>>>>> some minor errors, which I corrected and have attached the revised patch >>>>>> with the mail. >>>>>> >>>>>> I have tested both the cases when the sort happens in memory and when >>>>>> it happens using disk and found both working. >>>>>> >>>>>> I agree that the approach used in the patch is cleaner and has >>>>>> smaller footprint. >>>>>> >>>>>> I have corrected some white space errors and an unintentional change >>>>>> in function set_dbcleanup_callback >>>>>> git apply /home/edb/Desktop/MergeSort/xc_sort.patch >>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:539: trailing >>>>>> whitespace. >>>>>> void *fparams; >>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1012: trailing >>>>>> whitespace. >>>>>> >>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1018: trailing >>>>>> whitespace. >>>>>> >>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1087: trailing >>>>>> whitespace. >>>>>> /* >>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1228: trailing >>>>>> whitespace. >>>>>> size_t len, Oid msgnode_oid, >>>>>> warning: 5 lines add whitespace errors. >>>>>> >>>>>> I am leaving a query running for tonight which would sort 10M rows of >>>>>> a distributed table and would return top 100 of them. I would report its >>>>>> outcome tomorrow morning. >>>>>> >>>>> >>>>> It worked, here is the test case >>>>> >>>>> 1. create table test1 (id integer primary key , padding text); >>>>> 2. Load 10M rows >>>>> 3. select id from test1 order by 1 limit 100 >>>>> >>>>> >>>>> >>>>> >>>>>> >>>>>> Best Regards >>>>>> >>>>>> >>>>>> On Mon, Apr 1, 2013 at 11:02 AM, Koichi Suzuki < >>>>>> koi...@gm...> wrote: >>>>>> >>>>>>> Thanks. Then 90% improvement means about 53% of the duration, while >>>>>>> 50% means 67% of it. Number of queries in a given duration is 190 vs. >>>>>>> 150, difference is 40. >>>>>>> >>>>>>> Considering the needed resource, it may be okay to begin with >>>>>>> materialization. >>>>>>> >>>>>>> Any other inputs? >>>>>>> ---------- >>>>>>> Koichi Suzuki >>>>>>> >>>>>>> >>>>>>> 2013/4/1 Ashutosh Bapat <ash...@en...> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Apr 1, 2013 at 10:59 AM, Koichi Suzuki < >>>>>>>> koi...@gm...> wrote: >>>>>>>> >>>>>>>>> I understand materialize everything makes code clearer and >>>>>>>>> implementation becomes simpler and better structured. >>>>>>>>> >>>>>>>>> What do you mean by x% improvement? Does 90% improvement mean >>>>>>>>> the total duration is 10% of the original? >>>>>>>>> >>>>>>>> x% improvement means, duration reduces to 100/(100+x) as compared >>>>>>>> to the non-pushdown scenario. Or in simpler words, we see (100+x) queries >>>>>>>> being completed by pushdown approach in the same time in which nonpushdown >>>>>>>> approach completes 100 queries. >>>>>>>> >>>>>>>>> ---------- >>>>>>>>> Koichi Suzuki >>>>>>>>> >>>>>>>>> >>>>>>>>> 2013/3/29 Ashutosh Bapat <ash...@en...> >>>>>>>>> >>>>>>>>>> Hi All, >>>>>>>>>> I measured the scale up for both approaches - a. using datanode >>>>>>>>>> connections as tapes (existing one) b. materialising result on tapes before >>>>>>>>>> merging (the approach I proposed). For 1M rows, 5 coordinators I have found >>>>>>>>>> that approach (a) gives 90% improvement whereas approach (b) gives 50% >>>>>>>>>> improvement. Although the difference is significant, I feel that approach >>>>>>>>>> (b) is much cleaner than approach (a) and doesn't have large footprint >>>>>>>>>> compared to PG code and it takes care of all the cases like 1. >>>>>>>>>> materialising sorted result, 2. takes care of any number of datanode >>>>>>>>>> connections without memory overrun. It's possible to improve it further if >>>>>>>>>> we avoid materialisation of datanode result in tuplestore. >>>>>>>>>> >>>>>>>>>> Patch attached for reference. >>>>>>>>>> >>>>>>>>>> On Tue, Mar 26, 2013 at 10:38 AM, Ashutosh Bapat < >>>>>>>>>> ash...@en...> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki < >>>>>>>>>>> koi...@gm...> wrote: >>>>>>>>>>> >>>>>>>>>>>> On thing we should think for option 1 is: >>>>>>>>>>>> >>>>>>>>>>>> When a number of the result is huge, applications has to wait >>>>>>>>>>>> long >>>>>>>>>>>> time until they get the first row. Because this option may >>>>>>>>>>>> need disk >>>>>>>>>>>> write, total resource consumption will be larger. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> Yes, I am aware of this fact. Please read the next paragraph and >>>>>>>>>>> you will see that the current situation is no better. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> I'm wondering if we can use "cursor" at database so that we can >>>>>>>>>>>> read >>>>>>>>>>>> each tape more simply, I mean, to leave each query node open >>>>>>>>>>>> and read >>>>>>>>>>>> next row from any query node. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> We do that right now. But because of such a simulated cursor >>>>>>>>>>> (it's not cursor per say, but we just fetch the required result from >>>>>>>>>>> connection as the demand arises in merging runs), we observer following >>>>>>>>>>> things >>>>>>>>>>> >>>>>>>>>>> If the plan has multiple remote query nodes (as there will be in >>>>>>>>>>> case of merge join), we assign the same connection to these nodes. Before >>>>>>>>>>> this assignment, the result from the previous connection is materialised at >>>>>>>>>>> the coordinator. This means that, when we will get huge result from the >>>>>>>>>>> datanode, it will be materialised (which will have the more cost as >>>>>>>>>>> materialising it on tape, as this materialisation happens in a linked list, >>>>>>>>>>> which is not optimized). We need to share connection between more than one >>>>>>>>>>> RemoteQuery node because same transaction can not work on two connections >>>>>>>>>>> to same server. Not only performance, but the code has become ugly because >>>>>>>>>>> of this approach. At various places in executor, we have special handling >>>>>>>>>>> for sorting, which needs to be maintained. >>>>>>>>>>> >>>>>>>>>>> Instead if we materialise all the result on tape and then >>>>>>>>>>> proceed with step D5 in Knuth's algorithm for polyphase merge sort, the >>>>>>>>>>> code will be much simpler and we won't loose much performance. In fact, we >>>>>>>>>>> might be able to leverage fetching bulk data on connection which can be >>>>>>>>>>> materialised on tape in bulk. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Regards; >>>>>>>>>>>> ---------- >>>>>>>>>>>> Koichi Suzuki >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 2013/3/25 Ashutosh Bapat <ash...@en...>: >>>>>>>>>>>> > Hi All, >>>>>>>>>>>> > I am working on using remote sorting for merge joins. The >>>>>>>>>>>> idea is while >>>>>>>>>>>> > using merge join at the coordinator, get the data sorted from >>>>>>>>>>>> the datanodes; >>>>>>>>>>>> > for replicated relations, we can get all the rows sorted and >>>>>>>>>>>> for distributed >>>>>>>>>>>> > tables we have to get sorted runs which can be merged at the >>>>>>>>>>>> coordinator. >>>>>>>>>>>> > For merge join the sorted inner relation needs to be randomly >>>>>>>>>>>> accessible. >>>>>>>>>>>> > For replicated relations this can be achieved by >>>>>>>>>>>> materialising the result. >>>>>>>>>>>> > But for distributed relations, we do not materialise the >>>>>>>>>>>> sorted result at >>>>>>>>>>>> > coordinator but compute the sorted result by merging the >>>>>>>>>>>> sorted results from >>>>>>>>>>>> > individual nodes on the fly. For distributed relations, the >>>>>>>>>>>> connection to >>>>>>>>>>>> > the datanodes themselves are used as logical tapes (which >>>>>>>>>>>> provide the sorted >>>>>>>>>>>> > runs). The final result is computed on the fly by choosing >>>>>>>>>>>> the smallest or >>>>>>>>>>>> > greatest row (as required) from the connections. >>>>>>>>>>>> > >>>>>>>>>>>> > For a Sort node the materialised result can reside in memory >>>>>>>>>>>> (if it fits >>>>>>>>>>>> > there) or on one of the logical tapes used for merge sort. >>>>>>>>>>>> So, in order to >>>>>>>>>>>> > provide random access to the sorted result, we need to >>>>>>>>>>>> materialise the >>>>>>>>>>>> > result either in the memory or on the logical tape. In-memory >>>>>>>>>>>> > materialisation is not easily possible since we have already >>>>>>>>>>>> resorted for >>>>>>>>>>>> > tape based sort, in case of distributed relations and to >>>>>>>>>>>> materialise the >>>>>>>>>>>> > result on tape, there is no logical tape available in current >>>>>>>>>>>> algorithm. To >>>>>>>>>>>> > make it work, there are following possible ways >>>>>>>>>>>> > >>>>>>>>>>>> > 1. When random access is required, materialise the sorted >>>>>>>>>>>> runs from >>>>>>>>>>>> > individual nodes onto tapes (one tape for each node) and then >>>>>>>>>>>> merge them on >>>>>>>>>>>> > one extra tape, which can be used for materialisation. >>>>>>>>>>>> > 2. Use a mix of connections and logical tape in the same tape >>>>>>>>>>>> set. Merge the >>>>>>>>>>>> > sorted runs from connections on a logical tape in the same >>>>>>>>>>>> logical tape set. >>>>>>>>>>>> > >>>>>>>>>>>> > While the second one looks attractive from performance >>>>>>>>>>>> perspective (it saves >>>>>>>>>>>> > writing and reading from the tape), it would make the merge >>>>>>>>>>>> code ugly by >>>>>>>>>>>> > using mixed tapes. The read calls for connection and logical >>>>>>>>>>>> tape are >>>>>>>>>>>> > different and we will need both on the logical tape where the >>>>>>>>>>>> final result >>>>>>>>>>>> > is materialized. So, I am thinking of going with 1, in fact, >>>>>>>>>>>> to have same >>>>>>>>>>>> > code to handle remote sort, use 1 in all cases (whether or not >>>>>>>>>>>> > materialization is required). >>>>>>>>>>>> > >>>>>>>>>>>> > Had original authors of remote sort code thought about this >>>>>>>>>>>> materialization? >>>>>>>>>>>> > Anything they can share on this topic? >>>>>>>>>>>> > Any comment? >>>>>>>>>>>> > -- >>>>>>>>>>>> > Best Wishes, >>>>>>>>>>>> > Ashutosh Bapat >>>>>>>>>>>> > EntepriseDB Corporation >>>>>>>>>>>> > The Enterprise Postgres Company >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> ------------------------------------------------------------------------------ >>>>>>>>>>>> > Everyone hates slow websites. So do we. >>>>>>>>>>>> > Make your web apps faster with AppDynamics >>>>>>>>>>>> > Download AppDynamics Lite for free today: >>>>>>>>>>>> > http://p.sf.net/sfu/appdyn_d2d_mar >>>>>>>>>>>> > _______________________________________________ >>>>>>>>>>>> > Postgres-xc-developers mailing list >>>>>>>>>>>> > Pos...@li... >>>>>>>>>>>> > >>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>>>>>>>>> > >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Best Wishes, >>>>>>>>>>> Ashutosh Bapat >>>>>>>>>>> EntepriseDB Corporation >>>>>>>>>>> The Enterprise Postgres Company >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Best Wishes, >>>>>>>>>> Ashutosh Bapat >>>>>>>>>> EntepriseDB Corporation >>>>>>>>>> The Enterprise Postgres Company >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Best Wishes, >>>>>>>> Ashutosh Bapat >>>>>>>> EntepriseDB Corporation >>>>>>>> The Enterprise Postgres Company >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------------------ >>>>>>> Own the Future-Intel® Level Up Game Demo Contest 2013 >>>>>>> Rise to greatness in Intel's independent game demo contest. >>>>>>> Compete for recognition, cash, and the chance to get your game >>>>>>> on Steam. $5K grand prize plus 10 genre and skill prizes. >>>>>>> Submit your demo by 6/6/13. http://p.sf.net/sfu/intel_levelupd2d >>>>>>> _______________________________________________ >>>>>>> Postgres-xc-developers mailing list >>>>>>> Pos...@li... >>>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -- >>>>>> Abbas >>>>>> Architect >>>>>> EnterpriseDB Corporation >>>>>> The Enterprise PostgreSQL Company >>>>>> >>>>>> Phone: 92-334-5100153 >>>>>> >>>>>> Website: www.enterprisedb.com >>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>>> >>>>>> This e-mail message (and any attachment) is intended for the use of >>>>>> the individual or entity to whom it is addressed. This message >>>>>> contains information from EnterpriseDB Corporation that may be >>>>>> privileged, confidential, or exempt from disclosure under applicable >>>>>> law. If you are not the intended recipient or authorized to receive >>>>>> this for the intended recipient, any use, dissemination, distribution, >>>>>> retention, archiving, or copying of this communication is strictly >>>>>> prohibited. If you have received this e-mail in error, please notify >>>>>> the sender immediately by reply e-mail and delete this message. >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> -- >>>>> Abbas >>>>> Architect >>>>> EnterpriseDB Corporation >>>>> The Enterprise PostgreSQL Company >>>>> >>>>> Phone: 92-334-5100153 >>>>> >>>>> Website: www.enterprisedb.com >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>> >>>>> This e-mail message (and any attachment) is intended for the use of >>>>> the individual or entity to whom it is addressed. This message >>>>> contains information from EnterpriseDB Corporation that may be >>>>> privileged, confidential, or exempt from disclosure under applicable >>>>> law. If you are not the intended recipient or authorized to receive >>>>> this for the intended recipient, any use, dissemination, distribution, >>>>> retention, archiving, or copying of this communication is strictly >>>>> prohibited. If you have received this e-mail in error, please notify >>>>> the sender immediately by reply e-mail and delete this message. >>>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>> >>> >>> >>> -- >>> -- >>> Abbas >>> Architect >>> EnterpriseDB Corporation >>> The Enterprise PostgreSQL Company >>> >>> Phone: 92-334-5100153 >>> >>> Website: www.enterprisedb.com >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>> This e-mail message (and any attachment) is intended for the use of >>> the individual or entity to whom it is addressed. This message >>> contains information from EnterpriseDB Corporation that may be >>> privileged, confidential, or exempt from disclosure under applicable >>> law. If you are not the intended recipient or authorized to receive >>> this for the intended recipient, any use, dissemination, distribution, >>> retention, archiving, or copying of this communication is strictly >>> prohibited. If you have received this e-mail in error, please notify >>> the sender immediately by reply e-mail and delete this message. >>> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |