From: Gökhan S. <gok...@gm...> - 2012-07-04 17:17:45
|
Hello,

I am working on creating some distribution plots to analyze cloud droplet and drop features. You can see one such plot at http://atmos.uwyo.edu/~gsever/data/rf06_1second/rf06_belowcloud_SurfaceArea_1second.pdf

This file contains 38 pages, and each page has 16 panels created via MPL's AxesGrid toolkit. I am using PdfPages from the PDF backend to construct this multi-page plot. The original code used to create this plot is at http://code.google.com/p/ccnworks/source/browse/trunk/parcel_drizzle/rf06_moments.py

The problem I am reporting is the lengthy plot creation time: it takes about 4 minutes to create such a plot on my laptop. To better demonstrate the issue, I created a sample script which you can use to reproduce my timing results, based on pseudo-random data points. All the data points in the original script are float64, so I use float64 in the sample script as well.

The script is at http://atmos.uwyo.edu/~gsever/data/matplotlib/test_speed.py I also included the 2-page output from running the script with the nums=2 setting: http://atmos.uwyo.edu/~gsever/data/matplotlib/test_speed.pdf Compared to my original output, indeed cloud particles are not from a normal distribution :)

Joke aside, running with nums=2 for 2 pages:

time run test_speed.py
CPU times: user 12.39 s, sys: 0.10 s, total: 12.49 s
Wall time: 12.84 s

When nums=38, just like my original script, I get timing similar to my original run:

time run test.py
CPU times: user 227.39 s, sys: 1.74 s, total: 229.13 s
Wall time: 234.87 s

In addition to the long creation time, building the 38-page plot consumes about 3 GB of memory. I am wondering if there are tricks to improve plot creation times as well as to use memory more efficiently. Attempting to create two such distributions blocks my machine, eating 6 GB of RAM.

Using Python 2.7, NumPy 2.0.0.dev-7e202a2, IPython 0.13.beta1, matplotlib 1.1.1rc on Fedora 16 (x86_64)

Thanks.

--
Gökhan |
From: Nicolas R. <Nic...@in...> - 2012-07-05 09:40:42
|
Your files do not seem to be readable:
http://atmos.uwyo.edu/~gsever/data/matplotlib/test_speed.py
http://atmos.uwyo.edu/~gsever/data/matplotlib/test_speed.pdf

Nicolas

On Jul 4, 2012, at 19:17 , Gökhan Sever wrote:
> [...]
|
From: Benjamin R. <ben...@ou...> - 2012-07-05 13:30:25
|
On Wed, Jul 4, 2012 at 1:17 PM, Gökhan Sever <gok...@gm...> wrote:
> [...]

Gokhan,

Looking through your code, I see that you have all of the figure objects available all at once, rather than one at a time. In belowcloud_M0(), you create all of your figure objects and AxesGrid objects in list comprehensions, and then you have multiple for-loops, each performing a particular action on all of them. Then you create your PdfPages object and loop over the figures, saving each one to a page.

I would do it quite differently. At the beginning of the function, create your PdfPages object. Then have a single loop over "range(nums)" in which you create one figure object and one AxesGrid object. Do your 16 (or fewer) plots, and add any other text you need for that figure. Save it to the PdfPages object, and then close the figure object. When the loop is done, close the PdfPages object.

I think you will see a huge performance improvement that way.

Cheers!
Ben Root |
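[Editor's sketch of the one-figure-at-a-time pattern Ben describes above. The file name, page count, and the plain subplots grid are placeholders; the original script uses the AxesGrid toolkit, but the PdfPages/close pattern is the same.]

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, fine for batch PDF output
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

nums = 2  # number of pages; the original script uses 38

pdf = PdfPages("test_speed_onepass.pdf")
for page in range(nums):
    # One figure alive at a time, instead of building all 38 up front.
    fig, axes = plt.subplots(4, 4, figsize=(8, 8))  # 16 panels per page
    for ax in axes.flat:
        ax.plot(np.random.randn(100), lw=1.5)
    pdf.savefig(fig)  # write this page into the multi-page PDF
    plt.close(fig)    # release the figure before building the next one
pdf.close()
```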
From: Gökhan S. <gok...@gm...> - 2012-07-05 14:45:20
|
On Thu, Jul 5, 2012 at 7:29 AM, Benjamin Root <ben...@ou...> wrote:
> [...]

Hi,

Could you try the files again? I believe I have given read permission for outside access.

http://atmos.uwyo.edu/~gsever/data/matplotlib/test_speed.py
http://atmos.uwyo.edu/~gsever/data/matplotlib/test_speed.pdf

Ben,

Thanks for your suggestion. I will give it a try and report back here.

--
Gökhan |
From: Gökhan S. <gok...@gm...> - 2012-07-05 15:36:52
|
On Thu, Jul 5, 2012 at 8:45 AM, Gökhan Sever <gok...@gm...> wrote:
> [...]

Hi,

Please ignore the "pass" line in the first for loop in http://atmos.uwyo.edu/~gsever/data/matplotlib/test_speed.py

I have the 2nd version of this script at http://atmos.uwyo.edu/~gsever/data/matplotlib/test_speed2.py following Ben's suggestion. Now there are two main loops: the outer one creates a Figure and an AxesGrid object, and the inner one plots on and decorates the grid objects. I am assuming that my inner loop is correct; tracking the xx variable helps me plot the right index from the concX data array onto the right grid element.

Timings are below. As you can see, these are similar to the test_speed.py runs.

nums=2
I1 time run test_speed2.py
CPU times: user 10.85 s, sys: 0.10 s, total: 10.95 s
Wall time: 11.19 s

nums=38
I1 time run test_speed2.py
CPU times: user 232.73 s, sys: 0.28 s, total: 233.01 s
Wall time: 238.75 s

However, I have my 3 GB of memory back free.

Any other suggestions?

--
Gökhan |
From: Benjamin R. <ben...@ou...> - 2012-07-05 15:55:59
|
On Thu, Jul 5, 2012 at 11:36 AM, Gökhan Sever <gok...@gm...> wrote:
> [...]

And you might get back more memory if you didn't have to have all the data in memory at once, but that may or may not help you.

The only other suggestion I can make is to eliminate the overhead in the inner loop. Essentially, I would try making a single figure and a single AxesGrid object (before the outer loop). Then go over each subplot in the AxesGrid object and set the limits, the log scale, the ticks and the tick locator (I wouldn't be surprised if that is eating up CPU cycles). All of this would be done once, before the loop you have right now. Then create the PdfPages object and loop over all of the plots you have, essentially recycling the figure and AxesGrid object.

At the end of the outer loop, instead of closing the figure, you should call "remove()" on each plot element you made. Essentially, as you go through the inner loop, save the output of each plot() call to a list, and when done with those plots, pop each element of that list and call "remove()" on it to take it out of the subplot. This lets the subplot axes retain the properties you set earlier.

I hope that made sense.

Ben Root |
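[Editor's sketch of the recycling pattern above: decorate the axes once, then per page plot, save, and remove() the artists. File name, data, and page count are placeholders, and a plain subplots grid stands in for AxesGrid.]

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker
from matplotlib.backends.backend_pdf import PdfPages

# One figure, decorated once before the page loop.
fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for ax in axes.flat:
    ax.set_xscale("log")
    ax.set_xlim(1, 2000)
    ax.set_ylim(-50, 50)
    ax.yaxis.set_minor_locator(ticker.MultipleLocator(5))

x = np.logspace(0, 3, 50)
pdf = PdfPages("test_speed_recycle.pdf")
for page in range(2):
    # plot() returns a list of Line2D artists; keep them so we can remove them.
    lines = [ax.plot(x, 40 * np.random.randn(50), lw=1.5)[0] for ax in axes.flat]
    pdf.savefig(fig)
    while lines:
        lines.pop().remove()  # strip the artists; the axes keep their settings
pdf.close()
```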
From: Fabrice S. <si...@lm...> - 2012-07-05 17:15:29
|
> At the end of the outer loop, instead of closing the figure, you should
> call "remove()" on each plot element you made. [...]

Instead of remove()'ing the graphical elements, you can also reuse them, if the kind of plot you intend to draw is the same across the figures, for simple plots. See: http://paste.debian.net/177857/

--
Fabrice Silva |
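[The paste.debian.net link may no longer resolve; here is an editor's minimal sketch of the reuse idea Fabrice describes, not the paste itself. Data, file name, and page count are illustrative.]

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

x = np.logspace(0, 3, 50)
fig, ax = plt.subplots()
ax.set_xscale("log")
ax.set_ylim(-4, 4)
# Note the trailing comma: plot() returns a list of Line2D artists.
line, = ax.plot(x, np.random.randn(50), color="r", lw=1.5)

pdf = PdfPages("test_speed_reuse.pdf")
for page in range(3):
    line.set_ydata(np.random.randn(50))  # swap the data; no new artists created
    pdf.savefig(fig)
pdf.close()
```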
From: Gökhan S. <gok...@gm...> - 2012-07-17 22:35:42
|
There is one issue I spotted in this code. Although it is hard to notice from the produced plot, only the latest grid is updated when set_ydata is called. So a slight modification makes this code run correctly, as originally intended:

L1list = []
L2list = []
for i in range(nums):
    for j in range(xx*16, xx*16+16):
        if i == 0:
            L1list.append(grid[j%16].plot(dd1, conc1[j], color='r', lw=1.5)[0])
            L2list.append(grid[j%16].plot(dd2, conc2[j], color='b', lw=1.5)[0])
            grid[j%16].set_xscale('log')
            grid[j%16].set_xticks([10, 100, 1000])
            grid[j%16].set_yticks([-25, 0, 25])
            grid[j%16].yaxis.set_minor_locator(ticker.MultipleLocator(5))
            grid[j%16].set_xlim(1, 2000, auto=False)
            grid[j%16].set_ylim(-50, 50, auto=False)
        else:
            L1list[j%16].set_ydata(conc1[j])
            L2list[j%16].set_ydata(conc2[j])

On Thu, Jul 5, 2012 at 11:15 AM, Fabrice Silva <si...@lm...> wrote:
> [...]

--
Gökhan |
From: Gökhan S. <gok...@gm...> - 2012-07-05 17:45:47
|
On Thu, Jul 5, 2012 at 11:15 AM, Fabrice Silva <si...@lm...> wrote:
> Instead of remove()'ing the graphical elements, you can also reuse them
> if the kind of plots you intend to do is the same along the figure
> for simple plots. See : http://paste.debian.net/177857/

I was close to getting the script to run as you pasted it. (One minor correction in your script is the indexing of L1 and L2: either L1[0] or "L1," (with a comma) is required in the assignments, since grid.plot returns a list.) The key here was "reuse", as you said. Memory consumption drops to almost half compared to the test_speed2.py run. Now I am down to ~1 minute from about ~4 minutes of execution time, which is indeed quite significant, given that I experiment on 6 such 38-page plots.

nums=2
I1 time run test_speed3.py
CPU times: user 8.19 s, sys: 0.07 s, total: 8.26 s
Wall time: 8.49 s

nums=38
I1 time run test_speed3.py
CPU times: user 78.84 s, sys: 0.19 s, total: 79.03 s
Wall time: 80.88 s

Thanks Fabrice for your feedback. |
From: Benjamin R. <ben...@ou...> - 2012-07-05 17:54:07
|
On Thu, Jul 5, 2012 at 1:45 PM, Gökhan Sever <gok...@gm...> wrote:
> [...]

38 * 16 = 608
80 / 608 = 0.1316 seconds per plot

At this point, I doubt you are going to get much more speed-up. Glad to be of help!

Fabrice -- Good suggestion! I should have thought of that, given how much I use that technique in doing animation.

Ben Root |
From: Gökhan S. <gok...@gm...> - 2012-07-05 18:11:52
|
> 38 * 16 = 608
> 80 / 608 = 0.1316 seconds per plot
> [...]

I am including profiled runs for the record -- only the first 10 lines, to keep the e-mail shorter. Total times are longer compared to the unprofiled executions; I believe the profiled run has its own call overhead.

I1 run -p test_speed.py

171889738 function calls (169109959 primitive calls) in 374.311 seconds

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
  4548012   34.583    0.000   34.583    0.000  {numpy.core.multiarray.array}
  1778401   21.012    0.000   46.227    0.000  path.py:86(__init__)
   521816   17.844    0.000   17.844    0.000  artist.py:74(__init__)
  2947090   15.432    0.000   15.432    0.000  weakref.py:243(__init__)
  1778401    9.515    0.000    9.515    0.000  {method 'all' of 'numpy.ndarray' objects}
 13691669    8.654    0.000    8.654    0.000  {getattr}
  1085280    8.550    0.000   17.629    0.000  core.py:2749(_update_from)
  1299904    7.809    0.000   76.060    0.000  markers.py:115(_recache)
       38    7.378    0.194    7.378    0.194  {gc.collect}
 13564851    6.768    0.000    6.768    0.000  {isinstance}

I1 run -p test_speed3.py

61658708 function calls (60685172 primitive calls) in 100.934 seconds

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
   937414    6.638    0.000    6.638    0.000  {numpy.core.multiarray.array}
   374227    4.377    0.000    7.500    0.000  path.py:198(iter_segments)
  6974613    3.866    0.000    3.866    0.000  {getattr}
   542640    3.809    0.000    7.900    0.000  core.py:2749(_update_from)
   141361    3.665    0.000    7.136    0.000  transforms.py:99(invalidate)
324688/161136  2.780  0.000   27.747    0.000  transforms.py:1729(transform)
    64448    2.753    0.000   64.921    0.001  lines.py:463(draw)
   231195    2.748    0.000    7.072    0.000  path.py:86(__init__)
684970/679449  2.679  0.000    3.888    0.000  backend_pdf.py:128(pdfRepr)
    67526    2.651    0.000    7.522    0.000  backend_pdf.py:1226(pathOperations)

--
Gökhan |
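[Editor's note: the "run -p" magic above is IPython's wrapper around the standard-library cProfile module. The same top-10 report can be produced from plain Python like this; the work() function is a trivial stand-in for the plotting script.]

```python
import cProfile
import io
import pstats

def work():
    # stand-in for the plotting script being profiled
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(10)  # top 10 entries by internal time
report = stream.getvalue()
print(report)
```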
From: Michiel de H. <mjl...@ya...> - 2012-07-07 15:40:47
|
One reason behind the lengthy plot creation times is likely the PDF backend itself. Whereas the Mac OS X and the Cairo backends make use of new_gc and gc.restore to keep track of the graphics context, the PDF backend uses check_gc and an internal stack of graphics contexts. Since nowadays matplotlib has gc.restore functionality, I don't think that that is needed any more. See this revision for when gc.restore was added to matplotlib: http://matplotlib.svn.sourceforge.net/viewvc/matplotlib?view=revision&revision=7112 In the same revision the Mac OS X and Cairo backends were modified to make use of gc.restore. The PDF backend (and the postscript backend also, btw) can be simplified in the same way to speed up these backends, as well as to reduce the output file sizes. Best, -Michiel. --- On Thu, 7/5/12, Gökhan Sever <gok...@gm...> wrote: From: Gökhan Sever <gok...@gm...> Subject: Re: [Matplotlib-users] Accelerating PDF saved plots To: "Benjamin Root" <ben...@ou...> Cc: mat...@li... Date: Thursday, July 5, 2012, 2:11 PM 38 * 16 = 608 80 / 608 = 0.1316 seconds per plot At this point, I doubt you are going to get much more speed-ups. Glad to be of help! Fabrice -- Good suggestion! I should have thought of that given how much I use that technique in doing animation. Ben Root I am including profiled runs for the records --only first 10 lines to keep e-mail shorter. Total times are longer comparing to the raw run -p executions. I believe profiled run has its own call overhead. 
I1 run -p test_speed.py 171889738 function calls (169109959 primitive calls) in 374.311 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 4548012 34.583 0.000 34.583 0.000 {numpy.core.multiarray.array} 1778401 21.012 0.000 46.227 0.000 path.py:86(__init__) 521816 17.844 0.000 17.844 0.000 artist.py:74(__init__) 2947090 15.432 0.000 15.432 0.000 weakref.py:243(__init__) 1778401 9.515 0.000 9.515 0.000 {method 'all' of 'numpy.ndarray' objects} 13691669 8.654 0.000 8.654 0.000 {getattr} 1085280 8.550 0.000 17.629 0.000 core.py:2749(_update_from) 1299904 7.809 0.000 76.060 0.000 markers.py:115(_recache) 38 7.378 0.194 7.378 0.194 {gc.collect} 13564851 6.768 0.000 6.768 0.000 {isinstance} I1 run -p test_speed3.py 61658708 function calls (60685172 primitive calls) in 100.934 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 937414 6.638 0.000 6.638 0.000 {numpy.core.multiarray.array} 374227 4.377 0.000 7.500 0.000 path.py:198(iter_segments) 6974613 3.866 0.000 3.866 0.000 {getattr} 542640 3.809 0.000 7.900 0.000 core.py:2749(_update_from) 141361 3.665 0.000 7.136 0.000 transforms.py:99(invalidate)324688/161136 2.780 0.000 27.747 0.000 transforms.py:1729(transform) 64448 2.753 0.000 64.921 0.001 lines.py:463(draw) 231195 2.748 0.000 7.072 0.000 path.py:86(__init__)684970/679449 2.679 0.000 3.888 0.000 backend_pdf.py:128(pdfRepr) 67526 2.651 0.000 7.522 0.000 backend_pdf.py:1226(pathOperations) -- Gökhan -----Inline Attachment Follows----- ------------------------------------------------------------------------------ Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. 
_______________________________________________
Matplotlib-users mailing list
Mat...@li...
https://lists.sourceforge.net/lists/listinfo/matplotlib-users |
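The two bookkeeping styles contrasted above can be illustrated with a toy sketch. This is not matplotlib's actual backend code; the class names are made up, and the emitted "q"/"Q" strings only mimic PDF's save/restore graphics-state operators:

```python
# Toy illustration of the two graphics-state strategies; this is
# NOT matplotlib's actual backend code.

class GraphicsContext:
    """Minimal stand-in for a graphics context."""
    def __init__(self, linewidth=1.0, color="black"):
        self.linewidth = linewidth
        self.color = color

    def state(self):
        return (self.linewidth, self.color)


class StackingRenderer:
    """check_gc style: the renderer keeps its own stack of contexts and
    diffs the requested state against the top on every draw call."""
    def __init__(self):
        self.stack = [GraphicsContext()]
        self.ops = []

    def check_gc(self, gc):
        if gc.state() != self.stack[-1].state():
            self.ops.append("q")        # PDF operator: save graphics state
            self.ops.append("set-state")
            self.stack.append(gc)


class SaveRestoreRenderer:
    """new_gc/gc.restore style: callers bracket their drawing with an
    explicit save/restore pair, so no renderer-side stack is needed."""
    def __init__(self):
        self.ops = []

    def new_gc(self):
        self.ops.append("q")            # save
        return GraphicsContext()

    def restore(self):
        self.ops.append("Q")            # restore


# The save/restore style makes the state scoping explicit:
renderer = SaveRestoreRenderer()
gc = renderer.new_gc()
gc.linewidth = 2.0
# ... draw with gc here ...
renderer.restore()
```

In the save/restore style, state handling moves out of the renderer and into the drawing code, which is the simplification Michiel proposes.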
From: Gökhan S. <gok...@gm...> - 2012-07-08 01:05:16
|
Hi,

What kind of outputs can these backends create? I don't use a Mac, so my question is particularly about the Cairo backend. Could you make a simple speed comparison between these backends and the original script that uses the PDF backend? I am assuming the changes you mention require quite some work to make the PDF backend run faster.

Thanks.

On Sat, Jul 7, 2012 at 9:40 AM, Michiel de Hoon <mjl...@ya...> wrote:

> One reason behind the lengthy plot creation times is likely the PDF
> backend itself.
>
> Whereas the Mac OS X and the Cairo backends make use of new_gc and
> gc.restore to keep track of the graphics context, the PDF backend uses
> check_gc and an internal stack of graphics contexts. Since matplotlib
> now has gc.restore functionality, I don't think that stack is needed
> any more.
>
> See this revision for when gc.restore was added to matplotlib:
>
> http://matplotlib.svn.sourceforge.net/viewvc/matplotlib?view=revision&revision=7112
>
> In the same revision the Mac OS X and Cairo backends were modified to make
> use of gc.restore. The PDF backend (and the postscript backend also, btw)
> can be simplified in the same way to speed up these backends, as well as to
> reduce the output file sizes.
>
> Best,
> -Michiel.
-- Gökhan |
From: Michiel de H. <mjl...@ya...> - 2012-07-08 05:14:10
|
Hi,

> What kind of outputs can these backends create?

The Mac OS X backend can create PDFs, but it simply uses the pdf backend to do so, so that wouldn't help you. The cairo backend can create PDFs by using cairo, so that could be worth trying.

> Could you make a simple speed comparison between these backends
> and the original script that uses the PDF backend?

That would be useful, but keep in mind that there would be three options to compare:

1) The current PDF backend;
2) A modified PDF backend;
3) The cairo backend creating PDFs.

Since we don't have 2) yet, we cannot do the full comparison, but it would still be good to know whether creating PDFs via cairo is faster than the current PDF backend.

> I am assuming the changes you mention require quite some work
> to make the PDF backend run faster.

I think it is not so bad, since it's mainly a matter of removing the stuff from the PDF backend that is no longer needed. Do we have a maintainer for the PDF backend? Because I would rather rely on him/her to make the changes to this backend. Otherwise, I can give it a try, but I probably won't be able to find the time for it within this month.

Best,
-Michiel.

--- On Sat, 7/7/12, Gökhan Sever <gok...@gm...> wrote:

One reason behind the lengthy plot creation times is likely the PDF backend itself.
|
From: Eric F. <ef...@ha...> - 2012-07-08 06:20:52
|
On 2012/07/07 7:14 PM, Michiel de Hoon wrote:
> Hi,
>
> > What kind of outputs can these backends create?
>
> The Mac OS X backend can create PDFs, but it simply uses the pdf backend
> to do so, so that wouldn't help you.
> The cairo backend can create PDFs by using cairo, so that could be worth
> trying.
>
> > Could you make a simple speed comparison between these backends
> > and the original script that uses the PDF backend?
>
> That would be useful, but keep in mind that there would be three options
> to compare:
> 1) The current PDF backend;
> 2) A modified PDF backend;
> 3) The cairo backend creating PDFs.
>
> I think it is not so bad, since it's mainly a matter of removing the
> stuff from the PDF backend that is no longer needed. Do we have a
> maintainer for the PDF backend? Because I would rather rely on him/her
> to make the changes to this backend. Otherwise, I can give it a try, but
> probably I won't be able to find the time for it within this month.

It would be a good idea to enter a Github ticket for this, referring to this email thread.

Mike D. and Jouni S. have done most of the work on the pdf backend.

Eric
|
From: Michiel de H. <mjl...@ya...> - 2012-07-09 11:29:58
|
Hi Eric, Jouni,

Thanks for your replies. I opened an issue here:

https://github.com/matplotlib/matplotlib/issues/992

and wrote an outline of how the PDF backend can be simplified by making use of gc.restore to keep track of the graphics context. In essence the PDF backend would then follow the same logic as the Cairo and Mac OS X backends, so it may be good to compare against these (especially the Cairo backend, since it's written in pure Python and easy to understand).

Best,
-Michiel.

--- On Sun, 7/8/12, Eric Firing <ef...@ha...> wrote:
> It would be a good idea to enter a Github ticket for this, referring to
> this email thread.
>
> Mike D. and Jouni S. have done most of the work on the pdf backend.
>
> Eric
|
From: Jouni K. S. <jk...@ik...> - 2012-07-13 08:17:04
|
Thanks for the explanation! Let's discuss further on that issue page.

Michiel de Hoon <mjl...@ya...> writes:

> Hi Eric, Jouni,
>
> Thanks for your replies. I opened an issue here:
>
> https://github.com/matplotlib/matplotlib/issues/992
>
> and wrote an outline of how the PDF backend can be simplified by
> making use of gc.restore to keep track of the graphics context. In
> essence the PDF backend would then follow the same logic as the Cairo
> and Mac OS X backends, so it may be good to compare to these
> (especially the Cairo backend, since it's written in pure Python and
> easy to understand).
>
> Best,
> -Michiel.

-- 
Jouni K. Seppänen
http://www.iki.fi/jks |
From: Jouni K. S. <jk...@ik...> - 2012-07-08 06:19:51
|
Michiel de Hoon <mjl...@ya...> writes:

> I think it is not so bad, since it's mainly a matter of removing the
> stuff from the PDF backend that is no longer needed. Do we have a
> maintainer for the PDF backend? Because I would rather rely on him/her
> to make the changes to this backend.

That would be me. Can you outline what parts you think can be removed? I'm currently travelling and don't always have an Internet connection, or much time available, so turnaround can be slow.

-- 
Jouni K. Seppänen
http://www.iki.fi/jks |
From: Gökhan S. <gok...@gm...> - 2012-07-05 17:55:23
|
> And you might get back more memory if you didn't have to have all the data
> in memory at once, but that may or may not help you. The only other
> suggestion I can make is to attempt to eliminate the overhead in the inner
> loop. Essentially, I would try making a single figure and a single
> AxesGrid object (before the outer loop). Then go over each subplot in the
> AxesGrid object and set the limits, the log scale, the ticks and the tick
> locator (I wouldn't be surprised if that is eating up cpu cycles). All of
> this would be done once, before the loop you have right now. Then create
> the PdfPages object, and loop over all of the plots you have, essentially
> recycling the figure and AxesGrid object.
>
> At the end of the outer loop, instead of closing the figure, you should call
> "remove()" for each plot element you made. Essentially, as you loop over
> the inner loop, save the output of the plot() call to a list, and then when
> done with those plots, pop each element of that list and call "remove()" to
> take it out of the subplot. This will let the subplot axes retain the
> properties you set earlier.
>
> I hope that made sense.
> Ben Root

Hi Ben,

I do need to have all the data available at once, as I read that array directly from a netCDF file. The memory requirement for my data is small compared to the overhead added once plot creation starts. Fabrice's reply includes most of what you describe, except the remove() call part. These changes made a big impact in lowering my execution times. Thank you again for your explanation.

-- 
Gökhan |
From: Benjamin R. <ben...@ou...> - 2012-07-05 18:18:16
|
On Thu, Jul 5, 2012 at 1:55 PM, Gökhan Sever <gok...@gm...> wrote:

> Fabrice's reply includes most of what you describe, except the remove()
> call part. These changes made a big impact in lowering my execution times.

Actually, looking at Fabrice's code, you might be able to get it to be slightly faster. Lines 39-41 should be protected by an "if i == 0" statement, because that setup only needs to be done once.
Furthermore, you might get some more improvements if you allow the subplots to share_all, in which case, you only need to set the limits and maybe the scale and the locator once. Cheers! Ben Root |
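Ben's recycle-and-remove() suggestion can be sketched as follows. This is only a sketch: plain subplots stand in for the AxesGrid of the original script, and the page count, grid shape, limits, and data are made up:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # any non-interactive backend; PdfPages writes the PDF
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Placeholders: the original script used 38 pages of 4x4 AxesGrid panels.
n_pages, n_rows, n_cols = 2, 2, 2

# Create the figure and axes ONCE, before the page loop, and apply
# per-axes settings (limits, scales, locators) only here.
fig, axes = plt.subplots(n_rows, n_cols, figsize=(8, 6))
for ax in axes.flat:
    ax.set_xlim(0, 10)
    ax.set_ylim(-4, 4)

with PdfPages("pages.pdf") as pdf:
    for page in range(n_pages):
        artists = []
        for ax in axes.flat:
            # plot() returns a list of Line2D artists; keep them so
            # they can be removed after the page is saved.
            artists.extend(ax.plot(np.linspace(0, 10, 100),
                                   np.random.randn(100), "."))
        pdf.savefig(fig)
        # Recycle the figure: remove this page's artists instead of
        # closing the figure, so the axes keep their settings.
        while artists:
            artists.pop().remove()
plt.close(fig)
```

The design point is that axes setup (limits, scales, tick machinery) happens once, while only the cheap per-page artists are created and destroyed inside the loop.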
From: Gökhan S. <gok...@gm...> - 2012-07-05 21:05:15
|
On Thu, Jul 5, 2012 at 12:17 PM, Benjamin Root <ben...@ou...> wrote:

> Actually, looking at Fabrice's code, you might be able to get it to be
> slightly faster. Lines 39-41 should be protected by an "if i == 0"
> statement because it only needs to be done once. Furthermore, you might
> get some more improvements if you allow the subplots to share_all, in
> which case, you only need to set the limits and maybe the scale and the
> locator once.
>
> Cheers!
> Ben Root

Good catch. Bringing lines 39-41 into the "if i == 0" block makes the label texts appear jagged. See my output for this case at -> http://atmos.uwyo.edu/~gsever/data/matplotlib/test_speed3_jaggedlabels.pdf

Putting these lines right below the main fig and grid object creation makes the labels look normal, and saves me 3-5 more seconds.

Setting the share_all option to 1 places the x-ticks unreasonably on the axes, as if share_all were applied only to the first plot call. See the example output at -> http://atmos.uwyo.edu/~gsever/data/matplotlib/test_speed3_badxaxes.pdf

I actually started with share_all=1, following this example -> http://matplotlib.sourceforge.net/mpl_toolkits/axes_grid/examples/demo_axes_grid.py (particularly the construction in def demo_grid_with_single_cbar(fig)). However, I had noticed this behavior earlier, and explicit grid calls solved the issue. |
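The explicit per-axes configuration Gökhan settled on can be sketched like this. Plain subplots stand in for AxesGrid here, and the limits, scale, and locator are placeholder values, not those of the original script:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt
from matplotlib.ticker import LogLocator

# Plain subplots stand in for the 4x4 AxesGrid of the original script.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Configure every axes explicitly, once, right after the figure and
# grid are created, instead of relying on share_all to propagate
# settings; the values below are placeholders.
for ax in axes.flat:
    ax.set_xscale("log")
    ax.set_xlim(1e-2, 1e2)
    ax.set_ylim(0.0, 1.1)
    ax.xaxis.set_major_locator(LogLocator(base=10.0))

# Plotting afterwards leaves the pre-set limits and ticks intact.
for ax in axes.flat:
    x = np.logspace(-2, 2, 50)
    ax.plot(x, np.abs(np.sin(x)), ".")

fig.savefig("grid.pdf")
```

Doing the configuration once, before any plot() calls, is what avoids both the jagged labels and the misplaced shared ticks described above.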