On Thu, Jul 5, 2012 at 11:36 AM, Gökhan Sever <gokhansever@...> wrote:
> On Thu, Jul 5, 2012 at 8:45 AM, Gökhan Sever <gokhansever@...> wrote:
>> On Thu, Jul 5, 2012 at 7:29 AM, Benjamin Root <ben.root@...> wrote:
>>> On Wed, Jul 4, 2012 at 1:17 PM, Gökhan Sever <gokhansever@...> wrote:
>>>> I am working on creating some distribution plots to analyze cloud
>>>> droplet and drop features. You can see one such plot at
>>>> This file contains 38 pages and each page has 16 panels created via
>>>> MPL's AxesGrid toolkit. I am using PdfPages from pdf backend profile to
>>>> construct this multi-page plot. The original code that is used to create
>>>> this plot is in
>>>> The problem I am reporting is the long plot creation time.
>>>> It takes about 4 minutes to create such a plot on my laptop. To better
>>>> demonstrate the issue I created a sample script which you can use to
>>>> reproduce my timing results (well, based on pseudo-random data points). All
>>>> my data points in the original script are float64 so I use float64 in the
>>>> sample script as well.
>>>> The script is at
>>>> http://atmos.uwyo.edu/~gsever/data/matplotlib/test_speed.py I also
>>>> included 2 pages output running the script with nums=2 setting
>>>> Comparing with my original output, indeed cloud particles are not from a
>>>> normal distribution :)
>>>> Joke aside, running with nums=2 for 2 pages
>>>> time run test_speed.py
>>>> CPU times: user 12.39 s, sys: 0.10 s, total: 12.49 s
>>>> Wall time: 12.84 s
>>>> when nums=38, just like my original script, then I get similar timing
>>>> to my original run
>>>> time run test.py
>>>> CPU times: user 227.39 s, sys: 1.74 s, total: 229.13 s
>>>> Wall time: 234.87 s
>>>> In addition to these long plot creation times, creating the 38-page plot
>>>> consumes about 3 GB of memory. I am wondering if there are tricks to reduce
>>>> plot creation time and to use memory more efficiently.
>>>> Attempting to create two such distributions locks up my machine, eating 6 GB
>>>> of RAM.
>>>> Using Python 2.7, NumPy 2.0.0.dev-7e202a2, IPython
>>>> 0.13.beta1, matplotlib 1.1.1rc on Fedora 16 (x86_64)
>>> Looking through your code, I see that you have all of the figure objects
>>> available all at once, rather than one at a time. In belowcloud_M0(), you
>>> create all of your figure objects and AxesGrid objects in list
>>> comprehensions, and then you have multiple for-loops that each perform a
>>> particular action on each of these. Then you create your PdfPages object
>>> and loop over each of the figures, saving it to the page.
>>> I would do it quite differently. At the beginning of the function,
>>> create your PdfPages object. Then have a single loop over "range(nums)"
>>> where you create a figure object and an AxesGrid object. Do your 16 (or
>>> fewer) plots, and any other text you need for that figure. Save it to the
>>> PdfPages object, and then close the figure object. When the loop is done,
>>> close the PdfPages object.
>>> I think you will see huge performance improvement that way.
>>> Ben Root
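
The one-figure-at-a-time loop Ben describes could look roughly like the sketch below. It is not the code from test_speed.py: the function name, the random demo data and the output path are made up for illustration, and plt.subplots stands in for AxesGrid to keep the example short.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages


def save_pages_one_at_a_time(data, pdf_path, panels_per_page=16):
    """Write one PDF page per block of `panels_per_page` rows of `data`.

    Hypothetical helper: only one figure is alive at any time.
    """
    n_pages = 0
    with PdfPages(pdf_path) as pdf:  # closed automatically after the loop
        for start in range(0, len(data), panels_per_page):
            # A 4x4 grid via plt.subplots (stand-in for AxesGrid).
            fig, axes = plt.subplots(4, 4, sharex=True, sharey=True)
            page_rows = data[start:start + panels_per_page]
            for ax, row in zip(axes.flat, page_rows):
                ax.plot(row)
            pdf.savefig(fig)
            plt.close(fig)  # release this figure before building the next
            n_pages += 1
    return n_pages


# Made-up demo data: 32 panels of 50 points each, i.e. 2 pages.
rng = np.random.default_rng(0)
demo_data = rng.random((32, 50))
demo_path = os.path.join(tempfile.mkdtemp(), "one_at_a_time.pdf")
pages_written = save_pages_one_at_a_time(demo_data, demo_path)
```

Closing each figure inside the loop is what keeps memory flat; as the timings later in the thread show, it does not by itself shorten the run time.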
>> Could you try the files again? I believe I have given read permission for
>> outside access.
>> Thanks for your suggestion. I will give it a try and report back here.
> Please ignore the "pass" line in the first for loop in
> I have the 2nd version of this script at
> following Ben's suggestion. Now there are two nested loops: the outer loop
> creates one Figure and one AxesGrid object per page, and the inner loop does
> the plotting and decorating of the grid elements. I am assuming that my inner
> loop is correct. Tracking the xx variable helps me plot the right index from
> the concX data array onto the right grid element.
> Timings are below. As you can see, these are similar to my test_speed.py
> version of the runs.
> nums = 2
> I1 time run test_speed2.py
> CPU times: user 10.85 s, sys: 0.10 s, total: 10.95 s
> Wall time: 11.19 s
> nums = 38
> I1 time run test_speed2.py
> CPU times: user 232.73 s, sys: 0.28 s, total: 233.01 s
> Wall time: 238.75 s
> However, I have my 3 GB of memory back.
> Any other suggestions?
And you might get back more memory if you didn't have to have all the data
in memory at once, but that may or may not help you. The only other
suggestion I can make is to attempt to eliminate the overhead in the inner
loop. Essentially, I would try making a single figure and a single
AxesGrid object (before the outer loop). Then go over each subplot in the
AxesGrid object and set the limits, the log scale, the ticks and the tick
locator (I wouldn't be surprised if that is eating up CPU cycles). All of
this would be done once before the loop you have right now. Then create
the PdfPages object, and loop over all of the plots you have, essentially
recycling the figure and AxesGrid object.
At the end of the outer loop, instead of closing the figure, you should call
"remove()" for each plot element you made. Essentially, as you loop over
the inner loop, save the output of the plot() call to a list, and then when
done with those plots, pop each element of that list and call "remove()" to
take it out of the subplot. This will let the subplot axes retain the
properties you set earlier.
I hope that made sense.
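
In case it helps, here is a rough sketch of the recycling idea. It is not taken from test_speed2.py: the function name, the axis limits, the demo data and the output path are all made up, and plt.subplots stands in for AxesGrid. The axes are configured once, each page's lines are collected in a list, and after saving the page each line is popped and removed so the axes keep their settings.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages


def save_pages_recycled(x, data, pdf_path, panels_per_page=16):
    """Reuse one figure for every page (hypothetical helper)."""
    # One-time setup: a 4x4 grid (stand-in for AxesGrid) with fixed
    # log scales and limits, done before the page loop.
    fig, axes = plt.subplots(4, 4, sharex=True, sharey=True)
    for ax in axes.flat:
        ax.set_xscale("log")
        ax.set_yscale("log")
        ax.set_xlim(1, 100)    # assumed limits, for illustration only
        ax.set_ylim(1e-3, 1)

    n_pages = 0
    with PdfPages(pdf_path) as pdf:
        for start in range(0, len(data), panels_per_page):
            lines = []  # artists created for this page only
            page_rows = data[start:start + panels_per_page]
            for ax, row in zip(axes.flat, page_rows):
                lines.extend(ax.plot(x, row))  # plot() returns a list
            pdf.savefig(fig)
            while lines:
                lines.pop().remove()  # take the line out of its axes
            n_pages += 1

    leftover = sum(len(ax.lines) for ax in axes.flat)
    final_xlim = tuple(axes.flat[0].get_xlim())
    plt.close(fig)
    return n_pages, leftover, final_xlim


# Made-up demo data: 32 panels of 50 points each, i.e. 2 pages.
rng = np.random.default_rng(0)
demo_x = np.arange(1, 51)
demo_data = rng.random((32, 50))
demo_path = os.path.join(tempfile.mkdtemp(), "recycled.pdf")
pages_written, leftover_lines, xlim_after = save_pages_recycled(
    demo_x, demo_data, demo_path)
```

Whether this actually recovers much time depends on how much of the per-page cost is axes setup versus rendering, so it is worth timing both variants on the real data.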