On Wed, Jul 4, 2012 at 1:17 PM, Gökhan Sever <firstname.lastname@example.org> wrote:
Hello,

I am working on creating some distribution plots to analyze cloud droplet and drop features. You can see one such plot at http://atmos.uwyo.edu/~gsever/data/rf06_1second/rf06_belowcloud_SurfaceArea_1second.pdf

This file contains 38 pages and each page has 16 panels created via MPL's AxesGrid toolkit. I am using PdfPages from pdf backend profile to construct this multi-page plot. The original code that is used to create this plot is in http://code.google.com/p/ccnworks/source/browse/trunk/parcel_drizzle/rf06_moments.py

The problem I am reporting is due to the lengthier plot creation times. It takes about 4 minutes to create such plot in my laptop. To better demonstrate the issue I created a sample script which you can use to reproduce my timing results --well based on pseudo/random data points. All my data points in the original script are float64 so I use float64 in the sample script as well.

The script is at http://atmos.uwyo.edu/~gsever/data/matplotlib/test_speed.py

I also included 2 pages output running the script with nums=2 setting http://atmos.uwyo.edu/~gsever/data/matplotlib/test_speed.pdf

Comparing my original output, indeed cloud particles are not from a normal distribution :)

Joke aside, running with nums=2 for 2 pages

time run test_speed.py
CPU times: user 12.39 s, sys: 0.10 s, total: 12.49 s
Wall time: 12.84 s

when nums=38, just like my original script, then I get similar timing to my original run

time run test.py
CPU times: user 227.39 s, sys: 1.74 s, total: 229.13 s
Wall time: 234.87 s

In addition to these longer plot creation times, 38 pages plot creation consumes about 3 GB memory. I am wondering if there are tricks to improve plot creation times as well as more efficiently using the memory. Attempting to create two such distributions blocks my machine eating 6 GB of ram space.

Using Python 2.7, NumPy 2.0.0.dev-7e202a2, IPython 0.13.beta1, matplotlib 1.1.1rc on Fedora 16 (x86_64)

Thanks.

--
Looking through your code, I see that you have all of the figure objects available at once, rather than one at a time. In belowcloud_M0(), you create all of your figure objects and AxesGrid objects in list comprehensions, and then you have multiple for-loops that each perform a particular action on every one of them. Then you create your PdfPages object and loop over the figures, saving each one as a page.
I would do it quite differently. At the beginning of the function, create your PdfPages object. Then have a single loop over "range(nums)" where you create a figure object and an AxesGrid object. Do your 16 (or fewer) plots, and add any other text you need for that figure. Save it to the PdfPages object, and then close the figure object. When the loop is done, close the PdfPages object.
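
Here is a minimal sketch of that loop, not a drop-in replacement for your script. The function name write_pages, its parameters, and the random data are made up for illustration. I also used plt.subplots in place of AxesGrid to keep the example short; the AxesGrid object would be created at the same point in the loop. The Agg backend is only an assumption for running it as a batch job.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for batch runs
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages


def write_pages(filename, nums, panels=16, points=1000):
    """Write `nums` pages to a PDF, keeping only one figure alive at a time.

    Hypothetical helper: the name, parameters, and random data are
    placeholders, not taken from the original script.
    """
    rng = np.random.RandomState(0)
    pdf = PdfPages(filename)  # create the PdfPages object first
    try:
        for page in range(nums):
            # One figure per iteration (plt.subplots stands in for AxesGrid).
            fig, axes = plt.subplots(4, 4, figsize=(8.5, 11))
            for ax in axes.flat[:panels]:
                ax.plot(rng.normal(size=points).astype(np.float64))
            fig.suptitle("page %d" % (page + 1))
            pdf.savefig(fig)  # append this figure as a new page
            plt.close(fig)    # release the figure before building the next one
    finally:
        pdf.close()  # finalize the PDF once the loop is done
    # Number of figures pyplot still tracks after the loop.
    return len(plt.get_fignums())
```

Calling write_pages("test_speed.pdf", nums=38) would then build and discard the pages one by one instead of holding all 38 figures at once.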
I think you will see a huge performance improvement that way, since only one figure is held in memory at any time.