I suppose I'm a bit confused -- I thought that jpeglib, part of which
is implemented by PIL (??) could process compressed images without
decompressing them to a dense raster-image matrix.
That said, I tried to do some PIL things, and as soon as I converted
an image (or something similar) the memory taken up suggested that the
image was represented completely and uncompressed (memory was more or
less evenly split between virtual and real memory).
So, I guess what remains are the problems with iPython. My
MATLAB-loving friend has stuck his nose up because of the memory-leaky
interactive prompt, claiming that MATLAB has no such problems ...
Thanks for your help, in any case.
On Mon, Oct 5, 2009 at 5:47 AM, Michael Droettboom <mdroe@...> wrote:
> For some reason, my earlier reply didn't seem to make it to the mailing
> list. Here it is in its entirety:
> If you assign each figure to a new number, it will keep all of those figures
> around in memory (because pyplot thinks you may want to use it again.) The
> best route is to call close('all') or close(fig) with each loop iteration.
> 40MB per image doesn't sound way out of reason to me. How big are your
> On 10/05/2009 03:46 AM, Leo Trottier wrote:
> I think I've figured out what's going on. It's a combination of things:
> 1) iPython is ignorant of the problems associated with caching massive data
> 2) iPython doesn't seem to have a good way to clear data from memory
> reliably (https://bugs.launchpad.net/ipython/+bug/412350)
> iPython is designed for interactive use, and stores a lot of values so they
> can be conveniently reused later. For long running "batch" scripts, you can
> use "regular" Python, or run the code in iPython such that it isn't
> displayed at the console (by using "import" or "%run"). Bug 2) may help, but it
> looks like it would still require some manual intervention to be useful.
> You're still using a tool designed for fine-grained interactive use (eg. a
> pen) where one designed for automation may be more appropriate (eg. a laser
> printer) :)
> 3) matplotlib/Python seems to be insufficiently aggressive in its garbage
> collection (??)
> Is that still true after forcibly closing the figures on each loop iteration
> as I suggested? Many hours have been spent squashing memory leaks in
> matplotlib, and I am not aware of any in at least 0.98 and later (other than
> some unavoidable small leaks in certain GUI backends). Do you have a
> standalone example that illustrates this on a recent version of matplotlib?
> 4) For obvious reasons, JPGs are much bigger when stored as arrays (though
> they still seem to take up more memory than they should)
> It's pretty easy to estimate the memory requirements for an image. If the
> image is true-color (by this, I mean not color-mapped), you'll need
> 4-bytes-per-pixel for the original image, plus a cached scaled copy (the
> size of which depends on the output dpi), again with 4 bytes per pixel. For
> color-mapped images, you'll have 4-byte floats for each pixel, 4-byte rgba
> for the color-mapped image, and again a cached scaled copy of that. Not
> knowing the size of your input images, it's impossible to say if 40MB per
> image is way too big or not, but it's not unheard of by any means.
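The estimate described above can be sketched as a small function. This is only a rough illustration of the arithmetic in the reply: the function name, the 3000x2000 input size, and the 800x600 scaled size are hypothetical, since the thread never states the actual image dimensions or output dpi.

```python
def estimate_image_bytes(width, height, scaled_width, scaled_height,
                         color_mapped=False):
    """Rough per-image memory estimate, following the breakdown above.

    All sizes are in pixels. The scaled size depends on the output dpi,
    so it is an input here rather than something this sketch derives.
    """
    bytes_per_pixel = 4
    # True-color: 4 bytes per pixel. Color-mapped: 4-byte floats per pixel.
    original = width * height * bytes_per_pixel
    # Cached scaled copy, again at 4 bytes per pixel.
    scaled_cache = scaled_width * scaled_height * bytes_per_pixel
    if color_mapped:
        # The float data is also converted to a 4-byte RGBA image.
        rgba = width * height * bytes_per_pixel
        return original + rgba + scaled_cache
    return original + scaled_cache


# Hypothetical sizes: a 3000x2000 photo drawn into an 800x600 pixel area.
true_color = estimate_image_bytes(3000, 2000, 800, 600)
mapped = estimate_image_bytes(3000, 2000, 800, 600, color_mapped=True)
print(true_color / 1e6, "MB")  # 25.92 MB
print(mapped / 1e6, "MB")      # 49.92 MB
```

Under these assumed sizes, a per-image figure in the tens of megabytes is what the arithmetic predicts, before counting any overhead from the figure and GUI backend.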
> Problems 1-3 seem problematic enough that they will get fixed eventually.
> ... but (4) is a design issue. Assuming it's possible, it looks like there
> could be benefits to making an array-like wrapper around PIL image objects
> (perhaps similar in principle to a sparse matrix). Given PIL.ImageMath,
> ImagePath, etc., it seems actually fairly doable. Wouldn't something like
> this be of major benefit to people using SciPy for anything image-related?
> Are you suggesting decompressing the JPEG on-the-fly with each redraw? I'm
> not certain that would be fast enough for interactive use. It may be worth
> experimenting with, but it would require a lot of changes to how matplotlib
> works. It's also very tricky to get right -- I'm not aware of any image
> processing applications that don't ultimately store a dense matrix of
> uncompressed image data in memory, except for something like compressed
> OpenGL textures on a graphics card. PIL certainly doesn't retain the
> compressed JPEG in memory. So, I'm not sure the cost/benefit tradeoff is
> right here -- the problems it solves can be solved much more easily without
> sacrificing speed in other ways. That is, if the image data is simply too
> large, it can be scaled before feeding it to imshow(). And generating
> multiple figures in batch is not a problem if the figure is explicitly
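The suggestion above to scale the image before passing it to imshow() might look like the following. This is a minimal sketch using plain NumPy striding; the helper name and the factor of 4 are hypothetical, and keeping every n-th pixel does no anti-aliasing, so a proper resampling routine may give better-looking results.

```python
import numpy as np


def downsample(im, factor):
    """Keep every `factor`-th pixel along rows and columns.

    The result is copied so that it no longer references the full-size
    array, which can then be garbage-collected once it goes out of scope.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return im[::factor, ::factor].copy()


# Stand-in for a decoded 3000x2000 RGB photo (hypothetical size).
im = np.zeros((2000, 3000, 3), dtype=np.uint8)
small = downsample(im, 4)
print(im.nbytes, small.nbytes)  # 18000000 1125000
```

The copy matters here: a bare slice is only a view and would keep the full-size array alive for as long as the small one is in use.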
> Hope this helps. I would like to get to the bottom of any memory leaks, so
> if you can provide a standalone script that leaks, despite calling
> close(fig) in each iteration, please let me know.
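For reference, closing each figure inside the loop could be sketched as below. This uses the pyplot-level close(fig) call, since Figure objects have no close() method of their own. The Agg backend, the function name, and the synthetic input arrays are assumptions made so the sketch runs without a display or the original JPG files.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for batch use
import matplotlib.pyplot as plt
import numpy as np


def render_batch(images):
    """Draw each image in its own figure, closing the figure afterwards.

    Returns the number of figures pyplot still tracks when the loop ends.
    """
    for im in images:
        fig = plt.figure()
        plt.imshow(im)
        fig.canvas.draw()   # force a render, as displaying or saving would
        plt.close(fig)      # drop pyplot's reference to this figure
    return len(plt.get_fignums())


# Synthetic stand-ins for the images read with imread() in the report.
images = [np.zeros((20, 30, 3)) for _ in range(10)]
remaining = render_batch(images)
print(remaining)
```

If memory still grows with a loop shaped like this, that would be the kind of standalone example requested above.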
> On Fri, Oct 2, 2009 at 7:45 AM, Michael Droettboom <mdroe@...> wrote:
>> If you assign each figure to a new number, it will keep all of those
>> figures around in memory (because pyplot thinks you may want to use it
>> again.) The best route is to call close('all') or close(fig) with each
>> loop iteration.
>> 40MB per image doesn't sound way out of reason to me. How big are your
>> On 10/01/2009 10:25 PM, Leo Trottier wrote:
>> I have a friend who's having strange memory issues when opening and
>> displaying images (using Matplotlib).
>> Here's what he says:
>> pylab seems really inefficient: Opening a few images and displaying them
>> eats up tons of memory, and the memory doesn't get freed.
>> Starting python, and run
>> In : from glob import *;
>> In : from pylab import *
>> python has 33MB of memory.
>> In : i = 1
>> In : for imname in glob("*.JPG"):
>> ...:     im = imread(imname)
>> ...:     figure(i); i = i+1
>> ...:     imshow(im)
>> This opens 10 figures and displays them. Python takes 480MB of memory.
>> This is crazy, for 10 images -- 40+MB of memory for each!
>> In : close("all")
>> In : i = 1
>> In : for imname in glob("*.JPG"):
>> ...:     im = imread(imname)
>> ...:     figure(i); i = i+1
>> This closes all figures and opens them again. Python takes up 837MB of
>> and so on... Something is really wrong with memory management.
>> ##### System info: ##############
>> (using macosx backend)
>> 2.4GHz MacBook Pro Intel Core 2 Duo
>> 4GB 667MHz DDR2 SDRAM
>> In : sys.version
>> Out: '2.6.2 (r262:71600, Oct 1 2009, 16:44:23) \n[GCC 4.2.1 (Apple
>> Inc. build 5646)]'
>> In : numpy.__version__
>> Out: '1.3.0'
>> In : matplotlib.__version__
>> Out: '0.99.1.1'
>> In : scipy.__version__
>> Out: '0.7.1'
>> In :