Re: [Jython-users] Memory hogs (reformatted)

Kevin J. Butler wrote:

A very poorly formatted message (Darn the copy & paste in Mozilla mail!).
Sorry about that, here's a better formatting job:

Alex Tichtchenko wrote:
 >
 > Anybody has any hints on how to control memory allocator/garbage collector
 > behaviour ? Consider the following bit of code:
 >
 > #md5 fodder to make it a sensible bit of code :))
 > import md5
 >
 > m = md5.new()
 > f = open('foo', 'rb')
 > b = f.read(2048)
 > while b != '':
 >     m.update(b)
 >     b = f.read(2048)
 > print m.hexdigest()
 >
 > The interesting part here is 'b = f.read()' in the loop. I tried different
 > approaches, but invariably this bit of code tends to run out of memory
 > fairly quickly on the moderate size files, say 130Mb.

Actually, the interesting part is the m.update( b ).

From MD5Object.update(...):

         data += arg.toString();

The md5 object appends every chunk passed to update() to a single string, and
only when the digest method is called does it calculate the md5sum of the
whole thing.

This is inherently unscalable.  :-(  It should be filed as a bug.
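
To make the problem concrete, here is a rough model of that behaviour in plain Python. AccumulatingMD5 is a hypothetical stand-in I made up for illustration, not Jython's actual class, and it assumes hashlib and bytes literals are available (Python 2.6+/3, or Jython 2.7):

```python
import hashlib

class AccumulatingMD5(object):
    """Hypothetical model of the update() quoted above, not the real class."""

    def __init__(self):
        self.data = b''

    def update(self, arg):
        # Mirrors "data += arg.toString()": every byte ever passed in is kept.
        self.data += arg

    def hexdigest(self):
        # The hash is only computed here, over the whole buffer at once.
        return hashlib.md5(self.data).hexdigest()

m = AccumulatingMD5()
chunk = b'x' * 2048
for _ in range(1000):
    m.update(chunk)

# The buffer has grown to the full size of the input.
buffered = len(m.data)
```

Reading in 2048-byte chunks doesn't help, since the object holds on to all of them anyway.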

If you get rid of the m.update(), or use a non-collecting implementation, you
shouldn't have any problem. (Except, of course, that you said the md5 stuff
wasn't really your application - I assume your real code does something
similar.)
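
For what a non-collecting version of your loop could look like, here is a minimal sketch using hashlib, which is my assumption and not something from your code. CPython's hashlib keeps only a small fixed-size state between update() calls; whether the hashlib in a given Jython release behaves the same way is worth checking. file_md5 and chunk_size are just names I picked:

```python
import hashlib

def file_md5(path, chunk_size=2048):
    """Return the hex MD5 of a file, reading it chunk_size bytes at a time."""
    digest = hashlib.md5()
    f = open(path, 'rb')
    try:
        chunk = f.read(chunk_size)
        while chunk:
            # Each chunk is folded into the running digest and can then be
            # garbage collected; nothing here keeps a reference to it.
            digest.update(chunk)
            chunk = f.read(chunk_size)
    finally:
        f.close()
    return digest.hexdigest()
```

Calling file_md5('foo') would then replace the whole loop above.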

 > Any hints, ways to kickstart garbage collection cycles, anything -- highly
 > appreciated.

     from java.lang import Runtime
     rt = Runtime.getRuntime()
     rt.gc()

suggests to the JVM that it should do a garbage collection (it is only a hint).
This won't help you here, because m.data == the whole file read to date, so
none of it is garbage.

You can also examine total and free memory (both reported in bytes) to see
whether something is consuming memory:

         print rt.totalMemory(), rt.freeMemory()

kb