From: Kevin J. B. <kev...@bi...> - 2002-06-12 16:32:55
Kevin J. Butler wrote:

A very poorly formatted message (darn the copy & paste in Mozilla mail!). Sorry about that; here's a better formatting job:

Alex Tichtchenko wrote:
>
> Anybody has any hints on how to control memory allocator/garbage collector
> behaviour ? Consider the following bit of code:
>
> #md5 fodder to make it a sensible bit of code :))
> import md5
>
> m = md5.new()
> f = open('foo', 'rb')
> b = f.read(2048)
> while b != '':
>     m.update(b)
>     b = f.read(2048)
> print m.hexdigest()
>
> The interesting part here is 'b = f.read()' in the loop. I tried different
> approaches, but invariably this bit of code tends to run out of memory
> fairly quickly on the moderate size files, say 130Mb.

Actually, the interesting part is the m.update(b). From MD5Object.update(...):

    data += arg.toString();

The md5 object accumulates the entire string, and the digest method then calculates the md5sum of all of it at once. Memory use therefore grows with the size of the input, which is inherently unscalable. :-( It should be filed as a bug.

If you get rid of the m.update(), or use a non-collecting implementation (one that folds each chunk into a running digest state and then discards it), you shouldn't have any problem. (Except, of course, that you said the md5 stuff wasn't really your application; I assume your real code does something similar.)

> Any hints, ways to kickstart garbage collection cycles, anything -- highly
> appreciated.

The following suggests that the JVM run a garbage collection:

    from java.lang import Runtime

    rt = Runtime.getRuntime()
    rt.gc()

This won't help you here, though, because m.data holds the whole file read to date, and data that is still referenced can't be collected.

You can also examine total & free memory to see whether something is consuming memory:

    print rt.totalMemory(), rt.freeMemory()

kb
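For comparison, here is a minimal sketch of the non-collecting approach, written in Python 3 syntax against CPython's standard hashlib module rather than the Jython md5 module discussed above (hashlib is an assumption here; it is not part of the 2002-era Jython in question). Each 2048-byte chunk is folded into the running MD5 state and then dropped, so memory use stays roughly constant regardless of file size. The function name md5_of_file is hypothetical.

```python
import hashlib

CHUNK_SIZE = 2048  # same block size as the loop in the original question


def md5_of_file(path, chunk_size=CHUNK_SIZE):
    """Return the hex MD5 digest of a file, reading it chunk by chunk.

    Only the digest object's fixed-size internal state and one chunk
    are held in memory at a time; the file contents are never accumulated.
    """
    m = hashlib.md5()
    with open(path, 'rb') as f:
        while True:
            b = f.read(chunk_size)
            if not b:  # an empty bytes object signals end of file
                break
            m.update(b)  # update the running hash state; the chunk is not retained
    return m.hexdigest()
```

Calling md5_of_file('foo') should give the same result as hashing the whole file in one go, e.g. hashlib.md5(open('foo', 'rb').read()).hexdigest(), but without ever holding the entire file in memory.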