I must say this one was quite hard to trace.
I can't reproduce it as a test case but man, does it leak.
Gory details:
import os
from cStringIO import StringIO

class FakeStream( object ):
    FILE_THRESHOLD = 131072

    def __init__( self, known_length = None ):
        if known_length is not None:
            self._isFull = lambda: self._stream.tell() >= known_length
        else:
            self._isFull = lambda: False
            known_length = 0
        if known_length < self.FILE_THRESHOLD:
            self._stream = StringIO()
            self.write = self._write_stringio
        else:
            self._stream = os.tmpfile()
            self.write = self._write_file
    # etc...
Now this is a fake stream (as its name implies): a class which wraps
either a cStringIO or a file depending on the data size. It mutates from
cStringIO to file when you stuff a lot of data into it. It's designed to
accumulate POST request data in a webserver.
So it has a self._isFull() method which returns True when the amount of
data accumulated equals the Content-Length header, which is parsed
somewhere else.
Now I noticed the memory usage of my program growing like crazy
under apachebench.
Then I played with the gc module and saw that each POST request
leaked a FakeStream which was referenced by a cell object.
Here's a referrer graph:
<Libs.Streams.FakeStream object at 0xb59a496c>
<type 'cell'> <cell at 0xb7d179bc: FakeStream object at 0xb59a49
<type 'list'> [<cell at 0xb7d179bc: FakeStream object at 0xb59a4
<type 'dict'> {<type 'function'>: [<function execle at 0xb7d49d8
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'dict'> {<type 'function'>: [<function execle at 0xb7d49d8
<type 'list'> [[{<type 'function'>: [<function execle at 0xb7d49
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'list'> [[{<type 'function'>: [<function execle at 0xb7d49
<type 'list'> [[<cell at 0xb7d179bc: FakeStream object at 0xb59a
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'list'> [[{<type 'function'>: [<function execle at 0xb7d49
<type 'list'> [<cell at 0xb7d179bc: FakeStream object at 0xb59a4
<type 'list'> [[<cell at 0xb7d179bc: FakeStream object at 0xb59a
<type 'list'> [[[<cell at 0xb7d179bc: FakeStream object at 0xb59
<type 'list'> [[[[<cell at 0xb7d179bc: FakeStream object at 0xb5
<type 'tuple'> (<cell at 0xb7d1777c: int object at 0x8278590>, <c
<type 'list'> [(1,), (), (<type 'object'>,), (<type 'object'>,),
<type 'dict'> {<type 'function'>: [<function execle at 0xb7d49d8
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'list'> [[(1,), (), (<type 'object'>,), (<type 'object'>,)
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'list'> [[<cell at 0xb7d179bc: FakeStream object at 0xb59a
<type 'list'> [[(1,), (), (<type 'object'>,), (<type 'object'>,)
<type 'list'> [[[(1,), (), (<type 'object'>,), (<type 'object'>,
<type 'function'> <function <lambda> at 0xb59ffdf4>
<type 'list'> [<function execle at 0xb7d49d84>, <function execlp
<type 'dict'> {<type 'function'>: [<function execle at 0xb7d49d8
<type 'list'> [[<function execle at 0xb7d49d84>, <function execl
<type 'list'> [[(1,), (), (<type 'object'>,), (<type 'object'>,)
<type 'list'> [[<function execle at 0xb7d49d84>, <function execl
<type 'dict'> {'_stream': <cStringIO.StringO object at 0xb59a49a
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'list'> [[<function execle at 0xb7d49d84>, <function execl
<class 'Libs.Streams.FakeStream'> <Libs.Streams.FakeStream object at 0xb59a496c>
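A chain like the one above can be reproduced with the gc module. Here is a minimal, runnable sketch (all names are illustrative, not from the original code) showing that the cell created by a closure turns up among an object's referrers:

```python
import gc

class Payload(object):
    """Stand-in for the leaked FakeStream instance."""
    pass

def make_checker():
    obj = Payload()
    # The lambda closes over `obj`, so CPython creates a cell for it.
    return lambda: obj is not None

checker = make_checker()
cell = checker.__closure__[0]    # the cell keeping the Payload alive
leaked = cell.cell_contents

# gc.get_referrers walks every GC-tracked object and reports those
# that reference `leaked`; the cell is among them.
print(any(r is cell for r in gc.get_referrers(leaked)))  # True
```

Walking gc.get_referrers recursively from the leaked object produces a referrer tree like the dump above.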
The lambda function with self as a closure variable is the problem. It
generates a <cell>, and it seems that Psyco can't garbage-collect it
once the FakeStream object is no longer referenced. If I remove the
lambda OR Psyco, there are no more leaks. The cell is the last reference
to the FakeStream; it is itself referenced by a lot of stuff, though.
I hope this will help you find the problem... it looks quite daunting!
Regards,
PF Caillaud
pfcaillaud@boutiquenumerique.com
Logged In: YES
user_id=587274
DAMMIT, sourceforge ate all my tabulations.
I'll repost the indented text as an attachment!
Logged In: YES
user_id=4771
Thanks a lot for this report; with some debugging time I should be able to eventually fix this leak. Some leaks have been mentioned previously, but yours is the first small example.
In the meantime, note that the cell (nested scope) variables 'self' and 'known_length' in the lambda, in addition to apparently causing the leak, would have a bad impact on the performance of Psyco even without the leak. You could use the default argument trick,
    lambda self=self, known_length=known_length: self._stream.tell() >= known_length
which seems to get rid of the leak too. (Which doesn't mean that I won't have to investigate and fix it, of course!)
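The difference between the two spellings is visible on the function objects themselves; a small illustration (hypothetical names), relying on CPython's __closure__ and __defaults__ attributes:

```python
def make_predicates(known_length):
    # Nested-scope version: referencing the enclosing variable
    # forces CPython to create a cell.
    closed = lambda: known_length > 0
    # Default-argument version: the value is copied into the
    # function's __defaults__ at definition time, so no cell is made.
    defaulted = lambda known_length=known_length: known_length > 0
    return closed, defaulted

closed, defaulted = make_predicates(131072)
print(closed.__closure__ is not None)   # True: one cell is attached
print(defaulted.__closure__ is None)    # True: no cells at all
```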
Logged In: YES
user_id=587274
Thanks for the default argument trick! I had not thought of that, but it's
nice. I'm not worried about performance though; this function is called once
every 32KB written to the stream, but I'll keep it in mind for the future.
I did the test as you said:
    lambda s=self, k=known_length: s._stream.tell() >= k
The leak is still there! Sorry ;)
With this, the leak disappears: using a plain method instead of a closure.
    if known_length is not None:
        self._known_length = known_length
        self._isFull = self._tIsFull
    ...

    def _tIsFull( self ):
        return self._stream.tell() >= self._known_length
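For completeness, a runnable reconstruction of the bound-method fix (using io.BytesIO in place of cStringIO; the file-backed branch and the write dispatch from the original class are omitted):

```python
from io import BytesIO

class FakeStream(object):
    # Sketch of the fix: a plain method plus an instance attribute
    # replaces the closure, so no cell object is ever created.
    def __init__(self, known_length=None):
        self._stream = BytesIO()
        if known_length is not None:
            self._known_length = known_length
            self._isFull = self._tIsFull    # bound method, no cell
        else:
            self._isFull = lambda: False

    def write(self, data):
        self._stream.write(data)

    def _tIsFull(self):
        return self._stream.tell() >= self._known_length

s = FakeStream(known_length=4)
s.write(b"data")
print(s._isFull())                           # True: 4 bytes written
print(s._isFull.__func__.__closure__ is None)  # True: method has no cell
```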
Regards,
PF Caillaud
Logged In: YES
user_id=587274
Also, the bound-method version should be faster, I think.
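That claim is easy to measure with timeit; a rough micro-benchmark sketch (plain CPython, without Psyco, and all names are illustrative, so take the numbers with a grain of salt):

```python
import timeit

setup = """
import io

class S(object):
    def __init__(self, limit):
        self._stream = io.BytesIO()
        self._limit = limit
        # closure-based predicate (closes over self, creating a cell)
        self.is_full_closure = lambda limit=limit: self._stream.tell() >= limit
        # bound-method predicate (no cell)
        self.is_full_method = self._is_full
    def _is_full(self):
        return self._stream.tell() >= self._limit

s = S(131072)
"""

t_closure = timeit.timeit("s.is_full_closure()", setup=setup, number=100000)
t_method = timeit.timeit("s.is_full_method()", setup=setup, number=100000)
print("closure: %.6fs  method: %.6fs" % (t_closure, t_method))
```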
Logged In: YES
user_id=4771
After some debugging, the problem is (a fact that I overlooked)
that creating functions dynamically and calling them makes Psyco
try to specialize on each function object, which leaks
the function. So creating new functions (with 'def' or
'lambda') is a sure source of leaks in the current Psyco...
Not sure what to do about it. I could detect when more than
a few different functions arrive at a given point and stop
specializing, or specialize differently, e.g. on the code
object instead.
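Specializing on the code object would work because every function created by the same 'lambda' or 'def' statement shares one code object; a quick check:

```python
def make_checker(limit):
    # Each call builds a brand-new function object...
    return lambda value: value >= limit

a = make_checker(10)
b = make_checker(20)
print(a is b)                     # False: two distinct function objects
print(a.__code__ is b.__code__)   # True: one shared code object
```

So keying specializations on `__code__` bounds their number by the amount of source code, not by how many function objects the program creates.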
What I am also not sure about is whether there are a number of
other places that expose a similar leak. I guess a complete
solution would involve reviewing all the specialization
points and adding some kind of fall-back mechanism for when
too many different values have been seen there already.
That's probably more work than I want to spend...
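The fall-back mechanism described above could look something like this. Purely illustrative Python, not Psyco's actual data structures: a per-site table that caps how many distinct values it specializes on before handing out one shared generic version:

```python
MAX_SPECIALIZATIONS = 4   # arbitrary cap for the illustration

class CallSite(object):
    """One specialization point: maps seen functions to versions."""
    def __init__(self):
        self.specialized = {}
        self.generic = None

    def version_for(self, func):
        version = self.specialized.get(func)
        if version is not None:
            return version
        if len(self.specialized) < MAX_SPECIALIZATIONS:
            # Still cheap: specialize on this exact function object.
            version = "specialized:%s" % func.__name__
            self.specialized[func] = version
            return version
        # Too many distinct functions seen here: fall back to a single
        # generic version instead of keeping one copy per function.
        if self.generic is None:
            self.generic = "generic"
        return self.generic

def make(i):
    return lambda x: x + i

site = CallSite()
funcs = [make(i) for i in range(10)]   # 10 distinct function objects
for f in funcs:
    site.version_for(f)
print(len(site.specialized))   # 4: capped at MAX_SPECIALIZATIONS
print(site.generic)            # generic: later functions share one version
```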
Logged In: YES
user_id=4771
This might be solved in the Subversion repository HEAD.