
#55 Memory Leak

Status: open
Owner: nobody
Priority: 6
Updated: 2014-08-16
Created: 2005-05-09
Creator: Peufeu
Private: No

I must say this one was quite hard to trace.
I can't reproduce it in a standalone test case, but man, does it leak.

Gory details:

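import os
from cStringIO import StringIO

class FakeStream( object ):
    FILE_THRESHOLD = 131072

    def __init__( self, known_length = None ):
        if known_length is not None:
            # This lambda closes over 'self' and 'known_length' (cell
            # variables); it is the closure reported to leak under Psyco.
            self._isFull = lambda: self._stream.tell() >= known_length
        else:
            self._isFull = lambda: False
            known_length = 0

        # Small bodies are buffered in memory, large ones in a temp file.
        if known_length < self.FILE_THRESHOLD:
            self._stream = StringIO()
            self.write = self._write_stringio
        else:
            self._stream = os.tmpfile()
            self.write = self._write_file

    # etc...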

FakeStream is, as its name implies, a fake stream class that wraps
either a cStringIO buffer or a temporary file, depending on the expected
data size. It switches from cStringIO to a file when you put a lot of
data into it. It's designed to accumulate POST request data in a web server.
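The memory-then-file switch described above can be sketched in Python 3 as follows. This is a hedged illustration, not the reporter's actual FakeStream (whose write methods are elided): the `SpoolingBuffer` class, its `threshold` parameter and its rollover rule are hypothetical, and `io.BytesIO` and `tempfile.TemporaryFile` stand in for the Python 2 `cStringIO` and `os.tmpfile()`.

```python
import io
import tempfile


class SpoolingBuffer:
    """Hypothetical sketch: buffer bytes in memory, spill to a temp file past a threshold."""

    def __init__(self, threshold=131072):
        self.threshold = threshold
        self._stream = io.BytesIO()
        self.in_memory = True

    def write(self, data):
        # Roll over to a real file once the buffered size would exceed the threshold.
        if self.in_memory and self._stream.tell() + len(data) > self.threshold:
            spill = tempfile.TemporaryFile()
            spill.write(self._stream.getvalue())
            self._stream = spill
            self.in_memory = False
        self._stream.write(data)

    def tell(self):
        return self._stream.tell()

    def getvalue(self):
        # Read everything back, wherever it is currently stored.
        pos = self._stream.tell()
        self._stream.seek(0)
        data = self._stream.read()
        self._stream.seek(pos)
        return data

    def close(self):
        self._stream.close()


buf = SpoolingBuffer(threshold=16)
buf.write(b"small")
print(buf.in_memory)        # True
buf.write(b"x" * 32)
print(buf.in_memory)        # False
print(len(buf.getvalue()))  # 37
buf.close()
```

Modern Python also ships this pattern ready-made as `tempfile.SpooledTemporaryFile`, which keeps data in memory until it exceeds `max_size` and then moves it to a real file.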

So it has a self._isFull() method which returns True once the amount of
data accumulated reaches the Content-Length header value, which is parsed
somewhere else.
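A minimal sketch of the accumulation loop that such an `_isFull()` test drives, assuming the body is read from a file-like object in 32 KB chunks (the write size the reporter mentions later in the thread). `read_post_body` and its parameters are hypothetical names, and the Content-Length value is assumed to have been parsed already.

```python
import io


def read_post_body(source, content_length, chunk_size=32 * 1024):
    """Hypothetical helper: copy exactly content_length bytes from source."""
    body = io.BytesIO()
    # Same condition as FakeStream._isFull(): stop once the buffer holds
    # Content-Length bytes.
    while body.tell() < content_length:
        # Never read past the end of this request's body.
        chunk = source.read(min(chunk_size, content_length - body.tell()))
        if not chunk:
            raise EOFError("client sent fewer bytes than Content-Length")
        body.write(chunk)
    return body.getvalue()


# Usage: only the first Content-Length bytes are consumed.
wire = io.BytesIO(b"name=value&x=1EXTRA")
print(read_post_body(wire, 14))  # b'name=value&x=1'
```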

Then I noticed the memory usage of my program growing like crazy
under ApacheBench.

Then I played with the gc module and saw that each POST request
leaked a FakeStream that was referenced by a cell object:

Here's a referrer graph:

<Libs.Streams.FakeStream object at 0xb59a496c>
<type 'cell'> <cell at 0xb7d179bc: FakeStream object at 0xb59a49
<type 'list'> [<cell at 0xb7d179bc: FakeStream object at 0xb59a4
<type 'dict'> {<type 'function'>: [<function execle at 0xb7d49d8
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'dict'> {<type 'function'>: [<function execle at 0xb7d49d8
<type 'list'> [[{<type 'function'>: [<function execle at 0xb7d49
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'list'> [[{<type 'function'>: [<function execle at 0xb7d49
<type 'list'> [[<cell at 0xb7d179bc: FakeStream object at 0xb59a
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'list'> [[{<type 'function'>: [<function execle at 0xb7d49
<type 'list'> [<cell at 0xb7d179bc: FakeStream object at 0xb59a4
<type 'list'> [[<cell at 0xb7d179bc: FakeStream object at 0xb59a
<type 'list'> [[[<cell at 0xb7d179bc: FakeStream object at 0xb59
<type 'list'> [[[[<cell at 0xb7d179bc: FakeStream object at 0xb5
<type 'tuple'> (<cell at 0xb7d1777c: int object at 0x8278590>, <c
<type 'list'> [(1,), (), (<type 'object'>,), (<type 'object'>,),
<type 'dict'> {<type 'function'>: [<function execle at 0xb7d49d8
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'list'> [[(1,), (), (<type 'object'>,), (<type 'object'>,)
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'list'> [[<cell at 0xb7d179bc: FakeStream object at 0xb59a
<type 'list'> [[(1,), (), (<type 'object'>,), (<type 'object'>,)
<type 'list'> [[[(1,), (), (<type 'object'>,), (<type 'object'>,
<type 'function'> <function <lambda> at 0xb59ffdf4>
<type 'list'> [<function execle at 0xb7d49d84>, <function execlp
<type 'dict'> {<type 'function'>: [<function execle at 0xb7d49d8
<type 'list'> [[<function execle at 0xb7d49d84>, <function execl
<type 'list'> [[(1,), (), (<type 'object'>,), (<type 'object'>,)
<type 'list'> [[<function execle at 0xb7d49d84>, <function execl
<type 'dict'> {'_stream': <cStringIO.StringO object at 0xb59a49a
<type 'list'> [{<type 'function'>: [<function execle at 0xb7d49d
<type 'list'> [[<function execle at 0xb7d49d84>, <function execl
<class 'Libs.Streams.FakeStream'> <Libs.Streams.FakeStream object at 0xb59a496c>
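The reporter did not post the script that produced this listing. A similar, hypothetical helper can be written with the standard `gc` module: `gc.get_referrers()` returns the objects that refer to a given object, and walking it a few levels up gives a tree of this shape. Stack frames are skipped because the helper's own frames always refer to the object being inspected; the helper and class names are assumptions for illustration.

```python
import gc
import inspect


def referrers_of(obj):
    """Objects, other than stack frames, that currently refer to obj."""
    return [r for r in gc.get_referrers(obj) if not inspect.isframe(r)]


def show_referrers(obj, depth=2, width=60, indent=0):
    """Print referrers of obj (type name + truncated repr), depth levels up.

    Temporary lists created by this helper can themselves appear at deeper
    levels, so expect some noise in the output.
    """
    if depth <= 0:
        return
    for ref in referrers_of(obj):
        print(" " * indent + type(ref).__name__, repr(ref)[:width])
        show_referrers(ref, depth - 1, width, indent + 4)


class Stream:
    def __init__(self):
        # The lambda captures self in a closure cell.
        self.is_full = lambda: self is None


s = Stream()
show_referrers(s, depth=2)
```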

The problem is the lambda, which captures self as a closure variable. This
creates a <cell>, and it seems that Psyco can't garbage-collect it once
the FakeStream object is no longer referenced. If I remove either the
lambda or Psyco, the leak goes away. The cell holds the last reference to
the FakeStream, and the cell itself is referenced by a lot of other objects.
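The cycle described here (instance, lambda attribute, closure cell, back to the instance) can be reproduced in a few lines of Python 3. The sketch uses a stripped-down, hypothetical `Holder` class rather than the real FakeStream. It shows that a cell in the lambda's `__closure__` holds `self`, and that plain CPython's cycle collector frees the object once it is unreachable, which is consistent with the report that the leak disappears without Psyco.

```python
import gc
import io
import weakref


class Holder:
    """Stripped-down, hypothetical stand-in for FakeStream."""

    def __init__(self, known_length):
        self._stream = io.BytesIO()
        # The lambda uses the free variables `self` and `known_length`,
        # so each is stored in a cell object in its __closure__.
        self._isFull = lambda: self._stream.tell() >= known_length


h = Holder(10)
cells = h._isFull.__closure__
print(any(c.cell_contents is h for c in cells))  # True: a cell holds the instance

# h -> h._isFull (the lambda) -> cell -> h is a reference cycle, so once the
# last outside reference is gone only the cyclic collector can free it.
ref = weakref.ref(h)
del h, cells
gc.collect()
print(ref() is None)  # True in plain CPython (no Psyco involved)
```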

I hope this will help you find the problem... it looks quite daunting!

Regards,

PF Caillaud
pfcaillaud@boutiquenumerique.com

Discussion

  • Peufeu

    Peufeu - 2005-05-09
    • priority: 5 --> 6
     
  • Peufeu

    Peufeu - 2005-05-09

    Logged In: YES
    user_id=587274

    DAMMIT, SourceForge ate all my tabs.
    I'll repost the indented text as an attachment!

     
  • Peufeu

    Peufeu - 2005-05-09
     
  • Armin Rigo

    Armin Rigo - 2005-05-10

    Logged In: YES
    user_id=4771

    Thanks a lot for this report; with some debugging time I should eventually be able to fix this leak now. Some leaks have been mentioned previously, but you've given the first small example.

    In the meantime, note that the cell (nested scope) variables 'self' and 'known_length' in the lambda, in addition to apparently causing the leak, would hurt Psyco's performance even without the leak. You could use the default argument trick,

    lambda self=self, known_length=known_length: self._stream.tell() >= known_length

    which seems to get rid of the leak too. (Which doesn't mean that I won't have to investigate and fix it, of course!)
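The default-argument trick binds the values when the lambda is created, so the function has no closure cells at all; the values live in `__defaults__` instead. A small Python 3 check, using a hypothetical `Holder` class. Note that each instance still gets its own new function object, and the defaults tuple still refers back to the instance; as the next comment reports, this variant did not remove the Psyco leak.

```python
import io


class Holder:
    """Hypothetical stand-in using the default-argument trick."""

    def __init__(self, known_length):
        self._stream = io.BytesIO()
        # Default arguments are evaluated once, when the lambda is created,
        # so the lambda needs no closure cells.
        self._isFull = lambda self=self, known_length=known_length: (
            self._stream.tell() >= known_length
        )


h = Holder(4)
print(h._isFull.__closure__)      # None
print(h._isFull.__defaults__[1])  # 4
h._stream.write(b"data")
print(h._isFull())                # True
```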

     
  • Peufeu

    Peufeu - 2005-05-10

    Logged In: YES
    user_id=587274

    Thanks for the default argument trick! I hadn't thought of that, but it's
    nice. I'm not worried about performance, though: this function is called once
    per 32 KB write to the stream. I'll keep it in mind for the future.

    I tried it as you suggested:
    lambda s=self, k=known_length: s._stream.tell() >= k

    The leak is still there! Sorry ;)

    With this, the leak disappears: using a plain method instead of a closure:

    if known_length is not None:
        self._known_length = known_length
        self._isFull = self._tIsFull
    ...

    def _tIsFull( self ):
        return self._stream.tell() >= self._known_length
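The two versions can be compared in Python 3 (both classes are hypothetical stand-ins). The bound method still refers back to the instance through `__self__`, so storing it in an attribute is still a reference cycle; what changes is that every instance's `_isFull` wraps the same underlying function, whereas the lambda version creates a fresh function object per instance.

```python
import io


class MethodStream:
    """Hypothetical stand-in for the bound-method version."""

    def __init__(self, known_length):
        self._stream = io.BytesIO()
        self._known_length = known_length
        # Store a bound method; its __func__ is the class-level function.
        self._isFull = self._tIsFull

    def _tIsFull(self):
        return self._stream.tell() >= self._known_length


class LambdaStream:
    """Hypothetical stand-in for the original closure version."""

    def __init__(self, known_length):
        self._stream = io.BytesIO()
        # Evaluating the lambda builds a new function object every time.
        self._isFull = lambda: self._stream.tell() >= known_length


a, b = MethodStream(1), MethodStream(2)
print(a._isFull.__func__ is b._isFull.__func__)  # True: one shared function

c, d = LambdaStream(1), LambdaStream(2)
print(c._isFull is d._isFull)  # False: one new function per instance
```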

    Regards,
    PF Caillaud

     
  • Peufeu

    Peufeu - 2005-05-10

    Logged In: YES
    user_id=587274

    Also, the bound-method approach should be faster, I think.

     
  • Armin Rigo

    Armin Rigo - 2005-06-04

    Logged In: YES
    user_id=4771

    After some debugging, the problem is that -- a fact that
    I overlooked -- creating functions dynamically and calling
    them will try to specialize for each function, which leaks
    the function. So creating new functions (with 'def' or
    'lambda') is a sure source of leaks in the current Psyco...

    Not sure what to do about it. I could detect if more than
    a few different functions arrive at a given point and stop
    specializing, or specialize differently -- e.g. on the code
    object instead.
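The distinction between specializing on the function and on the code object is visible from plain Python: each evaluation of a `lambda` (or `def`) builds a new function object, but all of them share one code object. In this hypothetical Python 3 sketch, two dictionaries stand in for a specialization cache (Psyco's real machinery is far more involved). A cache keyed on the code object must still treat closure values such as `limit` as run-time data, since they differ between the functions.

```python
def make_predicate(limit):
    # Each call evaluates the lambda expression again and returns a new
    # function object; the compiled body (the code object) is shared.
    return lambda n: n >= limit


funcs = [make_predicate(i) for i in range(1000)]

# Keyed on the function object, the cache grows by one entry per
# make_predicate() call; keyed on the code object it stays at one entry.
by_function = {}
by_code = {}
for f in funcs:
    by_function.setdefault(f, "specialized")
    by_code.setdefault(f.__code__, "specialized")

print(len(by_function))  # 1000
print(len(by_code))      # 1
```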

    I'm also not sure whether there are other places that
    expose a similar leak. I guess a complete
    solution would involve reviewing all the specialization
    points and adding some kind of fall-back mechanism for when
    too many different values have been seen there already.
    That's probably more work than I want to spend...
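The fall-back idea above (stop specializing once too many distinct values have been seen at one point) can be illustrated with a small bounded cache. This is a hypothetical Python 3 sketch of the general technique, not Psyco's code: `SpecializationPoint`, `max_variants`, and the `specialize`/`generic` callables are invented for the example. Clearing the cache on fallback also stops it from keeping the seen keys alive, which is the leak being discussed.

```python
class SpecializationPoint:
    """Hypothetical sketch: specialize per key, fall back after max_variants keys."""

    def __init__(self, specialize, generic, max_variants=4):
        self.specialize = specialize      # key -> callable tuned for that key
        self.generic = generic            # (key, *args) -> result, no caching
        self.max_variants = max_variants
        self.variants = {}                # key -> specialized callable
        self.megamorphic = False          # True once we stopped specializing

    def call(self, key, *args):
        if not self.megamorphic:
            version = self.variants.get(key)
            if version is None and len(self.variants) < self.max_variants:
                version = self.specialize(key)
                self.variants[key] = version
            elif version is None:
                # Too many distinct keys seen here: stop specializing and
                # drop the cache so it no longer keeps those keys alive.
                self.variants.clear()
                self.megamorphic = True
            if version is not None:
                return version(*args)
        return self.generic(key, *args)


point = SpecializationPoint(
    specialize=lambda k: (lambda x: x * k),  # "compiled" version for one key
    generic=lambda k, x: x * k,              # general path, works for any key
    max_variants=2,
)
print([point.call(k, 10) for k in (2, 3, 2, 4, 5)])  # [20, 30, 20, 40, 50]
print(point.megamorphic, len(point.variants))         # True 0
```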

     
  • Armin Rigo

    Armin Rigo - 2006-04-17

    Logged In: YES
    user_id=4771

    This might be solved in the Subversion repository HEAD.

     
