Thread: [Pyparsing] Patch to fix memory leaks with Python 3.x
Brought to you by:
ptmcg
From: Michael D. <md...@gm...> - 2011-03-11 19:20:13
We are in the process of porting matplotlib to Python 3.x. Matplotlib uses pyparsing to parse a TeX-like mini-language for math expressions.

A bunch of hard-working folks at the Cape Town PUG noticed that memory was leaking like crazy whenever this functionality was being used. On further investigation, it turned out, very confusingly, that stack frames were being leaked, so even objects that never touched the pyparsing-based parser were getting leaked.

This seems to be centered on the change in Python 3.x whereby exception objects have a "__traceback__" attribute holding the full traceback of the exception. This means that an exception object that is referenced outside of an except block creates a cyclical reference with the local stack frame it lives in. For example, in code like:

    def get_exception():  # hypothetical wrapper so that "return" is legal
        try:
            do_something()
        except Exception as exc:
            my_exc = exc  # my_exc will live beyond the except block
        return my_exc

this creates a cyclical reference: my_exc -> my_exc.__traceback__ -> local stack frame -> my_exc.

Having cyclical references means that any local variable *anywhere in the stack* of the thrown exception will not be freed until the garbage collector feels enough pressure to do so. When those objects include C extensions that allocate memory on the heap (as is the case in matplotlib), the garbage collector, which schedules collections by counting object allocations rather than bytes, has no idea how much memory those objects hold. It therefore doesn't start freeing them soon enough, and memory usage quickly grows unmanageable.

See this warning in the "porting to Python 3" guide:
http://docs.python.org/py3k/howto/pyporting.html#capturing-the-currently-raised-exception

See also the "Open Issue" section of PEP 3134:
http://www.python.org/dev/peps/pep-3134/

This causes a lot of headaches when storing and passing around exceptions for later use, as pyparsing does routinely.

I have attached a patch against SVN that seems to resolve these reference leaks -- at least the ones that are exercised by matplotlib's math parser.
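Below is a minimal sketch of the cycle described above. It assumes CPython, with reference counting plus the cyclic garbage collector. `do_something` comes from the example, while `Payload` and `capture` are hypothetical names used only for illustration. A weak reference to a local `Payload` shows whether the frame that held it has been freed. With the traceback kept, the object survives until `gc.collect()` runs. With `__traceback__` cleared, it is freed as soon as the function returns.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object, e.g. one holding C-heap memory."""


def do_something():
    raise ValueError("something went wrong")


def capture(clear_traceback):
    """Hypothetical helper: return a caught exception plus a probe on a local."""
    payload = Payload()            # never touches the exception directly
    probe = weakref.ref(payload)   # goes dead once payload is freed
    try:
        do_something()
    except Exception as exc:
        my_exc = exc               # my_exc lives beyond the except block
    if clear_traceback:
        my_exc.__traceback__ = None  # break my_exc -> traceback -> frame
    return my_exc, probe


gc.disable()  # keep automatic collections from hiding the effect
try:
    exc, probe = capture(clear_traceback=False)
    del exc
    print(probe() is None)   # False: the cycle still pins capture's frame
    gc.collect()
    print(probe() is None)   # True: only the cyclic collector frees it

    exc, probe = capture(clear_traceback=True)
    print(probe() is None)   # True: freed by reference counting alone
finally:
    gc.enable()
```

The cycle runs through the frame's locals, not through `payload` itself. That is why an object that never touches the exception is still kept alive.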
The changes fall into a number of categories:

1) Remove use of sys.exc_info(). In Python 3, the exception object (that is, the "exc" variable of "except Exception as exc") is automatically dereferenced upon leaving the except block. The same is not true of the result of sys.exc_info(), and if the exception object leaves the except block, it requires special care to avoid creating a cyclical reference with the frame. It's not required, but it does simplify the code a lot to simply use "except Exception as exc" where it applies.

2) Storing the "myException" object in ParserElement objects created a cyclical reference between the exception object and the ParserElement object. This was not much of a problem in Python 2.x, but in Python 3.x, since exception objects pull in all the baggage from the traceback, the memory wastage is considerable. I fixed this case by simply creating exception objects when they are raised, and not maintaining a myException member. I don't know why the myException member existed in the first place (performance considerations, perhaps?), so I don't know if there are downsides to this change. An alternative might be to store a weak reference to the ParserElement inside the exception object -- but that creates a user-visible API change to the exception object.

3) When exception objects do need to exist outside of the except block, the traceback should be removed from the exception object using "exc.__traceback__ = None". There are a few examples of this, such as storing exceptions in the parser cache (in _parseCache). Deleting the traceback basically restores the behavior of the old Python 2.x code, which, by using sys.exc_info(), stored only the exception and not the traceback payload.

Thanks again for pyparsing -- it has been invaluable on our project. I hope this patch will benefit others making the transition to Python 3.

Cheers,
Mike

-- 
Michael Droettboom
http://www.droettboom.com/
From: Paul M. <pt...@au...> - 2011-03-14 04:50:43
This sounds like some terrific work, thanks! Unfortunately I got no attachment on your e-mail. Could you paste it someplace publicly accessible, maybe pastebin.com? I've got some other changes queued up for the next release, but this would be great to get included.

Please write back when you've got your code posted.

Thanks!
-- Paul
From: Michael D. <md...@gm...> - 2011-03-16 16:05:09
Sorry about that. I've put the patch up here: https://gist.github.com/869225

Mike

-- 
Michael Droettboom
http://www.droettboom.com/