Re: [Pyparsing] Patch to fix memory leaks with Python 3.x
From: Michael D. <md...@gm...> - 2011-03-16 16:05:09
Sorry about that. I've put the patch up here:

https://gist.github.com/869225

Mike

On Mon, Mar 14, 2011 at 12:50 AM, Paul McGuire <pt...@au...> wrote:
> This sounds like some terrific work, thanks! Unfortunately I got no
> attachment on your e-mail, could you paste it to someplace publicly
> accessible, maybe pastebin.com? I've got some other changes queued up for
> a next release, but this would be great to get included.
>
> Please write back when you've got your code posted.
>
> Thanks!
> -- Paul
>
>
> -----Original Message-----
> From: Michael Droettboom [mailto:md...@gm...]
> Sent: Friday, March 11, 2011 1:11 PM
> To: pyp...@li...
> Subject: [Pyparsing] Patch to fix memory leaks with Python 3.x
>
> We are in the process of porting matplotlib to Python 3.x. Matplotlib uses
> pyparsing to parse a TeX-like mini-language for math expressions.
>
> A bunch of hard-working folks at the Cape Town PUG noticed that memory was
> leaking like crazy whenever this functionality was being used. On further
> investigation, it very confusingly turns out it was leaking stack frames, so
> even objects that never touched the pyparsing-based parser were getting
> leaked.
>
> This seems to be centered around the change in Python 3.x where exception
> objects contain a member "__traceback__" containing the full traceback of
> the exception. This means that an exception object that is referenced
> outside of an except block will create a cyclical reference with the local
> stack frame it is in. For example, in code like:
>
>     try:
>         do_something()
>     except Exception as exc:
>         my_exc = exc  # my_exc will live beyond the except block
>     return my_exc
>
> This creates a cyclical reference from my_exc -> my_exc.__traceback__ ->
> local stack frame -> my_exc.
>
> Having cyclical references means that any local variable *anywhere in the
> stack* of the thrown exception will not be freed until the garbage
> collector feels enough pressure to do so.
> When those objects include C extensions that allocate memory on the heap
> (as is the case in matplotlib), the garbage collector doesn't know enough
> about those objects to start freeing them soon enough, and memory usage
> quickly grows unmanageable.
>
> See this warning in the "porting to Python 3" guide:
>
> http://docs.python.org/py3k/howto/pyporting.html#capturing-the-currently-raised-exception
>
> See also the "Open Issue" section of PEP 3134:
> http://www.python.org/dev/peps/pep-3134/
>
> This causes a lot of headaches when storing and passing around exceptions
> for later use, as pyparsing routinely does.
>
> I have attached a patch against SVN that seems to resolve these reference
> leaks -- at least the ones that are exercised by matplotlib's math parser.
>
> The changes fall into a number of categories:
>
> 1) Remove use of sys.exc_info(). In Python 3, the exception object (that
> is, the "exc" variable of "except Exception as exc") is automatically
> deleted upon leaving the except block. The same is not true of the
> result of sys.exc_info(), and if the exception object leaves the except
> block it requires special care to avoid creating a cyclical reference with
> the frame. It's not required, but it does simplify the code a lot to just
> use "except Exception as exc" where it applies.
>
> 2) By storing the "myException" object in ParserElement objects, it was
> creating a cyclical reference between the exception object and the
> ParserElement object. This was not much of a problem in Python 2.x, but in
> Python 3.x, since exception objects pull in all the baggage from the
> traceback, the memory wastage is considerable. I fixed this case by simply
> creating exception objects when they are raised, and not maintaining a
> myException member. I don't know why the myException member existed in the
> first place (performance considerations perhaps?), so I don't know if there
> are downsides to this change.
> An alternative might be to store a weak reference to the ParserElement
> inside of the exception object -- but that creates a user-visible API
> change to the exception object.
>
> 3) When exception objects do need to exist outside of the except block, the
> traceback should be removed from the exception object, using
> "exc.__traceback__ = None". There are a few examples of this, such as
> storing exceptions in the parser cache (in _parseCache). Deleting the
> traceback basically restores the behavior of the old Python 2.x code,
> which, by using sys.exc_info(), stored only the exception and not the
> traceback payload.
>
> Thanks again for pyparsing -- it has been invaluable on our project. I
> hope this patch will benefit others making the transition to Python 3.
>
> Cheers,
> Mike
>
> --
> Michael Droettboom
> http://www.droettboom.com/

--
Michael Droettboom
http://www.droettboom.com/
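The cycle described in the message, and the `exc.__traceback__ = None` fix from category 3, can be reproduced with a short sketch. It assumes CPython, where objects are freed by reference counting as soon as nothing points to them; `gc.disable()` approximates the window before the cyclic collector runs. `do_something` comes from the message's example; `Payload`, `capture`, and `freed_without_gc` are hypothetical names used only for illustration, not pyparsing or matplotlib code.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object, e.g. a buffer owned by a C extension."""


def do_something(probe):
    payload = Payload()                 # local in the frame that raises
    probe.append(weakref.ref(payload))  # lets the caller observe when it is freed
    raise ValueError("parse failed")


def capture(probe, clear_traceback=False):
    try:
        do_something(probe)
    except Exception as exc:
        my_exc = exc  # my_exc will live beyond the except block
    if clear_traceback:
        # Drop the exc -> traceback -> frame link; the frames (and payload) can go.
        my_exc.__traceback__ = None
    return my_exc


def freed_without_gc(clear_traceback):
    """Return True if the payload is freed by reference counting alone."""
    gc.collect()
    gc.disable()  # only reference counting may free objects from here on
    try:
        probe = []
        exc = capture(probe, clear_traceback)
        del exc   # drop the caller's reference to the stored exception
        return probe[0]() is None
    finally:
        gc.enable()


print(freed_without_gc(clear_traceback=False))  # False: my_exc <-> frame cycle keeps it alive
print(freed_without_gc(clear_traceback=True))   # True: no cycle, freed immediately
```

With the traceback kept, the stored exception and `capture`'s frame reference each other, so the payload in `do_something`'s frame survives until a collection runs; with the traceback cleared, the exception still exists but no longer pins any frame.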
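Category 2 (dropping the cached `myException` member) can be sketched the same way, again assuming CPython reference counting. `CachingElement` and `FreshElement` are hypothetical classes, not pyparsing's ParserElement: the first stores one exception instance as `myException` and raises it on every failure, the second creates a new exception each time it raises, as the patch describes. `Payload`, `try_parse`, and `payload_freed` are likewise illustrative names.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object held in the caller's frame."""


class CachingElement:
    """Hypothetical element that reuses one stored exception instance."""

    def __init__(self):
        self.myException = ValueError("no match")

    def parse(self):
        raise self.myException  # traceback is attached to the long-lived instance


class FreshElement:
    """Hypothetical element that creates its exception at raise time."""

    def parse(self):
        raise ValueError("no match")


def try_parse(element, probe):
    payload = Payload()
    probe.append(weakref.ref(payload))
    try:
        element.parse()
    except ValueError:
        return False  # the failure is handled and the exception discarded here
    return True


def payload_freed(element):
    """Return True if try_parse's local payload is freed once try_parse returns."""
    gc.collect()
    gc.disable()  # only reference counting may free objects from here on
    try:
        probe = []
        try_parse(element, probe)
        return probe[0]() is None
    finally:
        gc.enable()


print(payload_freed(CachingElement()))  # False: element.myException pins the frame
print(payload_freed(FreshElement()))    # True: nothing outlives the except block
```

As long as a `CachingElement` lives, its stored exception's traceback keeps the frame of the last caller that hit it alive, locals included. Creating the exception at raise time costs an allocation per failure, which may be the performance consideration the message guesses at; the sketch does not measure that.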