Python parsing module / Discussion / Help/Open Discussion: _trim_arity hides user exceptions

Michael Cohen - 2016-01-17

I received the following strange error:

        value = self._parseNoCache( instring, loc, doActions, callPreParse )
      File "/home/scudette/Dev/local/lib/python2.7/site-packages/pyparsing.py", line 1022, in _parseNoCache
        tokens = fn( instring, tokensStart, retTokens )
      File "/home/scudette/Dev/local/lib/python2.7/site-packages/pyparsing.py", line 770, in wrapper
        ret = func(*args[limit[0]:])
    TypeError: _make_attribute() takes exactly 2 arguments (1 given)

Then I looked at the pyparsing source and was a little nauseated to see the _trim_arity function:

    'decorator to trim function calls to match the arity of the target'
    def _trim_arity(func, maxargs=2):
        if func in singleArgBuiltins:
            return lambda s,l,t: func(t)
        limit = [0]
        foundArity = [False]
        def wrapper(*args):
            while 1:
                try:
                    ret = func(*args[limit[0]:])
                    foundArity[0] = True
                    return ret
                except TypeError:
                    if limit[0] <= maxargs and not foundArity[0]:
                        limit[0] += 1
                        continue
                    raise
        return wrapper

This code is suboptimal because:
1) It assumes that a TypeError means the function was called with the wrong number of parameters. In fact, if a TypeError propagates from within the function, the wrapper just retries the call with fewer arguments, which leads to a very confusing error message for the user (especially if the function is a lambda, which doesn't even have a name).

2) It does this on every single call, which is unnecessary since the function prototype is not going to change at runtime!
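Point 1 can be reproduced outside pyparsing. The sketch below is a Python 3 copy of the retry loop quoted above, with the singleArgBuiltins shortcut left out; the names `trim_arity_retry` and `bad_action` are illustrative only and are not part of pyparsing.

```python
def trim_arity_retry(func, maxargs=2):
    """Standalone copy of the retry loop (no singleArgBuiltins check)."""
    limit = [0]
    found_arity = [False]

    def wrapper(*args):
        while True:
            try:
                ret = func(*args[limit[0]:])
                found_arity[0] = True
                return ret
            except TypeError:
                # Any TypeError is treated as "wrong number of arguments",
                # including one raised by the body of func.
                if limit[0] <= maxargs and not found_arity[0]:
                    limit[0] += 1
                    continue
                raise
    return wrapper


# Hypothetical parse action with a genuine bug: str + int raises TypeError.
bad_action = trim_arity_retry(lambda t: t[0] + 1)

try:
    bad_action("aaa", 0, ["aaa"])
except TypeError as exc:
    masked_message = str(exc)
    # The message complains about a missing argument, not about str + int.
    print(masked_message)
```

The real error is swallowed on the one-argument attempt, and the wrapper then retries with zero arguments, so the exception that finally escapes is the arity complaint from that last attempt.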

It would be much better to replace it with the following code:

    def _trim_arity(func, maxargs=2):
        func_args = inspect.getargspec(func).args
        if func_args[0] == "self":
            func_args.pop(0)

        if len(func_args) == 1:
            return lambda s, l, t: func(t)
        elif len(func_args) == 2:
            return lambda s, l, t: func(l, t)
        elif len(func_args) == 3:
            return func

This looks at the function's argument list at definition time and dispatches the correct wrapper for each case. It will have minimal impact at runtime and, more importantly, will not disturb the Python traceback, so the user can see important error messages about their callbacks:

      File "/home/scudette/Dev/local/lib/python2.7/site-packages/pyparsing.py", line 1022, in _parseNoCache
        tokens = fn( instring, tokensStart, retTokens )
      File "/home/scudette/rekall/tools/layout_expert/layout_expert/lib/parsers.py", line 17, in <lambda>
        return lambda s, l, t: func(t)
      File "/home/scudette/rekall/tools/layout_expert/layout_expert/parser/parser.py", line 430, in _make_attribute
        *expression))
    TypeError: type object argument after * must be a sequence, not CNumber
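For readers on current Python, where `inspect.getargspec` has been removed (as of Python 3.11), the same definition-time dispatch could be sketched with `inspect.signature`. This is only an illustration of the idea, not pyparsing code: `trim_arity_by_signature` and `Actions` are made-up names, parameters with default values are counted like any other positional parameter, and `inspect.signature` can raise ValueError for some builtins, so this does not cover every edge case.

```python
import inspect

_POSITIONAL = (inspect.Parameter.POSITIONAL_ONLY,
               inspect.Parameter.POSITIONAL_OR_KEYWORD)


def trim_arity_by_signature(func):
    """Choose the wrapper once, from func's declared positional parameters."""
    params = list(inspect.signature(func).parameters.values())
    if any(p.kind is inspect.Parameter.VAR_POSITIONAL for p in params):
        return func  # func takes *args, so hand it (s, l, t) unchanged
    count = sum(1 for p in params if p.kind in _POSITIONAL)
    if count > 3:
        raise TypeError("parse action takes more than 3 arguments")
    # Keep the last `count` of (s, l, t): (), (t,), (l, t) or (s, l, t).
    return lambda s, l, t: func(*(s, l, t)[3 - count:])


class Actions:  # hypothetical container, used to show a bound method
    def upper(self, tokens):
        return tokens[0].upper()


one_arg = trim_arity_by_signature(Actions().upper)
two_arg = trim_arity_by_signature(lambda loc, tokens: (loc, tokens[0]))
print(one_arg("abc", 0, ["abc"]))  # ABC
print(two_arg("abc", 0, ["abc"]))  # (0, 'abc')
```

Because `inspect.signature` already drops the bound `self` of a method, no check on the parameter name is needed here.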

Paul McGuire - 2016-01-18

Michael –

Thanks for posting your note – sorry to hear that Pyparsing’s code was nausea-inducing.

This approach to _trim_arity was actually provided by Raymond Hettinger, long-time Python luminary and author of a number of modules in the Python standard library (including my favorite, itertools). Raymond is nothing if not diligent about avoiding unnecessary processing or overhead in his code. In fact, when I first saw this, I had a reaction similar to yours. Not the nausea part, but the part about thinking that this checking would be done in every call to the parse action. However, since the argument limit is saved outside the wrapper function, the repetitive argument count testing only occurs on the first call to the parse action. Once the correct number of arguments is determined, subsequent calls use that number from then on.
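The caching behaviour can be observed with a small Python 3 sketch. It repeats the retry loop without the singleArgBuiltins shortcut and adds an `attempts` list purely as instrumentation; `trim_arity_traced` and `attempts` are illustrative names, not part of pyparsing.

```python
def trim_arity_traced(func, maxargs=2):
    """Retry loop with a log of which slice offset each attempt used."""
    limit = [0]            # lives in the closure, so it survives between calls
    found_arity = [False]
    attempts = []          # instrumentation only, added for this sketch

    def wrapper(*args):
        while True:
            attempts.append(limit[0])
            try:
                ret = func(*args[limit[0]:])
                found_arity[0] = True
                return ret
            except TypeError:
                if limit[0] <= maxargs and not found_arity[0]:
                    limit[0] += 1
                    continue
                raise

    wrapper.attempts = attempts
    return wrapper


action = trim_arity_traced(lambda tokens: tokens[0].upper())
action("abc", 0, ["abc"])  # first call probes 3, then 2, then 1 argument
action("abc", 0, ["abc"])  # second call goes straight to the 1-argument form
print(action.attempts)     # [0, 1, 2, 2]
```

The first call makes three attempts and the second makes only one, because `limit` keeps the value found during the first call.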

(My previous version of _trim_arity also used various introspection features to extract the arguments from the provided function, but this logic was quite fragile. There are a number of edge cases, beyond just the “skip over self if it is the first argument” one that you found, and the introspection calls had some incompatibilities between Py2 and Py3. My unit tests include several of these edge cases, and your straightforward proposed patch using inspect actually fails to pass them.)

However, inspired by your email and several other recent postings, I took another run at making _trim_arity able to differentiate between TypeErrors raised during arity testing and those real TypeErrors raised within the body of the parse action. I think I now have a working version, having tried this with my own test case:

    Word('a').setParseAction(lambda t: t[0]+1).parseString('aaa')

This parse action raises a TypeError because it tries to add a string and an int. With the latest updates to _trim_arity, I now get the correct exception message:

    TypeError: cannot concatenate 'str' and 'int' objects

Instead of the previous (and misleading)

    <lambda>() takes exactly 1 argument (0 given)
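One way such a distinction could be made is to look at how deep the traceback goes. This is a Python 3 sketch under that assumption and not necessarily what was checked into SVN; `trim_arity_checked` is an illustrative name. A TypeError caused by a bad argument count is raised at the call itself, so its traceback stops in the wrapper, while a TypeError from the body of the parse action has at least one more frame.

```python
def trim_arity_checked(func, maxargs=2):
    """Retry loop that only retries TypeErrors raised by the call itself."""
    limit = [0]
    found_arity = [False]

    def wrapper(*args):
        while True:
            try:
                ret = func(*args[limit[0]:])
                found_arity[0] = True
                return ret
            except TypeError as exc:
                # tb_next is None when no frame of func was ever entered,
                # i.e. the call failed before the body started running.
                raised_inside_func = exc.__traceback__.tb_next is not None
                if raised_inside_func or found_arity[0] or limit[0] > maxargs:
                    raise
                limit[0] += 1
    return wrapper


action = trim_arity_checked(lambda t: t[0] + 1)
try:
    action("aaa", 0, ["aaa"])
except TypeError as exc:
    real_message = str(exc)
    print(real_message)  # reports the str + int problem from the lambda body
```

A limitation of this sketch: if the parse action is itself a callable implemented in C, it has no Python frame, so a TypeError it raises looks the same as an arity error.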

I’ve checked this version into the SourceForge SVN repository, and it will be included in the next Pyparsing release. You can extract it for yourself if you like and try it out.

Thanks again for your post,
-- Paul