Re: [Pyparsing] Stuck on recursive parsing

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Greg -

Welcome to pyparsing, and a nice first attempt for someone new!

Of course, the solution to this is to "trick" pyparsing into accepting a
function call.  First of all, I would suggest that your function call should
really be something like this:

Function = Token + "(" + PP.delimitedList(Expr2) + ")"

in order to handle functions that have multiple arguments.  However, if you
are just trying to support basic math functions like abs, sqrt, sin, etc.,
then what you have is fine.

Ok.  The next step is to expand your concept a little bit.  In your grammar,
a function call is really at the same expression level as an integer or a
token.  So let's insert a function as one of the possible terms that could
be evaluated within your expression:

Expr2 = PP.operatorPrecedence(Integer | Function | Token,
    [(PP.oneOf("not - +"), 1, PP.opAssoc.RIGHT),
     (PP.oneOf("* / %"), 2, PP.opAssoc.LEFT),
     ... etc.

(I've placed Function ahead of Token in the list, so that "abs(-6)" tries to
evaluate as a function first, and a token second.)

Ah, but Function was defined using Expr2, we can't use it in the definition
of Expr2 also.

Unless... you guessed it - we make Expr2 a Forward!  Like this:

Expr2 = PP.Forward()
Function = Token + "(" + PP.delimitedList(Expr2) + ")"

# Basic operations
# change '=' to '<<' here
Expr2 << PP.operatorPrecedence(Integer | Function | Token,
    [(PP.oneOf("not - +"), 1, PP.opAssoc.RIGHT),
     (PP.oneOf("* / %"), 2, PP.opAssoc.LEFT),
     ... etc.

At this point, we are getting *very* recursive.  You can try to evaluate
your test string, but it might take a *long* time to run.  I am working on
this issue with another pyparsing user, but at the moment, it is a
performance issue of operatorPrecedence.  To workaround this problem, we can
use pyparsing's packrat parsing support.  Immediately after your "import
pyparsing..." statement, add this line:

PP.ParserElement.enablePackrat()

This reduces a lot of the repetitive parsing calls, and can improve the
performance of a heavily recursive grammar by 1000X or more.

With these changes, I now get:

[[2, '-', [5, '*', 'abs', '(', ['-', 6], ')'], '+', 3]]

A couple parting thoughts:

It would be good to retain the association of 'abs' and its argument list,
instead of just having that flat list of parsed tokens.  Pyparsing's Group
class is for just this purpose.  Change the definition of Function to Group
the entire expression, and add another Group for the arguments (to make them
easy to pick out from within the function's token list), like this:

Function = PP.Group(Token + 
        PP.Group("(" + PP.delimitedList(Expr2) + ")"))

and your results are now better structured for easier processing:

[[2, '-', [5, '*', ['abs', ['(', ['-', 6], ')']]], '+', 3]]

Or with some better indentation:

[
  [2, '-', 
    [5, '*', 
      ['abs', 
        ['(', ['-', 6], ')']
      ]
    ], '+', 3
  ]
]

Lastly, here is an alternative way to represent your potentially
dot-delimited identifier token:

Ident = PP.Word(PP.alphas, PP.alphanums + "_")
Token = PP.Combine(Ident + PP.ZeroOrMore('.' + Ident))

This is just a style point, and what you have is working fine, but this
change avoids duplicating the 'PP.Word(PP.alphas, PP.alphanums + "_")' bit;
and adds the pyparsing Combine expression, which will collapse the separate
tokens of a reference like "math.abs" from ['math', '.', 'abs'] to just
'math.abs'.

You could also have done the repetition using the pyparsing delimitedList
helper, as:

Token = PP.delimitedList(Ident,'.',combine=True)

Again, this is much more personal style, and is completely dependent on your
own taste in using pyparsing.

Cheers!
-- Paul