Re: [Pyparsing] Stuck on recursive parsing
Brought to you by:
ptmcg
From: Paul M. <pa...@al...> - 2008-03-06 22:42:59
|
Greg - Welcome to pyparsing, and a nice first attempt for someone new! Of course, the solution to this is to "trick" pyparsing into accepting a function call. First of all, I would suggest that your function call should really be something like this: Function = Token + "(" + PP.delimitedList(Expr2) + ")" in order to handle functions that have multiple arguments. However, if you are just trying to support basic math functions like abs, sqrt, sin, etc., then what you have is fine. Ok. The next step is to expand your concept a little bit. In your grammar, a function call is really at the same expression level as an integer or a token. So let's insert a function as one of the possible terms that could be evaluated within your expression: Expr2 = PP.operatorPrecedence(Integer | Function | Token, [(PP.oneOf("not - +"), 1, PP.opAssoc.RIGHT), (PP.oneOf("* / %"), 2, PP.opAssoc.LEFT), ... etc. (I've placed Function ahead of Token in the list, so that "abs(-6)" tries to evaluate as a function first, and a token second.) Ah, but Function was defined using Expr2, we can't use it in the definition of Expr2 also. Unless... you guessed it - we make Expr2 a Forward! Like this: Expr2 = PP.Forward() Function = Token + "(" + PP.delimitedList(Expr2) + ")" # Basic operations # change '=' to '<<' here Expr2 << PP.operatorPrecedence(Integer | Function | Token, [(PP.oneOf("not - +"), 1, PP.opAssoc.RIGHT), (PP.oneOf("* / %"), 2, PP.opAssoc.LEFT), ... etc. At this point, we are getting *very* recursive. You can try to evaluate your test string, but it might take a *long* time to run. I am working on this issue with another pyparsing user, but at the moment, it is a performance issue of operatorPrecedence. To workaround this problem, we can use pyparsing's packrat parsing support. Immediately after your "import pyparsing..." statement, add this line: PP.ParserElement.enablePackrat() This reduces a lot of the repetitive parsing calls, and can improve the performance of a heavily recursive grammar by 1000X or more. With these changes, I now get: [[2, '-', [5, '*', 'abs', '(', ['-', 6], ')'], '+', 3]] A couple parting thoughts: It would be good to retain the association of 'abs' and its argument list, instead of just having that flat list of parsed tokens. Pyparsing's Group class is for just this purpose. Change the definition of Function to Group the entire expression, and add another Group for the arguments (to make them easy to pick out from within the function's token list), like this: Function = PP.Group(Token + PP.Group("(" + PP.delimitedList(Expr2) + ")")) and your results are now better structured for easier processing: [[2, '-', [5, '*', ['abs', ['(', ['-', 6], ')']]], '+', 3]] Or with some better indentation: [ [2, '-', [5, '*', ['abs', ['(', ['-', 6], ')'] ] ], '+', 3 ] ] Lastly, here is an alternative way to represent your potentially dot-delimited identifier token: Ident = PP.Word(PP.alphas, PP.alphanums + "_") Token = PP.Combine(Ident + PP.ZeroOrMore('.' + Ident)) This is just a style point, and what you have is working fine, but this change avoids duplicating the 'PP.Word(PP.alphas, PP.alphanums + "_")' bit; and adds the pyparsing Combine expression, which will collapse the separate tokens of a reference like "math.abs" from ['math', '.', 'abs'] to just 'math.abs'. You could also have done the repetition using the pyparsing delimitedList helper, as: Token = PP.delimitedList(Ident,'.',combine=True) Again, this is much more personal style, and is completely dependent on your own taste in using pyparsing. Cheers! -- Paul |