Thread: [Pyparsing] Recursive grammar
Brought to you by:
ptmcg
From: thomas_h <th...@go...> - 2007-11-09 11:04:56
|
Hi all, I'm new to pyparsing, and I'm struggling with recursive production rules. As an example, I want to parse expressions that are either numbers or additions of numbers. Grammar: number :: '0'..'9'* aexp :: number | number '+' aexp The following straight-forward implementation doesn't work: from pyparsing import * number = Word(nums) aexp = Forward() aexp = Or ([ number, number + '+' + aexp , ]) Running the code yields: aexp.parseString("3+4") => (['3'], {}) I probably miss something simple here, but I can't figure out what. If I make aexp non-recursive ("..., number + '+' + number,])") it works, but this doesn't solve the general problem (re-arranging the cases in the 'Or' argument doesn't help either). I looked into the "fourFn.py" example, but this is much too elaborate to answer my question. Thanks, Thomas |
From: Ralph C. <ra...@in...> - 2007-11-09 11:57:04
|
Hi Thomas, > from pyparsing import * > number = Word(nums) > aexp = Forward() > aexp = Or ([ > number, > number + '+' + aexp , > ]) > > Running the code yields: > > aexp.parseString("3+4") => (['3'], {}) Examine aexp shows >>> aexp {W:(0123...) ^ {{W:(0123...) "+"} Forward:None}} which indicates to me that the Forward hasn't been set to anything. That's because Forwards have to be assigned with `<<' rather than `='. You're forgetting about the Forward created with the initial assignment to aexp by altering aexp to be an Or. I can't find the documentation online, but have a look for something like /usr/share/doc/python2.4-pyparsing/htmldoc/index.html on your machine and see Forward: Forward declaration of an expression to be defined later - used for recursive grammars, such as algebraic infix notation. When the expression is known, it is assigned to the Forward variable using the '<<' operator. When I try: >>> aexp = Forward() >>> aexp << Or ([ ... number, ... number + '+' + aexp , ... ]) Forward:{W:(0123...) ^ {{W:(0123...) "+"} ...}} Which looks a bit better. >>> aexp.parseString("3+4") (['3', '+', '4'], {}) Cheers, Ralph. |
From: thomas_h <th...@go...> - 2007-11-09 13:56:24
|
Thanks for the fast answers! > and your recursion wont recurse properly. Instead, you must use the '<<' > operator. In your example, you just need to replace: Ah, silly me. I had the '<<' operator in my original code, but forgot it when I tried to stip it down to a minimal example ... Yes, now it works :-). The reason I tried to build a minimal example in the first place was that I had double recursion in my initial version, something like aexp << Or([..., aexp + '+' + aexp, ...]) which results in infinite recursion. I felt this might be the right way to do it, but maybe I'm wrong. Is this generally possible in pyparsing? > While this all works, I'm curious why you are not using the operators > defined for creating compound constructs such as Or. The form you have > feels very tedious to me, compared to: > > aexp << ( number ^ number + '+' + aexp ) I tried the operators superficially and got error messages from Python, but that was probably my fault (e.g. not importing enough). I will improve on that :-). > And I would also suggest that you try to use the MatchFirst construct > instead of Or, especially with recursive expressions: > > aexp << ( number + '+' + aexp | number ) I presume this will gain me some run-time efficiency. > Note that I had to reorder the terms in order to try the more restrictive > test first. But MatchFirst will stop at the first matching alternative, > while Or will evaluate all alternatives and select the longest. In > recursive expressions, Or can descend down a neverending sequence of > self-referencing alternatives. I will keep this in mind, although I might stick with Or for the beginning and use MatchFirst as an optimization later on (my target is parsing JavaScript). > > Welcome to pyparsing! Thanks :-). Apart from my beginner's problems I've got the impression that it is really good stuff. Thanks for making the effort! =Thomas |
From: Ralph C. <ra...@in...> - 2007-11-09 18:08:00
|
Hi Thomas, > The reason I tried to build a minimal example in the first place was > that I had double recursion in my initial version, something like > > aexp << Or([..., > aexp + '+' + aexp, > ...]) > > which results in infinite recursion. I felt this might be the right > way to do it, but maybe I'm wrong. Is this generally possible in > pyparsing? I doubt it is. If you look at a grammar for a language you tend to find arithmetical expressions defined: expr := term (addop term)* term := factor (mulop factor)* factor := floatnum | '(' expr ')' addop := '+' | '-' mulop := '*' | '/' This makes the grammar, and the parse tree, represent the precedence of operators. Note the parenthesis are handled with the recursive definition where `expr' was a Forward() initially. Using ZeroOrMore() it's quite easy to turn the above into Python. Maybe fourFn.py will make a little more sense now > > And I would also suggest that you try to use the MatchFirst > > construct instead of Or, especially with recursive expressions: > > > > aexp << ( number + '+' + aexp | number ) > > I presume this will gain me some run-time efficiency. Yes. Or() tries all of them and then returns the longest. MatchFirst() can give up trying after the first one that matches. Cheers, Ralph. |
From: thomas_h <th...@go...> - 2007-11-10 23:49:59
|
Great, thanks for the help! =Thomas > > Hi Thomas, > > > The reason I tried to build a minimal example in the first place was > > that I had double recursion in my initial version, something like > > > > aexp << Or([..., > > aexp + '+' + aexp, > > ...]) > > > > which results in infinite recursion. I felt this might be the right > > way to do it, but maybe I'm wrong. Is this generally possible in > > pyparsing? > > I doubt it is. If you look at a grammar for a language you tend to find > arithmetical expressions defined: > > expr := term (addop term)* > term := factor (mulop factor)* > factor := floatnum | '(' expr ')' > addop := '+' | '-' > mulop := '*' | '/' > > This makes the grammar, and the parse tree, represent the precedence of > operators. Note the parenthesis are handled with the recursive > definition where `expr' was a Forward() initially. Using ZeroOrMore() > it's quite easy to turn the above into Python. Maybe fourFn.py will > make a little more sense now > > > > And I would also suggest that you try to use the MatchFirst > > > construct instead of Or, especially with recursive expressions: > > > > > > aexp << ( number + '+' + aexp | number ) > > > > I presume this will gain me some run-time efficiency. > > Yes. Or() tries all of them and then returns the longest. MatchFirst() > can give up trying after the first one that matches. > > Cheers, > > > Ralph. > > |