Thread: [Pyparsing] Operator Precedence and Associativity.
Brought to you by:
ptmcg
From: Ralph C. <ra...@in...> - 2006-09-12 12:24:22
|
Hi, PyParsing seems great; many thanks. But I'm having trouble concocting the right grammar for the typical binary operator part. I'm aware of the latest fourFn.py but its use of ZeroOrMore() results in + 1 + 2 + 3 ==> /|\ 1 2 3 whereas I'd like the left associativity explicit, i.e. + / \ 1 + 2 + 3 ==> + 3 / \ 1 2 I can achieve this with a right-associative operator, e.g. exponent, but how do I do this for left-associative ones whilst avoiding endless left-recursion? I wonder if PyParsing would benefit from being able to define operator tables, similar to K&R's, giving the tokens, precedence, and associativity, and being able to plumb that into the rest of the recursive descent parser at some point? Cheers, Ralph. |
From: Paolo L. <p....@hy...> - 2006-09-12 15:57:01
|
Hi Ralph, Ralph Corderoy wrote: > whereas I'd like the left associativity explicit, i.e. > > + > / \ > 1 + 2 + 3 ==> + 3 > / \ > 1 2 > > I can achieve this with a right-associative operator, e.g. exponent, but > how do I do this for left-associative ones whilst avoiding endless > left-recursion? I think you have no other choice but to rearrange ParseResults by setting a specific parse action for the expression: from pyparsing import * plus = Literal('+') num = Word(nums) def left_associative_regroup(s,l,t): new_t = ParseResults( [ t[0], t[2] ] ) for item in t[3:]: if item != "+": new_t = ParseResults( [new_t, item] ) return new_t expr = num + ZeroOrMore( plus + num ) expr.setParseAction(left_associative_regroup) print expr.parseString("1 + 2 + 3 + 4 + 5") > I wonder if PyParsing would benefit from being able to define operator > tables, similar to K&R's, giving the tokens, precedence, and > associativity, and being able to plumb that into the rest of the > recursive descent parser at some point? It's not trivial at all. From http://en.wikipedia.org/wiki/Left_recursion "A formal grammar that contains left recursion cannot be parsed by a recursive descent parser" PyParsing qualifies as recursive descent parser. I think it would rather be possible to build Specilized Constructs such as LeftAssociativeOperator(term=Word(nums), operator=Literal('+') ) but those would only cope with immediate left recursion... Hope this helps Ciao Paolo |
From: Ralph C. <ra...@in...> - 2006-09-13 10:04:22
|
Hi Paolo, > Ralph Corderoy wrote: > > whereas I'd like the left associativity explicit, i.e. > > > > + > > / \ > > 1 + 2 + 3 ==> + 3 > > / \ > > 1 2 > > > > I can achieve this with a right-associative operator, e.g. exponent, but > > how do I do this for left-associative ones whilst avoiding endless > > left-recursion? > > I think you have no other choice but to rearrange ParseResults by > setting a specific parse action for the expression: OK, thanks. I hadn't got that far, just been banging my head trying to get the desired result naturally. > > I wonder if PyParsing would benefit from being able to define operator > > tables, similar to K&R's, giving the tokens, precedence, and > > associativity, and being able to plumb that into the rest of the > > recursive descent parser at some point? > > It's not trivial at all. > > From http://en.wikipedia.org/wiki/Left_recursion > > "A formal grammar that contains left recursion cannot be parsed by a > recursive descent parser" > > PyParsing qualifies as recursive descent parser. Yes, but I don't think the R.D. parser below, written by hand, not with pyparsing, is left-recursive. It can just support different associativity depending on how the parse tree is built. Cheers, Ralph. #! /usr/bin/python def main(): test('1', "'1'") test('1+2', "('1', '+', '2')") test('1+2+3', "(('1', '+', '2'), '+', '3')") test('1*2+3', "(('1', '*', '2'), '+', '3')") test('1+2*3', "('1', '+', ('2', '*', '3'))") test('1^2^3', "('1', '^', ('2', '^', '3'))") test('-3^2', "('-', ('3', '^', '2'))") test('-+-3^2', "('-', ('+', ('-', ('3', '^', '2'))))") test('1a+2b*3c', "(('1', 'a'), '+', (('2', 'b'), '*', ('3', 'c')))") test('5a>b', "(('5', 'a'), 'b')") def test(s, e): r = repr(parse(s)) if r == e: print 'pass: %s => %s' % (s, r) else: print 'FAIL: %s: want %s. got %s.' % (s, e, r) def parse(s): l = list(s + ';') e = parse_statement(l) expect(l, ';') return e def parse_statement(s): 'statement := expr [">" unit]' e = parse_expr(s) if peek(s) == '>': consume(s) u = parse_unit(s) if not u: raise 'Expect unit after >.' return (e, u) return e def parse_expr(s): 'expr := term ["+" term]*' a = parse_term(s) while peek(s) == '+': op = consume(s) b = parse_term(s) a = (a, op, b) return a def parse_term(s): 'term := unary ["*" unary]*' a = parse_unary(s) while peek(s) == '*': op = consume(s) b = parse_unary(s) a = (a, op, b) return a def parse_unary(s): 'unary := ["+" | "-"] unary | factor' sign = '+' if peek(s) in '+-': sign = consume(s) a = parse_unary(s) return (sign, a) return parse_factor(s) def parse_factor(s): 'factor := number ["^" factor]' a = parse_number(s) while peek(s) == '^': op = consume(s) b = parse_factor(s) a = (a, op, b) return a def parse_number(s): 'number := "0" | "1" | ... unit' if not peek(s).isdigit(): raise i = consume(s) u = parse_unit(s) if u: return (i, u) return i def parse_unit(s): 'unit := "a" | "b" | "c" | empty' if peek(s) not in 'abc': return None return consume(s) def peek(s): return s[0] def expect(s, t): c = consume(s) if c != t: raise 'Expected %s but got %s' % (t, c) return def consume(s): return s.pop(0) main() |
From: Paolo L. <p....@hy...> - 2006-09-12 20:07:05
|
Ralph Corderoy wrote: >> "A formal grammar that contains left recursion cannot be parsed by a >> recursive descent parser" >> >> PyParsing qualifies as recursive descent parser. > > Yes, but I don't think the R.D. parser below, written by hand, not with > pyparsing, is left-recursive. It can just support different > associativity depending on how the parse tree is built. It is not left-recursive, of course. In your example you basically build the tree in the correct way instead of rearranging it. But please note that in my example I could not add a parse action to ZeroOrMore but to expr. Simply modifing or parametrizing the current tree building policy in one of the existing tokens is not enough. a new token as the ones that I suggested could maybe provide a solution (similar to your parse_term or parse_expr in how the tree is built). Translating a set of operator precedence and associativity rules in order to adapt parse behaviour could be a step further but, in terms of interface, I'd prefer using the one the I suggested... If you think that could be a solution, I could try to provide an implementation... Ciao Paolo > > Cheers, > > > Ralph. > > > #! /usr/bin/python > > def main(): > test('1', "'1'") > test('1+2', "('1', '+', '2')") > test('1+2+3', "(('1', '+', '2'), '+', '3')") > test('1*2+3', "(('1', '*', '2'), '+', '3')") > test('1+2*3', "('1', '+', ('2', '*', '3'))") > test('1^2^3', "('1', '^', ('2', '^', '3'))") > test('-3^2', "('-', ('3', '^', '2'))") > test('-+-3^2', "('-', ('+', ('-', ('3', '^', '2'))))") > test('1a+2b*3c', "(('1', 'a'), '+', (('2', 'b'), '*', ('3', 'c')))") > test('5a>b', "(('5', 'a'), 'b')") > > def test(s, e): > r = repr(parse(s)) > if r == e: > print 'pass: %s => %s' % (s, r) > else: > print 'FAIL: %s: want %s. got %s.' % (s, e, r) > > def parse(s): > l = list(s + ';') > e = parse_statement(l) > expect(l, ';') > return e > > def parse_statement(s): > 'statement := expr [">" unit]' > e = parse_expr(s) > if peek(s) == '>': > consume(s) > u = parse_unit(s) > if not u: > raise 'Expect unit after >.' > return (e, u) > return e > > def parse_expr(s): > 'expr := term ["+" term]*' > a = parse_term(s) > while peek(s) == '+': > op = consume(s) > b = parse_term(s) > a = (a, op, b) > return a > > def parse_term(s): > 'term := unary ["*" unary]*' > a = parse_unary(s) > while peek(s) == '*': > op = consume(s) > b = parse_unary(s) > a = (a, op, b) > return a > > def parse_unary(s): > 'unary := ["+" | "-"] unary | factor' > sign = '+' > if peek(s) in '+-': > sign = consume(s) > a = parse_unary(s) > return (sign, a) > return parse_factor(s) > > def parse_factor(s): > 'factor := number ["^" factor]' > a = parse_number(s) > while peek(s) == '^': > op = consume(s) > b = parse_factor(s) > a = (a, op, b) > return a > > def parse_number(s): > 'number := "0" | "1" | ... unit' > if not peek(s).isdigit(): > raise > i = consume(s) > u = parse_unit(s) > if u: > return (i, u) > return i > > def parse_unit(s): > 'unit := "a" | "b" | "c" | empty' > if peek(s) not in 'abc': > return None > return consume(s) > > def peek(s): > return s[0] > > def expect(s, t): > c = consume(s) > if c != t: > raise 'Expected %s but got %s' % (t, c) > return > > def consume(s): > return s.pop(0) > > main() > |
From: Ralph C. <ra...@in...> - 2006-09-13 12:23:32
|
Hi Paolo, > Ralph Corderoy wrote: > > Yes, but I don't think the R.D. parser below, written by hand, not with > > pyparsing, is left-recursive. It can just support different > > associativity depending on how the parse tree is built. > > It is not left-recursive, of course. > In your example you basically build the tree in the correct way > instead of rearranging it. But please note that in my example I could > not add a parse action to ZeroOrMore but to expr. I agree. Given expr << term + ZeroOrMore(plusop + term) we can't use Group() because the left-most (term+term) isn't accessible. > a new token as the ones that I suggested could maybe provide a solution > (similar to your parse_term or parse_expr in how the tree is built). > ... > If you think that could be a solution, I could try to provide an > implementation... That's OK. The use of parseAction() that you showed me is enough to give me the idea. I guess I'd like to write number = Words(digit) factor = Forward() factor << RightBinOpSeq(number, expop, factor) unary = OptRightUnaryOpSeq(signop, factor) term = unary ^ LeftBinOpSeq(unary, multop, unary) expr = term ^ LeftBinOpSeq(term, plusop, term) statement = expr + semicolon and have the /(Opt)?(Left|Right|Non)(Unary|Bin)OpSeq/ routines do the Grouping implicitly so 1+2*-3^4*5+-+-6 yeilds the tree ( , +, ) (1, +, ) (-, ) ( , *, 5) (+, ) (2, *, ) (-, 6) (-, ) (3, ^, 4) Although, something like expop = Literal('^') signop = OneOf('+ -') multop = Literal('*') plusop = Literal('+') expr = Forward() term = Forward() unary = Forward() factor = Forward() expr = OperatorGrammar( \ expop, 2, 'R', signop, 1, 'R', multop, 2, 'L', plusop, 2, 'L', ''') also seems attractive, and simpler to specify. Cheers, Ralph. |
From: Paul M. <pa...@al...> - 2006-09-13 13:39:23
|
This discussion from c.l.py (http://www.velocityreviews.com/forums/t335296-basic-tokenizer.html - about 2 years ago) might also shed some light on this topic, especially the comments (and posted code) from Andrea Griffini. He pointed out the '^' left-associativity bug in fourFn.py which I was able to fix in a subsequent release. I like the graphical rending of the parse tree. :) -- Paul > -----Original Message----- > From: pyp...@li... > [mailto:pyp...@li...] On > Behalf Of Ralph Corderoy > Sent: Wednesday, September 13, 2006 7:23 AM > To: Paolo Losi > Cc: pyp...@li... > Subject: Re: [Pyparsing] Operator Precedence and Associativity. > > > Hi Paolo, > > > Ralph Corderoy wrote: > > > Yes, but I don't think the R.D. parser below, written by > hand, not with > > > pyparsing, is left-recursive. It can just support different > > > associativity depending on how the parse tree is built. > > > > It is not left-recursive, of course. > > In your example you basically build the tree in the correct way > > instead of rearranging it. But please note that in my > example I could > > not add a parse action to ZeroOrMore but to expr. > > I agree. Given > > expr << term + ZeroOrMore(plusop + term) > > we can't use Group() because the left-most (term+term) isn't > accessible. > > > a new token as the ones that I suggested could maybe > provide a solution > > (similar to your parse_term or parse_expr in how the tree is built). > > ... > > If you think that could be a solution, I could try to provide an > > implementation... > > That's OK. The use of parseAction() that you showed me is enough to > give me the idea. > > I guess I'd like to write > > number = Words(digit) > factor = Forward() > factor << RightBinOpSeq(number, expop, factor) > unary = OptRightUnaryOpSeq(signop, factor) > term = unary ^ LeftBinOpSeq(unary, multop, unary) > expr = term ^ LeftBinOpSeq(term, plusop, term) > statement = expr + semicolon > > and have the /(Opt)?(Left|Right|Non)(Unary|Bin)OpSeq/ routines do the > Grouping implicitly so > > 1+2*-3^4*5+-+-6 > > yeilds the tree > > ( , +, ) > (1, +, ) (-, ) > ( , *, 5) (+, ) > (2, *, ) (-, 6) > (-, ) > (3, ^, 4) > > Although, something like > > expop = Literal('^') > signop = OneOf('+ -') > multop = Literal('*') > plusop = Literal('+') > > expr = Forward() > term = Forward() > unary = Forward() > factor = Forward() > > expr = OperatorGrammar( \ > expop, 2, 'R', > signop, 1, 'R', > multop, 2, 'L', > plusop, 2, 'L', > ''') > > also seems attractive, and simpler to specify. > > Cheers, > > > Ralph. > > > > -------------------------------------------------------------- > ----------- > Using Tomcat but need to do more? Need to support web > services, security? > Get stuff done quickly with pre-integrated technology to make > your job easier > Download IBM WebSphere Application Server v.1.0.1 based on > Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057& > dat=121642 > _______________________________________________ > Pyparsing-users mailing list > Pyp...@li... > https://lists.sourceforge.net/lists/listinfo/pyparsing-users > > |
From: Ralph C. <ra...@in...> - 2006-09-15 11:00:36
|
Hi Paul, > This discussion from c.l.py > (http://www.velocityreviews.com/forums/t335296-basic-tokenizer.html - > about 2 years ago) might also shed some light on this topic, > especially the comments (and posted code) from Andrea Griffini. He > pointed out the '^' left-associativity bug in fourFn.py which I was > able to fix in a subsequent release. Thanks, I had already found that, and you're right, it helped concoct the right pyparsing grammar, as in fourFn.py. The issue is I don't want [1, '+', 2, '+', 3] returned, but [[1, '+', 2], '+', 3] which doesn't seem possible without post-parsing manipulation. That is, the `classic' parse tree of an infix algebraic expression isn't achievable. > I like the graphical rending of the parse tree. :) Yes, it just occurred to me when I was sitting there with the text all on one line. Not hard to do, either, with vim's Ctrl-Y in insert mode. Cheers, Ralph. |
From: Paul M. <pa...@al...> - 2006-10-10 22:03:18
|
> -----Original Message----- > From: pyp...@li... > [mailto:pyp...@li...] On > Behalf Of Ralph Corderoy > Sent: Wednesday, September 13, 2006 7:23 AM > To: Paolo Losi > Cc: pyp...@li... > Subject: Re: [Pyparsing] Operator Precedence and Associativity. > > > Hi Paolo, > > > Ralph Corderoy wrote: <snip> > > I guess I'd like to write > > number = Words(digit) > factor = Forward() > factor << RightBinOpSeq(number, expop, factor) > unary = OptRightUnaryOpSeq(signop, factor) > term = unary ^ LeftBinOpSeq(unary, multop, unary) > expr = term ^ LeftBinOpSeq(term, plusop, term) > statement = expr + semicolon > > and have the /(Opt)?(Left|Right|Non)(Unary|Bin)OpSeq/ > routines do the Grouping implicitly so > > 1+2*-3^4*5+-+-6 > > yeilds the tree > > ( , +, ) > (1, +, ) (-, ) > ( , *, 5) (+, ) > (2, *, ) (-, 6) > (-, ) > (3, ^, 4) > > Although, something like > > expop = Literal('^') > signop = OneOf('+ -') > multop = Literal('*') > plusop = Literal('+') > > expr = Forward() > term = Forward() > unary = Forward() > factor = Forward() > > expr = OperatorGrammar( \ > expop, 2, 'R', > signop, 1, 'R', > multop, 2, 'L', > plusop, 2, 'L', > ''') > > also seems attractive, and simpler to specify. > Well, why not?! Check this out! -- Paul from pyparsing import * def operatorGrammar( baseExpr, opList ): ret = Forward() lastExpr = baseExpr | ( Suppress('(') + ret + Suppress(')') ) i = 0 for opExpr,arity,rightLeftAssoc in opList: thisExpr = Forward() thisExpr.setName("expr%d" % i)#.setDebug() if rightLeftAssoc == 'L': if arity == 1: thisExpr << ( Group( lastExpr + opExpr ) | lastExpr ) else: thisExpr << ( Group( lastExpr + OneOrMore( opExpr + lastExpr ) ) | lastExpr ) else: if arity == 1: # try to avoid LR with this extra test if isinstance(opExpr, Optional): thisExpr << (( FollowedBy(opExpr.expr + thisExpr) + Group( opExpr + thisExpr )) | lastExpr ) else: thisExpr << ( opExpr + thisExpr ) else: thisExpr << ( Group( lastExpr + OneOrMore( opExpr + thisExpr ) ) | lastExpr ) lastExpr = thisExpr thisExpr = Forward() i += 1 ret << lastExpr return ret number = Word(nums).setParseAction(lambda t:int(t[0])) variable = Word(alphas,exact=1) operand = number | variable expop = Literal('^') signop = Optional( oneOf('+ -') ) multop = oneOf('* /') plusop = oneOf('+ -') factop = Literal('!') expr = operatorGrammar( operand, [(factop, 1, 'L'), (expop, 2, 'R'), (signop, 1, 'R'), (multop, 2, 'L'), (plusop, 2, 'L'),] ) test = ["9 + 2 * 3", "(9 + 2) * 3", "(9 + -2) * 3", "(9 + -2) * 3^2^2", "(9! + -2) * 3^2^2", "M*X + B", "1+2*-3^4*5+-+-6",] for t in test: print t print expr.parseString(t) print Gives the following: 9 + 2 * 3 [[9, '+', [2, '*', 3]]] (9 + 2) * 3 [[[9, '+', 2], '*', 3]] (9 + -2) * 3 [[[9, '+', ['-', 2]], '*', 3]] (9 + -2) * 3^2^2 [[[9, '+', ['-', 2]], '*', [3, '^', [2, '^', 2]]]] (9! + -2) * 3^2^2 [[[[9, '!'], '+', ['-', 2]], '*', [3, '^', [2, '^', 2]]]] M*X + B [[['M', '*', 'X'], '+', 'B']] 1+2*-3^4*5+-+-6 [[1, '+', [2, '*', ['-', [3, '^', 4]], '*', 5], '+', ['-', ['+', ['-', 6]]]]] |
From: Ralph C. <ra...@in...> - 2006-10-11 12:09:38
|
Hi Paul, > Well, why not?! Check this out! > > def operatorGrammar( baseExpr, opList ): > ret = Forward() > lastExpr = baseExpr | ( Suppress('(') + ret + Suppress(')') ) > ... That's impressive and exactly what I was after. You may want to consider formalising it as part of pyparsing since recursive descent parsers often use operator precedence parsers for that part of things. Thanks! Ralph. |