From: Paolo Losi <p.losi@hy...>  20060912 15:57:01

Hi Ralph, Ralph Corderoy wrote: > whereas I'd like the left associativity explicit, i.e. > > + > / \ > 1 + 2 + 3 ==> + 3 > / \ > 1 2 > > I can achieve this with a rightassociative operator, e.g. exponent, but > how do I do this for leftassociative ones whilst avoiding endless > leftrecursion? I think you have no other choice but to rearrange ParseResults by setting a specific parse action for the expression: from pyparsing import * plus = Literal('+') num = Word(nums) def left_associative_regroup(s,l,t): new_t = ParseResults( [ t[0], t[2] ] ) for item in t[3:]: if item != "+": new_t = ParseResults( [new_t, item] ) return new_t expr = num + ZeroOrMore( plus + num ) expr.setParseAction(left_associative_regroup) print expr.parseString("1 + 2 + 3 + 4 + 5") > I wonder if PyParsing would benefit from being able to define operator > tables, similar to K&R's, giving the tokens, precedence, and > associativity, and being able to plumb that into the rest of the > recursive descent parser at some point? It's not trivial at all. From http://en.wikipedia.org/wiki/Left_recursion "A formal grammar that contains left recursion cannot be parsed by a recursive descent parser" PyParsing qualifies as recursive descent parser. I think it would rather be possible to build Specilized Constructs such as LeftAssociativeOperator(term=Word(nums), operator=Literal('+') ) but those would only cope with immediate left recursion... Hope this helps Ciao Paolo 
From: Ralph Corderoy <ralph@in...>  20060912 12:24:22

Hi, PyParsing seems great; many thanks. But I'm having trouble concocting the right grammar for the typical binary operator part. I'm aware of the latest fourFn.py but its use of ZeroOrMore() results in + 1 + 2 + 3 ==> /\ 1 2 3 whereas I'd like the left associativity explicit, i.e. + / \ 1 + 2 + 3 ==> + 3 / \ 1 2 I can achieve this with a rightassociative operator, e.g. exponent, but how do I do this for leftassociative ones whilst avoiding endless leftrecursion? I wonder if PyParsing would benefit from being able to define operator tables, similar to K&R's, giving the tokens, precedence, and associativity, and being able to plumb that into the rest of the recursive descent parser at some point? Cheers, Ralph. 
From: Paolo Losi <p.losi@hy...>  20060912 15:57:01

Hi Ralph, Ralph Corderoy wrote: > whereas I'd like the left associativity explicit, i.e. > > + > / \ > 1 + 2 + 3 ==> + 3 > / \ > 1 2 > > I can achieve this with a rightassociative operator, e.g. exponent, but > how do I do this for leftassociative ones whilst avoiding endless > leftrecursion? I think you have no other choice but to rearrange ParseResults by setting a specific parse action for the expression: from pyparsing import * plus = Literal('+') num = Word(nums) def left_associative_regroup(s,l,t): new_t = ParseResults( [ t[0], t[2] ] ) for item in t[3:]: if item != "+": new_t = ParseResults( [new_t, item] ) return new_t expr = num + ZeroOrMore( plus + num ) expr.setParseAction(left_associative_regroup) print expr.parseString("1 + 2 + 3 + 4 + 5") > I wonder if PyParsing would benefit from being able to define operator > tables, similar to K&R's, giving the tokens, precedence, and > associativity, and being able to plumb that into the rest of the > recursive descent parser at some point? It's not trivial at all. From http://en.wikipedia.org/wiki/Left_recursion "A formal grammar that contains left recursion cannot be parsed by a recursive descent parser" PyParsing qualifies as recursive descent parser. I think it would rather be possible to build Specilized Constructs such as LeftAssociativeOperator(term=Word(nums), operator=Literal('+') ) but those would only cope with immediate left recursion... Hope this helps Ciao Paolo 
From: Ralph Corderoy <ralph@in...>  20060913 10:04:22

Hi Paolo, > Ralph Corderoy wrote: > > whereas I'd like the left associativity explicit, i.e. > > > > + > > / \ > > 1 + 2 + 3 ==> + 3 > > / \ > > 1 2 > > > > I can achieve this with a rightassociative operator, e.g. exponent, but > > how do I do this for leftassociative ones whilst avoiding endless > > leftrecursion? > > I think you have no other choice but to rearrange ParseResults by > setting a specific parse action for the expression: OK, thanks. I hadn't got that far, just been banging my head trying to get the desired result naturally. > > I wonder if PyParsing would benefit from being able to define operator > > tables, similar to K&R's, giving the tokens, precedence, and > > associativity, and being able to plumb that into the rest of the > > recursive descent parser at some point? > > It's not trivial at all. > > From http://en.wikipedia.org/wiki/Left_recursion > > "A formal grammar that contains left recursion cannot be parsed by a > recursive descent parser" > > PyParsing qualifies as recursive descent parser. Yes, but I don't think the R.D. parser below, written by hand, not with pyparsing, is leftrecursive. It can just support different associativity depending on how the parse tree is built. Cheers, Ralph. #! /usr/bin/python def main(): test('1', "'1'") test('1+2', "('1', '+', '2')") test('1+2+3', "(('1', '+', '2'), '+', '3')") test('1*2+3', "(('1', '*', '2'), '+', '3')") test('1+2*3', "('1', '+', ('2', '*', '3'))") test('1^2^3', "('1', '^', ('2', '^', '3'))") test('3^2', "('', ('3', '^', '2'))") test('+3^2', "('', ('+', ('', ('3', '^', '2'))))") test('1a+2b*3c', "(('1', 'a'), '+', (('2', 'b'), '*', ('3', 'c')))") test('5a>b', "(('5', 'a'), 'b')") def test(s, e): r = repr(parse(s)) if r == e: print 'pass: %s => %s' % (s, r) else: print 'FAIL: %s: want %s. got %s.' % (s, e, r) def parse(s): l = list(s + ';') e = parse_statement(l) expect(l, ';') return e def parse_statement(s): 'statement := expr [">" unit]' e = parse_expr(s) if peek(s) == '>': consume(s) u = parse_unit(s) if not u: raise 'Expect unit after >.' return (e, u) return e def parse_expr(s): 'expr := term ["+" term]*' a = parse_term(s) while peek(s) == '+': op = consume(s) b = parse_term(s) a = (a, op, b) return a def parse_term(s): 'term := unary ["*" unary]*' a = parse_unary(s) while peek(s) == '*': op = consume(s) b = parse_unary(s) a = (a, op, b) return a def parse_unary(s): 'unary := ["+"  ""] unary  factor' sign = '+' if peek(s) in '+': sign = consume(s) a = parse_unary(s) return (sign, a) return parse_factor(s) def parse_factor(s): 'factor := number ["^" factor]' a = parse_number(s) while peek(s) == '^': op = consume(s) b = parse_factor(s) a = (a, op, b) return a def parse_number(s): 'number := "0"  "1"  ... unit' if not peek(s).isdigit(): raise i = consume(s) u = parse_unit(s) if u: return (i, u) return i def parse_unit(s): 'unit := "a"  "b"  "c"  empty' if peek(s) not in 'abc': return None return consume(s) def peek(s): return s[0] def expect(s, t): c = consume(s) if c != t: raise 'Expected %s but got %s' % (t, c) return def consume(s): return s.pop(0) main() 
From: Paolo Losi <p.losi@hy...>  20060912 20:07:05

Ralph Corderoy wrote: >> "A formal grammar that contains left recursion cannot be parsed by a >> recursive descent parser" >> >> PyParsing qualifies as recursive descent parser. > > Yes, but I don't think the R.D. parser below, written by hand, not with > pyparsing, is leftrecursive. It can just support different > associativity depending on how the parse tree is built. It is not leftrecursive, of course. In your example you basically build the tree in the correct way instead of rearranging it. But please note that in my example I could not add a parse action to ZeroOrMore but to expr. Simply modifing or parametrizing the current tree building policy in one of the existing tokens is not enough. a new token as the ones that I suggested could maybe provide a solution (similar to your parse_term or parse_expr in how the tree is built). Translating a set of operator precedence and associativity rules in order to adapt parse behaviour could be a step further but, in terms of interface, I'd prefer using the one the I suggested... If you think that could be a solution, I could try to provide an implementation... Ciao Paolo > > Cheers, > > > Ralph. > > > #! /usr/bin/python > > def main(): > test('1', "'1'") > test('1+2', "('1', '+', '2')") > test('1+2+3', "(('1', '+', '2'), '+', '3')") > test('1*2+3', "(('1', '*', '2'), '+', '3')") > test('1+2*3', "('1', '+', ('2', '*', '3'))") > test('1^2^3', "('1', '^', ('2', '^', '3'))") > test('3^2', "('', ('3', '^', '2'))") > test('+3^2', "('', ('+', ('', ('3', '^', '2'))))") > test('1a+2b*3c', "(('1', 'a'), '+', (('2', 'b'), '*', ('3', 'c')))") > test('5a>b', "(('5', 'a'), 'b')") > > def test(s, e): > r = repr(parse(s)) > if r == e: > print 'pass: %s => %s' % (s, r) > else: > print 'FAIL: %s: want %s. got %s.' % (s, e, r) > > def parse(s): > l = list(s + ';') > e = parse_statement(l) > expect(l, ';') > return e > > def parse_statement(s): > 'statement := expr [">" unit]' > e = parse_expr(s) > if peek(s) == '>': > consume(s) > u = parse_unit(s) > if not u: > raise 'Expect unit after >.' > return (e, u) > return e > > def parse_expr(s): > 'expr := term ["+" term]*' > a = parse_term(s) > while peek(s) == '+': > op = consume(s) > b = parse_term(s) > a = (a, op, b) > return a > > def parse_term(s): > 'term := unary ["*" unary]*' > a = parse_unary(s) > while peek(s) == '*': > op = consume(s) > b = parse_unary(s) > a = (a, op, b) > return a > > def parse_unary(s): > 'unary := ["+"  ""] unary  factor' > sign = '+' > if peek(s) in '+': > sign = consume(s) > a = parse_unary(s) > return (sign, a) > return parse_factor(s) > > def parse_factor(s): > 'factor := number ["^" factor]' > a = parse_number(s) > while peek(s) == '^': > op = consume(s) > b = parse_factor(s) > a = (a, op, b) > return a > > def parse_number(s): > 'number := "0"  "1"  ... unit' > if not peek(s).isdigit(): > raise > i = consume(s) > u = parse_unit(s) > if u: > return (i, u) > return i > > def parse_unit(s): > 'unit := "a"  "b"  "c"  empty' > if peek(s) not in 'abc': > return None > return consume(s) > > def peek(s): > return s[0] > > def expect(s, t): > c = consume(s) > if c != t: > raise 'Expected %s but got %s' % (t, c) > return > > def consume(s): > return s.pop(0) > > main() > 
From: Ralph Corderoy <ralph@in...>  20060913 12:23:32

Hi Paolo, > Ralph Corderoy wrote: > > Yes, but I don't think the R.D. parser below, written by hand, not with > > pyparsing, is leftrecursive. It can just support different > > associativity depending on how the parse tree is built. > > It is not leftrecursive, of course. > In your example you basically build the tree in the correct way > instead of rearranging it. But please note that in my example I could > not add a parse action to ZeroOrMore but to expr. I agree. Given expr << term + ZeroOrMore(plusop + term) we can't use Group() because the leftmost (term+term) isn't accessible. > a new token as the ones that I suggested could maybe provide a solution > (similar to your parse_term or parse_expr in how the tree is built). > ... > If you think that could be a solution, I could try to provide an > implementation... That's OK. The use of parseAction() that you showed me is enough to give me the idea. I guess I'd like to write number = Words(digit) factor = Forward() factor << RightBinOpSeq(number, expop, factor) unary = OptRightUnaryOpSeq(signop, factor) term = unary ^ LeftBinOpSeq(unary, multop, unary) expr = term ^ LeftBinOpSeq(term, plusop, term) statement = expr + semicolon and have the /(Opt)?(LeftRightNon)(UnaryBin)OpSeq/ routines do the Grouping implicitly so 1+2*3^4*5++6 yeilds the tree ( , +, ) (1, +, ) (, ) ( , *, 5) (+, ) (2, *, ) (, 6) (, ) (3, ^, 4) Although, something like expop = Literal('^') signop = OneOf('+ ') multop = Literal('*') plusop = Literal('+') expr = Forward() term = Forward() unary = Forward() factor = Forward() expr = OperatorGrammar( \ expop, 2, 'R', signop, 1, 'R', multop, 2, 'L', plusop, 2, 'L', ''') also seems attractive, and simpler to specify. Cheers, Ralph. 
From: Paul McGuire <paul@al...>  20060913 13:39:23

This discussion from c.l.py (http://www.velocityreviews.com/forums/t335296basictokenizer.html  about 2 years ago) might also shed some light on this topic, especially the comments (and posted code) from Andrea Griffini. He pointed out the '^' leftassociativity bug in fourFn.py which I was able to fix in a subsequent release. I like the graphical rending of the parse tree. :)  Paul > Original Message > From: pyparsingusersbounces@... > [mailto:pyparsingusersbounces@...] On > Behalf Of Ralph Corderoy > Sent: Wednesday, September 13, 2006 7:23 AM > To: Paolo Losi > Cc: pyparsingusers@... > Subject: Re: [Pyparsing] Operator Precedence and Associativity. > > > Hi Paolo, > > > Ralph Corderoy wrote: > > > Yes, but I don't think the R.D. parser below, written by > hand, not with > > > pyparsing, is leftrecursive. It can just support different > > > associativity depending on how the parse tree is built. > > > > It is not leftrecursive, of course. > > In your example you basically build the tree in the correct way > > instead of rearranging it. But please note that in my > example I could > > not add a parse action to ZeroOrMore but to expr. > > I agree. Given > > expr << term + ZeroOrMore(plusop + term) > > we can't use Group() because the leftmost (term+term) isn't > accessible. > > > a new token as the ones that I suggested could maybe > provide a solution > > (similar to your parse_term or parse_expr in how the tree is built). > > ... > > If you think that could be a solution, I could try to provide an > > implementation... > > That's OK. The use of parseAction() that you showed me is enough to > give me the idea. > > I guess I'd like to write > > number = Words(digit) > factor = Forward() > factor << RightBinOpSeq(number, expop, factor) > unary = OptRightUnaryOpSeq(signop, factor) > term = unary ^ LeftBinOpSeq(unary, multop, unary) > expr = term ^ LeftBinOpSeq(term, plusop, term) > statement = expr + semicolon > > and have the /(Opt)?(LeftRightNon)(UnaryBin)OpSeq/ routines do the > Grouping implicitly so > > 1+2*3^4*5++6 > > yeilds the tree > > ( , +, ) > (1, +, ) (, ) > ( , *, 5) (+, ) > (2, *, ) (, 6) > (, ) > (3, ^, 4) > > Although, something like > > expop = Literal('^') > signop = OneOf('+ ') > multop = Literal('*') > plusop = Literal('+') > > expr = Forward() > term = Forward() > unary = Forward() > factor = Forward() > > expr = OperatorGrammar( \ > expop, 2, 'R', > signop, 1, 'R', > multop, 2, 'L', > plusop, 2, 'L', > ''') > > also seems attractive, and simpler to specify. > > Cheers, > > > Ralph. > > > >  >  > Using Tomcat but need to do more? Need to support web > services, security? > Get stuff done quickly with preintegrated technology to make > your job easier > Download IBM WebSphere Application Server v.1.0.1 based on > Apache Geronimo > http://sel.asus.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&; > dat=121642 > _______________________________________________ > Pyparsingusers mailing list > Pyparsingusers@... > https://lists.sourceforge.net/lists/listinfo/pyparsingusers > > 
From: Ralph Corderoy <ralph@in...>  20060915 11:00:36

Hi Paul, > This discussion from c.l.py > (http://www.velocityreviews.com/forums/t335296basictokenizer.html  > about 2 years ago) might also shed some light on this topic, > especially the comments (and posted code) from Andrea Griffini. He > pointed out the '^' leftassociativity bug in fourFn.py which I was > able to fix in a subsequent release. Thanks, I had already found that, and you're right, it helped concoct the right pyparsing grammar, as in fourFn.py. The issue is I don't want [1, '+', 2, '+', 3] returned, but [[1, '+', 2], '+', 3] which doesn't seem possible without postparsing manipulation. That is, the `classic' parse tree of an infix algebraic expression isn't achievable. > I like the graphical rending of the parse tree. :) Yes, it just occurred to me when I was sitting there with the text all on one line. Not hard to do, either, with vim's CtrlY in insert mode. Cheers, Ralph. 
From: Paul McGuire <paul@al...>  20061010 22:03:18

> Original Message > From: pyparsingusersbounces@... > [mailto:pyparsingusersbounces@...] On > Behalf Of Ralph Corderoy > Sent: Wednesday, September 13, 2006 7:23 AM > To: Paolo Losi > Cc: pyparsingusers@... > Subject: Re: [Pyparsing] Operator Precedence and Associativity. > > > Hi Paolo, > > > Ralph Corderoy wrote: <snip> > > I guess I'd like to write > > number = Words(digit) > factor = Forward() > factor << RightBinOpSeq(number, expop, factor) > unary = OptRightUnaryOpSeq(signop, factor) > term = unary ^ LeftBinOpSeq(unary, multop, unary) > expr = term ^ LeftBinOpSeq(term, plusop, term) > statement = expr + semicolon > > and have the /(Opt)?(LeftRightNon)(UnaryBin)OpSeq/ > routines do the Grouping implicitly so > > 1+2*3^4*5++6 > > yeilds the tree > > ( , +, ) > (1, +, ) (, ) > ( , *, 5) (+, ) > (2, *, ) (, 6) > (, ) > (3, ^, 4) > > Although, something like > > expop = Literal('^') > signop = OneOf('+ ') > multop = Literal('*') > plusop = Literal('+') > > expr = Forward() > term = Forward() > unary = Forward() > factor = Forward() > > expr = OperatorGrammar( \ > expop, 2, 'R', > signop, 1, 'R', > multop, 2, 'L', > plusop, 2, 'L', > ''') > > also seems attractive, and simpler to specify. > Well, why not?! Check this out!  Paul from pyparsing import * def operatorGrammar( baseExpr, opList ): ret = Forward() lastExpr = baseExpr  ( Suppress('(') + ret + Suppress(')') ) i = 0 for opExpr,arity,rightLeftAssoc in opList: thisExpr = Forward() thisExpr.setName("expr%d" % i)#.setDebug() if rightLeftAssoc == 'L': if arity == 1: thisExpr << ( Group( lastExpr + opExpr )  lastExpr ) else: thisExpr << ( Group( lastExpr + OneOrMore( opExpr + lastExpr ) )  lastExpr ) else: if arity == 1: # try to avoid LR with this extra test if isinstance(opExpr, Optional): thisExpr << (( FollowedBy(opExpr.expr + thisExpr) + Group( opExpr + thisExpr ))  lastExpr ) else: thisExpr << ( opExpr + thisExpr ) else: thisExpr << ( Group( lastExpr + OneOrMore( opExpr + thisExpr ) )  lastExpr ) lastExpr = thisExpr thisExpr = Forward() i += 1 ret << lastExpr return ret number = Word(nums).setParseAction(lambda t:int(t[0])) variable = Word(alphas,exact=1) operand = number  variable expop = Literal('^') signop = Optional( oneOf('+ ') ) multop = oneOf('* /') plusop = oneOf('+ ') factop = Literal('!') expr = operatorGrammar( operand, [(factop, 1, 'L'), (expop, 2, 'R'), (signop, 1, 'R'), (multop, 2, 'L'), (plusop, 2, 'L'),] ) test = ["9 + 2 * 3", "(9 + 2) * 3", "(9 + 2) * 3", "(9 + 2) * 3^2^2", "(9! + 2) * 3^2^2", "M*X + B", "1+2*3^4*5++6",] for t in test: print t print expr.parseString(t) print Gives the following: 9 + 2 * 3 [[9, '+', [2, '*', 3]]] (9 + 2) * 3 [[[9, '+', 2], '*', 3]] (9 + 2) * 3 [[[9, '+', ['', 2]], '*', 3]] (9 + 2) * 3^2^2 [[[9, '+', ['', 2]], '*', [3, '^', [2, '^', 2]]]] (9! + 2) * 3^2^2 [[[[9, '!'], '+', ['', 2]], '*', [3, '^', [2, '^', 2]]]] M*X + B [[['M', '*', 'X'], '+', 'B']] 1+2*3^4*5++6 [[1, '+', [2, '*', ['', [3, '^', 4]], '*', 5], '+', ['', ['+', ['', 6]]]]] 
From: Ralph Corderoy <ralph@in...>  20061011 12:09:38

Hi Paul, > Well, why not?! Check this out! > > def operatorGrammar( baseExpr, opList ): > ret = Forward() > lastExpr = baseExpr  ( Suppress('(') + ret + Suppress(')') ) > ... That's impressive and exactly what I was after. You may want to consider formalising it as part of pyparsing since recursive descent parsers often use operator precedence parsers for that part of things. Thanks! Ralph. 