[Pyparsing] fan mail + changes suggestions for operatorPrecedence and Or
From: Andrea C. <an...@cd...> - 2010-12-26 05:25:38
Hi,

I found PyParsing really easy to work with. Here is what I built on top of it:

    http://andreacensi.github.com/contracts/

In the meantime, I toyed around with it and changed a few things. Perhaps some of this is helpful. The goal was to get better error messages, i.e. messages that point "closer" to the actual error (I hope you know what I mean). The following are the bits that I think are useful.

1) operatorPrecedence, modification 1

Around line 3579, there is:

    elif arity == 2:
        if opExpr is not None:
            matchExpr = FollowedBy(lastExpr + opExpr + lastExpr) + Group( lastExpr + OneOrMore( opExpr + lastExpr ) )

This seems wasteful (the lookahead parses the same expression twice) and does not use "-" where it should. I changed it to:

    elif arity == 2:
        if opExpr is not None:
            matchExpr = Group(lastExpr + FollowedBy(opExpr) + OneOrMore(opExpr - lastExpr))

This way, once opExpr has matched, the parser is committed and advances past it, so an error in the following operand is reported after the operator instead of being lost to backtracking. I think this is the right semantics for 99% of the cases. The exception is when the user is overloading the opExprs.

2) operatorPrecedence, modification 2

At the beginning, you have:

    lastExpr = baseExpr | ( Suppress('(') + ret + Suppress(')') )

I changed it to:

    opnames = ",".join(str(x) for x in allops)
    parenthesis = Suppress('(') + ret + FollowedBy(NotAny(oneOf(allops))) - Suppress(')')
    lastExpr = parenthesis | baseExpr

Basically, once I have seen an opening parenthesis and parsed a ret, if no operator follows, then the closing parenthesis has to follow.

These two changes together give me much better error messages (view in a fixed-width font):

    line  1 >list(1,2,(tuple(str,a,(?)))
                                     ^
                                     |
                                     here or nearby

    line  1 >1+(3*2?)
                   ^
                   |
                   here or nearby

You can find the whole function here:

    https://github.com/AndreaCensi/contracts/blob/feature/better_error_messages/src/contracts/pyparsing_utils.py

(this is not the main branch)

3) Catching ambiguity in Or()

Because my grammar is context-dependent (meaning that 'x' might be parsed differently according to the context), I had several debugging pains, especially when I was trying to get rid of the Or() in favor of MatchFirst().
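To make the Or() versus MatchFirst() distinction concrete, here is a toy model in plain Python rather than pyparsing itself. The helpers `literal`, `match_first`, and `match_longest` are hypothetical names invented for this sketch. They only mimic the documented behaviour: MatchFirst takes the first alternative that matches, while Or tries every alternative and keeps the longest match. Swapping one for the other can therefore silently change what gets parsed.

```python
def literal(text):
    """Return a matcher that succeeds if `s` contains `text` at position `loc`."""
    def match(s, loc):
        if s.startswith(text, loc):
            return loc + len(text), text  # (new location, parsed value)
        return None
    return match


def match_first(alternatives, s, loc=0):
    """Mimic MatchFirst: return the result of the first alternative that matches."""
    for alt in alternatives:
        result = alt(s, loc)
        if result is not None:
            return result
    return None


def match_longest(alternatives, s, loc=0):
    """Mimic Or: try every alternative and keep the one that consumes the most."""
    best = None
    for alt in alternatives:
        result = alt(s, loc)
        if result is not None and (best is None or result[0] > best[0]):
            best = result
    return best


alts = [literal("in"), literal("int")]
first = match_first(alts, "integer")
longest = match_longest(alts, "integer")
print(first)    # (2, 'in')
print(longest)  # (3, 'int')
```

The order of the alternatives matters only for `match_first`; reordering `alts` to put `"int"` first would make both functions agree.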
What I did was modify Or() so that it checks the following: if two clauses can parse the string while consuming the same number of characters, then they must produce the same ParseResults. (If that's not true, it's a disaster waiting to happen.) This involved adding __eq__ to ParseResults and then changing Or as follows.

Where it says:

    else:
        if loc2 > maxMatchLoc:
            maxMatchLoc = loc2
            maxMatchExp = e

I changed it to:

    else:
        if loc2 > maxMatchLoc:
            maxMatchLoc = loc2
            maxMatchExp = e
        elif loc2 == maxMatchLoc:
            val1 = e._parse(instring, loc, True)
            val2 = maxMatchExp._parse(instring, loc, True)
            if not (val1 == val2):
                msg = ('Ambiguous syntax, I could match both (and maybe more):\n- %s\n- %s\n.'
                       % (get_desc(e), get_desc(maxMatchExp)))
                msg += 'Their values are: \n'
                msg += '- {0!r}\n'.format(val1)
                msg += '- {0!r}\n'.format(val2)
                raise ParseFatalException(instring, loc, msg, self)

You can see this in:

    https://github.com/AndreaCensi/contracts/blob/feature/better_error_messages/src/contracts/mypyparsing.py#L2546

(Ignore the other stuff I changed in mypyparsing; I was "experimenting" to understand that business of Fatal vs non-Fatal exceptions.)

Best,
Andrea

--
Andrea Censi
PhD student, Control & Dynamical Systems, Caltech
http://www.cds.caltech.edu/~andrea/
"Life is far too important to be taken seriously" (Oscar Wilde)
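The tie check described in item 3 can be sketched outside pyparsing as a toy longest-match alternation. Everything here is hypothetical and invented for illustration (`AmbiguousMatch`, `keyword`, `match_longest_checked`); it is not the patched Or() itself, only the same idea: when two alternatives consume the same number of characters, their values must be equal, otherwise an error is raised.

```python
class AmbiguousMatch(Exception):
    """Raised when two alternatives tie in length but disagree on the value."""


def keyword(text, value):
    """Return a matcher that yields `value` if `s` contains `text` at `loc`."""
    def match(s, loc):
        if s.startswith(text, loc):
            return loc + len(text), value  # (new location, parsed value)
        return None
    return match


def match_longest_checked(alternatives, s, loc=0):
    """Longest-match alternation that refuses equal-length, unequal-value ties."""
    best = None
    for alt in alternatives:
        result = alt(s, loc)
        if result is None:
            continue
        if best is None or result[0] > best[0]:
            best = result
        elif result[0] == best[0] and result[1] != best[1]:
            # Same number of characters consumed, different parse values.
            raise AmbiguousMatch(
                "Ambiguous syntax at %d: %r vs %r" % (loc, result[1], best[1])
            )
    return best


as_name = keyword("x", ("name", "x"))
as_variable = keyword("x", ("variable", "x"))

# Two alternatives that agree on the value are harmless.
same = match_longest_checked([as_name, as_name], "x")

# Two alternatives that tie in length but disagree are reported.
try:
    match_longest_checked([as_name, as_variable], "x")
    ambiguous = False
except AmbiguousMatch:
    ambiguous = True
```

Like the change in the message, the sketch compares a tying alternative only against the current best candidate at the moment of the tie.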