Re: [Pyparsing] Better way than operatorPrecedence to parse a regexp-like grammar?
From: Paul M. <pt...@au...> - 2009-02-26 14:58:28
You should start by trying to tune up your definition of term, since this
expression gets used a *lot* internally by the operatorPrecedence code. Here
are some comments/questions/suggestions on cleaning up term:

1. term includes 2 references to numericRange, why?

2. variable() returns an expression that is a MatchFirst of 10 different
   characters. Just have this method return Regex("[0-9#]"); it will evaluate
   much faster.

3. numericRange tests for signedNumbers, and then for unsigned numbers. But
   your unsigned numbers would already match the signedNumber expression, so
   the second alternative will never match. Also, signedNumber the way you
   have defined it would match a single "-" character, which is probably not
   desired. Try this:

    signedNumber = Optional('-') + Word(nums)   # or even just Regex(r"-?\d+")
    numericRange = ( (lbrack + Literal('#') +
                      (signedNumber | '*').setResultsName('min') +
                      Suppress(':') +
                      (signedNumber | '*').setResultsName('max') +
                      rbrack) )

   (I also removed the Combine - you might want Group instead.)

4. I streamlined repetition a bit from:

    repetition = (
        ( plus + lbrace + Word(nums).setResultsName("count") + rbrace ) |
        ( plus + lbrace + Word(nums).setResultsName("minCount") + "," +
          Word(nums).setResultsName("maxCount") + rbrace ) |
        plus
        )

   to:

    repetition = plus + Optional( lbrace +
        ( ( Word(nums).setResultsName("minCount") + "," +
            Word(nums).setResultsName("maxCount") ) |
          Word(nums).setResultsName("count") ) +
        rbrace )

   Which could look a little nicer as:

    repetition = plus + Optional( lbrace +
        ( ( Word(nums)("minCount") + "," + Word(nums)("maxCount") ) |
          Word(nums)("count") ) +
        rbrace )

   (runs no faster, but I find it a little easier to read). Since repetition
   is the first precedence level, it gets used a lot, so any streamlining here
   helps.

5. space = OneOrMore(White())

   Really? I doubt you are matching any of these at the moment, since you
   aren't taking any steps to disable pyparsing's default behavior of skipping
   whitespace. But since it is the first alternative in the list of
   expressions in term, you are testing for it *many* times.

6. You might be able to reorder the options in term based on their likelihood
   of occurrence in the input text. Since this is a MatchFirst, testing for
   more common options ahead of rarer ones will short-circuit the rest of the
   tests, with a performance win.

You might also define:

    integer = Word(nums)

And then use integer in all your related expressions, instead of repeating
Word(nums) all the time - this will make your code a little easier to read,
and the packratting will be a little more efficient, too.

I also have some comments on operatorPrecedence itself, but I'll wait until
you have gotten term to run a bit better before delving into oP. Just one
note, instead of (in your list of precedence definitions):

    (Empty(), 2, opAssoc.LEFT, self.handleSequence),

Try:

    (None, 2, opAssoc.LEFT, self.handleSequence),

-- Paul
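For anyone who wants to try suggestions 3 and 4 in isolation, here is a minimal, self-contained sketch. The definitions of lbrack, rbrack, lbrace, rbrace and plus are assumptions, since the original grammar is not shown in this message, and parse_range / parse_repetition are hypothetical helper names added only for illustration. It uses the Regex(r"-?\d+") form of signedNumber so that each number comes back as a single token.

```python
from pyparsing import Literal, Optional, ParseException, Regex, Suppress, Word, nums

# Assumed punctuation definitions; the poster's real ones are not in this thread.
lbrack, rbrack = Suppress("["), Suppress("]")
lbrace, rbrace = Suppress("{"), Suppress("}")
plus = Literal("+")

# Shared integer expression, reused instead of repeating Word(nums).
integer = Word(nums)

# One expression covers both signed and unsigned numbers,
# and it cannot match a bare "-" because at least one digit is required.
signedNumber = Regex(r"-?\d+")

numericRange = (lbrack + Literal("#") +
                (signedNumber | "*")("min") +
                Suppress(":") +
                (signedNumber | "*")("max") +
                rbrack)

# The longer min,max form is tried before the single-count form.
repetition = plus + Optional(
    lbrace +
    ((integer("minCount") + "," + integer("maxCount")) | integer("count")) +
    rbrace)


def parse_range(text):
    """Hypothetical helper: return the (min, max) strings of a numeric range."""
    result = numericRange.parseString(text, parseAll=True)
    return result["min"], result["max"]


def parse_repetition(text):
    """Hypothetical helper: return the named counts found, as a plain dict."""
    result = repetition.parseString(text, parseAll=True)
    names = ("count", "minCount", "maxCount")
    return {name: result[name] for name in names if name in result}


def accepts_signed_number(text):
    """Return True if text is, in its entirety, a valid signedNumber."""
    try:
        signedNumber.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


if __name__ == "__main__":
    print(parse_range("[#-5:10]"))
    print(parse_range("[#*:3]"))
    print(parse_repetition("+{2,5}"))
    print(parse_repetition("+{3}"))
    print(parse_repetition("+"))
    print(accepts_signed_number("-"))
```

Note that the named results come back as strings; converting them to int (for example with a parse action) is left out to keep the sketch close to the expressions above.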