Re: [Pyparsing] How to distinguish a variable from a integer
From: Paul M. <pt...@au...> - 2009-05-14 17:47:33
> -----Original Message-----
> From: Gustavo Narea [mailto:me...@gu...]
> Sent: Thursday, May 14, 2009 12:02 PM
> To: pyp...@li...
> Subject: [Pyparsing] How to distinguish a variable from a integer
>
> Hello, everybody.
>
> First of all, I wanted to thank you for this awesome package. I'm having
> fun with it. :)

Well, well, my friend, so we meet again! I'm pleased to see you have been bitten by the pyparsing bug. :)

> How can I fix this?
>

In general, I think this is why variable names in most computing languages I know do *not* permit the name to begin with a number. But you are the language designer, so I will show you how to do this in pyparsing.

Two suggestions, not sure if I have a preference:

1. Use "operand = number ^ variable" instead of "operand = number | variable". '|' returns MatchFirst expressions, which return, well, the first matching expression. '^' returns Or expressions, which return the *longest* match of all the alternative expressions. Think of the '^' as a little set of dividers, measuring the returned values of all the expressions, and picking the longest. '^' is not a cure-all, though, and can cause infinite run-time recursion in self-referencing grammars (those that include operatorPrecedence or Forward expressions).

2. As you say, invert operand to "operand = variable | number", and then attach a parse action to variable that first tries to evaluate the result as a number. In your current parser, you may eventually attach a parse action to number, something like this:

    number.setParseAction(lambda tokens: float(tokens[0]))

so that at post-parse time, the returned string has already been converted to a float. So instead, attach something like this to variable (untested):

    def numOrVar(tokens):
        try:
            return float(tokens[0])
        except ValueError:
            # not a number: returning None leaves the token unchanged
            pass
    variable.setParseAction(numOrVar)

Now you don't even need the alternation, since as you observed, variable will also match "22", so just define "operand = variable".

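The two suggestions above can be sketched as follows. This is only an illustration: the original poster's grammar is not shown in this message, so the definitions of number and variable below are assumed stand-ins, and num_or_var and operand are hypothetical names.

```python
from pyparsing import Word, alphanums, nums

# Assumed stand-ins for the original poster's grammar (not shown in the thread).
number = Word(nums)
variable = Word(alphanums + "_")

# Suggestion 1: '|' (MatchFirst) stops at the first alternative that matches,
# while '^' (Or) tries every alternative and keeps the longest match.
first_match = number | variable
longest_match = number ^ variable

first_result = first_match.parseString("22abc").asList()      # number matches "22"
longest_result = longest_match.parseString("22abc").asList()  # variable matches "22abc"


# Suggestion 2: match everything as a variable, then convert when possible.
def num_or_var(tokens):
    try:
        return float(tokens[0])
    except ValueError:
        # Returning None leaves the matched token unchanged.
        return None


# copy() keeps the parse action off the `variable` object used above.
operand = variable.copy().setParseAction(num_or_var)

converted = operand.parseString("22")[0]    # float
untouched = operand.parseString("x_1")[0]   # still a string

print(first_result, longest_result, converted, untouched)
```

One caveat with the second approach: float() also accepts strings such as "inf", "nan" and "1e5", so variables with those names would be converted to floats as well.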
You could also try this for defining variable:

    variable = Word(unicode(alphanums+'_'))

or

    variable = Word(unicode(alphanums+alphas8bit+'_'))

or to absolutely cover all bases (for 2-byte Unicode, anyway):

    allUnicodeAlphas = u''.join(c for c in map(unichr,range(65536)) if c.isalpha())
    allUnicodeNums = u''.join(c for c in map(unichr,range(65536)) if c.isdigit())
    variable = Word(allUnicodeAlphas + allUnicodeNums + u'_')

(It's surprising how many Unicode digits there are besides '0'-'9'.)

BTW, this definition of decimals:

    decimals = Optional(decimal_sep + OneOrMore(Word(nums)))

includes some unnecessary repetition. It should be sufficient to write:

    decimals = Optional(decimal_sep + Word(nums))

Unless I misunderstood your intent here.

So, will we see some pyparsing sneak into a repoze package one of these days, perhaps some sort of authorization rights syntax, hmmm?

Buena suerte, y mucho gusto!

-- Paul
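The Unicode character sets in the message are written for Python 2 (unichr, unicode, u'' literals). A rough Python 3 equivalent might look like the sketch below, which uses only the standard library; the snake_case names are hypothetical, and the resulting variable_chars string would be passed to Word(...) just as in the message.

```python
# Python 3 strings are already Unicode, so chr() replaces unichr() and the
# u'' prefix is unnecessary. 65536 keeps the message's "2-byte" limit.
BMP_LIMIT = 65536

all_unicode_alphas = "".join(c for c in map(chr, range(BMP_LIMIT)) if c.isalpha())
all_unicode_nums = "".join(c for c in map(chr, range(BMP_LIMIT)) if c.isdigit())

# Characters allowed in a variable name under this definition.
variable_chars = all_unicode_alphas + all_unicode_nums + "_"

# Digits other than ASCII '0'-'9', e.g. ARABIC-INDIC DIGIT THREE (U+0663).
non_ascii_digits = [c for c in all_unicode_nums if c not in "0123456789"]
print(len(non_ascii_digits), "non-ASCII digits below U+10000")
```

The exact count printed depends on the Unicode version bundled with the interpreter, so no figure is given here.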