Re: [Pyparsing] How to distinguish a variable from a integer
From: spir <den...@fr...> - 2009-05-14 17:22:08
On Thu, 14 May 2009 19:01:42 +0200, Gustavo Narea <me...@gu...> wrote:

> Hello, everybody.
>
> First of all, I wanted to thank you for this awesome package. I'm having
> fun with it. :)
>
> I've read O'Reilly's shortcut on Pyparsing, but still I can't find an
> answer to this:
>
> One of the components of the grammar I'm defining is an operand. Operands
> can be a number or a variable. A variable is a string made up of word
> characters (in any language), numbers (in any language/culture) and/or a
> spacing character (underscores by default).
>
> I'm using the following:
> """
> import re
> from pyparsing import *
>
> # Defining the numbers:
> decimal_sep = Literal(".")
> decimals = Optional(decimal_sep + OneOrMore(Word(nums)))
> number = Combine(Word(nums) + decimals)
>
> # Defining the variables:
> variable = Regex("[\w\d_]+", re.UNICODE)
>
> # Finally, let's define the operand:
> operand = number | variable
> """
>
> The operand above works perfectly with the following expressions:
>   hello -> variable
>   23 -> number
>   hello_world -> variable
>
> But it doesn't support variables which begin with a number (e.g.,
> "1st_variable"). I get the following exception all the time:
> >>>> from varnums import *
> >>>> operand.parseString("1st_variable")
> >(['1'], {})
> >>>> operand.parseString("1st_variable", parseAll=True)
> >Traceback (most recent call last):
> >  File "<stdin>", line 1, in <module>
> >  File "/home/gustavo/System/pyenvs/booleano/lib/python2.6/site-packages/pyparsing-1.5.2-py2.6.egg/pyparsing.py", line 1076, in parseString
> >    raise exc
> >pyparsing.ParseException: Expected end of text (at char 1), (line:1, col:2)
>
> I know I can invert the definition of the operand (i.e., "operand =
> variable | number"), but then strings like "22" will be matched as
> variables (not numbers).
>
> How can I fix this?
>
> Thanks in advance.

The issue is that your variables can start like a number (which is why, in most programming languages, variable names cannot start with a digit). So:
* with (number | variable), number masks variable;
* with the opposite order, number is eaten by variable.

You should use the common pattern for a variable, requiring a letter or '_' at the start:

    variable = Regex(r"[a-zA-Z_]\w*", re.UNICODE)

(untested with Unicode; also beware that \w includes digits and '_'), so that number and variable are mutually exclusive.

You could also add a negative lookahead, !(letter | '_'), right after the definition of number, but then your definition of variable is unclear: it would have to require that a variable contains at least one non-digit character, which is not easy ;-)

Denis
------
la vita e estrany
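
For illustration, here is a minimal sketch of the two options described in the reply. It uses the standard-library re module instead of pyparsing so that it runs without dependencies. The names parse_operand, ORIGINAL, STRICT and LOOKAHEAD are hypothetical. The helper only imitates how "number | variable" with parseAll=True takes the first alternative that matches and then requires the whole input to be consumed. The [^\W\d] class is a swapped-in, Unicode-aware variant of [a-zA-Z_], and the (?!\w) guard is one assumed reading of the lookahead idea, not the poster's tested code.

```python
import re

# Number: digits with an optional decimal part, as in the quoted grammar.
NUMBER = re.compile(r"\d+(?:\.\d+)?")
# Original variable: any run of word characters ([\w\d_]+ equals \w+).
LOOSE_VARIABLE = re.compile(r"\w+")

# Option 1: a variable must start with a letter or "_".
# [^\W\d] means "a word character that is not a digit" (assumed
# Unicode-aware stand-in for [a-zA-Z_]; Python 3 str patterns are
# Unicode by default).
STRICT_VARIABLE = re.compile(r"[^\W\d]\w*")

# Option 2: keep the loose variable, but a number must not be directly
# followed by a word character (negative lookahead).
GUARDED_NUMBER = re.compile(r"\d+(?:\.\d+)?(?!\w)")

ORIGINAL = [("number", NUMBER), ("variable", LOOSE_VARIABLE)]
STRICT = [("number", NUMBER), ("variable", STRICT_VARIABLE)]
LOOKAHEAD = [("number", GUARDED_NUMBER), ("variable", LOOSE_VARIABLE)]


def parse_operand(text, alternatives):
    """Return (kind, matched_text) for the first alternative that matches.

    Like a first-match alternation, the first alternative matching a
    prefix wins; it must then consume the whole input, otherwise a
    ValueError is raised.
    """
    for kind, pattern in alternatives:
        match = pattern.match(text)
        if match is None:
            continue
        if match.end() != len(text):
            raise ValueError("expected end of text at char %d" % match.end())
        return kind, match.group()
    raise ValueError("no alternative matched %r" % text)


if __name__ == "__main__":
    for sample in ["hello", "23", "hello_world", "1st_variable"]:
        for label, grammar in [("strict", STRICT), ("lookahead", LOOKAHEAD)]:
            try:
                print(label, sample, parse_operand(sample, grammar))
            except ValueError as err:
                print(label, sample, "error:", err)
```

With option 1, "1st_variable" is rejected outright, since number matches only the leading "1". With option 2 it is accepted as a variable, while "22" is still a number because number is tried first. The same pattern strings should carry over to pyparsing's Regex, but that has not been tested here.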