Hi,
I'm trying to get a parser working for parsing Project Haystack ZINC format grids, but am having a problem with the parser incorrectly matching invalid strings.
In the example below, I'm trying to match a quantity (a numeric value with a unit). Below is a basic test case with pyparsing 2.2.0, and Python 2.7. I can also reproduce it in pyparsing 2.0.3 as shipped in Debian Jessie.
    import pyparsing as pp

    class Quantity(object):
        def __init__(self, value, unit):
            self.value = value
            self.unit = unit

        def __repr__(self):
            return 'Q(%r, %r)' % (self.value, self.unit)

    hs_unit = pp.Regex(ur"[a-zA-Z%_/$\x80-\xffffffff]+")
    hs_decimal = pp.Regex(r"-?[\d_]+(\.[\d_]+)?([eE][+\-]?[\d_]+)?").setParseAction(
        lambda toks : [float(toks[0].replace('_',''))])
    hs_quantity = (hs_decimal + hs_unit).setParseAction(
        lambda toks : [Quantity(toks[0], unit=toks[1])])

    hs_quantity.parseString('123.123 abc')
    # -> ([Q(123.123, 'abc')], {})
    hs_quantity.parseString('123.123 abc', parseAll=True)
    # -> ([Q(123.123, 'abc')], {})
Note that neither regex permits spaces anywhere. hs_decimal allows a leading hyphen, digits, underscores, decimal points and the letter e (either case), but not spaces. Likewise, hs_unit allows letters, some punctuation, and Unicode code points from 0x80 upward, but not spaces (0x20). hs_quantity is literally an hs_decimal followed immediately (no space) by an hs_unit, so there should be no space. 123.123 abc should not match this pattern, ever.
Okay, I've now determined this is down to how pyparsing behaves: by default it skips (and discards) any whitespace preceding each element.

The fix was somewhat ugly, but in essence I had to wrap each pyparsing object with a wrapper that calls leaveWhitespace before returning it. Since it's nearly impossible to know where whitespace skipping is being applied, I've resorted to doing it on all pyparsing objects I use so I don't get caught out. The result is visually messy, but works.

I think this "feature" could be better described; a switch that can turn it on or off on an instance-wide basis would be useful too.
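
For anyone hitting the same thing, here is a rough sketch of the kind of wrapper described above. The helper name no_ws is made up for illustration and is not part of pyparsing, and the two regexes are simplified stand-ins for hs_decimal and hs_unit with the parse actions left out.

```python
import pyparsing as pp


def no_ws(element):
    """Hypothetical helper (not part of pyparsing): turn off whitespace skipping."""
    # leaveWhitespace() modifies the element in place and returns it.
    return element.leaveWhitespace()


# Simplified stand-ins for hs_decimal and hs_unit, without parse actions.
number = no_ws(pp.Regex(r"-?\d+(\.\d+)?"))
unit = no_ws(pp.Regex(r"[a-zA-Z%_/$]+"))
quantity = no_ws(number + unit)


def matches(text):
    """Return True if the whole of text parses as a quantity."""
    try:
        quantity.parseString(text, parseAll=True)
        return True
    except pp.ParseException:
        return False


print(matches("123.123abc"))   # True
print(matches("123.123 abc"))  # False
```

As far as I can tell, calling leaveWhitespace on a compound expression such as number + unit also applies it to the expressions it contains, so wrapping only the outermost expression may be enough in many cases.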
If you want to completely disable whitespace skipping, call ParserElement.setDefaultWhitespaceChars("") right after importing pyparsing. But whitespace skipping is a basic feature (no air quotes necessary) of pyparsing, and helps avoid having to sprinkle \s* all over your regexes.

Closing this as "works as designed".
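
A minimal sketch of the global switch suggested in this reply, using simplified stand-ins for the hs_decimal and hs_unit regexes from the original report (parse actions omitted). Note that this changes a process-wide default, so it affects every parser built afterwards in the same program.

```python
import pyparsing as pp

# Must run before your parser elements are created; elements you built
# earlier keep the whitespace characters they were constructed with.
pp.ParserElement.setDefaultWhitespaceChars("")

number = pp.Regex(r"-?\d+(\.\d+)?")
unit = pp.Regex(r"[a-zA-Z%_/$]+")
quantity = number + unit

print(quantity.parseString("123.123abc").asList())  # ['123.123', 'abc']

try:
    quantity.parseString("123.123 abc")
    space_rejected = False
except pp.ParseException:
    space_rejected = True
print(space_rejected)  # True
```

With the default set to an empty string, any place where whitespace is legitimately allowed in the grammar has to be matched explicitly.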