Re: [Pyparsing] Word and Regex matching more than they should

Your sample code was right in front of me!

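import pyparsing as pp

class Quantity(object):
    """Simple holder for a parsed value/unit pair."""
    def __init__(self, value, unit):
        self.value = value
        self.unit = unit
    def __repr__(self):
        return 'Q(%r, %r)' % (self.value, self.unit)

# range built from sys.maxunicode (did not work, left here for reference)
# hs_unit         = pp.Regex(r"[a-zA-Z%_/$\x80-\x{:x}]+".format(sys.maxunicode))
hs_unit         = pp.Regex(r"[a-zA-Z%_/$\x80-\xffffff]+").setName("unit-string")
# strip the '_' spacers before converting the matched text to a float
hs_decimal      = pp.Regex(r"-?[\d_]+(\.[\d_]+)?([eE][+\-]?[\d_]+)?").setParseAction(
                lambda toks : [float(toks[0].replace('_',''))]).setName("decimal-numeric")
# leaveWhitespace() keeps the unit from skipping leading whitespace,
# so the unit has to follow the number directly
hs_quantity     = (hs_decimal("value") + hs_unit.leaveWhitespace()("unit")).setParseAction(
                lambda toks: Quantity(**toks))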

hs_quantity.runTests("""\
123.123abc
123.123 abc
""")

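Oddly enough, I could not specify the Unicode range that you did, nor does building it from sys.maxunicode work. At first this looked like a Python bug, but the likelier cause is that \x in a regex takes exactly two hex digits, so \xffffff is read as \xff followed by literal 'f' characters; code points above 0xff need a \uXXXX or \UXXXXXXXX escape instead. I also see that your unit expression is not quite as liberal as the unicode_printables one that I wrote: it accepts only the '%_/$' punctuation characters. Your decimal expression also accepts '_' spacers, which the pyparsing_common.number expression that I used in the previous reply does not.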
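
Here is a small sketch of what may be going on with the character range. It is not from the original thread and assumes Python 3's re module, where \x takes exactly two hex digits and \U takes eight. The names compiles, narrow, from_maxunicode and wide are made up for illustration.

```python
import re
import sys

def compiles(pattern_text):
    """Return True if pattern_text is a valid regular expression."""
    try:
        re.compile(pattern_text)
        return True
    except re.error:
        return False

# \x consumes exactly two hex digits, so this class is \x80-\xff
# followed by literal 'f' characters.
narrow = re.compile(r"[a-zA-Z%_/$\x80-\xffffff]+")

# Built from sys.maxunicode, the text becomes \x80-\x10ffff, which is
# read as the reversed range \x80-\x10 plus literal 'f' characters.
from_maxunicode = r"[a-zA-Z%_/$\x80-\x{:x}]+".format(sys.maxunicode)

# \U takes eight hex digits, so it can reach the top of the Unicode range.
wide = re.compile("[a-zA-Z%_/$\\x80-\\U{:08x}]+".format(sys.maxunicode))

micro_metre = "\u00b5m"   # U+00B5 lies inside \x80-\xff
kilo_ohm = "k\u03a9"      # U+03A9 lies above \xff

print(compiles(from_maxunicode))
print(narrow.fullmatch(micro_metre) is not None)
print(narrow.fullmatch(kilo_ohm) is not None)
print(wide.fullmatch(kilo_ohm) is not None)
```

Under those assumptions, the sys.maxunicode version fails to compile, the \xffffff version only reaches U+00FF, and the \U version accepts units such as the ohm sign. On Python 2 the same idea would need a unicode string literal rather than a raw byte string.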

I made a few other tweaks to your parser:
- added setName() calls, so that exception messages read more clearly ("expected unit-string" instead of "expected Re:('[a-zA-Z%_/$\\x80-\\xffffff]+')")
- used results names in hs_quantity so that the name-to-expression mapping is clearer (note that setName() names the expression itself, which is what shows up in error messages, while a results name is the key under which that expression's parsed tokens are stored)
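
To make the setName() versus results-name distinction concrete, here is a minimal sketch. The number, unit and quantity expressions are simplified stand-ins rather than the hs_* expressions above, and it assumes a pyparsing version that still provides the camelCase setName() and parseString() methods.

```python
import pyparsing as pp

# setName() labels the expression itself; the label appears in error messages.
number = pp.Word(pp.nums).setName("number")
unit = pp.Word(pp.alphas).setName("unit-string")

# Calling an expression with a string attaches a results name: the key
# under which that expression's tokens are stored in the parse results.
quantity = number("value") + unit("unit")

result = quantity.parseString("42 kg")
value_text = result["value"]
unit_text = result["unit"]

# A failed parse reports the expression name, not the results name.
try:
    quantity.parseString("42 7")
    message = ""
except pp.ParseException as err:
    message = str(err)

print(value_text, unit_text)
print(message)
```

The results names ("value", "unit") only affect how you read tokens back out of the results; the setName() labels only affect how the expressions describe themselves when a parse fails.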

Out of curiosity, why Python2? I would only use Py2 for legacy work at this point, not for new projects.

-- Paul
