[Pyparsing] Word and Regex matching more than they should
From: Stuart L. <st...@vr...> - 2018-01-19 06:36:15
Hi all,

I've got a funny issue trying to get pyparsing to parse a grammar for Project Haystack data. The data format I'm trying to parse is described here:

https://www.project-haystack.org/doc/Zinc

I'm slowly working my way up the parse tree, but pyparsing keeps tripping up on my grammar definitions. For the purpose of discussion, I've posted my grammar here:

https://github.com/vrtsystems/hszinc/blob/feature/WC-1173-add-list-support/hszinc/grammar.py

I'll admit up front that I am new to pyparsing. Previously I used Parsimonious, but couldn't quite get it to handle the recursive nature of Project Haystack data types; in particular, I had trouble making it parse a filter string. A proof of concept in pyparsing worked, so I'm now trying to get a more complete grammar working so that I can parse the data coming back from Project Haystack.

I'm finding, though, that some of my patterns capture more than I anticipated. Take a quantity, for instance: a quantity is defined as a decimal number followed by a unit string. The unit string may consist of letters, the symbols %, _, $ and /, or Unicode code points 128 or above. Crucially, it may not contain a space. Yet if I pass one in, it still matches:

> stuartl@vk4msl-ws ~/vrt/projects/widesky/sdk/hszinc $ ipython2
> Python 2.7.14 (default, Jan 17 2018, 17:36:45)
> Type "copyright", "credits" or "license" for more information.
>
> IPython 5.4.1 -- An enhanced Interactive Python.
> ?         -> Introduction and overview of IPython's features.
> %quickref -> Quick reference.
> help      -> Python's own help system.
> object?   -> Details about 'object', use 'object??' for extra details.
>
> In [1]: from hszinc import grammar
>
> In [2]: grammar.hs_quantity.parseString('123.45 notpartofquantity')
> Out[2]: ([BasicQuantity(123.45, 'notpartofquantity')], {})

That has taken ' notpartofquantity' and included it in the raw data for the quantity. It should ignore that because of the space separation.
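To make the intended behaviour concrete, here is a minimal sketch of the quantity rule as described above, written with Python's standard `re` module rather than pyparsing so that it runs on its own. The names `QUANTITY_RE` and `parse_quantity` are hypothetical, and the number part is deliberately simplified (optional minus sign, digits, optional fraction); Zinc's real number syntax has more cases than this.

```python
import re

# Hypothetical sketch of the quantity rule described in the post:
# a decimal number immediately followed by a unit made of letters,
# '%', '_', '$', '/', or any code point >= 128 (U+0080 and up).
# No whitespace is allowed between the number and the unit, or inside
# the unit. The number pattern is simplified (no exponents, INF, NaN, etc.).
QUANTITY_RE = re.compile(
    r"(?P<number>-?\d+(?:\.\d+)?)"
    r"(?P<unit>[A-Za-z%_$/\u0080-\U0010FFFF]*)"
)


def parse_quantity(text):
    """Return (number, unit, rest) for a quantity at the start of text.

    Only characters belonging to the quantity are consumed; anything from
    the first character that cannot be part of the unit onwards (such as a
    space) is returned unparsed in `rest`. `unit` is None if there is none.
    """
    m = QUANTITY_RE.match(text)
    if m is None:
        raise ValueError("not a quantity: %r" % text)
    unit = m.group("unit") or None
    return float(m.group("number")), unit, text[m.end():]


if __name__ == "__main__":
    for sample in ("123.45 notpartofquantity", "42kW/h", "72\u00b0F", "5% rest"):
        print(sample, "->", parse_quantity(sample))
```

With this sketch, `parse_quantity('123.45 notpartofquantity')` returns `(123.45, None, ' notpartofquantity')`: the unit stops at the space and the trailing word is left for whatever comes next. One difference worth keeping in mind: pyparsing skips leading whitespace before each element by default, so a unit element that follows the number in a `number + unit` sequence can start matching after a space. Wrapping the pair in `Combine` (which requires its contained tokens to be adjacent) or calling `leaveWhitespace()` on the unit element are the usual ways to forbid that, though whether either fits this particular grammar is an assumption.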
This breaks hs_meta, which is supposed to parse metadata pairs and markers, e.g.:

aString:"testing" aNumber:123.45 aMarker

Any ideas where I might be going wrong?

Thanks in advance.

Regards,
-- 
Stuart Longland - Systems Engineer
VRT Systems
38b Douglas Street, Milton QLD 4064
T: +61 7 3535 9619
F: +61 7 3535 9699
http://www.vrt.com.au