Re: [Pyparsing] setResultsName on a recursive element of grammar
From: Elizabeth M. <eli...@in...> - 2016-02-23 11:40:35
On 20/02/16 10:36, Paul McGuire wrote:
> Here is the whole parser in one copy/pasteable chunk:
>
> command = Word(alphas) | Word(nums)
> COLON = Suppress(':')
> middle = ~COLON + Word(printables)
> trailing = Word(printables)
> params = (OneOrMore(middle)("middle") +
>           COLON +
>           ZeroOrMore(trailing)("trailing"))
> line = (command("command") + Group(params)("params"))
>
> tests = """\
> COMMAND param1 param2 : param3"""
> line.runTests(tests)
>
> And no need to kludge in any `listAllMatches` behavior either.
>
> -- Paul

There was one minor thing I forgot. Word(printables) is insufficient in
your example, as any 8-bit string is acceptable in parameters (and such
strings are often used). The encoding is not specified for these portions
(deliberately, it seems), but I usually assume UTF-8, since that is the
de facto standard. As such, I decode everything from the wire as UTF-8,
using the 'replace' error handler.

This is my proposed solution to match all characters, but surely there is
a better way than the below (perhaps using Regex is better?):

# 1111998 is the total number of valid Unicode characters.
utf8_chars = ''.join(chr(x) for x in range(1111998))
middle = (~COLON + ~White()) + Word(utf8_chars)

-- Elizabeth
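
On the "perhaps using Regex is better?" question, one possible direction is to describe what a parameter must not contain instead of listing every character it may contain. The sketch below is only an illustration under stated assumptions: it assumes a "middle" parameter is any run of characters with no NUL, CR, LF, or space that does not begin with ':', and the names MIDDLE_PATTERN and split_middles are hypothetical. It uses only the standard `re` module so it runs on its own. Note that range(1111998) stops at U+10F7BD, short of the top of the code space (U+10FFFF), and that Word(utf8_chars) also accepts spaces, so a negated character class avoids both problems as well as the million-character string.

```python
import re

# Assumption (not from the thread): a "middle" parameter is a run of
# characters containing no NUL, CR, LF, or space, and not starting with ':'.
# Every other character, including any non-ASCII one, is accepted.
MIDDLE_PATTERN = r"[^\x00\r\n :][^\x00\r\n ]*"

middle_re = re.compile(MIDDLE_PATTERN)


def split_middles(text):
    """Return the leading space-separated 'middle' parameters of text.

    Scanning stops at the first token that starts with ':' (or at the
    end of the string).
    """
    params = []
    pos = 0
    while True:
        match = middle_re.match(text, pos)
        if match is None:
            break
        params.append(match.group())
        pos = match.end()
        # Skip the spaces that separate one parameter from the next.
        while pos < len(text) and text[pos] == " ":
            pos += 1
    return params


if __name__ == "__main__":
    print(split_middles("param1 p\u00e4r\u00e1m2 \u65e5\u672c :trailing text"))
```

In the grammar itself, the same pattern string could presumably be passed to pyparsing's Regex class, e.g. middle = Regex(MIDDLE_PATTERN), which would make the separate ~COLON and ~White() lookaheads unnecessary because the exclusions are built into the character classes.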