nestedExpr is splitting on QuotedString by default
Hello,
nestedExpr splits on a QuotedString by default. I believe that shouldn't be the case.
When ignoreExpr is set to None, splitting on quotes does not occur, but then the expression is split on whitespace inside the quotes, which is not what I want in my case.
    from pyparsing import nestedExpr
    expr = nestedExpr('{{{', '}}}')
    expr.parseString('{{{ a="a b" }}}')
    # result: ([(['a=', '"a b"'], {})], {})
To work around my problem, I need to go through the list of parsed tokens and reconnect two neighbouring elements if the first ends with '=' and the second starts with a quote.
I think this should be fixed, but if there is another way around it, please tell me.
nestedExpr is provided in pyparsing as a shortcut for more complex expressions that support nesting on opening and closing grouping strings. But as a shortcut, it does not really do much meaningful with the contents within the groups. So the question is, what should nestedExpr make of the strings that are inside the nested groups?
By default, nestedExpr will look for space-delimited words of printables, so that a parenthesized group of space-separated words will parse into nested lists of those words (if you call asList() on the ParseResults object that comes back from parseString()).
It then raises the question, "what if I use a quoted string to represent a nested item that contains a space?"
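A minimal sketch of that default behaviour, using made-up sample strings. The first parse shows plain space-delimited words. The second turns quoted-string handling off with ignoreExpr=None, purely to illustrate the naive split that a quoted item containing a space would otherwise suffer:

```python
from pyparsing import nestedExpr

# Default nestedExpr: "(" and ")" delimiters, space-delimited words inside.
plain = nestedExpr().parseString("(a b (c d e) f)").asList()
print(plain)  # [['a', 'b', ['c', 'd', 'e'], 'f']]

# With quoted-string handling disabled, the quoted item is split at its space.
naive = nestedExpr(ignoreExpr=None).parseString('(a "b c" d)').asList()
print(naive)  # [['a', '"b', 'c"', 'd']]
```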
Returning the two halves of the quoted string as separate items is pretty clearly a wrong guess, so nestedExpr also looks for quoted strings while parsing the contents of the nested bits, and returns each quoted string as a single item.
This also protects us in case we get a tuple with an open or close paren in quotes. Nine times out of ten, we don't want a quoted ')' to close the outer group; it is just another character in the nested character string.
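A short sketch of the quoted-string handling described above, again with made-up input strings. It relies only on the default ignoreExpr, which is a quoted-string expression:

```python
from pyparsing import nestedExpr

parens = nestedExpr()  # default ignoreExpr matches quoted strings

# The quoted item keeps its embedded space and comes back as one token.
with_space = parens.parseString('(a "b c" d)').asList()
print(with_space)  # [['a', '"b c"', 'd']]

# A ")" inside quotes does not close the group.
with_paren = parens.parseString('(a "b ) c" d)').asList()
print(with_paren)  # [['a', '"b ) c"', 'd']]
```

Note that the quote characters stay in the returned tokens; nothing here strips them.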
But things start to look bad if our nested expression is much more like a Python tuple, with delimiting commas. Then the delimiting commas get mixed in with our parsed text.
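To make the comma problem concrete, here is a small sketch with an illustrative tuple-like string. The default content expression treats a comma as just another printable character:

```python
from pyparsing import nestedExpr

# Commas stay attached to the preceding word, or show up as tokens of their own.
result = nestedExpr().parseString("(a, b, (c, d, e), f)").asList()
print(result)  # [['a,', 'b,', ['c,', 'd,', 'e'], ',', 'f']]
```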
So nestedExpr supports an optional content arg, to permit definition of more complex contents in our groups. If we want to try parsing nested delimited lists of alphabetic words, we can pass a delimitedList of alphabetic words as the content and call the result nestedAlphaLists. Now nestedExpr treats the nested contents as delimitedLists, which suppress the delimiting commas and just give back the list items.
Now what if we had something that really looked like a nested tuple, with commas separating every term, including nested lists? If we use nestedAlphaLists to parse such a string, we'll get an error.
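A sketch of what nestedAlphaLists could look like. The exact definition is an assumption based on the description above, and both input strings are illustrative:

```python
from pyparsing import ParseException, Word, alphas, delimitedList, nestedExpr

# Assumed definition: each run of content is a comma-delimited list of words.
nestedAlphaLists = nestedExpr(content=delimitedList(Word(alphas)))

# Commas separate the words here, but not the nested group itself.
loose = nestedAlphaLists.parseString("(a, b (c, d, e) f)").asList()
print(loose)  # [['a', 'b', ['c', 'd', 'e'], 'f']]

# A real tuple layout also puts commas around the nested group, and fails.
try:
    nestedAlphaLists.parseString("(a, b, (c, d, e), f)")
    tuple_ok = True
except ParseException as err:
    tuple_ok = False
    print("parse failed:", err)
```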
Our content definition only expects words separated by commas, with no trailing or leading commas. We need to further expand our content argument so that it also allows an optional comma before and after each delimited list. And now we can parse our nested tuple successfully.
At this point, are we really parsing? This "smarter" nested list is not too smart, really. It will also accept strings that are not well-formed tuples, since the leading and trailing commas on the nested content are optional.
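One way the expanded content argument might be written; this is a reconstruction, not necessarily the original definition, and smarterNestedLists is a hypothetical name. The second input shows how lenient the result is:

```python
from pyparsing import Optional, Suppress, Word, alphas, delimitedList, nestedExpr

comma = Suppress(",")
# Allow an optional comma on either side of each delimited list of words.
content = Optional(comma) + delimitedList(Word(alphas)) + Optional(comma)
smarterNestedLists = nestedExpr(content=content)  # hypothetical name

proper = smarterNestedLists.parseString("(a, b, (c, d, e), f)").asList()
print(proper)  # [['a', 'b', ['c', 'd', 'e'], 'f']]

# Not a well-formed tuple, but accepted all the same.
sloppy = smarterNestedLists.parseString("(, a, b (c, d, e,), f,)").asList()
print(sloppy)  # [['a', 'b', ['c', 'd', 'e'], 'f']]
```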
I would argue that at this point, we have exceeded the bounds of the nestedExpr convenience method, and we need to buckle down and actually parse the expression using a nested parser.
And if we revisit the earlier desire to accept quoted strings as items that might contain a space, or comma, or '(', then we just update the nested_item expression of that parser to accept a quoted string as well. And now our parser will handle a tuple-like string containing such quoted items.
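A minimal sketch of such a nested parser, built on Forward. Only the name nested_item comes from the discussion above; nested_list, LPAR, RPAR and the sample strings are assumptions for illustration:

```python
from pyparsing import (Forward, Group, Optional, ParseException, Suppress,
                       Word, alphas, delimitedList, quotedString)

LPAR, RPAR = map(Suppress, "()")

nested_list = Forward()
# An item is a quoted string, a plain word, or another parenthesized list.
nested_item = quotedString | Word(alphas) | nested_list
# Commas are now required between all items, nested lists included.
nested_list <<= Group(LPAR + Optional(delimitedList(nested_item)) + RPAR)

result = nested_list.parseString('(a, "b, (c", (d, e), f)').asList()
print(result)  # [['a', '"b, (c"', ['d', 'e'], 'f']]

# Input with missing commas around the nested group is rejected.
try:
    nested_list.parseString("(a, b (c, d, e) f)")
    sloppy_ok = True
except ParseException:
    sloppy_ok = False
print(sloppy_ok)  # False
```

Because the grammar itself says where commas belong, the malformed input fails instead of being silently accepted.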
Really, using nestedExpr for anything more complex than a space-delimited or comma-delimited list is something of a cheat, which is why I call it more of a shortcut than a real parsing element. It is very handy when parsing a language like C for function definitions, where you want to write a lazy parser to match a function signature, but skip over all the other C syntax that might be found in the function body. Fortunately, unlike Python, C uses braces to delimit the code for a function, so you can define a C "parser" that matches the signature and then uses nestedExpr to skip over the brace-delimited body. This parser will find function definitions in C, but not really do much parsing of the C language itself.
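A sketch of such a lazy C "parser", assuming a tiny subset of C (a handful of built-in types, no pointers or qualifiers) and a made-up source snippet. The expression names are illustrative:

```python
from pyparsing import (Group, Optional, Suppress, Word, alphanums, alphas,
                       cStyleComment, delimitedList, nestedExpr, oneOf,
                       quotedString)

data_type = oneOf("void int short long char float double")
ident = Word(alphas + "_", alphanums + "_")
arg = Group(data_type + ident)
LPAR, RPAR = map(Suppress, "()")

# The body is just balanced braces; quotes and comments are skipped so that
# a "{" or "}" inside them does not upset the nesting.
code_body = nestedExpr("{", "}", ignoreExpr=(quotedString | cStyleComment))

c_function = (data_type("type") + ident("name")
              + LPAR + Optional(delimitedList(arg))("args") + RPAR
              + code_body("body"))

source_code = """
int is_odd(int x) {
    return (x % 2);
}

int clamp(int x, int lo, int hi) {
    if (x < lo) { return lo; }
    if (x > hi) { return hi; }
    return x;
}
"""

names = [func["name"] for func in c_function.searchString(source_code)]
print(names)  # ['is_odd', 'clamp']
```

The function bodies come back as nested lists of raw tokens; nothing inside them is interpreted as C.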
So finally, to look at your question. You are parsing a string of the form a="a b", which might sometimes be written in several slightly different ways.
And nestedExpr is following its default definition of looking for space-delimited printables and possible quoted strings.
If 'a = "blah"' has some meaning in your text, then you should probably parse it explicitly, or at least define an expression for it and pass that as the content arg to nestedExpr.
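A sketch of that suggestion applied to the '{{{' and '}}}' delimiters from the question. The ident and assignment expressions are assumptions about what an assignment looks like, not a definitive grammar:

```python
from pyparsing import (Group, Suppress, Word, alphanums, alphas, nestedExpr,
                       quotedString)

# Hypothetical grammar for one assignment: name = "quoted value"
ident = Word(alphas, alphanums + "_")
assignment = Group(ident + Suppress("=") + quotedString)

expr = nestedExpr("{{{", "}}}", content=assignment)

texts = ['{{{ a="a b" }}}', '{{{ a = "a b" }}}', '{{{ a= "a b" b ="c" }}}']
results = [expr.parseString(text).asList() for text in texts]
for parsed in results:
    print(parsed)
# [[['a', '"a b"']]]
# [[['a', '"a b"']]]
# [[['a', '"a b"'], ['b', '"c"']]]
```

Each assignment comes back as its own name/value pair, so no post-processing is needed to rejoin tokens, and spacing around the '=' does not matter.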
You don't give an example of a nested expression, so I'm not sure how assignments should handle nesting. But I hope this discussion gives you more background on when nestedExpr is appropriate, and when you need to do more actual parsing.
Here is the test code I wrote to test all these strings and expressions: