QuotedString escChar escapes not just quotes
Brought to you by:
ptmcg
When a string is passed to QuotedString for parsing, it appears to remove every occurrence of the escChar character without checking whether that character is actually escaping the quoteChar.
Example:
>>> from pyparsing import QuotedString
>>> q = QuotedString(quoteChar="'", escChar="\\")
>>> r = q.parseString(r"'This won\'t work\nwell.'")
>>> r.asList()[0]
"This won't worknwell."
>>>
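Below is a minimal, stdlib-only sketch that reproduces the output above. It assumes that QuotedString strips the surrounding quotes and then replaces escChar followed by any single character with that character. This assumption is not taken from pyparsing's source, and `unquote_strip_all_escapes` is a hypothetical name used only for illustration.

```python
import re


def unquote_strip_all_escapes(text, quote_char="'", esc_char="\\"):
    """Hypothetical emulation of the reported behavior: drop the surrounding
    quotes, then turn every escChar+X pair into X, whether or not X is a quote."""
    if (len(text) < 2 * len(quote_char)
            or not text.startswith(quote_char)
            or not text.endswith(quote_char)):
        raise ValueError("text is not wrapped in quote_char")
    body = text[len(quote_char):len(text) - len(quote_char)]
    # escChar followed by any single character (other than a newline)
    # is replaced by that character alone, so the escChar disappears.
    return re.sub(re.escape(esc_char) + r"(.)", r"\1", body)


print(unquote_strip_all_escapes(r"'This won\'t work\nwell.'"))
```

The call returns `"This won't worknwell."`, matching the reported result: the backslash in `\n` is dropped just like the one in front of the escaped quote.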
Logged In: YES
user_id=893320
Originator: NO
In general, I don't expect any whitespace escapes in the strings processed by pyparsing, or rather, I expect they've already been converted to their respective whitespace characters. So I'm not entirely convinced that this is the job of pyparsing's classes - if it were, I would probably need to include this behavior into many other pyparsing classes too.
Here's a function that converts \n, \t, etc. before the string is passed to pyparsing, along with an example using your test string.
-- Paul
from pyparsing import QuotedString, Combine, oneOf

t = r"'This won\'t work\nwell.'"
print(t)
q = QuotedString(quoteChar="'", escChar="\\")
r = q.parseString(t)
print(r.asList()[0])
print()

def interpretWhitespaceEscapes(s):
    # Parse action: map the letter after the backslash to the
    # whitespace character it stands for
    def unescape(t):
        return {'t': '\t', 'n': '\n', 'f': '\f', 'r': '\r'}[t[0][1]]
    # Find each backslash + t/n/f/r pair and replace it in place
    return (Combine('\\' + oneOf(list("tnfr")))
            .setParseAction(unescape)
            .transformString(s))

t = interpretWhitespaceEscapes(t)
print(t)
# multiline=True is needed now that the string contains a real newline
q = QuotedString(quoteChar="'", escChar="\\", multiline=True)
r = q.parseString(t)
print(r.asList()[0])
Prints:
'This won\'t work\nwell.'
This won't worknwell.
'This won\'t work
well.'
This won't work
well.
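For readers without pyparsing at hand, the same preprocessing step can be sketched with Python's `re` module. This is a hypothetical stand-in for `interpretWhitespaceEscapes`, not part of pyparsing. The name `interpret_whitespace_escapes` is made up for illustration.

```python
import re

# Letter after the backslash -> the whitespace character it denotes
_WHITESPACE_ESCAPES = {"t": "\t", "n": "\n", "f": "\f", "r": "\r"}


def interpret_whitespace_escapes(s):
    """Replace the two-character sequences \\t, \\n, \\f and \\r in s with real
    tab, newline, form-feed and carriage-return characters. Any other
    backslash sequence (such as \\') is left untouched."""
    return re.sub(r"\\([tnfr])", lambda m: _WHITESPACE_ESCAPES[m.group(1)], s)


print(interpret_whitespace_escapes(r"'This won\'t work\nwell.'"))
```

Applied to the test string, the function produces the two-line `'This won\'t work` / `well.'` shown above. The `\'` stays in place for QuotedString to handle. One limitation is that this sketch does not treat an escaped backslash specially. In `\\n`, the second backslash and the `n` are still converted to a newline.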
Logged In: YES
user_id=1547144
Originator: YES
Framing the issue as a whitespace handling problem hadn't occurred to me.
In my original post, I left out why I think there is a bug in QuotedString. When I passed "'This won\'t work\nwell.'" into QuotedString, I expected "This won't work\nwell." as the result (with the "\n" left untouched).
I brought the issue up because, in general, it seemed surprising that QuotedString would operate on all occurrences of the escape character rather than only escaped quote marks. I expected QuotedString's escape handling to operate only on escaped quote marks and leave all other 'escaped' character sequences as is. At least, that's what the documentation seemed to imply.
Attached patch: modifies pyparsing.py so that QuotedString leaves escaped characters alone, except for escaped quotes.
Logged In: YES
user_id=1547144
Originator: YES
I've come up with a patch that (partially) addresses the QuotedString escaped quote handling behavior.
If we start with this string (I've changed the escape character to ! to make things clearer):
"'That!'s not going to work!nwell'"
The current QuotedString (v1.4.5) returns this:
"That's not going to worknwell"
After the patch is applied, this is returned:
"That's not going to work!nwell"
This is what I had originally expected.
Regarding the patch: I don't fully understand how an endQuoteChar with multiple characters is handled, so the code that appends additional expressions for recognizing endQuoteChar to escCharReplacePattern is incomplete.
File Added: quoted_string.diff
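Below is a sketch of the patched semantics. It uses only `re` and is not the actual contents of quoted_string.diff. Only escChar immediately followed by the end quote is unescaped, and every other escChar sequence is kept verbatim. For a multi-character endQuoteChar, the sketch assumes that an escaped end quote is written as escChar followed by the whole endQuoteChar. That assumption and the function name `unescape_quotes_only` are illustrative, not pyparsing's behavior.

```python
import re


def unescape_quotes_only(text, quote_char="'", esc_char="\\", end_quote_char=None):
    """Hypothetical sketch: strip the quotes, then unescape only
    escChar + endQuoteChar. Other escChar sequences are kept as-is."""
    if end_quote_char is None:
        end_quote_char = quote_char
    if (len(text) < len(quote_char) + len(end_quote_char)
            or not text.startswith(quote_char)
            or not text.endswith(end_quote_char)):
        raise ValueError("text is not wrapped in the given quote characters")
    body = text[len(quote_char):len(text) - len(end_quote_char)]
    # Assumption: an escaped end quote is escChar followed by the *whole*
    # end-quote string, so that sequence is matched as a single unit.
    pattern = re.escape(esc_char) + "(" + re.escape(end_quote_char) + ")"
    return re.sub(pattern, r"\1", body)


print(unescape_quotes_only("'That!'s not going to work!nwell'", esc_char="!"))
```

With `esc_char="!"`, the string `"'That!'s not going to work!nwell'"` comes back as `"That's not going to work!nwell"`, which is the result described above. `re.escape` prevents regex metacharacters, such as `]` in a custom end quote, from being read as pattern syntax.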