#23 QuotedString escChar escapes not just quotes

open
nobody
None
5
2007-02-05
2007-02-05
No

When a string is passed to QuotedString for parsing, it appears to remove all occurrences of the escChar value without checking whether quoteChar is actually being escaped.

Example:

>>> from pyparsing import QuotedString
>>> q = QuotedString(quoteChar="'", escChar="\\")
>>> r = q.parseString(r"'This won\'t work\nwell.'")
>>> r.asList()[0]
"This won't worknwell."
>>>
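
The behaviour reported above can be modelled without pyparsing. The sketch below is only an approximation, assuming the escape character is simply dropped in front of any single character; `strip_all_escapes` is a hypothetical helper for illustration and not part of pyparsing.

```python
import re

def strip_all_escapes(body, esc_char="\\"):
    """Drop esc_char in front of ANY character (assumed model of the reported behaviour)."""
    # Match the escape character followed by exactly one character and keep only that character.
    pattern = re.escape(esc_char) + "(.)"
    return re.sub(pattern, r"\g<1>", body)

# The body of the quoted string from the example, without the surrounding quotes.
result = strip_all_escapes(r"This won\'t work\nwell.")
print(result)  # This won't worknwell.
```

Under this model the backslash in front of the "n" is removed along with the one in front of the quote mark, which matches the output shown in the report.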

Discussion

  • Paul McGuire

    Paul McGuire - 2007-02-06

    Logged In: YES
    user_id=893320
    Originator: NO

    In general, I don't expect any whitespace escapes in the strings processed by pyparsing, or rather, I expect they've already been converted to their respective whitespace characters. So I'm not entirely convinced that this is the job of pyparsing's classes - if it were, I would probably need to include this behavior into many other pyparsing classes too.

    Here's a function that will convert \n, \t, et al. before passing the string to pyparsing, along with an example using your test string.

    -- Paul

    from pyparsing import QuotedString,Combine,oneOf
    t = r"'This won\'t work\nwell.'"
    print t

    q = QuotedString(quoteChar="'", escChar="\\")
    r = q.parseString(t)
    print r.asList()[0]
    print

    def interpretWhitespaceEscapes(s):
        def unescape(t):
            return { 't':'\t', 'n':'\n', 'f':'\f', 'r':'\r' }[t[0][1]]
        return (Combine('\\'+oneOf(list("tnfr"))))\
            .setParseAction(unescape)\
            .transformString(s)

    t = interpretWhitespaceEscapes(t)
    print t
    q = QuotedString(quoteChar="'", escChar="\\", multiline=True)
    r = q.parseString(t)
    print r.asList()[0]

    Prints:
    'This won\'t work\nwell.'
    This won't worknwell.

    'This won\'t work
    well.'
    This won't work
    well.

     
  • Jason Peacock

    Jason Peacock - 2007-02-06

    Logged In: YES
    user_id=1547144
    Originator: YES

    Framing the issue as a whitespace handling problem hadn't occurred to me.

    In my original post, I left out why I think there is a bug in QuotedString. When I passed "'This won\'t work\nwell.'" into QuotedString, I expected "This won't work\nwell." as the result (with the "\n" left as is).

    I brought the issue up because, in general, it seemed surprising that QuotedString would operate on all occurrences of the escape character rather than only escaped quote marks. I expected QuotedString's escape handling to operate only on escaped quote marks and leave all other 'escaped' character sequences as is. At least, that's what the documentation seemed to imply.

     
  • Jason Peacock

    Jason Peacock - 2007-02-08

    patches pyparsing.py so that QuotedString ignores escaped characters except for escaped quotes.

     
  • Jason Peacock

    Jason Peacock - 2007-02-08

    Logged In: YES
    user_id=1547144
    Originator: YES

    I've come up with a patch that (partially) addresses the QuotedString escaped quote handling behavior.

    If we start with this string (and I've changed escape character to make things clearer):

    "'That!'s not going to work!nwell'"

    Current QuotedString (v1.4.5) will return this:

    "That's not going to worknwell"

    After the patch is applied, then this is returned:

    "That's not going to work!nwell"

    Which is what I had originally expected.

    Regarding the patch: I don't fully understand what's happening when endQuoteChar has multiple characters, so the code that appends additional expressions for recognizing endQuoteChar to escCharReplacePattern is incomplete.

    File Added: quoted_string.diff
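
The behaviour requested in this thread (only escaped quote marks are unescaped, every other escape sequence is left alone) can be sketched in plain Python. This is not the attached patch, whose contents are not shown here; `unescape_quotes_only` is a hypothetical helper. Treating a multi-character end quote as a single escaped unit is an assumption made for this sketch.

```python
import re

def unescape_quotes_only(body, end_quote="'", esc_char="\\"):
    """Remove esc_char only where it directly precedes end_quote.

    Assumption: a multi-character end_quote counts as escaped only when
    esc_char is followed by the whole end_quote string.
    """
    pattern = re.escape(esc_char) + "(" + re.escape(end_quote) + ")"
    return re.sub(pattern, r"\g<1>", body)

# The example from the last comment, with "!" as the escape character.
print(unescape_quotes_only("That!'s not going to work!nwell", esc_char="!"))
# That's not going to work!nwell

# The example from the original report: the backslash before "n" survives.
print(unescape_quotes_only(r"This won\'t work\nwell."))
# This won't work\nwell.
```

The sketch does not handle an escaped escape character (for example a doubled backslash before a quote), so it should be read as an illustration of the expected output only.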

     
