Thread: [Pyparsing] PyParsing and unicode

Hey All,

I've been using PyParsing to handle commands in Imaginary (formerly Pottery). So far it has done most of what I've asked of it, and I have some ideas for working around the rest, but its behavior with respect to unicode is a bit confusing.

In 1.2 (Ubuntu Breezy packaged version), I could parse a unicode string and get back a unicode string:

    exarkun@boson:~$ python
    Python 2.4.2 (#2, Sep 30 2005, 21:19:01) 
    [GCC 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu8)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pyparsing
    >>> pyparsing.__version__
    '1.2'
    >>> pyparsing.quotedString.parseString(u"'foo'")
    ([u"'foo'"], {})
    >>> 
    exarkun@boson:~$

However, after upgrading to 1.3.3 (the Ubuntu Dapper packaged version), the same call returns a plain str instead:

    exarkun@kunai:~$ python
    Python 2.4.3 (#2, Apr 27 2006, 14:43:58) 
    [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pyparsing
    >>> pyparsing.__version__
    '1.3.3'
    >>> pyparsing.quotedString.parseString(u"'foo'")
    (["'foo'"], {})
    >>> 
    exarkun@kunai:~$ 

More confusingly, the behavior seems to depend on the exact expression used to parse the string: sometimes the result comes out as unicode, sometimes not. The expression I am actually using (created by the targetString function here <http://divmod.org/trac/browser/trunk/Imaginary/imaginary/commands.py#L19>) accepts either quoted or unquoted strings, and, frustratingly, if the quotes are supplied the result is a str, but if they are omitted the result is unicode.
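For comparison, here is a minimal sketch of the behavior I expected, written with Python 3's `re` module rather than PyParsing (so `bytes`/`str` stand in for Python 2's `str`/`unicode`). The pattern and the `match_target` name are hypothetical approximations of a quoted-or-unquoted target, not the actual targetString code; the point is only that every branch returns a slice of the input, so the result type follows the input type whichever branch matches.

```python
import re

# Hypothetical approximation of a "quoted or unquoted" target expression:
# a single-quoted string, a double-quoted string, or else one bare word.
# This is NOT Imaginary's targetString; it only illustrates the idea.
_PATTERN = r"\s*('[^']*'|\"[^\"]*\"|\S+)"

_TEXT_RE = re.compile(_PATTERN)
# re requires the pattern and the subject to have the same type,
# so keep a bytes version of the same pattern for bytes input.
_BYTES_RE = re.compile(_PATTERN.encode("ascii"))


def match_target(s):
    """Return the matched target, or None if there is none.

    Every branch returns a slice of the input, so str input yields str
    and bytes input yields bytes, whether or not quotes were supplied.
    """
    regex = _BYTES_RE if isinstance(s, bytes) else _TEXT_RE
    m = regex.match(s)
    return m.group(1) if m else None


if __name__ == "__main__":
    for s in ["'foo'", "foo bar", b"'foo'", b"foo bar"]:
        print(repr(s), "->", repr(match_target(s)))
```

Quoted and unquoted inputs both come back as the same type that went in, which is the property I was relying on with 1.2.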

I have considered wrapping my usage of PyParsing in an extra layer that checks result types and decodes str results when appropriate, but that seems like a hackish workaround for a PyParsing misfeature rather than the correct solution.
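Roughly, the extra layer I have in mind would look like the following sketch, again in Python 3 terms (`bytes` for Python 2's `str`). `as_text` and `normalized` are hypothetical helper names and UTF-8 is an assumed encoding; the helper walks a result shaped like the `([...], {})` tuples above and decodes any bytes it finds.

```python
def as_text(value, encoding="utf-8"):
    """Recursively decode any bytes found in a parse result.

    Handles lists, tuples and dicts (the shapes seen in the
    ([...], {}) results above); everything else is returned unchanged.
    The default encoding is an assumption made by this wrapper.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, list):
        return [as_text(v, encoding) for v in value]
    if isinstance(value, tuple):
        return tuple(as_text(v, encoding) for v in value)
    if isinstance(value, dict):
        return {as_text(k, encoding): as_text(v, encoding)
                for k, v in value.items()}
    return value


def normalized(parse, encoding="utf-8"):
    """Wrap a parse function so its results never contain bytes."""
    def wrapper(s):
        return as_text(parse(s), encoding)
    return wrapper


if __name__ == "__main__":
    # Hypothetical parser mimicking the inconsistent behavior: it hands
    # back bytes for its token even though the input was text.
    def flaky_parse(s):
        return ([s.encode("utf-8")], {})

    parse = normalized(flaky_parse)
    print(parse("'foo'"))
```

The wrapper has to guess an encoding for every str it sees, which is part of why it feels like patching over the library rather than fixing it.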

Is this a bug, am I misusing PyParsing, or does PyParsing really just not differentiate between these two types?

Thanks in advance,

Jean-Paul

pyparsing-users