Hey All,
I've been using PyParsing to handle commands in Imaginary (formerly Pottery). So far it's done most of the things I've asked of it, and I think I have some ideas to work around the rest, but its behavior with respect to unicode strings is a bit confusing.
In 1.2 (Ubuntu Breezy packaged version), I could parse a unicode string and get back a unicode string:
exarkun@boson:~$ python
Python 2.4.2 (#2, Sep 30 2005, 21:19:01)
[GCC 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyparsing
>>> pyparsing.__version__
'1.2'
>>> pyparsing.quotedString.parseString(u"'foo'")
([u"'foo'"], {})
>>>
exarkun@boson:~$
However, on upgrading to 1.3 (Ubuntu Dapper packaged version), this no longer appears to be the case:
exarkun@kunai:~$ python
Python 2.4.3 (#2, Apr 27 2006, 14:43:58)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyparsing
>>> pyparsing.__version__
'1.3.3'
>>> pyparsing.quotedString.parseString(u"'foo'")
(["'foo'"], {})
>>>
exarkun@kunai:~$
More confusingly, this behavior seems to depend on the exact expression used to parse a string: sometimes the result comes out as unicode, sometimes not. The expression I am using (created by the targetString function here <http://divmod.org/trac/browser/trunk/Imaginary/imaginary/commands.py#L19>) allows either quoted or unquoted strings. Frustratingly, if the quotes are supplied the result is a str, but if they are omitted the result is unicode.
I have considered wrapping my usage of PyParsing in an extra layer that does type-checking and decodes when appropriate, but this seems like a hackish work-around for a mis-feature of PyParsing, rather than the correct solution.
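For concreteness, the extra layer I have in mind would look roughly like the sketch below. decode_tokens is a hypothetical helper of my own, not part of PyParsing, and it assumes the parse results have first been turned into plain (possibly nested) lists, for example with asList(), and that ASCII is an acceptable default encoding:

```python
# Hypothetical helper, not part of PyParsing: normalise parse results to text.
text_type = type(u"")   # unicode on Python 2, str on Python 3
binary_type = bytes     # str on Python 2, bytes on Python 3


def decode_tokens(tokens, encoding="ascii"):
    """Return a copy of a (possibly nested) token list with byte strings decoded."""
    decoded = []
    for token in tokens:
        if isinstance(token, text_type):
            # Already text: keep as-is.
            decoded.append(token)
        elif isinstance(token, binary_type):
            # Byte string: decode it (the encoding is an assumption).
            decoded.append(token.decode(encoding))
        elif isinstance(token, (list, tuple)):
            # Nested group of tokens: recurse.
            decoded.append(decode_tokens(token, encoding))
        else:
            # Anything else (numbers, etc.) is passed through untouched.
            decoded.append(token)
    return decoded
```

Every parse call would then have to be followed by a call to this helper, which is exactly the kind of boilerplate I would rather not need.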
Is this a bug, am I misusing PyParsing, or does PyParsing really just not differentiate between str and unicode?
Thanks in advance,
Jean-Paul