[Pyparsing] RE: Pyparsing unicode


> -----Original Message-----
> From: Jean-Paul Calderone [mailto:ex...@di...] 
> Sent: Saturday, May 13, 2006 11:41 AM
> To: Paul McGuire
> Subject: Re: Pyparsing unicode
> 
> On Fri, 5 May 2006 15:49:10 -0500, Paul McGuire 
> <pa...@al...> wrote:
> >J-P,
> >
> >How are things for you with this issue?  Is upgrading to 1.4.2 an
> >acceptable option for you?  I'm seeing through Google that this is giving
> >you a fair amount of heartburn - I'm sorry for this, let me know if things
> >are still a problem for you.
> 
> Hey Paul,
> 
> I finally got a chance to try out 1.4.2 with Imaginary.  The 
> unicode difficulties seem to be resolved (at least, my test 
> suite no longer complains about getting str instead of 
> unicode), but the previously broken tests still don't 
> actually pass, and actually five new tests are now failing.
> 
> I haven't looked at the failures closely yet, but it looks 
> like they are mostly in areas which rely on quoted strings 
> and are failing because instead of the content of the string 
> being handed back an empty string is received instead.
> 
> Maybe my "targetString" expression isn't being defined 
> correctly?  Here's the definition again:
> 
>     def stripper(s, loc, toks):
>         toks = toks.asList()
>         toks[0] = toks[0][1:-1]
>         return toks
> 
>     def targetString(name):
>         qstr = pyparsing.quotedString.setParseAction(stripper)
>         return (
>             pyparsing.Word(pyparsing.alphanums) ^
>             qstr).setResultsName(name)
> 
> Failures from the test suite mostly come out looking like this:
> 
>     twisted.trial.unittest.FailTest:
>     ["> create pants 'pair of daisy dukes'", ' created.']
>     did not match expected
>     [u"> create pants 'pair of daisy dukes'", 'Pair of daisy dukes created.']
>     (Line 1)
> 
> and:
> 
>     twisted.trial.unittest.FailTest:
>     ["> create 'vending machine' vendy", "Can't find ."]
>     did not match expected
>     [u"> create 'vending machine' vendy", 'Vendy created.']
>     (Line 1)
> 
> I'll probably investigate this a bit further myself today, 
> but if you have any hints, they'd be much appreciated.
> 
> Feel free to CC the pyparsing list on your response, if you'd 
> like to take this discussion back there.
> 
> Thanks,
> 
> Jean-Paul
> 
>
Jean-Paul,

I extracted your sample, and made a small test case. On the face of things,
the problem does not seem to be with quotedString, or with your stripper
routine (although you might convert over to using removeQuotes instead of
defining your own function).

Also, the results from FailTest aren't unicode, but you are expecting
unicode.  I think the problem may be in the expression that targetString()
is embedded in.

Here's my extracted test.

(As a side question, do your parse actions often assign into the toks
argument, like in the marked line below?  I won't go so far as to say this
isn't supported, but I don't think I test for this case very rigorously.  I
expected ParseResults to be read, not written.  In general, ParseResults get
built up as parsing occurs, and *some* updates (such as del of a slice) are
explicitly implemented and tested.  But assigning a specific item back into
an existing ParseResults is not something I had planned for.)

-- Paul

--------------
import pyparsing

def stripper(s, loc, toks):
    toks = toks.asList()
    toks[0] = toks[0][1:-1]  # <--- assigning to toks 
    return toks

def targetString(name):
    qstr = pyparsing.quotedString.setParseAction(stripper)
    # stripper could be replaced with the removeQuotes parse action
    # qstr = pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)
    return (
        pyparsing.Word(pyparsing.alphanums) ^
        qstr).setResultsName(name)

input = u""" this is a Unicode string with a 'quoted string' in it  """

for toks,s,e in targetString("word").scanString(input):
    print toks.asList()

--------------
Gives me the result:
[u'this']
[u'is']
[u'a']
[u'Unicode']
[u'string']
[u'with']
[u'a']
[u'quoted string']
[u'in']
[u'it']
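
For comparison, here is a minimal sketch of the same test using removeQuotes, plus a hand-written parse action that only reads toks and returns a new list. It assumes a current pyparsing release running on Python 3 (where the old camelCase names are still available as compatibility aliases), so it is not the 1.4.2 code under discussion. The names strip_quotes, target_string and scan are made up for illustration. The copy() call is my own addition: setParseAction modifies the expression it is called on, so calling it directly on pyparsing.quotedString changes the shared module-level object.

```python
import pyparsing as pp

def strip_quotes(s, loc, toks):
    # Read toks only; build and return a fresh list without the quotes.
    return [toks[0][1:-1]]

def target_string(name, action=pp.removeQuotes):
    # copy() leaves the shared module-level quotedString untouched.
    qstr = pp.quotedString.copy().setParseAction(action)
    return (pp.Word(pp.alphanums) ^ qstr).setResultsName(name)

def scan(expr, text):
    # Collect the token list of every match found by scanString.
    return [toks.asList() for toks, start, end in expr.scanString(text)]

text = " this is a Unicode string with a 'quoted string' in it  "

with_remove_quotes = scan(target_string("word"), text)
with_strip_quotes = scan(target_string("word", strip_quotes), text)

for toks in with_remove_quotes:
    print(toks)
```

Both variants hand back the quoted text without its quotes, and neither one assigns into the ParseResults passed to the parse action.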