[Pyparsing] RE: Pyparsing unicode
From: Paul M. <pa...@al...> - 2006-05-13 17:35:09
> -----Original Message-----
> From: Jean-Paul Calderone [mailto:ex...@di...]
> Sent: Saturday, May 13, 2006 11:41 AM
> To: Paul McGuire
> Subject: Re: Pyparsing unicode
>
> On Fri, 5 May 2006 15:49:10 -0500, Paul McGuire
> <pa...@al...> wrote:
> >J-P,
> >
> >How are things for you with this issue? Is upgrading to 1.4.2 an acceptable
> >option for you? I'm seeing through Google that this is giving you a fair
> >amount of heartburn - I'm sorry for this, let me know if things are still a
> >problem for you.
>
> Hey Paul,
>
> I finally got a chance to try out 1.4.2 with Imaginary. The
> unicode difficulties seem to be resolved (at least, my test
> suite no longer complains about getting str instead of
> unicode), but the previously broken tests still don't
> actually pass, and actually five new tests are now failing.
>
> I haven't looked at the failures closely yet, but it looks
> like they are mostly in areas which rely on quoted strings
> and are failing because instead of the content of the string
> being handed back an empty string is received instead.
>
> Maybe my "targetString" expression isn't being defined
> correctly?
> Here's the definition again:
>
>     def stripper(s, loc, toks):
>         toks = toks.asList()
>         toks[0] = toks[0][1:-1]
>         return toks
>
>     def targetString(name):
>         qstr = pyparsing.quotedString.setParseAction(stripper)
>         return (
>             pyparsing.Word(pyparsing.alphanums) ^
>             qstr).setResultsName(name)
>
> Failures from the test suite mostly come out looking like this:
>
>     twisted.trial.unittest.FailTest:
>     ["> create pants 'pair of daisy dukes'", ' created.']
>     did not match expected
>     [u"> create pants 'pair of daisy dukes'", 'Pair of daisy dukes created.']
>     (Line 1)
>
> and:
>
>     twisted.trial.unittest.FailTest:
>     ["> create 'vending machine' vendy", "Can't find ."]
>     did not match expected
>     [u"> create 'vending machine' vendy", 'Vendy created.']
>     (Line 1)
>
> I'll probably investigate this a bit further myself today,
> but if you have any hints, they'd be much appreciated.
>
> Feel free to CC the pyparsing list on your response, if you'd
> like to take this discussion back there.
>
> Thanks,
>
> Jean-Paul
>

Jean-Paul,

I extracted your sample and made a small test case. On the face of things, the problem does not seem to be with quotedString, or with your stripper routine (although you might convert over to using removeQuotes instead of defining your own function). Also, the results from FailTest aren't unicode, but you are expecting unicode. I think the problem may be in the expression that targetString() is embedded in. Here's my extracted test.

(As a side question, do your parse actions often assign into the toks argument, like in the marked line below? I won't go so far as to say this isn't supported, but I don't think I test for this case very rigorously. I expected ParseResults to be read, not written. In general, ParseResults get built up as parsing occurs, and *some* updates (such as del of a slice) are explicitly implemented and tested. But assigning a specific item back into an existing ParseResults is not something I had planned for.)
-- Paul

--------------
import pyparsing

def stripper(s, loc, toks):
    toks = toks.asList()
    toks[0] = toks[0][1:-1]  # <--- assigning to toks
    return toks

def targetString(name):
    qstr = pyparsing.quotedString.setParseAction(stripper)
    # stripper could be replaced with the removeQuotes parse action
    # qstr = pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)
    return (
        pyparsing.Word(pyparsing.alphanums) ^
        qstr).setResultsName(name)

input = u"""
this is a Unicode string with a 'quoted string' in it
"""

for toks,s,e in targetString("word").scanString(input):
    print toks.asList()
--------------

Gives me the result:

[u'this']
[u'is']
[u'a']
[u'Unicode']
[u'string']
[u'with']
[u'a']
[u'quoted string']
[u'in']
[u'it']
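
On the side question about parse actions that assign into toks: one way to sidestep it is to build and return a new list instead. The sketch below is only an illustration, not something from the test above. The name strip_quotes is made up for this example, and a plain Python list stands in for the ParseResults that pyparsing would pass in, so the sketch runs without pyparsing installed.

```python
def strip_quotes(s, loc, toks):
    """Parse-action-shaped function (hypothetical name): return the first
    token without its enclosing quote characters, leaving toks untouched."""
    # Build a fresh list rather than assigning an item back into toks.
    return [toks[0][1:-1]]

# Assumption: a plain list stands in for the ParseResults argument.
tokens = ["'pair of daisy dukes'"]
result = strip_quotes("", 0, tokens)

print(result)   # the token with its quotes removed
print(tokens)   # the original token list, unchanged
```

It has the same (s, loc, toks) signature as stripper, so it should be attachable with setParseAction in the same way, though I have only exercised it here against a plain list.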