Re: [Pyparsing] PayPal IPN message parsing
From: Werner F. B. <wer...@fr...> - 2011-01-07 09:07:28
Paul and Eike,

Thanks for your pointers.

On 07/01/2011 02:18, Paul McGuire wrote:
> I'm not super-keen on your variable naming (using alphanums as a Word
> expression, overloading the alphanums string defined in pyparsing)

As I import pyparsing as pyp it didn't cause me any problems, but you are
right, it is changed.

> , but let's go with it. The alphanums string in pyparsing is purely 7-bit
> ASCII characters. As a first pass, try changing this to add the alphas8bit
> string:
>
>     alphanums = Word(pyp.alphanums + pyp.alphas8bit)
>
> This should handle your posted question.

That did it but will probably go with below.

Thanks
Werner

> If you need to handle more of the Unicode set (beyond chr(256)), then you'll
> need to use these definitions:
>
> >>> alphas = u''.join(unichr(c) for c in range(65536) if unichr(c).isalpha())
> >>> len(alphas)
> 47672
> >>> nums = u''.join(unichr(c) for c in range(65536) if unichr(c).isdigit())
> >>> len(nums)
> 404
>
> So if you go to embracing all Unicode strings, there are actually over 400
> characters that are considered to be numeric digits. But I think alphas8bit
> should carry you along for a while.
>
> -- Paul
>
> -----Original Message-----
> From: Werner F. Bruhin [mailto:wer...@fr...]
> Sent: Thursday, January 06, 2011 5:45 AM
> To: pyp...@li...
> Subject: [Pyparsing] PayPal IPN message parsing
>
> I am having some problems decoding these messages.
>
> The data comes in as an email message with a defined content type as
> "Content-Type: text/plain", however it is really Content-Type:
> text/plain; charset="windows-1252", so I read it in with
>
> thisfile = codecs.open(regFile, "r", "windows-1252").
>
> The parsing works fine except on things like:
>
> address_name = Göran Petterson
>
> Which I parse with:
> alphanums = pyp.Word(pyp.alphanums)
>
> # address
> str_add_name = pyp.Literal("address_name =").suppress() +\
>     alphanums + pyp.restOfLine
> add_name = str_add_name.setParseAction(self.str_add_nameAction)
>
> But I get in str_add_nameAction:
> ([u'G', u'\xf6ran Petterson\r'], {})
>
> The raw data at this point is "address_name = G\xf6ran Petterson"
>
> What am I doing wrong in all this?
>
> I tried using pyp.printables instead of alphanums but with the same result.
>
> A tip would be very much appreciated.
>
> Werner
>
> P.S.
> Happy New Year to you all.
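
For anyone reading this thread later, here is a rough, stdlib-only sketch of why the parse stopped after the "G". It does not use pyparsing itself. The helper leading_run and the two constants are hypothetical names made up for this illustration, and LATIN1_ALPHAS is only an assumed approximation of what pyparsing's alphas8bit contains. The idea is that a Word built from a character set consumes the longest leading run of characters from that set, so an ASCII-only set hands everything from "ö" onward to whatever follows (restOfLine in the original grammar). The last line is a Python 3 spelling of Paul's unichr-based definition; the resulting length depends on the Unicode tables of the interpreter, so no count is shown.

```python
import string

# ASCII-only set, matching what the thread says about pyparsing's alphanums.
ASCII_ALPHANUMS = string.ascii_letters + string.digits

# Assumption: the Latin-1 letters 0xC0-0xFF, leaving out the multiplication
# and division signs (0xD7, 0xF7). This approximates pyparsing's alphas8bit.
LATIN1_ALPHAS = "".join(
    chr(c) for c in range(0xC0, 0x100) if c not in (0xD7, 0xF7)
)


def leading_run(text, allowed):
    """Split text into its longest leading run of allowed characters and the rest."""
    end = 0
    while end < len(text) and text[end] in allowed:
        end += 1
    return text[:end], text[end:]


line = "address_name = G\xf6ran Petterson"
# Drop the key and the whitespace after "=", as the grammar in the thread does.
value = line.split("=", 1)[1].lstrip()

# With the ASCII-only set the run ends at "ö".
ascii_split = leading_run(value, ASCII_ALPHANUMS)
# Adding the Latin-1 letters lets the run cover the whole first name.
latin1_split = leading_run(value, ASCII_ALPHANUMS + LATIN1_ALPHAS)

print(ascii_split)
print(latin1_split)

# Python 3 spelling of the wider definition quoted above (unichr became chr).
unicode_alphas = "".join(chr(c) for c in range(65536) if chr(c).isalpha())
```

With the ASCII-only set the first element is just "G", which mirrors the ([u'G', u'\xf6ran Petterson\r'], {}) result reported above. With the Latin-1 letters added, the first element becomes "Göran".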