PayPal notification email parsing

2008-03-28
2013-05-14
  • I need to parse a PayPal email. I have most of it working, but I have a problem with the following.

    The postal address in this email can take either of two forms:

    NameOfSomeone's UNCONFIRMED Address
    -----------------------------------

    NameOfSomeone
    addressline 1
    addressline 2
    Postal, city line
    Country

    or:
    NameOfSomeone's UNCONFIRMED Address
    -----------------------------------

    NameOfSomeone
    addressline 1
    Postal, city line
    Country

    Also note that "UNCONFIRMED" may be "CONFIRMED".

    I can't make it work for both formats. Fragments of my code are:

            alphanums = pyp.Word(pyp.alphanums)
            addLine = pyp.Combine((pyp.OneOrMore(alphanums) + pyp.restOfLine))
            whiteLine = pyp.FollowedBy(pyp.Suppress(pyp.White()))

            # address
            str_address = (pyp.Suppress("UNCONFIRMED Address") | pyp.Suppress("CONFIRMED Address")) + pyp.Suppress(pyp.Word("-")) + \
                addLine + addLine + addLine + addLine + whiteLine
                   
            address = str_address.setParseAction(self.str_addressAction)

            self.grammar = str_buyer | str_transid | str_itemnum | str_address

    If I add another "addLine" then I get garbage on the 4-line version.

    Any hints on how I could do this would be much appreciated.

    Werner
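A likely reason for the garbage on the 4-line version (an assumption; the thread does not confirm it) is that pyparsing by default skips whitespace, including newlines, before each token. An extra `addLine` can therefore jump over the blank line and swallow the first line of whatever follows the address. The sketch below shows the effect with a plain `Word`; the names `word` and `word_same_line` are illustrative, and pyparsing is assumed to be imported as `pyp`, as in the fragments above.

```python
import pyparsing as pyp

text = "Country\n\nItem"

# Default behaviour: whitespace (including newlines) is skipped before a
# token, so the optional second Word matches across the blank line.
word = pyp.Word(pyp.alphas)
greedy = (word + pyp.Optional(word)).parseString(text).asList()

# Restricting skippable whitespace to spaces and tabs keeps the match
# on a single line, so the optional second Word finds nothing.
word_same_line = pyp.Word(pyp.alphas).setWhitespaceChars(" \t")
bounded = (word_same_line + pyp.Optional(word_same_line)).parseString(text).asList()

print(greedy)   # ['Country', 'Item']
print(bounded)  # ['Country']
```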

     
    • I think I figured it out, with lots of trial and error.

      Since the address is followed by at least one blank line, I now do this:

              addLine = pyp.Combine((pyp.OneOrMore(alphanums) + pyp.restOfLine))
              emptyLine = pyp.LineStart() + pyp.LineEnd()

              # address
              str_address = ((pyp.Suppress("UNCONFIRMED Address") | pyp.Suppress("CONFIRMED Address")) + pyp.Suppress(pyp.Word("-")) + \
                  addLine + addLine + addLine + (addLine + emptyLine.suppress() ^ (addLine + addLine + emptyLine.suppress()) ))

      Werner
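The idea that the address ends at the first blank line can also be written without listing the 4-line and 5-line cases separately. The following is only a sketch under stated assumptions, not the code from this thread: it assumes the email layout shown in the first post, it changes pyparsing's default whitespace globally via `setDefaultWhitespaceChars` so that newlines stay visible to the grammar, and the names `eol`, `addr_line`, `address`, and `extract_address` are made up for illustration.

```python
import pyparsing as pyp

# Assumption: skip only spaces and tabs, so line breaks are significant.
# This is a global setting and affects every element created afterwards.
pyp.ParserElement.setDefaultWhitespaceChars(" \t")

eol = pyp.LineEnd().suppress()
header = pyp.Suppress(pyp.Literal("UNCONFIRMED Address") | pyp.Literal("CONFIRMED Address"))
rule = pyp.Suppress(pyp.Word("-"))

# One non-empty line of text followed by its line end.
addr_line = pyp.Regex(r"\S[^\n]*") + eol

# Header line, dashed rule, optional blank line, then address lines
# up to (and including) the terminating blank line.
address = header + eol + rule + eol + pyp.Optional(eol) + pyp.Group(pyp.OneOrMore(addr_line)) + eol


def extract_address(text):
    """Return the address lines of the first address block, or None."""
    for tokens, _start, _end in address.scanString(text):
        return list(tokens[0])
    return None


four = ("Bob's UNCONFIRMED Address\n-------------------------\n\n"
        "Bob\naddressline 1\nPostal, city line\nCountry\n\nNext section\n")
five = ("Bob's CONFIRMED Address\n-----------------------\n\n"
        "Bob\naddressline 1\naddressline 2\nPostal, city line\nCountry\n\nNext section\n")

print(extract_address(four))
print(extract_address(five))
```

Because the blank line stops `OneOrMore(addr_line)`, the same grammar handles both address lengths, and "Next section" is never pulled into the result.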

       
      • Paul McGuire
        2008-03-29

        Werner -

        This is certainly one way to do this.  I wonder if you would try an experiment for me, though. A new feature was released in pyparsing 1.4.9: the ability to multiply an expression by an exact number of repetitions, or by a (min, max) tuple of repetitions.

        Using the min-max tuple, I think your str_address will simplify from:

        str_address = ((pyp.Suppress("UNCONFIRMED Address") | pyp.Suppress("CONFIRMED Address")) + pyp.Suppress(pyp.Word("-")) +\
        addLine + addLine + addLine + (addLine + emptyLine.suppress() ^ (addLine + addLine + emptyLine.suppress()) ))

        to:

        str_address = ((pyp.Suppress("UNCONFIRMED Address") | pyp.Suppress("CONFIRMED Address")) + pyp.Suppress(pyp.Word("-")) +\
        addLine*(4,5) + emptyLine.suppress())

        (not sure, I might have a mismatched paren in there or two)

        Also, please check out the Examples page of the pyparsing wiki, you may find some help in parsing street addresses.

        Cheers!
        -- Paul
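As a small illustration of the repetition feature described above, here is a sketch that uses a simple integer token instead of the address grammar. The names `integer`, `exactly_three`, and `two_to_three` are illustrative only.

```python
import pyparsing as pyp

integer = pyp.Word(pyp.nums)

# expr * n requires exactly n repetitions.
exactly_three = integer * 3

# expr * (min, max) requires at least min and accepts at most max repetitions.
two_to_three = integer * (2, 3)

print(exactly_three.parseString("1 2 3").asList())   # ['1', '2', '3']
print(two_to_three.parseString("1 2").asList())      # ['1', '2']
print(two_to_three.parseString("1 2 3 4").asList())  # ['1', '2', '3']

# Fewer than the minimum number of repetitions is a parse error.
try:
    two_to_three.parseString("1")
    too_few_rejected = False
except pyp.ParseException:
    too_few_rejected = True
print(too_few_rejected)  # True
```

Note that the optional repetitions are matched greedily: if a fifth matching item is available, it is consumed.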

         
        • Paul,

          Thanks for the tip, will try it out tomorrow or Monday.

          I have seen the screen parsing code, but as I store the address in one block I don't need this - at least not yet.

          Werner

           
        • Paul,

          Just upgraded to 1.4.11.

          addLine*(4,5) doesn't work for me.

          Please note that, to make my code work again, I also had to change the following code:
                  emptyLine = pyp.LineStart() + pyp.LineEnd()
          to:
                  emptyLine = pyp.LineEnd()

          If you want me to try something else, or if I can provide more info, just let me know.

          Werner