I need to parse a PayPal email, got it mostly but have a problem with the following.
The postal address with this email can be either:
NameOfSomeone's UNCONFIRMED Address
-----------------------------------
NameOfSomeone
addressline 1
addressline 2
Postal, city line
Country
or:
NameOfSomeone's UNCONFIRMED Address
-----------------------------------
NameOfSomeone
addressline 1
Postal, city line
Country
Also note that "UNCONFIRMED" may be "CONFIRMED".
I can't make it work for both formats, fragments of my code are:
alphanums = pyp.Word(pyp.alphanums)
addLine = pyp.Combine((pyp.OneOrMore(alphanums) + pyp.restOfLine))
whiteLine = pyp.FollowedBy(pyp.Suppress(pyp.White()))
# address
str_address = (pyp.Suppress("UNCONFIRMED Address") | pyp.Suppress("CONFIRMED Address")) + pyp.Suppress(pyp.Word("-")) +\
    addLine + addLine + addLine + addLine + whiteLine
address = str_address.setParseAction(self.str_addressAction)
self.grammar = str_buyer | str_transid | str_itemnum | str_address
If I add another "addLine" then I get garbage on the 4 line version.
Any hints on how I could do this would be very appreciated.
Werner
I think I figured it out, after a lot of trial and error.
Since the address is always followed by at least one blank line, I now do this:
addLine = pyp.Combine((pyp.OneOrMore(alphanums) + pyp.restOfLine))
emptyLine = pyp.LineStart() + pyp.LineEnd()
# address
str_address = ((pyp.Suppress("UNCONFIRMED Address") | pyp.Suppress("CONFIRMED Address")) + pyp.Suppress(pyp.Word("-")) +\
    addLine + addLine + addLine + (addLine + emptyLine.suppress() ^ (addLine + addLine + emptyLine.suppress()) ))
Werner
Werner -
This is certainly one way to do this. I wonder if you would try an experiment for me though, since there is a new feature that was released in pyparsing 1.4.9, the ability to multiply an expression by an exact number of repetitions, or by a min-max tuple of repetitions.
Using the min-max tuple, I think your str_address will simplify from:
str_address = ((pyp.Suppress("UNCONFIRMED Address") | pyp.Suppress("CONFIRMED Address")) + pyp.Suppress(pyp.Word("-")) +\
addLine + addLine + addLine + (addLine + emptyLine.suppress() ^ (addLine + addLine + emptyLine.suppress()) ))
to:
str_address = ((pyp.Suppress("UNCONFIRMED Address") | pyp.Suppress("CONFIRMED Address")) + pyp.Suppress(pyp.Word("-")) +\
addLine*(4,5) + emptyLine.suppress()
(not sure, I might have a mismatched paren in there or two)
Also, please check out the Examples page of the pyparsing wiki; you may find some help there for parsing street addresses.
Cheers!
-- Paul
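To make the repetition operator concrete, here is a minimal sketch of multiplying an expression by an exact count and by a (min, max) tuple. The names word, exactly_three and two_to_four are made up for this illustration and are not from Werner's code.

```python
import pyparsing as pp

# A single word made of letters; pyparsing skips whitespace between tokens.
word = pp.Word(pp.alphas)

# Exact count: behaves like word + word + word.
exactly_three = word * 3

# (min, max) tuple: at least 2 and at most 4 repetitions.
two_to_four = word * (2, 4)

print(exactly_three.parseString("a b c").asList())
print(two_to_four.parseString("a b c d e").asList())
```

The tuple form takes as many repetitions as it can, up to the maximum, so the second call stops after the fourth word and leaves "e" unparsed. With fewer than two words it raises a ParseException.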
Paul,
Thanks for the tip, will try it out tomorrow or Monday.
I have seen the screen parsing code, but as I store the address in one block I don't need this - at least not yet.
Werner
Paul,
Just upgraded to 1.4.11.
addLine*(4,5) doesn't work for me.
Please note that I also had to change the following code to make it work again:
emptyLine = pyp.LineStart() + pyp.LineEnd()
to:
emptyLine = pyp.LineEnd()
If you want me to try something else, or if I can provide more info, just let me know.
Werner
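The thread does not say why addLine*(4,5) failed. One possible cause, and this is only an assumption, is that pyparsing skips newlines as whitespace by default, so an optional fifth address line can jump over the blank line and swallow the next non-blank line. The sketch below makes newlines significant instead. The names NL, add_line, header, rule and parse_address are hypothetical and not part of Werner's grammar.

```python
import pyparsing as pp

# Assumption: every line ends with an explicit, suppressed LineEnd, so the
# repetition cannot silently skip the blank line that ends the address.
NL = pp.LineEnd().suppress()

# A non-blank line: skip only spaces and tabs (never newlines), then
# require at least one non-whitespace character.
add_line = pp.Regex(r"\S[^\n]*").setWhitespaceChars(" \t") + NL

header = (pp.Suppress("UNCONFIRMED Address") | pp.Suppress("CONFIRMED Address")) + NL
rule = pp.Suppress(pp.Word("-")) + NL

# Four or five address lines, then a blank line (or the end of the text).
address = header + rule + add_line * (4, 5) + NL


def parse_address(text):
    """Return the address lines of the first address block, or None."""
    matches = address.searchString(text)
    return matches[0].asList() if len(matches) else None
```

Calling parse_address on either email variant should give back a list with the four or five address lines, without the header, the dashed rule, or anything after the blank line.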