Re: [Pyparsing] negative lookahead problem

Re: [Pyparsing] negative lookahead problem
From: Paul McGuire - 2008-06-06 00:21:34

```
Ken -

Nice catch, and you were in the right *general* vicinity, but ultimately,
name was not where the problem was. In fact, the problem was with type_.
Here are the original definitions:

TYPES = "Street St Boulevard Blvd Lane Ln Road Rd Avenue Ave " \
        "Circle Cir Cove Cv Drive Dr Parkway Pkwy Court Ct"
type_ = Combine( oneOf(TYPES, caseless=True) + Optional(".").suppress())
name = ~numberSuffix + Word(alphas)

In parsing the street name, it is used like this:

streetName = ( | Combine(OneOrMore(~type_ + name), joinString=" ",adjacent=False) )

That is, the street name is built up of one or more names, stopping when we
reach a type_. You correctly found this to be a problem if the street name
was "Main Drag", but I was confused why this would fail while another test,
in which the name was "Deer Run", succeeded.

The problem was pyparsing's implicit whitespace skipping, or rather *not*
skipping in this case. "Drag" begins with "Dr", which matches one of the
defined TYPES, and so the matching of words to compose streetName stops
after reading "Main", assuming that the leading "Dr" of "Drag" is the street
type "Dr". To illustrate other possible problem names, I added these tests:

>>> p("100 Integrated Circuit Cir")
name: Integrated Circuit, number: 100, type: Cir
>>> p("100 Above Average IQ Ave.")
name: Above Average IQ, number: 100, type: Ave
>>> p("100 Big and Strong St.")
name: Big and Strong, number: 100, type: St

To fix this, I modified type_ to enforce that, after matching one of the
TYPES, no further word body characters follow - defined using ~Word(alphas).

type_ = Combine( oneOf(TYPES, caseless=True) + ~Word(alphas) +
                 Optional(".").suppress())

With this change (and reverting name back to its original form), all the new
tests pass. I uploaded a new file to http://pyparsing.pastebin.com/m39133f55.
(This also includes another bugfix that was separately reported, that
numberSuffix was missing "rd", as in "53rd St".)

I'll correct the example in the next release, and the online version on the
pyparsing wiki. Also, thanks for the doctest example, I'll leave the tests
in this form (especially since they are actual *tests* now!).

-- Paul

-----Original Message-----
From: pyparsing-users-bounces@... [mailto:pyparsing-users-bounces@...]
On Behalf Of Ken Kuhlman
Sent: Thursday, June 05, 2008 2:15 PM
To: pyparsing-users@...
Subject: [Pyparsing] negative lookahead problem

In the example at http://pastebin.com/m8248134, I've taken the
streetAddressParser.py example and added a failing test to show that the
street name grammar is too naive. I've been trying to fix it using negative
lookahead.. is this the right general approach? My attempt causes pyparsing
to loop endlessly -- any hints?

I'm using version 1.5.0.

thanks!
-Ken
```
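The effect of the `~Word(alphas)` guard can be reproduced with a cut-down grammar. This is only a sketch, not the actual streetAddressParser.py: it assumes pyparsing is installed, it leaves out `numberSuffix` and the other `streetName` alternatives, and `build_parser` and `parse` are hypothetical helper names introduced here for illustration. The camelCase names (`oneOf`, `parseString`, `joinString`) follow the thread; newer pyparsing releases also offer snake_case equivalents.

```python
from pyparsing import Combine, OneOrMore, Optional, Word, alphas, nums, oneOf

TYPES = ("Street St Boulevard Blvd Lane Ln Road Rd Avenue Ave "
         "Circle Cir Cove Cv Drive Dr Parkway Pkwy Court Ct")


def build_parser(guarded):
    """Hypothetical helper: build a simplified address grammar.

    guarded=False uses the original type_ definition;
    guarded=True adds the ~Word(alphas) lookahead from the fix.
    """
    type_core = oneOf(TYPES, caseless=True)
    if guarded:
        # Reject the match when more letters follow, so the "Dr" at the
        # start of "Drag" no longer counts as a street type.
        type_core = type_core + ~Word(alphas)
    type_ = Combine(type_core + Optional(".").suppress())

    # Simplification: the thread's name also excludes numberSuffix.
    name = Word(alphas)
    street_name = Combine(OneOrMore(~type_ + name),
                          joinString=" ", adjacent=False)
    return Word(nums)("number") + street_name("name") + type_("type")


def parse(text, guarded):
    """Hypothetical helper: return (number, name, type) for one address."""
    result = build_parser(guarded).parseString(text)
    return result["number"], result["name"], result["type"]


# Original type_: the name stops at "Main" and "Dr" is taken from "Drag".
print(parse("100 Main Drag Dr", guarded=False))  # ('100', 'Main', 'Dr')
# Guarded type_: "Drag" is read as part of the street name.
print(parse("100 Main Drag Dr", guarded=True))   # ('100', 'Main Drag', 'Dr')
```

Because `type_` is wrapped in `Combine`, its sub-expressions do not skip whitespace, so the lookahead only rejects letters that immediately follow the type and a following space or "." still lets the type match.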

Thread view

[Pyparsing] negative lookahead problem
From: Ken Kuhlman - 2008-06-05 19:15:04

```
In the example at http://pastebin.com/m8248134, I've taken the
streetAddressParser.py example and added a failing test to show that the
street name grammar is too naive. I've been trying to fix it using negative
lookahead.. is this the right general approach? My attempt causes pyparsing
to loop endlessly -- any hints?

I'm using version 1.5.0.

thanks!
-Ken
```