Re: [Pyparsing] negative lookahead problem

Ken -

Nice catch, and you were in the right *general* vicinity, but ultimately,
name was not where the problem was.

In fact, the problem was with type_.  Here are the original definitions:

TYPES = "Street St Boulevard Blvd Lane Ln Road Rd Avenue Ave " \
        "Circle Cir Cove Cv Drive Dr Parkway Pkwy Court Ct"
type_ = Combine( oneOf(TYPES, caseless=True) + Optional(".").suppress())
name = ~numberSuffix + Word(alphas)

In parsing the street name, it is used like this:

streetName = ( <blah blah... numbered street definition>
                | Combine(OneOrMore(~type_ + name), joinString=" ",
                          adjacent=False) )

That is, the street name is built up of one or more names, stopping when we
reach a type_.

You correctly found this to be a problem if the street name was "Main Drag",
but at first I was confused about why this would fail when another test, in
which the name was "Deer Run", succeeded.  The problem was pyparsing's
implicit whitespace skipping, or rather *not* skipping in this case: nothing
in type_ requires the matched text to end at a word break.  "Drag" begins
with "Dr", which matches one of the defined TYPES, and so the matching of
words to compose streetName stops after reading "Main", assuming that the
leading "Dr" of "Drag" is the street type "Dr".
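
Here is a minimal sketch of that prefix match in isolation.  It assumes a
pyparsing install that still accepts the camelCase names used in this thread,
and leading_type is a helper made up for this illustration, not something in
the example file:

```python
from pyparsing import Combine, Optional, ParseException, oneOf

TYPES = "Street St Boulevard Blvd Lane Ln Road Rd Avenue Ave " \
        "Circle Cir Cove Cv Drive Dr Parkway Pkwy Court Ct"

# Original definition: nothing requires the matched type to end at a word break.
type_ = Combine(oneOf(TYPES, caseless=True) + Optional(".").suppress())


def leading_type(word):
    """Return the street type matched at the start of word, or None.

    Hypothetical helper, for illustration only.
    """
    try:
        return type_.parseString(word)[0]
    except ParseException:
        return None


print(leading_type("Drag"))  # the leading "Dr" is accepted as a street type
print(leading_type("Run"))   # no street type begins "Run", so no match
```

This would explain why "Deer Run" got through: neither word begins with any
of the TYPES.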

To illustrate other possible problem names, I added these tests:

>>> p("100 Integrated Circuit Cir")
name: Integrated Circuit, number: 100, type: Cir

>>> p("100 Above Average IQ Ave.")
name: Above Average IQ, number: 100, type: Ave

>>> p("100 Big and Strong St.")
name: Big and Strong, number: 100, type: St

To fix this, I modified type_ to enforce that, after matching one of the
TYPES, there are no further word body characters.  This is done with the
negative lookahead ~Word(alphas).

type_ = Combine( oneOf(TYPES, caseless=True) + ~Word(alphas) +
                 Optional(".").suppress())

With this change (and reverting name back to its original form), all the new
tests pass.
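
For anyone who wants to try the fix without the full example file, here is a
cut-down sketch.  It is not the uploaded parser: the house number, the
numbered-street alternative, and the ~numberSuffix lookahead on name are all
left out, and split_street is a hypothetical helper added just for this
illustration.

```python
from pyparsing import Combine, OneOrMore, Optional, Word, alphas, oneOf

TYPES = "Street St Boulevard Blvd Lane Ln Road Rd Avenue Ave " \
        "Circle Cir Cove Cv Drive Dr Parkway Pkwy Court Ct"

# Fixed definition: after a type, no further alphabetic characters may follow.
type_ = Combine(oneOf(TYPES, caseless=True) + ~Word(alphas) +
                Optional(".").suppress())

# Simplified for this sketch: the original name also has ~numberSuffix.
name = Word(alphas)

# Collect words until a real street type is reached.
streetName = Combine(OneOrMore(~type_ + name), joinString=" ", adjacent=False)
street = streetName("name") + type_("type")


def split_street(text):
    """Return (street name, street type) for text (illustrative helper)."""
    result = street.parseString(text)
    return result["name"], result["type"]


for text in ("Main Drag Dr", "Integrated Circuit Cir", "Big and Strong St."):
    print(split_street(text))
```

Because the lookahead sits inside Combine, where whitespace is not skipped,
it only rejects letters that immediately follow the type, so "Dr" followed by
a space, a period, or the end of the string is still accepted.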

I uploaded a new file to http://pyparsing.pastebin.com/m39133f55.  (This
also includes another bugfix that was separately reported, that numberSuffix
was missing "rd", as in "53rd St".)

I'll correct the example in the next release, and the online version on the
pyparsing wiki.  Also, thanks for the doctest example; I'll leave the tests
in this form (especially since they are actual *tests* now!).

-- Paul

-----Original Message-----
From: pyp...@li...
[mailto:pyp...@li...] On Behalf Of Ken
Kuhlman
Sent: Thursday, June 05, 2008 2:15 PM
To: pyp...@li...
Subject: [Pyparsing] negative lookahead problem

In the example at http://pastebin.com/m8248134, I've taken the
streetAddressParser.py example and added a failing test to show that the
street name grammar is too naive.

I've been trying to fix it using negative lookahead. Is this the right
general approach?  My attempt causes pyparsing to loop endlessly.
Any hints?

I'm using version 1.5.0.

thanks!
-Ken
