
Total Newbie: Problems with implementation

krrk
2005-03-31
2013-05-14
  • krrk
    2005-03-31

    I am a total newbie with Python and especially PyParsing.
    I am trying to write a little script that reads a file in the format:

    name    tel
    street    zip city

    In between the fields there may occasionally be special characters. So far I have managed to write a PyParsing grammar that recognizes names (a single word, several words, or several words including special characters), tel (always in the format x-y, where x and y can vary in length), street (like name: one or more words, special characters allowed), zip (always a 5-digit number) and city (one or more words, no special characters).

    Everything works fine on test data that contains only one of the two line types, such as

    name      tel
    name      tel
    .
    .
    .

    or

    street     zip city
    street     zip city

    I have not been able to get this to work on files in the required format:

    name    tel
    street    zip city
    name    tel
    street    zip city
    .
    .
    .
    I did read the post https://sourceforge.net/forum/forum.php?thread_id=1224566&forum_id=337293 but have not been able to get it to work. When I print the result, only the very last entry is printed (actually only the last 2 words of the last city, which should contain 3 words). Again, parsing a test file works just fine.

    The other big problem is writing everything to a file. I tried to just pickle my result (pickle dump (output, result)), but Python just complains about "'str' object is not callable".

    Any idea on either problem would be greatly appreciated.

    Nils

     
    • krrk
      2005-03-31

      Sorry about this second post. I meant to write pickle.dump(result, output).

       
    • Paul McGuire
      2005-03-31

      1. Are you using scanString or parseString?  Posting some code and/or grammar fragments, plus some sample data, may be helpful here.
      2. Parsing results are returned as ParseResults objects, which may not pickle nicely.  Try pickling results.asList(), which will collapse the tokens down to a nested list.
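      A minimal sketch of point 2, assuming Python 3 and a current pyparsing install. The one-word-name grammar and the names Alice and Bob are made up for illustration; they are not krrk's grammar or data. The sketch converts the results with asList() before pickling and opens the file in binary mode. Note that pickle.dump takes the object first and the open file second.

```python
import os
import pickle
import tempfile

from pyparsing import Combine, Group, OneOrMore, Word, alphas, nums

# Hypothetical mini-grammar (not the real one from this thread):
# a one-word name followed by a phone number such as "1234-45679".
phone_num = Combine(Word(nums) + "-" + Word(nums))
entry = Group(Word(alphas) + phone_num)

results = OneOrMore(entry).parseString("Alice 1234-45679 Bob 555-1212")

# asList() collapses the ParseResults into plain nested Python lists.
plain = results.asList()

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "results.pkl")  # hypothetical file name
    # pickle.dump(obj, file): the object first, then a file opened in binary mode.
    with open(path, "wb") as output:
        pickle.dump(plain, output)
    with open(path, "rb") as infile:
        restored = pickle.load(infile)

print(restored)  # [['Alice', '1234-45679'], ['Bob', '555-1212']]
```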

      -- Paul

       
    • krrk
      2005-04-14

      Thanks for your reply, and sorry that I could not get back to you earlier.

      I do not have access to the actual data right now, but I'll try to give an example.

      Burger King   1234-45679
      Rue de Fontane 12    45458  Chablis

      PennyArcade 2000  02315-4567897
      Highway 15      32154  Dollarville

      B&O   44444-7874564
      Ruppstr. 44  45454 Whateverville

      What I basically need is just the 5 fields on a single line, separated by semicolons. Is there something like a "beginningOfLine"? If there were, it would be a lot easier for me.

      As it is, I am struggling to get the name as a single string: when I use Combine, only the very first word ends up in the string, and the rest of the name is scattered randomly through the rest of the data. The same goes for the street. How do I write a grammar that parses the name and returns it as a single string? The phone number works, the zip code works too (big thing :)), and the city is basically a restOfLine on the second line. So if anyone could give me a hint (or a hand), I'd appreciate it.

       
      • Paul McGuire
        2005-04-15

        Okay, so here's how I interpret your grammar:
        Field 1 - unpredictable, everything up to the phone number
        Field 2 - phone number, format is some digits, a dash, and some more digits, all adjacent
        Field 3 - unpredictable, everything up to a zip code
        Field 4 - zip code, 5 adjacent digits
        Field 5 - everything following the zip code, up to the end of line

        Ok, so you have working grammar fragments for phone number and zip code, probably something like:
        phoneNum = Combine(Word(nums) + "-" + Word(nums)).setResultsName("phone")
        zipCode = Word(nums,exact=5).setResultsName("zip")
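        To check these two fragments in isolation, here is a small sketch (assuming Python 3 and a current pyparsing; the sample strings come from krrk's data) that parses a phone number and a zip code on their own. Combine joins the digits and the dash into one string and, with its default adjacent=True, only matches when the pieces touch, so a phone number written with spaces around the dash is rejected.

```python
from pyparsing import Combine, ParseException, Word, nums

phoneNum = Combine(Word(nums) + "-" + Word(nums)).setResultsName("phone")
zipCode = Word(nums, exact=5).setResultsName("zip")

# Combine returns the three pieces as a single token, available under "phone".
phone = phoneNum.parseString("02315-4567897").phone
print(phone)  # 02315-4567897

# Word(nums, exact=5) takes five digits; the rest of the line is simply left unparsed.
zip_code = zipCode.parseString("45458  Chablis").zip
print(zip_code)  # 45458

# With adjacent=True (the default), Combine refuses whitespace between its parts.
try:
    phoneNum.parseString("02315 - 4567897")
    spaced_ok = True
except ParseException:
    spaced_ok = False
print(spaced_ok)  # False
```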

        Let's try SkipTo for the fields that we can only define as "everything up to X":
        name = SkipTo( phoneNum ).setResultsName("name")
        street = SkipTo( zipCode ).setResultsName("street")
        city = restOfLine.setResultsName("city")
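        A quick way to see what SkipTo hands back is to run name and phoneNum together over one of the sample lines. This sketch assumes Python 3 and a current pyparsing. Depending on the pyparsing version, the skipped text may keep the whitespace in front of the phone number, so the sketch strips it before printing.

```python
from pyparsing import Combine, SkipTo, Word, nums

phoneNum = Combine(Word(nums) + "-" + Word(nums)).setResultsName("phone")
name = SkipTo(phoneNum).setResultsName("name")

line = "PennyArcade 2000  02315-4567897"
result = (name + phoneNum).parseString(line)

# SkipTo stops at the first position where phoneNum matches, so "2000"
# (digits but no dash) stays part of the name.
print(repr(result.name.strip()))  # 'PennyArcade 2000'
print(result.phone)               # 02315-4567897
```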

        Now a single entry looks like:
        entry = Group( name + phoneNum + street + zipCode + city )
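        One possible explanation for the "only the last entry is printed" symptom from the first post (an assumption, since that code was never posted): when the same results names are matched repeatedly without Group, each name ends up holding only its last match. The sketch below, assuming Python 3 and a current pyparsing, runs the same fields with and without Group over two of the sample entries.

```python
from pyparsing import Combine, Group, OneOrMore, SkipTo, Word, nums, restOfLine

phoneNum = Combine(Word(nums) + "-" + Word(nums)).setResultsName("phone")
zipCode = Word(nums, exact=5).setResultsName("zip")
name = SkipTo(phoneNum).setResultsName("name")
street = SkipTo(zipCode).setResultsName("street")
city = restOfLine.setResultsName("city")

data = """Burger King 1234-45679
Rue de Fontane 12 45458 Chablis
B&O 44444-7874564
Ruppstr. 44 45454 Whateverville
"""

fields = name + phoneNum + street + zipCode + city

# Without Group, both entries write into one shared set of names,
# so each name is left holding the value from the last entry.
flat = OneOrMore(fields).parseString(data)
print(flat.city.strip())  # Whateverville

# With Group, each entry becomes its own sub-result with its own names.
grouped = OneOrMore(Group(fields)).parseString(data)
print([g.city.strip() for g in grouped])  # ['Chablis', 'Whateverville']
```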

        Here's your whole program with all the plumbing:

        testdata = """
        Burger King 1234-45679
        Rue de Fontane 12 45458 Chablis

        PennyArcade 2000 02315-4567897
        Highway 15 32154 Dollarville

        B&O 44444-7874564
        Ruppstr. 44 45454 Whateverville
        """

        from pyparsing import *

        phoneNum = Combine(Word(nums) + "-" + Word(nums)).setResultsName("phone")
        zipCode = Word(nums,exact=5).setResultsName("zip")
        name = SkipTo( phoneNum ).setResultsName("name")
        street = SkipTo( zipCode ).setResultsName("street")
        city = restOfLine.setResultsName("city")

        # Use these alternate forms to skip any whitespace in front of the matched text
        #~ name = Combine(empty + SkipTo( phoneNum ) ).setResultsName("name")
        #~ street = Combine(empty + SkipTo( zipCode ) ).setResultsName("street")
        #~ city = Combine( empty + restOfLine ).setResultsName("city")

        entry = Group( name + phoneNum + street + zipCode + city )

        results = OneOrMore(entry).parseString( testdata )

        for r in results:
            print("-", r.name)
            print("-", r.street)
            print("-", r.city)
            print("-", r.zip)
            print("-", r.phone)
            print()
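        For the output krrk asked for (all five fields on one line, separated by semicolons), a possible follow-up is sketched below, assuming Python 3 and a current pyparsing. It repeats the grammar from the program above so it runs standalone, strips the whitespace that SkipTo and restOfLine can leave around the fields, and uses the standard csv module with delimiter=";" so that a field which itself contains a semicolon gets quoted instead of breaking the line. It writes to an in-memory io.StringIO; swapping in open("entries.txt", "w", newline="") (a hypothetical file name) would write a real file instead.

```python
import csv
import io

from pyparsing import Combine, Group, OneOrMore, SkipTo, Word, nums, restOfLine

testdata = """
Burger King 1234-45679
Rue de Fontane 12 45458 Chablis

PennyArcade 2000 02315-4567897
Highway 15 32154 Dollarville

B&O 44444-7874564
Ruppstr. 44 45454 Whateverville
"""

# Same grammar as in the program above.
phoneNum = Combine(Word(nums) + "-" + Word(nums)).setResultsName("phone")
zipCode = Word(nums, exact=5).setResultsName("zip")
name = SkipTo(phoneNum).setResultsName("name")
street = SkipTo(zipCode).setResultsName("street")
city = restOfLine.setResultsName("city")
entry = Group(name + phoneNum + street + zipCode + city)

results = OneOrMore(entry).parseString(testdata)

# One output line per entry, fields in the order name;phone;street;zip;city.
# Replace io.StringIO() with open("entries.txt", "w", newline="") (hypothetical name) for a file.
out = io.StringIO()
writer = csv.writer(out, delimiter=";", lineterminator="\n")
for r in results:
    writer.writerow([r.name.strip(), r.phone, r.street.strip(), r.zip, r.city.strip()])

print(out.getvalue(), end="")
# Burger King;1234-45679;Rue de Fontane 12;45458;Chablis
# PennyArcade 2000;02315-4567897;Highway 15;32154;Dollarville
# B&O;44444-7874564;Ruppstr. 44;45454;Whateverville
```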

        Good luck,
        -- Paul