Python parsing module / Discussion / Help/Open Discussion: Total Newbie: Problems with implementation

krrk - 2005-03-31

I am a total newbie with Python and especially PyParsing.
I am trying to write a little script that reads a file in the format:

name    tel
street    zip city

In between the fields there might be special characters from time to time. So far I have managed to write a PyParsing grammar that recognizes names (single names, names made up of more than one word, and names with more than one word plus special characters), tel (always in a format like x-y, where x and y can vary in length), street (same as name: one or more words, special characters recognized), zip (always a 5-digit number) and city (one or more words, no special characters).

Everything works fine on test data that has the 2 lines separated:

name      tel
name      tel
.
.
.

or

street     zip city
street     zip city

I have not been able to get this to work on files that are in the required format:

name    tel
street    zip city
name    tel
street    zip city
.
.
.
I did read the post https://sourceforge.net/forum/forum.php?thread_id=1224566&forum_id=337293 but have not been able to get it to work. When I print the result, only the very last entry is printed (actually only the last 2 words of the last city, which should contain 3 words). Again, when parsing a test file with the two line types separated, it works just fine.

The other big problem is getting everything written to a file. I tried to just pickle my result (pickle dump (output, result)), but Python just complains about " 'str' object is not callable ".

Any idea on either problem would be greatly appreciated.

Nils

- krrk - 2005-03-31
  
  Sorry about this second post. I meant to write pickle.dump(result, output)
  
- Paul McGuire - 2005-03-31
  
  1. Are you using scanString or parseString? Posting some code and/or grammar fragments, plus some sample data, may be helpful here.
  2. Parsing results are returned as ParseResults objects, which may not pickle nicely. Try pickling results.asList(), which will collapse the tokens down to a nested list.
  
  -- Paul
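
A minimal sketch of point 2, under some assumptions: the nested list `tokens` below is hand-written stand-in data for what `results.asList()` would return, and an in-memory `io.BytesIO` buffer stands in for a real output file. It only shows the argument order of `pickle.dump` (object first, then a file opened in binary mode) and that a plain nested list round-trips cleanly.

```python
import io
import pickle

# Hypothetical stand-in for results.asList(): a plain nested list of strings.
tokens = [
    ["Burger King", "1234-45679", "Rue de Fontane 12", "45458", "Chablis"],
    ["B&O", "44444-7874564", "Ruppstr. 44", "45454", "Whateverville"],
]

# In-memory stand-in for a file opened with open(<some path>, "wb").
buffer = io.BytesIO()

# pickle.dump takes the object first, then the binary file object.
pickle.dump(tokens, buffer)

# Rewind and read the data back to check the round trip.
buffer.seek(0)
restored = pickle.load(buffer)
```

With a real file, the same two calls would be wrapped in `with open(..., "wb")` and `with open(..., "rb")` blocks.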
  
- krrk - 2005-04-14
  
  Thanks for your reply, sorry that I could not get back earlier.
  
  I do not have access to the actual data right now, but I'll try to give an example.
  
  Burger King   1234-45679
  Rue de Fontane 12    45458 Chablis
  
  PennyArcade 2000 02315-4567897
  Highway 15      32154 Dollarville
  
  B&O   44444-7874564
  Ruppstr. 44 45454 Whateverville
  
  What I basically need is just having the 5 fields in a single line, separated by semicolons. Is there something like "beginningOfLine"? If there was, it would be a lot easier for me. As it is now, I am struggling with getting the name as a single string (when I use Combine, only the very first word is in the string, and the rest of the name is thrown into the rest of the data randomly). The same goes for the street. How do I write a grammar that parses the name and returns a single string? The tel-nr. does work, the zip code does (big thing :)) and the city is basically a restOfLine in the second line. So if anyone could give me a hint (or a hand), I'd appreciate it.
  
  - Paul McGuire - 2005-04-15
    
    Okay, so here's how I interpret your grammar:
    Field 1 - unpredictable, everything up to the phone number
    Field 2 - phone number, format is some digits, a dash, and some more digits, all adjacent
    Field 3 - unpredictable, everything up to a zip code
    Field 4 - zip code, 5 adjacent digits
    Field 5 - everything following the zip code, up to the end of line
    
    Ok, so you have working grammar fragments for phone number and zip code, probably something like:
    phoneNum = Combine(Word(nums) + "-" + Word(nums)).setResultsName("phone")
    zipCode = Word(nums,exact=5).setResultsName("zip")
    
    Let's try SkipTo for the fields that we can only define as "everything up to X":
    name = SkipTo( phoneNum ).setResultsName("name")
    street = SkipTo( zipCode ).setResultsName("street")
    city = restOfLine.setResultsName("city")
    
    Now a single entry looks like:
    entry = Group( name + phoneNum + street + zipCode + city )
    
    Here's your whole program with all the plumbing:
    
    testdata = """
    Burger King 1234-45679
    Rue de Fontane 12 45458 Chablis
    
    PennyArcade 2000 02315-4567897
    Highway 15 32154 Dollarville
    
    B&O 44444-7874564
    Ruppstr. 44 45454 Whateverville
    """
    
    from pyparsing import *
    
    phoneNum = Combine(Word(nums) + "-" + Word(nums)).setResultsName("phone")
    zipCode = Word(nums,exact=5).setResultsName("zip")
    name = SkipTo( phoneNum ).setResultsName("name")
    street = SkipTo( zipCode ).setResultsName("street")
    city = restOfLine.setResultsName("city")
    
    # Use these alternate forms to skip whitespace before characters
    #~ name = Combine(empty + SkipTo( phoneNum ) ).setResultsName("name")
    #~ street = Combine(empty + SkipTo( zipCode ) ).setResultsName("street")
    #~ city = Combine( empty + restOfLine ).setResultsName("city")
    
    entry = Group( name + phoneNum + street + zipCode + city )
    
    results = OneOrMore(entry).parseString( testdata )
    
    for r in results:
        print("-", r.name)
        print("-", r.street)
        print("-", r.city)
        print("-", r.zip)
        print("-", r.phone)
        print()
    
    Good luck,
    -- Paul
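
For the original goal of one semicolon-separated line per entry, a possible follow-up sketch. Assumptions: `rows` is hand-written stand-in data in the shape `results.asList()` would give for the grouped entries (name, phone, street, zip, city), with some stray whitespace added on purpose, and `to_line` is a hypothetical helper name, not part of pyparsing.

```python
def to_line(fields, sep=";"):
    """Join one entry's fields into a single line.

    Each field is whitespace-normalized first: leading/trailing blanks and
    newlines are dropped, and internal runs of whitespace become one space.
    """
    return sep.join(" ".join(str(field).split()) for field in fields)


# Hypothetical stand-in for results.asList(), in the order
# name, phone, street, zip, city.
rows = [
    ["\nBurger King ", "1234-45679", "\nRue de Fontane 12 ", "45458", " Chablis"],
    ["PennyArcade 2000 ", "02315-4567897", "\nHighway 15 ", "32154", " Dollarville"],
    ["B&O ", "44444-7874564", "\nRuppstr. 44 ", "45454", " Whateverville"],
]

lines = [to_line(row) for row in rows]

for line in lines:
    print(line)
```

Normalizing whitespace at output time is one option; the commented-out `Combine(empty + ...)` forms in the program above handle leading whitespace inside the grammar instead.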
    