[Pyparsing] This is a follow-up of my post in the forum
From: Francis V. <fra...@gm...> - 2009-09-24 06:10:55
I have the following data set I want to process:

data = """
. BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
F
06/15/1925
SAMSON, JAMES HUBILLA
1111-0001A-F1567GHA2
1
. BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
M
05/14/1925
CRUZ, JOSE ENDAYA2
. BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
M
11/26/1925
PEREZ, JAMES ENDAYA
1111-0001A-K2661CEA1
3
. BRGY. 1 RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
F
03/31/1925
CRUZ, RAMON CANTRE4
. BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
M
01/20/1925
RAMONCITO, CARLOS ENDAYA
1111-0001A-A2055LEA1
5
. #234, BARANGAY I (POB.), RABAGO, REVENA
M
01/20/1925
CRUZ, SUSAN CANTRE
1111-0001A-A2079NCA1-6
6
. BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
F
06/03/1925
CRUZ, RAUL ENDAYA
1111-0001A-F0330OEA2
7
. BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
M
02/17/1925
JOSE, TEOFISTO ENDAYA8
. BARANGAY I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
M
11/08/1925
RAMONCITO, JOSEPH MASONGSONG
1111-0001A-K0869RMA1
9
. BARANGAY I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
F
12/10/1925
ARAGON, VINCENT GERANCE
1111-0001A-L1071VGA2
10
. BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
M
10/20/1925
PASTORA, JOBI SEPTIMO
1111-0001A-J2062DSA1
11
. BARANGAY I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
M
09/09/1925
CRUZ, CARLOS JR. AVENDAÑO
1111-0001A-I0981AAA1
12
. BARANGAY I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
F
10/16/1925
CRUZ, NANCY CASTOR
1111-0001A-J1680NCA2
13
. F. FULE ST. RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
F
01/03/1925
CRUZ, CORY ABARCAR
1111-0001A-A0364CAA2
14
. 118 F. FULE ST., BARANGAY I (POB.), RABAGO, REVENA
F
11/07/1925
JOSE, FREDA DIONGLAY
1111-0001A-K0723GDA2
15
. F. FULE ST.
RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA
M
03/26/1925
ZAMORA, DANDING DIONGLAY
1111-0001A-C2663MDA1
16
"""

NL = LineEnd().suppress()
gender = oneOf("M F")
integer = Word(nums)
date = Combine(integer + '/' + integer + '/' + integer)

# define the simple line definitions
gender_line = gender("sex") + NL
dob_line = date("DOB") + NL
name_line = Word(alphas8bit + "," + alphas8bit) + NL
id_line = Combine(Word(alphanums) + "-" + Word(alphanums) + "-" + Word(alphanums))("ID") + NL
recnum_line = integer("recnum") + NL

# define forms of address lines
first_addr_line = LineStart() + Suppress('.') + empty + restOfLine + NL
# a subsequent address line is any line that is not a gender definition
subsq_addr_line = ~(gender_line) + restOfLine + NL

# a line with a name and a recnum combined, if there is no ID
name_recnum_line = originalTextFor(OneOrMore(Word(alphas+',')))("name") + \
    integer("recnum") + NL

record = (first_addr_line + ZeroOrMore(subsq_addr_line))("address") + \
    gender_line + dob_line + ((name_line + id_line + recnum_line) | name_recnum_line)

records = record.searchString(data)

But it's not matching the "id_line". What's wrong with the id_line definition?
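One way to narrow a question like this down is to test each line shape against sample lines in isolation. The sketch below does that with the standard-library `re` module instead of pyparsing (a swapped-in technique, chosen only so it runs without dependencies). The names `ID_3`, `HIGH_ONLY` and `fits` are hypothetical, and `HIGH_ONLY` rests on the assumption that pyparsing's `alphas8bit` covers only the accented Latin-1 letters and not plain A-Z. If that assumption holds, the three-part ID shape itself is fine for most records, and the `name_line` that precedes `id_line` may be the part that fails first.

```python
import re

# Three hyphen-separated alphanumeric chunks, mirroring the posted id_line.
ID_3 = re.compile(r"[A-Za-z0-9]+-[A-Za-z0-9]+-[A-Za-z0-9]+")

# Assumption: alphas8bit holds only accented Latin-1 letters, so the posted
# name_line character set is roughly "those letters plus a comma".
HIGH_ONLY = re.compile("[\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u00ff,]+")


def fits(pattern, line):
    """Return True only if the whole line matches the pattern."""
    return pattern.fullmatch(line) is not None


# A typical ID from the data has exactly three chunks.
print(fits(ID_3, "1111-0001A-F1567GHA2"))    # True
# Record 6 carries a fourth chunk, which a three-chunk shape cannot cover.
print(fits(ID_3, "1111-0001A-A2079NCA1-6"))  # False
# Plain ASCII name words are outside the assumed alphas8bit-style set.
print(fits(HIGH_ONLY, "SAMSON,"))            # False
```

Under these assumptions, checking the name definition (and the one four-part ID) would be a reasonable next step before changing `id_line`.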