[Pyparsing] parse only lines starting with keyword
From: James H. <jam...@gm...> - 2014-02-14 13:43:31
Hello,

I am trying to parse a block of text lines where the lines of interest always begin with a keyword from a set of known keywords. All other lines can be ignored, even if they contain a keyword somewhere other than at the start of the line.

The code that follows almost works, but it stops when it meets an unwanted line containing one of the known keywords ('kw1' and 'kw2').

from pyparsing import *

def main():
    # A test string where I want to match all lines starting with
    # a keyword 'kw1' or 'kw2'.
    # Other lines should not be matched.
    test_string_1 = """
An unwanted line can contain anything
kw2 par1
kw1 par1 2
another unwanted line
kw1 opt 1
another unwanted line that contains a kw1
kw2 h1
yet another unwanted line
"""

    kw1 = Literal("kw1")
    kw2 = Literal("kw2")
    keywords = (kw1 | kw2)

    kw1_record = (kw1 + Word(alphanums) + Word(nums)
                  + restOfLine.suppress() + LineEnd().suppress())
    kw2_record = (kw2 + Word(alphanums)
                  + restOfLine.suppress() + LineEnd().suppress())
    valid_records = (kw1_record | kw2_record)

    record = Group(SkipTo(keywords, include=False, ignore=None,
                          failOn=None).suppress()
                   + valid_records)
    all_records = ZeroOrMore(record)

    res = all_records.parseString(test_string_1)
    for entry in res:
        print(entry)

if __name__ == '__main__':
    main()

The output from this code is

['kw2', 'par1']
['kw1', 'par1', '2']
['kw1', 'opt', '1']

What is missing from the output is

['kw2', 'h1']

I am new to pyparsing, so I am probably missing something obvious. Is there a way to correct my code so that it does what I want? Or is there a better way to achieve my aims?

I would be grateful for any suggestions.

Thanks in advance,
James
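
For illustration only, here is a minimal sketch of one possible approach. It is not a confirmed answer from the thread. The likely cause of the early stop is that SkipTo(keywords) halts at the 'kw1' embedded in the unwanted line, no record matches at that position, and ZeroOrMore ends there. The sketch sidesteps this by splitting the text into lines and applying the record grammar to each line separately, so a keyword can only match at the start of a line. The helper name parse_keyword_lines is hypothetical, and the test string is assumed to have one entry per line as listed in the post.

```python
from pyparsing import Literal, ParseException, Word, alphanums, nums

# Same record shapes as in the post. Trailing text on a line is ignored
# because parseString() does not require the whole input to match.
kw1_record = Literal("kw1") + Word(alphanums) + Word(nums)
kw2_record = Literal("kw2") + Word(alphanums)
valid_records = kw1_record | kw2_record


def parse_keyword_lines(text):
    """Return one token list per line that starts with a known keyword.

    Hypothetical helper, not part of pyparsing.
    """
    records = []
    for line in text.splitlines():
        try:
            tokens = valid_records.parseString(line)
        except ParseException:
            continue  # not a keyword line, or a malformed record: skip it
        records.append(tokens.asList())
    return records


# Assumed layout of the test data: one entry per line.
test_string = """\
An unwanted line can contain anything
kw2 par1
kw1 par1 2
another unwanted line
kw1 opt 1
another unwanted line that contains a kw1
kw2 h1
yet another unwanted line
"""

for entry in parse_keyword_lines(test_string):
    print(entry)
```

With this input the sketch should print all four records, including ['kw2', 'h1']. Note that pyparsing skips leading whitespace by default, so an indented line that begins with a keyword would also be accepted, and a keyword line that does not fit its record shape (for example 'kw1' without a trailing number) is silently skipped.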