Re: [Pyparsing] A newbie w/nested structures
From: Paul M. <pa...@al...> - 2007-09-09 09:30:58
Tim -

First of all, this is a very ambitious parser to start with, so don't be discouraged. It is also a recursive grammar, which makes it a more advanced kind of parser to begin with.

Here are some suggestions on getting started:
- pick a part of the sample ADL (I would suggest working section by section)
- develop a simple BNF for this grammar

Here is a sample parser for the ontology section. It is a recursive example: a valueDef is defined in terms of component valueDefs. It also shows the comment format, and the mechanism for skipping comments.

I hope this sample gives you a jump start on a more complete ADL parser.

-- Paul

from pyparsing import *

# punctuation is needed for parsing, but suppressed from the results
LT,GT,EQ,LPAR,RPAR,LBRK,RBRK,BAR,QUOT,SEMI = map(Suppress,"<>=()[]|';")

upper = srange("[A-Z]")
lower = upper.lower()

# attribute names start with a lowercase letter
attrName = Word(lower,alphanums+"_")

# a key is either an attribute name or a quoted string in brackets, e.g. ["at0000"]
key = attrName | (LBRK+quotedString+RBRK)
quotedString.setParseAction(removeQuotes)

# valueDef is recursive, so declare it with Forward and fill it in with <<
valueDef = Forward()
valueDef << ( key + EQ + LT + ZeroOrMore( Group(valueDef | quotedString )) + GT )

ontologySection = "ontology" + valueDef

# ADL comments run from "--" to the end of the line; skip them anywhere
comment = "--" + restOfLine
ontologySection.ignore(comment)

sample = """
ontology
    term_definitions = <
        ["en"] = <
            items = <
                ["at0000"] = <
                    description = <"Generic reporting composition in response to a request for information or testing">
                    text = <"Report">
                >
                ["at0001"] = <
                    description = <"@ internal @">
                    text = <"Tree">
                >
                ["at0002"] = <
                    description = <"Information about the request">
                    text = <"Request details">
                >
                ["at0003"] = <
                    description = <"Identification of the request">
                    text = <"Request identifier">
                >
                ["at0004"] = <
                    description = <"Information about the requesting clinician">
                    text = <"Requesting clinician">
                >
                ["at0005"] = <
                    description = <"The date of the request">
                    text = <"Date of request">
                >
                ["at0006"] = <
                    description = <"Details of the report">
                    text = <"Report details">
                >
                ["at0007"] = <
                    description = <"Identification information about the report">
                    text = <"Report identifier">
                >
                ["at0008"] = <
                    description = <"Collection of parties who have been copied the report">
                    text = <"Copies to">
                >
                ["at0009"] = <
                    description = <"Details of the parties to whom the copies have been copied">
                    text = <"Copied party details">
                >
                ["at0010"] = <
                    description = <"Collection of parties who have been referred to generate the report">
                    text = <"Referrals">
                >
                ["at0011"] = <
                    description = <"Details of the parties to whom the specimen or findings have been referred for special testing or elaboration">
                    text = <"Referred party details">
                >
                ["at0012"] = <
                    description = <"Details for contacting requesting clinician">
                    text = <"Contact details of requesting clinician">
                >
                ["at0013"] = <
                    description = <"The date and time the report was officially issued">
                    text = <"Date/time report issued">
                >
                ["at0014"] = <
                    description = <"The status of the report">
                    text = <"Status">
                >
                ["at0015"] = <
                    description = <"This report is the final report">
                    text = <"Final">
                >
                ["at0016"] = <
                    description = <"This report is an interim report and a final or further interim report is to be expected">
                    text = <"Interim">
                >
                ["at0017"] = <
                    description = <"This report is supplementary to a previous report">
                    text = <"Supplementary">
                >
                ["at0018"] = <
                    description = <"This report is a correction or amendment of a previous report">
                    text = <"Corrected/amended">
                >
            >
        >
    >
"""

res = ontologySection.parseString(sample)

from pprint import pprint
pprint( res.asList() )
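
For the "develop a simple BNF" step, here is a rough sketch of the BNF that the ontology parser above appears to implement, written out as a small hand-rolled recursive-descent parser in plain Python (no pyparsing) so that the recursion is explicit. The names TOKEN, tokenize, expect, parse_key, parse_value_def and parse_ontology are made up for this illustration, the string token does not handle escape sequences, and the nesting of the output only approximates what the pyparsing version returns.

```python
import re

# Assumed BNF for the ontology section (a sketch, not the official ADL grammar):
#   ontologySection ::= "ontology" valueDef
#   valueDef        ::= key "=" "<" ( valueDef | quotedString )* ">"
#   key             ::= attrName | "[" quotedString "]"

# Illustrative token pattern; comments and whitespace are recognised so they can be dropped.
TOKEN = re.compile(r'''
    (?P<comment>--[^\n]*)          # ADL comment: "--" to end of line
  | (?P<string>"[^"\n]*")          # double-quoted string, no escapes handled
  | (?P<name>[a-z][A-Za-z0-9_]*)   # attribute name starting with a lowercase letter
  | (?P<punct>[<>=\[\]])           # structural punctuation
  | (?P<space>\s+)
''', re.VERBOSE)

def tokenize(text):
    """Split text into (kind, text) pairs, skipping comments and whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup not in ("comment", "space"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    tokens.append(("end", ""))  # sentinel so the parser never runs off the list
    return tokens

def expect(tokens, i, text):
    """Check that token i has the given text and return the next index."""
    if tokens[i][1] != text:
        raise SyntaxError(f"expected {text!r}, got {tokens[i][1]!r}")
    return i + 1

def parse_key(tokens, i):
    """key ::= attrName | "[" quotedString "]" """
    kind, text = tokens[i]
    if kind == "name":
        return text, i + 1
    i = expect(tokens, i, "[")
    kind, text = tokens[i]
    if kind != "string":
        raise SyntaxError(f"expected quoted string, got {text!r}")
    i = expect(tokens, i + 1, "]")
    return text[1:-1], i  # strip the surrounding quotes

def parse_value_def(tokens, i):
    """valueDef ::= key "=" "<" ( valueDef | quotedString )* ">" """
    key, i = parse_key(tokens, i)
    i = expect(tokens, i, "=")
    i = expect(tokens, i, "<")
    node = [key]
    while tokens[i][1] != ">":
        if tokens[i][0] == "string":
            node.append([tokens[i][1][1:-1]])
            i += 1
        else:
            child, i = parse_value_def(tokens, i)  # the recursive step
            node.append(child)
    return node, i + 1

def parse_ontology(text):
    """ontologySection ::= "ontology" valueDef"""
    tokens = tokenize(text)
    i = expect(tokens, 0, "ontology")
    node, _ = parse_value_def(tokens, i)
    return ["ontology"] + node

sample = '''
ontology
    term_definitions = <
        ["en"] = <   -- language code
            items = <
                ["at0000"] = <
                    text = <"Report">
                >
            >
        >
    >
'''
result = parse_ontology(sample)
print(result)
```

The recursive call inside parse_value_def does the job that Forward() does in the pyparsing version: it lets valueDef refer to itself before its definition is complete. Dropping comment tokens in tokenize is the hand-written counterpart of calling ignore(comment) on the pyparsing expression.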