[Pyparsing] Strategies for use with ParseFile
From: <dav...@l-...> - 2008-01-22 17:43:43
All,

I've been using pyparsing for a long time, and I feel like I'm using it poorly, as it seems quite cumbersome to use.

Some background: We need to parse text files that are routinely hundreds of thousands of lines long. The grammar is rather complicated (a guesstimate of 300 rules). The grammar is stored in a class, with each rule a static class variable. I have another class (a parser) that subscribes to rule subsets by calling "setParseAction" on the interesting rules. When an interesting rule is encountered, my parser class is called. It then pulls out the interesting tokens, constructs a Python object, and fires a callback function, where an interested user of this data can act upon it.

Now, an "interesting" rule may be composed of, say, 10 subrules. I don't need their info individually, but I can get it through the composite object.

So, two questions:

1.) Is there an easy way to retrieve the original text for an entire EDT (see below)?
2.) Any suggestions for better organization of the data? I've thought about using inheritance, because the file has header data and oneOrMore() of 6 different "things" (one of which is the EDT illustrated below), but that seems like a bit of a shoehorn.

Thanks

------------------------------------------------------------------------
class Grammar:
    <snip>
    EnumeratedDataType = \
        Keyword("(EnumeratedDataType") + \
        EDT_Name + \
        Optional(EDT_Description) + \
        Optional(EDT_MomEnumeratedDataType) + \
        EDT_AutoSequence + \
        Optional(EDT_Description) + \
        EDT_StartValue + \
        OneOrMore(EDT_Enumeration) + \
        ")"
------------------------------------------------------------------------
class Parser:
    <snip>
    def __EDT_setParseActions__(self):
        """Set the parse actions for the EDT elements"""
        Grammar.EnumeratedDataType.setParseAction(self.__EDT__)

        # These can all be handled identically. One of each only.
        Grammar.EDT_Name.setParseAction(self.__EDT_Element__)
        Grammar.EDT_MomEnumeratedDataType.setParseAction(self.__EDT_Element__)
        Grammar.EDT_AutoSequence.setParseAction(self.__EDT_Element__)
        Grammar.EDT_Description.setParseAction(self.__EDT_Element__)
        Grammar.EDT_StartValue.setParseAction(self.__EDT_Element__)

        # You can have one or more of these
        Grammar.EDT_Enumeration.setParseAction(self.__EDT_Enumeration__)
        Grammar.EDT_Enumerator.setParseAction(self.__EDT_Enum_Element__)
        Grammar.EDT_Representation.setParseAction(self.__EDT_Enum_Element__)

    def __EDT__(self, s, l, toks):
        # Fire the EDT callback and reset the parent. We've already stored the
        # data we care about
        self.__fireCallback__(OMDParser.EDT_TOKEN, self.__ParentElement__)
        self.__ResetParent__()

    def __EDT_Element__(self, s, l, toks):
        """
        This method is called whenever we encounter an EDT element.
        We add the element to the __ParentElement__ dictionary
        """
        # Init the parent, and add the parsed item
        self.__InitParent__(self.EDT_TOKEN)
        self.__ParentElement__.addKey(toks[0], toks[1])

    def __EDT_Enumeration__(self, s, l, toks):
        """
        This method is called whenever an enumeration is fully parsed.
        We must now add it to the parent element and reset the child
        """
        self.__ParentElement__.appendKey("Enumerations", self.__ChildElement__)
        self.__ResetChild__()

    def __EDT_Enum_Element__(self, s, l, toks):
        """
        This method is called whenever we encounter an Enumeration element
        We add the element to the CurrEnumeration dictionary
        """
        # Initialize the child element, and set the current element
        self.__InitChild__("Enumeration")
        self.__ChildElement__.addKey(toks[0], toks[1])
------------------------------------------------------------------------
Usage:

def gotEDT(EDT):
    print EDT

# Start of "Main" function
if __name__ == "__main__":
    op = Parser(<fileName>)
    op.registerCallback(OMDParser.EDT_TOKEN, gotEDT)
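
The Parser class above relies on helpers that are snipped from the post (__InitParent__, __ResetParent__, __fireCallback__, registerCallback and so on). As a rough sketch of how that accumulate-then-fire pattern might work, the stand-alone Python 3 code below uses hypothetical names (Record, CallbackParser, register_callback, the on_* handlers) and calls the handlers by hand instead of going through pyparsing. It is an illustration under those assumptions, not the poster's actual implementation.

```python
from collections import defaultdict


class Record:
    """Hypothetical container for one composite element (e.g. one EDT)."""

    def __init__(self, kind):
        self.kind = kind
        self.fields = {}

    def add_key(self, key, value):
        # Store a single-valued field such as a name or start value.
        self.fields[key] = value

    def append_key(self, key, value):
        # Store a repeated field such as the list of enumerations.
        self.fields.setdefault(key, []).append(value)


class CallbackParser:
    """Collects sub-rule results and fires callbacks on the composite rule."""

    def __init__(self):
        self._callbacks = defaultdict(list)
        self._parent = None
        self._child = None

    def register_callback(self, token, func):
        self._callbacks[token].append(func)

    def _ensure_parent(self):
        # Lazily create the parent record, like __InitParent__ in the post.
        if self._parent is None:
            self._parent = Record("EDT")
        return self._parent

    # The (s, loc, toks) signature mirrors a pyparsing parse action.
    def on_element(self, s, loc, toks):
        self._ensure_parent().add_key(toks[0], toks[1])

    def on_enum_element(self, s, loc, toks):
        if self._child is None:
            self._child = Record("Enumeration")
        self._child.add_key(toks[0], toks[1])

    def on_enumeration(self, s, loc, toks):
        # One enumeration is complete: attach it and start a fresh child.
        self._ensure_parent().append_key("Enumerations", self._child)
        self._child = None

    def on_edt(self, s, loc, toks):
        # The composite rule matched: hand the record over and reset state.
        record, self._parent = self._parent, None
        for func in self._callbacks["EDT"]:
            func(record)


# Simulate the order in which the parse actions would fire for one EDT.
received = []
parser = CallbackParser()
parser.register_callback("EDT", received.append)

parser.on_element("", 0, ["Name", "Color"])
parser.on_enum_element("", 0, ["Enumerator", "Red"])
parser.on_enum_element("", 0, ["Representation", "0"])
parser.on_enumeration("", 0, [])
parser.on_edt("", 0, [])
```

In a real setup each on_* method would be attached to the matching rule with setParseAction, as in the post. Because the state is reset in on_edt, each callback receives one finished record per EDT.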