Thread: [Pyparsing] Strategies for use with ParseFile


All,
   I've been using pyparsing for a long time, but I suspect I'm using it
poorly, because my setup has become quite cumbersome.

Some background:

We need to parse text files that are routinely hundreds of thousands of
lines long.

The grammar is rather complicated (roughly 300 rules, at a guess).  The
grammar is stored in a class, with each rule a static class variable.

I have another class (a parser) that subscribes to subsets of the rules
by calling "setParseAction" on the interesting ones.  When an
interesting rule is matched, my parser class is called.  It pulls out
the interesting tokens, constructs a Python object, and then fires a
callback function, where an interested user of this data can act upon
it.
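
To make that flow concrete, here is a stripped-down sketch of the
subscribe/build/fire pattern.  It does not import pyparsing; the parse
actions are simply called by hand in the order pyparsing would call them.
MiniParser, register_callback, on_element and on_edt are made-up names for
illustration only, not the real class.

```python
class MiniParser:
    """Hypothetical stand-in for the real parser class."""

    EDT_TOKEN = "EDT"

    def __init__(self):
        self._callbacks = {}  # token type -> list of user callbacks
        self._parent = {}     # object currently under construction

    def register_callback(self, token, func):
        self._callbacks.setdefault(token, []).append(func)

    def on_element(self, s, loc, toks):
        # Parse action for a sub-rule: stash key/value on the pending object.
        self._parent[toks[0]] = toks[1]

    def on_edt(self, s, loc, toks):
        # Parse action for the composite rule: hand the finished object
        # to every subscriber, then start over with an empty one.
        finished, self._parent = self._parent, {}
        for func in self._callbacks.get(self.EDT_TOKEN, []):
            func(finished)


received = []
parser = MiniParser()
parser.register_callback(MiniParser.EDT_TOKEN, received.append)

# Simulate the parse actions firing in match order (sub-rules first).
parser.on_element("", 0, ["Name", "Color"])
parser.on_element("", 0, ["StartValue", "1"])
parser.on_edt("", 0, [])
```

The sub-rule actions run before the composite rule's action, which is why
the composite action only has to fire the callback and reset.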

Now, an "interesting" rule may be composed of, say, 10 subrules.  I
don't need their info individually, but I can get it through the
composite object.

So, two questions:
1.) Is there an easy way to retrieve the original text for an entire
EDT (see below)?
2.) Any suggestions for better organization of the data?  I've thought
about using inheritance, because the file has header data and
OneOrMore() of 6 different "things" (one of which is the EDT illustrated
below), but that seems like a bit of a shoehorn.
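
For question 1, one thing that might work, assuming a pyparsing version
that ships the originalTextFor helper, is wrapping the composite rule so
that it yields the matched source text as a single token.  A minimal
sketch follows; edt_name, edt_start and the sample input are made-up
stand-ins, not the real rules.

```python
from pyparsing import Literal, Suppress, Word, alphas, nums, originalTextFor

# Hypothetical stand-ins for the real EDT sub-rules.
edt_name = Suppress("(Name") + Word(alphas) + Suppress(")")
edt_start = Suppress("(StartValue") + Word(nums) + Suppress(")")

edt = Literal("(EnumeratedDataType") + edt_name + edt_start + ")"

# originalTextFor wraps an expression; the wrapped expression returns the
# slice of the input that the inner expression matched.
edt_with_text = originalTextFor(edt)

text = """(EnumeratedDataType
    (Name Color)
    (StartValue 1)
)"""

result = edt_with_text.parseString(text)
original = result[0]  # the matched source text as one string
```

The wrapper is a new expression, so it would have to be the one used in
the enclosing grammar; parse actions already set on the sub-rules should
still fire as before.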

Thanks

------------------------------------------------------------------------

class Grammar:
   <snip>

    EnumeratedDataType =                            \
        Keyword("(EnumeratedDataType")          + \
            EDT_Name                            + \
            Optional(EDT_Description)           + \
            Optional(EDT_MomEnumeratedDataType) + \
            EDT_AutoSequence                    + \
            Optional(EDT_Description)           + \
            EDT_StartValue                      + \
            OneOrMore(EDT_Enumeration)          + \
        ")"

------------------------------------------------------------------------

class Parser:

    <snip>

    def __EDT_setParseActions__(self):
        """Set the parse actions for the EDT elements"""
        Grammar.EnumeratedDataType.setParseAction(self.__EDT__)

        # These can all be handled identically. One of each only.
        Grammar.EDT_Name.setParseAction(self.__EDT_Element__)
        Grammar.EDT_MomEnumeratedDataType.setParseAction(self.__EDT_Element__)
        Grammar.EDT_AutoSequence.setParseAction(self.__EDT_Element__)
        Grammar.EDT_Description.setParseAction(self.__EDT_Element__)
        Grammar.EDT_StartValue.setParseAction(self.__EDT_Element__)

        # You can have one or more of these
        Grammar.EDT_Enumeration.setParseAction(self.__EDT_Enumeration__)
        Grammar.EDT_Enumerator.setParseAction(self.__EDT_Enum_Element__)
        Grammar.EDT_Representation.setParseAction(self.__EDT_Enum_Element__)

    def __EDT__(self, s, l, toks):

        # Fire the EDT callback and reset the parent.  We've already
        # stored the data we care about
        self.__fireCallback__(OMDParser.EDT_TOKEN, self.__ParentElement__)
        self.__ResetParent__()

    def __EDT_Element__(self, s, l, toks):
        """
        This method is called whenever we encounter an EDT element.
        We add the element to the __ParentElement__ dictionary
        """
        # Init the parent, and add the parsed item
        self.__InitParent__(self.EDT_TOKEN)
        self.__ParentElement__.addKey(toks[0], toks[1])

    def __EDT_Enumeration__(self, s, l, toks):
        """
        This method is called whenever an enumeration is fully parsed.
        We must now add it to the parent element and reset the child
        """
        self.__ParentElement__.appendKey("Enumerations", self.__ChildElement__)
        self.__ResetChild__()

    def __EDT_Enum_Element__(self, s, l, toks):
        """
        This method is called whenever we encounter an Enumeration element.
        We add the element to the CurrEnumeration dictionary
        """
        # Initialize the child element, and set the current element
        self.__InitChild__("Enumeration")
        self.__ChildElement__.addKey(toks[0], toks[1])

------------------------------------------------------------------------

Usage:

def gotEDT(EDT):
    print(EDT)

# Start of "Main" function
if __name__ == "__main__":
    op = Parser(<fileName>)
    op.registerCallback(OMDParser.EDT_TOKEN, gotEDT)
