Re: [Pyparsing] Strategies for use with ParseFile


I think you answered my main concerns about using the framework.  Some
of the more "strategic" questions don't seem to be answered well in the
documentation, and getting a third-party perspective is certainly
useful.

I've heard several things about the __somefunc__ naming.  The main one
is that it's treated more like the equivalent of "private" functions
than "magic" functions.  It's a way to make it glaringly obvious what a
class user should keep their sticky little hands out of :).  While not
shown in my example, the Parser class does have "public" functions for
subscribing to the resulting objects.

The whole framework comes about from the fact that this is really meant
to be a generic parser.  There can be quite a few different utility
"things" that can be done with the parsed data, and it is really quite
useful to have generic callbacks that different classes can use.  Like I
mentioned previously, the grammar and file size make parsing the file
quite an ordeal, but pyparsing is much easier to use than the
corresponding (ugly as hell) Perl framework that we used to use.
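
The generic-callback idea can be sketched roughly as below.  This is
only an illustration under assumptions: the method names subscribe and
_publish, and the "record" key, are hypothetical and not the real
interface, and in the real thing _publish would be called from a
pyparsing parse action rather than by hand.

```python
from collections import defaultdict


class Parser(object):
    """Hypothetical stand-in for a parser that publishes parsed objects."""

    def __init__(self):
        # Maps a kind of parsed object to the callbacks interested in it.
        self._subscribers = defaultdict(list)

    def subscribe(self, kind, callback):
        """Public hook: register a callback for one kind of object."""
        self._subscribers[kind].append(callback)

    def _publish(self, kind, obj):
        """Internal: hand a freshly built object to every subscriber."""
        for callback in self._subscribers[kind]:
            callback(obj)
        return obj


seen = []
parser = Parser()
parser.subscribe("record", seen.append)                        # one utility
parser.subscribe("record", lambda obj: seen.append(len(obj)))  # another
parser._publish("record", ["aaaaa", "bbbb"])  # simulated parse result
```

Each utility class only needs to call subscribe; the grammar side never
has to know who is listening.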

P.S. I would like to personally thank you for one of the most
well-structured, thoughtful (as in you put a lot of thought into it :-P ),
and useful responses to anything I've ever posted on *any* mailing list.

Thanks!

> -----Original Message-----
> From: Paul McGuire [mailto:pt...@au...]
> Sent: Tuesday, January 22, 2008 10:52 PM
> To: Weber, David C @ Link; pyp...@li...
> Subject: RE: [Pyparsing] Strategies for use with ParseFile
>
> David -
>
> This does seem fairly complicated, but I think your approach in using
> parse actions as parse-time callbacks to build a data structure is
> actually pretty typical.
>
> To answer your specific questions:
> 1. There is a parse action keepOriginalText which may do the trick for
> you.  Maybe this example would help:
>
> from pyparsing import *
>
> a_s = Word("a")
> b_s = Word("b")
> c_s = Word("c")
>
> allwords = a_s + b_s + c_s
> def showTokens(tokens):
>     print "Showing tokens:", tokens.asList()
>
> allwords.setParseAction(showTokens, keepOriginalText, showTokens)
> allwords.parseString("aaaaa  bbbb   cccc")
>
> Prints:
> Showing tokens: ['aaaaa', 'bbbb', 'cccc']
> Showing tokens: ['aaaaa  bbbb   cccc']
>
> When allwords is parsed, the 3 parse actions are called in turn.  First
> showTokens is called with the individual tokens returned from matching
> a_s, b_s, and c_s.  Then keepOriginalText is called, which changes the
> matched tokens back to the original text.  Then showTokens is called
> again to show the effect of calling keepOriginalText.  Does this help?
>
> 2. I don't really have much to go on to answer your second question.
> It *is* possible that you don't need multiple callbacks to create
> Python objects and return them.  Instead, you can just have the related
> class define __init__ to accept the tokens that are passed to a parse
> action, and just name the class as the parse action.  This will cause
> the __init__ method to be called with the matched tokens, and the
> constructed object will be returned to the parser.  There are examples
> of this in the PyCon presentation that ships with pyparsing, describing
> the interactive adventure game; there is also an example in the
> pyparsing O'Reilly short cut, in which a query string gets converted to
> a sequence of classes.  For example:
>
> class XClass(object):
>     def __init__(self,tokens):
>         self.matchedText = tokens[0]
>     def __repr__(self):
>         return "%s:(%s)" % (self.__class__.__name__,self.matchedText)
> class AClass(XClass): pass
> class BClass(XClass): pass
> class CClass(XClass): pass
> a_s.setParseAction(AClass)
> b_s.setParseAction(BClass)
> c_s.setParseAction(CClass)
>
> allwords = a_s + b_s + c_s
>
> print allwords.parseString("aaaaa  bbbb   cccc").asList()
>
> Prints:
> [AClass:(aaaaa), BClass:(bbbb), CClass:(cccc)]
>
> Also, your naming convention is a little distracting: leading and
> trailing double-underscores are usually reserved for "magic" functions,
> such as __str__, __call__, etc.  So when you use them on your own class
> and method names, it looks confusing to me.
>
> Also, I don't know if you are gaining anything by burying different
> pyparsing expressions/rules inside class variables.  This sounds
> vaguely Java-esque to me.  In Python, things *can* exist outside of a
> class...
>
> I don't feel that I've really addressed all of your questions and
> concerns.  Can you distill this architecture down to some small
> examples, and repost?  Otherwise, I'd say this is pretty much in line
> with how you would parse this data and use it to construct an overall
> data structure.
>
> -- Paul