Re: [Pyparsing] Strategies for use with ParseFile
From: Paul M. <pt...@au...> - 2008-01-23 04:51:47
David -

This does seem fairly complicated, but I think your approach in using parse actions as parse-time callbacks to build a data structure is actually pretty typical. To answer your specific questions:

1. There is a parse action keepOriginalText which may do the trick for you. Maybe this example would help:

    from pyparsing import *

    a_s = Word("a")
    b_s = Word("b")
    c_s = Word("c")
    allwords = a_s + b_s + c_s

    def showTokens(tokens):
        print "Showing tokens:", tokens.asList()

    allwords.setParseAction(showTokens, keepOriginalText, showTokens)
    allwords.parseString("aaaaa bbbb cccc")

Prints:

    Showing tokens: ['aaaaa', 'bbbb', 'cccc']
    Showing tokens: ['aaaaa bbbb cccc']

When allwords is parsed, the 3 parse actions are called in turn. First showTokens is called with the individual tokens returned from matching a_s, b_s, and c_s. Then keepOriginalText is called, which changes the matched tokens back to the original text. Then showTokens is called again to show the effect of calling keepOriginalText. Does this help?

2. I don't really have much to go on to answer your second question. It *is* possible that you don't need multiple callbacks to create Python objects and return them. Instead, you can just have the related class define __init__ to accept the tokens that are passed to a parse action, and just name the class as the parse action. This will cause the __init__ method to be called with the matched tokens, and the constructed object will be returned to the parser. There are examples of this in the Pycon presentation that ships with pyparsing, describing the interactive adventure game; there is also an example in the pyparsing O'Reilly short cut, in which a query string gets converted to a sequence of classes.
For example:

    class XClass(object):
        def __init__(self, tokens):
            self.matchedText = tokens[0]
        def __repr__(self):
            return "%s:(%s)" % (self.__class__.__name__, self.matchedText)

    class AClass(XClass): pass
    class BClass(XClass): pass
    class CClass(XClass): pass

    a_s.setParseAction(AClass)
    b_s.setParseAction(BClass)
    c_s.setParseAction(CClass)

    allwords = a_s + b_s + c_s
    print allwords.parseString("aaaaa bbbb cccc").asList()

Prints:

    [AClass:(aaaaa), BClass:(bbbb), CClass:(cccc)]

Also, your naming convention is a little distracting. Leading and trailing double-underscores are usually reserved for "magic" functions, such as __str__, __call__, etc., so when you use them on your own class and method names, it looks confusing to me.

Also, I don't know if you are gaining anything by burying different pyparsing expressions/rules inside class variables. This sounds vaguely Java-esque to me. In Python, things *can* exist outside of a class...

I don't feel that I've really addressed all of your questions and concerns. Can you distill this architecture down to some small examples, and repost? Otherwise, I'd say this is pretty much in line with how you would parse this data and use it to construct an overall data structure.

-- Paul
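
To see why naming a class as the parse action works at all, here is a minimal sketch in plain Python with no pyparsing involved. The helper run_parse_action is a hypothetical stand-in for what a parser does when it invokes a callback; it is an assumption made for illustration and not pyparsing's actual implementation. The point is only that a class is a callable like any other, so calling it with the tokens runs __init__ and hands back the new object.

```python
# Hypothetical stand-in for a parser invoking a parse action.
# This is NOT pyparsing's real code, just an illustration of the mechanism.
def run_parse_action(action, tokens):
    # Call the action with the matched tokens. If it returns a value,
    # that value replaces the tokens; if it returns None, the tokens
    # pass through unchanged.
    result = action(tokens)
    return tokens if result is None else [result]


class XClass(object):
    def __init__(self, tokens):
        # The constructor receives the tokens, just as a function would.
        self.matchedText = tokens[0]

    def __repr__(self):
        return "%s:(%s)" % (self.__class__.__name__, self.matchedText)


class AClass(XClass):
    pass


def show_tokens(tokens):
    # An ordinary function used as a callback; it returns None.
    print("Showing tokens:", tokens)


# A plain function leaves the tokens as they were.
print(run_parse_action(show_tokens, ["aaaaa"]))

# A class used as the callback yields a constructed object instead.
print(run_parse_action(AClass, ["aaaaa"]))
```

Under these assumptions, the first call leaves the token list untouched and the second replaces it with an AClass instance, which is the same shape of result as the XClass example above.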