Re: [Pyparsing] Proposed Visitor interface to parse strings
Brought to you by:
ptmcg
From: Helmut J. <jar...@ig...> - 2009-09-24 08:58:08
|
On 24 Sep, Paul McGuire wrote: > I am considering adding an interface to pyparsing along the lines of the > Visitor pattern. My intent is to make it easier to work with the scanString > method. Currently, when using scanString, one gets the tokens, start, and > end for each matching text in the input string. This forces the caller to > keep track of some low-level parsing state/locations if they need to do some > processing of the intervening text, or some other stateful work. By writing > a Visitor, this can be tracked in a more object-friendly way. > > Here's how the Visitor would work. The concept is that, after creating a > pyparsing grammar, one could define a class that implements a method > visitExpr, which receives a ParseResults containing the matching tokens, and > optionally a method visitIntervening, which receives a string containing the > portion of the input string between matches - call this class ParseVisitor. > The pyparsing grammar expression - let's call it expr - then accepts this > visitor, and gives us a callable object. This new object can now be called > with an input string, and the visitExpr and visitIntervening methods will > get called as the input string is parsed. Here is a sample: > > from pyparsing import * > > expr = Word(alphas) > > tests = """\ > ABC 123 DEF 456 > ABC 123 DEF 456 XYZ > ABC 123 DEF 456 XYZ > 0 ABC 123 DEF 456 XYZ > """.splitlines() > > > class ParseVisitor(object): > def visitExpr(self, tokens): > print ">%s<" % tokens.asList(), > def visitIntervening(self, strng): > print "^%s^" % strng, > > > visitor = ParseVisitor() > processor = expr.accept(visitor) > for t in tests: > print t > processor(t) > print > print > > Prints out: > > ABC 123 DEF 456 > ^^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456^ > > ABC 123 DEF 456 XYZ > ^^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']< > > ABC 123 DEF 456 XYZ > ^ ^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']< > > 0 ABC 123 DEF 456 XYZ > ^0 ^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']< > > > (In the pure Visitor pattern, ParseVisitor would implement two different > methods, both named visit, with one taking a ParseResults and the other > taking a string. But since Python doesn't do function overloading, I've had > to give these different names. But now, how nice and explicit the resulting > class is!) > > What do people think of this idea? > Yes, that looks great! One question though, would be possible to let visitExpr and visitIntervening return a value (None or a bool) to indicate if they like to be called again or not for the remaining tokens? Helmut. -- Helmut Jarausch Lehrstuhl fuer Numerische Mathematik RWTH - Aachen University D 52056 Aachen, Germany |