[Pyparsing] Proposed Visitor interface to parse strings
From: Paul M. <pt...@au...> - 2009-09-24 08:15:28
I am considering adding an interface to pyparsing along the lines of the Visitor pattern. My intent is to make it easier to work with the scanString method. Currently, scanString yields the tokens, start location, and end location for each match in the input string. This forces the caller to keep track of low-level parsing state and locations if they need to process the intervening text or do some other stateful work. With a Visitor, this state can be tracked in a more object-friendly way.

Here's how the Visitor would work. After creating a pyparsing grammar, one would define a class (call it ParseVisitor) that implements a method visitExpr, which receives a ParseResults containing the matching tokens, and optionally a method visitIntervening, which receives a string containing the portion of the input string between matches. The pyparsing grammar expression (let's call it expr) then accepts this visitor and returns a callable object. That object can be called with an input string, and the visitExpr and visitIntervening methods get called as the input string is scanned.
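For comparison, the bookkeeping that such an interface would hide can be sketched with what scanString already provides. This is only an illustration, not the proposed implementation: make_processor, scan_words, and RecordingVisitor are hypothetical names, and the choice to report empty text before a match but skip empty trailing text is an assumption. To keep the sketch runnable without pyparsing installed, a small regex generator stands in for Word(alphas).scanString; it yields the same kind of (tokens, start, end) triples.

```python
import re


def make_processor(scan, visitor):
    """Return a callable that drives `visitor` over an input string.

    `scan` is any function that, like pyparsing's scanString, yields
    (tokens, start, end) for each match in the string it is given.
    (Hypothetical helper, not part of pyparsing.)
    """
    def process(instring):
        last_end = 0
        for tokens, start, end in scan(instring):
            # Text between the previous match and this one (may be empty).
            visitor.visitIntervening(instring[last_end:start])
            visitor.visitExpr(tokens)
            last_end = end
        # Trailing text after the last match, reported only if non-empty
        # (an assumption of this sketch).
        if last_end < len(instring):
            visitor.visitIntervening(instring[last_end:])
    return process


def scan_words(instring):
    """Stand-in for Word(alphas).scanString: one match per run of letters."""
    for m in re.finditer(r"[A-Za-z]+", instring):
        yield [m.group()], m.start(), m.end()


class RecordingVisitor(object):
    """Collects formatted events instead of printing them."""

    def __init__(self):
        self.events = []

    def visitExpr(self, tokens):
        self.events.append(">%s<" % list(tokens))

    def visitIntervening(self, strng):
        self.events.append("^%s^" % strng)


visitor = RecordingVisitor()
processor = make_processor(scan_words, visitor)
processor("ABC 123 DEF 456")
print(" ".join(visitor.events))
# prints: ^^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456^
```

With pyparsing available, passing expr.scanString in place of scan_words should drive the same visitor, since the helper relies only on the (tokens, start, end) triples.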
Here is a sample:

    from pyparsing import *

    expr = Word(alphas)

    tests = """\
    ABC 123 DEF 456
    ABC 123 DEF 456 XYZ
     ABC 123 DEF 456 XYZ
    0 ABC 123 DEF 456 XYZ
    """.splitlines()

    class ParseVisitor(object):
        # called with the ParseResults for each match of expr
        def visitExpr(self, tokens):
            print ">%s<" % tokens.asList(),
        # called with the text found between matches
        def visitIntervening(self, strng):
            print "^%s^" % strng,

    visitor = ParseVisitor()
    # accept() is the proposed new method; it returns a callable
    processor = expr.accept(visitor)
    for t in tests:
        print t
        processor(t)
        print
        print

Prints out:

    ABC 123 DEF 456
    ^^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456^

    ABC 123 DEF 456 XYZ
    ^^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']<

     ABC 123 DEF 456 XYZ
    ^ ^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']<

    0 ABC 123 DEF 456 XYZ
    ^0 ^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']<

(In the pure Visitor pattern, ParseVisitor would implement two different methods, both named visit, with one taking a ParseResults and the other taking a string. Since Python doesn't do function overloading, I've had to give them different names. But look how nice and explicit the resulting class is!)

What do people think of this idea?

-- Paul