Thread: [Pyparsing] Proposed Visitor interface to parse strings

I am considering adding an interface to pyparsing along the lines of the
Visitor pattern.  My intent is to make it easier to work with the scanString
method.  Currently, when using scanString, one gets the tokens, start, and
end for each matching text in the input string.  This forces the caller to
keep track of some low-level parsing state/locations if they need to do some
processing of the intervening text, or some other stateful work.  By writing
a Visitor, this can be tracked in a more object-friendly way.

Here's how the Visitor would work.  After creating a pyparsing grammar, one
would define a class - call it ParseVisitor - that implements a method
visitExpr, which receives a ParseResults containing the matching tokens, and
optionally a method visitIntervening, which receives a string containing the
portion of the input string between matches.  The pyparsing grammar
expression - let's call it expr - then accepts this visitor and returns a
callable object.  Calling that object with an input string invokes the
visitExpr and visitIntervening methods as the input string is scanned.
Here is a sample:

from pyparsing import *

expr = Word(alphas)

tests = """\
ABC 123 DEF 456
ABC 123 DEF 456 XYZ
 ABC 123 DEF 456 XYZ
0 ABC 123 DEF 456 XYZ
""".splitlines()

class ParseVisitor(object):
    # called with the ParseResults for each match
    def visitExpr(self, tokens):
        print(">%s<" % tokens.asList(), end=" ")
    # called with the text between matches
    def visitIntervening(self, strng):
        print("^%s^" % strng, end=" ")

visitor = ParseVisitor()
# accept() is the proposed method; it is not part of pyparsing today
processor = expr.accept(visitor)
for t in tests:
    print(t)
    processor(t)
    print()
    print()

Prints out:

ABC 123 DEF 456
^^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456^

ABC 123 DEF 456 XYZ
^^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']<

 ABC 123 DEF 456 XYZ
^ ^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']<

0 ABC 123 DEF 456 XYZ
^0 ^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']<
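
For anyone who wants to experiment before anything like accept exists, here
is a minimal sketch of how such a processor could be layered on scanString.
It is only an assumption about the behavior, inferred from the sample output
above, and not the proposed implementation.  The names make_processor and
CollectingVisitor are hypothetical.  The helper only relies on the
expression having a scanString method that yields (tokens, start, end).

```python
def make_processor(expr, visitor):
    """Hypothetical stand-in for the proposed expr.accept(visitor).

    expr is any object whose scanString(text) yields (tokens, start, end)
    triples, as pyparsing expressions do.
    """
    # visitIntervening is optional, so look it up defensively
    visit_intervening = getattr(visitor, "visitIntervening", None)

    def process(text):
        last_end = 0  # end of the previous match: the state callers track by hand today
        for tokens, start, end in expr.scanString(text):
            if visit_intervening is not None:
                visit_intervening(text[last_end:start])
            visitor.visitExpr(tokens)
            last_end = end
        # report trailing text only when some is left, as in the sample output
        if visit_intervening is not None and last_end < len(text):
            visit_intervening(text[last_end:])

    return process


class CollectingVisitor(object):
    """Illustrative visitor that records events instead of printing them."""

    def __init__(self):
        self.events = []

    def visitExpr(self, tokens):
        self.events.append(("match", list(tokens)))

    def visitIntervening(self, strng):
        self.events.append(("text", strng))
```

With pyparsing installed, make_processor(Word(alphas), ParseVisitor()) should
behave much like the processor in the sample, assuming scanString reports
each match's start location after skipping leading whitespace.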

(In the pure Visitor pattern, ParseVisitor would implement two different
methods, both named visit, with one taking a ParseResults and the other
taking a string.  But since Python doesn't do function overloading, I've had
to give these different names.  But now, how nice and explicit the resulting
class is!)
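
As an aside, current Python (3.8 and later) can approximate the single-name
form with functools.singledispatchmethod, which picks an implementation
based on the type of the first argument.  The sketch below is only an
illustration of that alternative, not part of the proposal.  The class name
OverloadedVisitor is hypothetical, and it assumes the matched tokens are
never passed as a plain str, so anything that is not a str falls through to
the token handler.

```python
from functools import singledispatchmethod


class OverloadedVisitor(object):
    """Hypothetical visitor exposing a single visit() name."""

    def __init__(self):
        self.seen = []

    @singledispatchmethod
    def visit(self, tokens):
        # fallback: anything that is not a str is treated as matched tokens
        self.seen.append(("match", list(tokens)))

    @visit.register(str)
    def _(self, strng):
        # chosen when the argument is a plain string of intervening text
        self.seen.append(("text", strng))


v = OverloadedVisitor()
v.visit(["ABC"])   # dispatches to the fallback
v.visit(" 123 ")   # dispatches to the str overload
```

The two explicitly named methods remain easier to read and to document, so
this is a trade-off rather than an improvement.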

What do people think of this idea?

-- Paul
