Re: [Pyparsing] better parse error reporting


Gre7g and Martin -

Thanks for the running thread on this "parse error capturing" patch.  I
confess I've not yet had time this week to read it or fully assess it.  I've
tried for a long time to get more accurate error locations in pyparsing.

The biggest problem I've had is in constructions like:

expr1 = OneOrMore( expr2 ) + expr3

and the input text contains:

valid_expr2 invalid_expr2 valid_expr2 valid_expr3

OneOrMore( expr2 ) succeeds after parsing the first valid_expr2, and moves
forward to try to parse an expr3.  Positioned at invalid_expr2, the parse of
expr3 fails, and you get an exception like "expected expr3 at (locn of
invalid_expr2)".
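
To make this concrete, here is a minimal sketch of the same behaviour. It
is a toy stand-in written with plain functions, not pyparsing's actual
code: the helpers token, one_or_more and sequence, and the sample patterns
and input, are all made up for illustration.

```python
import re

class ParseError(Exception):
    def __init__(self, loc, msg):
        super().__init__(f"{msg} at token {loc}")
        self.loc = loc
        self.msg = msg

def token(name, pattern):
    """Hypothetical element: match one whitespace-separated token."""
    regex = re.compile(pattern)
    def parse(tokens, loc):
        if loc < len(tokens) and regex.fullmatch(tokens[loc]):
            return loc + 1
        raise ParseError(loc, f"expected {name}")
    return parse

def one_or_more(element):
    def parse(tokens, loc):
        loc = element(tokens, loc)      # the first match is mandatory
        while True:
            try:
                loc = element(tokens, loc)
            except ParseError:
                return loc              # the inner failure is thrown away
    return parse

def sequence(*elements):
    def parse(tokens, loc):
        for element in elements:
            loc = element(tokens, loc)
        return loc
    return parse

expr2 = token("expr2", r"[a-z]+=\d+")   # e.g. a=1
expr3 = token("expr3", r"end")
expr1 = sequence(one_or_more(expr2), expr3)

# valid_expr2 invalid_expr2 valid_expr2 valid_expr3
tokens = "a=1 b= c=3 end".split()
try:
    expr1(tokens, 0)
except ParseError as err:
    reported = (err.loc, err.msg)       # (1, "expected expr3")
```

The location (token 1) is right, but the message blames expr3 even though
the real culprit is the malformed expr2 "b=".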

Or, what is more likely occurring in many of these grammars, is something
like this:

expr1 = OneOrMore( expr2 + expr3 + expr4 + expr5 + expr6 + expr7 + expr8 +
expr9 )

and the input text contains:

valid_expr2 ... valid_expr8 invalid_expr9

and the error says "expected expr1 at (locn of valid_expr2)" and you get
this puzzled "huh?" look on your face.

There is some code in pyparsing that tries to find the furthest successful
match, but the piecework nature of the grammar (each object does its own
parsing in more or less isolation) means that I'm limited in how much
state I can pass up the chain, and in the case of a partially successful
OneOrMore(expr1) parse, I can't pass *anything* up.  The only alternative at
the moment is a global "here is the furthest I've gotten so far" variable,
which I suspect is the mechanism behind Gre7g's patch.
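
A rough sketch of that global "furthest so far" idea follows. It is a toy
model only, not Gre7g's patch (which I have not read) and not pyparsing
code; the furthest record and the helper functions are hypothetical names.

```python
import re

class ParseError(Exception):
    def __init__(self, loc, msg):
        super().__init__(f"{msg} at token {loc}")
        self.loc = loc
        self.msg = msg

# Module-level record of the deepest failure seen so far (hypothetical).
furthest = {"loc": -1, "msg": ""}

def token(name, pattern):
    regex = re.compile(pattern)
    def parse(tokens, loc):
        if loc < len(tokens) and regex.fullmatch(tokens[loc]):
            return loc + 1
        if loc > furthest["loc"]:       # remember only the deepest failure
            furthest["loc"] = loc
            furthest["msg"] = f"expected {name}"
        raise ParseError(loc, f"expected {name}")
    return parse

def sequence(*elements):
    def parse(tokens, loc):
        for element in elements:
            loc = element(tokens, loc)
        return loc
    return parse

def one_or_more(element):
    def parse(tokens, loc):
        loc = element(tokens, loc)
        while True:
            try:
                loc = element(tokens, loc)
            except ParseError:
                return loc              # exception is lost; the record is not
    return parse

pair = sequence(token("name", r"[a-z]+"),
                token("=", r"="),
                token("number", r"\d+"))
expr1 = sequence(one_or_more(pair), token("end", r"end"))

tokens = "a = 1 b = x end".split()
try:
    expr1(tokens, 0)
except ParseError as err:
    reported = (err.loc, err.msg)       # what the exception says
deepest = (furthest["loc"], furthest["msg"])  # what the global record says
```

Here the exception reports "expected end" at token 3, while the global
record points at token 5, where a number was expected.  The cost is that
the record is shared by every parse in the process, so it has to be reset
before each run and is not safe across threads.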

When I get some time, it might be worth revisiting pyparsing's design to see
if I can pass some form of state object from element to element, so that a
more suitable parse error location could be captured in it.
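
As a sketch of what such a state object might look like (purely
hypothetical: ParseState, note_failure and parse_all are invented names,
and none of this is pyparsing's design), each element takes the state as
an extra argument and records its failures there:

```python
import re
from dataclasses import dataclass

@dataclass
class ParseState:
    """Hypothetical per-parse record of the deepest failure."""
    loc: int = -1
    expected: str = ""

    def note_failure(self, loc, expected):
        if loc > self.loc:
            self.loc, self.expected = loc, expected

class ParseError(Exception):
    def __init__(self, loc, msg):
        super().__init__(f"{msg} at token {loc}")
        self.loc = loc
        self.msg = msg

def token(name, pattern):
    regex = re.compile(pattern)
    def parse(tokens, loc, state):
        if loc < len(tokens) and regex.fullmatch(tokens[loc]):
            return loc + 1
        state.note_failure(loc, name)
        raise ParseError(loc, f"expected {name}")
    return parse

def sequence(*elements):
    def parse(tokens, loc, state):
        for element in elements:
            loc = element(tokens, loc, state)
        return loc
    return parse

def one_or_more(element):
    def parse(tokens, loc, state):
        loc = element(tokens, loc, state)
        while True:
            try:
                loc = element(tokens, loc, state)
            except ParseError:
                return loc      # failure already noted in the state object
    return parse

def parse_all(grammar, text):
    state = ParseState()        # fresh state for every parse
    try:
        return grammar(text.split(), 0, state)
    except ParseError:
        # Re-raise using the deepest failure rather than the last one.
        raise ParseError(state.loc, f"expected {state.expected}") from None

pair = sequence(token("name", r"[a-z]+"),
                token("=", r"="),
                token("number", r"\d+"))
expr1 = sequence(one_or_more(pair), token("end", r"end"))

try:
    parse_all(expr1, "a = 1 b = x end")
except ParseError as err:
    failure = (err.loc, err.msg)
```

Because every call to parse_all builds its own ParseState, separate parses
cannot interfere with each other the way they can with a global variable.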

-- Paul