Re: [Pyparsing] Get Better Error Messages - Prevent Backtracking
From: Paul M. <pt...@au...> - 2008-05-12 08:41:54
Eike, et al. -

I'm happy to report that I've successfully added ErrorStop-like behavior to pyparsing. In the last 6 weeks or so, there has been a flurry of interest and comment on this feature, and between the various proposals and some offline parser work (in which I was converting an EBNF to pyparsing), I finally got my thoughts to gel on how to add this important feature to pyparsing.

I'll excerpt comments from a posting I made to the wiki a few hours ago (in response to a pyparsing user who needed to raise a syntax error from an expression wrapped in an Optional, and so proposed a mod to Optional to correct the problem):

>>>>>>>>
It turns out that this issue affects many parts of pyparsing, not just Optional. The root problem actually occurs in the And class, in that if a succession of expressions does not parse completely, then a routine ParseException is raised. For example, in your grammar, you found the need to modify Optional because you did not get the desired error location from:

    port_clause = "(" + ...body of port definition... + ")"
    entity = Literal("entity") + "(" + \
             Optional( Keyword("port") + port_clause ) + \
             ")"

ParseException is "routine" because it is a way for any expression to indicate that no match occurred, and that other alternatives should be tried. However, in this case, we want non-routine behavior. If the parser reads "port" and it is not followed by "(" and the other interesting port items, then the parser should stop immediately. This is a different flavor of And - when "port" is read, you know that the next items in the string should be the port data, and if they aren't, then this is a syntax error.

Since normal And sequencing is defined using '+' signs, I'm trying to insert the syntax error trapping using another operator. The logical choice for this operator would be '-'; it is equal to '+' in precedence, and it is visually intuitive as a sequence connector.
The distinction will be that, if a parser error occurs after passing the '-' operator, then this error will be flagged immediately as a syntax error. (I am adding the exception class ParseSyntaxException, derived from ParseFatalException.) In your case, your code would become:

    port_clause = "(" + ...body of port definition... + ")"
    entity = Literal("entity") + "(" + \
             Optional( Keyword("port") - port_clause ) + \
             ")"

The syntax would be the same if Optional were replaced with ZeroOrMore, OneOrMore, or any of the other repetition classes.

It is possible now to have a lot of control over just where syntax errors get signaled. You could define an expression as:

    expr = A + B + C - D + E + F

and any parsing mismatch after having matched A, B, and C would be raised as a syntax error, and parsing would stop immediately.
<<<<<<<<<<<

So that's it. To implement ErrorStop, I've just added the '-' operator, so that "A - B + C" becomes "A + errorStop + B + C". ErrorStop itself is implemented as a private, internal class to And, and I modified And's parseImpl method to do the right thing when detecting errorStop.

It should be noted that you shouldn't just blindly go replacing all of your '+' operators with '-'s; backtracking *is* an important feature for most grammars. The general rule for using '-' is to insert it after an element in your grammar that unambiguously determines a particular path in the grammar, so that backtracking would not find any better match.

If you want to experiment with this new feature, you can download it from the pyparsing SVN repository on SourceForge. (You'll note that I've bumped the version to 1.5.0 with this update - the number of new features is really moving us to another level of the package, so I'm probably a little overdue in calling this 1.5.0 instead of 1.4.*.)

Since early in the life of pyparsing, I have been writing apologetic e-mails about pyparsing's inability to report helpful syntax error locations.
I hope this new feature will help address this deficiency. Thanks to all!

-- Paul
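
For readers who want to see the mechanism in miniature, here is a toy sketch of the errorStop idea described above. It is not pyparsing's code: the names ParseError, SyntaxStop, ERROR_STOP, lit, seq, optional and first_error are all made up for this illustration, and the port body is reduced to a single hypothetical token "a". The sketch only assumes what the message says, namely that a sequence treats any failure after the error-stop marker as fatal instead of routine.

```python
class ParseError(Exception):
    """Routine failure: the caller may backtrack and try something else."""
    def __init__(self, loc, msg):
        super().__init__(msg)
        self.loc = loc


class SyntaxStop(Exception):
    """Fatal failure raised past an error stop. Deliberately NOT a subclass
    of ParseError, so code that catches routine failures lets it through."""
    def __init__(self, loc, msg):
        super().__init__(msg)
        self.loc = loc


ERROR_STOP = object()  # marker standing where '-' would appear in a sequence


def lit(text):
    """Match a literal string, skipping leading whitespace."""
    def parse(s, loc):
        while loc < len(s) and s[loc].isspace():
            loc += 1
        if s.startswith(text, loc):
            return loc + len(text)
        raise ParseError(loc, "expected %r" % text)
    return parse


def seq(*items):
    """Match items in order; failures after ERROR_STOP become SyntaxStop."""
    def parse(s, loc):
        committed = False
        for item in items:
            if item is ERROR_STOP:
                committed = True
                continue
            try:
                loc = item(s, loc)
            except ParseError as err:
                if committed:
                    raise SyntaxStop(err.loc, str(err)) from err
                raise
        return loc
    return parse


def optional(item):
    """Match item if possible; on a routine failure, match nothing."""
    def parse(s, loc):
        try:
            return item(s, loc)
        except ParseError:
            return loc
    return parse


def first_error(parser, text):
    """Return (exception class name, location), or None if parsing succeeds."""
    try:
        parser(text, 0)
    except (ParseError, SyntaxStop) as err:
        return (type(err).__name__, err.loc)
    return None


port_clause = seq(lit("("), lit("a"), lit(")"))

# Equivalent of: "entity" + "(" + Optional("port" + port_clause) + ")"
with_plus = seq(lit("entity"), lit("("),
                optional(seq(lit("port"), port_clause)), lit(")"))

# Equivalent of: "entity" + "(" + Optional("port" - port_clause) + ")"
with_minus = seq(lit("entity"), lit("("),
                 optional(seq(lit("port"), ERROR_STOP, port_clause)), lit(")"))

bad = "entity ( port oops )"
print(first_error(with_plus, bad))   # -> ('ParseError', 9)
print(first_error(with_minus, bad))  # -> ('SyntaxStop', 14)
```

With the plain sequence, the optional part quietly backtracks and the failure is blamed on "port" at position 9, where a ")" was expected. With the error stop, the failure is reported at position 14, right after "port", which is where the input actually goes wrong. Input that omits the port section entirely, such as "entity ( )", still parses with both versions, because the failure happens before the marker is reached.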