Thread: [Pyparsing] Get Better Error Messages - Prevent Backtracking
From: Eike W. <eik...@gm...> - 2008-04-13 23:16:40
I want to propose another way to get better error messages: prevent backtracking at certain points in the grammar. I have attached a simple example implementation that works with Pyparsing 1.4.5.

Often it is clear that, when a certain parser fails, there is surely an error in the input. Backtracking in this situation is bad, because information about the location and the cause of the error might be lost.

A common situation is after a keyword. Imagine a programming language where you define variables like this:

    data a, b, c: Real;

After the 'data' keyword there must be a list of variable names, a ':' character, the name of a type and a ';' character. If this pattern does not appear after the 'data' keyword, there is an error in the program. (See lines 46, 47 in the example program.)

A usual parser for this statement would look like this:

    dataDef1 = Group(Keyword('data') + attrNameList + ':' + identifier + ';')

I propose the 'ErrStop' class, an additional 'ParserElement' that stops parsing when a parser given to it fails. It is used like this:

    dataDef2 = Group(Keyword('data') + ErrStop(attrNameList + ':' + identifier + ';'))

In the example program the 'data' statement is combined with two additional statements, 'foo1;' and 'foo2;', to form a programming language. The program output shows that using the 'ErrStop' class really preserves more information about the error (missing comma after position 15). Additionally, the parse result of a successful run is not altered.

The program output:

    Test regular parser:
    [['foo1', ';'], ['data', ['a', ',', 'a1', ',', 'b'], ':', 'Real', ';'], ['foo1', ';']]
    Expected end of text (at char 6), (line:1, col:7)
    Test parser with backtracking stop:
    [['foo1', ';'], ['data', ['a', ',', 'a1', ',', 'b'], ':', 'Real', ';'], ['foo1', ';']]
    Expected ":" (at char 17), (line:1, col:18)
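The effect described in this message can be reproduced without pyparsing. What follows is a minimal sketch using only the standard library. It is a toy model, not Eike's ErrStop class and not pyparsing's implementation, and every name in it (NoMatch, FatalError, token, seq, first_of, many, err_stop, report) is made up for illustration. It assumes the input with the missing comma is "foo1; data a, a1 b: Real; foo1;", which is consistent with the character positions in the program output above.

```python
import re

WS = re.compile(r"\s*")


class ParseError(Exception):
    def __init__(self, loc, msg):
        super().__init__(f"{msg} (at char {loc})")
        self.loc, self.msg = loc, msg


class NoMatch(ParseError):
    """Ordinary failure: callers may backtrack and try something else."""


class FatalError(ParseError):
    """Failure behind an error stop: nothing backtracks past it."""


def token(pattern, name):
    rx = re.compile(r"\s*(" + pattern + ")")

    def parse(text, loc):
        m = rx.match(text, loc)
        if not m:
            # Report the position after skipping leading whitespace.
            raise NoMatch(WS.match(text, loc).end(), "Expected " + name)
        return m.end(), [m.group(1)]
    return parse


def seq(*parsers):
    def parse(text, loc):
        tokens = []
        for p in parsers:
            loc, toks = p(text, loc)
            tokens.extend(toks)
        return loc, tokens
    return parse


def first_of(*parsers):
    def parse(text, loc):
        for p in parsers:
            try:
                return p(text, loc)
            except NoMatch as err:
                last = err
        raise last
    return parse


def many(parser):
    def parse(text, loc):
        tokens = []
        while True:
            try:
                loc, toks = parser(text, loc)
            except NoMatch:
                # Backtrack: where and why the attempt failed is forgotten.
                return loc, tokens
            tokens.extend(toks)
    return parse


def err_stop(parser):
    def parse(text, loc):
        try:
            return parser(text, loc)
        except NoMatch as err:
            raise FatalError(err.loc, err.msg) from None
    return parse


def end_of_text(text, loc):
    loc = WS.match(text, loc).end()
    if loc < len(text):
        raise NoMatch(loc, "Expected end of text")
    return loc, []


def report(parser, text):
    try:
        return parser(text, 0)[1]
    except ParseError as err:
        return str(err)


ident = token(r"[A-Za-z_]\w*", "identifier")
name_list = seq(ident, many(seq(token(",", '","'), ident)))
data_tail = seq(name_list, token(":", '":"'), ident, token(";", '";"'))
foo1 = seq(token(r"foo1\b", '"foo1"'), token(";", '";"'))
data = token(r"data\b", '"data"')

regular = seq(many(first_of(foo1, seq(data, data_tail))), end_of_text)
stopping = seq(many(first_of(foo1, seq(data, err_stop(data_tail)))), end_of_text)

bad = "foo1; data a, a1 b: Real; foo1;"
print(report(regular, bad))   # Expected end of text (at char 6)
print(report(stopping, bad))  # Expected ":" (at char 17)
```

In the regular grammar the failure at ':' is swallowed by the repetition, which then gives up after 'foo1;' and leaves the end-of-text check to complain at char 6. With the error stop, the same failure is converted into an exception that the repetition does not catch, so the original location survives.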
From: Eike W. <eik...@gm...> - 2008-04-13 23:55:03
It seems to be impossible to attach files to mails on this list. Therefore I have put the example program on an FTP server. Here is the directory:

    ftp://ftp.berlios.de/pub/freeode/

The file name is:

    test_parse_stop.py
From: Ralph C. <ra...@in...> - 2008-04-14 09:12:44
Hi Eike,

> It seems to be impossible to attach files to mails on this list.
> Therefore I have put the example program on an FTP server. Here is the
> directory:
>
> ftp://ftp.berlios.de/pub/freeode/
>
> The file name is:
> test_parse_stop.py

Your `stop backtracking' method of error handling seems good; however, I'm having trouble getting to the above example.

    $ lftp -c 'get ftp://ftp.berlios.de/pub/freeode/test_parse_stop.py'
    get: Login failed: 530 Sorry, you may not connect more then 2 times.
    $

wget(1) also fails, but with less of a diagnostic. What about putting it on http://pastebin.com/, saying it should remain `forever'?

Cheers,
Ralph.
From: Eike W. <eik...@gm...> - 2008-04-14 10:57:25
The code is now available here:

    http://pastebin.com/f762576c5

The Berlios FTP server seems to be somewhat unreliable.
From: Ken K. <ksk...@gm...> - 2008-04-20 02:47:03
Hi..

I'm a new pyparsing user, and I just wanted to comment on Eike Welk's recent proposal [1] to improve error messages by stopping backtracking at defined points using an "ErrStop" ParserElement.

It's awesome. Totally flipping awesome. I'd beat my head against the wall for a couple of hours before I stumbled on it & was instantly relieved.

I've posted a diff of the simpleSQL.py example to the pastebin [2] that shows a use of ErrStop on a statement that was previously happily accepted ("select * from table1 where a b" -- note the missing operator). The diff assumes that ErrStop has been added to pyparsing.

Thanks Eike for the solution, and to Paul for the excellent package!

-ken

PS: Sorry for creating a new thread.. I don't know how to continue the old one when I wasn't a list member until just now.

PPS: Since the docs & examples aren't in the svn repos, the diff is against what was in the 1.4.11 tarball.

[1] http://sourceforge.net/mailarchive/message.php?msg_name=200804140116.37166.eike.welk%40gmx.net
[2] http://pastebin.com/f25124b32
From: W. M. B. <de...@de...> - 2008-04-20 08:43:42
On Mon, Apr 14, 2008 at 01:16:35AM +0200, Eike Welk wrote:
> I want to propose an other way, how to get better error messages:
> Prevent backtracking at certain points in the grammar.

I like your approach and will probably use it in one of my two pyparsing projects. Unfortunately, my other project has about 600 BNF productions, some of them non-trivial. Adding ErrStop to at least 100 or 200 of them would clutter my code a lot, and I'm not yet sure it would even give the desired result. For that grammar I still hope for inclusion of LastParseLoc or sth. similar in pyparsing :~)

Thanks Eike, Gre7g, and Paul!
From: Paul M. <pt...@au...> - 2008-04-20 14:58:48
I am sorting out what to release in the next version of pyparsing. I will definitely include ErrStop or something similar. I would also like to get an idea of any patterns of usage, to see if there is a natural place to automatically include this feature, in an And say. Or perhaps just incorporate this logic into the And class's parseImpl method. Or make raiseErrorStop an attribute of And (or maybe just of ParserElement), so that any class can report its error location.

I'm also looking at something like lastParseLoc. Ideally I'd like lastParseLoc to be an attribute of a ParseStatus object that is passed through the grammar - this would be a natural home for lastParseLoc, as well as things like global debugging flags, whitespace character sets, packrat caches, etc. But I think I would also want to pass this object to parse actions, and this would break the interface to all existing parse actions. :( So for now, it will probably have to be a class attribute, the same way that the packrat cache is handled.

So these changes could get pretty drastic. I'll keep them to a minimum for compatibility with 1.4.x, maybe bump the minor version to 1.5, and save the really drastic changes for 2.0 someday.

-- Paul
From: Eike W. <eik...@gm...> - 2008-04-20 22:57:12
I'm really glad that my proposal seems to be useful for others. I was quite astonished how easy it was to implement the idea. So Pyparsing is not only an easy-to-use parsing library, it is also easy to write extensions for it.

On Sunday 20 April 2008 16:58, you wrote:
> I am sorting out what to release in the next version of pyparsing.
> I will definitely include ErrStop or something similar. I would
> also like to get an idea of any patterns of usage, to see if there

I use ErrStop in my toy language, which is a crude special purpose language.

    http://freeode.berlios.de/

The parser is a pretty long snippet from an even longer file:

    http://pastebin.com/f70bba102

> is a natural place to automatically include this feature, in an And
> say. Or perhaps just incorporate this logic into the And class's

A cooperative And that treats ErrStop specially would be great! Then one could write:

    dataDef2 = Group(Keyword('data') + ErrStop() + attrNameList + ':' + identifier + ';')

> parseImpl method. Or make raiseErrorStop an attribute of And (or
> maybe just of ParserElement), so that any class can report its
> error location.

Another general method to deal with errors would be the introduction of error actions. ParserElement would get a function:

    setErrorAction( callableObject )

The error actions would maybe have the signature:

    errAction( str, loc, error )

In the error action the error (ParseException) could be examined, and a fatal exception could be raised if desired.

>
> I'm also looking at something like lastParseLoc. Ideally I'd like
> lastParseLoc to be an attribute of a ParseStatus object that is
> passed through the grammar - this would be a natural home for
> lastParseLoc, as well as things like global debugging flags,
> whitespace character sets, packrat caches, etc. But I think I
> would also want to pass this object to parse actions, and this
> would break the interface to all existing parse actions.

To keep compatibility you could go the Microsoft way and introduce the function:

    setParseActionEx( ... )

Or, to let old code run with only minimal changes:

    setParseActionOld( ... )

:-)

> :( So for now, it will probably have to be a class attribute, the same way
> that the packrat cache is handled.
>
> So these changes could get pretty drastic. I'll keep them to a
> minimum for compatability with 1.4.x, maybe bump the minor version
> to 1.5, and save the really drastic changes for 2.0 someday.
>
> -- Paul

Kind regards,
Eike.
From: Paul M. <pt...@au...> - 2008-04-21 02:29:39
Eike -

Wow! SIML is really quite an expressive language. I've added a link to your project on the "Who's Using Pyparsing" page of the wiki.

pyparsing already has setErrorAction, although it is called setFailAction. Here is the docstring:

    """Define action to perform if parsing fails at this expression.
       Fail action fn is a callable function that takes the arguments
       fn(s,loc,expr,err) where:
        - s = string being parsed
        - loc = location where expression match was attempted and failed
        - expr = the parse expression that failed
        - err = the exception thrown
       The function returns no value. It may throw ParseFatalException
       if it is desired to stop parsing immediately."""

If you are generating code, you might look into using Python's import hooks. Here is an example where I use pyparsing to parse state machine definitions in a file with the extension .pystate, where the states are defined with a state machine like:

    TrafficLight = {
        Red -> Green;
        Green -> Yellow;
        Yellow -> Red;
    }

Here is the full example, with links to the code:

    http://www.geocities.com/ptmcg/python/index.html#stateMachine

(I will be updating this soon to use an indentation-based parser for state machines, so I can get rid of those {}'s and ;'s!)

-- Paul
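The docstring quoted above suggests how a fail action can stand in for an error stop. Below is a minimal sketch, assuming a pyparsing installation in which setFailAction and ParseFatalException behave as that docstring describes. The grammar reuses the 'data' statement from earlier in the thread; the names abort_on_failure, data_body, program and first_error are illustrative and not part of pyparsing. The fail action is attached to a Group wrapping the tail of the statement, which keeps it on a single, clearly delimited element.

```python
from pyparsing import (Group, Keyword, ParseBaseException,
                       ParseFatalException, StringEnd, Word, ZeroOrMore,
                       alphanums, alphas)


def abort_on_failure(s, loc, expr, err):
    # Arguments as in the docstring: the string being parsed, the location
    # where the match was attempted, the failing expression, the exception.
    # Re-raise as a fatal exception at the place where the inner match failed.
    raise ParseFatalException(s, getattr(err, "loc", loc),
                              getattr(err, "msg", str(err)), expr)


identifier = Word(alphas, alphanums + "_")
attr_name_list = Group(identifier + ZeroOrMore("," + identifier))

# Everything that must follow the 'data' keyword.
data_body = Group(attr_name_list + ":" + identifier + ";")
data_body.setFailAction(abort_on_failure)

data_def = Group(Keyword("data") + data_body)
foo1 = Group(Keyword("foo1") + ";")
program = ZeroOrMore(foo1 | data_def) + StringEnd()


def first_error(parser, text):
    """Return the exception raised while parsing text, or None on success."""
    try:
        parser.parseString(text)
    except ParseBaseException as err:
        return err
    return None


err = first_error(program, "foo1; data a, a1 b: Real; foo1;")
print(type(err).__name__, err.loc)
```

Because ZeroOrMore only recovers from ordinary ParseExceptions, the fatal exception raised inside the fail action travels all the way up to parseString, carrying the location of the missing comma rather than the location where backtracking would have ended up.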
From: Paul M. <pt...@au...> - 2008-05-12 08:41:54
Eike, et al. -

I'm happy to report that I've successfully added ErrorStop-like behavior to pyparsing. In the last 6 weeks or so, there has been a flurry of interest and comment on this feature, and between the various proposals and some offline parser work (in which I was converting an EBNF to pyparsing), I finally got my thoughts to gel on how to add this important feature to pyparsing.

I'll excerpt comments from a posting I made to the wiki a few hours ago (in response to a pyparsing user who needed to raise a syntax error from an expression wrapped in an Optional, and so proposed a mod to Optional to correct the problem):

>>>>>>>>
It turns out that this issue affects many parts of pyparsing, not just Optional. The root problem actually occurs in the And class, in that if a succession of expressions does not parse completely, then a routine ParseException is raised. For example, in your grammar, you found the need to modify Optional because you did not get the desired error location from:

    port_clause = "(" + ...body of port definition... + ")"
    entity = Literal("entity") + "(" + \
             Optional( Keyword("port") + port_clause ) + \
             ")"

ParseException is "routine" because it is a way for any expression to indicate that no match occurred, and other alternatives should be tried. However, in this case, we want non-routine behavior. If the parser reads "port" and it is not followed by "(" and the other interesting port items, then the parser should stop immediately. This is a different flavor of And - when "port" is read, you know that the next items in the string should be the port data, and if they aren't then this is a syntax error.

Since normal And sequencing is defined using '+' signs, I'm trying to insert the syntax error trapping using another operator. The logical choice for this operator would be '-'; it is equal to '+' in precedence, and it is visually intuitive as a sequence connector. The distinction will be that, if a parser error occurs after passing the '-' operator, then this error will be flagged immediately as a syntax error. (I am adding the exception class ParseSyntaxException, derived from ParseFatalException.)

In your case, your code would become:

    port_clause = "(" + ...body of port definition... + ")"
    entity = Literal("entity") + "(" + \
             Optional( Keyword("port") - port_clause ) + \
             ")"

The syntax would be the same if Optional were replaced with ZeroOrMore, OneOrMore, or any of the other repetition classes.

It is possible now to have a lot of control over just where syntax errors get signaled. You could define an expression as:

    expr = A + B + C - D + E + F

and any parsing mismatch after having matched A, B, and C would be raised as a syntax error, and parsing would stop immediately.
<<<<<<<<<<<

So that's it. To implement ErrorStop, I've just added the '-' operator, so that "A - B + C" becomes "A + errorStop + B + C". ErrorStop itself is implemented as a private, internal class to And, and I modified And's parseImpl method to do the right thing when detecting errorStop.

It should be noted that you shouldn't just blindly go replacing all of your '+' operators with '-'s; backtracking *is* an important feature for most grammars. The general rule for using '-' is to insert it after an element in your grammar that unambiguously determines a particular path in the grammar, so that backtracking would not find any better match.

If you want to experiment with this new feature, you can download it from the pyparsing SVN repository on SourceForge. (You'll note that I've bumped the version to 1.5.0 with this update - the number of new features is really moving us to another level of the package, so I'm probably a little overdue in calling this 1.5.0 instead of 1.4.*.)

Since early in the life of pyparsing, I have been writing apologetic e-mails about pyparsing's inability to report helpful syntax error locations. I hope this new feature will help address this deficiency.

Thanks to all!

-- Paul
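To make the difference between '+' and '-' concrete, here is a minimal sketch that applies the operator to the 'data' statement from the start of this thread. It assumes pyparsing 1.5.0 or later, where the '-' operator and ParseSyntaxException are available; the grammar and the names data_plus, data_minus, program_plus, program_minus and first_error are illustrative only.

```python
from pyparsing import (Group, Keyword, ParseBaseException, ParseException,
                       ParseSyntaxException, StringEnd, Word, ZeroOrMore,
                       alphanums, alphas)

identifier = Word(alphas, alphanums + "_")
attr_name_list = Group(identifier + ZeroOrMore("," + identifier))
foo1 = Group(Keyword("foo1") + ";")
foo2 = Group(Keyword("foo2") + ";")

# Ordinary sequencing: a mismatch after 'data' is a routine ParseException,
# so the enclosing ZeroOrMore simply backtracks.
data_plus = Group(Keyword("data") + attr_name_list + ":" + identifier + ";")

# Error stop after the keyword: a mismatch past '-' is a syntax error.
data_minus = Group(Keyword("data") - attr_name_list + ":" + identifier + ";")

program_plus = ZeroOrMore(foo1 | foo2 | data_plus) + StringEnd()
program_minus = ZeroOrMore(foo1 | foo2 | data_minus) + StringEnd()


def first_error(parser, text):
    """Return the exception raised while parsing text, or None on success."""
    try:
        parser.parseString(text)
    except ParseBaseException as err:
        return err
    return None


bad = "foo1; data a, a1 b: Real; foo1;"   # comma missing between a1 and b
good = "foo1; data a, a1, b: Real; foo1;"

plus_err = first_error(program_plus, bad)
minus_err = first_error(program_minus, bad)
print(type(plus_err).__name__, plus_err.loc)
print(type(minus_err).__name__, minus_err.loc)

# On valid input the two grammars produce the same result.
print(program_plus.parseString(good).asList()
      == program_minus.parseString(good).asList())
```

The 'data' keyword is a reasonable place for '-' under the rule given above: once it has matched, no other alternative in this grammar could accept the rest of the statement, so nothing is lost by refusing to backtrack.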