#226 Documentation: info about on_failure and self.undo()

open
Martin Miller
5
2012-10-02
2012-01-26
Clem Wang
No

(Taken from https://sourceforge.net/projects/quex/forums/forum/574343/topic/4973536/index/page/1)

This behavior needs to be documented:

If the lexer is in a mode and nothing in that mode matches, on_failure is triggered and one character is still consumed, even if the on_failure handler exits to another mode.

However, if you want to get a fresh start in a new mode with the part of the string that failed to match, then you need to use the undo() function.

Specifically I have this simplified example:

token { DUMMY; }
start = ONE;
mode ONE {
    on_failure { printf("In ONE, couldn't match '%s', on to mode TWO\n", Lexeme); self_enter_mode(&TWO); }
    "a"        { printf("In ONE, got an %s\n", Lexeme); }
}

mode TWO {
    on_failure { printf("In TWO, couldn't match '%s'\n", Lexeme); }
    "s"        { printf("In TWO, got an %s\n", Lexeme); }
}

If the input is:

struct

Then I get:

In ONE, couldn't match 's', on to mode TWO
In TWO, couldn't match 't'
In TWO, couldn't match 'r'
In TWO, couldn't match 'u'
etc.

One might have naively expected that the "s" would NOT have been consumed in mode ONE, so that when you went to mode TWO, the start of the string would still be "s" and could be analyzed in mode TWO.

However, if the analyzer did not consume at least the offending character at the beginning, it would risk stalling, so on_failure always consumes a character.
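
To make the behavior described above concrete, here is a toy simulation in Python. It is not Quex code and says nothing about how Quex is implemented internally; the `MODES` table and the `run` function are made up for illustration. It only mimics the rule "a failing character is consumed anyway" and reproduces the trace shown earlier.

```python
# Toy model only: not Quex code, and not how Quex is implemented.
# Each mode knows a set of single-character patterns.
MODES = {"ONE": {"a"}, "TWO": {"s"}}


def run(text, start="ONE"):
    """Scan text one character at a time and return the log lines."""
    mode, pos, log = start, 0, []
    while pos < len(text):
        ch = text[pos]
        pos += 1  # the character is consumed whether or not it matches
        if ch in MODES[mode]:
            log.append(f"In {mode}, got an {ch}")
        elif mode == "ONE":
            # stands in for on_failure in ONE: report, then switch modes
            log.append(f"In ONE, couldn't match '{ch}', on to mode TWO")
            mode = "TWO"
        else:
            # stands in for on_failure in TWO: report and carry on
            log.append(f"In TWO, couldn't match '{ch}'")
    return log


for line in run("struct"):
    print(line)
```

If the `pos += 1` were skipped on failure without anything else changing, mode TWO would look at the same unmatched character forever, which is the stall mentioned above.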

To fix this, one needs to use the undo() function (which is only available in C++ code generation in version 0.60.2).

on_failure {printf("In ONE, couldn't match '%s', on to mode TWO\n", Lexeme); self.undo(); self_enter_mode(&TWO);}
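
The effect of the undo can be sketched with a toy Python simulation. Again, this is not Quex code; `MODES` and `run_with_undo` are hypothetical names, and the line `pos -= 1` merely stands in for what self.undo() is described as doing here, namely giving the failed character back to the input.

```python
# Toy model only: not Quex code, and not how Quex is implemented.
MODES = {"ONE": {"a"}, "TWO": {"s"}}


def run_with_undo(text):
    """Like the plain scan, but mode ONE gives the failed character back."""
    mode, pos, log = "ONE", 0, []
    while pos < len(text):
        ch = text[pos]
        pos += 1  # consumed by default, as in the failure case above
        if ch in MODES[mode]:
            log.append(f"In {mode}, got an {ch}")
        elif mode == "ONE":
            log.append(f"In ONE, couldn't match '{ch}', on to mode TWO")
            pos -= 1      # stands in for self.undo(): rewind the lexeme
            mode = "TWO"  # safe only because the mode changes here
        else:
            log.append(f"In TWO, couldn't match '{ch}'")
    return log


for line in run_with_undo("struct"):
    print(line)
```

In this sketch the "s" is now seen by mode TWO, so the second log line is "In TWO, got an s". Note that rewinding without also leaving the mode would loop forever on the same character.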

Frank also notes:

Also, 'on_failure' is designed to catch a flaw in your definitions. To catch an anti-pattern, define it properly. This is not too hard if you rely on the precedence. That means that the anti-pattern can actually contain elements of preceding patterns, because they have a higher priority.
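
Frank's point about precedence can be illustrated with a small Python sketch. It assumes the common lexer rule that the longest match wins and that ties go to the pattern defined first; the rule names `A` and `ANY` and the `tokenize` function are invented for this example and are not part of Quex.

```python
import re

# Toy model only: not Quex code. Rules are listed in priority order.
# "ANY" plays the role of the anti-pattern: it overlaps with "A",
# but "A" is defined first and therefore wins when both match equally.
RULES = [
    ("A", re.compile(r"a")),
    ("ANY", re.compile(r".", re.DOTALL)),  # catch-all, lowest priority
]


def tokenize(text):
    """Longest match wins; on equal length the earlier rule is kept."""
    pos, out = 0, []
    while pos < len(text):
        best = None
        for name, rx in RULES:
            m = rx.match(text, pos)
            # strictly longer only, so an earlier rule survives a tie
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        name, m = best  # never None: ANY matches every character
        out.append((name, m.group()))
        pos = m.end()
    return out


print(tokenize("sat"))
```

Because the catch-all rule always matches, there is no failure case left in this sketch, which is the idea behind defining the anti-pattern explicitly instead of relying on on_failure.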
