[re2c-general] patch : salvable states for re2c

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Here's a changeset against re2c-0.9.5 that implements salvable states:

    http://www.mgix.com/re2c-0.9.5-salvable-state-patch.bz2

What are salvable states you ask ?

Well, I like re2c very much, but there is one situation where
I found it to be wanting: re2c assumes a pull model, i.e. it
expects the the code generated to be able to call for more
characters whenever it runs out of them (via the YYFILL macro).

In a recent project, I needed to use re2c in a push model, where an
external source of data (a preprocessor) actually feeds byte buffers
to the code generated by re2c, and it in turn feeds tokens to a push
model grammar parser (in this case, a lemon parser).

Unfortunately, other than using multi-threading (which wasn't an option
for this project) or coroutines (which we don't have in C++), this is
rather hard to do with re2c as it is currently designed: when the code
generated by re2c runs out of bytes in the middle of a token, the YYFILL
macro is called to fetch the needed bytes, and the code is then expected
to continue right after where the YYFILL macro was called.

In a push model, YYFILL must actually save everything that
describes the current state of the DFA and return to the caller.

Later in the game, the data source feeds the re2c code fresh data,
and the re2c code must resume *exactly* where it left off.

As things are now, saving all the state variables is perfectly feasible with
a little editing work, but saving the specific spot in the code which we 
need
to jump back to on next round is not possible.

To get around this problem I patched re2c, and added a '-f' option.

Without -f, re2c behaves as before.

With -f, things behave has described below:

    . The user now needs to #define a  new macro : YYSTATE.
      YYSAVE should be a signed int variable, initialized to -1
      YYSAVE is used by re2c to save the location in the code where
      YYFILL was last called so we can later get back to where we were.

    . Prior to emitting a new YYFILL statement, re2c increases a global
      counter and a statement is emitted to store that value into YYSTATE.
      The usual YYFILL statement is then emitted, followed by a label 
indexed
      by the counter.

    Here's an example of what the emitted code looks like with -f

yy154:    ++YYCURSOR;
    YYSTATE = 6;
    if(YYLIMIT == YYCURSOR) YYFILL(1);
yyFillLabel6:
    yych = *YYCURSOR;

    The same code without -f:

yy154:    ++YYCURSOR;
    if(YYLIMIT == YYCURSOR) YYFILL(1);
    yych = *YYCURSOR;

    . Finally, the code emitted by re2c is prefixed with the following
      dispatch code, which will jump right back where it left off, based
      on the value of YYSTATE.

        switch(YYSTATE)
        {
            case -1: goto yy0;
            case 0: goto yyFillLabel0;
            case 1: goto yyFillLabel1;
            case 2: goto yyFillLabel2;
            case 3: goto yyFillLabel3;
            case 4: goto yyFillLabel4;
            case 5: goto yyFillLabel5;
            case 6: goto yyFillLabel6;
            case 7: goto yyFillLabel7;
            case 8: goto yyFillLabel8;
            case 9: goto yyFillLabel9;
            case 10: goto yyFillLabel10;
            case 11: goto yyFillLabel11;
            case 12: goto yyFillLabel12;

                ... and so on for all possible fill labels ...

            default: /* abort() */;
        }
yyNext:

And there you have it, re2c is now capable of
producing DFA scanners with a fully salvable state.

I'd be glad if you guys could integrate this into the main re2c code.

    - Emmanuel