[Flex-help] Why is rescanning needed?
flex is a tool for generating scanners
Brought to you by:
wlestes
From: Arthur S. <asc...@at...> - 2013-12-04 19:01:14
|
I don't understand the need for rescanning the input. The reasoning (that I have) for why it is not needed is: A scanned token consists of: prefix suffix Where the prefix and suffix are strings. If the RE is: prefix/suffix, where suffix is a single character then: 1: prefix is accepted which implies that on next use of the scanner we start in some state, BEGIN(state). 2: Because the FOLLOW set suffix is a single character, by definition, it is possible to determine the start RE to be evaluated without rescanning. 3: The exit condition, then consists of the 3-tuple <suffix, state, RE> 3: On reentry to yylex(), assuming that acceptance means "return", the exit condition is processed by a yylex() preamble. if the RE is: prefix/suffix, where suffix is multiple characters 1: prefix is accepted which implies that on next use of the scanner we start in some state, BEGIN(state). 2: Because the FOLLOW set suffix is multiple character the RE to be used on reentry must be determined by RE test execution. 3: The exit condition, then consists of the 3-tuple <suffix, state, RE> 3: On reentry to yylex(), assuming that acceptance means "return", the exit condition is processed by a yylex() preamble. The user supplied RE's should be concatenated, not made into separate RE DFSM's. This means: prefixsuffix becomes part of a single machine rather than two machines, prefix and prefixsuffix. For example the following flex input should generate a machine with no backup: CHAR [cC] LONG [lL] SIGNED ({CHAR}|{LONG}) UNSIGNED ([uU]({CHAR}|{LONG})?) OCTAL 0[0-7]* FOLLOW [,;:\(\)<>/{}|[:cntrl:]| ] %% {OCTAL}/{FOLLOW} { printf("OCTAL %s\n", yytext); return 1;} {OCTAL}{SIGNED}/{FOLLOW} { printf("OCTAL[C|L] %s\n", yytext); return 2;} {OCTAL}{UNSIGNED}/{FOLLOW} { printf("OCTAL[UC|UL] %s\n", yytext); return 3;} . %% The resultant machine would accept {OCTAL}/{FOLLOW}, return the value of OCTAL (return 1) and remember the FOLLOW character seen as input in the INITIAL state for RE '.'. And similarly for the remainder. I have a included a scanner, for an example in John Levine's "flex & bison" O'Reilly book. Unfortunately, the original flex input does not have any backup states (sigh) so the example is deficient. But it shows state concatenation and a yylex() preamble. The example scanner input is: /* Companion source code for "flex & bison", published by O'Reilly * Media, ISBN 978-0-596-15597-1 * Copyright (c) 2009, Taughannock Networks. All rights reserved. * See the README file for license conditions and contact info. * $Header: /home/johnl/flnb/code/RCS/fb1-5.l,v 2.1 2009/11/08 02:53:18 johnl Exp $ */ /* recognize tokens for the calculator and print them out */ %{ # include "fb1-5.tab.h" %} %% "+" { return ADD; } "-" { return SUB; } "*" { return MUL; } "/" { return DIV; } "|" { return ABS; } "(" { return OP; } ")" { return CP; } [0-9]+ { yylval = atoi(yytext); return NUMBER; } \n { return EOL; } "//".* [ \t] { /* ignore white space */ } . { yyerror("Mystery character %c\n", *yytext); } %% |