[Flex-help] Multiple input buffers in a reentrant scanner
From: Arnim L. <arn...@gm...> - 2008-04-04 20:01:07
I recently ported a parser/scanner pair for BSDL from yacc & lex to modern
bison and flex.
The grammar descriptions were no problem, but I had to rework some
implementation aspects of the BSDL format that relate to scanning
and parsing input from different sources:
a) from an included file
b) from a string in memory that's built on the fly by the parser
The original lex description redefined input() (or rather YY_INPUT) to
switch between these sources, but from what I learned in the flex docs
that is discouraged. The "Multiple Input Buffers" feature came in handy
here, since it supports both scenarios plus the ability to stack these
buffers.
Scanning from another file was no problem, just like the example given
in the docs. Scanning from a string, however, caused me a bit of a
headache, since I couldn't find a clean way to stack the buffer states
with yypush_buffer_state(yy_scan_string(...)) as I did with
yypush_buffer_state(yy_create_buffer(...)); yy_scan_string() already
switches to the new buffer internally. I ended up with a workaround that
pushes YY_CURRENT_BUFFER with yypush_buffer_state() before calling
yy_scan_string(), and that did the trick.
The next step was to make both parser and scanner reentrant. They are
part of software that will eventually become a library, so global
variables need to be avoided. This was when I learned that the
workaround was not clean at all.
The scanner file provides a function in the user code section that is
called whenever the parser detects the need to scan from a memory
string. This worked well with a non-reentrant scanner, where variables
are global. YY_CURRENT_BUFFER accesses such a global variable, and I
didn't notice this side effect at first. The reentrant version, however,
relies solely on a local yyscan_t, so YY_CURRENT_BUFFER no longer worked
inside the user function.
This led to the next workaround: I investigated what YY_CURRENT_BUFFER
actually expands to in a reentrant scanner and defined the yyg variable
the macro requires. The resulting code looks like this:
void bsdl_flex_switch_buffer(yyscan_t scanner, const char *buffer)
{
    /* ugly, ugly, ugly:
       prepare yyg for the later use of YY_CURRENT_BUFFER */
    struct yyguts_t *yyg = (struct yyguts_t *)scanner;
    int lineno;

    lineno = yyget_lineno(scanner);
    /* yy_scan_string() switches to the string buffer internally,
       so we must save the current buffer state explicitly by pushing the
       stack and setting top of stack to the current buffer state again.
       yy_scan_string() can then safely switch YY_CURRENT_BUFFER to the
       string buffer. yypop_buffer_state() will delete the string buffer
       afterwards and pop the saved current buffer state. */
    yypush_buffer_state(YY_CURRENT_BUFFER, scanner);
    yy_scan_string(buffer, scanner);
    /* carry the saved line number over to the string buffer */
    yyset_lineno(lineno, scanner);
}
All this left me with an uneasy feeling, since the (stacked) workarounds
make use of flex internals that might change in future releases (I'm
still on 2.5.33).
Is there a clean way to handle this situation, given that I prefer to
keep using multiple buffers in a scanner without global variables?
Thanks in advance!
Arnim