[Flex-devel] Prefixes in a reentrant scanner

I've started looking at flex with an eye towards building a new and
cleaner C++ interface based, perhaps, on the reentrant C scanner code.
(I'm happy to talk about this if anyone is interested.)

I'm studying the code but I'm not at all familiar with flex internals,
so I'm starting from scratch and appreciate all the help I can get.  My
first question is about the feasibility of a straightforward development
from the reentrant C code.

If I generate two reentrant C scanners, then I have to give each a
different prefix--say yy and zz.  Obviously yylex and zzlex have (in
general) different implementations, so the question is about the other
symbols that take the yyscanner argument.  Is the prefix there merely to
avoid collisions in the C namespace (I cannot have two functions named,
say, yylex_init, yet one will be generated for each scanner), or does
the implementation of yylex_init and zzlex_init necessarily vary
depending on the chosen options and rules?

That probably isn't a very clear question, so let me try to elaborate.
The reentrant interface is already mostly object-oriented in principle,
and at first glance maps cleanly to a C++ class.  Take my first mockup,
the simplest lexer class that knows how to do anything:

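#include <cassert>
// Assumes the declarations of yyscan_t, yylex_init, yylex_destroy and
// yylex from the generated reentrant scanner are visible here.

namespace flex
{

class lexer
{
public:

    lexer() : my_lexer(NULL)
    {
        int failed = yylex_init(&my_lexer);
        assert(!failed); // eventually do something more intelligent here
        (void)failed;    // avoid an unused-variable warning under NDEBUG
    }

    virtual ~lexer()
    {
        yylex_destroy(my_lexer);
        my_lexer = NULL;
    }

    virtual int lex() { return yylex(my_lexer); }

private:

    // Not copyable: a copy would destroy the same scanner twice.
    lexer(const lexer&);
    lexer& operator=(const lexer&);

    yyscan_t my_lexer;
};

}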

Notice how the interface looks appropriate: the class owns a yyscan_t
(and probably would eventually inherit from struct yyguts_t directly
rather than owning a pointer to one, but this is a mock-up) and the ctor
and dtor take care of init and destroy.  The next logical step would be
to make lex() pure virtual for overriding in a concrete subclass
containing the actual lexer function.  Except for one thing: you can't
write a generic base class, because the names would change in different
subclasses: they're supposed to call ${PREFIX}lex_init(&my_lexer), not a
generic lex_init that knows how to deal with any yyguts_t object.
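
To make the naming problem concrete, here is a rough sketch of one
possible workaround that does not depend on the answer: bind the
prefixed functions as template arguments, so the wrapper is written once
and instantiated per prefix.  Everything below is an assumption for
illustration.  The yy*/zz* functions are trivial stand-ins for what two
generated scanners would provide, and basic_lexer, yy_lexer, zz_lexer
and live_scanners are names I made up.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for two generated reentrant scanners (prefixes yy and zz).
// In real use these declarations would come from the generated code.
typedef void* yyscan_t;

int live_scanners = 0;  // illustration only: tracks init/destroy pairs

int yylex_init(yyscan_t* s)   { *s = new int(0); ++live_scanners; return 0; }
int yylex_destroy(yyscan_t s) { delete static_cast<int*>(s); --live_scanners; return 0; }
int yylex(yyscan_t)           { return 1; }  // pretend token id

int zzlex_init(yyscan_t* s)   { *s = new int(0); ++live_scanners; return 0; }
int zzlex_destroy(yyscan_t s) { delete static_cast<int*>(s); --live_scanners; return 0; }
int zzlex(yyscan_t)           { return 2; }  // pretend token id

namespace flex
{

// One wrapper, written once; the prefixed entry points are supplied
// as non-type template arguments instead of being hard-coded.
template <int (*Init)(yyscan_t*),
          int (*Destroy)(yyscan_t),
          int (*Lex)(yyscan_t)>
class basic_lexer
{
public:

    basic_lexer() : my_lexer(NULL)
    {
        int failed = Init(&my_lexer);
        assert(!failed);
        (void)failed;
    }

    ~basic_lexer() { Destroy(my_lexer); }

    int lex() { return Lex(my_lexer); }

private:

    // Not copyable: a copy would destroy the same scanner twice.
    basic_lexer(const basic_lexer&);
    basic_lexer& operator=(const basic_lexer&);

    yyscan_t my_lexer;
};

typedef basic_lexer<yylex_init, yylex_destroy, yylex> yy_lexer;
typedef basic_lexer<zzlex_init, zzlex_destroy, zzlex> zz_lexer;

}
```

This only hides the prefixes behind a template; each instantiation still
calls its own ${PREFIX}lex_init, so it shares the wrapper code but not
the scanner support functions themselves.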

So the question is really whether the reentrant lexer implementation is
such that it is feasible to generate generic lex_init, lex_destroy, and
so on functions, or whether something about the implementation requires
them to be separate functions.  If the former, then probably there is a
reasonably straightforward path ahead.  If the latter, then the job is
more difficult.  It seems that this would be a desirable approach for
the C scanner too, since it would allow more code to be shared between
scanners, but perhaps there is a good reason it isn't that way.

I've started poking around inside flex to try to find some answers, but
I can see that the learning curve is steep so I'm hoping someone can
keep me pointed in the right direction.  One thing I've learned is that
the vast majority of fields are identical between yyFlexLexer and
yyguts_t, so code sharing certainly should be possible (which ought to
make the maintainers happy).  I have a general idea of how I think it
should work.

Dustin
