[Flex-devel] Prefixes in a reentrant scanner
From: Dustin L. <du...@la...> - 2006-05-18 19:47:54
I've started looking at flex with an eye towards building a new and cleaner C++ interface based, perhaps, on the reentrant C scanner code. (I'm happy to talk about this if anyone is interested.) I'm studying the code but I'm not at all familiar with flex internals, so I'm starting from scratch and appreciate all the help I can get.

My first question is about the feasibility of a straightforward development from the reentrant C code. If I generate two reentrant C scanners, then I have to give each a different prefix--say yy and zz. Obviously yylex and zzlex have (in general) different implementations, so the question is about the other symbols that access the yyscanner argument. Is this merely to disambiguate the C namespace, where I cannot have two objects named, say, yylex_init but one will be generated for each scanner, or does the implementation of yylex_init and zzlex_init necessarily vary depending on the chosen options and rules?

That probably isn't a very clear question, so let me try to elaborate. The reentrant interface is already mostly object-oriented in principle, and at first glance maps cleanly to a C++ class. Take my first mockup, the simplest lexer class that knows how to do anything:

    namespace flex {
      class lexer {
      public:
        lexer() : my_lexer(NULL) {
          int failed = yylex_init(&my_lexer);
          assert(!failed); // eventually do something more intelligent here
        };
        virtual ~lexer() {
          yylex_destroy(my_lexer);
          my_lexer = NULL;
        };
        virtual int lex() { return yylex(my_lexer); };
      private:
        yyscan_t my_lexer;
      };
    }

Notice how the interface looks appropriate: the class owns a yyscan_t (and probably would eventually inherit from struct yyguts_t directly rather than owning a pointer to one, but this is a mock-up) and the ctor and dtor take care of init and destroy. The next logical step would be to make lex() pure virtual for overriding in a concrete subclass containing the actual lexer function.
Except for one thing: you can't write a generic base class, because the names would change in different subclasses: they're supposed to call ${PREFIX}lex_init(&my_lexer), not a generic lex_init that knows how to deal with any yyguts_t object.

So the question is really whether the reentrant lexer implementation is such that it is feasible to generate generic lex_init, lex_destroy, and similar functions, or whether something about the implementation requires them to be separate functions. If the former, then probably there is a reasonably straightforward path ahead. If the latter, then the job is more difficult. It seems that this would be a desirable approach for the C scanner too, since it would allow more code to be shared between scanners, but perhaps there is a good reason it isn't that way.

I've started poking around inside flex to try to find some answers, but I can see that the learning curve is steep so I'm hoping someone can keep me pointed in the right direction. One thing I've learned is that the vast majority of fields are identical between yyFlexLexer and yyguts_t, so code sharing certainly should be possible (which ought to make the maintainers happy). I have a general idea of how I think it should work.

Dustin
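
To make the base-class problem above concrete, here is a minimal sketch of one possible workaround: the generic base class never names a prefixed symbol itself, and each subclass hands it the three prefix-specific entry points as function pointers. Everything beyond yylex_init, yylex_destroy, yylex, and yyscan_t is an assumption made for illustration: the class names yy_lexer and zz_lexer, the function-pointer typedefs, and the stub bodies, which only stand in for what flex would generate for prefixes yy and zz. It does not settle whether the generated init and destroy functions could actually be shared between scanners.

```cpp
#include <cassert>
#include <cstdlib>

// Reentrant flex scanners expose the scanner handle as an opaque pointer.
typedef void* yyscan_t;

// Stand-ins for two generated scanners (prefixes yy and zz). In real use
// these would come from the flex output; the bodies here are placeholders.
static int yylex_init(yyscan_t* s)   { *s = std::malloc(1); return *s ? 0 : 1; }
static int yylex_destroy(yyscan_t s) { std::free(s); return 0; }
static int yylex(yyscan_t)           { return 1; }  // pretend token id

static int zzlex_init(yyscan_t* s)   { *s = std::malloc(1); return *s ? 0 : 1; }
static int zzlex_destroy(yyscan_t s) { std::free(s); return 0; }
static int zzlex(yyscan_t)           { return 2; }  // pretend token id

namespace flex {

// Generic base: knows nothing about prefixes, only about the three entry
// points it is handed at construction time.
class lexer {
public:
    typedef int (*init_fn)(yyscan_t*);
    typedef int (*destroy_fn)(yyscan_t);
    typedef int (*lex_fn)(yyscan_t);

    virtual ~lexer() {
        destroy_(my_lexer);
        my_lexer = NULL;
    }

    int lex() { return lex_(my_lexer); }

protected:
    lexer(init_fn init, destroy_fn destroy, lex_fn lex)
        : my_lexer(NULL), destroy_(destroy), lex_(lex) {
        int failed = init(&my_lexer);
        assert(!failed);  // a real class would report this properly
        (void)failed;     // silence unused-variable warnings under NDEBUG
    }

private:
    lexer(const lexer&);             // non-copyable: the object owns the scanner
    lexer& operator=(const lexer&);

    yyscan_t my_lexer;
    destroy_fn destroy_;
    lex_fn lex_;
};

// One thin subclass per prefix; only the constructor differs.
class yy_lexer : public lexer {
public:
    yy_lexer() : lexer(yylex_init, yylex_destroy, yylex) {}
};

class zz_lexer : public lexer {
public:
    zz_lexer() : lexer(zzlex_init, zzlex_destroy, zzlex) {}
};

}  // namespace flex
```

With this arrangement a flex::yy_lexer and a flex::zz_lexer can both be driven through a flex::lexer reference, and lex() dispatches to yylex or zzlex respectively. It sidesteps the naming problem at the cost of three stored pointers per object; whether the per-prefix functions themselves could be merged into one shared implementation is still the open question.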