Re: [Flex-devel] [Flex-help] Suggestions for improving Flex
flex is a tool for generating scanners
From: Joe K. <kr...@ni...> - 2008-10-06 17:35:54
Aaron Stone wrote:
> Replying back onto the flex-devel mailing list, so that we can track the
> conversation. There are definitely some cases where if you're not
> playing by the (often unwritten) rules of writing a flex grammar, the
> code will bite you. Polishing edges is very welcome!
>
> On Oct 3, 2008, at 2:42 PM, Joe Krahn wrote:
>
>> Aaron Stone wrote:
>>> There's active maintenance right now, but not active development.
>>> That's just a function of available time and not a function of lack
>>> of interest -- although the C++ side could use a serious C++
>>> programmer's input. The current maintainers are all C programmers.
>>> Aaron
>> ...
>> I program mostly in C, but am trying to use it with C++ right now.
>> However, there are a lot of problems that are not limited to just C++.
>>
>> For example, the --main and --nomain flags don't work. It appears that
>> Flex moved away from manual skeleton processing to using the m4
>> preprocessor instead, and the conversion is sort of only half
>> completed. The --main option flags set a cpp macro, but the skeleton
>> only honors the m4 macro.
>
> Oh, that's no good.
>
>>
>> Similarly, the skeleton file has some header sections marked with
>> m4_ifdef, and some marked with %ok-for-header and %not-for-header.
>> There are also a few unmatched %if sections, which only succeeds
>> because someone added an extra push-true at the beginning of skelout().
>>
>> I can make an attempt at working on some improvements, but any sort of
>> update will surely lead to errors, even if it is best for the long run.
>
> There are pretty good tests in the tree, so feel free to (carefully)
> mess with things and post patches to the list and/or to sourceforge bugs.
>
>>
>> Maybe I should just proceed with some experimental code updates, post
>> my initial results, and see what people think?
>
> For sure!
>
> Aaron

OK, I have done some initial hacking. It almost works, but will take
some reviewing and debugging.
The changes may seem a bit drastic for something that is mostly stable,
but the current state of disorganization is leading to poor
maintainability. I hope that other flex developers agree that it needs a
general clean-up in skeleton processing.

Here is what I have done so far. Comments are very welcome.

My current design gets rid of the initial preprocessing m4 stage.
Instead, it expects the m4_include files to be available at run time.
The M4_GEN_PREFIX macro was updated to work with a single m4 pass. This
makes it easier to work with an external skeleton file. (Bison works
this way.)

I moved most of the code that was generated from C source into the
skeleton file, and added a few more M4 macro option definitions for the
extra logic needed in the skeleton.

I replaced the %if/%endif conditionals from misc.c with m4 conditionals.
Instead of the messy m4_ifdefs, I added some defined macros like
"m4_if_c_only()". The misc.c conditional processing is now essentially
empty except for %# comment processing.

I reorganized the skeleton into sensible groups where possible: header,
non-header, static non-reentrant globals, etc. The header is still
divided into two parts, before and after user section 1, to ensure
compatibility with existing code.

I replaced all of the YY_G() macros with m4 substitution macros, similar
to what was already done for function prefixes. This keeps the skeleton
code simpler. (I am assuming that user code never uses the YY_G()
macro.)

The reentrant state object was renamed from yyguts_t to yyobject_t. The
struct members no longer have the yy prefix, because it is not needed
when they are encapsulated in a struct. (Ideally, the C++ and yyobject_t
names should all match, but I have not compared them.)

There are now two prefix macros, for names starting with "yy_" versus
"yy". For the yyobject_t variables, this avoids names with a leading
underscore.
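
As a rough illustration of the YY_G() replacement and the struct member renaming described above (a sketch only, not the actual patch): the member names below are made up for illustration, and YY_G() is assumed to expand to a plain member access on the state pointer. The second function shows what the compiler would see if m4 had already substituted the access while generating the scanner.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the renamed reentrant state object.
 * The member names are illustrative; they are not taken from the patch. */
typedef struct yyobject_t {
    int   start;     /* would carry a yy_ prefix under the old naming */
    char *c_buf_p;   /* likewise */
    int   lineno;
} yyobject_t;

/* Old style (assumed definition): every state access in the skeleton
 * goes through a cpp macro that is resolved when the scanner is compiled. */
#define YY_G(var) (yyg->var)

static int old_style_lineno(yyobject_t *yyg)
{
    assert(yyg != NULL);
    return YY_G(lineno);
}

/* New style: the access has already been substituted by m4 at scanner
 * generation time, so the emitted C contains no YY_G() macro at all. */
static int new_style_lineno(yyobject_t *yyg)
{
    assert(yyg != NULL);
    return yyg->lineno;
}
```

Both functions read the same member; the only difference is whether the indirection is resolved by cpp at compile time or by m4 when the scanner is generated.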
For functions and non-reentrant globals, the two prefix macros could be
used to make a C++ namespace prefix instead of a simple name prefix, in
which case it would also be nice to exclude leading underscores. For
now, the underscore is always retained.

Bison has much nicer m4 macros for traditional versus ANSI prototype
generation. They take variable argument lists, instead of needing one
macro for each argument-list size.

I think it would have been much better not to put the _param suffix on
yylex arguments in the reentrant version, because it does not work well
with a user-defined YY_DECL. Instead, macros to rename them should come
just after the start of yylex, but before the user code is inserted.
That allows a user-defined YY_DECL to work with normal parameter names.
In addition, the current skeleton initializes the lloc and lval pointers
after the user-code section, leading to segfaults unless the user code
knows to use the undocumented _param suffix. Unfortunately, changing
this will affect code that has already adapted. Maybe there should be a
cpp macro or %option to name the yylval and yylloc args?

Another problem with reentrant mode is that yyset_lval and yyset_lloc
are useless, because yylex sets them every time. An updated yylex should
allow YY_DECL not to have lval and lloc args, and instead allow use of
the set/get functions. Maybe the above-mentioned yylval/yylloc naming
options could also disable one or both, so the automatic pointer-copying
code can adapt.

I also think the %top section is designed wrong. It should terminate
with '%}' instead of trying to count braces. But how to fix it without
breaking existing code? Maybe there could be a new code section called
`%header{ ... %}' to emphasize that it is the place to put macros that
affect the header section?

With the changes I've made so far, I am now working on getting it to
pass all of the tests.

Joe Krahn
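
To illustrate the %top concern raised in the message above, here is a minimal sketch of why ending a code section by counting braces is fragile compared with ending it at a '%}' line. This is not flex's actual scanner code: both helper functions are hypothetical, and the brace counter is deliberately naive (it knows nothing about string literals or comments).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive brace counting: returns the offset just past the '}' that
 * balances the first '{', or -1 if the braces never balance.
 * Braces inside string literals or comments are counted as well. */
static long end_by_brace_count(const char *text)
{
    int depth = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (text[i] == '{') {
            depth++;
        } else if (text[i] == '}' && depth > 0) {
            if (--depth == 0)
                return (long)(i + 1);
        }
    }
    return -1;
}

/* Marker-based termination: the section ends at the first line that
 * starts with "%}". Returns the offset of that marker, or -1. */
static long end_by_marker(const char *text)
{
    const char *p = text;
    while (p != NULL && *p != '\0') {
        if (p[0] == '%' && p[1] == '}')
            return (long)(p - text);
        p = strchr(p, '\n');
        if (p != NULL)
            p++;    /* step to the start of the next line */
    }
    return -1;
}
```

With these helpers, a section containing `#define OPEN "{"` never balances under brace counting, and one containing `#define CLOSE "}"` is cut off early, while a section closed by a '%}' line ends at the marker regardless of its contents.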