flex-devel Mailing List for flex: the fast lexical analyser (Page 11)
flex is a tool for generating scanners
Brought to you by:
wlestes
From: Aaron S. <aa...@se...> - 2008-10-16 17:56:56
Awesome!

On Oct 16, 2008, at 8:07 AM, Joe Krahn wrote:

> I posted a diff of my current modified flex sources to this list. It was awaiting moderator approval because it is too large. Instead of posting a MIME attachment here, I have canceled the post and will upload as feature-request in the bug tracker system.
>
> Joe
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Flex-devel mailing list
> Fle...@li...
> https://lists.sourceforge.net/lists/listinfo/flex-devel
From: Joe K. <kr...@ni...> - 2008-10-16 15:07:55
I posted a diff of my current modified flex sources to this list. It was awaiting moderator approval because it is too large. Instead of posting a MIME attachment here, I have canceled the post and will upload as feature-request in the bug tracker system.

Joe
From: Joe K. <kr...@ni...> - 2008-10-16 15:04:16
I noticed that the option processor allows a 'no' prefix even where it doesn't make sense. In most cases, it probably is not that important, but someone may get confused if they try to use an option like "noextra-type". Also, there are a few more functions that could be disabled by a no<func_name> option.

Instead of making command-line options even more numerous, maybe it would be a good idea to add a flag --options=LIST, where LIST is a string containing options processed exactly as in the %option directive.

Joe Krahn
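
Editorial sketch of the idea above, not flex's actual option code: each option carries a flag saying whether a 'no' prefix is meaningful, and a whitespace- or comma-separated LIST is checked word by word. The table contents, the `negatable` field, and the function names `classify_option` and `count_rejected` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option table; the entries and the negatable flag are
 * illustrative only and do not mirror flex's real option set. */
struct opt_desc {
    const char *name;
    bool negatable;            /* does a "no" prefix make sense? */
};

static const struct opt_desc opt_table[] = {
    { "main",       true  },
    { "yywrap",     true  },
    { "reentrant",  true  },
    { "extra-type", false },   /* takes a value, so "noextra-type" is rejected */
};

enum opt_status { OPT_ON, OPT_OFF, OPT_BAD_NEGATION, OPT_UNKNOWN };

/* Look up an option by a name that is not NUL-terminated. */
static const struct opt_desc *find_opt(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof opt_table / sizeof opt_table[0]; i++) {
        if (strlen(opt_table[i].name) == len &&
            strncmp(opt_table[i].name, name, len) == 0)
            return &opt_table[i];
    }
    return NULL;
}

/* Classify one word: plain option, valid negation, invalid negation, unknown. */
static enum opt_status classify_option(const char *word, size_t len)
{
    assert(word != NULL);
    /* Try the exact name first, so a name that starts with "no" still matches. */
    if (find_opt(word, len))
        return OPT_ON;
    if (len > 2 && strncmp(word, "no", 2) == 0) {
        const struct opt_desc *d = find_opt(word + 2, len - 2);
        if (d)
            return d->negatable ? OPT_OFF : OPT_BAD_NEGATION;
    }
    return OPT_UNKNOWN;
}

/* Walk a LIST string and count the words that would be rejected. */
static int count_rejected(const char *list)
{
    int rejected = 0;
    const char *p = list;
    while (*p) {
        p += strspn(p, " ,\t");            /* skip separators */
        size_t len = strcspn(p, " ,\t");   /* length of the next word */
        if (len == 0)
            break;
        enum opt_status s = classify_option(p, len);
        if (s == OPT_BAD_NEGATION || s == OPT_UNKNOWN)
            rejected++;
        p += len;
    }
    return rejected;
}
```

With a table like this, the same checker could serve both the %option directive and a --options=LIST flag, so the two paths would reject "noextra-type" in the same way.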
From: Joe K. <kr...@ni...> - 2008-10-14 22:18:23
In the reentrant scanner, the serialized table data is not part of the reentrant struct. This is reasonable, because that is how it works for static tables. Loading the tables is essentially a "class" initialization that must be done before any scanner objects are used. As global initialization functions, they really should not include a specific object reference. The only place the object argument is actually referenced is in calling yy_fatal_error(). Maybe there should be a separate fatal-error function for errors not associated with a specific object?

Joe Krahn
From: Aaron S. <aa...@se...> - 2008-10-14 18:21:09
On Oct 14, 2008, at 8:45 AM, Joe Krahn wrote:

> What part of the generated code is supposed to be visible to user code from section 1? My understanding is that section 1 is meant for including extra headers, and inserting cpp #defines to modify cpp-based options. However, it apparently does not come early enough in the generated code, because somebody added the %top{} feature.
>
> My first guess is just to avoid moving any code to the other side of the current section 1 code insertion, to avoid breaking things. But, there should be some specific rules. In any case, the %top{} feature is useful because it gets written to the public header.

Huh, ok. I seem to recall that stuff in the top section came very nearly at the beginning, if not actually at the beginning, of the output file. I wonder if many scanners assume this.

> Why does the yylineno macro point to the buffer stack bs_lineno only in the reentrant scanner? Also, why not support yycolumn for the non-reentrant scanner?

I don't know the details on this.

> Does anybody think that M4 macros should be accessible in user code? My version of the skeleton disables all m4 processing of user code. This fixes at least one bug in the bug tracker, but breaks some of the test scanners. To work around this, my skeleton also generates cpp macros to replace some of the M4 macros. This allows code with M4 macros to work, but avoids possible side-effects of m4 processing. Should these be enabled only for internal testing, or are some users writing scanners that use these? My idea is to add an option flag, but state that it is mainly intended for internal testing.

I believe there should be absolutely no M4 processing of user code. No good can come of it. If the tests are using this, we should fix them.

Aaron
From: Joe K. <kr...@ni...> - 2008-10-14 15:46:10
What part of the generated code is supposed to be visible to user code from section 1? My understanding is that section 1 is meant for including extra headers, and inserting cpp #defines to modify cpp-based options. However, it apparently does not come early enough in the generated code, because somebody added the %top{} feature.

My first guess is just to avoid moving any code to the other side of the current section 1 code insertion, to avoid breaking things. But, there should be some specific rules. In any case, the %top{} feature is useful because it gets written to the public header.

Why does the yylineno macro point to the buffer stack bs_lineno only in the reentrant scanner? Also, why not support yycolumn for the non-reentrant scanner?

Does anybody think that M4 macros should be accessible in user code? My version of the skeleton disables all m4 processing of user code. This fixes at least one bug in the bug tracker, but breaks some of the test scanners. To work around this, my skeleton also generates cpp macros to replace some of the M4 macros. This allows code with M4 macros to work, but avoids possible side-effects of m4 processing. Should these be enabled only for internal testing, or are some users writing scanners that use these? My idea is to add an option flag, but state that it is mainly intended for internal testing.

Joe Krahn
From: Joe K. <kr...@ni...> - 2008-10-14 15:27:27
Aaron Stone wrote:
>
> On Sep 29, 2008, at 11:31 AM, Joe Krahn wrote:
...
>> When the option is used to generate a public header, the scanner source file should #include that header rather than duplicating the contents in the source file. Otherwise, user code that includes that header (directly or indirectly) gets duplicate declarations. At the very least, generated source should pre-define the header's include guard macro.
>
> That's slightly trickier, but I don't think it would be too hard to get right. I'd actually have _two_ macros, one for the header itself, and another for the generated contents of the header. We cannot guarantee that someone won't add more material to the header in some fashion out of our control, and we want to avoid breaking such things as much as possible.
>
> Something like:
>
> #ifndef FLEX_HEADER_H
> #define FLEX_HEADER_H
>
> #ifndef FLEX_HEADER_GUTS
> #define FLEX_HEADER_GUTS
>
> ... flex generated stuff...
>
> #endif
>
> ... maybe the user puts stuff here?...
>
> #endif

All that is needed is for the written header to include standard guard macros, where the contained HEADER_GUTS part above is a different include file output by Flex. For example, a user header can look like this, where Flex's output is named "flex_public.h":

scanner.h:
    #ifndef MY_SCANNER_H
    #define MY_SCANNER_H
    #include "flex_public.h"
    ...

The flex source code can have the public header content embedded, enclosed in the same guard macros written to flex_public.h. The scanner source could optionally #include flex_public.h, but having the embedded header parts exactly match the written header, including guard macros, will solve the problem of re-reading the header content.

>> Flex has functions yyset_lval() and yyset_lloc() so that the scanner globals can be set in reentrant mode, without having to pass them as arguments on every call to yylex(). Unfortunately, those functions are generated ONLY with bison-bridge options, which also forces them to be yylex arguments. My suggestion is to always generate the lval and lloc set/get functions in reentrant mode, unless options noyyset_lval, etc., are given.
>
> Sounds reasonable.
>
>> The main reason Flex has memory allocation wrappers seems to be to avoid errors for older standards that use (char*) for memory pointers. Why not include a flag to directly use malloc, etc.? Or maybe an option to define them as inline, to get the same effect?
>
> The wrappers also allow the developer to have flex use the memory management system of the application. For example, it is possible to use pools, slabs, and garbage collection by providing a yyalloc and yyrealloc, and defining yyfree to be a noop. You can achieve a direct-to-malloc conversion with noyyalloc, et al, and #define yyalloc malloc.

The problem is that the reentrant scanner includes yyscanner references in the allocation prototypes. You can get around it with macros, but it takes more work. You need macros at the top to hide flex's prototypes, which will conflict with C lib prototypes:

    %top{
    #define yyalloc yyalloc__
    }
    ...
    %{
    #undef yyalloc
    #define yyalloc(size,scanner) malloc(size)
    %}

Besides,
From: Will E. <wl...@us...> - 2008-10-12 22:30:35
Joe/Aaron,

The tab/space confusion is probably not something intentional. Yes, it should be cleaned up, and there is some infrastructure in the Makefiles, but it's not been thought about in a long time, so who knows how good it is.

On Wednesday, 8 October 2008, 10:37 am -0700, Aaron Stone <aa...@se...> wrote:

> I noticed this, too, but didn't want to disrupt the code too much.
>
> Will is the maintainer, and John was the last most active coder -- any thoughts on this?
>
> Aaron
>
> On Oct 8, 2008, at 7:25 AM, Joe Krahn wrote:
>
> > I noticed that most of the code uses tabs instead of spaces, but a lot of code seems to use spaces for indentation that aligns with a tabstop size of 4. I see that this is actually due to a Vim hint placed at the bottom of Flex's source files that includes "expandtabs tabstop=4".
> >
> > So, this line results in vim users (including myself) to slowly corrupt the formatting style, which is unexpanded tabs. I like an indentation level of 4 spaces with expandtabs, but I think it is better to keep tab stops at the default of 8. Either way is fine with me, as long as it is consistent. (It would be a lot worse if this was Python source.)
> >
> > Joe Krahn

--
Will Estes (wl...@us...)
Flex Project Maintainer
http://flex.sourceforge.net/
From: Will E. <wl...@us...> - 2008-10-12 22:26:48
Joe,

Could you post your patches? We'd be happy to evaluate them, and a lot of what you say sounds like the sort of thing that would be good for flex. We'd need to test/evaluate/etc. before inclusion in flex, of course, but we'd love to see what you've done.

On Friday, 10 October 2008, 11:37 am -0400, Joe Krahn <kr...@ni...> wrote:

> I have an extensively redesigned skeleton file. It passes all tests except for C++, which is an area that needs some re-development anyhow.
>
> My initial goal is to move as much C-generated code into the skeleton file (as mentioned in the TODO), organize the skeleton into logical sections, and improve the M4 processing to minimize complexities in the skeleton C sources. I also put most of the M4 macro setup stuff at the top, to minimize mixing of m4 and C code.
>
> I have modified essentially none of the actual C sources in the skeleton, other than global search-and-replacement on the YY_G() macro. I modified code in gen.c, main.c and misc.c to move generation of code into the skeleton where possible, and reorganized some of the option management. Previously, there were many different places where M4 macros were set. I tried to group these together. I also removed all of the directives from misc.c, except for '%%' break points.
>
> Overall, I think it is a big improvement. Getting things better organized also revealed some bugs and inconsistencies. For example, I found that '--main', '--nomain' flags were broken, and that %option 'line' and 'noline' are missing from "scan.l".
>
> One problem is that checks for option consistency are not well organized. Option side effects are not consistent between the lexer and the command-line parser. For example, "%option main" clears the do_yywrap flag, but "--main" does not. My idea is that all options (or just the ones with dependencies) should start in an "unspecified" state, then process all lexer and command-line options, and do consistency checks and implied side-effects afterwards.
>
> Also, there are many places in the skeleton where option dependencies are simply enforced with no user warnings. It makes sense for some (or even most) of the option checks to be in the skeleton, because many of these dependencies come directly from the skeleton's design. However, the skeleton should have a proper mechanism for issuing warnings and errors.
>
> Another problem is that many of the test examples expect to be processed by the M4 macros. M4 processing is not part of the API, so user code should never be exposed to M4. I modified the quoting to exclude M4 processing. But, there should be a mechanism to handle the reentrant globals in a way that a given lexer source can be compiled either way. So, I added CPP macros corresponding to the LAST_ARG and ONLY_ARG M4 macros, and designed the reentrant-globals CPP macros to avoid the need for "M4_YY_DECL_GUTS_VAR()". My reentrant CPP macros are defined like:
>
> #define yyin (((yyguts_t*)yyscanner)->yyin)
>
> This way, only the variable "void *yyscanner" has to be available, and only those LAST_ARG and ONLY_ARG macros are needed. If people have written scanners that actually use M4, there could be an option to allow M4 processing of user code.
>
> There are also some issues with code for external tables. They don't use function declaration/prototype macros to allow for pre-ANSI code. (Is that feature really still needed?) Also, yydmap[] is not part of the reentrant structure. This means it is only reentrant-safe after the external tables are loaded. It might be good to add some safety checks.
>
> Joe Krahn

--
Will Estes (wl...@us...)
Flex Project Maintainer
http://flex.sourceforge.net/
From: Will E. <wl...@us...> - 2008-10-12 22:23:32
Thanks for your submitted patch. We will evaluate it for inclusion in the flex codebase. (And apologies for the long delay in response to your message. We're busy with things outside of flex; please don't read this delay as a lack of interest in your submitted patch.)

On Saturday, 23 August 2008, 7:08 am -0400, ruertar <ru...@gm...> wrote:

> There are a few cases where flex will output "Input line too long" when it could be more descriptive.
>
> We've had at least one question on the flex-help list about a difficult to track down "input line too long" error. I think simple but descriptive error messages like these would have helped.
From: Joe K. <kr...@ni...> - 2008-10-10 15:37:34
I have an extensively redesigned skeleton file. It passes all tests except for C++, which is an area that needs some re-development anyhow.

My initial goal is to move as much C-generated code as possible into the skeleton file (as mentioned in the TODO), organize the skeleton into logical sections, and improve the M4 processing to minimize complexities in the skeleton C sources. I also put most of the M4 macro setup stuff at the top, to minimize mixing of m4 and C code.

I have modified essentially none of the actual C sources in the skeleton, other than global search-and-replacement on the YY_G() macro. I modified code in gen.c, main.c and misc.c to move generation of code into the skeleton where possible, and reorganized some of the option management. Previously, there were many different places where M4 macros were set. I tried to group these together. I also removed all of the directives from misc.c, except for '%%' break points.

Overall, I think it is a big improvement. Getting things better organized also revealed some bugs and inconsistencies. For example, I found that '--main', '--nomain' flags were broken, and that %option 'line' and 'noline' are missing from "scan.l".

One problem is that checks for option consistency are not well organized. Option side effects are not consistent between the lexer and the command-line parser. For example, "%option main" clears the do_yywrap flag, but "--main" does not. My idea is that all options (or just the ones with dependencies) should start in an "unspecified" state, then process all lexer and command-line options, and do consistency checks and implied side-effects afterwards.

Also, there are many places in the skeleton where option dependencies are simply enforced with no user warnings. It makes sense for some (or even most) of the option checks to be in the skeleton, because many of these dependencies come directly from the skeleton's design. However, the skeleton should have a proper mechanism for issuing warnings and errors.

Another problem is that many of the test examples expect to be processed by the M4 macros. M4 processing is not part of the API, so user code should never be exposed to M4. I modified the quoting to exclude M4 processing. But, there should be a mechanism to handle the reentrant globals in a way that a given lexer source can be compiled either way. So, I added CPP macros corresponding to the LAST_ARG and ONLY_ARG M4 macros, and designed the reentrant-globals CPP macros to avoid the need for "M4_YY_DECL_GUTS_VAR()". My reentrant CPP macros are defined like:

    #define yyin (((yyguts_t*)yyscanner)->yyin)

This way, only the variable "void *yyscanner" has to be available, and only those LAST_ARG and ONLY_ARG macros are needed. If people have written scanners that actually use M4, there could be an option to allow M4 processing of user code.

There are also some issues with code for external tables. They don't use function declaration/prototype macros to allow for pre-ANSI code. (Is that feature really still needed?) Also, yydmap[] is not part of the reentrant structure. This means it is only reentrant-safe after the external tables are loaded. It might be good to add some safety checks.

Joe Krahn
From: Aaron S. <aa...@se...> - 2008-10-08 17:38:07
I noticed this, too, but didn't want to disrupt the code too much.

Will is the maintainer, and John was the last most active coder -- any thoughts on this?

Aaron

On Oct 8, 2008, at 7:25 AM, Joe Krahn wrote:

> I noticed that most of the code uses tabs instead of spaces, but a lot of code seems to use spaces for indentation that aligns with a tabstop size of 4. I see that this is actually due to a Vim hint placed at the bottom of Flex's source files that includes "expandtabs tabstop=4".
>
> So, this line results in vim users (including myself) to slowly corrupt the formatting style, which is unexpanded tabs. I like an indentation level of 4 spaces with expandtabs, but I think it is better to keep tab stops at the default of 8. Either way is fine with me, as long as it is consistent. (It would be a lot worse if this was Python source.)
>
> Joe Krahn
From: Joe K. <kr...@ni...> - 2008-10-08 14:26:09
I noticed that most of the code uses tabs instead of spaces, but a lot of code seems to use spaces for indentation that aligns with a tabstop size of 4. I see that this is actually due to a Vim hint placed at the bottom of Flex's source files that includes "expandtabs tabstop=4".

So, this line causes vim users (including myself) to slowly corrupt the formatting style, which is unexpanded tabs. I like an indentation level of 4 spaces with expandtabs, but I think it is better to keep tab stops at the default of 8. Either way is fine with me, as long as it is consistent. (It would be a lot worse if this was Python source.)

Joe Krahn
From: Joe K. <kr...@ni...> - 2008-10-06 17:35:54
Aaron Stone wrote:
> Replying back onto the flex-devel mailing list, so that we can track the conversation. There are definitely some cases where if you're not playing by the (often unwritten) rules of writing a flex grammar, the code will bite you. Polishing edges is very welcome!
>
> On Oct 3, 2008, at 2:42 PM, Joe Krahn wrote:
>
>> Aaron Stone wrote:
>>> There's active maintenance right now, but not active development. That's just a function of available time and not a function of lack of interest -- although the C++ side could use a serious C++ programmer's input. The current maintainers are all C programmers.
>>> Aaron
>> ...
>> I program mostly in C, but am trying to use it with C++ right now. However, there are a lot of problems that are not limited to just C++.
>>
>> For example, the --main and --nomain flags don't work. It appears that Flex moved away from manual skeleton processing to using the m4 preprocessor instead, and the conversion is sort of only half completed. The --main option flags set a cpp macro, but the skeleton only honors the m4 macro.
>
> Oh, that's no good.
>
>> Similarly, the skeleton file has some header sections marked with m4_ifdef, and some marked with %ok-for-header and %not-for-header. There are also a few unmatched %if sections, which only succeed because someone added an extra push-true at the beginning of skelout().
>>
>> I can make an attempt at working on some improvements, but any sort of update will surely lead to errors, even if it is best for the long run.
>
> There are pretty good tests in the tree, so feel free to (carefully) mess with things and post patches to the list and/or to sourceforge bugs.
>
>> Maybe I should just proceed with some experimental code updates, post my initial results, and see what people think?
>
> For sure!
>
> Aaron

OK, I have done some initial hacking. It almost works, but will take some reviewing and debugging. The changes may seem a bit drastic for something that is mostly stable, but the current state of disorganization is leading to poor maintainability. I hope that other flex developers agree that it needs a general clean-up in skeleton processing.

Here is what I have done so far. Comments are very welcome.

My current design gets rid of the initial preprocessing m4 stage. Instead, it expects the m4_include files to be available at run time. The M4_GEN_PREFIX macro was updated to work with a single m4 pass. This makes it easier to work with an external skeleton file. (Bison works this way.)

I moved most of the code generated in C source into the skeleton file, and added a few more M4 macro option definitions for the extra logic needed in the skeleton.

I replaced the %if/%endif conditionals from misc.c with m4 conditionals. Instead of the messy m4_ifdefs, I added some defined macros like "m4_if_c_only()". The misc.c conditional processing is now essentially empty except for %# comment processing.

I reorganized the skeleton into sensible groups where possible: header, non-header, static non-reentrant globals, etc. The header parts are still divided into two parts, before and after user section 1, to ensure compatibility with existing code.

I replaced all of the YY_G() macros with m4 substitution macros, similar to what was already done for function prefixes. This keeps the skeleton code simpler. (I am assuming that user code never uses the YY_G() macro.)

The reentrant state object was renamed from yyguts_t to yyobject_t. The struct members no longer have the yy prefix, because it is not needed when encapsulating them in a struct. (Ideally, the C++ and yyobject_t names should all match, but I have not compared them.)

There are now two prefix macros, for names starting with "yy_" versus "yy". For the yyobject_t variables, this avoids names with a leading underscore. For functions and non-reentrant globals, this could be used to make a C++ namespace prefix instead of a simple name prefix, in which case it would also be nice to exclude leading underscores. For now, the underscore is always retained.

Bison has much nicer m4 macros for traditional versus ANSI prototype generation. They have variable argument lists, instead of one for each argument-list size.

I think it would have been much better not to put the _param suffix on yylex arguments in the reentrant version, because it does not work well with a user-defined YY_DECL. Instead, macros to rename them should come just after the start of yylex, but before the user code is inserted. That allows a user-defined YY_DECL to work with normal parameter names. In addition, the current skeleton initializes the lloc and lval pointers after the user-code section, leading to segfaults unless the user code knows to use the undocumented _param suffix. Unfortunately, changing this will affect code that has already adapted. Maybe there should be a cpp macro or %option to name the yylval and yylloc args?

Another problem with reentrant mode is that yyset_lval and yyset_lloc are useless, because yylex sets them every time. An updated yylex should allow for YY_DECL not to have lval and lloc args, but instead allow use of the set/get functions. Maybe the above mentioned yylval/yylloc naming options can also disable one or both, so the automatic pointer-copying code can adapt.

I also think the %top section is designed wrong. It should terminate with '%}' instead of trying to count braces. But, how to fix it without breaking existing code? Maybe there could be a new code section called `%header{ ... %}' to emphasize that it is the place to put macros that affect the header section?

After the changes I've made so far, I am working on getting it to pass all of the tests.

Joe Krahn
From: Aaron S. <aa...@se...> - 2008-10-04 01:06:19
Replying back onto the flex-devel mailing list, so that we can track the conversation. There are definitely some cases where if you're not playing by the (often unwritten) rules of writing a flex grammar, the code will bite you. Polishing edges is very welcome!

On Oct 3, 2008, at 2:42 PM, Joe Krahn wrote:

> Aaron Stone wrote:
>> There's active maintenance right now, but not active development. That's just a function of available time and not a function of lack of interest -- although the C++ side could use a serious C++ programmer's input. The current maintainers are all C programmers.
>> Aaron
> ...
> I program mostly in C, but am trying to use it with C++ right now. However, there are a lot of problems that are not limited to just C++.
>
> For example, the --main and --nomain flags don't work. It appears that Flex moved away from manual skeleton processing to using the m4 preprocessor instead, and the conversion is sort of only half completed. The --main option flags set a cpp macro, but the skeleton only honors the m4 macro.

Oh, that's no good.

> Similarly, the skeleton file has some header sections marked with m4_ifdef, and some marked with %ok-for-header and %not-for-header. There are also a few unmatched %if sections, which only succeeds because someone added an extra push-true at the beginning of skelout().
>
> I can make an attempt at working on some improvements, but any sort of update will surely lead to errors, even if it is best for the long run.

There are pretty good tests in the tree, so feel free to (carefully) mess with things and post patches to the list and/or to sourceforge bugs.

> Maybe I should just proceed with some experimental code updates, post my initial results, and see what people think?

For sure!

Aaron
From: Aaron S. <aa...@se...> - 2008-10-04 01:04:33
|
On Sep 29, 2008, at 11:31 AM, Joe Krahn wrote:
> Here are some suggestions for updating Flex. Is it good to mail patches
> to this list?
>
> Several newer(?) functions should probably be included in the options
> for not generating code, such as "%option noyyget_column
> noyyset_column"

That sounds reasonable to me.

> When the option is used to generate a public header, the scanner source
> file should #include that header rather than duplicating the contents in
> the source file. Otherwise, user code that includes that header (directly
> or indirectly) gets duplicate declarations. At the very least, generated
> source should pre-define the header's include guard macro.

That's slightly trickier, but I don't think it would be too hard to get right. I'd actually have _two_ macros, one for the header itself, and another for the generated contents of the header. We cannot guarantee that someone won't add more material to the header in some fashion out of our control, and we want to avoid breaking such things as much as possible. Something like:

#ifndef FLEX_HEADER_H
#define FLEX_HEADER_H
#ifndef FLEX_HEADER_GUTS
#define FLEX_HEADER_GUTS
... flex generated stuff...
#endif
... maybe the user puts stuff here?...
#endif

> Flex has functions yyset_lval() and yyset_lloc() so that the scanner
> globals can be set in reentrant mode, without having to pass them as
> arguments on every call to yylex(). Unfortunately, those functions are
> generated ONLY with bison-bridge options, which also forces them to be
> yylex arguments. My suggestion is to always generate the lval and lloc
> set/get functions in reentrant mode, unless options noyyset_lval, etc.,
> are given.

Sounds reasonable.

> The main reason Flex has memory allocation wrappers seems to be to avoid
> errors for older standards that use (char*) for memory pointers. Why not
> include a flag to directly use malloc, etc.? Or maybe an option to
> define them as inline, to get the same effect?

The wrappers also allow the developer to have flex use the memory management system of the application. For example, it is possible to use pools, slabs, or garbage collection by providing a yyalloc and yyrealloc, and defining yyfree to be a no-op. You can achieve a direct-to-malloc conversion with noyyalloc, et al, and #define yyalloc malloc.

Aaron |
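For readers following the allocator discussion, here is a minimal sketch of the pool approach Aaron describes. It assumes a non-reentrant scanner built with "%option noyyalloc noyyrealloc noyyfree", so that the application supplies these three functions itself. The pool size, the block_header type, and the use of size_t in place of flex's yy_size_t are assumptions made so the example compiles on its own; a reentrant scanner would additionally pass a yyscan_t argument to each function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size pool; a real application would plug in its own
 * pool, slab, or garbage-collected allocator here. */
#define POOL_SIZE 65536

/* Header stored in front of every block. It records the requested size so
 * yyrealloc() knows how much to copy, and it keeps blocks aligned. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;

/* Bump allocation: hand out the next free slice of the pool. */
void *yyalloc(size_t bytes)
{
    const size_t unit = sizeof(block_header);
    if (bytes > POOL_SIZE)
        return NULL;
    /* One header plus the payload rounded up to a multiple of the header. */
    size_t need = unit + ((bytes + unit - 1) / unit) * unit;
    if (need > POOL_SIZE - pool_used)
        return NULL;
    assert(pool_used % unit == 0);
    block_header *h = (block_header *)(pool + pool_used);
    h->size = bytes;
    pool_used += need;
    return h + 1;
}

/* Grow by allocating a fresh block and copying; shrinking reuses the block. */
void *yyrealloc(void *ptr, size_t bytes)
{
    if (ptr == NULL)
        return yyalloc(bytes);
    block_header *old = (block_header *)ptr - 1;
    if (bytes <= old->size)
        return ptr;
    void *fresh = yyalloc(bytes);
    if (fresh != NULL)
        memcpy(fresh, ptr, old->size);
    return fresh;
}

/* No-op: memory is reclaimed only when the whole pool is discarded. */
void yyfree(void *ptr)
{
    (void)ptr;
}
```

The trade-off is the usual one for bump allocators: freeing is free, but every buffer growth leaves the old block behind until the pool is reset, so this suits short-lived scanners better than long-running ones.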
From: Joe K. <kr...@ni...> - 2008-10-02 15:22:41
|
In trying to understand the source code, I found that --main and --nomain don't work because the flags set user_defs instead of setting m4 macros.

The %top scanning is not well-implemented. It counts braces, and can easily be misled by comments. It should have ended with '%}' like other code sections. Maybe it's not too late to change, and use the current syntax only when '%}' is not found?

The code also defines YY_INT_ALIGNED as "short int" or "long int" even though the actual table code uses int32_t and int16_t for C99, or "short int" and "int". It is probably not used much externally, so maybe it can just be skipped?

I also think the reentrant code is not designed well. It expects the yylval and yylloc arguments to yylex() to have a "_param" suffix, even for the non-reentrant version. This interferes with an external YY_DECL definition, which normally expects not to have the _param suffix. Another problem is that user code inserted at the beginning of yylex() also has to use the _param suffix because the arguments are not yet copied to the local or struct variables.

My suggestion is to leave the suffix off, and insert the yylval access macros after the beginning of yylex(). To avoid breaking existing code that already uses the _param suffix, there could be a CPP macro that allows the user to define the name of the yylval and yylloc args.

In addition, it should be possible to not have yylval or yylloc args, but still support them in the scanner, via the set/get functions. Right now, the set functions are nearly useless because yylex() always sets them at the beginning of a call.

Joe Krahn |
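To illustrate the %top problem Joe mentions, here is a small sketch of how a scanner that finds the end of a block purely by counting braces can be fooled by a brace inside a comment. This is not flex's actual code; naive_block_end is a hypothetical function written only to show the failure mode.

```c
#include <stddef.h>

/* Returns the index just past the brace that closes the block opening at
 * the first '{' in src, or -1 if the braces never balance. Every brace is
 * counted, including braces that appear inside comments or string
 * literals, which is exactly the weakness being illustrated. */
long naive_block_end(const char *src)
{
    int depth = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] == '{') {
            depth++;
        } else if (src[i] == '}') {
            if (--depth == 0)
                return (long)(i + 1);
        }
    }
    return -1;
}
```

Given the input "{ /* } */ int x; }", the function stops at the brace inside the comment rather than at the real end of the block. An explicit terminator such as '%}' avoids the problem because the scanner no longer has to understand C syntax at all.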
From: Joe K. <kr...@ni...> - 2008-10-01 20:17:26
|
Joe Krahn wrote:
> I noticed that the skeleton conditional "%if-tables-serialization" does
> not push the conditional stack. In the following section from the
> skeleton source:
>
> %if-tables-serialization
> #include <sys/types.h>
> #include <netinet/in.h>
> %endif
>
> The %endif statement here is actually popping an initial 'true' pushed
> on to the conditional stack before processing started, instead of ending
> the "%if-tables-serialization" section, which it seems is the actual
> intention.
>
> Joe Krahn

Looking into this further, it is definitely a bug, but I have not yet looked into how it affects the output. In addition to "%if-tables-serialization" not pushing onto the conditional stack, there is at least one other place where %if/%endif is unbalanced. Near line 860 is the following:

}; /* end struct yyguts_t */
]])
%if-c-only
m4_ifdef( [[M4_YY_NOT_IN_HEADER]],
[[
static int yy_init_globals M4_YY_PARAMS( M4_YY_PROTO_ONLY_ARG );
]])
%endif c-only
%if-reentrant

This section begins already inside of an %if-c-only and %if-reentrant, but they are repeated here. Later on, an unbalanced %endif pops an extra condition from the stack.

Every time skelout() starts, it pushes an extra TRUE onto the stack. This happens at the beginning, after each of the %% insertion sections. The only use for that pushed TRUE condition is to allow for an extra imbalanced %endif.

I modified "%if-tables-serialization" to push the stack, and took out the initial sko_push at the beginning of skelout(). When the conditionals are balanced, it works as expected. I have not gone over the skeleton file to figure out the side-effects of balancing conditionals. I assume it passes all of the tests with the current bugs.

Joe Krahn |
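The stack imbalance described above can be modelled with a few lines of C. This is only a sketch under simplifying assumptions, not the real skelout(): it tracks the stack depth alone, and the function name final_depth and its two flags are hypothetical. The flags select whether an initial TRUE is pushed before processing and whether "%if-tables-serialization" pushes like the other %if- directives.

```c
#include <stdbool.h>
#include <string.h>

/* Walks skeleton lines and returns the conditional-stack depth afterwards,
 * or -1 if an %endif is reached while the stack is empty. */
int final_depth(const char *const *lines, int n,
                bool initial_push, bool serialization_pushes)
{
    int depth = initial_push ? 1 : 0;
    for (int i = 0; i < n; i++) {
        const char *line = lines[i];
        if (strncmp(line, "%if-", 4) == 0) {
            bool is_serialization =
                strcmp(line, "%if-tables-serialization") == 0;
            /* The reported bug: this one directive does not push. */
            if (!is_serialization || serialization_pushes)
                depth++;
        } else if (strncmp(line, "%endif", 6) == 0) {
            if (depth == 0)
                return -1; /* stack underflow */
            depth--;
        }
    }
    return depth;
}
```

Run on the four-line snippet from the quoted message, the current behaviour (initial push, no push for the serialization directive) ends with depth 0, meaning the %endif consumed the initial TRUE. Dropping the initial push without fixing the directive underflows, while applying both of Joe's changes ends balanced at depth 0.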
From: Joe K. <kr...@ni...> - 2008-10-01 19:09:31
|
I noticed that the skeleton conditional "%if-tables-serialization" does not push the conditional stack. In the following section from the skeleton source:

%if-tables-serialization
#include <sys/types.h>
#include <netinet/in.h>
%endif

The %endif statement here is actually popping an initial 'true' pushed on to the conditional stack before processing started, instead of ending the "%if-tables-serialization" section, which it seems is the actual intention.

Joe Krahn |
From: Joe K. <kr...@ni...> - 2008-09-29 18:32:09
|
Here are some suggestions for updating Flex. Is it good to mail patches to this list?

Several newer(?) functions should probably be included in the options for not generating code, such as "%option noyyget_column noyyset_column"

When the option is used to generate a public header, the scanner source file should #include that header rather than duplicating the contents in the source file. Otherwise, user code that includes that header (directly or indirectly) gets duplicate declarations. At the very least, generated source should pre-define the header's include guard macro.

Flex has functions yyset_lval() and yyset_lloc() so that the scanner globals can be set in reentrant mode, without having to pass them as arguments on every call to yylex(). Unfortunately, those functions are generated ONLY with bison-bridge options, which also forces them to be yylex arguments. My suggestion is to always generate the lval and lloc set/get functions in reentrant mode, unless options noyyset_lval, etc., are given.

The main reason Flex has memory allocation wrappers seems to be to avoid errors for older standards that use (char*) for memory pointers. Why not include a flag to directly use malloc, etc.? Or maybe an option to define them as inline, to get the same effect?

Joe Krahn |
From: Joe K. <kr...@ni...> - 2008-09-25 23:22:37
|
I am using Bison's new C++ features with Flex. I did not use the new Flex C++ features, because they seem rather incomplete. Instead, I used Flex reentrant mode. In the process, I have a few suggestions. Given the recent update, I assume there is some active work in this area.

The Flex reentrant version actually works quite well with Bison's C++ mode. It just requires a few tricks because Bison has a fixed definition for yylex(). I could put together an example, unless Bison C++ users would rather have a more complete Flex C++ scanner. OTOH, the flex source says "... or omit the C++ scanner altogether.", so maybe a good use of reentrant mode is a better goal?

The one problem I had with Flex is that I am generating the output header file, which I also include indirectly in my flex source, causing several conflicts. When the header is generated, I think it would be better not to duplicate everything in the generated source as well, but to put an include statement in the generated source, perhaps with a #define so the header can distinguish external and internal #includes.

Some of the skeleton-file development would be more efficient with external skeleton source files, as Bison uses.

Joe Krahn |
From: Amit C. <cho...@ho...> - 2008-09-17 22:23:08
|
Hi all, I am trying to use yy_scan_string to scan input from a string. However, when I use flex, the scanner does not have the yy_scan_string procedure. Do I need to do something (maybe in FlexLexer.h file) to make the yy_scan_string routine appear in the scanner? Thanks in advance, Amit |
From: Aaron S. <aa...@se...> - 2008-08-24 09:06:23
|
On Aug 23, 2008, at 12:14 PM, ruertar wrote:
> Aaron Stone wrote:
>> Can't do it -- that would break nearly every existing scanner unless it
>> added some new option.
>
> Ya -- the idea would be to add an option to make a traditional,
> non-reentrant scanner.

Right, that's why it would break every single grammar in existence. The default cannot be changed.

Aaron |
From: ruertar <ru...@gm...> - 2008-08-23 19:14:52
|
Aaron Stone wrote:
> Can't do it -- that would break nearly every existing scanner unless it
> added some new option.

Ya -- the idea would be to add an option to make a traditional, non-reentrant scanner.

-ayan |
From: Aaron S. <aa...@se...> - 2008-08-23 16:16:18
|
Can't do it -- that would break nearly every existing scanner unless it added some new option. There is a lot of code out there that interacts with a flex scanner and assumes that the globals will be available.

Aaron

On Sat, 2008-08-23 at 08:44 -0400, ruertar wrote:
> Hello!
>
> I don't know if this has been touched on before but I was wondering if
> it would make sense to make reentrant scanner generation default and
> have a fallback-mode for traditional, old-style scanners?
>
> -ayan |