From: Ulya <skv...@gm...> - 2022-03-28 21:57:52
Hi Ralph,

As the documentation says, the dot . does not exclude zero \x00; it excludes
only the newline \x0a. Therefore your regular expression named SCHAR does
allow strings with \x00 in the middle (following a backslash). And since you
have disabled bounds checks (as described in the "sentinel" method in
https://re2c.org/manual/manual_c.html#handling-the-end-of-input), re2c assumes
that the sentinel character is \x00 and warns you that your rule allows
strings with the sentinel in the middle (which it does). This is dangerous,
because if you pass a null-terminated string consisting of a single quote and
a backslash, the lexer will read past the end of input.

So everything works as intended, and the way it is described in the docs.

If you want to exclude the sentinel from dot . then you should do it
explicitly, e.g. [^\x00\n]. Then you can use the sentinel method, but you
won't be able to lex strings with zero in them. If you want to allow zero, use
the "sentinel with bounds checks" or "bounds checks with padding" methods.

Hope that helps, and happy to discuss further if I misunderstood your case.

-- Ulya

On Mon, 28 Mar 2022 at 13:01, Ralph Moses via re2c-devel <re2...@li...> wrote:
>
> I am new to using re2c, but I think I have found what might be a minor
> documentation error. I am using Version 3.0.
>
> 1) Under the section, "Regular expressions" of the C User Manual, the
> documentation reads:
>
> ".
> any character except newline"
>
> 2) A regular expression which uses the ".", such as:
> /*!re2c
> re2c:indent:top = 2;
> re2c:yyfile:enable = 0;
>
> SCHAR = (['] ([\\][']|[\\].|[\001-\377]\[\\'])* [']);
> SCHAR {
> // Some C code
> return 0;
> }
>
> */
>
> will produce the warning message:
>
> "warning: sentinel symbol 0 occurs in the middle of the rule (note: if a
> different sentinel symbol is used, specify it with 're2c:sentinel'
> configuration) [-Wsentinel-in-midrule]:"
>
> 3) The issue is (I think) that the sentinel character \000 is included as
> well as the newline character.
>
> 4) Working code is to add near top (after /*!re2c):
>
> re2c:eof = 0;
>
> // End of input special rule "$"
> $ {
> status = retcode_eof;
> break;
> }
>
> 5) I believe the documentation should read something like:
>
> ". any character except newline or the sentinel character (usually /000)"
>
> Thank you,
>
> Ralph Moses
>
> _______________________________________________
> re2c-devel mailing list
> re2...@li...
> https://lists.sourceforge.net/lists/listinfo/re2c-devel