Re: [Flex-devel] #include <unistd.h> in middle of scanner: bad idea
flex is a tool for generating scanners
Brought to you by:
wlestes
|
From: Kaz K. <938...@ky...> - 2020-08-07 06:31:33
|
On 2020-08-06 21:26, Kang-Che Sung wrote:
> On Thu, Aug 6, 2020 at 1:37 AM Kaz Kylheku <938...@ky...>
> wrote:
>>
>> Flex generates a file which contains
>> a #include <unistd.h> long after user material. This seems to be for
>> the sake of having a declaration of isatty.
>>
>> The problem with this is that the user code can #define macros that
>> can break the header. C programs have to be careful with what they
>> do before system headers are included.
>
> System headers can always be overridden by user code. A prominent
> example is
> feature test macros, which are defined before the inclusion of system
> headers
> and can change the ABI used for standard library functions.
I was writing more from a standard C point of view in the above
paragraph.
ISO C has no "feature test macros".
They are POSIX feature (imitated in other platforms). There is a
specific
vocabulary of feature test macros. You cannot just make up any symbol.
So for instance -D_POSIX_C_SOURCE on the compiler command line has
a well defined effect in POSIX environments, due to being documented in
a standard. Whereas -D_BLUB_SOURCE, which I just made up, is undefined
behavior.
My comment wasn't meant to imply that an application must never under
any
circumstances define any sort of macro prior to including a system
header.
Ah right; in ISO C, there is a famous counterexample:
#define NDEBUG
#include <assert.h>
Again, documented by language, so okay.
>
>> Here is small repro test case:
>>
>> %{
>>
>> #define pause blah[]
>>
>> %}
>>
>> %%
>>
>> Result:
>>
>> $ flex lex.l ; cc lex.yy.c
>> lex.l:3:15: error: declaration of ‘blah’ as array of functions
>> #define pause blah[]
>
> Why would anyone want to do that? Redefining a standard library symbol
> makes
> the code unportable (except for local scope redefines).
The point is that that is a local scope define. My sample doesn't
even include any headers at all. Of course things like <stdio.h> are
implied
but they are expected to be before #define pause.
The programmer doesn't expect <unistd.h> in there at all.
What breaks it is that Flex generates the code such that there is a
surprise
#include <unistd.h> long after the #define pause.
The inclusion of <unistd.h> will not break if it precedes the #define
pause.
The issue isn't standard library symbols; I just picked that for an easy
repro. Any macro whatsoever could potentially cause a problem.
I also don't care about "portable" as much as "not breaking on any of
the half
dozen platforms I'm building for".
>> How about just an "extern int isatty(int)" somewhere?
>
> Forward declaration can somewhat defeat the purpose of the header. A
> library
> implementation might not declare the function exactly as described in
> the
> standard document. It might contain other attributes or ABI keywords.
Indeed; that's not a good solution.
Probably, the best thing would be not to have that isatty business in
there
at all, unless it's requested by some explicit option.
isatty is still being wastefully called even if YY_INPUT has redirected
the
lexer to scan from a character string instead of a stream.
The isatty calls even happen if the lexer is generated in -B (batch)
mode.
(Flex 2.6.4 on Ubuntu 18).
Aha! I have a nice workaround:
#define YY_NO_UNISTD_H
#undef isatty
#define isatty(x) 0
Presto; I'm no longer including <unistd.h> myself, there is no complaint
about an undeclared isatty, and the numerous ioctls in the strace
output:
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
are gone!
(IMHO, the above effect should be obtained by just %option batch or -B.)
And that also gives us one example to the "why would anyone redefine
a library symbol" question.
If <unistd.h> were included at the top, then we could let that include
be, and just do:
#undef isatty
#define isatty(x) 0
This trick is going to work fine on all the platforms I care about,
so it is de facto portable enough.
There are theoretical ways that could still break, but if it were
perpetrated before #include <unistd.h>, the probability of failure
would go to 100%. That's really why you don't want surprise header
inclusions after user-defined material.
|