#346 DU4.0: name conflict on _signed

closed-fixed
None
5
2013-05-25
2002-05-16
No

Trying to build sdcc on a Digital Unix 4.0 host system,
I observed a problem: in src/SDCCsymt.h, there's a
struct with bitfield elements named "_long", "_short",
...

I don't think those names are a good idea, and on this
particular system, one of them bombs, because there's
a line

#define _signed signed

in /usr/include/sys/localedef.h. I solved that by
renaming this particular element to _b_signed. I think
a more global renaming would be advisable.

Discussion

  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    Further observations:

    * bitfield _const needs similar fix as _signed, for the same
    reason.

    * several cases (esp. in src/<model>/gen.c) where isspace()
    is directly called with an argument of type char. This is
    never correct. It has to be changed to

    isspace((unsigned char) {char expression here});

    E.g. for src/z80/gen.c, line 333 should read:

    while (isspace ((unsigned char) *lbp))

    The same holds for any of the <ctype.h> macro/functions.

     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    No reaction in 3 years, 3 months and 5 days? No change to
    the source file either? Com'on guys.

     
  • Maarten Brock

    Maarten Brock - 2005-08-22

    Logged In: YES
    user_id=888171

    Hans,

    Thanks for reminding us you're still interested. I guess
    nobody answered because noone has access to a Digital
    Unix machine. Btw. Wasn't DU replaced by something else?
    (Tru64?)

    I've looked at this one a few times, and wondered every time
    why we need to change this because some compiler fails on
    it. Isn't there a way to have those defines removed? Some
    PURE_ANSI_C directive or something like that?

    I also wondered why isspace(char) is incorrect. It's defined in
    the standard as:
    int isspace (int c);

    Greets,
    Maarten

     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    Digital Unix was not exactly replaced -- it just got a yet
    another new name tag stuck on it. Nowadays, I don't have
    access to a box using that system any more, either.

    The problem is that names with a leading underscore are not
    exactly in application name space. It would be worse yet if
    instead of _single you had written __single or _SINGLE ---
    then you would be well inside the name space reserved
    exclusively for the implementation, even for non-global
    names. As it is, names beginning in an underscore and then
    a lower-case letter are reserved to the implementation, but
    only if used at file-scope. Summing it up, while DEC is the
    primary offender here, by invading user name space with a
    #define, SDCC could avoid running into that trap quite
    easily, by choosing a less vulnerable name. Renaming it to
    flag_signed would already do.

    As to isspace() & friends: the problem is that you don't
    know whether 'char' is signed or not on the host compiler.
    Calling isspace(char) creates programs that aren't "8-bit
    safe". If the compiler has char == signed char, and
    somebody passed in a string with negative-coded characters
    (e.g. I myself might want to spell my real last name in a
    comment, so youl'ld have lines with Brker in them) you'll
    be passing negative numbers between SCHAR_MIN and -1. But
    you're not supposed to pass these functions any other
    negative value than EOF (--> C90, 7.3 paragraph 1). If you
    do, that undefined behaviour.

    So there are only two correct types of arguments to the
    <ctype.h> macros:

    1) ints that came more-or-less directly out of a <stdio.h>
    function (like getchar()), and thus may be either a
    character represented by its unsigned char value stored in
    an int, or EOF

    2) characters represented as unsigned chars

    To pass a character to these macros, it has to be
    represented as unsigned char. If char is unsigned already,
    fine --- but for a portable program you cannot know that.

     
  • Maarten Brock

    Maarten Brock - 2005-08-23
    • assigned_to: nobody --> maartenbrock
     
  • Maarten Brock

    Maarten Brock - 2005-08-23

    Logged In: YES
    user_id=888171

    Ok, confirmed standard says undefined behaviour (7.4 par. 1
    in C99). But then again undefined doesn't mean you have to
    generate insane behaviour. So if a compiler chooses to make
    char signed it might as well implement 'expected' behaviour
    for <ctype.h> functions.

    For now I will change those _signed etc. to b_signed, but
    until someone can tell me <ctype.h> functions actually go
    wrong and can test it after changing, I don't feel like it's worth
    putting my time in.

    Maarten

     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    > doesn't mean you have to generate insane behaviour.

    You're missing the point. "Undefined behaviour" means that
    the source code has given up all chance it ever had of
    reliably *avoiding* insanity. It means the source code
    contains an error, which only luck can keep from manifesting
    itself when the program is run.

    > So if a compiler chooses to make char signed it might as well
    > implement 'expected' behaviour for <ctype.h> functions.

    No. It generally can't do that without rendering the
    (explicitly allowed) macro implementations of <ctype.h>
    calls either inacceptably inefficient or plain wrong. A
    signed 'char' typically has 256 values, 128 of which are
    negative. One of those negative values will quite certainly
    be equal to EOF (usually -1). And that means all results
    for <ctype.h> calls using that particular character must be
    "false".

    From a more hands-on perspective, the canonical
    implementation of these macros is

    unsigned short ctype_array[257] = { /* one bit pattern per
    input */ };
    #define EOF (-1)
    #define isascii(c) (ctype_array[(c)+1] & ctype_ascii_mask)

    This maps the two types of allowed inputs (EOF: -1, unsigned
    representations of characters: 0..255) to a single
    contiguous range [0..256], and uses that as an array index.

    Now imagine what happens if you call isascii(c) with a non-ASCII
    character c, on a signed-char platform. c will be something
    like -60, which means you'll access ctype_array[-59] --- a
    perfect recipe for disaster.

     
  • Maarten Brock

    Maarten Brock - 2005-08-23

    Logged In: YES
    user_id=888171

    Fixed the _signed etc. bitfields.

    And you're right, allthough I was ignoring the point not
    missing it. I would assume in such a case at least a
    boundary check would be performed, but it seems not so.
    Compiler vendors seem build on this 'undefined behaviour'.

    So only half of this is now fixed and the other half remains
    open.

    Maarten

     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    As a matter of fact, the C standard often decrees 'undefined
    behaviour' for the express purpose of allowing efficient
    implementations to still be conforming. In cases where
    several viable extensions of well-defined behaviour into
    unusual corners of parameter space (e.g. negative arguments
    to shift operators) exist, but each would be unduly
    difficult to implement on some fraction of existing
    platforms, the standard generally "chickens out" and rules
    that this causes undefined behaviour.

    Undefined behaviour gives the implementors the breathing
    space they need to produce defined behaviour without loss of
    efficiency. That's exactly what they're doing in the case
    at hand.

     
  • Maarten Brock

    Maarten Brock - 2005-09-03

    Logged In: YES
    user_id=888171

    This is what Borut Razem had to say about this in the sdcc
    developer list:

    8<--------------------------------------------
    Hi Hans-Bernhard Broeker and Maarten Brock,

    here is my comment on your discussion:

    while (isspace ((unsigned char) *lbp)) will not solve the
    problem, because isspace() is defined only for 7 bit ASCII
    values, which means from 0 to 127. The behavior is undefined
    for values out of this range. The proper solution is to test if
    the character is valid ASCII character, and then test if it is a
    space, and no casting is needed:

    while (isascii(*lbp) && isspace(*lbp))

    Borut
    8<--------------------------------------------

    And this is what C99 says about it:

    8<--------------------------------------------
    The header <ctype.h> declares several functions useful for
    classifying and mapping characters. In all cases the
    argument is an int, the value of which shall be representable
    as an unsigned char or shall equal the value of the macro
    EOF. If the argument has any other value, the behavior is
    undefined.
    (...)
    The isalpha function tests for any character for which isupper
    or islower is true, or any character that is one of a locale-
    specific set of alphabetic characters for which none of iscntrl,
    isdigit, ispunct, or isspace is true. In the "C" locale, isalpha
    returns true only for the characters for which isupper or
    islower is true.
    8<--------------------------------------------

    My conclusion is that Borut is incorrect and that casting to
    unsigned char should be sufficient. And on top of that: isascii
    is not defined in the C99 standard (MSVC6 doesn't even
    implement it, it has _isascii).

    I'll start fixing this.

     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    > while (isspace ((unsigned char) *lbp)) will not solve the
    > problem, because isspace() is defined only for 7 bit ASCII
    > values, which means from 0 to 127. The behavior is undefined
    > for values out of this range.

    That's indeed wrong. Not only in C99, either, but ever
    since C was first standardized (ANSI C89, ISO C90).

    For starters, there's nothing in standard C that would
    favour ASCII encoding over any other one --- properly
    written C programs work just fine on EBCDIC machines. The
    only requirement on the execution character is that the
    digits are sequential, i.e. '0'+1 == '1' and so on.

    Actually, a big part of the reason for functions like
    'tolower' or 'islower' to exist is to try and keep
    programmers from making the "all the world's text's in
    ASCII" assumption. It would be quite silly of the standard
    to require a library function if the programmer could just
    exploit features of the ASCII encoding and thus write stuff
    like (c - 'A' + 'a') or (c >= 'a' && c <= 'z') instead. The
    authors of the C standards aren't nearly as dumb as that.

    Thanks for hearing me out and (finally ;-) doing something
    about this. For further reading on matters like this, I'd
    like to strongly recommend "The Standard C Library" by
    P.J.Plauger. It only covers C90, but that's fine --- C99 is
    still largely a paper tiger.

     
  • Borut Ražem

    Borut Ražem - 2005-09-05

    Logged In: YES
    user_id=568035

    Yes, I was wrong. I failed the range: it is 0 - 255. Sorry
    for the noise :-(

    Borut

     
  • Maarten Brock

    Maarten Brock - 2005-09-05

    Logged In: YES
    user_id=888171

    I want to solve this with:

    #define SAFE_CHAR(c) ((c)==EOF ? (c) : ((c) & 0xff)
    #define ISALPHA(c) isalpha( SAFE_CHAR(c) )

    And replace all ctype.h functions with these macros. See
    any pitfalls?

     
  • Borut Ražem

    Borut Ražem - 2005-09-05

    Logged In: YES
    user_id=568035

    This won't work when pass arguments with "side effects", for
    example:
    ISALPHA(++c) will increment c two times.

    A possible solution is to define SAFE_CHAR as a function:
    int safe_char(int c)
    {
    return (c == EOF) ? c : (unsigned char)c;
    }

     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    The correct definition of SAFE_CHAR really is

    #define SAFE_CHAR(c) ((unsigned char) c)

    And no, you shouldn't "replace all ctype.h functions by such
    macros". You must replace those calls that have an argument
    of type char, but not those whose argument is an int that
    could be holding an EOF.

    Testing for c == EOF in the macro is wrong. If 'char' is
    actually a signed type, then one of the actual characters of
    the system will have char value -1, which is almost
    guaranteed to be the same as EOF --- getting us full circle
    to the original argument about why isalpha(c) is always
    wrong: you're now evaluating isalpha(EOF) for an actual
    character.

    To repeat: assuming 8-bit chars, isalpha() and friends have
    257 allowed input values. 0..255, and EOF (which typically
    will be -1). The 256 possible values of a signed char
    variable *must* become 0..255. An int expression passed to
    them must already be in this range of values (which means
    the cast may have to be put elsewhere in the program, rather
    than inside the call of isalpha()).

    Honestly, I don't think there's terribly much to be gained
    by hiding this cast behind a macro.

     
  • Maarten Brock

    Maarten Brock - 2005-09-08
    • milestone: --> fixed
    • status: open --> closed-fixed
     
  • Maarten Brock

    Maarten Brock - 2005-09-08

    Logged In: YES
    user_id=888171

    Fixed in SDCC 2.5.3 #1107

     

Log in to post a comment.