
#2577 Escaped hex code in char string incorrectly encoded

Status: closed-rejected
Milestone: None
Labels: Front-end
Priority: 5
Updated: 2017-01-16
Created: 2017-01-16
Creator: alvin
Private: No

sdcc 3.6.5 #9833 (MINGW32)

I've found that hexadecimal escape sequences in string literals cause the strings to be laid out incorrectly in memory.

Example:

const unsigned char hall_valids[42] = "\x01ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.~ {";

sdcc -mz80 -S test.c

Result:

_hall_valids:
    .db 0xef
    .ascii "GHIJKLMNOPQRSTUVWXYZ0123456789.~ {"
    .db 0x00
    .db 0x00
    .db 0x00
    .db 0x00
    .db 0x00
    .db 0x00
    .db 0x00

The 0x01 byte that should appear at the front has been changed to 0xef, and the initial part of the array, A-F, is missing.

If I encode that byte in octal:

const unsigned char hall_valids[42] = "\001ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.~ {";

The string is correctly encoded.

Discussion

  • alvin

    alvin - 2017-01-16

A little further research, and it turns out this is not a bug.

The C standard specifies that a hexadecimal escape sequence consumes the longest run of hexadecimal digits following the initial \x, while an octal escape consumes at most three octal digits following the backslash. This is exactly what is happening above: \x01ABCDEF is parsed as a single escape sequence.

I was under the impression that exactly two hex digits are read after \x and exactly three octal digits after \0, but that only works in a world where characters are 8 bits. Now I'm wondering if this changed at some point or if I always had this wrong :P

     
  • Philipp Klaus Krause

    • status: open --> closed-rejected
    • assigned_to: Philipp Klaus Krause
     
  • Philipp Klaus Krause

I checked the ISO C90 and ISO C11 standards, and they both agree. So if this ever changed in C, it must have been in pre-standard times. And if it ever changed in SDCC, it was a bugfix.

    Philipp

     
