Menu

#3543 Correct %p printf() formatting on big-endian platforms

closed-fixed
None
other
5
2023-02-04
2021-11-12
No

When using the %p format specifier in a format string with the standard library function printf() on big-endian platforms, the hex-formatted pointer value has the bytes output in the wrong order. For example, a pointer with value of 0x87a1 is output as "0xa187".

It would be desirable if the standard library code could be corrected to fix this minor flaw. Filing as a feature request because at the moment it is only an annoyance rather than outright broken or unusable.

The following test program illustrates the issue:

#include <stdint.h>
#include <stdio.h>

#define UCSIM_IF (*(volatile unsigned char *)0x5800)

static int foo;

int putchar(int c) {
    UCSIM_IF = 'p'; // Print command
    UCSIM_IF = (unsigned char)c;
    return c;
}

void main(void) {
    const int *p = &foo;
    printf("printf() pointer formatting: uintptr_t = 0x%04x, %%p = %p\n", (uintptr_t)p, p);
    while(1);
}

Compile as follows for each of a representative platform that is big-endian (STM8) and little-endian (Z80):

sdcc -mstm8 -o test_stm8.ihx test.c
sdcc -mz80 -o test_z80.ihx test.c

Run the test binaries under uCsim:

sstm8 -t STM8S -I if=rom[0x5800] -g test_stm8.ihx
sz80 -t Z80 -I if=rom[0x5800] -g test_z80.ihx

The output is as follows:

STM8: printf() pointer formatting: uintptr_t = 0x0001, %p = 0x0100
Z80: printf() pointer formatting: uintptr_t = 0x8000, %p = 0x8000

As can be seen, the STM8 %p formatting has the bytes transposed.

I believe - correct me if I'm wrong - the only platforms supported by SDCC that are big-endian are STM8 and HC08/S08. Perhaps just a simple modification will be needed to the printf() code to test if it is being compiled for those platforms and swap the order of pointer byte output in those cases.

Discussion

  • Basil Hussain

    Basil Hussain - 2021-11-13

    I have made modifications to printf() and now output is correct for big-endian platform:

    Z80: printf() pointer formatting: uintptr_t = 0x8000, %p = 0x8000
    STM8: printf() pointer formatting: uintptr_t = 0x0001, %p = 0x0001

    It appears the printf() implementation in printf_large.c is the only one that needed modifying; all others were either 'small' and didn't implement %p or were MCS51-specific (i.e. little-endian).

    Please see attached patch.

     
  • Maarten Brock

    Maarten Brock - 2021-11-14

    Instead of checking the target I would prefer to create a new predefined macro to indicate the endianness of the target and then use that here.

     
    • Maarten Brock

      Maarten Brock - 2021-11-14

      I suggest to use the same as GCC uses:

      __BYTE_ORDER__
      __ORDER_LITTLE_ENDIAN__
      __ORDER_BIG_ENDIAN__
      
      __BYTE_ORDER__ is defined to one of the values
      __ORDER_LITTLE_ENDIAN__ or __ORDER_BIG_ENDIAN__ to reflect the layout
      of multi-byte and multi-word quantities in memory. If __BYTE_ORDER__
      is equal to __ORDER_LITTLE_ENDIAN__ or __ORDER_BIG_ENDIAN__, then
      multi-byte and multi-word quantities are laid out identically: the
      byte (word) at the lowest address is the least significant or most
      significant byte (word) of the quantity, respectively.
      
      You should use these macros for testing like this:
      
      /* Test for a little-endian machine */
      #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
      
       
      • Philipp Klaus Krause

        _STDC_ENDIAN_LITTLE__ == __STDC_ENDIAN_NATIVE__
        

        Is the proposal for ISO C23. But WG14 has not yet decided on it.

        But I think we shouldn't have to use either here. Can't we just keep those mcs51 and ds390 special cases, but for all other targets fall through to the integer printing code to just print the address as if it was printed as uintptr_t?

         
        • Basil Hussain

          Basil Hussain - 2021-11-16

          Is this the latest proposal document regarding C23 endianness enumerations?

          I am reading it and am slightly confused. Is __STDC_ENDIAN_NATIVE__ (whose value will be little, big or 'other') defined within stdbit.h too? I can't see how it could be anything other than compiler-defined. Otherwise, in a multi-platform compiler like SDCC, a separate stdbit.h file would be needed per platform.

          Would it be too hasty (ahead of any final C23 decision) to propose doing the following now?

          • Add a rudimentary stdbit.h to the SDCC standard library with just the __STDC_ENDIAN_LITTLE__ and __STDC_ENDIAN_BIG__ defines.
          • Add a compiler define of something like __SDCC_ENDIAN_NATIVE to SDCC which has value equal to one of the __STDC_ENDIAN_*__, depending on platform.
          • Find all instances where platform endian-specific conditionality occurs in the stdlib code and change to use these endianness defines.
          • Later, if/when C23 is ratified, change (or add as an alias) the compiler define to be __STDC... instead.
           
          • Philipp Klaus Krause

            From the proposal, the macros are defined in stdbit.h. So users can only rely on them when including stdbit.h. On the other hand, the name starts with double-underscore-capital-letter, so it is a reserved name, so implementations would also be free to just always define it.

            What you linked to is the latest version officially submitted to WG14 as a proposal for C23. The author also has an updated draft, that takes into account some feedback from WG14, which I guess will be further revised and then submitted officially in time for the WG14 meeting in February.

            The draft is here:
            https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Modern%20Bit%20Utilities.html

             
          • Philipp Klaus Krause

            With the proposal under discussion, I'd suggest to wait for the result. Otherwise we'll introduce something, and users will start to rely on it, that might turn out to be incompatible with what ends up in C23. We might the end up with macros that behave differently when using --std-sdccXX vs. --std-cXX, which I want to avoid.

            However, there is something we could do now: Introduce a macro, for library use only, don't expose it to users. For such stuff, we have the headers device/include/asm/*/features.h, which can be included by the library source files.

            Then, if WG14 decides to include a version of the endianess macros in C23, we can still modify our library to use the standard version later.

             
            • Maarten Brock

              Maarten Brock - 2021-11-16

              I was unaware of this proposal. This means we don't need any new predefined macro. We can just test the existing target macros in stdbit.h and define __STDC_ENDIAN_NATIVE__ accordingly. We can start in asm/*/features.h or go directly for stdbit.h, I would not mind.

               
            • Basil Hussain

              Basil Hussain - 2021-11-16

              Ah, I wasn't aware of features.h and how it is used. After looking into it, I think that could be just as good a solution.

              However, one very minor reservation I have with that is that doing an endianness macro as a compiler define has the advantage that it would simply follow the 'canonical' port->little-endian flag as featured in each port's PORT struct. Whereas adding defines to features.h files feels a little less 'authoritative' to me. How do you feel about that?

              But anyway, going down the features.h route, I think perhaps the following could be done:

              • Create hc08/features.h. I believe this would just be a copy of default/features.h.
              • Add #define _SDCC_ENDIAN_BIG 1 (or similar name) to stm8 and hc08 features.h files.
              • Perhaps also add a #define _SDCC_ENDIAN_LITTLE 1 to the features.h files of all other ports.
              • Add a new conditional section to sdcc-lib.h for __SDCC_hc08 and __SDCC_s08 that includes hc08/features.h.
              • Change all instances of endianess compile-time conditional tests in device/lib code to use these new macros (adding an include for sdcc-lib.h if not already present).

              Later on, any stdbit.h implementation could then use the endianness macros from sdcc-lib.h.

               
    • Basil Hussain

      Basil Hussain - 2021-11-14

      That sounds like a good idea. Although, it may take some work to search the standard library for code that checks for big-endian targets.

      I did a cursory search of device/lib/* when I made my modifications, and turned up the following results. Searching for all three STM8/HC08/S08 defines being tested or any mention of "big-endian" (with or without hyphen).

      ./_rlulonglong.c:35:#if defined(__SDCC_hc08) || defined(__SDCC_s08) || defined(__SDCC_stm8) // Big-endian
      ./_mulint.c:211:#if defined(__SDCC_hc08) || defined(__SDCC_s08) || defined(__SDCC_stm8)
      ./_mullong.c:632:#if defined(__SDCC_hc08) || defined(__SDCC_s08) || defined(__SDCC_stm8)
      ./_mullong.c:633:/* big endian order */
      ./_rrslonglong.c:36:#if defined(__SDCC_hc08) || defined(__SDCC_s08) || defined(__SDCC_stm8) // Big-endian
      ./printf_large.c:657:#elif defined (__SDCC_hc08) || defined (__SDCC_s08) || defined (__SDCC_stm8)
      ./_rrulonglong.c:36:#if defined(__SDCC_hc08) || defined(__SDCC_s08) || defined(__SDCC_stm8) // Big-endian
      ./pic14/libsdcc/_mulint.c:214:#if defined(__SDCC_hc08) || defined(__SDCC_s08) || defined(__SDCC_stm8)
      ./pic14/libsdcc/_mullong.c:635:#if defined(__SDCC_hc08) || defined(__SDCC_s08) || defined(__SDCC_stm8)
      ./pic14/libsdcc/_mullong.c:636:/* big endian order */
      

      I am sure this isn't everything, so some more diligent searching may be necessary to identify all the places that an endianness macro should be substituted in.

       
  • Basil Hussain

    Basil Hussain - 2023-01-28

    I've been reminded of this yesterday when writing some STM8 code printing pointer values. I note that now the standard library has stdbit.h, we have the C2x __STDC_ENDIAN_* macros available, and also that other code in the standard library has been updated to use them (e.g. device/lib/_mulint.c, device/lib/_mullong.c).

    So, I would like to produce a revised patch to fix %p output on big-endian platforms using the new standard macros. The code changes will be trivial, but one thing gives me pause.

    In device/lib/printf_large.c, after the header includes, there is a line:

    #pragma std_c11
    

    If an include for stdbit.h is added alongside the existing header includes, will this cause problems? I don't know whether the std_* pragmas are file-scope or apply only to subsequent code. If the former, I would imagine it may cause SDCC to complain at the use of _BitInt in stdbit.h.

     
    • Philipp Klaus Krause

      The #pragma is due to [bugs:#3497].

       

      Related

      Bugs: #3497

    • Basil Hussain

      Basil Hussain - 2023-01-31

      I trialled changes to printf_large.c, part of which was to include stdbit.h, but I am surprised that it compiles without any warning or error.

      I would have assumed that when source code is compiled with a pragma or argument specifying a language standard other than C2x, the use of _BitInt() in stdint.h would cause SDCC to throw an error. Should this not be the case?

       
    • Basil Hussain

      Basil Hussain - 2023-01-31

      Ah, wait, I failed to notice that when I updated from SVN earlier today, it included the changes from [r13841], which removes the std-c11 pragma from printf_large.c. So I was in fact compiling in C2x mode.

       

      Related

      Commit: [r13841]

  • Basil Hussain

    Basil Hussain - 2023-02-01

    I have submitted a patch, #457.

     

    Related

    Patches: #457

  • Philipp Klaus Krause

    Ticket moved from /p/sdcc/feature-requests/756/

    Can't be converted:

    • _milestone:
     
  • Philipp Klaus Krause

    IMO, the old behaviour can be considered a bug, so I moved this to bug reports (though strictly speaking, it is not as the standard does not specify the representation for printing pointers - the standard only requires that %p is compatible between printf and scanf and we don't have scanf yet; the standard also requires that this implementation-defined behaviour be documented, and our manual tells the user to read the library source to find out about this).

    Personally, I would prefer %p of a pointer to just output the same value as when printing the pointer cast to uintptr_t as hex number. However, that would mean changing current ds390 and mcs51 behaviour. But it would have the clear advantage of a single, simple and elegant solution that works for all ports.

    We definitely should decide on the solution before scanf gets implemented.

     

    Last edit: Philipp Klaus Krause 2023-02-02
  • Philipp Klaus Krause

    • status: open --> closed-fixed
    • assigned_to: Philipp Klaus Krause
    • Category: --> other
     
  • Philipp Klaus Krause

    Fixed today by applying the provided patch.

     

Log in to post a comment.

MongoDB Logo MongoDB