When using the %p format specifier in a format string with the standard library function printf() on big-endian platforms, the hex-formatted pointer value has the bytes output in the wrong order. For example, a pointer with value of 0x87a1 is output as "0xa187".
It would be desirable if the standard library code could be corrected to fix this minor flaw. Filing as a feature request because at the moment it is only an annoyance rather than outright broken or unusable.
The following test program illustrates the issue:
#include <stdint.h>
#include <stdio.h>
#define UCSIM_IF (*(volatile unsigned char *)0x5800)
static int foo;
int putchar(int c) {
UCSIM_IF = 'p'; // Print command
UCSIM_IF = (unsigned char)c;
return c;
}
void main(void) {
const int *p = &foo;
printf("printf() pointer formatting: uintptr_t = 0x%04x, %%p = %p\n", (uintptr_t)p, p);
while(1);
}
Compile as follows for each of a representative platform that is big-endian (STM8) and little-endian (Z80):
sdcc -mstm8 -o test_stm8.ihx test.c
sdcc -mz80 -o test_z80.ihx test.c
Run the test binaries under uCsim:
sstm8 -t STM8S -I if=rom[0x5800] -g test_stm8.ihx
sz80 -t Z80 -I if=rom[0x5800] -g test_z80.ihx
The output is as follows:
STM8: printf() pointer formatting: uintptr_t = 0x0001, %p = 0x0100
Z80: printf() pointer formatting: uintptr_t = 0x8000, %p = 0x8000
As can be seen, the STM8 %p formatting has the bytes transposed.
I believe - correct me if I'm wrong - the only platforms supported by SDCC that are big-endian are STM8 and HC08/S08. Perhaps just a simple modification will be needed to the printf() code to test if it is being compiled for those platforms and swap the order of pointer byte output in those cases.
I have made modifications to printf() and now output is correct for big-endian platform:
Z80:
printf() pointer formatting: uintptr_t = 0x8000, %p = 0x8000STM8:
printf() pointer formatting: uintptr_t = 0x0001, %p = 0x0001It appears the printf() implementation in printf_large.c is the only one that needed modifying; all others were either 'small' and didn't implement %p or were MCS51-specific (i.e. little-endian).
Please see attached patch.
Instead of checking the target I would prefer to create a new predefined macro to indicate the endianness of the target and then use that here.
I suggest to use the same as GCC uses:
Is the proposal for ISO C23. But WG14 has not yet decided on it.
But I think we shouldn't have to use either here. Can't we just keep those mcs51 and ds390 special cases, but for all other targets fall through to the integer printing code to just print the address as if it was printed as uintptr_t?
Is this the latest proposal document regarding C23 endianness enumerations?
I am reading it and am slightly confused. Is
__STDC_ENDIAN_NATIVE__(whose value will be little, big or 'other') defined within stdbit.h too? I can't see how it could be anything other than compiler-defined. Otherwise, in a multi-platform compiler like SDCC, a separate stdbit.h file would be needed per platform.Would it be too hasty (ahead of any final C23 decision) to propose doing the following now?
__STDC_ENDIAN_LITTLE__and__STDC_ENDIAN_BIG__defines.__SDCC_ENDIAN_NATIVEto SDCC which has value equal to one of the__STDC_ENDIAN_*__, depending on platform.__STDC...instead.From the proposal, the macros are defined in stdbit.h. So users can only rely on them when including stdbit.h. On the other hand, the name starts with double-underscore-capital-letter, so it is a reserved name, so implementations would also be free to just always define it.
What you linked to is the latest version officially submitted to WG14 as a proposal for C23. The author also has an updated draft, that takes into account some feedback from WG14, which I guess will be further revised and then submitted officially in time for the WG14 meeting in February.
The draft is here:
https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Modern%20Bit%20Utilities.html
With the proposal under discussion, I'd suggest to wait for the result. Otherwise we'll introduce something, and users will start to rely on it, that might turn out to be incompatible with what ends up in C23. We might the end up with macros that behave differently when using --std-sdccXX vs. --std-cXX, which I want to avoid.
However, there is something we could do now: Introduce a macro, for library use only, don't expose it to users. For such stuff, we have the headers device/include/asm/*/features.h, which can be included by the library source files.
Then, if WG14 decides to include a version of the endianess macros in C23, we can still modify our library to use the standard version later.
I was unaware of this proposal. This means we don't need any new predefined macro. We can just test the existing target macros in stdbit.h and define
__STDC_ENDIAN_NATIVE__accordingly. We can start in asm/*/features.h or go directly for stdbit.h, I would not mind.Ah, I wasn't aware of features.h and how it is used. After looking into it, I think that could be just as good a solution.
However, one very minor reservation I have with that is that doing an endianness macro as a compiler define has the advantage that it would simply follow the 'canonical'
port->little-endianflag as featured in each port'sPORTstruct. Whereas adding defines to features.h files feels a little less 'authoritative' to me. How do you feel about that?But anyway, going down the features.h route, I think perhaps the following could be done:
#define _SDCC_ENDIAN_BIG 1(or similar name) to stm8 and hc08 features.h files.#define _SDCC_ENDIAN_LITTLE 1to the features.h files of all other ports.__SDCC_hc08and__SDCC_s08that includes hc08/features.h.Later on, any stdbit.h implementation could then use the endianness macros from sdcc-lib.h.
That sounds like a good idea. Although, it may take some work to search the standard library for code that checks for big-endian targets.
I did a cursory search of device/lib/* when I made my modifications, and turned up the following results. Searching for all three STM8/HC08/S08 defines being tested or any mention of "big-endian" (with or without hyphen).
I am sure this isn't everything, so some more diligent searching may be necessary to identify all the places that an endianness macro should be substituted in.
I've been reminded of this yesterday when writing some STM8 code printing pointer values. I note that now the standard library has stdbit.h, we have the C2x
__STDC_ENDIAN_*macros available, and also that other code in the standard library has been updated to use them (e.g. device/lib/_mulint.c, device/lib/_mullong.c).So, I would like to produce a revised patch to fix %p output on big-endian platforms using the new standard macros. The code changes will be trivial, but one thing gives me pause.
In device/lib/printf_large.c, after the header includes, there is a line:
If an include for stdbit.h is added alongside the existing header includes, will this cause problems? I don't know whether the
std_*pragmas are file-scope or apply only to subsequent code. If the former, I would imagine it may cause SDCC to complain at the use of_BitIntin stdbit.h.The #pragma is due to [bugs:#3497].
Related
Bugs:
#3497I trialled changes to printf_large.c, part of which was to include stdbit.h, but I am surprised that it compiles without any warning or error.
I would have assumed that when source code is compiled with a pragma or argument specifying a language standard other than C2x, the use of
_BitInt()in stdint.h would cause SDCC to throw an error. Should this not be the case?Ah, wait, I failed to notice that when I updated from SVN earlier today, it included the changes from [r13841], which removes the
std-c11pragma from printf_large.c. So I was in fact compiling in C2x mode.Related
Commit: [r13841]
I have submitted a patch, #457.
Related
Patches:
#457Ticket moved from /p/sdcc/feature-requests/756/
Can't be converted:
IMO, the old behaviour can be considered a bug, so I moved this to bug reports (though strictly speaking, it is not as the standard does not specify the representation for printing pointers - the standard only requires that %p is compatible between printf and scanf and we don't have scanf yet; the standard also requires that this implementation-defined behaviour be documented, and our manual tells the user to read the library source to find out about this).
Personally, I would prefer %p of a pointer to just output the same value as when printing the pointer cast to uintptr_t as hex number. However, that would mean changing current ds390 and mcs51 behaviour. But it would have the clear advantage of a single, simple and elegant solution that works for all ports.
We definitely should decide on the solution before scanf gets implemented.
Last edit: Philipp Klaus Krause 2023-02-02
Fixed today by applying the provided patch.