Running mcs51-large-stack-auto regression tests
results/mcs51-large-stack-auto/tst_addr_modes.out:28:--- FAIL: "Assertion failed" on P3 == 0xBB at cases/../../../../../../share/HOST/TEST/SRC/sdcc-4.2.2_orig/support/regression/cases/../tests/addr_modes.c:21
results/mcs51-large-stack-auto/tst_addr_modes.out:29:--- FAIL: "Assertion failed" on temp == 0xCC at cases/../../../../../../share/HOST/TEST/SRC/sdcc-4.2.2_orig/support/regression/cases/../tests/addr_modes.c:24
Failure: addr_modes.c
tst_addr_modes (f: 2, t: 3, c: 2, b: 1728, T: 365160)
Some guidance for your assessment
The implementation of __data (direct) and _idata (indirect data) is not done stringent.
A direct access should not replaced by an indirect access, because in direct mode >0x80 sfr space is used.
affected functions in src/mcs51/gen.c
genPointerPush
emitPtrByteGet
emitPtrByteSet
genUnpackBits
genNearPointerGet
genPackBits
genNearPointerSet
genPointerSet
genPointerGet
affected functions in src/SDCCmem.c
initMem
The internal stack is always assessed indirect, never direct (it is just not possible).
Libs
all usage of __data have to be reviewed, in the most of the cases __idata is the required access
e.g. setjmp
Testcases
all usage of __data have to be reviewed for correct usage
You can gain performance if the implementation is done correctly.
I fixed almost everything for my use case, the generated code looks now reasonable.
Alternative option would be to generate a compiler warning that the access for __data is not generated directly and indirect addressing mode is used.
Sorry not a tiny fix.
There is a reason why SDCC complains when you try to take the address of an SFR. This is just plain illegal.
temp=(* (__data unsigned char *)(0xB0));What SDCC could do in this case is check if the constant being cast to __data pointer is below 0x80 and throw an error here as well. But I'm sure you can come up with another piece of code where SDCC won't know the value at compile time and nothing can be done about it anymore.
Please allow me some justifications towards your response.
SDCC is not complaining, it just compiles no warning ....
The test case shows that SDCC is generating the wrong code, more friendly sdcc is using indirect instead of direct.
Both SDCC manual and MCS51 documentation are describing clearly the difference between direct and indirect addressing modes.
__data/__near is direct addressing
__idata is indirect addressing
check the following, compiled with sdcc as it is available
I think we both agree, that we have here indirect addressing mode in front of us and not direct like the qualifier says.
So warning is here the minimum.
__idata instead of __data will deliver the same assembly.
What is the difference for __data and __idata ;-)
It is functional neutral for address below <0x80. Indirect and direct addressing is here using the same physical ram location.
For >=0x80 I have shown the functional impact with the test case.
If __data is used, I would expect...
The benefit is here also that you have not to spill r0/r1 or potentially even acc (seen for bitfields), which costs you performance.
->What SDCC could do in this case is check if the constant being cast to __data pointer is below 0x80 and throw an error here as well.
__data is direct access. If your explanation is that __data is behaving in SDCC like __idata, then generate a warning if >=0x80.
The following sequence should normally not be compile-able if the __data qualifier is taken as a real and binding qualifier. I would expect an abort by the compiler
with an error message. The code can just be not generated in direct addressing mode, because i is an variadic argument and can not resolved in link/locate.
With __idata this should be possible.
I honor all the achievements the team here did, I like really to work with the compiler and especially that you made it fully transparent how it was tested.
It is obvious for me, that the implementation is not clean or the consequences are not properly documented.
Please check!
If you confirm, that my statements are correct, I can provide a summary about the changes which I did.
Last edit: Maarten Brock 2022-06-15
__data tells the compiler to place a variable in direct accessible memory, in other words below 0x80. It does not mean that the variable can only be accessed using direct access, because that would make it illegal to take the address of the variable.
__idata tells the compiler to always access a variable using indirect access (through R0/R1). The linker is free to place it anywhere in internal memory between 0x08 and 0xFF.
Indeed SDCC does not generate a warning or error on your code with literals above 0x7F and that is what could be improved. But when the literal is replaced by a label whose address is unknown at compile time there is once again not much SDCC can do.
SDCC could also do a better optimization when the address is known to be below 0x80 and do a direct access instead of an indirect one using R0/R1.
We could also disallow pointers to __data and when the address of a variable in __data is taken turn it into an __idata pointer. That would mean we can never do the optimization to direct access upon dereference anymore.
Please replace:
check the following, compiled with sdcc as it is available
by (my mistake copy and paste failure)
Last edit: Maarten Brock 2022-06-15
I understand your comments.
It is not something which can be easy answered, none of the standards clearly describes how strict additional type qualifier have to be used in different code sequences.
So for SDCC it is actually, use __data as dominating location qualifier. It directs also to use the direct addressing mode, if not possible fall back to indirect addressing mode.
The __idata is more stringent: always locate in IRAM and always access in indirect.
A tiny, but a significant difference for my corner use case (the testcase i provided).
A warning would be good in case of a literal >0x7F for __data.
I would also appreciate a warning if the compiler falls back from __data (if explicitly wanted) to indirect OR
a compiler switch --enable-strict-data-handling (could be default off, then the legacy behavior is not endangered)
But again this is question what __data is? Is it only a storage qualifier or an exact type qualifier incl. the to be used addressing mode.
I prefer exact behavior, like for const, restrict, volatile...
If __data as address mode qualifier can not be used for pointer to variables (e.g. no literal, no label) , I can anyhow take __idata.
The gain for the performance improvements with stricter usage of direct <0x80, I can actually not judge.
Looking at the relevant standard (I.e. the Embedded C TR), these are called "address-space type qualifiers" for "intrinsic address spaces" (SDCC also supports non-intrinsic named address spaces via __addressmod):
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1275.pdf
In words of the Embedded C TR both __data and __idata are intrinsic named address spaces where __data is a subset of __idata. __sfr is another intrinsic named address space which overlaps with neither __data nor __idata.
By constructing a pointer to __data with an address value outside the valid address range for __data you're entering undefined behavior.
In this special case where a literal is used SDCC might notice that and give an error. But when the literal is substituted by an int variable (possibly extern) there is nothing SDCC can do. Your example test() function above is valid only while you provide it a value for i that is in the valid address range of __data.
I'm still wondering why you would want to use this construct instead of using __sfr declarations.
The background is:
you can do:
etc.
Replace it by union:
or you do
The usage of bitfields for register access is common.
The are several threads which are discussing pros and cons in some forums.
It is clear that I can use __sfr and tons of defines.
As I stated already I could solve all the issues, with strict direct addressing usage for __data, which also improves the performance.
The suggested __addressmod I was interested and implemented it to check if it is alternative approach for the issue.
It is working with limitations. Additional address-space qualifiers are using the generic addressmode only __xdata for large and __data for small.
In addition a call overhead is generated and finally the indirect access is used for bitfields.
"By constructing a pointer to __data with an address value outside the valid address range for __data you're entering undefined behavior."
Please have a look on the implementation of setjmp and longjmp
Looks like that __setjmp is then also implemented in an undefined behavior.
Thank you for all your feedback also guiding me to the standard https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1275.pdf
Last edit: Maarten Brock 2022-12-08
You are right and the cast should have used __idata because the stack is in __idata. But the optimization you are seeking would not take place here since the contents of SP are not known at compile time and thus an indirect access is the only possible implementation. Reading SP here does not return the address of SP (which is in __sfr space), but the value of SP which is a pointer into __idata space.
I did not claim an performance improvement for the implementation of setjmp. Just an undefined behavior according to your feedback. So I was confused, I am doing undefined, the libraries not. So fine for me, you confirmed library is also undefined. You will find more undefined test cases e.g. bug1938300.c...
In addition
can you confirm that the implementation of
in SDCCmem.c
for istack is incorrect, should be IPOINTER.
1) Can you confirm or not!
The heading in may claim was "bug - inconsistent implementation of __data and __idata qualifiers".
Please take my comments series! You confirmed now already one, it is not an issue for me to generate more!
Last edit: Maarten Brock 2022-12-08
A qemu maintainer told me by phone that I am to friendly and have to be stronger insist on my technical justifications. So again I want to contribute and not to claim, sdcc is in a good shape!
Why are you using exclamation marks? Feels like you are shouting to me.
And I do take your input serieous.
I have just updated _setjmp.c to use __idata.
initMem() in SDCCmem.c is different. It is there for all targets, not just mcs51. And other targets do have __data, but not __idata.
Further I have a strong feeling we do not completely understand one another. What is it that you want to get accomplished? Maybe there is already too much misinformation in this thread and starting a new fresh one is the best way forward.
Kind regards,
Maarten
It was not my intention to shout you. I think you are right, we have now several things and mixing it up. I will later on only add the initial problem description and same background info.
Thanks again for your patience and taking me series.
I want to do Register accesses witch are declared as bitfields.
I removed the pointer discussion, because it will add just another dimension of discussion, where I would like to later follow up!
Please can you check the following test case:
line 31 is it done correctly and not illegal.
line 248 assignment is done correctly, incl. fail on line 249
I tried as much as I can to deliver disassembly inside the testcase and comment it.
I have done this not shout you, but to give you the best preparation to do an easy judgement.
If something is missing, please state and will I will add.
First, I would make use of an anonymous struct to define the bitfields in TPORT3. But if you prefer to write PORT3.B.PIN7 instead of PORT3.PIN7that is also fine.
Line 29 is wrong, there is no address 0xB0 in __data space.
Line 30 is OK, because 0x30 is in __data space.
Line 31 is OK in medium and large memory model, though it depends on the memory model in which memory space the variable will end up (__pdata or __xdata). It is not OK for small memory model as there is no address 0xB0 in __data space. The generated asm shows you used small memory model though. The register keyword tells the compiler that it may optimize this variable in internal registers (typically R0-R7, ACC, B & DPTR) and you promise you will not take the address of it. This could have been used by SDCC as a way to define SFR's, but history has proven otherwise.
Line 32 is wrong, because 0x30 is not in __sfr space.
The comment on line 96 is wrong. Though you used the register keyword, that does not mean that the compiler must place it in a register and use direct access only.
Line 161 happens to generate the code you expected, but this is not valid code since address 0xB0 is not in __data space.
Lines 236, 238, 240, 241, 243, 245, 246, 248 & 250 are wrong, 0xB0 is not in __data space which will produce undefined behavior.
Lines 239, 240, 245 & 250 pass as this is one way of fulfilling undefined behavior.
Lines 244 & 249 do not pass which is another fulfillment of undefined behavior.
What you seem to want is what is almost shown in line 32. Unfortunately sfr is historically used as both a memory space and a type. It would be a feature request to change this so that __sfr is a memory space only (with a backward compatibility to use an unsigned char type when no other type is specified).
Ideally it would be like this:
__data space has address range 0x00-0x7F. Any variable in it can be accessed both directly and indirectly.
__idata space has address range 0x00-0xFF and contains __data space. Variables placed in it will only ever be accessed indirectly.
__sfr space has address range 0x80-0xFF. Any variable in it can only be accessed directly.
Currently __sfr space only contains bit / char / int / long variables defined by __sbit / __sfr / __sfr16 / __sfr32. Bitfields and other struct/union types are not supported.
Beware that even when bitfields would be supported in __sfr space their access could never be optimized to __sbit like access. The best you can hope for is that "PORT3TEST.B.PIN4=1" is translated to "ORL PORT3TEST, #0x10" (3 bytes). It will not be translated to "SETB PORT3TEST.4" (2 bytes).
Maarten
I can confirm your statements. So the only way is to introduce another memory qualifier for sfr direct without a type qualifier.
I will implement __sfrdata (as new memory type) to solve it. I can also confirm that for __sfrdata the optimization towards __sbit like behavior is only possible in special cases.
address known, etc.
__data space has address range 0x00-0x7F. Any variable in it can be accessed both directly and indirectly.
<- Yes. So far I remember, I detected some misleading usage of __data in some testcases and I think also in setjmp/longjmp is affected. I will do notice.
The optimization which are possible for __data, I will also notice.
__idata space has address range 0x00-0xFF and contains __data space. Variables placed in it will only ever be accessed indirectly.
<- Yes, so far I have seen this is true. If the address is known an optimization towards direct access is still possible.
__sfr space has address range 0x80-0xFF. Any variable in it can only be accessed directly.
<- Yes see __sfrdata, if indirect is here required the compiler should abort with a failure message.
Thank you for your time+feedback.
Please consider the item as closed. It is clear for me what to do now.
If it is of interest I can post the outcome.