I'm currently looking into calling conventions. The basic idea is this:
See what calling conventions are the most efficient (can depend on function type). Choose an efficient one.
The basis is the noasm2 branch, which replaces all asm functions in the standard library by C code, so no asm code needs to be rewritten when experimenting with different calling conventions.
Also, the code generators need to be fixed so they generate correct code for different calling conventions. Basically: Try a new calling convention in a local copy of sdcc based on the noasm2 branch, if it breaks, fix the bugs in trunk.
So far I want to consider the stm8, hc08-related and z80-related ports. Aspects of the calling convention looked into will include registers used for return value and the use of register parameters.
This is long-term work; I hope to have results sometime in between the SDCC 4.1.0 and 4.2.0 releases.
You may use IY. Otherwise, extra code is required to make such a call:
Some of our users are on systems where ix and/or iy are reserved for use by the BIOS or OS (we introduced --reserve-regs-iy for them).
After spending some time porting to the new calling convention, I think it is better to use a unified calling convention with the IX register for all ports. IMHO, the current rules are very, very difficult.
Bug [#3260] needs to be fixed before register parameters can be enabled in trunk for stm8 and z80.
Related
Bugs:
#3260
I've created the breaktheworld branch for implementing the new calling convention.
I suggest avoiding names like "oldcall"; they are not descriptive. __sdcc_cc_stack is much better for me.
Moreover, I suggest keeping the old calling convention as the default and using the new one only if it is explicitly specified or a special compiler switch is present, at least until the next major release.
Last edit: Sergey Belyashov 2021-07-24
I went for __sdcc_oldcall, since in general we change more than just the passing of parameters. Having a more efficient calling convention has been a common feature request, and once we have one, we IMO should make it the default. Users can test the breaktheworld branch to get used to the new one.
I think using the branch works better than a compiler switch for this testing, since the branch will also provide a standard library using the new convention.
Experiments confirm that the convention chosen for r3ka is also well-suited for ez80_z80 and tlcs90.
As of now, regression tests pass for stm8, gbz80, r2k, r2ka, r3ka, tlcs90 in the breaktheworld branch (z80, z180, z80n, ez80_z80 fail due to not-yet ported __itoa family).
In trunk, for those and z80, z180, z80n, ez80_z80, the new convention can be used via __sdcccall(1) and --sdcccall 1.
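For anyone who wants to try this on individual functions, here is a minimal sketch. The function name add_scaled and the NEWCALL macro are made up for illustration; only __sdcccall(1) comes from the post above. The guard assumes SDCC's predefined __SDCC macro, so that the same file still builds with a host compiler for testing.

```c
#include <stdint.h>

/* __sdcccall(1) is an SDCC keyword; other compilers do not know it.
   NEWCALL is a hypothetical helper macro that expands to nothing
   outside SDCC (assumes SDCC predefines __SDCC). */
#if defined(__SDCC)
#define NEWCALL __sdcccall(1)
#else
#define NEWCALL
#endif

/* Hypothetical function that opts in to the new calling convention.
   The declaration and the definition must agree on the convention. */
uint16_t add_scaled(uint8_t a, uint16_t b) NEWCALL;

uint16_t add_scaled(uint8_t a, uint16_t b) NEWCALL
{
    /* Plain C body; only the way a and b are passed changes. */
    return (uint16_t)(a * 2u + b);
}
```

Compiling the whole file with --sdcccall 1 instead would select the new convention for every function that has no explicit annotation, so the standard library in use has to match.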
Last edit: Philipp Klaus Krause 2021-08-22
If the last few experiments don't yield any surprises, I intend to make the new convention the default for stm8 sometime next week. The other ports will need a bit more time.
I hope that this gives us a good calling convention for the next 8 to 10 years. After that, I expect that we'll have to revisit this again.
In [r12673], the new calling convention became the default for stm8 and gbz80.
It looks to me like the last open questions for the future z80 (and z80n, z180) calling convention have been resolved. Once bug #3181 is fixed, I think we can make it the default.
For r2k, r2ka, r3ka, tlcs90, ez80_z80 it is still open if they should use the same convention as z80 (for simplicity) or a slightly different one (for slightly better code size and speed).
What I don’t really understand is why even 8-bit parameters are limited to only two register parameters.
Some experiments done early on showed that having too many register parameters doesn't work very well for SDCC. While implementing register parameter support involved lifting many restrictions in code generation, other restrictions, including some that originate in earlier phases, haven't been lifted.
SDCC sends and receives parameters in a fixed order. Each parameter send (on the caller side) and receive (on the callee side) is one iCode.
So the more register parameters there are, the more restricted code generation is for receiving or sending some of them (since registers are already in use by other parameters). In principle, it should be possible to do something about this problem (maybe similarly to how codegen for memcpy via ldir for z80 is handled, where all parameters are considered at once), but at the time I wanted to be able to get register parameter support working in a reasonable timeframe (to make sure it is there a few months before 4.2.0, so it gets enough testing).
To illustrate the problem: Consider void f(int, int) and void g(long), where hl and de are used for parameters. Now assume that in f/g and in the caller of f/g, the best way to use hl and de is swapped vs. what the ABI mandates. For an assembler programmer, this would be no problem: an 'ex de, hl' would be inserted as the last instruction before the call, and as the first instruction in each of f and g.
But current SDCC can't do that for the call to f, and would have to use less-than-optimal registers, since the two parameters are considered one after the other. On the other hand, for g, SDCC would generate the 'ex de, hl'.
Due to the way current SDCC handles parameters, it matters not only how many bytes of registers are used, but also how many register parameters there are.
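The f vs. g difference can be sketched with a toy model in C. This is not SDCC code: struct regs and the three helper functions are made up for illustration, and the model only shows why a swap cannot be expressed as two independent per-parameter moves.

```c
#include <stdint.h>

/* Toy model of the two 16-bit register pairs (hypothetical, not SDCC code). */
struct regs { uint16_t hl, de; };

/* Models the single z80 instruction 'ex de, hl'. */
static struct regs ex_de_hl(struct regs r)
{
    uint16_t t = r.hl;
    r.hl = r.de;
    r.de = t;
    return r;
}

/* g(long): one parameter, one send. The whole hl:de pair is handled
   at once, so a swapped layout is fixed with one 'ex de, hl'. */
static struct regs send_as_one_parameter(struct regs r)
{
    return ex_de_hl(r);
}

/* f(int, int): two sends, each handled in isolation, in fixed order.
   The caller holds the first argument in de and the second in hl. */
static struct regs send_one_after_the_other(struct regs r)
{
    r.hl = r.de; /* send 1st argument: copy de into hl, overwriting the 2nd */
    r.de = r.hl; /* send 2nd argument: its old value in hl is already gone */
    return r;
}
```

Starting from hl = 0x2222 and de = 0x1111, the first helper yields the swapped pair, while the second leaves 0x1111 in both registers. In this model, that is why the values for f have to be placed in other (less optimal) registers up front, whereas g can keep the swapped layout and fix it up at the call.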
I guess this is something that we should look into before breaking the ABI next time. But that ABI break is still quite a while away: IMO we need experience with wide function pointers, lambdas, struct / union parameters, etc. before then. I guess 2030 could be a good time to look into this again.
Can you illustrate this with a code example?
So SDCC struggles to juggle the values currently held in registers when those registers are needed both for arguments and for accessing the stack, because it is not capable of putting them in an optimal order?
Just want to say THANKS A LOT for this functionality, now available for z80 in 4.2. I've just converted a large C+asm project to 4.2. Not only is the code faster, but all my asm functions are much cleaner, as most no longer involve the stack. Those few functions that do involve the stack (they have many parameters) consistently (before and after 4.2) use __z88dk_callee, which has proved to give me quite "clean" (or compact) code anyway.
Code size has been reduced by 1.5-3% too. Great stuff!
Last edit: Pål Frogner Hansen 2022-04-06
In [r14373], the new calling convention has been enabled for the remaining z80-related ports.
Related
Commit: [r14373]