SDCC is not that good at allocating variables on the stack. It makes rather wasteful use of stack space. E.g. "SDCC generates very bad stack-overflow-prone code for long functions […] effectively eliminates possibilities for usage of SDCC with any RTOS on STM8." from https://github.com/shkolnick-kun/bugurtos/issues/13 (though stm8 is worse than the other backends).
In the long term, it might be possible to come up with an advanced scheme based on Thorup's work on register allocation.
For now, a big step forward could be made by using the state of the art, which, AFAIK, is based on Chaitin's work on register allocation. A few years ago I did some experiments showing that an unaligned variant of this could save even more stack space than the aligned variant. However, the unaligned variant would require changes in code generation.
As a first step, I want to change to aligned Chaitin for the z80-based backends; I have been working on this most of yesterday and today, and hope to be able to commit my work later today.
Next steps:
Philipp
P.S.: I had already worked on this in 2011 / 2012 in the stack-compact branch. Now I basically just fix parts of that and port them to current SDCC.
In [r10654], aligned Chaitin is the new default allocator for z80-related backends (the old one can be chosen by changing the line
in ralloc.h).
Overall, it tends to result in lower stack usage (a typical example would be print_format() from the printf() implementation now using 52 instead of 67 bytes for variables on the stack), though there are a few stack size regressions, too (mostly in functions with few, but large variables). The regressions seem to be due to stricter alignment.
Philipp
In [r10659], aligned Chaitin is used as the allocator for stm8. Since there were serious issues in the old one, no option to go back to the old one is provided.
Again, there are some stack size regressions for some functions with few, large variables. For most functions there is a reduction in stack usage (e.g. print_format() went from 109 to 52 bytes).
For some functions that were affected by bugs in the old allocator, there is a huge reduction (e.g. a function in the regression test gcc-torture-execute-ashldi-1 went from 504 bytes down to 8).
Philipp
In revision [r10666], with a bit of fine-tuning of the heuristics in the aligned Chaitin allocator, the few regressions got fixed; overall, stack usage went down a bit further.
SDCC now has a state-of-the-art stack allocator (for the stm8, z80, z180, gbz80, tlcs90, r2k and r3ka backends).
There probably is further potential in:
Philipp
¹ This will involve a trade-off: Code size and speed vs. stack size.
² Needs some modifications in the backends.
Last edit: Philipp Klaus Krause 2018-11-20
In revision [r10675], the aligned Chaitin allocator has been replaced by unaligned Chaitin¹. The result is a further, but small reduction in stack usage.
Philipp
¹ So far, relative alignment restrictions are enforced on operands that are used in the same instructions. Pending changes in the backends, this could be relaxed a bit further, to gain a little bit more stack space.
Sometimes the compiler creates too many auxiliary locals for the same local struct variable (Version 3.8.4 #10807 (MINGW32), compiler options: -mstm8 --std-c11 --opt-code-speed --max-allocs-per-node 100000).
E.g., we have the following types:
Assigning every structure field to a local variable causes a separate local pointer to the nested struct to be created for every operation:
produces the following code:
Full test program text:
Full compiler output for test2 function:
It's interesting that removing either the while loop or the "if(i) break;" line makes the compiler produce better output (without these weird locals, taking 13 bytes of stack memory instead of 25).