When the last instruction before a return is call to another function (tail call) ther is potential for optimization.
Feature request #1893626 did this on Z80 for functions without parameters as in the following example (making it as efficient as if a loop had been used):
extern volatile unsigned char ready;
extern volatile unsigned char data;
unsigned char poll_rec(void)
{
if(ready)
return(data);
return(poll_rec());
}
However for functions with parameters this seems too complicated to do in the peepholes.
When the callee's parameters do not use more stack space than the caller's, the parameters could be passed in the stack space used by the caller's parameters (the caller's parameters are no longer needed, since there are no instruction after the call). This would make it possible to optimize tail calls as in the following example (currently this would run out of stack space):
extern volatile unsigned char *readies;
extern volatile unsigned char *datas;
unsigned char poll_rec_i(unsigned char i)
{
if(readies[i])
return(datas[i]);
return(poll_rec_i(i));
}
Philipp
Logged In: YES
user_id=888171
Originator: NO
I think this is more difficult than you think. What if you need to pass parameters on the stack to the function that are already on the stack but maybe in a different place?
And I also see no need to let the compiler solve a memory leak created by the user with this recursive calling.
Hey Maarten,
I am here because Philipp pointed me from there
https://sourceforge.net/p/sdcc/discussion/1864/thread/ff9323e58b/
I have practically zero knowledge about compilers.
So, for that matter, I will leave it to you, Philipp, and others.
However, I am a software architect, and I wanted to argue as well as persuade you.
That this feature is crucial and has value far beyond you imagine.
It is not about fixing some memory leaks nor enhancing recursive algorithms.
I am developing a powerful technique for embedded systems.
Based on the idea around the composability
https://en.wikipedia.org/wiki/Composability
To make this quick, please look at my demo
https://github.com/midnight-wonderer/composed-1hz/blob/master/src/main.c
The program is just your plain old LED blinking.
The LED blink at 1Hz with timer interrupts.
What matters is the program is a composition of 2 solid and carefully designed modules, independent of each other.
https://github.com/midnight-wonderer/stm8s-tim4-periodic-timer
https://github.com/the-cave/time-prescaler
STM8S TIM4 Periodic Timer is a firmware (which depends on hardware)
while
Time Prescaler is a software (which independents from hardware)
To achieve composability in c language, I exploited the function pointers.
Like jigsaws, functions can be composed perfectly with function pointers.
There are principles involved to design such modules, but to keep this brief, I won't go into detail here.
For a full-blown system (everything is composable), the stack could get really deep.
It will be one hell of functions and their pointers.
And this particular optimization happens to prevent the shortcoming.
But what does this mean to you and the world?
(let's assume every people are using the technique)
https://github.com/the-cave/button-debounce, one of the best IMO.
And you know what?
I can't do any of what I said. I can't really build a community. I can't even alter the functionalities of the compiler I am using.
But, I try to enable things on my parts. Showing people what is possible, inventing a better way to do things.
So, we are a step closer to something better.
And this optimization would really help.
Plug section:
If you don't really interest in my composability technique. Fine.
However, if by chance, you want to read more about it.
I plan to publish it one day on my blog (currently nothing there), once I have more time to work on it.
https://slime.systems/
TL;DR
The things that you claims to be bugs are actually features. And the use case of the compiler is valid.
I agree that a first step could be to do the optimization in cases where there are no parameters, where we would just move the stack frame adjustment in front of the call.
But even there we need to be careful, and analyze the code before we do so:
a could store c in a global pointer variable that f or g could access, so in this case we could not do such an optimization:
It matters if the caller (test3 here) has any variables that had their address taken.
Today, I made some improvements to tail call optimization in the stm8, z80, z180, gbz80, r2k, r3ka, tlcs90, ez80-z80, pdk13, pdk14, pdk15 backends. In particular, parts of tail call optimization that used to be done in the peephole optimizer are now done in code generation instead.
As of [r11425], for stm8, tail calls should always be optimized when the following conditions hold:
Sarun, is this sufficient for you?
For some simple cases, tail calls will be optimized even if some of these conditions don't hold.
Last edit: Philipp Klaus Krause 2019-10-13
Whoa,
The response is quite fast.
But, I just realized something, the problem is quite tough indeed.
The main blocker for me is: "The callee has no arguments".
I just realize the arguments in the stack is cleared by the caller.
If the caller is the one who clear the stack, that would make the call not a tail call anymore.
So "a function that take parameter can't be optimized by tail call eliminations".
I wonder why the calee is not the one who clean up after their usages.
However, I suspected that the reason, whatever it is, could sound quite valid as well.
For me the problem is still there, but with your optimization the call stack usage efficientcy would double. (half of tail calls have been eliminated) thanks.
This is not a show stopper for me.
I have plan B.
https://github.com/the-cave/deferred-execution
which under the same condition as the optimization (the callee takes no argument) can flatten the stack frame, albeit some downside.
I don't know much about compiler but may I ask you why it is design this way? (the caller clear the stack as opposed to the calee do it.)
@spth the last change introduces a bug in ISR.
In case of ISR, the optimization will leads to
ret
instead ofiret
.Thanks for reporting. it is fixed now [r11446].
Last edit: Philipp Klaus Krause 2019-11-05