Inspecting the generated asm code, it seems that for loops always start with a test of the exit condition, even when its result on the first iteration is known.
In those cases the test can be moved to the bottom of the loop and merged with the operations that increase or decrease the counter.
The resulting code is shorter and more efficient.
As an example, look at the for() at lines 84-86 in intro.c
and at its asm equivalent
Since we know that the condition i>0 is true on the first iteration, the loop can be rearranged with the test at the bottom, saving a jump,
e.g.
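A minimal C-level sketch of that rearrangement, assuming the counter is known to be non-zero on entry. The function names are hypothetical and this is not the intro.c loop; it only illustrates that the two forms compute the same thing when the first test is redundant.

```c
#include <stdint.h>

/* Test at the top: the exit condition is checked before every iteration,
   including the first one, where the result is already known. */
uint16_t sum_top_test(uint8_t n)
{
    uint16_t sum = 0;
    for (uint8_t i = n; i > 0; i--)
        sum += i;
    return sum;
}

/* Test at the bottom: only valid when n > 0 is known beforehand.
   The decrement and the exit test are merged into one step. */
uint16_t sum_bottom_test(uint8_t n)
{
    uint16_t sum = 0;
    uint8_t i = n;
    do {
        sum += i;
    } while (--i != 0);
    return sum;
}
```

For n = 0 the second form would run 256 times, which is why the rearrangement depends on knowing the outcome of the first test.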
Last edit: Ragozini Arturo 2022-03-22
Shouldn't the loop reverse optimization take care of this? Someone should check why it doesn't apply to your example.
No, it shouldn’t.
To my knowledge it only works for loops like
for (j=0;j<16*3;j++), see [bugs:#3162]
None of the loops in intro.c whose counter variable is declared inside the for() get the loop reverse optimization. It must be related to the bug you mention.
I don’t know where/how to fix the loop optimization and generation, but I also didn’t look into it yet.
Here is some demonstration of how it works currently in [r13284]
func is not optimized, because the optimization rule does not support this new loop style.
func2 is not optimized because it has a function call, which is not supported (and documented as such). But at least it did the comparison at the end of the loop.
func3 does the optimization. It’s fine for sm83, but not for z80. Because it decrements a, it can’t use djnz, which might only be applied through peephole rules.
func4 does the optimization and is not fine.
On sm83 you would completely change the whole loop. Don’t know if z80 also has a special instruction for that or if it’s not worth writing it that way due to cycles:
$ sdcc -mz80 -S test.c
$ sdcc -msm83 -S test.c
(disclaimer: might contain off-by-one errors, I’m not fully awake yet)
(edit: fix snippet)
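For reference, a hand-written sketch of what loop reversal amounts to at the C level, assuming the counter is not used inside the body. The function names are hypothetical and this is not SDCC output; it only shows the up-counting form and the down-counting form that a reversed loop corresponds to.

```c
#include <stdint.h>

/* Counting up: every iteration compares the counter against the limit. */
uint8_t xor_up(const uint8_t *p)
{
    uint8_t acc = 0;
    uint8_t j;
    for (j = 0; j < 16 * 3; j++)
        acc ^= *p++;
    return acc;
}

/* Reversed by hand: count down to zero, test at the bottom.
   Legal here because j is not used inside the body. */
uint8_t xor_down(const uint8_t *p)
{
    uint8_t acc = 0;
    uint8_t j = 16 * 3;
    do {
        acc ^= *p++;
    } while (--j != 0);
    return acc;
}
```

The down-counting form is the one that maps onto a decrement-and-branch sequence on the targets discussed here.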
Last edit: Sebastian Riedel 2022-03-23
Suggestion about loops. On z80 the cpi instruction can be used to control 16 bit loops on BC.
This loop:
is equivalent to this other loop
where the dec hl compensates the side effects of CPI and can be omitted if HL is not used or holds useless values.
The trick works only with BC, so it has to be handled in code generation, during the register assignment phase.
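A small C model of the idea, as a sketch only. It assumes the loop tail is cpi / dec hl / jp pe, which is a guess at the shape of the snippet; the struct and function names are hypothetical. It relies on the documented behaviour of CPI: compare A with (HL), increment HL, decrement BC, and set P/V while BC is still non-zero.

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified Z80 register model: only what the loop control needs. */
typedef struct {
    uint8_t  a;
    uint16_t bc;
    uint16_t hl;
    bool     pv;  /* P/V flag: set by CPI while BC != 0 after the decrement */
    bool     z;   /* Z flag: set by CPI when A == (HL) */
} z80_regs;

/* CPI: compare A with (HL), then HL++ and BC--. */
static void cpi(z80_regs *r, const uint8_t *mem)
{
    r->z  = (r->a == mem[r->hl]);
    r->hl = (uint16_t)(r->hl + 1);
    r->bc = (uint16_t)(r->bc - 1);
    r->pv = (r->bc != 0);
}

/* Models: loop: <body> ; cpi ; dec hl ; jp pe, loop.
   Returns how many times the body ran. mem must cover 64K. */
uint32_t run_cpi_loop(z80_regs *r, const uint8_t *mem)
{
    uint32_t iterations = 0;
    do {
        iterations++;                   /* loop body */
        cpi(r, mem);                    /* BC--, HL++ (and a read at HL) */
        r->hl = (uint16_t)(r->hl - 1);  /* dec hl undoes the HL++, flags untouched */
    } while (r->pv);                    /* jp pe, loop */
    return iterations;
}
```

Starting with BC = 300, the body runs 300 times and HL ends where it started; dropping the dec hl line models the case where HL may be trashed.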
Last edit: Maarten Brock 2022-12-15
I don’t see the benefit if hl can’t be trashed. In my table it’s just slower.
instruction  size  cycles
cpi          2B    16C
ld a,b       1B    4C
or a,c       1B    4C
Even when trashing hl it’s 2B 16C vs 3B 14C, so it would need to depend on the code optimization settings (speed/size).
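The totals can be checked with a few lines of C. This is a sketch: the cpi, ld a,b and or a,c figures are the ones from the table, while the 1B 6C figures for dec bc and dec hl are an assumption taken from standard Z80 timings. The conditional jump is left out because both variants need one. Type and function names are hypothetical.

```c
typedef struct {
    unsigned bytes;
    unsigned cycles;
} cost;

static cost add(cost x, cost y)
{
    cost s = { x.bytes + y.bytes, x.cycles + y.cycles };
    return s;
}

/* Figures quoted in the table above. */
static const cost CPI    = { 2, 16 };
static const cost LD_A_B = { 1, 4 };
static const cost OR_A_C = { 1, 4 };
/* Assumed standard Z80 figures for 16-bit decrements. */
static const cost DEC_BC = { 1, 6 };
static const cost DEC_HL = { 1, 6 };

/* dec bc ; ld a,b ; or a,c */
cost classic_test(void)      { return add(DEC_BC, add(LD_A_B, OR_A_C)); }
/* cpi alone, hl may be trashed */
cost cpi_trashing_hl(void)   { return CPI; }
/* cpi ; dec hl, hl must be preserved */
cost cpi_preserving_hl(void) { return add(CPI, DEC_HL); }
```

Under those assumptions the classic test costs 3B 14C, cpi alone 2B 16C, and cpi plus dec hl 3B 22C, which matches "just slower" when hl has to be kept.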
It might be a problem to have memory at random places be accessed, since it compares a with (hl)?
Last edit: Sebastian Riedel 2022-03-22
For the read access to random memory locations: SDCC may generate them only if the option --allow-unsafe-read is used.
You are right, if hl cannot be trashed it is just slower; otherwise it is 1 byte shorter (but 2 cycles slower). As for reading a byte at (hl), it should normally be safe, at least on the MSX and on the ColecoVision.
Due to in and out there is probably no z80 system that uses memory-mapped I/O.
It looks like it could be worth it to swap b and c and use a similar strategy like we use on GameBoy. That allows to use jrnz in the inner loop, and therefore speeds that part up significantly. For constants it’s the same size as cpi if hl can’t be trashed. If b and c have to be swapped first and hl is free, it gets bigger in size.
That jrnz addr strategy could also be used on rabbit, which lacks cpi. TLCS even has jrnz bc, addr o.O
(edit: fix snippet)
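A C sketch of that split-counter strategy, under the assumption that the 16-bit count is broken into an 8-bit inner counter (the role b would play) and an 8-bit outer counter (the role c would play). The function name and the exact setup of the two counters are hypothetical, not taken from SDCC's GameBoy code generation.

```c
#include <stdint.h>
#include <assert.h>

/* Runs a body n times (1 <= n <= 65535) using two 8-bit counters.
   Returns the number of body executions so it can be checked. */
uint32_t run_split_loop(uint16_t n)
{
    assert(n >= 1);

    /* Low byte drives the first inner pass; 0 means 256 passes. */
    uint8_t inner = (uint8_t)(n & 0xFF);
    /* Number of inner passes, rounded up; 0 means 256 passes. */
    uint8_t outer = (uint8_t)(((uint16_t)(n - 1) >> 8) + 1);

    uint32_t iterations = 0;
    do {
        do {
            iterations++;          /* loop body */
        } while (--inner != 0);    /* cheap 8-bit decrement and branch */
    } while (--outer != 0);        /* taken only once per 256 iterations */
    return iterations;
}
```

Only the inner test runs on every iteration, which is where the speed-up comes from; the price is the extra setup of the two counters before the loop.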
Last edit: Sebastian Riedel 2022-03-23
There are.
Consider e.g. the ColecoVision. By itself, it doesn't use memory-mapped I/O. The designers treated the cartridge port as a way to map ROM into 32K of the address space. For that you don't need the IO signals, you don't even need read / write signals.
So all the system designers put there were data lines, address lines, power and an (actually 4, but that doesn't matter here) active signal.
So when game programmers wanted to use more than 32K in a game cartridge, they had to do the mappers via memory-mapped I/O. And since the cartridge port doesn't have a read / write signal, a read from an address used by the mapper will trigger a write, changing the memory mapping / banking registers.
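A toy C model of such a mapper, to make the hazard concrete. The select window address, the bank count and all names are hypothetical; real cartridges differ. The point is only that the cartridge sees an address and a select signal, so a stray read has the same effect as an intentional bank switch.

```c
#include <stdint.h>

#define BANK_SELECT_BASE 0xFFC0u  /* hypothetical start of the select window */
#define BANK_COUNT       4u       /* hypothetical number of banks */

typedef struct {
    uint8_t current_bank;
} cartridge;

/* Any access inside the select window switches banks: the cartridge
   cannot tell a read from a write. */
uint8_t cart_read(cartridge *c, uint16_t addr)
{
    if (addr >= BANK_SELECT_BASE)
        c->current_bank = (uint8_t)((addr - BANK_SELECT_BASE) % BANK_COUNT);
    return 0xFF;  /* the data value is irrelevant for this sketch */
}
```

With this model, a cpi whose hl happens to point into the window changes current_bank as a side effect of the compare, even though the program only wanted to decrement bc.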
I think you are right. Some home brew cartridges for colecovision have a rom mapper whose pages are activated by reading specific rom addresses. Anyway, if you allow the cpi optimisation only if --allow-unsafe-read is set and you need to optimise for size, it could work.
BTW there are cases where the HL cannot be trashed but it is actively pointing to an array sliding one step at each iteration (up or down).
In those cases the cpi trick (and its cpd equivalent) can absorb the inc hl / dec hl step, resulting in shorter and faster code (because you would have needed the inc/dec hl anyway).
(edit: added hint about inc hl/dec hl and cpi/cpd)
Last edit: Ragozini Arturo 2022-03-23
on z80 it is
it decrements B and jumps if it is not zero
you should do
Any news on adding this fix to the next release?
No. Maybe some SDCC developer will work on it, maybe not. Maybe some user will post a patch that gets applied, maybe not.
But if something happens, you will notice by this issue being closed.