When testing a single bit of an I/O or peripheral register, SDCC sometimes does not use the STM8's BTJT/BTJF instructions and instead generates LD+BCP+JRxx. This seems to apply only when the register is accessed through a #define'd pointer dereference, and not when using absolute-addressed variables declared with the __at attribute (although there is a separate issue with the latter; see the end).
Some example C code to illustrate:
    /* Compile with: sdcc -mstm8 --nostdlib -S --fverbose-asm <file>.c */

    #define _SFR(mem_addr) (*(volatile unsigned char *)(mem_addr))
    #define PA_IDR _SFR(0x5001)
    #define PB_IDR _SFR(0x5006)
    #define PC_IDR _SFR(0x500B)

    volatile unsigned char __at(0x5010) PD_IDR;
    static volatile unsigned char foo;

    void main(void) {
        volatile unsigned int bar = 0;

        /* Could use btjt/btjf, does not */
        foo = (PA_IDR & (1 << 5) ? 0x86 : 0x10);
        while(PB_IDR & (1 << 3)) bar++;
        if(!(PC_IDR & (1 << 0))) bar++;

        /* Uses btjt/btjf, but branch logic is sub-optimal */
        if(foo & (1 << 7)) bar++;
        if(PD_IDR & (1 << 1)) bar++;
        if(!(PD_IDR & (1 << 2))) bar--;
    }
The assembly code generated for these bit-test condition cases generally takes the form:
    ld a, 0x5001
    bcp a, #0x20
    jreq 00117$
It of course varies depending on the nature of the condition, e.g. jreq/jrne according to the polarity of the test; srl a and jrnc/jrc when testing bit zero. (By the way, when testing bit 7, ld+jrmi/jrpl is generated, but that case is actually optimal because it is the same number of cycles and bytes as BTJx.)
These can be made more efficient by using btjt or btjf instructions. The above combination uses 3/4 cycles and 6/7 bytes, whereas a single bit-test instruction uses 2/3 cycles and 5 bytes - faster and smaller.
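Each such rewrite has to turn the single-bit mask used by bcp (e.g. #0x20) into the bit number that btjt/btjf expect (e.g. #5). A minimal C sketch of that mapping follows; single_bit_index is a hypothetical helper name used only for illustration, not something taken from SDCC or the attached rules.

```c
/* Hypothetical helper, for illustration only (not part of SDCC).
   Returns the bit number (0-7) of an 8-bit mask with exactly one bit set,
   or -1 if the mask is zero, wider than 8 bits, or has several bits set. */
static int single_bit_index(unsigned int mask)
{
    int bit = 0;

    if (mask == 0 || mask > 0xFFu || (mask & (mask - 1u)) != 0)
        return -1;

    /* Shift right until only the lowest bit remains. */
    while ((mask >> bit) != 1u)
        bit++;

    return bit;
}
```

For example, single_bit_index(0x20) gives 5, matching the bcp a, #0x20 case above, while a multi-bit mask such as 0x06 gives -1 and would have to keep the bcp sequence.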
I have drafted a set of (not yet fully tested) peephole rules to potentially make these optimisations - see attached. However, it may be better to make this change in the compiler's code generation instead.
A related issue arises when the bit being tested belongs to a global variable, or to a register mapped to a variable with __at. For these, btjt and btjf are indeed used, but the branching logic around them seems suboptimal:
        btjf _PD_IDR+0, #2, 00167$
        jra 00113$
    00167$:
        ; ... code inside if() ...
    00113$:
This can be optimised by simply inverting the bit-test logic, which eliminates the jra. I added rules for this to the attached peephole definitions file too.
All output was from SDCC version 4.0.2 revision 11715, compiled from SVN on Linux.
Ticket moved from /p/sdcc/bugs/3084/
What is the purpose of the immdInRange(0 65535 '+' 0 %1 %3) conditions?
Just doing what the existing bset/bres substitution rules do. I'm not 100% sure precisely why - I think it is to ensure the operand is a literal memory address (in the range 0x0-0xFFFF), and not anything else (e.g. immediate value, indirect address, etc). But I'm not entirely convinced it's necessary, because I think operandsLiteral(%1) may suffice; as far as I understand, that function simply tests whether the first character is a numeric digit, which is not the case for immediate values (starting with #) or for indexed or indirect addressing (starting with ( or [). Maybe there are reasons I don't see.

There was a problem in the last two rules, as they remove label %3 even if that label is still used. Better to keep the label, and let rule j30 remove any unused labels later.

Fixed versions of those two rules are now in [r11732].
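To make the two operand conditions discussed above concrete, here is a minimal C sketch of the checks as they are described in this thread: a first-character digit test, followed by a 0x0-0xFFFF range test. The function names are hypothetical, and this is an assumption about behaviour, not SDCC's actual implementation of operandsLiteral() or immdInRange().

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical sketch of the first-character test described above:
   an operand counts as "literal" if it starts with a decimal digit. */
static int starts_like_literal(const char *operand)
{
    return operand != NULL && isdigit((unsigned char)operand[0]) != 0;
}

/* Hypothetical sketch of the range test: the whole operand must parse
   as a number (decimal, octal or 0x-prefixed hex) no larger than 0xFFFF. */
static int is_address_in_range(const char *operand)
{
    char *end;
    unsigned long value;

    if (!starts_like_literal(operand))
        return 0;

    value = strtoul(operand, &end, 0);
    return *end == '\0' && value <= 0xFFFFul;
}
```

With these definitions, "0x5001" passes both tests, while "#0x20", "(0x01, sp)" and "[0x10.w]" fail the first-character test. Under this reading a symbolic operand such as "_PD_IDR+0" would fail it as well.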
Okay, cool, thanks.
Yes, of course, it should have occurred to me that the label should only be removed if its reference count is strictly zero. I misunderstood the purpose of labelRefCountChange(): it is not used when removing a label, but when removing an instruction that references that label.

Getting SDCC to generate btjt here in code generation would be substantial work (3 iCodes would have to be combined), for the non-__at example code.
So this is indeed a good candidate for a peephole rule.
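The reference-counting point made earlier in this thread (adjust the count when an instruction that references a label is removed, and drop the label only once its count reaches zero) can be sketched in C roughly as follows. The struct and function names are hypothetical bookkeeping for illustration and do not reflect SDCC's internal data structures.

```c
#include <string.h>

/* Hypothetical bookkeeping for illustration; not SDCC's data structures. */
struct label_ref {
    const char *name;  /* label text, e.g. "00167$" */
    int refs;          /* number of instructions that jump to it */
};

/* Adjust the count when an instruction referencing `name` is added (+1)
   or removed (-1). Returns the new count, or -1 if the label is unknown. */
static int change_ref_count(struct label_ref *table, int count,
                            const char *name, int delta)
{
    int i;

    for (i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            table[i].refs += delta;
            return table[i].refs;
        }
    }
    return -1;
}

/* A label may only be dropped once nothing references it any more. */
static int label_is_removable(const struct label_ref *table, int count,
                              const char *name)
{
    int i;

    for (i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return table[i].refs == 0;
    }
    return 0;
}
```

In the btjf/jra example from the ticket, removing the jra 00113$ instruction would correspond to change_ref_count(table, n, "00113$", -1); the label itself stays in place unless label_is_removable() then reports that no other jump targets it.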
I notice that [r11709] adds support to notUsed() for multiple arguments. I should probably amend my set of rules to take advantage, yes? It will make them a bit more concise.

Question: will SDCC always emit ld+srl+jrnc/jrc when testing bit zero, and ld+jrmi/jrpl when testing bit 7? If so, the rules replacing bcp variants for bits 0 and 7 could be omitted, as they won't ever be used.