#690 [stm8] Generated code could make better use of btjt/btjf instructions

Milestone: None

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2021-05-19

Created: 2020-07-15

Creator: Basil Hussain

Private: No

There are situations where, when testing a bit of an IO/peripheral register, the generated code does not use the STM8's BTJT/BTJF instructions, but instead does LD+BCP+JRxx. This only seems to apply when the register has been #define'd, and not when using absolute-addressed variables with the __at attribute (although there is a separate issue with the latter - see end).

Some example C code to illustrate:

/* Compile with: sdcc -mstm8 --nostdlib -S --fverbose-asm <file>.c */

#define _SFR(mem_addr) (*(volatile unsigned char *)(mem_addr))

#define PA_IDR _SFR(0x5001)
#define PB_IDR _SFR(0x5006)
#define PC_IDR _SFR(0x500B)

volatile unsigned char __at(0x5010) PD_IDR;
static volatile unsigned char foo;

void main(void) {
    volatile unsigned int bar = 0;

    /* Could use btjt/btjf, does not */
    foo = (PA_IDR & (1 << 5) ? 0x86 : 0x10);
    while(PB_IDR & (1 << 3)) bar++;
    if(!(PC_IDR & (1 << 0))) bar++;

    /* Uses btjt/btjf, but branch logic is sub-optimal */
    if(foo & (1 << 7)) bar++;
    if(PD_IDR & (1 << 1)) bar++;
    if(!(PD_IDR & (1 << 2))) bar--;
}

The assembly code generated for these bit-test condition cases generally takes the form:

    ld  a, 0x5001
    bcp a, #0x20
    jreq    00117$

It of course varies depending on the nature of the condition - e.g. jreq/jrne according to polarity of test; srl a and jrnc/jrc when testing bit zero. (By the way, when testing bit 7, ld+jrmi/jrpl is generated, but that case is actually optimal because it is the same number of cycles and bytes as BTJx.)

These can be made more efficient by using btjt or btjf instructions. The above combination uses 3/4 cycles and 6/7 bytes, whereas a single bit-test instruction uses 2/3 cycles and 5 bytes - faster and smaller.
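
For the substitution itself, the bcp mask (e.g. #0x20) has to become a bit number (5) for btjt/btjf, and the replacement only applies when the mask has exactly one bit set. A minimal C sketch of that conversion follows; mask_to_bit is a hypothetical helper name used for illustration only, and is not part of SDCC or the attached rules.

```c
/* Hypothetical helper (not from SDCC): convert a BCP-style mask into the
   bit number that BTJT/BTJF expects. Returns -1 unless exactly one bit
   is set, since only a single-bit mask maps onto one bit-test instruction. */
int mask_to_bit(unsigned char mask)
{
    int bit = 0;

    /* Reject zero and any mask with more than one bit set. */
    if (mask == 0 || (mask & (mask - 1)) != 0)
        return -1;

    /* Count how many times the mask can be shifted right before it is zero. */
    while ((mask >>= 1) != 0)
        bit++;

    return bit;
}
```

For example, mask_to_bit(0x20) gives 5, matching the (1 << 5) test on PA_IDR, while a multi-bit mask such as 0x03 gives -1. Since jreq after bcp branches when the masked bit is zero, it would pair with btjf, and jrne with btjt.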

I have drafted a set of (not yet fully tested) peephole rules that could potentially make these optimisations - see attached. However, it may be better to make this change in the compiler's code generation instead.

A related issue arises when the bit being tested is in a global variable or in a register mapped to a variable with __at. For these, btjt and btjf are indeed used, but the branching logic generated around them is sub-optimal:

    btjf    _PD_IDR+0, #2, 00167$
    jra 00113$
00167$:
    ; ... code inside if() ...
00113$:

This can be optimised by simply inverting the bit-test logic, which allows the jra to be eliminated. I added rules to the attached peephole definitions file for this too.
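
To illustrate why the inversion is safe, here is a rough C model of the two branch shapes, using goto to mirror the jumps. The function names and the reg/bar parameters are made up for this sketch; the labels follow the listing above.

```c
/* C model of the two branch shapes; all names here are illustrative only. */

/* As generated: btjf jumps over an unconditional jra. */
int if_bit2_clear_before(unsigned char reg, int bar)
{
    if (!(reg & (1u << 2))) goto l00167;  /* btjf reg, #2, 00167$ */
    goto l00113;                          /* jra 00113$ */
l00167:
    bar--;                                /* code inside if() */
l00113:
    return bar;
}

/* With the test inverted: btjt skips the body directly, no jra needed. */
int if_bit2_clear_after(unsigned char reg, int bar)
{
    if (reg & (1u << 2)) goto l00113;     /* btjt reg, #2, 00113$ */
    bar--;                                /* code inside if() */
l00113:
    return bar;
}
```

Both functions return the same value for every input, so the second shape saves the jra without changing behaviour.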

All output was from SDCC version 4.0.2 revision 11715, compiled from SVN on Linux.

1 Attachments

peeph.def

Discussion

Philipp Klaus Krause - 2020-07-16

Ticket moved from /p/sdcc/bugs/3084/

Can't be converted:

_category: STM8

Philipp Klaus Krause - 2020-07-16

What is the purpose of the immdInRange(0 65535 '+' 0 %1 %3) conditions?

Basil Hussain - 2020-07-16

I was just following what the existing bset/bres substitution rules do. I'm not 100% sure precisely why - I think it is to ensure the operand is a literal memory address (in the range 0x0-0xFFFF), and not anything else (e.g. immediate value, indirect address, etc). But I'm not entirely convinced it's necessary, because I think operandsLiteral(%1) may suffice on its own; as far as I understand, that function simply tests whether the first character is a numeric digit, which is not the case for immediate values (which start with #) or for indexed or indirect addressing (which start with ( or [). Maybe there are reasons I don't see.
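
To make that first-character check concrete, here is a small C sketch of it. This is only a model of the behaviour as I understand it, with a hypothetical function name; it is not SDCC's actual operandsLiteral() code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the check described above: true when the
   operand text starts with a decimal digit (e.g. "0x5001"). This is not
   SDCC's operandsLiteral(), only a model of the assumed behaviour. */
int starts_with_digit(const char *operand)
{
    return operand != NULL && isdigit((unsigned char)operand[0]) != 0;
}
```

In this sketch, "0x5001" passes, while "#0x20", "(x)" and "[0x10]" do not.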

Philipp Klaus Krause - 2020-07-16

There was a problem in the last two rules, as they remove label %3, even if that label is still used. Better to keep the label, and let rule j30 remove any unused labels later.

Fixed versions of those two rules are now in [r11732].

Basil Hussain - 2020-07-16

Okay, cool, thanks.

Yes, of course, it should have occurred to me that the label should only be removed if it has no remaining references. I sort of misunderstood the purpose of labelRefCountChange(): it's not used when removing a label, but when removing an instruction that references that label.

Philipp Klaus Krause - 2020-07-16

Getting SDCC to generate btjt here in code generation would be substantial work (3 iCodes would have to be combined), for the non-__at example code.
So this is indeed a good candidate for a peephole rule.

Basil Hussain - 2020-07-16

I notice that [r11709] adds support to notUsed() for multiple arguments. I should probably amend my set of rules to take advantage, yes? Will make it a bit more concise.

Basil Hussain - 2020-07-16

Question: will SDCC always emit ld+srl+jrnc/jrc when testing bit zero, and ld+jrmi/jrpl when testing bit 7?

If so, the rules replacing bcp variants for bits 0 and 7 could be omitted, as they won't ever be used.
