Hello,
I think I found an error in the optimization.
In the example code, the first 5 statements are optimized correctly (bset, bres), but the 6th statement is not (lines 120 - 131).
Why isn't the last statement optimized?
112 ; -----------------------------------------
113 ; function i2c_init
114 ; -----------------------------------------
00000A 115 _i2c_init:
116 ; tbasic.c: 51: *(uint8_t *)(SCL_BASE + P_DDR) |= SCL_MASK; // Set DDR to output
00000A 72 16 50 11 [ 1] 117 bset 0x5011, #3
118 ; tbasic.c: 52: *(uint8_t *)(SCL_BASE + P_CR1) &= ~SCL_MASK; // Set CR1 to open-drain
00000E 72 17 50 12 [ 1] 119 bres 0x5012, #3
120 ; tbasic.c: 53: *(uint8_t *)(SCL_BASE + P_CR2) &= ~SCL_MASK; // Set CR2 to 2MHz
000012 72 17 50 13 [ 1] 121 bres 0x5013, #3
122 ; tbasic.c: 54: i2c_set_scl();
000016 CDr00r00 [ 4] 123 call _i2c_set_scl
124 ; tbasic.c: 56: *(uint8_t *)(SDA_BASE + P_DDR) |= SDA_MASK; // Set DDR to output
000019 72 18 50 11 [ 1] 125 bset 0x5011, #4
126 ; tbasic.c: 57: *(uint8_t *)(SDA_BASE + P_CR1) &= ~SDA_MASK; // Set CR1 to open-drain
00001D 72 19 50 12 [ 1] 127 bres 0x5012, #4
128 ; tbasic.c: 58: *(uint8_t *)(SDA_BASE + P_CR2) &= ~SDA_MASK; // Set CR2 to 2MHz
000021 C6 50 13 [ 1] 129 ld a, 0x5013
000024 A4 EF [ 1] 130 and a, #0xef
000026 C7 50 13 [ 1] 131 ld 0x5013, a
132 ; tbasic.c: 59: i2c_set_sda();
133 ; tbasic.c: 60: }
000029 CCr00r05 [ 2] 134 jp _i2c_set_sda
SDCC 4.3.0
STM8S103F3
Please see atached files.
Sorry, I gave the wrong line numbers.
The lines 129-130 are not optimized.
This happens because ~ upcasts the value to an int.
The same happens if i cast the value to uint8_t:
That does not help because your uint_8 casted value gets upcasted to an int for the ~ operator. You need to cast the inverted value down for this to work.
Even this does not work, see the code.
The lines above are the same, and its optimized.
What is different with this one line?
Last edit: Andreas7 2024-01-29
Hello,
the optimization is done as soon the call to _i2c_set_sda is removed.
But i can't see any reason for this behaviour.
Hello,
i investigated this further.
The same behaviour happens on other places too.
Please see the two indentical "t >>= 16" instructions in the following code.
The first instruction produces this code:
The second instruction produces this code:
Not having looked into this in detail: I suspect that codegen always generates the longer code, then the peephole optimizer replaces it by bset/bres. Then an inaccurate notUsed peephole optimizer helper function would result in the last one not being replaced, as SDCC couldn't prove there that the value in a is no longer needed.
I'd like to see .asm or .lst output from a current SDCC version when using --fverbose-asm.
Last edit: Philipp Klaus Krause 2024-01-31
Hello,
this is the "bset bres" issue compiled by SDCC 4.4.0 with --fverbose-asm:
This is the "shift by 16" issue compiled by SDCC 4.4.0 with --fverbose-asm:
The .asm indicates this is indeed the peephole optimizer that makes the difference here.
Can you provide a self-contained, compileable C code sample to reproduce the issue?
The peephole optimizer couldn't handle the tail call well, triggering a fail-safe fallback.
In [r14920], this was improved, and we not get the
bres
.Related
Commit: [r14920]
Hello,
here is the self contained code file, the list file and the make script.
The not optimized lines are:
000021 C6 50 13 [ 1] 128 ld a, 0x5013
000024 A4 EF [ 1] 129 and a, #0xef
000026 C7 50 13 [ 1] 130 ld 0x5013, a
Thanks. I can reproduce the issue in SDCC built from current trunk on my Debian GNU/Linux testing system.
Hello,
here is a small selfcontaining example to reproduce the "right shift by 16" no optimization problem.
The not optimized lines:
198 ; tbasic.c: 53: t >>= 16;
00006C CDr00r00 [ 4] 199 call __mullong
00006F 5B 08 [ 2] 200 addw sp, #8
000071 1F 0D [ 2] 201 ldw (0x0d, sp), x
000073 17 0B [ 2] 202 ldw (0x0b, sp), y
000075 17 01 [ 2] 203 ldw (0x01, sp), y
000077 1E 0D [ 2] 204 ldw x, (0x0d, sp)
000079 07 01 [ 1] 205 sra (0x01, sp)
00007B 06 02 [ 1] 206 rrc (0x02, sp)
00007D 56 [ 2] 207 rrcw x
00007E 07 01 [ 1] 208 sra (0x01, sp)
000080 06 02 [ 1] 209 rrc (0x02, sp)
000082 56 [ 2] 210 rrcw x
000083 07 01 [ 1] 211 sra (0x01, sp)
000085 06 02 [ 1] 212 rrc (0x02, sp)
000087 56 [ 2] 213 rrcw x
000088 07 01 [ 1] 214 sra (0x01, sp)
00008A 06 02 [ 1] 215 rrc (0x02, sp)
00008C 56 [ 2] 216 rrcw x
00008D 07 01 [ 1] 217 sra (0x01, sp)
00008F 06 02 [ 1] 218 rrc (0x02, sp)
000091 56 [ 2] 219 rrcw x
000092 07 01 [ 1] 220 sra (0x01, sp)
000094 06 02 [ 1] 221 rrc (0x02, sp)
000096 56 [ 2] 222 rrcw x
000097 07 01 [ 1] 223 sra (0x01, sp)
000099 06 02 [ 1] 224 rrc (0x02, sp)
00009B 56 [ 2] 225 rrcw x
00009C 07 01 [ 1] 226 sra (0x01, sp)
00009E 06 02 [ 1] 227 rrc (0x02, sp)
0000A0 56 [ 2] 228 rrcw x
0000A1 07 01 [ 1] 229 sra (0x01, sp)
0000A3 06 02 [ 1] 230 rrc (0x02, sp)
0000A5 56 [ 2] 231 rrcw x
0000A6 07 01 [ 1] 232 sra (0x01, sp)
0000A8 06 02 [ 1] 233 rrc (0x02, sp)
0000AA 56 [ 2] 234 rrcw x
0000AB 07 01 [ 1] 235 sra (0x01, sp)
0000AD 06 02 [ 1] 236 rrc (0x02, sp)
0000AF 56 [ 2] 237 rrcw x
0000B0 07 01 [ 1] 238 sra (0x01, sp)
0000B2 06 02 [ 1] 239 rrc (0x02, sp)
0000B4 56 [ 2] 240 rrcw x
0000B5 07 01 [ 1] 241 sra (0x01, sp)
0000B7 06 02 [ 1] 242 rrc (0x02, sp)
0000B9 56 [ 2] 243 rrcw x
0000BA 07 01 [ 1] 244 sra (0x01, sp)
0000BC 06 02 [ 1] 245 rrc (0x02, sp)
0000BE 56 [ 2] 246 rrcw x
0000BF 07 01 [ 1] 247 sra (0x01, sp)
0000C1 06 02 [ 1] 248 rrc (0x02, sp)
0000C3 56 [ 2] 249 rrcw x
0000C4 07 01 [ 1] 250 sra (0x01, sp)
0000C6 06 02 [ 1] 251 rrc (0x02, sp)
0000C8 56 [ 2] 252 rrcw x
Thanks. I can reproduce the shift issue.
in [r14921], better code is generated for the shift.
Related
Commit: [r14921]
Ticket moved from /p/sdcc/bugs/3705/
Can't be converted:
Moved to feature request tracker, since the generated code is correct, just not as efficient as we would want it to be.