For the expression (x & ~y), the mos6502 code generator produces a faulty instruction sequence unless the result of the expression is first stored in an intermediate variable.
An example using GBDK can be found here: https://github.com/michel-iwaniec/gbdk-2020/commit/22d37a1ae44fcdf5955329a5a4cc6b8861d1e00c
The example creates two global 8-bit variables:
uint8_t x = 0xFF;
uint8_t y = 0xAA;
It also defines a macro and a function that both evaluate the (x & ~y) expression:
#define XANDNOTY_MACRO() (x & ~y)
uint8_t xandnoty_function()
{
return (x & ~y);
}
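The main function then calls a small helper function, print_hex, in 3 different ways:
case 1: print_hex(xandnoty_function());
case 2: r = XANDNOTY_MACRO(); print_hex(r);
case 3: print_hex(XANDNOTY_MACRO());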
Cases 1 and 2 work fine and give the expected value of 0x55, while case 3 does not.
Looking at the generated code, cases 1 and 2 clearly produce the short, expected sequence of instructions:
case 1:
00C4E8 115 _xandnoty_function:
000000 116 C$x_and_not_y.c$25$1_0$103 ==.
117 ; src/x_and_not_y.c: 25: return (x & ~y);
00C4E8 AD 01 03 [ 4] 118 lda _y
00C4EB 49 FF [ 2] 119 eor #0xff
00C4ED 2D 00 03 [ 4] 120 and _x
000008 121 C$x_and_not_y.c$26$1_0$103 ==.
122 ; src/x_and_not_y.c: 26: }
000008 123 C$x_and_not_y.c$26$1_0$103 ==.
000008 124 XG$xandnoty_function$0$0 ==.
00C4F0 60 [ 6] 125 rts
[...]
241 ; src/x_and_not_y.c: 51: print_hex(xandnoty_function());
00C559 20 E8 C4 [ 6] 242 jsr _xandnoty_function
00C55C A2 00 [ 2] 243 ldx #0x00
00C55E 20 F1 C4 [ 6] 244 jsr _print_hex
case 2:
266 ; src/x_and_not_y.c: 56: r = XANDNOTY_MACRO();
00C57C AD 01 03 [ 4] 267 lda _y
00C57F 49 FF [ 2] 268 eor #0xff
00C581 2D 00 03 [ 4] 269 and _x
00009C 270 C$x_and_not_y.c$57$1_0$107 ==.
271 ; src/x_and_not_y.c: 57: print_hex(r);
00C584 A2 00 [ 2] 272 ldx #0x00
00C586 20 F1 C4 [ 6] 273 jsr _print_hex
Case 3, however, generates a much longer sequence of instructions, in which the value loaded by "lda _y" is immediately overwritten by the "ldx #0x00 / txa" pair:
295 ; src/x_and_not_y.c: 62: print_hex(XANDNOTY_MACRO());
00C5A4 AD 01 03 [ 4] 296 lda _y
00C5A7 A2 00 [ 2] 297 ldx #0x00
00C5A9 8A [ 2] 298 txa
00C5AA 49 FF [ 2] 299 eor #0xff
00C5AC 85 2F [ 3] 300 sta (_main_sloc0_1_0 + 1)
00C5AE 49 FF [ 2] 301 eor #0xff
00C5B0 85 2E [ 3] 302 sta _main_sloc0_1_0
00C5B2 AD 00 03 [ 4] 303 lda _x
00C5B5 25 2E [ 3] 304 and _main_sloc0_1_0
00C5B7 85 34 [ 3] 305 sta (REGTEMP+0)
00C5B9 8A [ 2] 306 txa
00C5BA 25 2F [ 3] 307 and (_main_sloc0_1_0 + 1)
00C5BC AA [ 2] 308 tax
00C5BD A5 34 [ 3] 309 lda (REGTEMP+0)
00C5BF 20 F1 C4 [ 6] 310 jsr _print_hex
0000DA 311 C$x_and_not_y.c$63$1_0$107 ==.
312 ; src/x_and_not_y.c: 63: vsync();
00C5C2 20 58 C3 [ 6] 313 jsr _vsync
0000DD 314 C$x_and_not_y.c$64$1_0$107 ==.
315 ; src/x_and_not_y.c: 64: }
0000DD 316 C$x_and_not_y.c$64$1_0$107 ==.
0000DD 317 XG$main$0$0 ==.
00C5C5 60 [ 6] 318 rts
Other targets like Z80 and SM83 do not seem to suffer from this kind of issue.
The case 3 description above (the actual failing case) was slightly incorrect/badly formatted, but I can't edit the ticket.
It should say:
Whereas case 3 generates a very long sequence of instructions, where the value of the "lda _y" gets immediately overwritten by a "ldx #0x00 / txa" pair:
It seems this was actually fixed in [r14650], so this ticket can be closed.
Case 3 still generates a 16-bit sequence, which looks redundant to me given that the two input variables and the function parameter are all 8-bit.
But that's not a functional bug, just a possible optimization, assuming the compiler can determine that the high byte of the result is not needed.
Related
Commit: [r14650]
Last edit: Maarten Brock 2024-06-06