In this function...
void
return_values_broken(unsigned char *a, unsigned char *b)
{
regs_t r;
r.regs_de = 0x1234;
do_nothing(&r);
*a = r.reg_d;
*b = r.reg_e;
}
... everything is OK up to and including the do_nothing() call. After that, the peephole mis-optimises the offset of the reg_d offset, and grabs a random byte from the stack. The optimised code looks like this:
115 ; peephole 43b removed loads by exploiting commutativity of addition.
0035 21 08 00 116 ld hl,#0x0000 + 0x0008
0038 39 117 add hl,sp
118 ; peephole 41d used 16-bit addition.
0039 4D 119 ld c, l
003A 44 120 ld b, h
121 ; peephole 0a removed dead load from h into a.
122 ; genPlus
123 ; genPlusIncr
003B 11 09 00 124 ld de, #0x0009
003E 19 125 add hl, de
reg_e is at offset 0x8, and reg_d is at offset 0x9. Lines 116/117 correctly calculate the offset of reg_e, but the offset of reg_d is calculated based on ®_e rather than the start of the struct again. The assembler continues:
126 ; peephole 41c used 16-bit addition.
127 ; peephole 0a removed dead load from h into a.
128 ; genPointerGet
129 ; peephole 0zg loaded a from (hl) directly instead of going through (de).
003F 7E 130 ld a,(hl)
131 ; genAssign (pointer)
132 ; isBitvar = 0
0040 FD 77 00 133 ld (iy),a
134 ;peepbug.c:44: *b = r.reg_e;
135 ; genAssign
136 ; AOP_STK for
0043 DD 5E 06 137 ld e,6 (ix)
0046 DD 56 07 138 ld d,7 (ix)
0049 D5 139 push de
004A FD E1 140 pop iy
141 ; genPointerGet
004C 0A 142 ld a,(bc)
143 ; genAssign (pointer)
144 ; isBitvar = 0
004D FD 77 00 145 ld (iy),a
At line 130, we use hl, which is %sp + 8 + 9 (incorrect).
At line 142, we revert to using the original value of ®_e stored in bc, and this is correct.
The attached file is the full code (standalone).
1) Removing the function call appears to produce correct code.
2) Removing the assignment to regs_de before do_nothing() still produces incorrect code.
3) Removing both function call and assignment produces incorrect code.
It would appear that the presence of the function call triggers different behaviour in the optimiser.
Compiling with --no-peep produces code which works correctly.
Compile line:
$ /opt/local/bin/sdcc -mz80 --verbose-asm peepbug.c
$ sdcc -v
SDCC : z80 2.9.7 #5973 (Jan 6 2011) (Solaris i386)
Same result using: SDCC : z80 3.0.0 #6037 (Feb 17 2011) (Solaris i386)
Observed behaviour:
Offsets %sp+0x11 and %sp+0x8 are read.
Expected behaviour:
Offsets %sp+0x9 and %sp+0x8 are read.
It's possible this could be the same root cause as bug 3179271.
Example code / test case
Formatting of original description seems to have been lost. Sorry about that. I've attached the .lst file produced from sdcc-3.0.0
Test case listing
This is not a peephole optimizer problem. The code is wrong even before the peephole optimizer sees it, as can be verified by compiling with --no-peep.
The problem can still be observed in current sdcc #6226, both in trunk and in the optralloc branch. It does not occour in the optralloc branch when using --optralloc-all, which means that this is most likely a bug in packRegsForHLUse3() in the old register allocator.
Philipp
Forgive me if I'm missing something, but I'm not (yet) convinced. When I compile this with sdcc 3.0.0 and --no-peep, I get the following assembler output, which seems correct to me:
;peepbug.c:44: *a = r.reg_d;
ld c,4 (ix)
ld b,5 (ix)
push bc
pop iy
ld hl,#0x0000
add hl,sp <== hl = start of struct
ld l,l
ld h,h
ld a,l \ add a,#0x08 |
ld c,a | Add 0x0008 to hl
ld a,h | and store in bc
adc a,#0x00 |
ld b,a /
ld a,l \ add a,#0x09 |
ld e,a | Add 0x0009 to hl
ld a,h | and store in de
adc a,#0x00 |
ld d,a /
ld a,(de) <== Load from de (%sp + 0x9)
ld (iy),a <== store in *a
;peepbug.c:45: *b = r.reg_e;
ld e,6 (ix)
ld d,7 (ix)
push de
pop iy
ld a,(bc) <== Load from bc (%sp + 0x8)
ld (iy),a <== Store in *b
Seems I was confused. Peepholes 41c and 41d were broken. They're fixed in #3.0.1 #6227.
Philipp
The regression test for this bug fails for hc08.
'Shall live on as #3187162 as a hc08 bug. For now the new regression test is disabled for hc08.
Philipp