Small Device C Compiler (SDCC) / Bugs / #2263 Addition of a 32-bit return value and a 16-bit function argument does not work

#2263 Addition of a 32-bit return value and a 16-bit function argument does not work

Status: closed-fixed

Owner: Ben Shi

Labels: None

Category: MCS51

Priority: 5

Updated: 2016-01-11

Created: 2014-03-18

Creator: David Grayson

Private: No

I am using Windows 8 64-bit and my sdcc -v is today's snapshot build:

SDCC : mcs51/z80/z180/r2k/r3ka/gbz80/tlcs90/ds390/pic16/pic14/TININative/ds400/h
c08/s08/stm8 3.4.0/*rc1*/ #8960 (Mar 17 2014) (MINGW64)

The test code that reproduces the bug is:

unsigned long g()
{
  return 0xFFFFFFFF;
}

void f(unsigned short m)
{
  unsigned long e = g() + m;
  while(e){}
}

void main()
{
  f(1);
}

I compiled the code by simply running "sdcc test.c". In the assembly output file (test.asm), we can see evidence of a problem in two ways:

The function g puts its return value into the registers dpl, dph, a, and b. However, the function f only looks at dpl and dph; it never uses the values that g placed into a and b. Essentially it is ignoring the upper 16 bits of the return value from g.
The function f performs the 32-bit addition in an incorrect way where it stores the first two addition results in r4 and r5, but then it tries to use those values again as inputs to the third and fourth addition instructions.

Here is the excerpt from test.asm that shows the assembly code for f, along with my comments explaining what is happening.

_f:
  mov   r6,dpl  ; Put the low byte of the argument m into r6.
  mov   r7,dph  ; Put the high byte of the argument m into r7.  Now m is in r6:r7.
;   test.c:8: unsigned long e = g() + m;
  push  ar7   ; Push r7 (low byte of m) onto the stack so we can call g.
  push  ar6   ; Push r6 (low byte of m) onto the stack so we can call g.
  lcall _g    ; Call g
  mov   r2,dpl  ; Move byte 0 of the g return value into r2.
  mov   r3,dph  ; Move byte 1 of the g return value into r3.
  pop   ar6     ; Restore r6 (low byte of m) from stack.
  pop   ar7     ; Restore r7 (low byte of m) from stack.
  mov   ar0,r6  ; Move low byte of m from r6 to r0.
  mov   ar1,r7  ; Move low byte of m from r7 to r1.  Now m is in r0:r1.
  clr   a       ; Clear the a.
  mov   r6,a    ; Clear r6.
  mov   r7,a    ; Clear r7.
  mov   a,r0
  add   a,r2
  mov   r4,a    ; Set r4 equal to (low byte of m) + (byte 0 of the g return value)
  mov   a,r1
  addc  a,r3
  mov   r5,a    ; Set r5 equal to (high byte of m) + (byte 1 of the g return value) + (carry bit)
  mov   a,r6    ; Clear a (r6 was 0).
  addc  a,r4
  mov   r6,a    ; BAD: this sets r6 to r4 + (carry bit), so r6 is almost a copy of r4.
  mov   a,r7    ; Clear a (r7 was 0).
  addc  a,r5
  mov   r7,a    ; BAD: this sets 
;   test.c:9: while(e){}
00101$:
  mov   a,r4
  orl   a,r5
  orl   a,r6
  orl   a,r7
  jnz   00101$
  ret

During the addition phase, it looks like SDCC is adding r0:r1:r6:r7 to r2:r3:r4:r5, but it is storing the result in r4:r5:r6:r7, and that doesn't work because by the time it reads from r4 to perform the third addition, r4 already holds the least significant byte of the result.

In the example code, the values that are being added together are 0xFFFFFFFF and 0x0001. At the end of the addition, I think r4 and r5 should be equal to 0 as expected, but I think r6 would be 1 and r7 would be 0xFF, so the compiler thinks that endtime is equal to 0xFF010000 and the loop will run forever, even though it should terminate quickly.

We were also able to reproduce this bug with the snapshot build mentioned above, and we also reproduced it on SDCC 3.1.0, SDCC 3.3.0, and SDCC 3.4.0-RC1.

In SDCC 3.0.0, this bug does not seem to be present. Here is the assembly source code of _f when compiled with SDCC 3.0.0:

_f:
  mov   r2,dpl
  mov   r3,dph
;   test.c:8: unsigned long e = g() + m;
  push  ar2
  push  ar3
  lcall _g
  mov   r4,dpl
  mov   r5,dph
  mov   r6,b
  mov   r7,a
  pop   ar3
  pop   ar2
  clr   a
  mov   r0,a
  mov   r1,a
  mov   a,r2
  add   a,r4
  mov   r2,a
  mov   a,r3
  addc  a,r5
  mov   r3,a
  mov   a,r0
  addc  a,r6
  mov   r4,a
  mov   a,r1
  addc  a,r7
  mov   r5,a
;   test.c:9: while(e){}
00101$:
  mov   a,r2
  orl   a,r3
  orl   a,r4
  orl   a,r5
  jnz   00101$
  ret

Discussion

David Grayson - 2014-03-18

Oops, I made a few mistakes and it seems that I cannot edit my bug report. When I said "endtime" I actually meant to say "e". For my last assembly comment, I meant to write:

; BAD: this sets r7 to r5 + (carry bit), so r7 is almost a copy of r5.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Erik Petrich - 2014-03-18

The main problem is the register assignment during the addition, where the registers for the result value overlap with the source registers, but at different byte positions. The return value from g() is handled correctly, but the moves to the registers for the upper 2 bytes (r4 and r5) are optimized away by the peepholer because those registers are overwritten as result bytes of the addition before they are used due to the register assignment problem.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2014-03-18

Seems to work fine on z80 and stm8.

Philipp

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2014-03-18

Category: other --> MCS51
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Erik Petrich - 2014-03-19

The change that made the bug apparent for this specific code example is r6382 ("prefer to use adjacent ascending registers"), but the underlying bug has been there since the beginning and potentially affects all of the backends that use the original register allocator: definitely mcs51 and ds390, possibly z80 family with --oldralloc (there may be other register constraints that still keep it out of trouble, though), not sure about the PICs since they are only allocating pseudo registers. The hc08/s08 backends are okay, even with --oldralloc, because it doesn't have enough registers to trigger this bug.
The fundamental problem is that positionRegs() is looking at the left and right operands in isolation, but is faced with a register assignment for the result that cannot simultaneously satisfy both. Prior to r6382, it was also failing to detect this case, but as long as the registers were used as sources before they were reused as results the generated code is still valid.
I suggest we defer fixing this bug until after the 3.4.0 release.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Maarten Brock - 2014-03-19

This bug should certainly not be fixed before the 3.4.0 release. Messing with register allocation is way too tricky to do before.

Maarten

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ben Shi - 2016-01-11

Fixed in reversion [r9453].

Last edit: Maarten Brock 2016-01-11

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ben Shi - 2016-01-11

status: open --> closed-fixed

assigned_to: Ben Shi
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Addition of a 32-bit return value and a 16-bit function argument does not work

The Small Device C Compiler (SDCC), targeting 8-bit architectures

Searches

Help

#2263 Addition of a 32-bit return value and a 16-bit function argument does not work

Discussion