## #404 Z80 code inefficient when returning 32-bit global variable from function

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-12-09
Created: 2013-09-11
Private: No

Playing with some of the Numerical Recipes examples, I coded up their very simple random number generator:

```
unsigned long idum = 0;

long
rnd(void)
{
    idum = 1664525L * idum + 1013904223L;

    return (idum);
}
```
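For anyone reproducing this on a host compiler, where `long` may be wider than the 32 bits it has on the Z80 target, a fixed-width variant may be useful. This is only a sketch of the same recurrence, not the code that was compiled for this report; the names `rnd32` and `idum32` are made up for the illustration.

```c
#include <stdint.h>

/* Hypothetical host-side variant of rnd(): uint32_t makes the
 * modulo-2^32 wraparound explicit whatever the width of long is. */
static uint32_t idum32 = 0;

uint32_t rnd32(void)
{
    /* Same multiplier and increment as the function above; unsigned
     * overflow wraps, which gives the reduction modulo 2^32. */
    idum32 = UINT32_C(1664525) * idum32 + UINT32_C(1013904223);
    return idum32;
}
```

Starting from a seed of 0, the first two calls return 0x3C6EF35F and 0x47502932.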

At the end of the function, to return idum, we get this:

```
;rand.c:11: return (idum);
;       genRet
;fetchLitPair
        ld      iy,#_idum
        ld      l,0 (iy)
;fetchLitPair
        ld      iy,#_idum
        ld      h,1 (iy)
;fetchLitPair
        ld      iy,#_idum
        ld      e,2 (iy)
;fetchLitPair
        ld      iy,#_idum
        ld      d,3 (iy)
;       genLabel
; peephole 149 removed unused label 00101$.
;       genEndFunction
        ret
```

This takes 28 bytes and 132 T-states, not counting the `ret` (each `ld iy,#_idum` is 4 bytes/14T and each indexed load is 3 bytes/19T). Clearly there is no need to reload iy each time, and eliminating the redundant reloads reduces this to 16 bytes and 90T. If I were hand-coding it, I'd do something like this (11 bytes, 60T):

```
        ld      hl, #_idum+3
        ld      d, (hl)
        dec     hl
        ld      e, (hl)
        dec     hl
        ld      a, (hl)
        dec     hl
        ld      l, (hl)
        ld      h, a
```
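The size and timing figures can be cross-checked by summing per-instruction costs. The sketch below does that for the three sequences discussed so far. The byte counts and T-states in the table are the standard documented Z80 values as I understand them, and every identifier (`struct cost`, `cost_of`, `total`, the sequence arrays) is hypothetical, introduced only for this check.

```c
#include <stddef.h>

/* Size and timing of one instruction form. */
struct cost { unsigned bytes; unsigned tstates; };

enum insn { LD_IY_NN, LD_R_IYD, LD_HL_NN, LD_R_HL, DEC_HL, LD_R_R };

/* Assumed Z80 costs, listed in the same order as enum insn. */
static const struct cost cost_of[] = {
    {4, 14},    /* ld iy,#nn   */
    {3, 19},    /* ld r,d (iy) */
    {3, 10},    /* ld hl,#nn   */
    {1, 7},     /* ld r,(hl)   */
    {1, 6},     /* dec hl      */
    {1, 4},     /* ld h,a      */
};

/* Compiler output: iy is reloaded before every byte. */
static const enum insn generated[] = {
    LD_IY_NN, LD_R_IYD, LD_IY_NN, LD_R_IYD,
    LD_IY_NN, LD_R_IYD, LD_IY_NN, LD_R_IYD,
};

/* Same code with iy loaded only once. */
static const enum insn single_iy[] = {
    LD_IY_NN, LD_R_IYD, LD_R_IYD, LD_R_IYD, LD_R_IYD,
};

/* Hand-coded version that walks down from _idum+3 via hl. */
static const enum insn hand_coded[] = {
    LD_HL_NN, LD_R_HL, DEC_HL, LD_R_HL, DEC_HL,
    LD_R_HL, DEC_HL, LD_R_HL, LD_R_R,
};

/* Sum the bytes and T-states of a sequence of n instructions. */
struct cost total(const enum insn *seq, size_t n)
{
    struct cost sum = {0, 0};
    for (size_t i = 0; i < n; i++) {
        sum.bytes += cost_of[seq[i]].bytes;
        sum.tstates += cost_of[seq[i]].tstates;
    }
    return sum;
}
```

With these per-instruction figures the totals come to 28 bytes/132T, 16 bytes/90T and 11 bytes/60T respectively, with the final `ret` left out of all three.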

Given this corresponds to a single C statement, and I believe a single icode statement, I can't see why this couldn't be done in the code generation rather than later in the optimiser.

In fact, a spot more analysis shows that we can prove that hl already points to idum+3 as a result of storing the addition there just before, so we could in theory eliminate the load of hl completely, saving a further 3 bytes and 10T.

Above output is from sdcc #8839, using:
$ /build/sdcc/sdcc-20130910-8839/bin/sdcc --i-code-in-asm --opt-code-size -mz80 --fverbose-asm -I../include -c -o rand.rel rand.c

## Discussion

• Brian Ruthven - 2013-09-11

I just spotted that the return type from rnd() should be "unsigned long", but it makes no difference to the code generated.

• We used to have an optimization that avoided reloading iy, even across multiple iCodes. It was disabled since it resulted in a bug that seemed rather hard to fix otherwise.
However I agree that we should bring this back. Definitely within a single iCode, maybe even across iCodes (doing it the correct way would leave handling of rematerialization to the register allocator).

Philipp
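To make the idea of skipping redundant iy reloads concrete, here is a minimal sketch of a deliberately conservative pass over already-emitted instruction text. It is not SDCC's implementation and does not reflect how the disabled optimization worked. The function name `drop_redundant_iy_loads` is hypothetical, and the input is assumed to be normalized to single spaces (for example `ld iy,#_idum`). Anything that is not a comment or an indexed access through `(iy)` is treated as possibly clobbering iy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not SDCC code: copies the lines of `in` to `out`,
 * dropping any "ld iy,#sym" that reloads the address iy is already known
 * to hold. Returns the number of lines kept. */
size_t drop_redundant_iy_loads(const char *const *in, size_t n,
                               const char **out)
{
    static const char load[] = "ld iy,#";
    const size_t load_len = sizeof load - 1;
    const char *known = NULL;   /* operand iy currently holds, if known */
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        const char *line = in[i];

        if (strncmp(line, load, load_len) == 0) {
            const char *sym = line + load_len;
            if (known != NULL && strcmp(known, sym) == 0)
                continue;       /* iy already holds this address: skip */
            known = sym;
        } else if (line[0] != ';' && strstr(line, "(iy)") == NULL) {
            /* Labels, calls, jumps and anything else we do not
             * recognize: conservatively forget what iy holds. */
            known = NULL;
        }
        out[kept++] = line;
    }
    return kept;
}
```

Fed the eight load instructions from the report followed by `ret`, this keeps one `ld iy,#_idum`, the four indexed loads and the `ret`. Forgetting the tracked value on every unrecognized line costs some missed opportunities but keeps the pass within a straight-line run of instructions.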

• Brian Ruthven - 2013-10-05

In fact I think it should be simpler than this (although avoiding the iy reload would probably help in other places). Given that we are returning from a function, and the return type is "long", we must be populating DEHL to return the value. Therefore, we can simply do:

```
ld hl, (_idum)
ld de, (_idum+2)
```

This takes only 7 bytes (36T), and is similar to the existing load of idum at the start of the function:

```
   0000 2Ar02r00      [16]   60         ld      hl,(_idum + 2)
   0003 E5            [11]   61         push    hl
                             62 ;fetchPairLong
   0004 2Ar00r00      [16]   63         ld      hl,(_idum)
   0007 E5            [11]   64         push    hl
                             65 ;       genIpush