#404 Z80 code inefficient when returning 32-bit global variable from function

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-12-09
Created: 2013-09-11
Private: No

Playing with some of the Numerical Recipes examples, I coded up their very simple random number generator:

unsigned long idum = 0;

long
rnd(void)
{
        idum = 1664525L * idum + 1013904223L;

        return (idum);
}
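The recurrence relies on `unsigned long` being exactly 32 bits, as it is on SDCC's Z80 port, so that the multiply and add wrap modulo 2^32. On a host where `unsigned long` is 64 bits, the same source produces different values from the second call onward. A minimal host-side reference, assuming C99 `<stdint.h>`, pins the width with `uint32_t` and uses the unsigned return type mentioned in the discussion; `idum_ref` and `rnd_ref` are hypothetical names used only for illustration:

```c
#include <stdint.h>

/* Host-side reference for rnd(); uint32_t forces the same modulo-2^32
   wraparound that a 32-bit unsigned long gives on the Z80 target. */
static uint32_t idum_ref = 0;

static uint32_t rnd_ref(void)
{
    idum_ref = UINT32_C(1664525) * idum_ref + UINT32_C(1013904223);
    return idum_ref;
}
```

Starting from `idum_ref = 0`, the first two calls return 1013904223 (0x3C6EF35F) and 1196435762 (0x47502932), which can be compared against values read back from the Z80 build.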

At the end of the function, to return idum, we get this:

;rand.c:11: return (idum);
;       genRet
;fetchLitPair
        ld      iy,#_idum
        ld      l,0 (iy)
;fetchLitPair
        ld      iy,#_idum
        ld      h,1 (iy)
;fetchLitPair
        ld      iy,#_idum
        ld      e,2 (iy)
;fetchLitPair
        ld      iy,#_idum
        ld      d,3 (iy)
;       genLabel
; peephole 149 removed unused label 00101$.
;       genEndFunction
        ret
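As the listing shows, this sdcc build returns the 32-bit result in DEHL: L gets the byte at offset 0 of the little-endian `_idum` (least significant), and D gets the byte at offset 3 (most significant). The sketch below models only that register mapping in C; `dehl_regs`, `dehl_from_bytes` and `dehl_value` are illustrative names, and the byte array stands in for the four bytes of `_idum` in memory:

```c
#include <stdint.h>

/* Model of: ld l,0 (iy) / ld h,1 (iy) / ld e,2 (iy) / ld d,3 (iy).
   mem[] holds the four bytes of a 32-bit global in Z80 (little-endian) order. */
typedef struct {
    uint8_t l, h, e, d;
} dehl_regs;

static dehl_regs dehl_from_bytes(const uint8_t mem[4])
{
    dehl_regs r;
    r.l = mem[0];   /* bits 0..7   */
    r.h = mem[1];   /* bits 8..15  */
    r.e = mem[2];   /* bits 16..23 */
    r.d = mem[3];   /* bits 24..31 */
    return r;
}

/* Reassemble the 32-bit value the caller sees in DEHL. */
static uint32_t dehl_value(dehl_regs r)
{
    return ((uint32_t)r.d << 24) | ((uint32_t)r.e << 16) |
           ((uint32_t)r.h << 8)  |  (uint32_t)r.l;
}
```

For example, if `_idum` holds 0x3C6EF35F, its bytes in memory are 5F F3 6E 3C, so L = 0x5F, H = 0xF3, E = 0x6E and D = 0x3C.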

This takes 28 bytes and 132 T-states. Clearly there is no need to reload iy each time: eliminating the redundant iy reloads reduces this to 16 bytes and 90T. If I were hand-coding it, I'd do something like this (11 bytes, 60T):

        ld      hl, #_idum+3
        ld      d, (hl)
        dec     hl
        ld      e, (hl)
        dec     hl
        ld      a, (hl)
        dec     hl
        ld      l, (hl)
        ld      h, a
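The size and timing figures can be checked against the standard Z80 instruction costs: `ld iy,nn` is 4 bytes/14T, `ld r,(iy+d)` 3/19, `ld hl,nn` 3/10, `ld r,(hl)` 1/7, `dec hl` 1/6 and `ld r,r` 1/4. A small C tally, with hypothetical helper names, reproduces the 16-byte/90T and 11-byte/60T figures and gives 28 bytes and 132 T-states for the eight generated loads (the final `ret` is not counted):

```c
/* Documented Z80 sizes (bytes) and timings (T-states). */
typedef struct {
    int bytes;
    int tstates;
} z80_cost;

static const z80_cost LD_IY_NN = { 4, 14 };  /* ld iy,#nn    */
static const z80_cost LD_R_IYD = { 3, 19 };  /* ld r,d (iy)  */
static const z80_cost LD_HL_NN = { 3, 10 };  /* ld hl,#nn    */
static const z80_cost LD_R_HL  = { 1, 7 };   /* ld r,(hl)    */
static const z80_cost DEC_HL   = { 1, 6 };   /* dec hl       */
static const z80_cost LD_R_R   = { 1, 4 };   /* ld h,a       */

/* Add `count` copies of instruction cost `c` to a running total. */
static z80_cost add_cost(z80_cost total, z80_cost c, int count)
{
    total.bytes += c.bytes * count;
    total.tstates += c.tstates * count;
    return total;
}

/* As generated: iy reloaded before each of the four byte loads. */
static z80_cost cost_generated(void)
{
    z80_cost t = { 0, 0 };
    t = add_cost(t, LD_IY_NN, 4);
    t = add_cost(t, LD_R_IYD, 4);
    return t;
}

/* Same four loads with a single ld iy. */
static z80_cost cost_single_iy(void)
{
    z80_cost t = { 0, 0 };
    t = add_cost(t, LD_IY_NN, 1);
    t = add_cost(t, LD_R_IYD, 4);
    return t;
}

/* Hand-coded version walking hl down from _idum+3. */
static z80_cost cost_hand_coded(void)
{
    z80_cost t = { 0, 0 };
    t = add_cost(t, LD_HL_NN, 1);
    t = add_cost(t, LD_R_HL, 4);
    t = add_cost(t, DEC_HL, 3);
    t = add_cost(t, LD_R_R, 1);
    return t;
}
```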

Given that this corresponds to a single C statement, and I believe a single iCode, I can't see why this couldn't be done in the code generator rather than later in the optimiser.

In fact, a spot more analysis shows that we can prove hl already points to idum+3, as a result of storing the result of the addition there just before, so we could in theory eliminate the load of hl completely, saving a further 3 bytes and 10T.
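Avoiding the repeated `ld iy` within a single iCode amounts to remembering what IY currently holds. A minimal sketch, assuming a hypothetical emitter (`codegen_state`, `load_iy`, `invalidate_iy` and `fetch_long` are illustrative names, not SDCC's code generator API), records which symbol IY points at and emits `ld iy,#sym` only when that changes. A real implementation would also have to forget the cached value at labels and after anything that can clobber IY; the same kind of record could in principle track that hl holds `_idum+3` after the store, which is not shown here:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record of what IY is known to point at. */
typedef struct {
    int valid;      /* 0 = contents unknown */
    char sym[32];   /* symbol IY points at, e.g. "_idum" */
} known_addr;

typedef struct {
    known_addr iy;
    int reloads;    /* number of "ld iy,#sym" actually emitted */
} codegen_state;

/* Point IY at sym, emitting the load only if IY does not already hold it. */
static void load_iy(codegen_state *st, const char *sym)
{
    if (st->iy.valid && strcmp(st->iy.sym, sym) == 0)
        return;  /* IY already holds &sym: skip the redundant reload */
    printf("\tld\tiy,#%s\n", sym);
    st->iy.valid = 1;
    snprintf(st->iy.sym, sizeof st->iy.sym, "%s", sym);
    st->reloads++;
}

/* Call wherever IY may have changed (labels, calls, explicit writes). */
static void invalidate_iy(codegen_state *st)
{
    st->iy.valid = 0;
}

/* Load the 4 bytes of a 32-bit global into DEHL, low byte into L. */
static void fetch_long(codegen_state *st, const char *sym)
{
    static const char regs[4] = { 'l', 'h', 'e', 'd' };
    for (int i = 0; i < 4; i++) {
        load_iy(st, sym);
        printf("\tld\t%c,%d (iy)\n", regs[i], i);
    }
}
```

On a fresh state, `fetch_long(&st, "_idum")` prints a single `ld iy,#_idum` followed by the four byte loads, matching the 16-byte/90T variant.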

The output above is from sdcc #8839, generated with:
$ /build/sdcc/sdcc-20130910-8839/bin/sdcc --i-code-in-asm --opt-code-size -mz80 --fverbose-asm -I../include -c -o rand.rel rand.c

Discussion

  • Brian Ruthven

    Brian Ruthven - 2013-09-11

    I just spotted that the return type from rnd() should be "unsigned long", but it makes no difference to the code generated.

     
  • Philipp Klaus Krause

    We used to have an optimization that avoided reloading iy, even across multiple iCodes. It was disabled because it resulted in a bug that seemed rather hard to fix otherwise.
    However, I agree that we should bring this back. Definitely within a single iCode, maybe even across iCodes (doing it the correct way would leave handling of rematerialization to the register allocator).

    Philipp

     
  • Brian Ruthven

    Brian Ruthven - 2013-10-05

    In fact, I think it should be simpler than this (although avoiding the iy reload would probably help in other places). Given that we are returning from a function, and the return type is "long", we must be populating DEHL to return the value. Therefore, we can simply use:

    ld hl, (_idum)
    ld de, (_idum+2)
    

    This takes only 7 bytes (36T), and is similar to the existing load of idum at the start of the function:

       0000 2Ar02r00      [16]   60         ld      hl,(_idum + 2)
       0003 E5            [11]   61         push    hl
                                 62 ;fetchPairLong
       0004 2Ar00r00      [16]   63         ld      hl,(_idum)
       0007 E5            [11]   64         push    hl
                                 65 ;       genIpush
    
     

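The two-instruction version works because the Z80 is little-endian: `ld hl,(_idum)` reads bytes 0 and 1 (the low word, into L then H) and `ld de,(_idum+2)` reads bytes 2 and 3 (the high word). With the standard timings that is 3 + 4 = 7 bytes and 16 + 20 = 36 T-states, since `ld de,(nn)` is an ED-prefixed instruction. The C model below, with hypothetical helper names, checks that the two word reads leave the same value in DEHL as the four byte loads in the generated code:

```c
#include <stdint.h>

/* What ld rr,(nn) does: the byte at nn goes to the low register,
   the byte at nn+1 to the high register. */
static uint16_t read16_le(const uint8_t *mem, unsigned addr)
{
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}

/* ld hl,(_idum) ; ld de,(_idum+2)  ->  value returned in DEHL. */
static uint32_t dehl_by_words(const uint8_t *mem, unsigned idum)
{
    uint16_t hl = read16_le(mem, idum);
    uint16_t de = read16_le(mem, idum + 2);
    return ((uint32_t)de << 16) | hl;
}

/* ld l,0 (iy) ... ld d,3 (iy) as emitted by sdcc, for comparison. */
static uint32_t dehl_by_bytes(const uint8_t *mem, unsigned idum)
{
    return (uint32_t)mem[idum] | ((uint32_t)mem[idum + 1] << 8) |
           ((uint32_t)mem[idum + 2] << 16) | ((uint32_t)mem[idum + 3] << 24);
}
```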