Menu

#903 unnecessary use of additional register (seen on z80 code)

None
open
nobody
None
5
2024-04-19
2024-04-03
No

For the following function

void printchar( unsigned char c );

void printnibble( unsigned char x ) {
    x = x & 0xf;
    if ( x > 10 )
        x += 'a' - 10;
    else
        x += '0';
    printchar( x );
}

Compiled using:

; SDCC  TD- 4.4.1 #14768 (MINGW64)
sdcc -mz80 --reserve-regs-iy --opt-code-size --fverbose-asm --max-allocs-per-node2000000  --peep-return  -c

Produces the verbose output which follows here, but before, to point to the problem, the shorter version:

; Function printnibble
; ---------------------------------
_printnibble::
        and     a, #0x0f
        ld      c, a
        cp      a, #0x0b
        jr      C, 00102$
        ld      a, c
        add     a, #0x57
        ld      c, a
        jr      00103$
00102$:
        ld      a, c
        add     a, #0x30
        ld      c, a
00103$:
        ld      a, c
        jp      _printchar

Here it's clear that the register C is unnecessarily used throughout in spite of "cp" being non destructive and the possibility to keep the results in A all the time, as the previous values are not supposed to "live" (except through the comparison).

;       ---------------------------------
; Function printnibble
; ---------------------------------
;       Register assignment is optimal.
; Stack space usage: 0 bytes.
_printnibble::
;       genReceive
;       genMove_o size 1
;check-expired.c:117: x = x & 0xf;
;       genAnd
        and     a, #0x0f
;       genMove_o size 1
;check-expired.c:119: x += 'a' - 10;
;       genCast
;       genMove_o size 1
        ld      c, a
;       genMove_o size 0
;check-expired.c:118: if ( x > 10 )
;       genCmpGt
; common peephole 6c removed dead cp a, #0x0a
        cp      a, #0x0b
; common peephole 163 changed absolute to relative conditional jump.
        jr      C, 00102$
;       skipping generated iCode
;check-expired.c:119: x += 'a' - 10;
;       genPlus
        ld      a, c
        add     a, #0x57
        ld      c, a
;       genGoto
; common peephole 162 changed absolute to relative unconditional jump.
        jr      00103$
;       genLabel
00102$:
;check-expired.c:121: x += '0';
;       genPlus
        ld      a, c
        add     a, #0x30
        ld      c, a
;       genLabel
00103$:
;check-expired.c:122: printchar( x );
;       genSend
;       genMove_o size 1
        ld      a, c
;       genMove_o size 0
;       genCall
;       genLabel
; common peephole 158 removed unused label 00104$.
;check-expired.c:123: }
;       genEndFunction
        jp      _printchar
; common peephole 152 removed unused ret.

Discussion

  • Maarten Brock

    Maarten Brock - 2024-04-06

    Ticket moved from /p/sdcc/bugs/3724/

    Can't be converted:

    • _category: redundancy elimination
     
  • Maarten Brock

    Maarten Brock - 2024-04-06

    Suboptimal code generation is not a bug. It can be a feature request though to fix it.

     
    • Janko Stamenović

      Many thanks, I apologize for not understanding that until now.

       
      • Janko Stamenović

        This one is then, I think, also not a bug:

        https://sourceforge.net/p/sdcc/bugs/3723/

         
        • Philipp Klaus Krause

          Yes. It is moved to the feature request tracker now.

           
  • Janko Stamenović

    A different example of an unnecessary load:

    unsigned calcx( unsigned x ) {
        return x;
    }
    unsigned calcIncH( unsigned x ) {
        return calcx( x + 256 );
    }
    

    A result, in short:

    _calcIncH::
        inc h
        ld  a, h
        jp  _calcx
    

    The ld a, h remains in spite of the fact that A is clearly not used, as the calcx needs only HL.

    The verbose output:

    ;   ---------------------------------
    ; Function calcIncH
    ; ---------------------------------
    ;   Register assignment is optimal.
    ; Stack space usage: 0 bytes.
    _calcIncH::
    ;   genReceive
    ;   genMove_o size 2
    ;   genPlus
    ;   genSend
    ;   genMove_o size 2
    ; common peephole 9 loaded h from a directly instead of going through c.
        inc h
        ld  a, h
    ; common peephole 104b inc h directly to remove redundant load from a into h
    ;   genMove_o size 0
    ;   genCall
    ;   genMove_o size 2
    ;   genRet
    ;   genMove_o size 2
    ;   genLabel
    ; common peephole 158 removed unused label 00101$.
    ;   genEndFunction
        jp  _calcx
    ; common peephole 152 removed unused ret.
    

    I am not familiar enough with the possibilities of peep hole defs, but I've tried also to add it to see if I understand it enough to remove it tat way:

    sdcc           --fverbose-asm --max-allocs-per-node200000  --no-c-code-in-asm -mz80n  --opt-code-size --reserve-regs-iy --no-std-crt0 --nostdlib  --Werror  --peep-return --peep-file peep-my.def  -c
    

    where peep-my.def contains only:

    replace restart {
        ld  a, h
    } by {
        ; removed redundant load a
    } if notUsed( 'a' )
    
    barrier
    

    but I don't see that it was used. And I have believed that the removal of loads to the unused registers is anyways something that already exists?

     
    • Philipp Klaus Krause

      Peephole optimizer rule 1 should have eliminated that:

      replace restart {
              ld      %1, %2
      } by {
              ; common peephole 1 removed dead load from %2 into %1.
      } if notVolatile(%1), notUsed(%1), notVolatile(%2), notSame(%1 '(hl+)' '(hl-)'), notSame(%2 '(hl+)' '(hl-)')
      

      I guess we get a false negative on notUsed here, possibly notUsed accuracy suffers due to the jp fromt tail-call optimization; when I find time I might look into it to see if that is really the issue and if it is easy to improve notUsed accuracy here. Another question is why codegen originally used c and a here, instead of going directly with h.

      P.S.: I'm a bit busy right now, I hope to find time to look into z80 stuff in the second half of next week.

       

      Last edit: Maarten Brock 2024-04-06
      • Janko Stamenović

        I hope I haven't made everything more confusing, I thought that these two are different symptoms with possible different causes, one in the top post, involving c, and a different one with that "notUsed" of a, but I also thought maybe they both fit "unnecessary use of (an) additional register" title and can be checked at the same time.

        I don't make these posts with any expectations for some prompt reactions, the generated code for my C is anyway very, very useful, and, most importantly, gives the correct end results. Many thanks for your work! I just thought that if I report some of my observations, maybe it will eventually help as some pointers where some improvements are possible, and that only if that's so, and whenever that happens.

         
        • Janko Stamenović

          isUsed( 'a' ) definitely doesn't behave as I'd expect:

          Starting with the following code:

          unsigned char flags;
          volatile unsigned char sc;
          
          unsigned char readc( void );
          
          unsigned char sendc( unsigned char c )
          {
              flags &= ~( 0b00000010 ); // reset bit 1
              unsigned char r = readc();
              if ( r == 0xff )
                  return r;
              sc = c;
              return c;
          }
          

          The begin of the generated function is:

              ld  c, a
              ld  hl, #_flags
              ld  a, (hl)
              and a, #0xfd
              ld  (hl), a
          

          Now if I add what I believe is wrong peep rule:

          replace restart {
              ld  a, (hl)
              and a, #%1
              ld  (hl), a
          } by {
              res 1, (hl)
              ; common peephole NN2.1: use res 1 (hl)
          } if immdInRange(0xfd 0xfd '+' %1 0 %2)
          

          It will be

              ld  hl, #_flags
              res 1, (hl)
          

          But using what I think is a correct rule (with notUsed('a'))

          replace restart {
              ld  a, (hl)
              and a, #%1
              ld  (hl), a
          } by {
              res 1, (hl)
              ; common peephole NN2.1: use res 1 (hl)
          } if notUsed('a'), immdInRange(0xfd 0xfd '+' %1 0 %2)
          

          I get then (showing first the short asm of the whole function):

          ;   ---------------------------------
          ; Function sendc
          ; ---------------------------------
          _sendc::
              ld  c, a
              ld  hl, #_flags
              ld  a, (hl)
              and a, #0xfd
              ld  (hl), a
              push    bc
              call    _readc
              pop bc
              cp  a, #0xff
              ret Z
              ld  hl, #_sc
              ld  (hl), c
              ld  a, c
              ret
          

          Note that the value to be used later is kept in C, not in A, and that A is not supposed to be considered alive after that function call with no arguments, but which returns a byte.

          Which also means that the point when notUsed('a') becomes wrong can't be a tail call optimization, as this call here is a "simple" one.

          ;   ---------------------------------
          ; Function sendc
          ; ---------------------------------
          ;   Register assignment is optimal.
          ; Stack space usage: 0 bytes.
          _sendc::
          ;   genReceive
          ;   genMove_o size 1
              ld  c, a
          ;   genMove_o size 0
          ;check.c:161: flags &= ~( 0b00000010 ); // reset bit 1
          ;   genAnd
          ;fetchLitPair   hl
              ld  hl, #_flags
              ld  a, (hl)
              and a, #0xfd
          ;   genMove_o size 1
          ;fetchLitPair   hl
              ld  (hl), a
          ;check.c:162: unsigned char r = readc();
          ;   genCall
              push    bc
              call    _readc
          ;   genMove_o size 1
              pop bc
          ;check.c:163: if ( r == 0xff )
          ;   genCmpEq
              cp  a, #0xff
              ret Z
          ; common peephole 161 replaced jump by return.
          ; common peephole 84 jumped to 00103$ directly instead of via 00113$.
          ; common peephole 81 removed jp by using inverse jump logic
          ; common peephole 158 removed unused label 00112$.
          ;   skipping generated iCode
          ;check.c:164: return r;
          ;   genRet
          ;   genMove_o size 1
          ; common peephole 87b removed unreachable jump to 00103$
          ;   genLabel
          ; common peephole 85a eliminated jump.
          ; common peephole 158 removed unused label 00102$.
          ;check.c:165: sc = c;
          ;   genAssign
          ;   genMove_o size 1
          ;fetchLitPair   hl
              ld  hl, #_sc
              ld  (hl), c
          ;check.c:166: return c;
          ;   genRet
          ;   genMove_o size 1
              ld  a, c
          ;   genMove_o size 0
          ;   genLabel
          ; common peephole 158 removed unused label 00103$.
          ;check.c:167: }
          ;   genEndFunction
              ret
          

          Tangentially, I can contribute these peephole rules to make these set and res ,(hl)cases for all possible bits, if the peep rules are the right point at all to implement such transformations in SDCC generally, and (I assume) if the "notUsed('a')" can be relied upon.

           

          Last edit: Janko Stamenović 2024-04-07
          • Philipp Klaus Krause

            If you want to debug issues in notUsed() for a z80-related port, enable the D macro for debugging output in line 35 of src/z80/peep.c.

             
            👍
            1
            • Janko Stamenović

              Now with Version 4.4.1 #14808 (Linux) I can add notUsed('a') in the rule which matches what it should be substituted and make the rule happen!

              So this whole branch of this topic about the use of A register starting with the post with unsigned calcIncH( unsigned x ) and including the posts about the use of notUsed('a') in the rules is...

              solved in #14808

              🥳

               
  • Janko Stamenović

    Another example of unnecessary use, here of the stack locations ( Version 4.4.1 #14792 (MINGW64))

    (Like always, I know that even better implementations possible in assembly, but here it's only about what could be reasonably expected from SDCC, and I hope that an unnecessary use illustrated here is avoidable) (Edit: the function is just an experiment to see what SDCC generates, otherwise not used by me, and this report is just to preserve an information where the improvements are maybe possible):

    typedef unsigned uint16_t;
    typedef unsigned size_t;
    typedef unsigned char uint8_t;
    
    uint16_t flSum[2];
    
    void fletch_sum_small_mod( unsigned char m, const unsigned char* s )
    {
        // ASSERT( m <= 21 )
        uint16_t s0 = flSum[0];
        uint16_t s1 = flSum[1];
        do {
            s0 += *s++;
            s1 += s0;
        } while ( --m );
        s0 %= 255;
        s1 %= 255;
        flSum[0] = s0;
        flSum[1] = s1;
    }
    
    compiled using:
    -mz80 --reserve-regs-iy --opt-code-size --no-c-code-in-asm    --peep-file peep-my.def    --max-allocs-per-node2000000 --Werror --peep-return  -c  
    

    produces (full verbose output follows afterwards):

    note unnecessary ld -3 (ix), a ld -2 (ix), #0x00 ld a, -3 (ix) inside of the 00101$: loop, which are also never used afterwards:

    _fletch_sum_small_mod::
        call    ___sdcc_enter_ix
        push    af
        dec sp
        ld  -1 (ix), a
        ld  bc, (#_flSum + 0)
        ld  hl, (#(_flSum + 2) + 0)
    00101$:
        ld  a, (de)
        inc de
        ld  -3 (ix), a
        ld  -2 (ix), #0x00
        ld  a, -3 (ix)
        add a, c
        ld  c, a
        ld  a, #0x00
        adc a, b
        ld  b, a
        add hl, bc
        dec -1 (ix)
        jr  NZ, 00101$
        push    hl
        ld  de, #0x00ff
        ld  l, c
        ld  h, b
        call    __moduint
        ld  c, e
        ld  b, d
        pop hl
        push    bc
        ld  de, #0x00ff
        call    __moduint
        pop bc
        ld  (_flSum), bc
        ld  ((_flSum + 2)), de
        ld  sp, ix
        pop ix
        ret
    
    ;check.c:9: void fletch_sum_small_mod( unsigned char m, const unsigned char* s )
    ;   genLabel
    ;   genFunction
    ;   ---------------------------------
    ; Function fletch_sum_small_mod
    ; ---------------------------------
    ;   Register assignment is optimal.
    ; Stack space usage: 3 bytes.
    _fletch_sum_small_mod::
        call    ___sdcc_enter_ix
    ;   adjustStack by -3
        push    af
        dec sp
    ;   genReceive
    ;   AOP_STK for _fletch_sum_small_mod_sloc0_1_0
    ;   genMove_o size 1 result type 5 source type 2 hl_dead 1
        ld  -1 (ix), a
    ;   genMove_o size 0 result type 5 source type 1 hl_dead 1
    ;   genReceive
    ;   genMove_o size 2 result type 2 source type 2 hl_dead 1
    ;check.c:12: uint16_t s0 = flSum[0];
    ;   skipping iCode since result will be rematerialized
    ;   genPointerGet
        ld  bc, (#_flSum + 0)
    ;check.c:13: uint16_t s1 = flSum[1];
    ;   skipping iCode since result will be rematerialized
    ;   genPointerGet
        ld  hl, (#(_flSum + 2) + 0)
    ;check.c:14: do {
    ;   genLabel
    00101$:
    ;check.c:15: s0 += *s++;
    ;   genPointerGet
        ld  a, (de)
    ;   genMove_o size 1 result type 2 source type 2 hl_dead 0
    ;   genPlus
        inc de
    ;   genCast
    ;   (locations are the same)
    ;   genCast
    ;   AOP_STK for _fletch_sum_small_mod_sloc1_1_0
    ;   genMove_o size 1 result type 5 source type 2 hl_dead 0
        ld  -3 (ix), a
    ;   genMove_o size 0 result type 5 source type 1 hl_dead 0
    ;   genMove_o size 1 result type 5 source type 1 hl_dead 0
        ld  -2 (ix), #0x00
    ;   genPlus
    ;   AOP_STK for _fletch_sum_small_mod_sloc1_1_0
        ld  a, -3 (ix)
        add a, c
        ld  c, a
        ld  a, #0x00
        adc a, b
        ld  b, a
    ;check.c:16: s1 += s0;
    ;   genPlus
        add hl, bc
    ;check.c:17: } while ( --m );
    ;   genMinus
    ;   AOP_STK for _fletch_sum_small_mod_sloc0_1_0
        dec -1 (ix)
    ; common peephole 7 removed dead load from -1 (ix) into a.
    ; common peephole 117 decremented in -1 (ix) instead of going through a.
    ; common peephole 163 changed absolute to relative conditional jump.
        jr  NZ, 00101$
    ;   skipping generated iCode
    ;check.c:18: s0 %= 255;
    ;   genSend
        push    hl
    ;   genMove_o size 2 result type 2 source type 1 hl_dead 0
        ld  de, #0x00ff
    ;   genSend
    ;   genMove_o size 2 result type 2 source type 2 hl_dead 0
        ld  l, c
        ld  h, b
    ;   genMove_o size 0 result type 2 source type 1 hl_dead 0
    ;check.c:19: s1 %= 255;
    ;   genCall
        call    __moduint
    ;   genMove_o size 2 result type 2 source type 2 hl_dead 1
        ld  c, e
        ld  b, d
    ;   genMove_o size 0 result type 2 source type 1 hl_dead 1
        pop hl
    ;   genSend
        push    bc
    ;   genMove_o size 2 result type 2 source type 1 hl_dead 0
        ld  de, #0x00ff
    ;   genSend
    ;   genMove_o size 2 result type 2 source type 2 hl_dead 1
    ;check.c:20: flSum[0] = s0;
    ;   genCall
        call    __moduint
    ;   genMove_o size 2 result type 2 source type 2 hl_dead 1
        pop bc
    ;   genAssign (pointer)
        ld  (_flSum), bc
    ;check.c:21: flSum[1] = s1;
    ;   genAssign (pointer)
        ld  ((_flSum + 2)), de
    ;   genLabel
    ; common peephole 158 removed unused label 00104$.
    ;check.c:22: }
    ;   genEndFunction
        ld  sp, ix
        pop ix
        ret
    
     

    Last edit: Janko Stamenović 2024-04-13
    • Janko Stamenović

      One more, maybe only a bit more obvious, example of unnecessary use of the stack to use the value which is already in A, Additionally, here the assignments of A introduced into the loop could be seen, which are better to happen outside of it:

      Note ld a, #0x01 and xor a, a which aren't executed only on the loop exit but even if the loop continues:

      The kinds of functions I'm playing are library-like. For this one the compiler has nicely managed to not de-reference the src1 and src2 pointers more than once, so IMO it's a pity that the end result is as it is, as it seems that there is potential for more. Like before, I'm just trying to provide some samples of what could maybe be improved and I think this kind of issues is now reasonably covered here, and there's no need for more similar.

      int n_strsame( unsigned char len, const char* src1, const char* src2 )
      {
          for ( ; *src1 == *src2 && *src2 != 0 ; src1++, src2++ ) {
              if ( --len == 0 )
                  break;
          }
          return *src1 == *src2;
      }
      
      TD- 4.4.1 #14792 (MINGW64)
      

      First the short listing (note ld -1 (ix), a and ld a, -1 (ix) (unnecessary) and ld a, #0x01 and xor a, a (making the loop longer)) then follows verbose:

      sdcc  -mz80  --opt-code-speed --no-c-code-in-asm  --max-allocs-per-node2000000 --Werror --peep-return  -c
      
      _n_strsame::
          push    ix
          ld  ix,#0
          add ix,sp
          dec sp
          ld  c, a
          ld  l, 4 (ix)
          ld  h, 5 (ix)
      00106$:
          ld  a, (de)
          ld  -1 (ix), a
          ld  b, (hl)
          ld  a, -1 (ix)
          sub a, b
          ld  a, #0x01
          jr  Z, 00140$
          xor a, a
      00140$:
          or  a, a
          jr  Z, 00103$
          inc b
          dec b
          jr  Z, 00103$
          dec c
          jr  Z, 00103$
          inc hl
          inc de
          jp  00106$
      00103$:
          ld  e, a
          ld  d, #0x00
          inc sp
          pop ix
          pop hl
          pop af
          jp  (hl)
      

      Verbose:

      -mz80        --opt-code-speed --fverbose-asm     --max-allocs-per-node2000000 --Werror --peep-return  -c
      
      ;check.c:2: int n_strsame( unsigned char len, const char* src1, const char* src2 )
      ;   genLabel
      ;   genFunction
      ;   ---------------------------------
      ; Function n_strsame
      ; ---------------------------------
      ;   Register assignment is optimal.
      ; Stack space usage: 1 bytes.
      _n_strsame::
          push    ix
          ld  ix,#0
          add ix,sp
      ;   adjustStack by -1
          dec sp
      ;   genReceive
      ;   genMove_o size 1 result type 2 source type 2 hl_dead 1
          ld  c, a
      ;   genMove_o size 0 result type 2 source type 1 hl_dead 1
      ;   genReceive
      ;   genMove_o size 2 result type 2 source type 2 hl_dead 1
      ;check.c:8: return *src1 == *src2;
      ;   genAssign
      ;   AOP_STK for 
      ;   genMove_o size 2 result type 2 source type 5 hl_dead 1
          ld  l, 4 (ix)
          ld  h, 5 (ix)
      ;   genMove_o size 0 result type 2 source type 1 hl_dead 1
      ;   genLabel
      00106$:
      ;check.c:4: for ( ; *src1 == *src2 && *src2 != 0 ; src1++, src2++ ) {
      ;   genPointerGet
      ;   AOP_STK for _n_strsame_sloc0_1_0
          ld  a, (de)
      ;   genMove_o size 1 result type 5 source type 2 hl_dead 0
          ld  -1 (ix), a
      ;   genMove_o size 0 result type 5 source type 1 hl_dead 0
      ;   genPointerGet
          ld  b, (hl)
      ;   genCmpEq
      ;   AOP_STK for _n_strsame_sloc0_1_0
          ld  a, -1 (ix)
          sub a, b
          ld  a, #0x01
      ; common peephole 163 changed absolute to relative conditional jump.
          jr  Z, 00140$
      ; common peephole 169xnz used double assignment in case of NZ condition.
      ; common peephole 158 removed unused label 00139$.
          xor a, a
      00140$:
      ;   genMove_o size 1 result type 2 source type 2 hl_dead 0
      ;   genIfx
          or  a, a
      ; common peephole 163 changed absolute to relative conditional jump.
          jr  Z, 00103$
      ;   genIfx
          inc b
          dec b
      ; common peephole 163 changed absolute to relative conditional jump.
          jr  Z, 00103$
      ;check.c:5: if ( --len == 0 )
      ;   genMinus
          dec c
      ; common peephole 163 changed absolute to relative conditional jump.
          jr  Z, 00103$
      ;   skipping generated iCode
      ;check.c:4: for ( ; *src1 == *src2 && *src2 != 0 ; src1++, src2++ ) {
      ;   genPlus
      ;   genPlus
      ; common peephole 96c move inc hl before inc de
          inc hl
          inc de
      ;   genGoto
          jp  00106$
      ;   genLabel
      00103$:
      ;check.c:8: return *src1 == *src2;
      ;   genCast
      ;   genMove_o size 1 result type 2 source type 2 hl_dead 1
          ld  e, a
      ;   genMove_o size 0 result type 2 source type 1 hl_dead 1
      ;   genMove_o size 1 result type 2 source type 1 hl_dead 1
          ld  d, #0x00
      ;   genRet
      ;   genMove_o size 2 result type 2 source type 2 hl_dead 1
      ;   genLabel
      ; common peephole 158 removed unused label 00108$.
      ;check.c:9: }
      ;   genEndFunction
      ;   adjustStack by 1
          inc sp
          pop ix
          pop hl
      ;   adjustStack by 2
          pop af
          jp  (hl)
      
       
  • Janko Stamenović

    Just to report that, in spite of my hopes that #14808 maybe solved this one too, the code:

    void printchar( unsigned char c );
    
    void printnibble( unsigned char x ) {
        x = x & 0xf;
        if ( x > 10 )
            x += 'a' - 10;
        else
            x += '0';
        printchar( x );
    }
    

    compiled using Version 4.4.1 #14808 (Linux) still produces the use of the register C "for no reason".:

    _printnibble::
        and a, #0x0f
        ld  c, a
        cp  a, #0x0b
        jr  C, 00102$
        ld  a, c
        add a, #0x57
        ld  c, a
        jp  00103$
    00102$:
        ld  a, c
        add a, #0x30
        ld  c, a
    00103$:
        ld  a, c
        jp  _printchar
    

    The verbose output follows where I understand now that the register allocator decided that C is needed to do that addition of value #'a'-10 (see ;ic:25:) which was then generated as ld a, c add a, #0x57. I'll try to understand more for myself in the next few days, now that I've learned something related to the fixes for #14808.

    ;   ---------------------------------
    ; Function printnibble
    ; ---------------------------------
    ;   Register assignment is optimal.
    ; Stack space usage: 0 bytes.
    _printnibble::
    ;ic:2: iTemp12 [k17 lr3:4 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{unsigned-char fixed}[a ] = recv _printnibble [k1 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{void function ( unsigned-char auto) fixed}
    ;   genReceive
    ;   genMove_o size 1 result type 2 source type 2 hl_dead 1
    ;check.c:4: x = x & 0xf;
    ;ic:4:  iTemp0 [k2 lr4:6 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}{ sir@ _printnibble_x_10000_2}[a ] = iTemp12 [k17 lr3:4 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{unsigned-char fixed}[a ] & 0xf {unsigned-char literal}
    ;   genAnd
    ;   cheapMove
        and a, #0x0f
    ;   genMove_o size 1 result type 2 source type 2 hl_dead 1
    ;check.c:6: x += 'a' - 10;
    ;ic:25:     iTemp11 [k16 lr5:11 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{signed-char fixed}[c ] = (signed-char register)iTemp0 [k2 lr4:6 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}{ sir@ _printnibble_x_10000_2}[a ]
    ;   genCast
    ;   genMove_o size 1 result type 2 source type 2 hl_dead 1
    ;   cheapMove
        ld  c, a
    ;   genMove_o size 0 result type 2 source type 1 hl_dead 1
    ;check.c:5: if ( x > 10 )
    ;ic:7:  iTemp4 [k7 lr6:7 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{unsigned-char fixed} = iTemp0 [k2 lr4:6 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}{ sir@ _printnibble_x_10000_2}[a ] > 0xa {unsigned-char literal}
    ;   genCmpGt
    ; common peephole 6c removed dead cp a, #0x0a
        cp  a, #0x0b
    ; common peephole 163 changed absolute to relative conditional jump.
        jr  C, 00102$
    ;ic:8:  if iTemp4 [k7 lr6:7 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{unsigned-char fixed} == 0 goto __iffalse_0($2)
    ;   skipping generated iCode
    ;check.c:6: x += 'a' - 10;
    ;ic:12:     iTemp13 [k18 lr8:13 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{unsigned-char fixed}[c ] = iTemp11 [k16 lr5:11 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{signed-char fixed}[c ] + 0x57 {signed-char literal}
    ;   genPlus
    ;   cheapMove
        ld  a, c
        add a, #0x57
    ;   cheapMove
        ld  c, a
    ;ic:14:      goto __ifend_0($3)
    ;   genGoto
        jp  00103$
    ;ic:15:  __iffalse_0($2) :
    ;   genLabel
    00102$:
    ;check.c:8: x += '0';
    ;ic:17:     iTemp13 [k18 lr8:13 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{unsigned-char fixed}[c ] = iTemp11 [k16 lr5:11 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{signed-char fixed}[c ] + 0x30 {signed-char literal}
    ;   genPlus
    ;   cheapMove
        ld  a, c
        add a, #0x30
    ;   cheapMove
        ld  c, a
    ;ic:19:  __ifend_0($3) :
    ;   genLabel
    00103$:
    ;check.c:9: printchar( x );
    ;ic:20:     send iTemp13 [k18 lr8:13 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{unsigned-char fixed}[c ]{argreg = 1}
    ;   genSend
    ;   genMove_o size 1 result type 2 source type 2 hl_dead 1
    ;   cheapMove
        ld  a, c
    ;   genMove_o size 0 result type 2 source type 1 hl_dead 1
    ;ic:21:     iTemp9 [k14 lr14:14 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{void fixed} = call _printchar [k12 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{void function ( unsigned-char fixed) fixed}
    ;   genCall
    ;ic:22:  _return($4) :
    ;   genLabel
    ; common peephole 158 removed unused label 00104$.
    ;check.c:10: }
    ;ic:23:     eproc _printnibble [k1 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{void function ( unsigned-char auto) fixed}
    ;   genEndFunction
        jp  _printchar
    ; common peephole 152 removed unused ret.
    
     

    Last edit: Janko Stamenović 2024-04-18
    • Philipp Klaus Krause

      iTemp0 is in a, conflicts with iTemp 13, so iTemp 13 can't be in a.

      In this case, we'd get better code if iTemp0 was just replaced by iTemp13 in the comparison, but that isn't true in general: imagine a situation where iTemp13 should be spilt for whatever reason (e.g. function calls in the branches of the if). Then being able to use iTemp0, and thus the value still in a in the comparison results in better code.

      Maybe it would be possible to tell the register allocator that iTemp0 and iTemp13 are actually the same, and can thus both reside in the same register?

       
      • Janko Stamenović

        I think now we come to these subjects which I touched speaking about what's live where and when? The way I would understand all this: in each line where in the source "x" gets a new value (as a lval) there indeed must be a new iTemp imagined, but the same assignment to lval makes the previous iTemp "dead"?

        How I'd understand it:

        void printnibble( unsigned char x ) {  // let's call it T0
            x = x & 0xf;       // T1 := T0 & 0xf, T0 becomes "dead"
            if ( x > 10 )     
                x += 'a' - 10;   //  T2 := T1 + c1, T1 dead from here down
            else
                x += '0';        //  T3 := T1 + c2, T1 dead from here down
        
            // here is T1 dead, considering the traversal of all 
            // paths that arrive here and printchar gets either T2 OR T3 here
            // so nothing else to be preserved but T2 and T3 ?
        
            printchar( x );   
        }
        
         
        • Philipp Klaus Krause

          For the register allocator used by most ports, including z80: use --dump-graphs, the file named *.dumpralloccfg* is a control-flow graph at the time of register allocation annotated with live-ranges.

          Each node is labeled like: "x, y: a, b, …". The first number is just a numbering of the nodes, the second one is the number of the corresponding iCode (e.g. what you see in --i-code-in-asm output as "ic:23" would have a node label. "x, 23: a, b, …". The numbers after the colon are the variable bytes alive there (the numbers correspond to the numbers of the nodes in the conflict graph).

           
          • Janko Stamenović

            I'm trying to think at the level before the actual generation: If I write the C code like this, it is obvious that x is not used when both if branches come back together:

            void printchar( unsigned char c );
            
            void printnibble( unsigned char x ) {
                x = x & 0xf;
                unsigned char y;
                if ( x > 10 )
                    y = x + 'a' - 10;
                else
                    y = x + '0';
                printchar( y );
            }
            

            but I still get for that input:

            ;   ---------------------------------
            ; Function printnibble
            ; ---------------------------------
            ;   Register assignment is optimal.
            ; Stack space usage: 0 bytes.
            _printnibble::
            ;ic:2: iTemp13 [k19 lr3:4 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{unsigned-char fixed}[a ] = recv _printnibble [k1 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{void function ( unsigned-char auto) fixed}
            ;   genReceive
            ;   genMove_o size 1 result type 2 source type 2 hl_dead 1
            ;check.c:4: x = x & 0xf;
            ;ic:4:  iTemp0 [k2 lr4:6 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}{ sir@ _printnibble_x_10000_2}[a ] = iTemp13 [k19 lr3:4 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{unsigned-char fixed}[a ] & 0xf {unsigned-char literal}
            ;   genAnd
            ;   cheapMove
                and a, #0x0f
            ;   genMove_o size 1 result type 2 source type 2 hl_dead 1
            ;check.c:7: y = x + 'a' - 10;
            ;ic:25:     iTemp12 [k18 lr5:11 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{signed-char fixed}[c ] = (signed-char register)iTemp0 [k2 lr4:6 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}{ sir@ _printnibble_x_10000_2}[a ]
            ;   genCast
            ;   genMove_o size 1 result type 2 source type 2 hl_dead 1
            ;   cheapMove
                ld  c, a
            ;   genMove_o size 0 result type 2 source type 1 hl_dead 1
            ;check.c:6: if ( x > 10 )
            ;ic:7:  iTemp4 [k7 lr6:7 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{unsigned-char fixed} = iTemp0 [k2 lr4:6 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}{ sir@ _printnibble_x_10000_2}[a ] > 0xa {unsigned-char literal}
            ;   genCmpGt
            ; common peephole 6c removed dead cp a, #0x0a
                cp  a, #0x0b
            ; common peephole 163 changed absolute to relative conditional jump.
                jr  C, 00102$
            ;ic:8:  if iTemp4 [k7 lr6:7 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{unsigned-char fixed} == 0 goto __iffalse_0($2)
            ;   skipping generated iCode
            ;check.c:7: y = x + 'a' - 10;
            ;ic:12:     iTemp5 [k8 lr8:13 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}[c ] = iTemp12 [k18 lr5:11 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{signed-char fixed}[c ] + 0x57 {signed-char literal}
            ;   genPlus
            ;   cheapMove
                ld  a, c
                add a, #0x57
            ;   cheapMove
                ld  c, a
            ;ic:14:      goto __ifend_0($3)
            ;   genGoto
                jp  00103$
            ;ic:15:  __iffalse_0($2) :
            ;   genLabel
            00102$:
            ;check.c:9: y = x + '0';
            ;ic:17:     iTemp5 [k8 lr8:13 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}[c ] = iTemp12 [k18 lr5:11 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{signed-char fixed}[c ] + 0x30 {signed-char literal}
            ;   genPlus
            ;   cheapMove
                ld  a, c
                add a, #0x30
            ;   cheapMove
                ld  c, a
            ;ic:19:  __ifend_0($3) :
            ;   genLabel
            00103$:
            ;check.c:10: printchar( y );
            ;ic:20:     send iTemp5 [k8 lr8:13 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}[c ]{argreg = 1}
            ;   genSend
            ;   genMove_o size 1 result type 2 source type 2 hl_dead 1
            ;   cheapMove
                ld  a, c
            ;   genMove_o size 0 result type 2 source type 1 hl_dead 1
            ;ic:21:     iTemp10 [k16 lr14:14 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{void fixed} = call _printchar [k14 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{void function ( unsigned-char fixed) fixed}
            ;   genCall
            ;ic:22:  _return($4) :
            ;   genLabel
            ; common peephole 158 removed unused label 00104$.
            ;check.c:11: }
            ;ic:23:     eproc _printnibble [k1 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{void function ( unsigned-char auto) fixed}
            ;   genEndFunction
                jp  _printchar
            
             
            • Janko Stamenović

              And to improve the naming of the "temporary" variables, for this input the C register still preserves the value of x1 as soon as that one is created:

              void printchar( unsigned char c );
              
              void printnibble( unsigned char x ) {
                  unsigned char x1;
                  x1 = x & 0xf;           // x not used afterwards
                  unsigned char x2;
                  if ( x1 > 10 )
                      x2 = x1 + 'a' - 10;   // x1 not used afterwards in this br.
                  else
                      x2 = x1 + '0';        // x1 not used afterwards in this br.
                  printchar( x2 );       // only x2 used here
              }
              

              An actual output, with my comments added to it:

              _printnibble::
                  and a, #0x0f
                  ld  c, a           ; C := x1
                  cp  a, #0x0b
                  jr  C, 00102$
                  ld  a, c           ; A both before and after has x1
                  add a, #0x57
                  ld  c, a           ; C := x2
                  jp  00103$
              00102$:
                  ld  a, c           ; A both before and after has  x1
                  add a, #0x30
                  ld  c, a           ; C := x2
              00103$:
                  ld  a, c           ; A := x2
                  jp  _printchar
              

              There's clearly both analysis of what can be reused and the reuse of the same register for different variables (compared to the names in the starting source), but still creating some unnecessary use?

               

              Last edit: Janko Stamenović 2024-04-19
            • Philipp Klaus Krause

              Yes, sdcc should have used a for y here. Now, in [r14810], it does.

               
              👍
              1

              Related

              Commit: [r14810]

              • Janko Stamenović

                Thanks! I learn so much, especially as I will be able to learn from the changes.

                Now with #14810 the C register is still used to preserve x1 which is in A already:

                void printchar( unsigned char c );
                
                void printnibble( unsigned char x ) {
                    unsigned char x1;
                    x1 = x & 0xf;
                    unsigned char x2;
                    if ( x1 > 10 )
                        x2 = x1 + 'a' - 10;
                    else
                        x2 = x1 + '0';
                    printchar( x2 );
                }
                

                the output, with my comments added:

                ;--------------------------------------------------------
                ; File Created by SDCC : free open source ISO C Compiler 
                ; Version 4.4.1 #14810 (Linux)
                ;--------------------------------------------------------
                
                ...
                
                _printnibble::
                    and a, #0x0f     ; A := x1
                    ld  c, a         ; C := x1  ; unnecessary
                    cp  a, #0x0b
                    jr  C, 00102$
                    ld  a, c         ; in A is x1, but A := x1 from C ; unnecessary
                    add a, #0x57     ; A := x1 + const1
                    jp  _printchar
                00102$:
                    ld  a, c         ; in A is x1, but A := x1 from C ; unnecessary
                    add a, #0x30     ; A := x1 + const2
                    jp  _printchar
                
                 

                Last edit: Janko Stamenović 2024-04-19
                • Janko Stamenović

                  So now with #14810 the C register is used foriTemp13, but why is iTemp13 there at all?

                  The i-Code seen in the verbose output for

                  void printnibble( unsigned char x ) {
                      unsigned char x1;
                      x1 = x & 0xf;
                      unsigned char x2;
                      if ( x1 > 10 )
                          x2 = x1 + 'a' - 10;
                      else
                          x2 = x1 + '0';
                      printchar( x2 );
                  }
                  

                  introduces iTemp13 to keep the copy of the iTemp1 which has the value of x1, but I don't understand why:

                  Here iTemp1 (which is == x1) (I'll keep them not marked as code intentionally so that they could wrap):

                  ;ic:4: iTemp1 [k4 lr4:6 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}[a ] = iTemp0 [k2 lr3:4 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}{ sir@ _printnibble_x_10000_2}[a ] & 0xf {unsigned-char literal}

                  Here iTemp13

                  ;ic:25: iTemp13 [k20 lr5:11 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{signed-char fixed}[c ] = (signed-char register)iTemp1 [k4 lr4:6 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}[a ]

                  Then this iTemp13 is used in both if branches to do the add-constant. And both branches then assign to the same iTemp6.

                  ;ic:12: iTemp6 [k10 lr8:13 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}[a ] = iTemp13 [k20 lr5:11 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{signed-char fixed}[c ] + 0x57 {signed-char literal}

                  ;ic:17: iTemp6 [k10 lr8:13 so:0]{ ia0 a2p0 re1 rm0 nos0 ru0 dp0}{unsigned-char fixed}[a ] = iTemp13 [k20 lr5:11 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{signed-char fixed}[c ] + 0x30 {signed-char literal}

                  And the ralloccon graph does say that iTemp13 can't be the same register as iTemp1 (edge between nodes 1 and 2, where 1 is iTemo1, and 2 is iTemp13)

                  graph G {
                  0[label="0 : iTemp0:0"];
                  1[label="1 : iTemp1:0"];
                  2[label="2 : iTemp13:0"];
                  3[label="3 : iTemp6:0"];
                  1--2 ;
                  }
                  

                  But: WHY? Why should iTemp13 exist at all? I can't understand where it comes from.

                  I understand that given that input register allocator decides that it has to use C for iTemp13, and that it can't use A.

                  But what and why introduced that iTemp13, which is, by the way {signed-char fixed} and used to add {signed-char literal} in both branches of the if, maybe that is a clue?

                   

                  Last edit: Janko Stamenović 2024-04-19
                  • Janko Stamenović

                    And the answer was indeed "the signs": if I rewrite the input to

                    void printchar( unsigned char c );
                    
                    void printnibble( unsigned char x ) {
                        unsigned char x1;
                        x1 = x & 0xf;
                        unsigned char x2;
                        if ( x1 > 10 )
                            x2 = x1 + (unsigned char)('a' - 10);
                        else
                            x2 = x1 + (unsigned char)('0');
                        printchar( x2 );
                    }
                    

                    Then I do get

                    _printnibble::
                        and a, #0x0f
                        cp  a, #0x0b
                        jr  C, 00102$
                        add a, #0x57
                        jp  _printchar
                    00102$:
                        add a, #0x30
                        jp  _printchar
                    

                    So, finally, starting from the initial example and with #14810 adding one cast is needed to remove the C register use:

                    void printnibble( unsigned char x ) {
                        x &= 0xf;
                        if ( x > 10 )
                            x += (unsigned char)('a' - 10); // currently needed to avoid the C register!!!
                        else
                            x += '0';
                        printchar( x );
                    }
                    

                    removing that cast will introduce the use of register C

                     
          • Janko Stamenović

            (deleted #14808 dump as now there's #14810)

             

            Last edit: Janko Stamenović 2024-04-19

Log in to post a comment.