Menu

#907 Inefficient z80 code (16 bit instructions neglected)

None
open
nobody
None
5
2024-05-01
2024-04-13
No

This code is translated very poorly

                    u8 dy = *(MyObj->dy+jj);
                    y = MyObj->y + dy;

The first line becomes this, where BC is holding the value of MyObj (pointer to a struct)

;C:\Users\PC\OneDrive\Documenti\MSXgl-1.0.0\projects\template_msx2/template_s255_b1.c:145: u8 dy = *(MyObj->dy+jj);
    ld  e, c
    ld  d, b
    ld  hl, #10
    add hl, de
    ld  e, (hl)
    inc hl
    ld  d, (hl)
    ld  a, e
    ld  iy, #_jj
    add a, 0 (iy)
    ld  e, a
    jr  NC, 00341$
    inc d
00341$:
    ld  a, (de)
    ld  e, a

but using all the z80 instruction this could have been coded as:

        ld  hl, #10
        add hl, bc
        ld  e, (hl)
        inc hl
        ld  d, (hl)     // now DE = MyObj->dy
        ld hl,(#_jj)
        ld h,0
        add hl,de       // now HL = MyObj->dy+jj
        ld e,(hl)       // now E = dy 

The compiler neglects the use of ADD HL,BC
Moreover it resorts to IY and to 8 bit ADD, when using a 16 bit ADD would have avoided IY at all

The second line of C is coded as follow ( BC is holding the value of MyOb)

;C:\Users\PC\OneDrive\Documenti\MSXgl-1.0.0\projects\template_msx2/template_s255_b1.c:146: y = MyObj->y + dy;
    ld  l, c
;   spillPairReg hl
;   spillPairReg hl
    ld  h, b
;   spillPairReg hl
;   spillPairReg hl
    inc hl
    inc hl
    inc hl
    ld  a, (hl)
    inc hl
    ld  h, (hl)
;   spillPairReg hl
    ld  l, a
;   spillPairReg hl
;   spillPairReg hl
    ld  d, #0x00
    add hl, de
    ld  (_y), hl

but again it could have shortened as:

        ld hl,3
        add hl,bc
        ld a,(hl)
        inc hl
        ld h,(hl)
        ld l,a
        ld  d, #0x00
        add hl, de
        ld  (_y), hl

It seems that the compiler does not consider that 16bit ADD HL,BC is faster than 3 INC HL

Discussion

  • Janko Stamenović

    In the first "could have been coded" there's a 16-bit access of a 8-bit variable (fetching the byte behind the jj in memory)

    ld hl,(#_jj)
    

    I don't think that, in general, such accesses should be acceptable for SDCC, unless explicitly allowed, and there are probably more "low hanging fruits" for improvements. But it appears to me that the same solution with:

    ld a,(#_jj)
    ld l,a  
    

    is still better than using IY there.

    Ragozini Arturo, to allow easier reproduction, it would be very helpful if you would take some time to reduce your examples to the pieces of self-contained code, which could be easily copy-pasted and compiled independently without your own external headers or any other files (including providing the exact command line parameters of SDCC that produce your code). I've seen you've posted a few .c files in other posts, but some definitions (which affect the sizes of the structures used and their content) are still in headers known only to you. I think this could be an example of a self-sufficient and more minimal code demonstrating a problem:

    https://sourceforge.net/p/sdcc/bugs/3727/?limit=25#8b7e/fdb0

     
  • Ragozini Arturo

    Ragozini Arturo - 2024-04-17

    Also using the value of a pointer to a field to reach another field of the same structure by adding the offset between them should be easy to implement...
    The saving is that the offset between fields in a struct is a constant, while the base address of the struct is usually more difficult to be computed and you do not need to do that twice

     

    Last edit: Ragozini Arturo 2024-04-17
  • Philipp Klaus Krause

    In [r14816], I improved the use of Z80 16-bit loads for 8-bit registers. I think we are now making good use of those (assuming --allow-unsafe-read is used). I believe there is some further potential in better using Rabbit 16-bit loads and Z80 16-bit dec, inc and add.

     

    Related

    Commit: [r14816]

  • Philipp Klaus Krause

    Do we have a not-too-big, self-contained, compileable sample to reproduce this issue?

     
    • Janko Stamenović

      https://sourceforge.net/p/sdcc/feature-requests/909/?page=1&limit=25#300a/42f5/73c1

      for the first:

      ;check.c:104: i8 dy = (i8) MyObj->dy[jj];
          pop bc
          push    bc
          ld  hl, #10
          add hl, bc
          ld  c, (hl)
          inc hl
          ld  b, (hl)
          ld  a, (_jj)
          ld  l, a
          ld  h, #0x00
          add hl, bc
          ld  c, (hl)
      

      so it seems "neglects the use of ADD HL,BC Moreover it resorts to IY and to 8 bit ADD, when using a 16 bit ADD would have avoided IY at all" aren't current anymore?

      But the second is:

      ;check.c:105: y = MyObj->y + dy;
          pop hl
          push    hl
          inc hl
          inc hl
          inc hl
          ld  a, (hl)
          inc hl
          ld  h, (hl)
          ld  l, a
          ld  a, c
          rlca
          sbc a, a
          ld  b, a
          add hl, bc
          ld  (_y), hl
      

      which is much longer than the suggested, and the pushes / pops appear to me that it's about the compiler's attempt to reuse something earlier, so the result needs the previous.

              ld hl,3
              add hl,bc
              ld a,(hl)
              inc hl
              ld h,(hl)
              ld l,a
              ld  d, #0x00
              add hl, de
              ld  (_y), hl
      

      but also note that you used "defaults" in your compilation.

      with 200000 I for the same file, same version of compiler it's without push pop, but the second line still clearly long:

      ;check.c:104: i8 dy = (i8) MyObj->dy[jj];
          ld  e, c
          ld  d, b
          ld  hl, #10
          add hl, de
          ld  e, (hl)
          inc hl
          ld  d, (hl)
          ld  a, (_jj)
          ld  l, a
          ld  h, #0x00
          add hl, de
          ld  e, (hl)
      ;check.c:105: y = MyObj->y + dy;
          ld  l, c
          ld  h, b
          inc hl
          inc hl
          inc hl
          ld  a, (hl)
          inc hl
          ld  h, (hl)
          ld  l, a
          ld  a, e
          rlca
          sbc a, a
          ld  d, a
          add hl, de
          ld  (_y), hl
      

      Edit: and this shorter C (attached) produces with your latest version #14815 (Linux)

      ;check.c:92: i8 dy = (i8) MyObj->dy[jj];
          pop hl
          push    hl
          ld  de, #0x000a
          add hl, de
          ld  c, (hl)
          inc hl
          ld  b, (hl)
          ld  a, (_jj)
          ld  l, a
          ld  h, #0x00
          add hl, bc
          ld  c, (hl)
      ;check.c:93: y = MyObj->y + dy;
          ld  l, -10 (ix)
          ld  h, -9 (ix)
          inc hl
          inc hl
          inc hl
          ld  a, (hl)
          inc hl
          ld  b, (hl)
          ld  l, a
          ld  a, c
          rlca
          sbc a, a
          ld  h, a
          add hl, bc
          ld  (_y), hl
      
       

      Last edit: Janko Stamenović 2024-04-22
  • Philipp Klaus Krause

    • summary: Bad z80 code (16 bit instructions neglected) --> Inefficient z80 code (16 bit instructions neglected)
    • Group: -->
     
  • Ragozini Arturo

    Ragozini Arturo - 2024-04-23

    be aware that this code seems broken

    ;check.c:93: y = MyObj->y + dy;
        ld  l, -10 (ix)
        ld  h, -9 (ix)
        inc hl
        inc hl
        inc hl
        ld  a, (hl)
        inc hl
        ld  b, (hl)
        ld  l, a
        ld  a, c
        rlca
        sbc a, a
        ld  h, a
        add hl, bc
        ld  (_y), hl
    

    it is mixing the low and the high bytes of Y and dy

     
    • Philipp Klaus Krause

      Looks fine to me: dy (extended to 16 bits) is stored in h:c, while y is in b:l. I.e. the sum we want is (256 * h + c) + (256 * b + l) which by commutativity of addition is (256 * h + l) + (256 * b + c), and thus add hl, bc gives the correct result.

       
      👍
      1
  • Ragozini Arturo

    Ragozini Arturo - 2024-04-23

    I see, you are right, a bit perverted but correct and it is using only HL and BC, without resorting to DE (the drawback is that you cannot reuse BC later -just in case, not here, because C=dy and B= y/256)

     

    Last edit: Ragozini Arturo 2024-04-23

Log in to post a comment.

MongoDB Logo MongoDB