Small Device C Compiler (SDCC) / Feature Requests / #907 Inefficient z80 code (16 bit instructions neglected)

Janko Stamenović - 2024-04-14

In the first "could have been coded" there's a 16-bit access of a 8-bit variable (fetching the byte behind the jj in memory)

ld hl,(#_jj)

I don't think that, in general, such accesses should be acceptable for SDCC, unless explicitly allowed, and there are probably more "low hanging fruits" for improvements. But it appears to me that the same solution with:

ld a,(#_jj) ld l,a

is still better than using IY there.

Ragozini Arturo, to allow easier reproduction, it would be very helpful if you would take some time to reduce your examples to the pieces of self-contained code, which could be easily copy-pasted and compiled independently without your own external headers or any other files (including providing the exact command line parameters of SDCC that produce your code). I've seen you've posted a few .c files in other posts, but some definitions (which affect the sizes of the structures used and their content) are still in headers known only to you. I think this could be an example of a self-sufficient and more minimal code demonstrating a problem:

https://sourceforge.net/p/sdcc/bugs/3727/?limit=25#8b7e/fdb0
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ragozini Arturo - 2024-04-17

Also using the value of a pointer to a field to reach another field of the same structure by adding the offset between them should be easy to implement...
The saving is that the offset between fields in a struct is a constant, while the base address of the struct is usually more difficult to be computed and you do not need to do that twice

Last edit: Ragozini Arturo 2024-04-17

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2024-04-22

In [r14816], I improved the use of Z80 16-bit loads for 8-bit registers. I think we are now making good use of those (assuming --allow-unsafe-read is used). I believe there is some further potential in better using Rabbit 16-bit loads and Z80 16-bit dec, inc and add.

Related

Commit: [r14816]

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Do we have a not-too-big, self-contained, compileable sample to reproduce this issue?

https://sourceforge.net/p/sdcc/feature-requests/909/?page=1&limit=25#300a/42f5/73c1

for the first:

;check.c:104: i8 dy = (i8) MyObj->dy[jj];
    pop bc
    push    bc
    ld  hl, #10
    add hl, bc
    ld  c, (hl)
    inc hl
    ld  b, (hl)
    ld  a, (_jj)
    ld  l, a
    ld  h, #0x00
    add hl, bc
    ld  c, (hl)

so it seems "neglects the use of ADD HL,BC Moreover it resorts to IY and to 8 bit ADD, when using a 16 bit ADD would have avoided IY at all" aren't current anymore?

But the second is:

;check.c:105: y = MyObj->y + dy;
    pop hl
    push    hl
    inc hl
    inc hl
    inc hl
    ld  a, (hl)
    inc hl
    ld  h, (hl)
    ld  l, a
    ld  a, c
    rlca
    sbc a, a
    ld  b, a
    add hl, bc
    ld  (_y), hl

which is much longer than the suggested, and the pushes / pops appear to me that it's about the compiler's attempt to reuse something earlier, so the result needs the previous.

        ld hl,3
        add hl,bc
        ld a,(hl)
        inc hl
        ld h,(hl)
        ld l,a
        ld  d, #0x00
        add hl, de
        ld  (_y), hl

but also note that you used "defaults" in your compilation.

with 200000 I for the same file, same version of compiler it's without push pop, but the second line still clearly long:

;check.c:104: i8 dy = (i8) MyObj->dy[jj];
    ld  e, c
    ld  d, b
    ld  hl, #10
    add hl, de
    ld  e, (hl)
    inc hl
    ld  d, (hl)
    ld  a, (_jj)
    ld  l, a
    ld  h, #0x00
    add hl, de
    ld  e, (hl)
;check.c:105: y = MyObj->y + dy;
    ld  l, c
    ld  h, b
    inc hl
    inc hl
    inc hl
    ld  a, (hl)
    inc hl
    ld  h, (hl)
    ld  l, a
    ld  a, e
    rlca
    sbc a, a
    ld  d, a
    add hl, de
    ld  (_y), hl

Edit: and this shorter C (attached) produces with your latest version #14815 (Linux)

;check.c:92: i8 dy = (i8) MyObj->dy[jj];
    pop hl
    push    hl
    ld  de, #0x000a
    add hl, de
    ld  c, (hl)
    inc hl
    ld  b, (hl)
    ld  a, (_jj)
    ld  l, a
    ld  h, #0x00
    add hl, bc
    ld  c, (hl)
;check.c:93: y = MyObj->y + dy;
    ld  l, -10 (ix)
    ld  h, -9 (ix)
    inc hl
    inc hl
    inc hl
    ld  a, (hl)
    inc hl
    ld  b, (hl)
    ld  l, a
    ld  a, c
    rlca
    sbc a, a
    ld  h, a
    add hl, bc
    ld  (_y), hl

Last edit: Janko Stamenović 2024-04-22

check.c

Philipp Klaus Krause - 2024-04-22

summary: Bad z80 code (16 bit instructions neglected) --> Inefficient z80 code (16 bit instructions neglected)

Group: -->
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ragozini Arturo - 2024-04-23

be aware that this code seems broken

;check.c:93: y = MyObj->y + dy; ld l, -10 (ix) ld h, -9 (ix) inc hl inc hl inc hl ld a, (hl) inc hl ld b, (hl) ld l, a ld a, c rlca sbc a, a ld h, a add hl, bc ld (_y), hl

it is mixing the low and the high bytes of Y and dy
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Philipp Klaus Krause - 2024-04-23
  
  Looks fine to me: dy (extended to 16 bits) is stored in h:c, while y is in b:l. I.e. the sum we want is (256 * h + c) + (256 * b + l) which by commutativity of addition is (256 * h + l) + (256 * b + c), and thus add hl, bc gives the correct result.
  
  👍
  1
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ragozini Arturo - 2024-04-23

I see, you are right, a bit perverted but correct and it is using only HL and BC, without resorting to DE (the drawback is that you cannot reuse BC later -just in case, not here, because C=dy and B= y/256)

Last edit: Ragozini Arturo 2024-04-23

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Inefficient z80 code (16 bit instructions neglected)

The Small Device C Compiler (SDCC), targeting 8-bit architectures

Group

Searches

Help

#907 Inefficient z80 code (16 bit instructions neglected)

Discussion

Related