This code is translated very poorly
u8 dy = *(MyObj->dy+jj);
y = MyObj->y + dy;
The first line becomes this, where BC is holding the value of MyObj (pointer to a struct)
;C:\Users\PC\OneDrive\Documenti\MSXgl-1.0.0\projects\template_msx2/template_s255_b1.c:145: u8 dy = *(MyObj->dy+jj);
ld e, c
ld d, b
ld hl, #10
add hl, de
ld e, (hl)
inc hl
ld d, (hl)
ld a, e
ld iy, #_jj
add a, 0 (iy)
ld e, a
jr NC, 00341$
inc d
00341$:
ld a, (de)
ld e, a
but using all the z80 instruction this could have been coded as:
ld hl, #10
add hl, bc
ld e, (hl)
inc hl
ld d, (hl) // now DE = MyObj->dy
ld hl,(#_jj)
ld h,0
add hl,de // now HL = MyObj->dy+jj
ld e,(hl) // now E = dy
The compiler neglects the use of ADD HL,BC
Moreover it resorts to IY and to 8 bit ADD, when using a 16 bit ADD would have avoided IY at all
The second line of C is coded as follow ( BC is holding the value of MyOb)
;C:\Users\PC\OneDrive\Documenti\MSXgl-1.0.0\projects\template_msx2/template_s255_b1.c:146: y = MyObj->y + dy;
ld l, c
; spillPairReg hl
; spillPairReg hl
ld h, b
; spillPairReg hl
; spillPairReg hl
inc hl
inc hl
inc hl
ld a, (hl)
inc hl
ld h, (hl)
; spillPairReg hl
ld l, a
; spillPairReg hl
; spillPairReg hl
ld d, #0x00
add hl, de
ld (_y), hl
but again it could have shortened as:
ld hl,3
add hl,bc
ld a,(hl)
inc hl
ld h,(hl)
ld l,a
ld d, #0x00
add hl, de
ld (_y), hl
It seems that the compiler does not consider that 16bit ADD HL,BC is faster than 3 INC HL
In the first "could have been coded" there's a 16-bit access of a 8-bit variable (fetching the byte behind the jj in memory)
I don't think that, in general, such accesses should be acceptable for SDCC, unless explicitly allowed, and there are probably more "low hanging fruits" for improvements. But it appears to me that the same solution with:
is still better than using IY there.
Ragozini Arturo, to allow easier reproduction, it would be very helpful if you would take some time to reduce your examples to the pieces of self-contained code, which could be easily copy-pasted and compiled independently without your own external headers or any other files (including providing the exact command line parameters of SDCC that produce your code). I've seen you've posted a few .c files in other posts, but some definitions (which affect the sizes of the structures used and their content) are still in headers known only to you. I think this could be an example of a self-sufficient and more minimal code demonstrating a problem:
https://sourceforge.net/p/sdcc/bugs/3727/?limit=25#8b7e/fdb0
Also using the value of a pointer to a field to reach another field of the same structure by adding the offset between them should be easy to implement...
The saving is that the offset between fields in a struct is a constant, while the base address of the struct is usually more difficult to be computed and you do not need to do that twice
Last edit: Ragozini Arturo 2024-04-17
In [r14816], I improved the use of Z80 16-bit loads for 8-bit registers. I think we are now making good use of those (assuming --allow-unsafe-read is used). I believe there is some further potential in better using Rabbit 16-bit loads and Z80 16-bit dec, inc and add.
Related
Commit: [r14816]
Do we have a not-too-big, self-contained, compileable sample to reproduce this issue?
https://sourceforge.net/p/sdcc/feature-requests/909/?page=1&limit=25#300a/42f5/73c1
for the first:
so it seems "neglects the use of ADD HL,BC Moreover it resorts to IY and to 8 bit ADD, when using a 16 bit ADD would have avoided IY at all" aren't current anymore?
But the second is:
which is much longer than the suggested, and the pushes / pops appear to me that it's about the compiler's attempt to reuse something earlier, so the result needs the previous.
but also note that you used "defaults" in your compilation.
with 200000 I for the same file, same version of compiler it's without push pop, but the second line still clearly long:
Edit: and this shorter C (attached) produces with your latest version
#14815 (Linux)Last edit: Janko Stamenović 2024-04-22
be aware that this code seems broken
it is mixing the low and the high bytes of Y and dy
Looks fine to me: dy (extended to 16 bits) is stored in h:c, while y is in b:l. I.e. the sum we want is (256 * h + c) + (256 * b + l) which by commutativity of addition is (256 * h + l) + (256 * b + c), and thus add hl, bc gives the correct result.
I see, you are right, a bit perverted but correct and it is using only HL and BC, without resorting to DE (the drawback is that you cannot reuse BC later -just in case, not here, because C=dy and B= y/256)
Last edit: Ragozini Arturo 2024-04-23