When I compile this C code:
g_numActors = 0; // g_numActors is a global var.
I obtain in assembler:
ld iy,#_g_numActors ( 4 cycles / 4 bytes)
ld 0(iy),#0x00 ( 5 cycles / 3 bytes)
Why doesn't it generate this one?
ld (#_g_numActors), #0x00 ( 6 cycles / 4 bytes)
or
ld hl, #_g_numActors ( 2 cycles / 3 bytes)
ld (hl), #0x00 ( 3 cycles / 2 bytes)
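Taking the cycle/byte figures quoted above at face value, the totals compare as follows. This is only a sanity check; the numbers are copied from the report, not re-derived from a Z80 manual. (Note also that, as far as I know, plain Z80 has no immediate direct-store "ld (nn),n" form, so only the iy and hl variants are directly encodable.)

```c
/* Cost totals for the two encodable sequences, using the
 * cycle/byte figures exactly as quoted in the report. */

/* current sdcc output: ld iy,#_g_numActors ; ld 0(iy),#0x00 */
enum { IY_CYCLES = 4 + 5, IY_BYTES = 4 + 3 };

/* proposed:            ld hl,#_g_numActors ; ld (hl),#0x00  */
enum { HL_CYCLES = 2 + 3, HL_BYTES = 3 + 2 };
```

By the quoted numbers, the hl sequence saves 4 cycles and 2 bytes per 8-bit store.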
Thank you.
Logged In: YES
user_id=564030
Originator: NO
The current code generation handles all sizes of g_numActors. If g_numActors were an unsigned int or unsigned long int, going through iy would be faster, since the other bytes can be fetched using ld 1(iy), etc.
However, I see that for 8-bit operands it is not optimal. Other operations like g_numActors &= 0x07 or g_numActors += 0x07 suffer from the same inefficiency. It's even worse if the result is not stored in the same variable.
Philipp
P.S. Here are three nice test cases:
unsigned char c;
unsigned long d;
void test(void)
{
c &= 0x07;
}
void test2(void)
{
d &= 0x07070707u;
}
unsigned char test3(void)
{
return(c & 0x07);
}
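For what it's worth, the intended semantics of these test cases can be checked with any host C compiler, independently of the generated Z80 code; a self-contained restatement (the initial values used for checking are arbitrary choices, not part of the report):

```c
/* Host-side copy of the three test cases, so their expected
 * results can be verified with an ordinary C compiler. */
unsigned char c;
unsigned long d;

void test(void)  { c &= 0x07; }
void test2(void) { d &= 0x07070707u; }
unsigned char test3(void) { return c & 0x07; }
```

For example, with c == 0xff, test() must leave c == 0x07; with d == 0xffffffffUL, test2() must leave d == 0x07070707; and test3() on c == 0xab must return 0x03.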
Here's an example patch that implements the improvement for "and" operations where the destination is not the same as the left operand:
Index: src/z80/gen.c
--- src/z80/gen.c (Revision 5000)
+++ src/z80/gen.c (working copy)
@@ -5285,6 +5285,16 @@
aopPut (AOP (result), "!zero", offset);
continue;
}
+ // Don't go through iy. Saves a few bytes if size <= 3, same code size for size == 4.
+ else if(AOP_TYPE (left) == AOP_IY)
+ {
+ emit2 ("ld a,!hashedstr",
+ aopGetLitWordLong (AOP(left), offset, FALSE));
+ emit2 ("and a,%s",
+ aopGet (AOP (right), offset, FALSE));
+ aopPut (AOP (result), "a", offset);
+ continue;
+ }
}
// faster than result <- left, anl result,right
// and better if result is SFR
However, excessive use of iy is a ubiquitous problem in the Z80 port. I see two possible solutions:
1) Changes similar to the patch I posted here all over gen.c
2) More information for the peephole optimizer: a new function notUsed() would be very useful, both for optimizing uses of iy away and in lots of other cases. notUsed() would return true exactly if its argument is written to after the code in question, before the next label (or if it is not a register used for returning values and is not read before the next unconditional ret).
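A minimal sketch of that query under the stated semantics. The struct line representation and helper names here are invented for illustration; the real function would walk sdcc's peephole line list:

```c
#include <stdbool.h>
#include <string.h>

/* One line of asm, reduced to what notUsed() needs to know about it. */
struct line {
    const char *reads;   /* comma-separated registers read    */
    const char *writes;  /* comma-separated registers written  */
    bool is_label;       /* basic-block boundary: end the scan */
    bool is_ret;         /* unconditional ret                  */
};

static bool in_set(const char *set, const char *reg)
{
    /* crude containment check; good enough for a sketch */
    return set && strstr(set, reg) != NULL;
}

/* notUsed(reg): scan forward from the code in question; true iff reg
 * is overwritten before it is read. A label ends the scan pessimistically
 * (we can't see past it); a ret ends it too, where the value is dead
 * unless reg carries the return value. */
static bool notUsed(const char *reg, const struct line *s, int n,
                    bool is_return_reg)
{
    for (int i = 0; i < n; i++) {
        if (s[i].is_label)
            return false;                 /* unknown past a label: be safe */
        if (in_set(s[i].reads, reg))
            return false;                 /* reg is consumed: it IS used */
        if (in_set(s[i].writes, reg))
            return true;                  /* overwritten before any read */
        if (s[i].is_ret)
            return !is_return_reg;
    }
    return false;                         /* out of window: be safe */
}
```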
There could be a peephole rule to optimize
ld iy,#_c
ld a,0(iy)
to
ld a,(#_c)
using
replace restart {
ld iy,#%1
ld a,%2(iy)
} by {
ld a, (#%1 + %2)
} if notUsed(iy)
that is, only if the store to iy is dead.
In a similar way, sdcc often generates code like this:
ld e, a
ld a, e
add a, #0x42
ld e, a
Currently the peephole optimizer can optimize the "ld a, e" away (it checks that e is not volatile and removes the load). The new function would allow the first "ld e, a" to be optimized away too, by a generic rule:
replace restart {
ld %1, %2
} by {
} if notVolatile(%1 %2), notUsed(%1)
This would simplify the peephole rules, since there are currently many rules that do the same thing for special cases like the one above.
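To illustrate, here is a toy model of the two rules applied to the four-line sequence above. This is not sdcc's actual peephole engine; the instruction encoding and helper names are made up, and only "ld dst, src"-style operations are modelled:

```c
#include <stdbool.h>
#include <string.h>

struct ins {
    const char *dst;  /* register written                  */
    const char *src;  /* operand(s) read, e.g. "a,#0x42"   */
    bool is_load;     /* plain "ld dst, src"               */
};

/* Toy notUsed(): true iff dst of s[i] is overwritten before being read
 * within the window; conservative (false) past the end. */
static bool not_used(const struct ins *s, int n, int i)
{
    const char *reg = s[i].dst;
    for (int j = i + 1; j < n; j++) {
        if (strstr(s[j].src, reg)) return false;     /* read first: used */
        if (strcmp(s[j].dst, reg) == 0) return true; /* overwritten */
    }
    return false;
}

/* Apply the existing reload rule, then the proposed generic dead-store
 * rule; returns the new instruction count. */
static int peephole(struct ins *s, int n)
{
    /* existing rule: "ld x,y / ld y,x" -> drop the reload */
    for (int i = 0; i + 1 < n; i++)
        if (s[i].is_load && s[i + 1].is_load &&
            strcmp(s[i].dst, s[i + 1].src) == 0 &&
            strcmp(s[i].src, s[i + 1].dst) == 0) {
            memmove(&s[i + 1], &s[i + 2], (n - i - 2) * sizeof *s);
            n--;
        }
    /* proposed generic rule: drop "ld %1,%2" if notUsed(%1) */
    for (int i = 0; i < n; )
        if (s[i].is_load && not_used(s, n, i)) {
            memmove(&s[i], &s[i + 1], (n - i - 1) * sizeof *s);
            n--;
        } else {
            i++;
        }
    return n;
}
```

Run on { ld e,a ; ld a,e ; add a,#0x42 ; ld e,a }, the reload rule first removes "ld a,e", after which the generic rule can remove the now-dead first "ld e,a", leaving just the add and the final store.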
Philipp
Implemented in revision #5198.