
#229 Optimization for z80

closed
None
5
2008-07-12
2007-12-31
Anonymous
No

When I compile this C code:

g_numActors = 0; // g_numActors is a global var.

I get this assembler output:

ld iy,#_g_numActors ( 4 cycles / 4 bytes)
ld 0(iy),#0x00 ( 5 cycles / 3 bytes)

Why doesn't it generate this instead?

ld (#_g_numActors), #0x00 ( 6 cycles / 4 bytes)

or

ld hl, #_g_numActors ( 2 cycles / 3 bytes)
ld (hl), #0x00 ( 3 cycles / 2 bytes)

Thank you.

Discussion

  • Maarten Brock

    Maarten Brock - 2008-01-10
    • labels: 355282 -->
    • summary: Optimization --> Optimization for z80
     
  • Philipp Klaus Krause

    Logged In: YES
    user_id=564030
    Originator: NO

    The current code generator emits the same pattern for all sizes of g_numActors. If g_numActors were an unsigned int or unsigned long int, going through iy would be faster, since the other bytes can be accessed using ld 1(iy), etc.

    However, I see that for 8-bit operands it is not optimal. Other operations like g_numActors &= 0x07 or g_numActors += 0x07 suffer from the same inefficiency. It's even worse if the result is not stored in the same variable.

    Philipp

    P.S. Here are three nice test cases:

    unsigned char c;
    unsigned long d;

    void test(void)
    {
    c &= 0x07;
    }

    void test2(void)
    {
    d &= 0x07070707u;
    }

    unsigned char test3(void)
    {
    return(c & 0x07);
    }

     
  • Philipp Klaus Krause

    Logged In: YES
    user_id=564030
    Originator: NO

    Here's an example that implements the improvement for AND operations where the destination is not the same as the left operand:

    Index: src/z80/gen.c

    --- src/z80/gen.c (Revision 5000)
    +++ src/z80/gen.c (Arbeitskopie)
    @@ -5285,6 +5285,16 @@
    aopPut (AOP (result), "!zero", offset);
    continue;
    }
    + // Don't go through iy. Saves a few bytes if size <= 3, same code size for size == 4.
    + else if(AOP_TYPE (left) == AOP_IY)
    + {
    + emit2 ("ld a,!hashedstr",
    + aopGetLitWordLong (AOP(left), offset, FALSE));
    + emit2 ("and a,%s",
    + aopGet (AOP (right), offset, FALSE));
    + aopPut (AOP (result), "a", offset);
    + continue;
    + }
    }
    // faster than result <- left, anl result,right
    // and better if result is SFR
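The patch sits inside a loop over `offset`, so a multi-byte AND is handled one byte at a time through the accumulator. As a rough illustration only (this is plain C, not code from gen.c, and the function name `and_bytewise` is made up for this sketch), the per-byte sequence load, AND, store behaves like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the per-byte sequence the patch emits:
 *   ld a,(left + offset) / and a,<right byte> / store a to result + offset
 * Bytes are processed lowest offset first, as on the little-endian Z80. */
static uint32_t and_bytewise(uint32_t left, uint32_t right, unsigned size)
{
    uint32_t result = 0;
    for (unsigned offset = 0; offset < size; offset++) {
        uint8_t a = (uint8_t)(left >> (8 * offset));   /* ld a,(left+offset) */
        a &= (uint8_t)(right >> (8 * offset));         /* and a,right byte   */
        result |= (uint32_t)a << (8 * offset);         /* store to result    */
    }
    return result;
}

/* Small self-check mirroring the test cases above (c is 1 byte, d is 4). */
static void demo_and_bytewise(void)
{
    assert(and_bytewise(0xFFu, 0x07u, 1) == 0x07u);
    assert(and_bytewise(0x12345678u, 0x07070707u, 4)
           == (0x12345678u & 0x07070707u));
}
```

Calling `demo_and_bytewise()` checks that the byte-wise result matches the ordinary 32-bit AND for the operand sizes used in test() and test2().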

     
  • Philipp Klaus Krause

    Logged In: YES
    user_id=564030
    Originator: NO

    However, excessive use of iy is a ubiquitous problem in the Z80 port. I see two possible solutions:

    1) Changes similar to the patch I posted here all over gen.c

    2) More information for the peephole optimizer: A new function notUsed() would be very useful, both for optimizing away uses of iy and in lots of other cases. notUsed() would return true exactly if its argument is written to after the code in question, before being read and before the next label (or it is not a register used for returning values and it is not read before the next unconditional ret).
    There could be a peephole rule to optimize

    ld iy,#_c
    ld a,0(iy)

    to

    ld a,(#_c)

    using

    replace restart {
    ld iy,#%1
    ld a,%2(iy)
    } by {
    ld a, (#%1 + %2)
    } if notUsed(iy)

    only if the store to iy is dead.
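To make the effect of that rule concrete, here is a minimal sketch in C of the rewrite it describes, applied to two instruction strings. It is not how SDCC's peephole matcher is implemented; the function name `fold_iy_load` and the `iy_dead` flag (standing in for the proposed notUsed(iy) condition) are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if `first` is "ld iy,#SYM", `second` is "ld a,N(iy)"
 * and the caller says iy is dead afterwards, write "ld a,(#SYM + N)" to out.
 * Returns true only when the rewrite was applied. */
static bool fold_iy_load(const char *first, const char *second,
                         bool iy_dead, char *out, size_t out_size)
{
    char sym[64];
    int off = 0, end1 = 0, end2 = 0;

    if (!iy_dead)
        return false;                      /* iy is still needed: keep code */
    if (sscanf(first, " ld iy,#%63[A-Za-z0-9_]%n", sym, &end1) != 1
        || first[end1] != '\0')
        return false;                      /* first line does not match */
    if (sscanf(second, " ld a,%d(iy)%n", &off, &end2) != 1
        || end2 == 0 || second[end2] != '\0')
        return false;                      /* second line does not match */

    snprintf(out, out_size, "ld a,(#%s + %d)", sym, off);
    return true;
}

/* Self-check using the instruction pair from the post. */
static void demo_fold_iy_load(void)
{
    char out[96];
    assert(fold_iy_load("ld iy,#_c", "ld a,0(iy)", true, out, sizeof out));
    assert(strcmp(out, "ld a,(#_c + 0)") == 0);
    assert(!fold_iy_load("ld iy,#_c", "ld a,0(iy)", false, out, sizeof out));
}
```

The important part is the early return on `iy_dead`: without knowing that iy is not read later, the two-instruction form has to stay.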

    In a similar way sdcc often generates code like this:

    ld e, a
    ld a, e
    add a, #0x42
    ld e, a

    Currently the peephole optimizer can optimize the "ld a, e" away (it checks that e is not volatile and removes the load). The new function would also allow optimizing the first "ld e, a" away with a generic rule:

    replace restart {
    ld %1, %2
    } by {
    } if notVolatile(%1 %2), notUsed(%1)

    This would simplify the peephole rules, since there are currently many rules that do the same thing for special cases like the one above.
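The notUsed() check described above can be sketched as a forward scan over a simplified instruction list. This is only a toy model under stated assumptions: the `insn` struct, the register masks, the choice of return registers, and the conservative treatment of jumps are all invented for illustration and do not reflect SDCC's internal data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register bit masks for the toy model. */
enum { REG_A = 1 << 0, REG_E = 1 << 1, REG_L = 1 << 2, REG_IY = 1 << 3 };

/* Registers assumed (for this sketch only) to carry return values. */
#define RETURN_REGS (REG_A | REG_L)

typedef enum { OP_PLAIN, OP_LABEL, OP_RET, OP_JUMP } op_kind;

typedef struct {
    op_kind  kind;
    unsigned reads;   /* registers the instruction reads  */
    unsigned writes;  /* registers the instruction writes */
} insn;

/* Returns true if the value in `reg` at position `start` is provably dead:
 * it is overwritten before being read and before the next label, or an
 * unconditional ret is reached and `reg` is not a return register.
 * Anything uncertain (label, jump, end of list) answers false. */
static bool not_used(unsigned reg, const insn *code, size_t n, size_t start)
{
    for (size_t i = start; i < n; i++) {
        if (code[i].reads & reg)
            return false;                 /* read first: value is live  */
        if (code[i].writes & reg)
            return true;                  /* overwritten: value is dead */
        if (code[i].kind == OP_RET)
            return (reg & RETURN_REGS) == 0;
        if (code[i].kind == OP_LABEL || code[i].kind == OP_JUMP)
            return false;                 /* unknown control flow       */
    }
    return false;
}

/* The sequence from the post after "ld a, e" has already been removed:
 *   ld e, a / add a, #0x42 / ld e, a */
static void demo_not_used(void)
{
    const insn code[] = {
        { OP_PLAIN, REG_A, REG_E },       /* ld e, a      */
        { OP_PLAIN, REG_A, REG_A },       /* add a, #0x42 */
        { OP_PLAIN, REG_A, REG_E },       /* ld e, a      */
    };
    /* After the first "ld e, a", e is rewritten before any read. */
    assert(not_used(REG_E, code, 3, 1));
    /* a is read by the add, so it is not dead at that point. */
    assert(!not_used(REG_A, code, 3, 1));
}
```

In this model the first "ld e, a" qualifies for removal because the scan starting after it reaches a write to e before any read. Checking reads before writes matters for instructions such as "add a, #0x42" that do both.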

    Philipp

     
  • Philipp Klaus Krause

    • assigned_to: nobody --> spth
     
  • Philipp Klaus Krause

    • status: open --> closed
     
  • Philipp Klaus Krause

    Logged In: YES
    user_id=564030
    Originator: NO

    Implemented in revision #5198.

     
