
#483 Use information on registers preserved by called functions

None
open
nobody
None
5
2020-03-09
2016-01-20
No

Using information on which registers are preserved by a called function, SDCC could generate more efficient code by not saving those registers. This would also automatically affect register allocation choices when communicated via the cost function.

The information would be obtained in two ways:
1) From the user using a new keyword. This would mostly be useful for functions implemented in assembler.
2) By automatic analysis of the generated asm code. The information then would be available at subsequent calls to the function in the same compilation unit.

Due to the nature of this optimization, it would probably be most useful for the z80 and z180, because of their relatively large number of registers and their CISC instruction set (the latter resulting in asm-implemented standard library functions).

Philipp

Discussion

  • Frieder Ferlemann

    For the mcs51, some work in the direction of approach 2) has been done in
    sdcc/src/mcs51/rtrack.c
    which tracks values (literals) in registers by parsing the asm code while it is being generated. It discards the information as soon as it sees a label, though.

     
  • Philipp Klaus Krause

    Here's a first implementation of the frontend part for 1)

    Philipp

     
  • Philipp Klaus Krause

    In revision [r9470], I implemented a first version. It is incomplete, but already useful.

    What should work: For the z80-related ports, the information on registers b, c, d, e in declarations is taken into account when saving registers for function calls. The information is used by the register allocator, i.e. it will prefer to put variables that are live across function calls into preserved registers.
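
The save decision described above can be sketched roughly as follows. This is a minimal model under stated assumptions, not SDCC's code: the bitmask encoding and the names `REG_B` ... `REG_TRACKED` and `regs_to_save` are hypothetical. A register needs a push/pop around a call only if it holds a value that is live across the call and the callee is not declared to preserve it.

```c
#include <assert.h>

/* Hypothetical bitmask model of the z80 registers covered so far;
   these names are illustrative and not SDCC identifiers. */
enum {
    REG_B = 1 << 0,
    REG_C = 1 << 1,
    REG_D = 1 << 2,
    REG_E = 1 << 3,
    REG_TRACKED = REG_B | REG_C | REG_D | REG_E
};

/* Registers the caller must save around a call: those live across the
   call, minus those the callee is declared to preserve. */
unsigned regs_to_save(unsigned live_across_call, unsigned preserved)
{
    /* This sketch only models b, c, d and e. */
    assert((preserved & ~(unsigned)REG_TRACKED) == 0);
    return live_across_call & ~preserved;
}
```

With an empty preserved set this degenerates to saving every live register, which matches the behaviour before the keyword existed.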

    Philipp

    Attachment: Usage example that also shows the interaction with the register allocator.

     

    Last edit: Maarten Brock 2016-01-22
    • Philipp Klaus Krause

      Warning: Support on the peephole optimizer side is still missing, so code generated with the peephole optimizer active is likely to be wrong.

      Philipp

       
  • sverx

    sverx - 2016-01-21

    Thanks for this feature! :)
    Anyway, I don't get why the peephole optimizer would break the generated code... isn't it supposed to replace asm code with different asm code that does the same thing?

     
    • Philipp Klaus Krause

      notUsed() would previously assume that no registers other than ix are preserved across function calls, and thus the peephole optimizer would assume all writes to registers before the function call are dead unless the registers are used for register parameters.
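
To illustrate why the old assumption becomes unsafe, here is a rough sketch of the question the optimizer has to answer at a call. It is a hypothetical model, not the code in peep.c: `dead_at_call` and the register masks are invented names, and ix (always preserved) is left out for brevity.

```c
/* Hypothetical register masks; not SDCC identifiers. */
enum {
    REG_B = 1 << 0,
    REG_C = 1 << 1,
    REG_D = 1 << 2,
    REG_E = 1 << 3,
    REG_H = 1 << 4,
    REG_L = 1 << 5
};

/* Returns 1 if a value written to `reg` before a call may be treated as
   dead at the call, 0 if it may still be read.
   param_regs: registers that carry arguments to the callee.
   preserved:  registers the callee is declared to preserve. */
int dead_at_call(unsigned reg, unsigned param_regs, unsigned preserved)
{
    if (reg & param_regs)
        return 0; /* the callee reads it as an argument */
    if (reg & preserved)
        return 0; /* the value survives the call and may be read later */
    return 1;     /* clobbered by the callee, so the earlier write is dead */
}
```

Passing `preserved = 0` reproduces the old assumption. Returning 0 for a preserved register is the conservative answer; a real implementation would presumably continue scanning the code after the call instead.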

      Philipp

       
  • sverx

    sverx - 2016-01-21

    Oh, wait, I realized what you mean (the peephole could replace registers...)

     
  • Philipp Klaus Krause

    In revision [r9471] I fixed the handling in the peephole optimizer. It is currently overly conservative, resulting in code size regressions for calls through function pointers. But I also added preserved register information for three standard-library functions, and in the regression tests the code size gains from that are far bigger than what we lose on function pointers.

    This feature should now be safe to use. But there is more to implement to unlock further benefits.

    Philipp

     

    Last edit: Maarten Brock 2016-01-22
    • sverx

      sverx - 2016-01-22

      Thanks, I'll start playing with it ASAP (mmm... what happened to snapshot builds? :| )

       
  • Maarten Brock

    Maarten Brock - 2016-01-22

    Philipp,

    Can you explain how it is conservative for function pointers? I don't see any exceptions for it in your commits. And a function pointer is called with a 'call' instruction just like any other function, isn't it?

    Further, I think the frontend should check this keyword (probably in compareFuncType() in SDCCsymt.c) on function pointer assignment. At least one way. You may assign a preserving function to a non-preserving function pointer, but not the other way around.

    Maarten

     
    • Philipp Klaus Krause

      Function pointers are called via call (hl) or call (iy). Since (hl) and (iy) are not valid identifiers, findSym in line 447 of peep.c will return 0, and we get to the conservative fallback in line 472.

      I agree on checking function pointer assignments. The assignment x = y should be ok iff the set of preserved registers for x is a subset of the set of preserved registers for y.
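
The subset rule above can be sketched as a one-line bitmask test. This is only an illustration under assumed names (`assignment_ok` and the register masks are hypothetical and not taken from SDCCsymt.c): each function type carries the set of registers it promises to preserve, and x = y is accepted only if y keeps every promise that x makes.

```c
/* Hypothetical register masks; not SDCC identifiers. */
enum {
    REG_B = 1 << 0,
    REG_C = 1 << 1,
    REG_D = 1 << 2,
    REG_E = 1 << 3
};

/* Returns 1 if the assignment x = y is acceptable, i.e. the preserved set
   of x is a subset of the preserved set of y; returns 0 otherwise. */
int assignment_ok(unsigned x_preserved, unsigned y_preserved)
{
    /* Any register promised by x but not by y breaks the promise. */
    return (x_preserved & ~y_preserved) == 0;
}
```

For example, `assignment_ok(0, REG_B | REG_C)` holds (a preserving function assigned to a non-preserving pointer), while `assignment_ok(REG_B, 0)` does not.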

      Philipp

       
      • Philipp Klaus Krause

        In revision #9475, the estimate got a bit more exact, and thus less conservative. Some information on preserved registers from code generation is passed to the peephole optimizer. This helps with the code size regression on calls through function pointers.

        Philipp

         
      • Philipp Klaus Krause

        The check on function pointers is implemented in revision #9476.

        Philipp

         
  • Philipp Klaus Krause

    • summary: Use information on reigsters preserved by called functions --> Use information on registers preserved by called functions
    • Group: -->
     
  • sverx

    sverx - 2016-01-25

    I'm using revision #9479, which I just downloaded, and I can't make this work. :|
    For instance, I declared this function, both in my code and in the .h header file:

    void SMS_setTile (unsigned int tile) __z88dk_fastcall __preserves_regs(b,c,d,e,h,l,iyh,iyl);
    

    then I use that function in my program, but in the generated code I see the register pushing/popping which hasn't changed:

    ;main.c:199: SMS_setTile (EMPTY_TILE);
        push    bc
        ld  hl,#0x0000
        call    _SMS_setTile
        pop bc
    

    I don't get what I'm doing wrong :|
    Thanks

     
    • Philipp Klaus Krause

      Please give a small, compilable example.

      Philipp

       
  • sverx

    sverx - 2016-01-25

    Sure, sorry.

    __sfr __at 0xBE VDPDataPort;
    
    #define ASM_HL_TO_VDP_DATA                                \
      __asm                                                   \
        ld a,l                                                \
        out (_VDPDataPort),a      ; 11                        \
        ld a,h                    ; 4                         \
        sub #0                    ; 7                         \
        nop                       ; 4 = 26 *VRAM SAFE*        \
        out (_VDPDataPort),a                                  \
      __endasm
    
    #pragma save
    #pragma disable_warning 85
    void SMS_setTile (unsigned int tile) __z88dk_fastcall __preserves_regs(b,c,d,e,h,l,iyh,iyl) {
      ASM_HL_TO_VDP_DATA;
    }
    #pragma restore
    
    void main (void) {
      unsigned char x,y;
      for (y=0;y<28;y++)
        for (x=0;x<32;x++)
          SMS_setTile (0);
    }
    
     
  • alvin

    alvin - 2016-01-25

    It looks like the z88dk_fastcall decoration has to be last.

     
    • Philipp Klaus Krause

      Actually, the preserved regs had to be first, no matter what else or how many others there would be. Fixed in revision #9484.

      Philipp

       
  • alvin

    alvin - 2016-01-26

    I had some build trouble with MSVC so hopefully it isn't affecting the results here. The SDCC nightly build is still at #9479 where the same problem below is present.

    zsdcc -v
    3.5.5 #9485

    extern unsigned int strlen(char *s);
    extern unsigned int strlen_f(char *s) __z88dk_fastcall;
    extern unsigned int strlen_pfde(char *s) __preserves_regs(d,e) __z88dk_fastcall;
    extern unsigned int strlen_pfbc(char *s) __preserves_regs(b,c) __z88dk_fastcall;
    extern unsigned int strlen_fpde(char *s) __z88dk_fastcall __preserves_regs(d,e);
    extern unsigned int strlen_fpbc(char *s) __z88dk_fastcall __preserves_regs(b,c);
    
    unsigned int len;
    
    void main()
    {
       len = strlen("Hello World\n");
       len = strlen_f("Hello World\n");
       len = strlen_pfde("Hello World\n");
       len = strlen_pfbc("Hello World\n");
       len = strlen_fpde("Hello World\n");
       len = strlen_fpbc("Hello World\n");
    }
    

    sdcc -mz80 -S test.c

    _main::
    ;test.c:13: len = strlen("Hello World\n");
        ld  hl,#___str_0
        push    hl
        call    _strlen
        pop af
        ld  (_len),hl
    ;test.c:14: len = strlen_f("Hello World\n");
        ld  hl,#___str_0
        call    _strlen_f
        ld  (_len),hl
    ;test.c:15: len = strlen_pfde("Hello World\n");
        ld  hl,#___str_0
        push    hl
        call    _strlen_pfde
        pop af
        ld  (_len),hl
    ;test.c:16: len = strlen_pfbc("Hello World\n");
        ld  hl,#___str_0
        push    hl
        call    _strlen_pfbc
        pop af
        ld  (_len),hl
    ;test.c:17: len = strlen_fpde("Hello World\n");
        ld  hl,#___str_0
        call    _strlen_fpde
        ld  (_len),hl
    ;test.c:18: len = strlen_fpbc("Hello World\n");
        ld  hl,#___str_0
        call    _strlen_fpbc
        ld  (_len),hl
        ret
    

    It looks like z88dk_fastcall now has to be first in the list otherwise it is ignored.

    I've just started looking at some results (so far I've only added __preserves_regs to string.h), but here's one snippet:

    BEFORE:

        call    _strstr_callee
        ld  c,l
        ld  b,h
        ld  hl,(_main_keywordIndex_1_400)
        ld  h,0x00
        add hl, hl
        ld  de,_keywords
        add hl,de
        ld  e,(hl)
        inc hl
        ld  h,(hl)
        push    bc
        ld  l, e
        call    _strlen_fastcall
        pop bc
        add hl,bc
    

    AFTER:

        call    _strstr_callee
        ex  de,hl
        ld  hl,(_main_keywordIndex_1_400)
        ld  h,0x00
        add hl, hl
        ld  bc,_keywords
        add hl,bc
        ld  c,(hl)
        inc hl
        ld  h,(hl)
        ld  l, c
        call    _strlen_fastcall
        add hl,de
    

    strlen_fastcall preserves DE. In the AFTER case, the compiler places the value that the BEFORE case kept in BC into DE instead, so it no longer needs to push and pop around the call to strlen. Very neat :) I'm not sure how many functions this will touch, but it touched more than I expected in string.h. Over the whole program, the BEFORE and AFTER cases came out to the same number of asm lines, but I think that's probably down to different code patterns in the AFTER case that are not being optimized by the existing peephole rules.

     

    Last edit: alvin 2016-01-26
    • Philipp Klaus Krause

      It always had to be first, this is not a new bug. Fixed in revision #9487.

      Philipp

       
  • sverx

    sverx - 2016-01-27

    It seems to me it's now working as expected, no more uselessly pushing/popping untouched registers around function calls :)

    I'm just curious if the compiler can at this point realize when it's assigning a constant to a register which is already holding that value, across a function call that won't modify it.
    I mean, say I call a z88dk_fastcall function twice, passing the same constant value, and I declared that the called function will preserve HL. Next assignment would thus be useless, right?

    ;main.c:212: SMS_setTile (BG_TILE);
        ld  hl,#0x0001
        call    _SMS_setTile
    ;main.c:213: SMS_setTile (BG_TILE);
        ld  hl,#0x0001
        call    _SMS_setTile
    

    I understand that it may be way more complicated than it looks to me, though.
    Thanks!

     
  • alvin

    alvin - 2016-01-27

    This may be similar to sverx's case; below is a solution I am trying with peephole rules. This is a snippet of output code:

        call    _atoi_fastcall
        ld  e,l
        ld  d,h
        jr  l_main_00104
    l_main_00103:
        call    _randomize
        ld  e,l
        ld  d,h
    l_main_00104:
        ld  c, e
        ld  b, d
        ex  de,hl
        call    _srand_fastcall
        push    bc
        ld  hl,___str_1
        push    hl
        call    _printf
    

    srand_fastcall() preserves a,b,c,d,e,h,l (HL is the input parameter which is unchanged by the function call). The output from either atoi() or randomize(), in HL, is passed to srand() in HL to act as seed. As you can see the compiler is moving HL into DE and then it is saving that value into BC for reuse after the srand_fastcall(). Of course it doesn't have to do anything - it could just use the value in HL without moving into any registers and also have that HL after the srand_fastcall().

    Ideal code would look like this:

        call    _atoi_fastcall
        jr  l_main_00104
    l_main_00103:
        call    _randomize
    l_main_00104:
        call    _srand_fastcall
        push    hl
        ld  hl,___str_1
        push    hl
        call    _printf
    

    But code generation issues aside, I had an idea to partially address this with peephole rules. I tried this:

    replace restart {
        ld  c,e
        ld  b,d
        ex  de,hl
        call    %1
        push    bc
    } by {
        ex  de,hl
        call    %1
        push    hl
        ld c,l
        ld b,h
        ; peephole z88dk-435a
    } if notUsedFrom(%1 'hl')
    

    notUsedFrom() works with conditional branches, so I thought I'd try it with calls as well, but it does not work. The hope embedded in the rule above is that the load into bc will be eliminated by other rules.

    Is it even possible to get notUsedFrom() to work with function call targets, maybe by looking at the __preserves_regs set? A similar rule could be used to fix sverx's example if it cannot be easily fixed in the compiler.

     

    Last edit: alvin 2016-01-27
    • Philipp Klaus Krause

      Currently only information on preserved b, c, d, e is used in code generation.

      Philipp

       
