#253 Register parameter passing

None
closed-fixed
z80 port (48)
5
2025-08-31
2008-03-20
No

Similar to RFE #979838, but #979838 is for mcs51, while this one is for Z80.

Passing arguments in registers would reduce call overhead.
However, this makes sense for small parameters only: any register pair can be pushed by the caller, but if arguments are passed in registers, we might have to move them around between registers a lot before the call. Similarly, the callee would probably spend a lot of time reordering arguments, unless we take the registers used for arguments away from the register allocator.
Using the second register set is probably quite complicated.

I see two possible solutions:
* Use register arguments only for functions where the sum of the arguments' sizes is at most 24 bits. We could then use a, h, l, which are not used by the register allocator, for the arguments.
* Let the user decide. The C standard allows use of the register storage class for function parameters.

Passing arguments in registers would mostly help with small functions, which are currently one of sdcc's weak points.

Philipp

Related

Feature Requests: #949
Wiki: SDCC 4.3.0 Release
Wiki: SDCC 4.4.0 Release
Wiki: SDCC 4.5.0 Release
Wiki: SDCC 4.6.0 Release

Discussion

1 2 3 > >> (Page 1 of 3)
  • Philipp Klaus Krause

    • labels: --> z80 port
     
  • Philipp Klaus Krause

    Steps needed:

    - Make the notUsed() peephole function aware of passing parameters in registers, so assignments to register arguments are not considered dead code. This has to be done exactly, since notUsed() is probably the peephole optimizer's single most powerful tool, and its effectiveness should not be compromised.

    - Fix bug #2811521.

    - Enable register parameters by changing the default value of --no-reg-params to 0 for the Z80 port.

    - This solution would use de and bc for passing parameters that have the register storage class specifier.

    Philipp

     
  • Philipp Klaus Krause

    I recently learned that the C standard ignores storage class specifiers on parameters in function declarations; they matter only in the definition. Thus, we cannot make register passing depend on the storage class specifier. That makes this feature request unlikely to get implemented soon, if at all.
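
A minimal sketch of what this rule means in practice, assuming a hosted C compiler. The function `twice` and the helper `example_usage` are hypothetical names, not part of sdcc. The two declarations and the definition below are all compatible, even though only some of them carry `register`, so a caller compiled against the plain declaration cannot know that the definition asked for registers.

```c
#include <assert.h>

/* Declaration as another translation unit might see it: no storage class. */
int twice(int x);

/* A second, equally valid declaration: 'register' is ignored here. */
int twice(register int x);

/* Definition: 'register' only matters inside this body
   (for example, the address of x may not be taken). */
int twice(register int x)
{
    return x + x;
}

/* Hypothetical caller; it works the same whichever declaration it saw. */
void example_usage(void)
{
    assert(twice(21) == 42);
}
```

Because all three forms denote the same function type, a calling convention keyed on `register` would have to be guessed from declarations that are allowed to omit it.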

    Philipp

     

    Last edit: Maarten Brock 2021-11-22
  • Sergey Belyashov

    Is it possible to add a feature for passing arguments in specified registers? It would be useful for external functions written in assembler.
    For example, I have a set of asm procedures which take parameters in IX (some address), HL (some other address/data), B, C, D, E; they return a value in IX and use all registers except AF, BC, DE, HL, IX. I want to use these functions (they are time critical) without wrappers. For example:
    extern uint16_t* asm_func(uint16_t*, const uint16_t*, uint8_t, uint8_t) __naked(IX : IX, HL, B, C : AF, BC, DE, HL, IX);

     
  • Sergey Belyashov

    *fix:
    ... returns IX, uses AF, BC, DE, HL, IX. ...

     
  • alvin

    alvin - 2010-11-11

    As Philipp mentioned, the z80's small number of registers and non-orthogonal instruction set mean that placing parameters in arbitrary registers prior to a call can involve a lot of overhead, which would make it detrimental in most cases involving more than 2 or 3 parameters. However, there are two calling linkages that we have found greatly improve z80 code: (1) CALLEE linkage, where the callee is responsible for stack cleanup, and (2) FASTCALL linkage, where a small number of parameters (1 or 2) are passed in [DE]HL. The latter is consistent with how values are returned from functions in all z80 C compilers I have found (specifically, longs are returned in DEHL).

    We've used both for library code and supply an additional asm entrypoint in library functions using CALLEE linkage, so that asm programmers can sidestep the register initialization in function calls. Here is an example of both:

    CALLEE linkage:

    "
    (CALLEE linkage assumes left-to-right pushing of params on the stack)
    ; long __CALLEE__ strtol(const char * restrict nptr, char ** restrict endptr, int base)

    XLIB strtol, asm_strtol
    ; export C and asm entrypoints

    strtol:

    pop hl
    pop bc
    pop ix
    ex (sp),hl

    ; enter:
    ; bc = base
    ; ix = char **endptr
    ; hl = char *nptr
    ;
    ; exit:
    ; dehl = result (could be LONG_MAX or LONG_MIN on overflow)
    ; bc = address of next char to examine in nptr[]
    ; carry = error (overflow, bad base, empty conversion string)
    ; errno set to ERANGE (overflow), EINVAL (bad base / empty conversion string)
    ; *endptr set appropriately
    ;
    ; uses:
    ; af, bc, de, hl, af', bc', de', hl', ix

    asm_strtol:
    ........ body continues
    "

    compiler calls like so:

    ; parameter collection in hl
    push hl ; nptr
    ; parameter collection in hl
    push hl ; endptr
    ; parameter collection in hl
    push hl ; base
    call strtol
    ; note no stack cleanup, this adds up quickly and saves a lot on code size

    An example of FASTCALL linkage (incoming parameter always in [DE]HL)

    "
    ; char *strrev(char *s)
    ; reverses string s
    ;
    ; enter:
    ; hl = char *s
    ;
    ; exit:
    ; hl = char *s
    ;
    ; uses:
    ; af, bc, de

    strrev:
    .... body continues
    "

    compiler calls like so:

    ld hl,parameter
    call strrev

    The C and asm entrypoint is shared.
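
For comparison, here is a hedged C-side sketch of how the two linkages can be requested per function. It assumes the `__z88dk_fastcall` and `__z88dk_callee` keywords of later sdcc releases and the `__SDCC_z80` predefined macro, and that the installed sdcc accepts `__z88dk_fastcall` on a C definition. The wrapper macros `FASTCALL` and `CALLEE` and the functions `parse_long` and `str_reverse` are hypothetical names used only for illustration. On any other compiler the macros expand to nothing, so the code still builds with the default convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: sdcc targeting the Z80 defines __SDCC_z80 and knows these keywords. */
#if defined(__SDCC_z80)
#define FASTCALL __z88dk_fastcall /* the single argument arrives in [DE]HL */
#define CALLEE   __z88dk_callee   /* the callee removes its stack arguments */
#else
#define FASTCALL
#define CALLEE
#endif

/* Hypothetical external assembly routine with CALLEE linkage (declaration only). */
long parse_long(const char *nptr, char **endptr, int base) CALLEE;

/* Hypothetical FASTCALL function: on the Z80, s would be expected in HL. */
char *str_reverse(char *s) FASTCALL;

char *str_reverse(char *s) FASTCALL
{
    size_t len;
    size_t i;

    assert(s != NULL);
    len = strlen(s);

    /* Swap characters from both ends towards the middle. */
    for (i = 0; i < len / 2; ++i) {
        char tmp = s[i];
        s[i] = s[len - 1 - i];
        s[len - 1 - i] = tmp;
    }
    return s;
}
```

Keeping the keywords behind macros lets the same header serve both the Z80 build and host-side tests.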

     
  • Sergey Belyashov

    As I said before, there are some cases where it is necessary to call an external function, passing parameters in registers.
    Currently, to call my functions I need to use inline assembler with a lot of overhead:
    1. store IX and other registers
    2. load BC
    3. load HL
    4. load IX (it may be difficult)
    5. do call
    6. restore IX and other registers

    I do not know which registers I should save at any given point. That is the compiler's job, isn't it?

     
  • alvin

    alvin - 2010-11-13

    You misunderstand, I agree with you :) The way things are now in sdcc-z80 is not adequate for adding external asm functions, including library functions. I am just proposing alternatives to what you suggested.

    The idea of specifying parameters to be placed in certain registers will not work well IMO unless there are very few parameters (ie one, two, *maybe* three). This is because of all the gymnastics the compiler would have to perform to compute parameter values and get them into the right registers prior to the call. This cost is paid in code size every time the external asm function is called. Also, I would guess that parameters would quite frequently have to be temporarily saved to the stack prior to final register setup, which would also make it slower than the CALLEE alternative I suggested.

    The CALLEE alternative has the compiler push parameters onto the stack and the external asm function pop them into the correct registers. The caller (ie compiler) does not have to clean up the stack afterward, as the external asm function does that itself when popping parameters into registers.

    The FASTCALL alternative I mentioned works with passing a limited number of parameters in DE,HL. It could be one parameter in HL (16 bit), one in DEHL (32 bit) or two 16-bit in DE,HL. This would be the passing params in registers thing you are requesting but only for one or two params. This places essentially no overhead on the compiler because quite frequently a parameter will be computed in DE,HL for the call. The external asm function quite frequently wants the one or two parameters in DEHL (due to the nature of the z80 instruction set!) so it is almost a clean transfer of program flow without the compiler needing more information about the internals of an external function.
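
To make the [DE]HL layout concrete, here is a small host-side C model (it is not compiler code). It assumes the usual convention that DE holds the upper 16 bits and HL the lower 16 bits of a 32-bit value. The type `dehl_regs` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the DE and HL register pairs at a FASTCALL boundary. */
typedef struct {
    uint16_t de;
    uint16_t hl;
} dehl_regs;

/* One 32-bit parameter: upper half in DE, lower half in HL (assumed order). */
dehl_regs dehl_from_u32(uint32_t value)
{
    dehl_regs r;
    r.de = (uint16_t)(value >> 16);
    r.hl = (uint16_t)(value & 0xFFFFu);
    return r;
}

/* One 16-bit parameter: HL only. DE is not part of the argument,
   so the zero stored here is just a placeholder. */
dehl_regs dehl_from_u16(uint16_t value)
{
    dehl_regs r;
    r.de = 0;
    r.hl = value;
    return r;
}

/* Reassemble the 32-bit value the callee would see in DEHL. */
uint32_t dehl_to_u32(dehl_regs r)
{
    return ((uint32_t)r.de << 16) | r.hl;
}

/* Round-trip check for the 32-bit case. */
void dehl_self_check(void)
{
    assert(dehl_to_u32(dehl_from_u32(0x12345678UL)) == 0x12345678UL);
}
```

Since a 32-bit return value uses the same layout, the result of one call can be handed to the next without moving any registers.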

    Lastly, I agree that it is the compiler's job to ensure its temporaries are saved prior to a call. I don't know if it does that now or if a new CALLER-save qualifier needs to be introduced to tell the compiler to save its temporaries.

    So my proposal is to add CALLEE and FASTCALL linkage, plus a CALLER-save qualifier if necessary. It is just not practical to add external asm functions and even libraries without them.

    Longer term, it may be advantageous to pass some metadata about the external function to the compiler such as which registers are actually destroyed so the compiler can make more intelligent decisions about placement of the function call and what temporaries actually need to be saved but this is something that would need a lot of work. I would also like to see sdcc-z80 get away from assigning roles to registers and using ix as a stack frame altogether but that is also a very long term project if it ever happens... there is a reason why expert z80 programmers almost never use ix as a frame pointer in their hand coded assembler :)

     
  • Sergey Belyashov

    aralbrec, your suggestion is good when a large amount of data is passed, because the data is always pushed to the stack and popped back. My suggestion is not optimal either, but it makes using external libraries simpler...

    spth, is there any progress?

     
  • Sergey Belyashov

    Also, it is possible to pass the first argument in IY (only if it is a pointer to a variable of a complex type such as struct/union).

    void memcpy(void*, void*, uint16_t) will accept: HL, DE, BC
    void process(struct MyStruct *ms, uint16_t, uint16_t) will accept IY, HL, DE

     
  • Philipp Klaus Krause

    No progress directly on this issue.

    However,
    1) I'm not sure this is really worth it
    2) There's other things to be done that would improve code quality more

    For 1): Passing arguments in registers is only an advantage if the arguments can stay there. If the function is longer, the callee would have to put them on the stack, which most of the time is less efficient than when the caller does it. As a result, register arguments are good for small functions, but bad for big functions. But the calling convention cannot depend on this; it may only depend on parts of the function declaration. The one exception would be static functions that do not have their address taken. But if those are small, inlining them is even better. On the other hand, stack (and thus parameter) access has been improved somewhat recently for small functions, so the advantage of register parameters even there is no longer as big as it seemed when this feature request was filed. See e.g. the last column at https://sourceforge.net/apps/trac/sdcc/wiki/Philipp%27s%20TODO%20list, which shows a reduction in code size from 24 bytes to 12 bytes between revisions #6749 and #7347 for the smallest function in the benchmark.

    2) In the graph at the page linked above, you can see a significant increase in code size and compilation speed that affects all ports related to the z80 in revision #6761 (bug #3400613). This is due to a bug fix. However I think that by improving common subexpression elimination, we could regain what was lost.

    3) There are compilation time issues with the new register allocator. For some sdcc users it is too slow. It seems this is partially due to my not so efficient implementation of some algorithms (I wanted to make it work first, fast later) and partially due to a problem with Thorup's algorithm. If these issues can be fixed, everyone will benefit, through faster compilation or better code quality for a given compilation time. See the graph at the link above again for what is possible and for the current situation: as you can see from the red, green, light blue and violet lines, code size is a lot smaller when one uses --max-allocs-per-node 1000000, but compilation takes much longer.

    Philipp

     
  • Sergey Belyashov

    Maybe add the keyword 'register' to the core language (as I understand it, it is currently not supported)? Or a more powerful version of it, like: void func(__register("HL") uint16_t x); where HL could be replaced by a16 (for the 16-bit accumulator) or r16 (any 16-bit register/pair).

     
  • Philipp Klaus Krause

    We support the keyword "register". However, according to the standard (at least last time I looked, which was before the release of the C11 standard - I'll have another look today), we are not allowed to change the calling convention depending upon its presence. I.e. a programmer is allowed to do this:

    void f(register char c) { } // Definition

    void f(char); // Declaration in another file

    Philipp

     
  • Philipp Klaus Krause

    It's still there in C11: "The storage-class specifier in the declaration specifiers for a parameter declaration, if present, is ignored unless the declared parameter is one of the members of the parameter type list for a function definition."

    Philipp

     
  • Sergey Belyashov

    This applies to big machines with big caches.
    If I declare simple functions like:

    void func1(register uint16_t *address, register uint16_t value)
    {
    *address = value;
    }
    void func2(register uint16_t *address, register uint16_t value)
    {
    func1(address + 0x1000/2, value);
    }

    then the resulting asm code will be

    _func1:
    push ix
    ld ix,#0
    add ix,sp
    ld l,4 (ix)
    ld h,5 (ix)
    ld a,6 (ix)
    ld (hl),a
    inc hl
    ld a,7 (ix)
    ld (hl),a
    pop ix
    ret

    _func2:
    push ix
    ld ix,#0
    add ix,sp
    ld a,4 (ix)
    add a, #0x00
    ld e,a
    ld a,5 (ix)
    adc a, #0x10
    ld d,a
    ld l,6 (ix)
    ld h,7 (ix)
    push hl
    push de
    call _func1
    pop af
    pop af
    pop ix
    ret

    Expected code:
    _func1:
    ld (hl),e
    inc hl
    ld (hl),d
    ret

    _func2:
    ld a,$10
    add a,h
    ld h,a
    call _func1 ;if func1 is not inlined, else:
    ;ld (hl),e
    ;inc hl
    ;ld (hl),d
    ret

     
  • Philipp Klaus Krause

    We now have some support for this, but it needs to be explicitly enabled using appropriate keywords per function. I leave this feature request open, since we might want to automate it later.

    Philipp

     
  • Sergey Belyashov

    I want to remind about this feature request.

    Looking at code produced by SDCC, I see many unnecessary stack manipulations. For example, small functions (isspace, isalpha, ...) always pop two pairs in their prologue and push them again. This can be avoided by passing the value directly in a register pair.
    My suggestion:
    Arguments are allocated from the following lists: (A), only for a single 8-bit argument, and (DE, BC, HL). An 8-bit argument occupies a whole register pair (in the low-byte register). DEHL is used for the first 32-bit argument if it is in the first or second position. Generally, HL has lower priority: it is used for 32-bit arguments or when all 3 arguments need registers. This is much better than the current __z88dk_fastcall, since the callee may use HL in its prologue in most functions.
    Return value: A for 8-bit, HL for 16-bit, or DEHL for 32-bit. As a special case, the carry flag could be used for boolean returns.
    Examples:

    char func (char x) //x -> A, _func -> A
    int func (int x) //x -> DE, _func -> HL
    int func (int x, int y) //x -> DE, y -> BC, _func -> HL
    long func (long x) //x -> DEHL, _func -> DEHL
    long func (long x, int y) //x -> DEHL, y -> BC, _func -> DEHL
    long func (int x, long y) //x -> BC, y -> DEHL, _func -> DEHL
    long func (long x, long y) //x -> DEHL, y -> (SP), _func -> DEHL
    void* func (void *d,void*s,int n) //d -> DE, s -> HL, n -> BC, _func -> HL
    char* func (char *d, char*s) // d -> DE, s -> BC, _func -> HL
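
The examples above can be reproduced by a small host-side C sketch. This is only one reading of the proposal, not sdcc code: the names `arg_loc` and `assign_arg_regs` are hypothetical, and the rule that HL is handed out as the second pair only when there are three or more arguments and no 32-bit argument in DEHL is an assumption inferred from the examples. Return values are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument locations for the proposed convention. */
typedef enum { LOC_STACK, LOC_A, LOC_DE, LOC_BC, LOC_HL, LOC_DEHL } arg_loc;

/* sizes[i] is the size in bytes (1, 2 or 4) of argument i;
   one location per argument is written to out[]. */
void assign_arg_regs(const unsigned char *sizes, size_t count, arg_loc *out)
{
    arg_loc order[3];
    size_t order_len = 0;
    size_t next = 0;
    size_t long_index = count; /* count means "no argument uses DEHL" */
    size_t i;

    assert(count == 0 || (sizes != NULL && out != NULL));

    /* Special case: a single 8-bit argument goes in A. */
    if (count == 1 && sizes[0] == 1) {
        out[0] = LOC_A;
        return;
    }

    /* The first 32-bit argument gets DEHL if it is in position one or two. */
    for (i = 0; i < count && i < 2; ++i) {
        if (sizes[i] == 4) {
            long_index = i;
            break;
        }
    }

    /* Build the list of pairs left for the remaining small arguments. */
    if (long_index < count) {
        order[order_len++] = LOC_BC;     /* DE and HL are taken by the long */
    } else {
        order[order_len++] = LOC_DE;
        if (count >= 3) {
            order[order_len++] = LOC_HL; /* assumed: HL only when all pairs are needed */
        }
        order[order_len++] = LOC_BC;
    }

    for (i = 0; i < count; ++i) {
        if (i == long_index) {
            out[i] = LOC_DEHL;
        } else if (sizes[i] <= 2 && next < order_len) {
            out[i] = order[next++];      /* 8-bit arguments occupy a whole pair */
        } else {
            out[i] = LOC_STACK;          /* further longs and overflow go to the stack */
        }
    }
}
```

For instance, sizes {2, 4} give BC and DEHL, matching the `long func (int x, long y)` line in the list.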
    

    Why use register A? Some small functions accept only one 8-bit argument. In most cases this argument will be processed in the accumulator, so this rule just removes extra load instructions.

    "Passing arguments in register is only an advantage if the arguments can stay there. If the function is longer the callee would have to put them on the stack, which most of the time is less efficient than when the caller does it. As a result, register arguments are good for small functions, but bad for big functions."

    Yes, you are right. But for big functions an extra push/pop makes hardly any difference, while for small functions any extra instruction greatly decreases performance. Moreover, it may force programmers to break their complex functions into small ones. ;-)

    Also, I suggest putting IY on the callee-save list (I think it is not so complex) and adding a special __classcall attribute (feature-requests/634) for functions with a pointer to an object as the first argument.

     
  • Sergey Belyashov

    After some more thinking about register parameter passing, I suggest using registers from this list (descending priority): A or E, D, C, B, L, H, IY (a 2-byte argument uses DE only; 4 bytes: de, bc; 6 bytes: de, hl, bc; 8 bytes: iy, de, hl, bc; 9+: iy, de, bc, stack). For __class_call the order is: IY, A or E, D, C, B, H, L. If there are more parameters than available registers, then HL is excluded from the list too. Some benchmarks are required to find the best priority order. The return value is in DE:BC, DE or A.

    char func (char x) //x -> A, _func -> A
    int func (int x) //x -> DE, _func -> DE
    int func (int x, int y) //x -> DE, y -> BC, _func -> DE
    long func (long x) //x -> DEBC, _func -> DEBC
    long func (long x, int y) //x -> DEHL, y -> BC, _func -> DEBC
    long func (int x, long y) //x -> DE, y -> HLBC, _func -> DEBC
    long func (long x, long y) //x -> IYDE, y -> HLBC, _func -> DEBC
    void func (char x, int y, int z, int k, char n) //x - IYL, y - DE, z - BC, k,n - (stack)
    void* func (void *d,void*s,int n) //d -> DE, s -> HL, n -> BC, _func -> DE
    char* func (char *d, char*s) // d -> DE, s -> BC, _func -> DE
    
     
    • Sebastian Riedel

      I see why you use A for 8 bit return and HL for 16 bit return, since those are the registers where all the calculations take place.
      But I would also do the same when calling a function, for two reasons:

      1. the calling function likely already uses those registers when parameters get calculated
      2. this allows to pass the return value of one function easily as an argument (probably rare case)
      unsigned char func1 (unsigned char x) __adehl_fastcall;
      unsigned int func2 (unsigned char x) __adehl_fastcall;
      long func3 (unsigned int x, unsigned int y) __adehl_fastcall;

      func3(func2(func1(a + 42)), 1337);
      

      This is mostly helpful for the first argument.

       
      • Sergey Belyashov

        Is BC on the denylist?

         
        • Sebastian Riedel

          What is a denylist?

          Right, I overlooked that you are using BC. It might be beneficial to keep one register pair free, so you can move registers around without push/pop. And you likely have to do that if you calculate something.

          If you use A, E, D, C, B, L, H for 8 bit, it would be HL, BC, DE for 16 bit accordingly.

           
          • Sergey Belyashov

            denylist is a "modern" replacement for blacklist.

            As I wrote above, register allocation for parameters should be "smart". All general-purpose register pairs are allocated only when the parameters use 48 bits...

            Calling a function with parameters passed in registers will save clock cycles in most cases. In the other cases speed will be unchanged, because the callee just pushes all of the registers onto the stack:

            ;void func(void *d, void *s, unsigned n);
            ;
            _func::
                push ix
                ld ix, #0
                add ix, sp
                ld sp, ix
                push bc
                push hl
                push de
                ...
                ld sp, ix ;deallocate all local vars and saved parameters
                pop ix
                ret
            
             

            Last edit: Sergey Belyashov 2020-07-17
            • Sebastian Riedel

              Wouldn’t it be problematic to use BC if the function contains a loop, since the code generator uses that pair for the iterator?
              And HL shouldn’t be used if the parameters are >48 bit.

               
              • Sergey Belyashov

                As I wrote, the callee may push any register pair, and it will not be slower (imho faster) than the caller doing it. So it is better to use as many registers as possible for parameters.

                 