I have gotten the best results by writing the entire routine in assembly and then linking it with the C code.  The calling convention isn't well documented, but it is not hard to work out by looking at the list files.

Here is a sample routine that takes a single 16-bit argument and returns a 32-bit result:

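    ;; uint32_t core_read (uint16_t addr);
    _core_read::
        push    ix
        ld      ix, #0
        add     ix, sp

        ;; addr sits at 4(ix)/5(ix), past the saved ix and the return address
        ld      a, 4(ix)            ; addr, low byte
        out     (reg_addr0), a
        ld      a, 5(ix)            ; addr, high byte
        out     (reg_addr1), a
        ld      a, #1
        out     (reg_control), a

        ;; get return value: low word in hl, high word in de
        in      a, (reg_rd_data0)
        ld      l, a
        in      a, (reg_rd_data1)
        ld      h, a
        in      a, (reg_rd_data2)
        ld      e, a
        in      a, (reg_rd_data3)
        ld      d, a
        pop     ix
        ret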
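
To make the byte layout concrete, here is a small host-side sketch in plain C. It is not SDCC code and not part of the routine. It only models what the assembly above relies on, assuming the stack-based argument passing that the routine itself uses. The names arg_from_frame and pack_dehl are hypothetical helpers made up for illustration.

```c
#include <stdint.h>

/* Hypothetical host-side model of the layout used by core_read. */

/* Stack frame as seen through IX after "push ix / ld ix,#0 / add ix,sp":
   offsets 0-1 hold the saved IX, 2-3 the return address, and 4-5 the
   16-bit argument, low byte first. */
uint16_t arg_from_frame(const uint8_t frame[6])
{
    return (uint16_t)(frame[4] | (frame[5] << 8));
}

/* 32-bit return value: L is the lowest byte, then H, then E, with D on top. */
uint32_t pack_dehl(uint8_t d, uint8_t e, uint8_t h, uint8_t l)
{
    return ((uint32_t)d << 24) | ((uint32_t)e << 16) |
           ((uint32_t)h << 8)  |  (uint32_t)l;
}
```

With that layout, the C side should only need the prototype from the comment, uint32_t core_read (uint16_t addr);, and can then call the routine like any other function.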

I needed a port I/O routine for the Z80 that accepts the port address as a parameter, so I wrote two functions with inline ASM to do it. I would have used sfr, but the addresses need to be passed as a parameter (i.e. I can't just #define or hardcode them).

I'm pretty new to ASM, and with all the warnings in the manual about protecting the registers I just wanted to check with some experienced people to see if these functions look OK. They work fine under emulation, but I haven't tried them on real hardware.

// These two are outside of the functions so the symbols will be seen by the ASM code
static uint8 io_val;
static uint8 io_addr;

uint8 io_read(uint8 addr) {
       io_addr = addr;
       __asm
       push af
       push bc
       ld bc, (_io_addr)   ; 16-bit load: C = io_addr, B = the byte stored after it
       in a, (c)
       ld (_io_val), a
       pop bc
       pop af
       __endasm;
       return io_val;
}

void io_write(uint8 addr, uint8 val) {
       io_addr = addr;
       io_val = val;
       __asm
       push af
       push bc
       ld a,(_io_val)
       ld bc,(_io_addr)    ; 16-bit load: C = io_addr, B = the byte stored after it
       out (c),a
       pop bc
       pop af
       __endasm;
}

