Small Device C Compiler (SDCC) / Patches / #47 bankswitching revisited

Nobody/Anonymous - 2005-08-10

RFC as in mail but white space preserved

bank.c


Maarten Brock - 2005-08-11


Who filed this patch? Frieder?

Some comments:
1) If you use RET instead of LJMP where is it returning to?
The common area or another bank? What restores this bank?

2) Instead of letting the compiler/linker set a register now the
user must adapt n! banked_call functions to his/her hardware
instead of just one.

3) Intra bank function calls don't need to be banked at all.
Just don't specify the banked keyword. Nothing lost/saved.

4) Function pointers are exactly the type of stuff that make
programs big. I don't think we can do away with them.

Maarten


Frieder Ferlemann - 2005-08-11


Hi Maarten,

thank you for digging through my cryptic proposal:)

> Who filed this patch? Frieder?

Yes it was me. I would have preferred a mail to sdcc-devel
but the list wouldn't accept it (my mail provider does not
define postmaster@web.de so my mail bounced:(

> 1)
The mechanism of calling a function in bank 3 from bank 1
would be:

mov r0,#dest
mov r1,#dest>>8
lcall __sdcc_banked_call_1_3

then __sdcc_banked_call_1_3 pushes the address of
__sdcc_bank_switch_1 onto the stack, then pushes the address
of (R0,R1) onto the stack and then LJMPs to
__sdcc_bank_switch_3.
The RET instruction of __sdcc_bank_switch_3 then jumps to
the target address previously in registers (R0,R1).

After the code of the target address is finished the RET
instruction returns to the code which switches back to bank
1 (and then to the address where the call originated from).

The mechanism of calling a function in bank 3 from bank 0
would be:
lcall __bank_sel_3
lcall my_func_in_bank_3

or (if the previously used bank was bank 3) simply:

lcall my_func_in_bank_3
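
The bank 1 to bank 3 sequence described above can be traced with a toy model in C. This is only a sketch: the stack holds whole 16-bit addresses, bank selection is a plain variable, and every name in it (Machine, Trace, ADDR_BANK_SWITCH_1, banked_call_1_to_3) is made up for illustration and is not part of bank.c or SDCC.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not SDCC code: the "stack" stores 16-bit addresses. */
enum { STACK_DEPTH = 8 };

/* Arbitrary address standing in for __sdcc_bank_switch_1. */
#define ADDR_BANK_SWITCH_1 0x0100u

typedef struct {
    uint16_t stack[STACK_DEPTH];
    int sp;       /* number of addresses currently on the stack */
    int bank;     /* currently selected bank */
    int max_sp;   /* deepest stack use seen so far */
} Machine;

typedef struct {
    int bank_in_target;   /* bank selected while the target runs */
    int bank_after;       /* bank selected once the caller resumes */
    uint16_t resumed_at;  /* address the caller resumes at */
    int max_depth;        /* peak number of stacked addresses */
} Trace;

static void push(Machine *m, uint16_t addr) {
    assert(m->sp < STACK_DEPTH);
    m->stack[m->sp++] = addr;
    if (m->sp > m->max_sp) m->max_sp = m->sp;
}

static uint16_t pop(Machine *m) {
    assert(m->sp > 0);
    return m->stack[--m->sp];
}

static Trace banked_call_1_to_3(uint16_t return_addr, uint16_t dest) {
    Machine m = { {0}, 0, 1, 0 };   /* start in bank 1, empty stack */
    Trace t;
    uint16_t pc;

    push(&m, return_addr);          /* lcall __sdcc_banked_call_1_3 */
    push(&m, ADDR_BANK_SWITCH_1);   /* helper: where the target's RET will land */
    push(&m, dest);                 /* helper: target address from (R0,R1) */
    m.bank = 3;                     /* ljmp __sdcc_bank_switch_3 selects bank 3 */
    pc = pop(&m);                   /* its RET jumps to the target */
    assert(pc == dest);
    t.bank_in_target = m.bank;

    pc = pop(&m);                   /* RET at the end of the target */
    assert(pc == ADDR_BANK_SWITCH_1);
    m.bank = 1;                     /* __sdcc_bank_switch_1 restores bank 1 */
    t.resumed_at = pop(&m);         /* its RET returns to the original caller */
    t.bank_after = m.bank;
    t.max_depth = m.max_sp;
    return t;
}
```

In this model the stack peaks at three addresses just before the jump into the target, and two addresses stay on it while the target runs.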

> 2)

No not really, the user would have to provide the n
__bank_sel_n functions (4==n in this case)
The __sdcc_banked_call_n_m functions don't contain code
which would be specific for the bank switching mechanism.
The linker then would decide which of these end up in the
binary.

> 3)

Yes.
My proposal would additionally allow putting code into a
banked region and calling it either via __sdcc_banked_call_n_m
or directly.
If you are under extreme pressure you could theoretically
even put library stuff (like floating point code) into a
banked area...

> 4)

You cannot call them _directly_ but could get away with one
additional level of indirection. A 'trampoline' function
like "void my_func_trampoline(void){my_func();}" would add
an overhead of 3 bytes code (and no stack overhead) . Should
not be an issue.
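
The trampoline idea can be shown in plain C. This is a sketch without any banking keywords: my_func stands in for a function placed in a banked area, and handler_t, run_handler and the calls counter are hypothetical names added only for the demonstration.

```c
#include <assert.h>

static int calls;   /* counts invocations, for the demonstration only */

/* Stands in for a function that would live in a banked area. */
static void my_func(void) { calls++; }

/* Would live in the common area, so its address can be stored in a pointer. */
static void my_func_trampoline(void) { my_func(); }

typedef void (*handler_t)(void);

/* Calls through the pointer and reports how often my_func has run. */
static int run_handler(handler_t h) {
    assert(h != 0);
    h();
    return calls;
}
```

Code that needs a function pointer would store my_func_trampoline rather than my_func, for example run_handler(my_func_trampoline).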

Note, the code generated by bank.c maybe didn't shine enough:

It would generate unnecessary bank switching code for
switching from the common bank 0 to other banks.
(In this case the calling bank doesn't need to be pushed and
the compiler could insert an lcall __bank_sel_n if needed.)

In the end the proposal seems to need about 150 bytes code
more in common memory (if the linker needs to include
each(!) __sdcc_banked_call_n_m) than crtbank.asm.
As it saves 3 to 6 bytes (in common bank) per call of a
banked function from common bank there would be a break even
at latest at 25..50 calls (this break even point can
probably be reached relatively early).
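
The break-even estimate is a ceiling division of the fixed overhead by the saving per call. A minimal sketch, using only the figures quoted above (about 150 bytes of overhead, 3 to 6 bytes saved per call); the function name break_even_calls is made up here.

```c
#include <assert.h>

/* Smallest number of calls whose total saving covers the fixed overhead. */
static unsigned break_even_calls(unsigned overhead_bytes, unsigned saved_per_call) {
    assert(saved_per_call > 0);
    /* ceiling division */
    return (overhead_bytes + saved_per_call - 1) / saved_per_call;
}
```

With 150 bytes of overhead this gives 25 calls at 6 bytes saved per call and 50 calls at 3 bytes saved per call.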

I'm attaching file bank2.lst which contains the complete
bank switching code for 128kByte separated in 32k common
area and 3 banked pages.
(The file bank2.lst is more or less automatically generated
by executing "sdcc -c bank2.c". bank2.c contains some
additional comments about the pros and cons of the proposal.)

Greetings,

Frieder


Frieder Ferlemann - 2005-08-11

bank switching code generated from "sdcc -c bank2.c"

bank2.lst


Frieder Ferlemann - 2005-08-11

some code removed, comment updated

bank2.c


10 years later this is still open ;)

I've said this before (probably on the mailing list or in another tracker item) but this fails if there are parameters on the internal stack. But when going for bank switching, you're probably very low on internal stack space already, so maybe it is safe to assume that in that case --xstack is used as well.

But all in all I think internal stack space is the most limiting resource in a bank switched application. Not code space and not cycles. So IMHO it would be best if we could do bank switching with only 2 bytes stack overhead and this might just be possible if we force banked function calls on aligned addresses. The assembler has the .bndry directive for this, but currently the linker does not support it correctly. And the assembler would have to insert NOPs.

Say we force all banked function calls on a 4-fold address with .bndry 4 then the sdcc_banked_call could modify the LSB of the return address on stack with the current bank. And the sdcc_banked_ret could OR it with 0x3 again before returning.

        mov r0, #my_banked_func
        mov r1, #(my_banked_func>>8)
        mov r2, #(my_banked_func>>16)
        .bndry 4                        ; should insert 0/1/2/3 NOPs
        lcall __sdcc_banked_call
        ...                             ; address always ends in 0b...11

__sdcc_banked_call::
        mov r3,a                        ; save a in r3
        pop acc
        mov r4,a                        ; save MSB(ret) in r4
        pop acc
        anl a,#0xFC                     ; clear 2 lsbit
        mov r5,a
        mov a,_PSBANK
        anl a,#0x03                     ; get 2 lsbit of current bank
        orl a,r5
        push acc                        ; save LSB(ret) + bank
        mov a,r4
        push acc                        ; restore MSB(ret)
        mov a,r0
        push acc                        ; push LSB(dest)
        mov a,r1
        push acc                        ; push MSB(dest)
        mov a,r2
        anl a,#0x03
        anl _PSBANK,#0xFC
        orl _PSBANK,a                   ; select new bank
        mov a,r3                        ; restore a
        ret                             ; make the call

my_banked_func:
        ...
        ljmp __sdcc_banked_ret          ; return from banked func

__sdcc_banked_ret::
        mov r3,a                        ; save a in r3
        pop acc
        mov r4,a                        ; save MSB(ret) in r4
        pop acc
        mov r5,a                        ; get LSB(ret) + bank
        orl a,#0x03
        push acc                        ; restore LSB(ret)
        mov a,r4
        push acc                        ; restore MSB(ret)
        mov a,r5
        anl a,#0x03
        anl _PSBANK,#0xFC
        orl _PSBANK,a                   ; restore bank
        mov a,r3                        ; restore a
        ret                             ; return to caller
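
The bit manipulation on LSB(ret) in the two routines above can be checked in C. This is a sketch of the arithmetic only, for the 4-bank case; the function names are invented for the illustration and the register juggling is left out.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_MASK 0x03u   /* two low bits carry the bank */

/* What __sdcc_banked_call leaves on the stack as LSB(ret). */
static uint8_t pack_ret_lsb(uint8_t ret_lsb, uint8_t psbank) {
    /* .bndry 4 plus the 3-byte lcall means the return address ends in 0b11 */
    assert((ret_lsb & BANK_MASK) == BANK_MASK);
    return (uint8_t)((ret_lsb & ~BANK_MASK) | (psbank & BANK_MASK));
}

/* Bank that __sdcc_banked_ret writes back into _PSBANK. */
static uint8_t saved_bank(uint8_t packed) {
    return (uint8_t)(packed & BANK_MASK);
}

/* LSB(ret) that __sdcc_banked_ret pushes back before its ret. */
static uint8_t restored_ret_lsb(uint8_t packed) {
    return (uint8_t)(packed | BANK_MASK);
}
```

For a return address LSB of 0x37 and a calling bank of 2, the stacked byte becomes 0x36; the return path recovers bank 2 and LSB 0x37 from it.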

If we would also force banked functions on 4-fold addresses it would become a little simpler and no longer need r2. And if the calling conventions changed freeing ACC and DPTR it could also be simpler.

The above example only works for up to 4 banks, but this scheme scales to 8 or 16 with a different .bndry setting.
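
A sketch of how the padding and the packing might generalize to other alignments, assuming .bndry n places the lcall on a multiple of n. Since the 8051 lcall is 3 bytes long, the return address then ends in 3, which is all ones only for n = 4; for 8 or 16 the restore step in this sketch rewrites the low bits to 3 instead of ORing them. The names pad_nops, pack and unpack_ret are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { LCALL_LEN = 3 };   /* the 8051 LCALL is a 3-byte instruction */

/* NOPs needed so that the lcall starts on a multiple of align. */
static unsigned pad_nops(uint16_t addr, unsigned align) {
    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */
    return (align - (addr % align)) % align;
}

/* Replace the low bits of LSB(ret) with the calling bank. */
static uint8_t pack(uint8_t ret_lsb, unsigned bank, unsigned align) {
    unsigned mask = align - 1;
    return (uint8_t)((ret_lsb & ~mask) | (bank & mask));
}

/* Put back the low bits that an aligned lcall always produces. */
static uint8_t unpack_ret(uint8_t packed, unsigned align) {
    unsigned mask = align - 1;
    return (uint8_t)((packed & ~mask) | (LCALL_LEN & mask));
}
```

For align = 4 the unpack step is the same as the orl a,#0x03 in the listing above.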

It's also possible that a different packing leads to more optimal code. I haven't checked.

Frieder Ferlemann - 2015-11-07

Hi Maarten,

thanks for revisiting!)

> [...] force banked functions on 4-fold addresses [...]

Just asking whether I got your idea correctly, do you mean the linker would e.g. know an extended version of .bndry which takes two arguments a and b (b defaulting to zero) which would result in new_address with (new_address % a) == b?
So functions in bank 0 could be made to start at an address 0b...00, those in bank 1 at address 0b...01 etc. With the calling function then looking like

        ; If my_banked_func is aligned so that
        ;   address(my_banked_func) % num_banks == bank(my_banked_func)
        ; then the (two) lowermost bits of r0 hold the target bank
        ; of my_banked_func (A16, A17)
        mov r0, #my_banked_func
        mov r1, #(my_banked_func>>8)
        .bndry num_banks, (currentbank + 1) % num_banks
                                        ; should insert 0/1/2/3 NOPs (for 4==num_banks)
        lcall __sdcc_banked_call
        ; two lowermost bits of this address also are the current bank now.
        ; (So the lowermost bits put on the stack via the lcall
        ; hold the bank to return to)
        ...
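
The two-argument .bndry asked about here does not exist in the assembler; the following is only a sketch of the address computation it would have to perform, with align_to_residue as a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest address >= addr with (address % a) == b. */
static uint32_t align_to_residue(uint32_t addr, uint32_t a, uint32_t b) {
    assert(a > 0 && b < a);
    return addr + (b + a - addr % a) % a;
}
```

With 4 banks and a caller in bank 2, the lcall would be placed at an address with residue (2 + 1) % 4 = 3, so the return address three bytes later has residue 2, the calling bank.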

Maarten Brock - 2015-11-07

No, that was not my idea. I just imagined something like this:

        mov r0,#((my_banked_func & 0xFC) | ((my_banked_func>>16) & 0x03))
        mov r1,#(my_banked_func>>8)
        .bndry 4
        lcall __sdcc_banked_call

        .bndry 4
my_banked_func:
        ...
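
The packing of the target into r0 and r1 can be modelled in C. A sketch, assuming a 24-bit function address whose bits 16 and 17 hold the bank and whose low two bits are zero because of the .bndry 4 in front of the function; CallRegs and the function names are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r0, r1; } CallRegs;

/* Mirrors the two mov instructions in front of the lcall. */
static CallRegs encode_dest(uint32_t func) {
    CallRegs r;
    assert((func & 0x03u) == 0);   /* function starts on a 4-fold address */
    r.r0 = (uint8_t)((func & 0xFCu) | ((func >> 16) & 0x03u));
    r.r1 = (uint8_t)(func >> 8);
    return r;
}

/* Target bank, as __sdcc_banked_call could read it from r0. */
static unsigned dest_bank(CallRegs r) {
    return r.r0 & 0x03u;
}

/* 16-bit target address with the bank bits masked out again. */
static uint16_t dest_addr(CallRegs r) {
    return (uint16_t)((r.r1 << 8) | (r.r0 & 0xFCu));
}
```

For a function at 0x029A40 this yields r0 = 0x42 and r1 = 0x9A, which decode to bank 2 and address 0x9A40, so r2 is no longer needed.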

But you made me realize that we don't need .bndry 4 if we just insert 3 NOPs after the lcall:

        lcall __sdcc_banked_call
        nop
        nop
        nop

This means .bndry doesn't need to insert NOPs but must only be honored by the linker.
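
One reading of why the three NOPs are enough: the call overwrites the two low bits of the return address with the bank, and the return sets them both to 1, so execution resumes at (ret | 3). That address never lies before ret and never beyond ret + 3, so it is either one of the NOPs or the instruction right after them. A sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Address execution resumes at after the banked return. */
static uint16_t resume_address(uint16_t ret, unsigned bank) {
    /* what the call stores: low two bits replaced by the bank */
    uint16_t packed = (uint16_t)((ret & 0xFFFCu) | (bank & 0x03u));
    /* what the return restores: low two bits forced to 1 */
    return (uint16_t)(packed | 0x0003u);
}

/* How many of the three NOPs still execute after the return. */
static unsigned nops_executed(uint16_t ret, unsigned bank) {
    return (unsigned)(ret + 3u) - resume_address(ret, bank);
}
```

Only the low byte of the return address is touched, so there is no carry into the high byte.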

Frieder Ferlemann - 2015-11-07

nice:)


Sergey Belyashov - 2021-03-17

status: open --> closed-out-of-date

Group: -->

Sergey Belyashov - 2021-03-17

labels: --> 8051
