#47 bankswitching revisited

open
nobody
None
5
2015-11-07
2005-08-10
Anonymous
No

Hi Maarten, hi folks,

this proposal might be a little hard to read because all the
interesting stuff is hidden in the comments of the source below.
Please accept my apologies.

I hope it's not just Maarten replying to me and me replying to
Maarten, so please add your $0.02.

I'd like to propose bank switching code which would look like
this, e.g. for calling bank 1 from bank 0:

---------8<----------------------------------------------------------------

;--untested and buggy, RFC only--------------------------------------------
__sdcc_banked_call_0_1::
inc sp
inc sp
xch a,r0
push acc
mov a,r1
push acc
;
mov a,sp
add a,#-3
mov r1,a
mov a,#__sdcc_bank_switch_0 >> 8
mov @r1,a
dec r1
mov a,#__sdcc_bank_switch_0
mov @r1,a
mov a,r0
;
ljmp __sdcc_bank_switch_1

...

__sdcc_bank_switch_1::
anl P2, #(0x03)
orl P2, #1
ret

...

--------->8----------------------------------------------------------------

The above code could be auto-generated from something like the C code below:

---------8<----------------------------------------------------------------

/* Proposal for SDCC bank switching routines (Request For Comments)
(LGPL 2005 Frieder Ferlemann)

At first sight this bank switching code seems more awkward than
the present code in sdcc/device/lib/mcs51/crtbank.asm,
so I feel some advertisement is needed:

* banked functions end with "ret" instead of "ljmp"
  (saving 2 bytes per function)

* inter-bank function calls do not need to specify the
  destination bank in a register
  (2 bytes less per inter-bank call; register usage currently
  does not matter but might come at a cost in the future)

* intra-bank function calls can be made directly
  (no need to set up R0, R1, R2, saving 6 bytes per function)

* intra-bank function calls need only 2 bytes on the stack
  (instead of 3)

And as you'll most likely find the weaknesses of my proposal anyway,
I might as well include some (most?) of them here:

- needs more cycles for inter-bank calls
  (my comment: this is offset by the fact that intra-bank function
  calls do not need any additional cycles)

- needs more code in the common area
  (my comment: and 8 bytes less per function in the banked area)

- inter-bank calls need one more byte of stack space
  (my comment: needs 1 byte less stack space per intra-bank call.
  It is probably safe to assume that intra-bank calls are more
  frequent than inter-bank calls. In that case the proposed
  code uses less stack.)

- the destination bank of a function call must be known at
  compile time, so a function in a different bank cannot be
  called through a function pointer
  (my comment: would you really want this? It's 8051 embedded
  programming.)

- the code proposed here does not scale well beyond 4 paged areas
  because you need code for each page calling any other
  (my comment: good argument. The code size of this bank switching
  code basically scales with n!, so if you use it with many pages
  my proposal is on shaky ground.
  On the other hand, switching between 3 banks (giving you
  128k on a memory layout with 1*32k common and 3*32k paged memory)
  is probably the "sweet spot" for the 8051 if you choose to do
  bank switching at all.
  The scaling problem might additionally be softened slightly by
  the linker including only the bank switching code that is needed.
  A more dedicated bank switch routine which doesn't suffer from
  n! (probably needing more stack and cycles but less code)
  could be written.)

Note: this bank switching code requires the compiler (as opposed to
the linker) to know which bank a function of your code resides in.
So a prototype for a banked function would need to specify the
code bank:

int my_function( int i ) __banked (3);

This is pretty much like the syntax for the __using keyword.

IMHO this compile-time information puts SDCC in a much better
position with respect to code banking than compilers which leave
the assignment of functions to code banks to the linking stage
(and thus, unless the linker is pretty clever, do the check for
bank switching at runtime).
*/

#define BANK_MASK (0x03)
#define BANK_REGISTER P2

#define ASM(...) __asm __VA_ARGS__ __endasm

#define sdcc_banked_call(x,y) \
    ASM(;--untested and buggy---------------------------------------------------------);\
    ASM(.area CSEG_SDCC_BANKED_CALL_ ## x ##_##y (CODE) ; extra segment so linker might remove it);\
    ASM(__sdcc_banked_call_ ## x ##_##y:: );\
    ASM( inc sp );\
    ASM( inc sp );\
    ASM( xch a,r0 );\
    ASM( push acc );\
    ASM( mov a,r1 );\
    ASM( push acc );\
    ASM( ; );\
    ASM( mov a,sp );\
    ASM( ; how to insert '#' with cpp for immediate addressing? );\
    ASM( add a,-3 ; '#' );\
    ASM( mov r1,a );\
    ASM( mov a,__sdcc_bank_switch_ ##x >> 8 ;'#' );\
    ASM( mov @r1,a );\
    ASM( dec r1 );\
    ASM( mov a,__sdcc_bank_switch_ ##x ; '#' );\
    ASM( mov @r1,a );\
    ASM( mov a,r0 );\
    ASM( ljmp __sdcc_bank_switch_ ##y ; ajmp? );\
    ASM( ; );

#define bank_sel(x) \
    ASM(;--untested and buggy---------------------------------------------------------);\
    ASM(.area CSEG_SDCC_BANK_SWITCH_ ## x (CODE) );\
    ASM(__sdcc_bank_switch_ ##x:: );\
    ASM( ; anl BANK_REGISTER, BANK_MASK ; '#' );\
    ASM( ; orl BANK_REGISTER, x ; '#' );\
    ASM( nop nop nop nop nop nop ; adapt );\
    ASM( ret );

static void dummy(void) __naked
{
    ASM( .globl BANK_REGISTER );

    /* linker decides which ones are needed */
    sdcc_banked_call (0, 1);
    sdcc_banked_call (0, 2);
    sdcc_banked_call (0, 3);

    /* sdcc_banked_call (1, 0); bank 0 is common, should not be generated */
    /* sdcc_banked_call (1, 1); should not be generated */
    sdcc_banked_call (1, 2);
    sdcc_banked_call (1, 3);

    /* sdcc_banked_call (2, 0); bank 0 is common, should not be generated */
    sdcc_banked_call (2, 1);
    /* sdcc_banked_call (2, 2); should not be generated */
    sdcc_banked_call (2, 3);

    /* sdcc_banked_call (3, 0); bank 0 is common, should not be generated */
    sdcc_banked_call (3, 1);
    sdcc_banked_call (3, 2);
    /* sdcc_banked_call (3, 3); should not be generated */

    /* linker decides which ones are needed */
    bank_sel (0);
    bank_sel (1);
    bank_sel (2);
    bank_sel (3);
}
--------->8----------------------------------------------------------------
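The comment in the source describes the growth of the stub code as n!. As a rough cross-check, the sketch below simply counts the `sdcc_banked_call` stubs that `dummy()` would instantiate for n banks. It assumes the same exclusions as the listing (bank 0 is the common bank and is never a target, and no stub targets its own bank); the function name `banked_call_stubs` is made up for this illustration.

```c
#include <assert.h>

/* Counts the __sdcc_banked_call_x_y stubs for n banks, assuming (as in
   dummy() above) that bank 0 is common and never a call target, and that
   no stub calls into its own bank. Hypothetical helper, not SDCC code. */
unsigned banked_call_stubs(unsigned n)
{
    unsigned count = 0;
    unsigned from, to;

    assert(n >= 1);
    for (from = 0; from < n; from++)
        for (to = 1; to < n; to++)      /* bank 0 is never a target */
            if (from != to)
                count++;
    return count;
}
```

Under these assumptions the count works out to (n-1)*(n-1), which matches the 9 uncommented `sdcc_banked_call` lines in `dummy()` for n = 4. That is quadratic rather than factorial growth, although it still becomes sizeable for 8 or 16 banks.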

Discussion

  • Nobody/Anonymous

    RFC as in mail but white space preserved

     
  • Maarten Brock

    Maarten Brock - 2005-08-11

    Who filed this patch? Frieder?

    Some comments:
    1) If you use RET instead of LJMP where is it returning to?
    The common area or another bank? What restores this bank?

    2) Instead of letting the compiler/linker set a register now the
    user must adapt n! banked_call functions to his/her hardware
    instead of just one.

    3) Intra bank function calls don't need to be banked at all.
    Just don't specify the banked keyword. Nothing lost/saved.

    4) Function pointers are exactly the type of stuff that make
    programs big. I don't think we can do away with them.

    Maarten

     
  • Frieder Ferlemann

    Hi Maarten,

    thank you for digging through my cryptic proposal :)

    > Who filed this patch? Frieder?

    Yes, it was me. I would have preferred a mail to sdcc-devel
    but the list wouldn't accept it (my mail provider does not
    define postmaster@web.de, so my mail bounced :( ).

    > 1)
    The mechanism of calling a function in bank 3 from bank 1
    would be:

    mov r0,#dest
    mov r1,#dest>>8
    lcall __sdcc_banked_call_1_3

    __sdcc_banked_call_1_3 then pushes the address of
    __sdcc_bank_switch_1 onto the stack, pushes the target address
    held in (R0,R1) onto the stack, and LJMPs to
    __sdcc_bank_switch_3.
    The RET instruction of __sdcc_bank_switch_3 then jumps to
    the target address that was previously in registers (R0,R1).

    When the code at the target address is finished, its RET
    instruction returns to the code which switches back to bank
    1 (and from there to the address where the call originated).

    The mechanism of calling a function in bank 3 from bank 0
    would be:
    lcall __bank_sel_3
    lcall my_func_in_bank_3

    or (if the previously used bank was bank 3) simply:

    lcall my_func_in_bank_3
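The chain of RET instructions described for the bank 1 to bank 3 call can be modelled in a few lines of C. This is only a sketch of the control flow, not of real 8051 behaviour: each stack entry stands for a whole 2-byte address, and the addresses `RET_TO_CALLER`, `TARGET` and `SWITCH_BASE` as well as the function name are invented for the illustration.

```c
#include <assert.h>

/* Hypothetical addresses, chosen only so the three kinds of code
   can be told apart in this model. */
enum { RET_TO_CALLER = 0x100, TARGET = 0x200, SWITCH_BASE = 0x300 };

/* Models one inter-bank call. Returns the bank that is selected while
   the target function runs; *bank_after receives the bank selected
   once control is back in the caller. */
int simulate_banked_call(int from_bank, int to_bank, int *bank_after)
{
    int stack[3];
    int sp = 0;
    int bank = from_bank;
    int bank_in_target = -1;
    int pc;

    stack[sp++] = RET_TO_CALLER;            /* pushed by the caller's lcall */
    stack[sp++] = SWITCH_BASE + from_bank;  /* stub: routine that switches back */
    stack[sp++] = TARGET;                   /* stub: target address from (R0,R1) */
    pc = SWITCH_BASE + to_bank;             /* stub: ljmp to the bank switch routine */

    while (pc != RET_TO_CALLER) {
        if (pc == TARGET)
            bank_in_target = bank;          /* body of the banked function */
        else
            bank = pc - SWITCH_BASE;        /* a bank switch routine selects its bank */
        assert(sp > 0);
        pc = stack[--sp];                   /* each routine ends with a plain ret */
    }
    assert(sp == 0);
    *bank_after = bank;
    return bank_in_target;
}
```

Calling `simulate_banked_call(1, 3, &after)` walks through the same three steps as the description: switch to bank 3, run the target, switch back to bank 1. While the target runs, two addresses remain on the modelled stack, which corresponds to the 4 bytes of stack mentioned for inter-bank calls in the proposal.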

    > 2)

    No, not really: the user would have to provide the n
    __bank_sel_n functions (n == 4 in this case).
    The __sdcc_banked_call_n_m functions don't contain code
    which is specific to the bank switching mechanism.
    The linker would then decide which of these end up in the
    binary.

    > 3)

    Yes.
    My proposal would additionally allow putting code into a
    banked region and calling it either via __sdcc_banked_call_n_m
    or directly.
    If you are under extreme pressure you could theoretically
    even put library stuff (like floating point code) into a
    banked area...

    > 4)

    You cannot call them _directly_ but could get away with one
    additional level of indirection. A 'trampoline' function
    like "void my_func_trampoline(void){my_func();}" would add
    an overhead of 3 bytes of code (and no stack overhead). That
    should not be an issue.

    Note that the code generated by bank.c was perhaps not as good
    as it could be:

    It would generate unnecessary bank switching code for
    switching from the common bank 0 to other banks.
    (In this case the calling bank doesn't need to be pushed and
    the compiler could insert an lcall __bank_sel_n if needed.)

    In the end the proposal seems to need about 150 bytes more
    code in common memory than crtbank.asm (if the linker needs
    to include each(!) __sdcc_banked_call_n_m).
    As it saves 3 to 6 bytes (in the common bank) per call of a
    banked function from the common bank, the break-even point
    would be at 25..50 calls at the latest (and can probably be
    reached relatively early).
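The break-even estimate is a simple division of the fixed overhead by the per-call saving. A minimal sketch, taking the roughly 150 bytes and the 3 to 6 bytes per call from the text as given (the function name is hypothetical):

```c
#include <assert.h>

/* Number of calls after which a per-call saving has paid for a fixed
   code-size overhead, rounded up. Hypothetical helper for illustration. */
unsigned break_even_calls(unsigned fixed_overhead_bytes,
                          unsigned saved_bytes_per_call)
{
    assert(saved_bytes_per_call > 0);
    return (fixed_overhead_bytes + saved_bytes_per_call - 1)
           / saved_bytes_per_call;
}
```

With 150 bytes of overhead this gives 25 calls at 6 bytes saved per call and 50 calls at 3 bytes saved per call, the two ends of the range quoted above.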

    I'm attaching file bank2.lst which contains the complete
    bank switching code for 128kByte separated in 32k common
    area and 3 banked pages.
    (The file bank2.lst is more or less automatically generated
    by executing "sdcc -c bank2.c". bank2.c contains some
    additional comments about the pros and cons of the proposal.)

    Greetings,

    Frieder

     
  • Frieder Ferlemann

    bank switching code generated from "sdcc -c bank2.c"

     
  • Frieder Ferlemann

    some code removed, comment updated

     
  • Maarten Brock

    Maarten Brock - 2015-11-07

    10 years later this is still open ;)

    I've said this before (probably on the mailing list or in another tracker item) but this fails if there are parameters on the internal stack. But when going for bank switching, you're probably very low on internal stack space already, so maybe it is safe to assume that in that case --xstack is used as well.

    But all in all I think internal stack space is the most limiting resource in a bank switched application. Not code space and not cycles. So IMHO it would be best if we could do bank switching with only 2 bytes stack overhead and this might just be possible if we force banked function calls on aligned addresses. The assembler has the .bndry directive for this, but currently the linker does not support it correctly. And the assembler would have to insert NOPs.

    Say we force all banked function calls on a 4-fold address with .bndry 4 then the sdcc_banked_call could modify the LSB of the return address on stack with the current bank. And the sdcc_banked_ret could OR it with 0x3 again before returning.

            mov r0, #my_banked_func
            mov r1, #(my_banked_func>>8)
            mov r2, #(my_banked_func>>16)
            .bndry 4                        ; should insert 0/1/2/3 NOPs
            lcall __sdcc_banked_call
            ...                             ; address always ends in 0b...11
    
    __sdcc_banked_call::
            mov r3,a                        ; save a in r3
            pop acc
            mov r4,a                        ; save MSB(ret) in r4
            pop acc
            anl a,#0xFC                     ; clear 2 lsbit
            mov r5,a
            mov a,_PSBANK
            anl a,#0x03                     ; get 2 lsbit of current bank
            orl a,r5
            push acc                        ; save LSB(ret) + bank
            mov a,r4
            push acc                        ; restore MSB(ret)
            mov a,r0
            push acc                        ; push LSB(dest)
            mov a,r1
            push acc                        ; push MSB(dest)
            mov a,r2
            anl a,#0x03
            anl _PSBANK,#0xFC
            orl _PSBANK,a                   ; select new bank
            mov a,r3                        ; restore a
            ret                             ; make the call
    
    my_banked_func:
            ...
            ljmp __sdcc_banked_ret          ; return from banked func
    
    __sdcc_banked_ret::
            mov r3,a                        ; save a in r3
            pop acc
            mov r4,a                        ; save MSB(ret) in r4
            pop acc
            mov r5,a                        ; get LSB(ret) + bank
            orl a,#0x03
            push acc                        ; restore LSB(ret)
            mov a,r4
            push acc                        ; restore MSB(ret)
            mov a,r5
            anl a,#0x03
            anl _PSBANK,#0xFC
            orl _PSBANK,a                   ; restore bank
            mov a,r3                        ; restore a
            ret                             ; return to caller
    

    If we also forced banked functions onto 4-fold addresses it would become a little simpler and no longer need r2. And if the calling conventions changed to free ACC and DPTR it could also be simpler.

    The above example only works for up to 4 banks, but this scheme scales to 8 or 16 with a different .bndry setting.

    It's also possible that a different packing leads to more optimal code. I haven't checked.
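The packing used by the assembly above can be checked in C. This is a sketch of the address arithmetic only, assuming 2 bank bits and a return address whose low bits are both 1 (the situation `.bndry 4` in front of the 3-byte `lcall` is meant to guarantee). The names `pack_return`, `unpack_bank` and `unpack_return` are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_BITS     2u                        /* 4 banks, as with .bndry 4 */
#define BANK_LSB_MASK ((1u << BANK_BITS) - 1u)  /* 0x03 */

/* What __sdcc_banked_call stores: return address with its low bits
   replaced by the bank to return to. */
uint16_t pack_return(uint16_t ret_addr, uint8_t current_bank)
{
    /* the aligned lcall makes the return address end in 0b...11 */
    assert((ret_addr & BANK_LSB_MASK) == BANK_LSB_MASK);
    return (uint16_t)((ret_addr & ~BANK_LSB_MASK) |
                      (current_bank & BANK_LSB_MASK));
}

/* What __sdcc_banked_ret recovers from the stored value. */
uint8_t unpack_bank(uint16_t packed)
{
    return (uint8_t)(packed & BANK_LSB_MASK);
}

uint16_t unpack_return(uint16_t packed)
{
    return (uint16_t)(packed | BANK_LSB_MASK);
}
```

For example, an `lcall` placed at 0x1234 returns to 0x1237. Packing bank 2 into it gives 0x1236, from which both the bank (2) and the original return address (0x1237) can be recovered without any extra stack byte. Raising `BANK_BITS` to 3 or 4 would correspond to the 8 and 16 bank variants mentioned above.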

     
  • Frieder Ferlemann

    Hi Maarten,

    thanks for revisiting!)

    [...] force banked functions on 4-fold addresses [...]

    Just asking whether I got your idea correctly, do you mean the linker would e.g. know an extended version of .bndry which takes two arguments a and b (b defaulting to zero) which would result in new_address with (new_address % a) == b?
    So functions in bank 0 could be made to start at an address 0b...00, those in bank 1 at address 0b...01 etc. With the calling function then looking like

           ; If my_banked_func is aligned so that 
           ; address(my_banked_func) % num_banks == bank(my_banked_func)
           ; then the (two) lowermost bits of r0 hold the target bank
           ; of my_banked_func (A16, A17)
            mov r0, #my_banked_func         
            mov r1, #(my_banked_func>>8)
    
            .bndry num_banks, (currentbank + 1) % num_banks ; should insert 0/1/2/3 NOPs (for 4==num_banks)
            lcall __sdcc_banked_call
            ; the two lowermost bits of this address are also the current bank now.
            ; (So the lowermost bits put on the stack via the lcall
            ; hold the bank to return to)
            ...
    
     
  • Maarten Brock

    Maarten Brock - 2015-11-07

    No, that was not my idea. I just imagined something like this:

        mov r0,#((my_banked_func & 0xFC) | ((my_banked_func>>16) & 0x03))
        mov r1,#(my_banked_func>>8)
        .bndry 4
        lcall __sdcc_banked_call
    
        .bndry 4
    my_banked_func:
        ...
    

    But you made me realize that we don't need .bndry 4 if we just insert 3 NOPs after the lcall:

        lcall __sdcc_banked_call
        nop
        nop
        nop
    

    This means .bndry doesn't need to insert NOPs but must only be honored by the linker.
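Why three NOPs are enough can be seen from the arithmetic alone. The sketch below assumes the return path restores the address as "clear the two low bits, then set them", as in the `__sdcc_banked_ret` listing earlier in the thread; `resume_address` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Address at which execution resumes after __sdcc_banked_ret, given the
   true return address pushed by the lcall. The two low bits were
   overwritten with the bank number and are restored as 0b11. */
uint16_t resume_address(uint16_t ret_addr)
{
    return (uint16_t)((ret_addr & ~3u) | 3u);
}
```

For a return address R the result always lies between R and R+3. The three NOPs occupy R, R+1 and R+2, and the next real instruction sits at R+3, so execution resumes either on one of the NOPs or directly on the following instruction, regardless of how the `lcall` happened to be aligned.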

     
  • Frieder Ferlemann

    nice:)

     
