Looking for banking advice for 8051-ish target

Help
mark
2023-07-24
2023-10-05
  • mark

    mark - 2023-07-24

    Hi,

    I'm looking for some assistance with an 8051 project. I should preface this with
    a warning that despite many years of experience as a high-level programmer, I'm
    pretty novice at low level programming and an outright noob when it comes to
    embedded devices, particularly of Harvard architecture - so if my questions come
    across as those that might be asked by an amateur with no idea what he is doing,
    that's entirely to be expected... : )

    I'm working with an off-the-shelf PCB which is powered by an SOC with an 8051
    core and embedded peripherals (RAM, flash, etc.). The core is mostly DS80C320
    compatible. Firmware source code for this chip is available online, but it is
    designed to be compiled using Keil. I'm intending to use this PCB as the
    controller for several devices I am building, but I also do not want to use
    Keil, so I've decided to refactor the codebase to be compiled with SDCC instead.
    I'm a full time Linux user so I'd like to just see this through and finish the
    port rather than take the easy option of "just use Keil and forget about it" :).

    I've gone through the rote work of translating the codebase to account for the
    differences in syntax between SDCC and Keil, and I now have something which at
    least compiles without error, so long as I use --stack-auto and --model-huge.
    But I think I have problems with my banking implementation, and want to check a
    few things before I try to run the code on the device.

    I noticed that Keil is producing around 100KB of object code whereas SDCC is
    producing somewhere around 200KB. This seems to be because --model-huge adds 3
    mov instructions for every function call to implement banking. By luck, it
    seems that it is quite possible to arrange the modules such that there are many
    small frequently called functions which fit comfortably in CSEG; and there are
    also lots of functions which will only ever be called by other functions in the
    same bank. So, I can use --model-large and manually mark functions as __banked
    to save a lot of bytes.

    If I understand correctly, the procedure to minimise code size here will be to
    go through all the functions, and manually mark them as __banked ONLY
    when they (a) do not reside in HOME or CSEG; (b) are called by functions outside
    of their own bank. I can then compile with --model-large. Is that accurate?
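
    A minimal sketch of what that marking might look like in source, assuming a bank segment named BANK1 (a hypothetical name) and a helper called scale that is not part of the original firmware. The BANKED macro is an assumption too: it is only there so the same file also builds with a host compiler for testing, and under SDCC it expands to __banked.

```c
#include <assert.h>

#ifdef __SDCC
#pragma codeseg BANK1          /* assumed segment name for this module */
#define BANKED __banked
#else
#define BANKED                 /* host build: the keyword compiles away */
#endif

/* Hypothetical helper, only ever called from this module, i.e. from
 * the same bank: left unbanked, so calls to it are plain calls. */
static unsigned char scale(unsigned char x)
{
    assert(x <= 127);          /* keep x * 2 within unsigned char */
    return (unsigned char)(x * 2u);
}

/* Called from other banks: declared banked so that callers go
 * through the trampoline. */
unsigned char DoFoo(unsigned char x) BANKED
{
    return (unsigned char)(scale(x) + 1u);
}
```

    Here scale stays a plain call because nothing outside its bank reaches it, while DoFoo pays for the trampoline because other banks call it.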

    There's quite a lot to go through - is there any kind of static analysis tool
    available which can take care of this if I use the codeseg pragma in every
    module? In particular it will be quite inconvenient to move a module from one
    bank to another if I need to re-check whether every referenced function is only
    called from its own bank; I'd appreciate any pointers here.

    Even still, SDCC does seem to generate assembly code that is larger than it
    has to be or ought to be. As an example, say I have a module with three
    functions DoFoo, DoBar, and DoFooAndBar. All three functions are called
    from code banks other than their own and are therefore declared __banked. The
    third function is simply a helper which calls DoFoo and DoBar in sequence.
    The generated ASM for DoFooAndBar is observed to call the trampoline twice, even
    though, being in the same module as both functions, the trampoline call will
    have no effect. Is this simply a limitation of SDCC which I have to live with,
    or is it due to improper usage on my part?

    I'd also appreciate advice on the banking implementation. On this chip, code
    banking is implemented by setting the value of an XFR at address 0xFFFF. The
    entire code memory from 0x0000 to 0xFFFF is mapped to external SPI flash
    location 0xN0000 to 0xNFFFF where N is the selected bank number.
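
    To make the mapping concrete, here is a small sketch of the address arithmetic. The function name flash_offset and the bank count of 8 are assumptions for illustration (8 banks matches the 524288-byte image size mentioned later in the thread); neither comes from the chip's documentation.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_COUNT 8u   /* assumption: 512 KiB flash / 64 KiB per bank */

/* Flash offset of a 16-bit code address while the given bank is
 * selected: the bank number supplies address bits 16 and up. */
uint32_t flash_offset(uint8_t bank, uint16_t code_addr)
{
    assert(bank < BANK_COUNT);
    return ((uint32_t)bank << 16) | code_addr;
}
```

    For example, code address 0x1234 in bank 2 would live at flash offset 0x21234.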

    So, the first problem I have here is that all segments below BANK0 in the output
    binary must be repeated at flash address 0x10000, 0x20000, etc. Is there a way I
    can tell the linker to output ihex which reflects this, or am I limited to
    manually moving bytes around in the final binary file?

    Second, and more difficult, is the trampoline implementation. The SDCC manual
    indicates that registers R0, R1 and R2 are reserved for passing the LSB, MSB and
    bank respectively. It also notes that DPL, DPH, B and ACC are used to pass
    parameters and return values around. This causes me some difficulty because of
    course to set the XFR I need to use DPTR. Fortunately enough, the device has
    dual data pointers, so I've taken advantage of this and written some trampoline
    routines:

    .module rtd_banking
    
    .area DSEG    (DATA)
    
    ; elsewhere in C:
    ; __sfr __at(0x84) DPL1;
    ; __sfr __at(0x85) DPH1;
    ; __sfr __at(0x86) DPS;
    
    _acctmp::                 ; temp variable for storing A whilst reading xfr
        .ds 1
    
    ;  Banking behaviour of the target device is defined as:
    ;
    ;  XFR at 0xFFFC is used to enable banking
    ;  XFR at 0xFFFF sets current bank number
    ;
    ;  Bank 0 is external SPI flash addr 0x00000 - 0x0ffff
    ;  Bank 1 is external SPI flash addr 0x10000 - 0x1ffff
    ;  Bank n is external SPI flash addr 0xn0000 - 0xnffff
    
    .area HOME    (CODE)
    
    ; R0 = LSB
    ; R1 = MSB
    ; R2 = target bank
    __sdcc_banked_call::
        mov    _acctmp, a     ; save acc in temp variable
    
        mov    _DPS, 0x01     ; enable secondary data pointer
    
        mov    dptr, #0xffff  ; set data pointer to xfr address
        movx   a, @dptr       ; get current bank
        push   acc            ; push current bank to stack
    
        mov    a, r0          ; get address LSB from register
        push   acc            ; push to stack
        mov    a, r1          ; get address MSB from register
        push   acc            ; push to stack
    
        mov    a, r2          ; get new bank
        movx   @dptr, a       ; set new bank
    
        mov    _DPS, 0x00     ; disable secondary data pointer
    
        mov    a, _acctmp     ; restore acc from temp variable
        ret                   ; make the call
    
    __sdcc_banked_ret::
        mov    _acctmp, a     ; save acc in temp variable
        pop    a              ; get previous bank from stack
    
        mov    _DPS, 0x01     ; enable secondary data pointer
        mov    dptr, #0xffff  ; set data pointer to xfr address
        movx   @dptr, a       ; set bank to previous
        mov    _DPS, 0x00     ; disable secondary data pointer
    
        mov    a, _acctmp     ; restore acc from temp variable
        ret                   ; return to caller
    

    However, that's the first asm I've ever written and I'd appreciate if anyone has
    any ideas about improving it. I actually haven't tested it on the device yet
    either so if it seems like it shouldn't work, you might be right. To me it feels
    like there's really quite a lot of code there to be calling for every single
    function call; and the holding of A in a temporary variable feels a bit hackish.
    But it might just be that that's how it has to be if I'm to compile the code
    using SDCC. However if there's some flag I can pass to stop DPL and DPH being
    used to pass data between function calls, I think that would help?

    Anyway, I think that's enough to be getting on with for now. Thanks in advance
    to anyone who's got any advice for me!

    Cheers,

    Mark

     
  • Maarten Brock

    Maarten Brock - 2023-08-11

    Hello Mark,

    If I understand correctly, the procedure to minimise code size here will be to
    go through all the functions, and manually mark them as __banked ONLY
    when they (a) do not reside in HOME or CSEG; (b) are called by functions outside
    of their own bank. I can then compile with --model-large. Is that accurate?

    Yes, that is correct.

    I know of no static analysis tool for this, but I believe there are tools to at least create a call tree graph.
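
    If a call graph can be exported from such a tool, the rule itself is simple enough to check mechanically. The following is only a sketch under stated assumptions: the names find_banked, call_edge and COMMON are hypothetical, functions are identified by index, and calls made through function pointers are not captured at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define COMMON (-1)   /* hypothetical marker for functions in HOME/CSEG */

struct call_edge {
    int caller;       /* index into bank_of[] */
    int callee;       /* index into bank_of[] */
};

/* Sets needs_banked[f] to true when function f lives in a bank and at
 * least one of its callers lives somewhere else (another bank or the
 * common area). Everything else can stay unbanked. */
void find_banked(const int *bank_of, size_t n_funcs,
                 const struct call_edge *edges, size_t n_edges,
                 bool *needs_banked)
{
    for (size_t f = 0; f < n_funcs; f++)
        needs_banked[f] = false;

    for (size_t e = 0; e < n_edges; e++) {
        int caller = edges[e].caller;
        int callee = edges[e].callee;
        assert(caller >= 0 && (size_t)caller < n_funcs);
        assert(callee >= 0 && (size_t)callee < n_funcs);
        if (bank_of[callee] != COMMON && bank_of[caller] != bank_of[callee])
            needs_banked[callee] = true;
    }
}
```

    Re-running a check like this after moving a module to another bank only requires updating bank_of for the functions in that module.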

    Your DoFooAndBar will always need to use the trampoline because both callees need to return through the trampoline. Look at the generated code for returning from these functions. You'll have to live with it.
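
    A toy model in C may help show why. It is a sketch that tracks only the saved bank byte, with a variable standing in for the bank XFR and a small array standing in for the stack; the function names call_from_bank0 and bank_after_direct_call are made up for the illustration. On real hardware a plain call is worse than the model suggests, because the return address bytes on the stack get misread as well.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_DEPTH 16

static uint8_t bank_reg;             /* models the bank XFR at 0xFFFF */
static uint8_t stack[STACK_DEPTH];   /* models only the saved bank bytes */
static int sp;

static void banked_call(uint8_t target_bank)
{
    assert(sp < STACK_DEPTH);
    stack[sp++] = bank_reg;          /* remember the caller's bank */
    bank_reg = target_bank;
}

static void banked_ret(void)
{
    assert(sp > 0);
    bank_reg = stack[--sp];          /* restore whatever was saved last */
}

/* Every banked function returns through the banked return. */
static void DoFoo(void) { banked_ret(); }
static void DoBar(void) { banked_ret(); }

/* Same bank as its callees, but each call still goes through the
 * trampoline so that the callee's banked return has a byte to pop. */
static void DoFooAndBar(void)
{
    banked_call(1); DoFoo();
    banked_call(1); DoBar();
    banked_ret();
}

/* Bank active after code in bank 0 calls DoFooAndBar in bank 1. */
uint8_t call_from_bank0(void)
{
    bank_reg = 0; sp = 0;
    banked_call(1);
    DoFooAndBar();
    return bank_reg;
}

/* Hypothetical broken variant: DoFoo is reached by a plain call, so
 * its banked return pops the byte saved for DoFooAndBar instead. */
uint8_t bank_after_direct_call(void)
{
    bank_reg = 0; sp = 0;
    banked_call(1);                  /* bank 0 enters DoFooAndBar */
    DoFoo();                         /* plain call: nothing pushed */
    return bank_reg;                 /* bank seen by the rest of DoFooAndBar */
}
```

    In the broken variant the model ends up back in bank 0 while DoFooAndBar, which lives in bank 1, is still supposed to be running.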

    Replicating the non-banked parts in the final binary can be handled by post-processing with e.g. srec_cat.

    What the manual does not mention is that R3 is free for you to use. You can replace _acctmp with R3.

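    You can probably use inc _DPS / dec _DPS as cheaper variants of mov _DPS,#0x01 / mov _DPS,#0x00. Your current code is buggy: without the # prefix, the operand is a direct address, so it copies the contents of address 0x01 or 0x00 into _DPS instead of loading the immediate value.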

    SDCC does not generate any code using the second DPTR. You might want to init it to 0xFFFF at startup and let the trampoline assume it is still there.

    pop a should be pop acc.

    Unfortunately, SDCC has no flag to stop it from using DPTR and ACC for passing parameters.

    There is a long-standing wish for a new register allocator for mcs51 which most probably will no longer use these registers for passing parameters and results. It should also probably use the DPTR for passing the destination of a banked call so the trampoline can use JMP @DPTR instead of the pushes and ret. But don't expect this any time soon.

    HTH,
    Maarten

     

    Last edit: Maarten Brock 2023-08-11
  • mark

    mark - 2023-10-04

    Hi Maarten,

    Sorry, I didn't get the e-mail notifying me you'd replied - thanks for your help!

    Look at the generated code for returning from these functions.

    Sorry, yes, that seems very obvious in retrospect. Since the callee always returns through the banked return, the call always has to go through the banked call.

    Replicating the non-banked parts in the final binary can be handled by post-processing

    Thanks, this was a good pointer. For anyone else who ends up here via Google, the incantation I've ended up with in my Makefile is this:

    $(SDCC_OUTPUTDIR)/firmware.bin: $(SDCC_OUTPUTDIR)/firmware.hex
    # crop the common segment (0x0000 - 0x1cff) from the intel hex file and
    # copy it to each eeprom bank (i.e. at 0x010000 and 0x020000)
        srec_cat \
            $(SDCC_OUTPUTDIR)/firmware.hex -intel \
            '(' \
                $(SDCC_OUTPUTDIR)/firmware.hex -intel -crop 0x000000 0x001cff -offset 0x010000 \
            ')' \
            '(' \
                $(SDCC_OUTPUTDIR)/firmware.hex -intel -crop 0x000000 0x001cff -offset 0x020000 \
            ')' \
            -o $(SDCC_OUTPUTDIR)/firmware.bin -binary
        truncate -s 524288 $(SDCC_OUTPUTDIR)/firmware.bin
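
    For anyone without srec_cat to hand, the effect of that recipe on the flat binary can be approximated in a few lines of C. This is a sketch, not a replacement for the recipe: replicate_common is a hypothetical name, and it assumes the whole image is already loaded into one buffer with 64 KiB per bank.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BANK_SIZE 0x10000u   /* one bank is 64 KiB of code space */

/* Copies the first common_len bytes of bank 0 to the start of every
 * other bank in the image, so the common segment is visible no matter
 * which bank is selected. */
void replicate_common(uint8_t *image, size_t banks, size_t common_len)
{
    assert(image != NULL);
    assert(common_len <= BANK_SIZE);
    for (size_t b = 1; b < banks; b++)
        memcpy(image + b * BANK_SIZE, image, common_len);
}
```

    With the layout from the Makefile above, this would be called with banks set to 3 and a common_len that covers the common segment.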
    

    You might want to init it to 0xFFFF at startup and let the trampoline assume it is still there.

    Is it sufficient to add the below to banking.asm?

    .area HOME    (CODE)
    
    ___sdcc_external_startup::
        mov    _DPL1, #0xff   ; set secondary data pointer to 0xffff
        mov    _DPH1, #0xff   ; 0xffff is xfr address of page
        ret
    

    Thank you for the rest of your advice on the asm code.

    don't expect this any time soon

    Fair enough.

    I believe there are tools to at least create a call tree graph

    For now I've managed to do this analysis manually, so I'll shelve this. But in case I revisit it, do you have a particular tool in mind which works specifically with SDCC, or is some general-purpose C tool sufficient?

    Thanks!

    Mark

     

    Last edit: mark 2023-10-05
  • Maarten Brock

    Maarten Brock - 2023-10-05

    Hi Mark,

    I did get the email ;-)

    Is it sufficient to add the below to banking.asm?

    No, it's not. __sdcc_external_startup() must return a value in DPL, usually 0x00. And you may use C instead of asm for it as long as you don't use initialized variables.
    https://sourceforge.net/p/sdcc/code/HEAD/tree/trunk/sdcc/device/lib/_startup.c

    do you have a particular tool in mind which works specifically with sdcc, or is some general purpose C tool sufficient?

    I seem to remember that I used a generic C tool once a long time ago, but I forgot which.

    Out of curiosity which MCU are you using?

    Maarten

     
  • mark

    mark - 2023-10-05

    Hey,

    So, this in main.c?

    #pragma codeseg CSEG
    // ...
    unsigned char __sdcc_external_startup(void)
    {
        DPL1 = 0xFF;
        DPH1 = 0xFF;
        return 0;
    }
    

    Sorry if that seems like an obtuse question. I can always check the generated code : )

    Out of curiosity which MCU are you using?

    I'll avoid naming the actual chip, it's produced by a popular Taiwanese IC design company and is an LCD controller (HDMI/VGA/Composite comes in, LVDS goes out). If I drop the part number you'll probably end up getting weird e-mails about connecting screens to things. I'm just playing with it as an embedded software learning exercise because I happened to have a pile of them.

    The embedded CPU is a DesignWare DW8051 core which claims to be a slightly reduced DS80C320 clone. The SPI flash interface and banking scheme XFR seem to be implemented by the video chip rather than the DW8051 core, though, and the datasheet for the video chip is somewhat lacking in clarity - I suppose you're supposed to just use the firmware provided by the manufacturer. But, it must be a common enough way to manage banking, as the Keil simulator seems to support it pretty seamlessly.

    I'm going to play around with ucsim at some point to see if I can convince it to simulate the dual data pointer and XFR banking scheme - it doesn't seem like it should be impossible - and try to run the dumped firmware binary. Hints would be good if you have any but I'm sure I'll work it out otherwise : )

    Cheers.

    Mark

     
