Small Device C Compiler (SDCC) / Discussion / Help: Looking for banking advice for 8051-ish target

mark - 2023-07-24

Hi,

I'm looking for some assistance with an 8051 project. I should preface this with
a warning that despite many years of experience as a high-level programmer, I'm
pretty novice at low level programming and an outright noob when it comes to
embedded devices, particularly of Harvard architecture - so if my questions come
across as those that might be asked by an amateur with no idea what he is doing,
that's entirely to be expected... : )

I'm working with an off-the-shelf PCB which is powered by an SOC with an 8051
core and embedded peripherals (RAM, flash, etc.). The core is mostly DS80C320
compatible. Firmware source code for this chip is available online, but it is
designed to be compiled using Keil. I'm intending to use this PCB as the
controller for several devices I am building, but I also do not want to use
Keil, so I've decided to refactor the codebase to be compiled with SDCC instead.
I'm a full time Linux user so I'd like to just see this through and finish the
port rather than take the easy option of "just use Keil and forget about it" :).

I've gone through the rote work of translating the codebase to account for the
differences in syntax between SDCC and Keil, and I now have something which at
least compiles without error, so long as I use --stack-auto and --model-huge.
But I think I have problems with my banking implementation, and want to check a
few things before I try to run the code on the device.

I noticed that Keil is producing around 100KB of object code whereas SDCC is
producing somewhere around 200KB. This seems to be because --model-huge adds 3
mov instructions for every function call to implement banking. By luck, it
seems that it is quite possible to arrange the modules such that there are many
small frequently called functions which fit comfortably in CSEG; and there are
also lots of functions which will only ever be called by other functions in the
same bank. So, I can use --model-large and manually mark functions as __banked
to save a lot of bytes.

If I understand correctly, the procedure to minimise code size here will be to
go through all the functions, and manually mark them as __banked ONLY
when they (a) do not reside in HOME or CSEG; (b) are called by functions outside
of their own bank. I can then compile with --model-large. Is that accurate?

There's quite a lot to go through - is there any kind of static analysis tool
available which can take care of this if I use the codeseg pragma in every
module? In particular it will be quite inconvenient to move a module from one
bank to another if I need to re-check whether every referenced function is only
called from its own bank; I'd appreciate any pointers here.
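
The rule above is mechanical enough to sketch as a small host-side check. The following is only an illustration in plain C, not an existing SDCC tool: the `struct func` and `struct call` tables, the `COMMON` marker and the `mark_banked` name are all hypothetical, and the call graph is assumed to have been extracted by some other means.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for code that is always mapped (HOME / CSEG). */
#define COMMON (-1)

struct func {
    const char *name;
    int bank;            /* COMMON, or the bank number the function lives in */
};

struct call {
    size_t caller;       /* index into funcs[] */
    size_t callee;       /* index into funcs[] */
};

/* Sets needs_banked[i] when funcs[i] lives in a switched bank and at least
 * one of its callers lives somewhere else (another bank or common code). */
void mark_banked(const struct func *funcs, size_t nfuncs,
                 const struct call *calls, size_t ncalls,
                 bool *needs_banked)
{
    for (size_t i = 0; i < nfuncs; i++)
        needs_banked[i] = false;

    for (size_t i = 0; i < ncalls; i++) {
        const struct func *from = &funcs[calls[i].caller];
        const struct func *to = &funcs[calls[i].callee];

        if (to->bank == COMMON)
            continue;    /* common code is reachable from every bank */
        if (from->bank != to->bank)
            needs_banked[calls[i].callee] = true;
    }
}
```

Re-running such a check after moving a module to another bank would show which functions change status. Calls made through function pointers do not appear in a static call table like this and would still need manual review.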

Even still, SDCC does seem to generate assembly code that is larger than it
has to be or ought to be. As an example, say I have a module with three
functions DoFoo, DoBar, and DoFooAndBar. All three functions are called
from code banks other than their own and are therefore declared __banked. The
third function is simply a helper which calls DoFoo and DoBar in sequence.
The generated ASM for DoFooAndBar is observed to call the trampoline twice, even
though, being in the same module as both functions, the trampoline call will
have no effect. Is this simply a limitation of SDCC which I have to live with,
or is it due to improper usage on my part?

I'd also appreciate advice on the banking implementation. On this chip, code
banking is implemented by setting the value of an XFR at address 0xFFFF. The
entire code memory from 0x0000 to 0xFFFF is mapped to external SPI flash
location 0xN0000 to 0xNFFFF where N is the selected bank number.
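
As a small worked example of that mapping, the flash address is simply the bank number placed above the 16-bit code address. This is a sketch in plain C; the function name `flash_address` is made up for illustration.

```c
#include <stdint.h>

/* Flash address reached by a 16-bit code address while `bank` is selected:
 * bank N maps 0x0000-0xFFFF onto flash 0xN0000-0xNFFFF. */
uint32_t flash_address(uint8_t bank, uint16_t code_addr)
{
    return ((uint32_t)bank << 16) | code_addr;
}
```

For instance, code address 0x1234 seen through bank 2 would come from flash address 0x21234 under this scheme.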

So, the first problem I have here is that all segments below BANK0 in the output
binary must be repeated at flash address 0x10000, 0x20000, etc. Is there a way I
can tell the linker to output ihex which reflects this, or am I limited to
manually moving bytes around in the final binary file?

Second, and more difficult, is the trampoline implementation. The SDCC manual
indicates that registers R0, R1 and R2 are reserved for passing the LSB, MSB and
bank respectively. It also notes that DPL, DPH, B and ACC are used to pass
parameters and return values around. This causes me some difficulty because of
course to set the XFR I need to use DPTR. Fortunately enough, the device has
dual data pointers, so I've taken advantage of this and written some trampoline
routines:

        .module rtd_banking

        .area DSEG (DATA)

    ; elsewhere in C:
    ;   __sfr __at(0x84) DPL1;
    ;   __sfr __at(0x85) DPH1;
    ;   __sfr __at(0x86) DPS;

    _acctmp::                   ; temp variable for storing A whilst reading xfr
        .ds 1

    ; Banking behaviour of the target device is defined as:
    ;
    ;   XFR at 0xFFFC is used to enable banking
    ;   XFR at 0xFFFF sets current bank number
    ;
    ;   Bank 0 is external SPI flash addr 0x00000 - 0x0ffff
    ;   Bank 1 is external SPI flash addr 0x10000 - 0x1ffff
    ;   Bank n is external SPI flash addr 0xn0000 - 0xnffff

        .area HOME (CODE)

    ; R0 = LSB
    ; R1 = MSB
    ; R2 = target bank
    __sdcc_banked_call::
        mov  _acctmp, a         ; save acc in temp variable
        mov  _DPS, 0x01         ; enable secondary data pointer
        mov  dptr, #0xffff      ; set data pointer to xfr address
        movx a, @dptr           ; get current bank
        push acc                ; push current bank to stack
        mov  a, r0              ; get address LSB from register
        push acc                ; push to stack
        mov  a, r1              ; get address MSB from register
        push acc                ; push to stack
        mov  a, r2              ; get new bank
        movx @dptr, a           ; set new bank
        mov  _DPS, 0x00         ; disable secondary data pointer
        mov  a, _acctmp         ; restore acc from temp variable
        ret                     ; make the call

    __sdcc_banked_ret::
        mov  _acctmp, a         ; save acc in temp variable
        pop  a                  ; get previous bank from stack
        mov  _DPS, 0x01         ; enable secondary data pointer
        mov  dptr, #0xffff      ; set data pointer to xfr address
        movx @dptr, a           ; set bank to previous
        mov  _DPS, 0x00         ; disable secondary data pointer
        mov  a, _acctmp         ; restore acc from temp variable
        ret                     ; return to caller

However, that's the first asm I've ever written and I'd appreciate if anyone has
any ideas about improving it. I actually haven't tested it on the device yet
either so if it seems like it shouldn't work, you might be right. To me it feels
like there's really quite a lot of code there to be calling for every single
function call; and the holding of A in a temporary variable feels a bit hackish.
But it might just be that that's how it has to be if I'm to compile the code
using SDCC. However if there's some flag I can pass to stop DPL and DPH being
used to pass data between function calls, I think that would help?

Anyway, I think that's enough to be getting on with for now. Thanks in advance
to anyone who's got any advice for me!

Cheers,

Mark

Maarten Brock - 2023-08-11

Hello Mark,

> If I understand correctly, the procedure to minimise code size here will be to
> go through all the functions, and manually mark them as __banked ONLY
> when they (a) do not reside in HOME or CSEG; (b) are called by functions outside
> of their own bank. I can then compile with --model-large. Is that accurate?

Yes, that is correct.

I know of no static analysis tool for this, but I believe there are tools to at least create a call tree graph.

Your DoFooAndBar will always need to use the trampoline because both callees need to return through the trampoline. Look at the generated code for returning from these functions. You'll have to live with it.
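
To illustrate why the entry and the return have to match, here is a toy model of the stack in plain C. It is not SDCC or library code; the `struct cpu` layout and all function names are invented for the illustration, and it only mimics the push/pop order of the trampoline routines posted above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the 8051 stack, program counter and bank register. */
struct cpu {
    uint8_t stack[16];
    int sp;              /* number of bytes currently on the stack */
    uint8_t bank;        /* currently selected code bank */
    uint16_t pc;
};

void push(struct cpu *c, uint8_t v) { assert(c->sp < 16); c->stack[c->sp++] = v; }
uint8_t pop(struct cpu *c) { assert(c->sp > 0); return c->stack[--c->sp]; }

/* lcall: push the return address (low byte first), then jump. */
void direct_call(struct cpu *c, uint16_t ret_addr, uint16_t target)
{
    push(c, ret_addr & 0xFF);
    push(c, ret_addr >> 8);
    c->pc = target;
}

/* ret: pop the high byte, then the low byte, into the program counter. */
void ret(struct cpu *c)
{
    uint8_t hi = pop(c);
    uint8_t lo = pop(c);
    c->pc = (uint16_t)(hi << 8 | lo);
}

/* Trampoline entry: save the current bank, switch, "return" into the callee. */
void banked_call(struct cpu *c, uint16_t ret_addr, uint8_t bank, uint16_t target)
{
    direct_call(c, ret_addr, 0);   /* lcall to the trampoline itself */
    push(c, c->bank);              /* saved bank sits above the return address */
    push(c, target & 0xFF);
    push(c, target >> 8);
    c->bank = bank;
    ret(c);
}

/* Exit path of a banked function: restore the saved bank, then return. */
void banked_ret(struct cpu *c)
{
    c->bank = pop(c);
    ret(c);
}
```

In this model a function that exits through `banked_ret` always pops one extra byte, the saved bank. If it were entered with `direct_call` instead of `banked_call`, that byte would be the high byte of the return address, and the subsequent `ret` would jump to a wrong address.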

Replicating the non-banked parts in the final binary can be handled by post-processing with e.g. srec_cat.

What the manual does not mention is that R3 is free for you to use. You can replace _acctmp with R3.

You can probably use inc _DPS / dec _DPS as cheaper variants of mov _DPS,#0x01 / mov _DPS,#0x00. Your current code is buggy as it reads _DPS from address 0x00.

SDCC does not generate any code using the second DPTR. You might want to init it to 0xFFFF at startup and let the trampoline assume it is still there.

pop a should be pop acc.

Unfortunately, SDCC has no flag to stop it from using DPTR and ACC for passing parameters.

There is a long-standing wish for a new register allocator for mcs51 which most probably will no longer use these registers for passing parameters and results. It should also probably use the DPTR for passing the destination of a banked call so the trampoline can use JMP @DPTR instead of the pushes and ret. But don't expect this any time soon.

HTH,
Maarten

Last edit: Maarten Brock 2023-08-11

mark - 2023-10-04

Hi Maarten,

Sorry, I didn't get the e-mail notifying me you'd replied - thanks for your help!

> Look at the generated code for returning from these functions.

Sorry, yes, that seems very obvious in retrospect: since the banked return is always present, you always have to go through the banked call.

> Replicating the non-banked parts in the final binary can be handled by post-processing

Thanks, this was a good pointer. For anyone else who ends up here via Google, the incantation I've ended up with in my Makefile is this:

    $(SDCC_OUTPUTDIR)/firmware.bin: $(SDCC_OUTPUTDIR)/firmware.hex
    	# crop the common segment (0x0000 - 0x1cff) from the intel hex file and
    	# copy it to each eeprom bank (i.e. at 0x010000 and 0x020000)
    	srec_cat \
    	    $(SDCC_OUTPUTDIR)/firmware.hex -intel \
    	    '(' \
    	    $(SDCC_OUTPUTDIR)/firmware.hex -intel -crop 0x000000 0x001cff -offset 0x010000 \
    	    ')' \
    	    '(' \
    	    $(SDCC_OUTPUTDIR)/firmware.hex -intel -crop 0x000000 0x001cff -offset 0x020000 \
    	    ')' \
    	    -o $(SDCC_OUTPUTDIR)/firmware.bin -binary
    	truncate -s 524288 $(SDCC_OUTPUTDIR)/firmware.bin
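
For readers who prefer to see what that post-processing amounts to, here is a rough equivalent in plain C operating on a flat binary image already loaded into memory. It is a sketch only: `replicate_common` and `BANK_SIZE` are hypothetical names, and it assumes bank 0 starts at offset 0 of the image with 64 KB per bank.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BANK_SIZE 0x10000u   /* each bank covers 64 KB of flash */

/* Copies the common segment image[0 .. common_len) to the same offset in
 * banks 1 .. nbanks-1. Returns 0 on success, -1 if the arguments do not
 * fit inside the image. */
int replicate_common(uint8_t *image, size_t image_len,
                     size_t common_len, unsigned nbanks)
{
    if (common_len > BANK_SIZE || (size_t)nbanks * BANK_SIZE > image_len)
        return -1;

    for (unsigned bank = 1; bank < nbanks; bank++)
        memcpy(image + (size_t)bank * BANK_SIZE, image, common_len);

    return 0;
}
```

Note that `common_len` here is a byte count, so a common segment whose last byte is at 0x1cff would be passed as 0x1d00.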

> You might want to init it to 0xFFFF at startup and let the trampoline assume it is still there.

Is it sufficient to add the below to banking.asm?

        .area HOME (CODE)

    ___sdcc_external_startup::
        mov  _DPL1, #0xff       ; set secondary data pointer to 0xffff
        mov  _DPH1, #0xff       ; 0xffff is xfr address of page
        ret

Thank you for the rest of your advice on the asm code.

> don't expect this any time soon

Fair enough.

> I believe there are tools to at least create a call tree graph

For now I've managed to do this analysis manually, so I'll shelve this. But in case I revisit it, do you have a particular tool in mind which works specifically with SDCC, or is some general-purpose C tool sufficient?

Thanks!

Mark

Last edit: mark 2023-10-05

Maarten Brock - 2023-10-05

Hi Mark,

I did get the email ;-)

> Is it sufficient to add the below to banking.asm?

No, it's not. __sdcc_external_startup() must return a value in DPL, usually 0x00. And you may use C instead of asm for it as long as you don't use initialized variables.
https://sourceforge.net/p/sdcc/code/HEAD/tree/trunk/sdcc/device/lib/_startup.c

> do you have a particular tool in mind which works specifically with sdcc, or is some general purpose C tool sufficient?

I seem to remember that I used a generic C tool once a long time ago, but I forgot which.

Out of curiosity which MCU are you using?

Maarten

mark - 2023-10-05

Hey,

So, this in main.c?

    #pragma codeseg CSEG

    // ...

    unsigned char __sdcc_external_startup(void)
    {
        DPL1 = 0xFF;
        DPH1 = 0xFF;
        return 0;
    }

Sorry if that seems like an obtuse question. I can always check the generated code : )

> Out of curiosity which MCU are you using?

I'll avoid naming the actual chip. It's produced by a popular Taiwanese IC design company and is an LCD controller (HDMI/VGA/Composite comes in, LVDS goes out). If I drop the part number you'll probably end up getting weird e-mails about connecting screens to things. I'm just playing with it as an embedded software learning exercise because I happened to have a pile of them.

The embedded CPU is a DesignWare DW8051 core which claims to be a slightly reduced DS80C320 clone. The SPI flash interface and banking scheme XFR seem to be implemented by the video chip rather than the DW8051 core, though, and the datasheet for the video chip is somewhat lacking in clarity - I suppose you're supposed to just use the firmware provided by the manufacturer. But, it must be a common enough way to manage banking, as the Keil simulator seems to support it pretty seamlessly.

I'm going to play around with ucsim at some point to see if I can convince it to simulate the dual data pointer and XFR banking scheme - it doesn't seem like it should be impossible - and try to run the dumped firmware binary. Hints would be good if you have any but I'm sure I'll work it out otherwise : )

Cheers.

Mark