I would like to start a general thread discussing how users control memory usage to maximise the potential of a given processor.
My experience (and problem) is on the 8051 and so my comments are focused on this processor, but hopefully users will expand the discussion so it could possibly be added to the SDCC manual.
Currently there are 1300 bytes of code space available. Internal RAM has 2 bytes free, but plenty of stack (determined from the mem file). So although code space is limited, I am confident the remaining functions can be added, assuming the internal RAM usage can be optimised.
A bit of background: The internal RAM of the 8051 consists of 256 bytes, of which 128 can be accessed directly (data), while the whole 256 can be accessed indirectly (idata) through either of the two available pointer registers. The directly accessible (data) area also includes the register banks (32 bytes, all used), the bit array (8 bytes used), and (in my case) a 6 byte overlay area (that must be contiguous). Most 8051s have some amount of extra memory (xdata), which requires 16-bit addresses to access.
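As a rough illustration of those three address spaces, here is a minimal sketch in SDCC C. The variable names, the rx_store helper and the DATA/IDATA/XDATA wrapper macros are invented for the example; the underscore-prefixed keywords are the spelling used by recent SDCC releases (older ones used plain data, idata and xdata), so adjust to whatever your version accepts.

```c
/* Wrapper macros (an assumption for illustration only): with SDCC they map to
   the 8051 storage-class keywords, on any other compiler they expand to
   nothing so the sketch still builds on a host machine. */
#ifdef __SDCC
#define DATA  __data
#define IDATA __idata
#define XDATA __xdata
#else
#define DATA
#define IDATA
#define XDATA
#endif

/* Direct (data) space: cheapest to access, but the scarcest resource. */
DATA unsigned char tick_count;

/* Indirect (idata) space: reached through the 8-bit pointers R0/R1. */
IDATA unsigned char rx_buffer[16];

/* External (xdata) space: reached through a 16-bit address. */
XDATA unsigned char log_buffer[64];

/* Hypothetical helper: store one byte in the idata buffer and count it. */
unsigned char rx_store(unsigned char index, unsigned char value)
{
    rx_buffer[index & 0x0F] = value;   /* mask keeps the index inside the buffer */
    ++tick_count;
    return rx_buffer[index & 0x0F];
}
```

Comparing the mem and map files before and after moving a variable between these spaces should show where the bytes actually went.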
Functions that use local variables will attempt to place them in internal registers, or place them in the direct memory area if the current register bank is fully used. These local variables in data space cannot be re-used (apparently) as the compiler does not know the function calling order. The result is that the data space becomes filled with local variables and will ultimately run out of space so the function (code) cannot be compiled.
So, what to do...
1) I have placed some globals in the idata space and therefore forced the compiler to use pointers to them. This has had some success but has been a bit hit and miss: functions that previously compiled now fail, because the register shuffling required to use the pointers pushes local variables into the direct memory space.
2) Place locals in idata space - this has not been too successful and may increase memory consumption, as variables that previously fitted in the internal registers (and so were automatically reused by the next function) are now forced into memory.
3) I have not tried, but am contemplating, making more use of globals and reusing them whenever possible. As you can imagine, this would require a thorough understanding of the program flow under all conditions to ensure that each reuse is safe.
4) Expand or encourage the compiler to use the overlay area more. Also, 'fix' the compiler so that the overlay area does not specifically have to be in direct memory space. Could an SDCC guru explain how the overlay area is used and how it can be controlled? Thanks.
I would like to encourage other users to reveal their techniques on optimising memory consumption. I admit that I may have missed some compiler controls which could have helped so feel free to share all!
This has been a long post so I hope the size of it does not put people off from answering it. General comments or working examples welcome.
1) and 2) might help.
3) is a way I would not encourage.
4) To encourage the compiler/linker to use more overlay you need to move all local variables into leaf-functions of the call-tree. The linker can only overlay variables in functions that do not call any other function themselves.
Changing SDCC to move overlay to another memory than direct space will lead to trouble. The compiler will need more registers as it cannot perform most operations on non-direct memory. Code size will also increase.
5) Another option is to use --stack-auto or judiciously make some functions reentrant. This will put local variables on the stack (idata) and thus automatically overlay ALL locals, not just those in leaf functions. It does cost some code space and execution time though.
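Points 4) and 5) might look something like the sketch below. It is only an illustration: the function names and the REENTRANT wrapper macro are invented here, and whether a given local really ends up in the overlay segment or on the stack is best confirmed from the map file for your build.

```c
/* Wrapper macro (an assumption for illustration only): maps to SDCC's
   __reentrant keyword, and to nothing on other compilers. */
#ifdef __SDCC
#define REENTRANT __reentrant
#else
#define REENTRANT
#endif

/* Leaf function: it calls nothing else, so per point 4) its locals
   ('sum' and 'i') are candidates for the shared overlay area. */
unsigned char checksum(const unsigned char *buf, unsigned char len)
{
    unsigned char sum = 0;
    unsigned char i;
    for (i = 0; i < len; ++i)
        sum += buf[i];
    return sum;
}

/* Non-leaf function: it calls checksum(), so its locals would not be
   overlaid. Marking it reentrant, per point 5), asks the compiler to keep
   its parameters and locals on the stack instead of in fixed data memory. */
unsigned char frame_is_valid(const unsigned char *frame, unsigned char len) REENTRANT
{
    unsigned char expected;
    if (len < 2)
        return 0;                      /* too short to carry a checksum byte */
    expected = frame[len - 1];         /* last byte holds the checksum */
    return checksum(frame, (unsigned char)(len - 1)) == expected;
}
```

The same effect for every function at once is what --stack-auto gives, at the code size and speed cost mentioned above.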
Consider placing some less frequently used variables (especially short buffers) in pdata space. It is the first 256 bytes of xdata, but accessed with 8-bit pointers, just like idata (except that the only operations available are read and write; there is no direct arithmetic etc. as in idata). The 8-bit pointers (R0 and R1) are more flexible than DPTR (easy to decrement, add, multiply, compare etc.), and can be passed to functions expecting xdata pointers by simply casting.
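A small buffer in pdata could be declared and filled roughly as follows. This is a sketch only: msg_buf, msg_set and the PDATA wrapper macro are names made up for the example, and it assumes the hardware is set up so that pdata accesses reach the intended page of external memory.

```c
/* Wrapper macro (an assumption for illustration only): maps to SDCC's
   __pdata keyword, and to nothing on other compilers. */
#ifdef __SDCC
#define PDATA __pdata
#else
#define PDATA
#endif

/* Short, rarely touched buffer kept out of the data and idata spaces. */
PDATA unsigned char msg_buf[8];

/* Copy at most sizeof msg_buf bytes into the pdata buffer.
   'dst' is a pointer to pdata, so with SDCC it needs only 8 bits. */
void msg_set(const unsigned char *src, unsigned char len)
{
    PDATA unsigned char *dst = msg_buf;
    if (len > sizeof msg_buf)
        len = sizeof msg_buf;          /* never write past the buffer */
    while (len--)
        *dst++ = *src++;
}
```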
Much of 'data' space (in the small memory model) is wasted on the arguments of functions with more than one argument. When a function really needs more arguments, I combine two 8-bit arguments into one 16-bit argument (using macros).
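The packing trick might be written along these lines. The macro names and the scale_add function are hypothetical, chosen only to show the idea; the saving relies on the first (and here only) parameter being passed in registers rather than in a fixed data location, which is worth checking in the generated assembly.

```c
/* Hypothetical helper macros: pack two 8-bit values into one 16-bit
   argument and take them apart again inside the callee. */
#define PACK2(hi, lo) \
    ((unsigned int)(((unsigned int)(unsigned char)(hi) << 8) | (unsigned char)(lo)))
#define PACK_HI(w)    ((unsigned char)((w) >> 8))
#define PACK_LO(w)    ((unsigned char)((w) & 0xFF))

/* One 16-bit parameter instead of two 8-bit ones. */
unsigned char scale_add(unsigned int packed)
{
    unsigned char value  = PACK_HI(packed);   /* first logical argument  */
    unsigned char offset = PACK_LO(packed);   /* second logical argument */
    return (unsigned char)(value / 2 + offset);
}
```

A call then reads scale_add(PACK2(value, offset)) instead of passing the two bytes separately.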
For very simple functions that don't use registers (or use little, but are called from complicated functions that use many) #pragma callee_saves can reduce stack usage.
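For reference, the pragma is placed before the function it names, something like the sketch below. clamp8 is a made-up example function, and the exact register-saving behaviour should be checked against the SDCC manual for the version in use; the guard is only there so the file also builds with a host compiler.

```c
/* Ask SDCC not to apply the default caller-saves convention around calls
   to clamp8 (hypothetical function name, used for illustration only). */
#ifdef __SDCC
#pragma callee_saves clamp8
#endif

/* A very simple function that needs almost no registers of its own. */
unsigned char clamp8(unsigned char value)
{
    return (value > 100) ? 100 : value;
}
```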