#381 Link Time Overlay

Status: open
Owner: nobody
Priority: 5
Created: 2012-10-12
Updated: 2012-10-12
Private: No

I'm surprised I couldn't find an existing request for this. The rewards would be huge.

I shall illustrate with an example.

This is the software for a steering wheel: a couple of analogue and digital inputs, 7-segment displays, a shift indicator, high-speed CAN and run-time flash writing for configuration updates. Note that only 24 bytes are left for the stack and only 9 bytes are overlaid:
<code>
Internal RAM layout:
      0 1 2 3 4 5 6 7 8 9 A B C D E F
0x00:|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|
0x10:|2|2|2|2|2|2|2|2|a|a|a|a|a|a| | |
0x20:|B|B|B|B|T|b|b|b|b|b|b|b|b|b|b|b|
0x30:|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|
0x40:|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|
0x50:|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|b|
0x60:|b|b|b|b|b|b|b|b|b|b|b|b|b|c|c|c|
0x70:|c|c|d|d|d|Q|Q|Q|Q|Q|Q|Q|Q|Q|I|I|
0x80:|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|
0x90:|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|
0xa0:|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|
0xb0:|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|
0xc0:|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|
0xd0:|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|
0xe0:|I|I|I|I|I|I|I|I|S|S|S|S|S|S|S|S|
0xf0:|S|S|S|S|S|S|S|S|S|S|S|S|S|S|S|S|
0-3:Reg Banks, T:Bit regs, a-z:Data, B:Bits, Q:Overlay, I:iData, S:Stack, A:Absolute

Stack starts at: 0xe8 (sp set to 0xe7) with 24 bytes available.

Other memory:
Name             Start    End      Size     Max
---------------- -------- -------- -------- --------
PAGED EXT. RAM   0xf000   0xf085        134      256
EXTERNAL RAM     0xf000   0xf06a        107     3072
ROM/EPROM/FLASH  0x0000   0x4075      16502    65536
</code>

This is the same application compiled by C51. Note that the stack starts at 0x71, i.e. there are still 129 bytes left for the stack:
<code>
START     STOP      LENGTH    ALIGN  RELOC    MEMORY CLASS   SEGMENT NAME
=========================================================================

* * * * * * * * * * *   D A T A   M E M O R Y   * * * * * * * * * * * * *
000000H   000007H   000008H   ---    AT..     DATA           "REG BANK 0"
000008H   00000FH   000008H   ---    AT..     DATA           "REG BANK 1"
000010H   000017H   000008H   ---    AT..     DATA           "REG BANK 2"
000018H.0 00001FH.7 000008H.0 ---    ---      **GAP**
000020H.0 000022H.3 000002H.4 BIT    UNIT     BIT            _BIT_GROUP_
000022H.4 000022H.4 000000H.1 BIT    UNIT     BIT            ?BI?HSK_CAN
000022H.5 000022H   000000H.3 ---    ---      **GAP**
000023H   00005EH   00003CH   BYTE   UNIT     DATA           _DATA_GROUP_
00005FH   000070H   000012H   BYTE   UNIT     DATA           ?DT?MAIN
000071H   000071H   000001H   BYTE   UNIT     IDATA          ?STACK
</code>

There are three reasons:

- Link-time overlay allows overlaying all functions, not just those grouped together in a single .c file; this project is linked from 9 .rel files.
- The REMOVEUNUSED linker directive completely eliminates unused functions after compilation; the majority of .c files are shared between many projects and thus carry a lot of code that a particular application does not need.
- Manual call tree manipulations allow functions called through function pointers to be overlaid correctly, so nooverlay is no longer necessary.

Link-time overlay (LTO) would probably have some side benefits as well:

- The requirement to make ISRs public could be eliminated, much like _sdcc_external_startup() does not need to be public

Discussion

  • Maarten Brock - 2012-10-13

    Just to clear things up: overlaying is not limited to functions in a single file; it is applied across all files. But it can only overlay leaf functions in the call tree, and it assumes all leaf functions can be overlaid, which is why you must disable it for functions called from an interrupt: it does not know that interrupts form their own tree.
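    As a rough host-side model of the rule described above (not SDCC's actual code), a leaf-only overlay lets every overlayable leaf function share one area sized for the largest of them, while non-leaf functions and leaves under `#pragma nooverlay` each keep dedicated bytes. The struct `func_info` and the function `leaf_overlay_ram` are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical per-function summary; field names are illustrative only. */
struct func_info {
    const char *name;
    unsigned locals;  /* bytes of locals and parameters */
    int is_leaf;      /* 1 if the function calls no other function */
    int nooverlay;    /* 1 if compiled under #pragma nooverlay */
};

/* Leaf-only overlay model: every overlayable leaf shares one area sized
   for the largest of them; all other functions get a dedicated area.
   Stores the shared area's size in *overlay_out, returns the total. */
unsigned leaf_overlay_ram(const struct func_info *f, size_t n,
                          unsigned *overlay_out)
{
    unsigned overlay = 0, dedicated = 0;
    for (size_t i = 0; i < n; i++) {
        if (f[i].is_leaf && !f[i].nooverlay) {
            if (f[i].locals > overlay)
                overlay = f[i].locals;
        } else {
            dedicated += f[i].locals;
        }
    }
    if (overlay_out != NULL)
        *overlay_out = overlay;
    return overlay + dedicated;
}
```

    The model also shows the cost of the interrupt problem: a leaf callback invoked from an ISR must be marked nooverlay, so its locals move from the shared area into dedicated RAM.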

    The current linker cannot build a call tree, and therefore cannot overlay all functions, remove unused functions or calculate stack usage.
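    For comparison, a call-tree-aware overlay can be sketched as follows, assuming an acyclic call graph (no recursion) where each function is reachable from only one root. Each function's locals are placed above the deepest chain of callers that can be active at the same time, and main() and every ISR get disjoint regions because an interrupt can fire at any point. This is a general sketch of the technique, not the algorithm of any particular linker; `call_graph`, `place` and `overlay_layout` are hypothetical names.

```c
#include <stddef.h>

#define MAX_FUNCS 16

/* Hypothetical call graph; calls[a][b] != 0 means function a calls b.
   Assumes there is no recursion, i.e. the graph is acyclic. */
struct call_graph {
    size_t n;
    unsigned locals[MAX_FUNCS];  /* bytes of locals and parameters */
    unsigned char calls[MAX_FUNCS][MAX_FUNCS];
};

/* Puts f's locals at 'start' or higher and its callees above them, so no
   function shares bytes with a caller that is still active. Walks the
   subtree only on the first visit or when f has to move up. */
static void place(const struct call_graph *g, size_t f, unsigned start,
                  unsigned addr[], unsigned char reached[])
{
    if (reached[f] && start <= addr[f])
        return;
    reached[f] = 1;
    addr[f] = start;
    for (size_t c = 0; c < g->n; c++)
        if (g->calls[f][c])
            place(g, c, start + g->locals[f], addr, reached);
}

/* Lays out one region per root (main() first, then each ISR), one after
   another. Assumes each function is reachable from only one root.
   Returns the total number of bytes needed. */
unsigned overlay_layout(const struct call_graph *g, const size_t roots[],
                        size_t nroots, unsigned addr[])
{
    unsigned char reached[MAX_FUNCS] = {0};
    unsigned base = 0;
    for (size_t r = 0; r < nroots; r++) {
        unsigned top = base;
        place(g, roots[r], base, addr, reached);
        for (size_t f = 0; f < g->n; f++)
            if (reached[f] && addr[f] + g->locals[f] > top)
                top = addr[f] + g->locals[f];
        base = top;  /* the next tree starts above everything so far */
    }
    return base;
}
```

    With a small graph where main (2 bytes) calls a (4) and b (3), both of which call c (5), and one ISR (1) calls a callback (2), a and b start at the same address and the total drops from 17 bytes without overlaying to 14. The same walk, summing instead of overlapping, would also give the worst-case depth needed for a stack-usage estimate.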

     
  • Kamikaze Dominic Fandrey

    So I misunderstood this statement from the manual:
    > Note that the compiler (not the linkage editor) makes the decision for overlaying the data items.

    So, what stands in the way of creating a call tree? Without knowing anything about the linker architecture, I'd guess it's not an overwhelming task.

    Anyway, the vast majority of my functions are leaf functions, so it should work better than it does. Here is a call tree:
    http://hsk.sourceforge.net/dev/main_8c.html#a6288eba0f8e8ad3ab1544ad731eb7667

    All the calls to *_isr_* functions are false positives, caused by assigning function pointers to callback vectors. All these functions are wrapped in:
    <code>
    #pragma save
    #ifdef SDCC
    #pragma nooverlay
    #endif
    <function>
    #pragma restore
    </code>

    I suppose SDCC doesn't see functions handling function pointers as leaves?

    For LX51 (the Keil linker) I use the following call tree manipulations:
    <code>
    * ~ (hsk_boot_isr_nmipll, hsk_flash_isr_nmiflash),
    * ~ (hsk_pwc_isr_cctOverflow),
    * ~ (hsk_adc_isr, hsk_adc_isr_warmup),
    * ~ (hsk_pwc_isr_cc0, hsk_pwc_isr_cc1, hsk_pwc_isr_cc2, hsk_pwc_isr_cc3),
    * ~ (tick0),
    ISR_hsk_isr14 ! (hsk_boot_isr_nmipll, hsk_flash_isr_nmiflash),
    ISR_hsk_isr5 ! (hsk_pwc_isr_cctOverflow),
    ISR_hsk_isr6 ! (hsk_adc_isr, hsk_adc_isr_warmup),
    ISR_hsk_isr9 ! (hsk_pwc_isr_cc0, hsk_pwc_isr_cc1, hsk_pwc_isr_cc2, hsk_pwc_isr_cc3),
    ISR_hsk_timer0 ! (tick0)
    </code>

    The "* ~ (...)" entries remove all calls to the listed functions from the call tree.
    The ISR_* functions are my ISRs, and the "! (...)" entries add calls from each ISR to the callback functions it invokes. I have an awk script that generates these call tree adjustments for me.
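    The effect of these two kinds of entries on a call graph can be modelled in a few lines. This is a sketch of the semantics described above, not of how LX51 implements its OVERLAY directive; `call_graph`, `remove_all_calls_to`, `add_call` and `caller_count` are hypothetical names.

```c
#include <stddef.h>

#define MAX_FUNCS 16

/* Hypothetical call graph; calls[a][b] != 0 means function a calls b. */
struct call_graph {
    size_t n;
    unsigned locals[MAX_FUNCS];  /* bytes of locals (not used here) */
    unsigned char calls[MAX_FUNCS][MAX_FUNCS];
};

/* Models "* ~ (f)": forget every recorded call to f, e.g. the false
   positives produced where f's address is stored in a callback vector. */
void remove_all_calls_to(struct call_graph *g, size_t f)
{
    for (size_t a = 0; a < g->n; a++)
        g->calls[a][f] = 0;
}

/* Models "isr ! (f)": record that isr really calls f through the
   function pointer, so f ends up in the interrupt's own tree. */
void add_call(struct call_graph *g, size_t isr, size_t f)
{
    g->calls[isr][f] = 1;
}

/* Number of distinct callers of f; after both edits a callback should
   have exactly one caller, its ISR. */
size_t caller_count(const struct call_graph *g, size_t f)
{
    size_t count = 0;
    for (size_t a = 0; a < g->n; a++)
        count += g->calls[a][f] != 0;
    return count;
}
```

    Applying the removal before the addition matters: the other order would also delete the ISR's own call to the callback.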

    Removing unused functions would probably bring the greatest benefit for me. I have several projects using these libraries, where 300-1000 lines of application logic pull in 5000-10000 lines of library code. Of course, not every function of every library is required in every application.
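    Once a call tree exists, removing unused functions reduces to a reachability walk from the roots (main() and every ISR). A minimal sketch, assuming the graph already includes the callback edges discussed above, since a function referenced only through a pointer would otherwise be wrongly discarded; `call_graph` and `mark_used` are hypothetical names.

```c
#include <stddef.h>

#define MAX_FUNCS 16

/* Hypothetical call graph; calls[a][b] != 0 means function a calls b. */
struct call_graph {
    size_t n;
    unsigned locals[MAX_FUNCS];  /* bytes of locals (not used here) */
    unsigned char calls[MAX_FUNCS][MAX_FUNCS];
};

/* Depth-first marking; the used[] check also stops on cycles. */
static void mark(const struct call_graph *g, size_t f, unsigned char used[])
{
    if (used[f])
        return;
    used[f] = 1;
    for (size_t c = 0; c < g->n; c++)
        if (g->calls[f][c])
            mark(g, c, used);
}

/* Marks every function reachable from the roots and returns how many
   functions were never reached, i.e. could be dropped from the image. */
size_t mark_used(const struct call_graph *g, const size_t roots[],
                 size_t nroots, unsigned char used[])
{
    size_t unused = 0;
    for (size_t f = 0; f < g->n; f++)
        used[f] = 0;
    for (size_t r = 0; r < nroots; r++)
        mark(g, roots[r], used);
    for (size_t f = 0; f < g->n; f++)
        unused += !used[f];
    return unused;
}
```

    For example, if an application calls hsk_can_enable() but never hsk_can_disable(), only the latter is left unmarked.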

     
  • Raphael Neider - 2012-10-13

    The problem of pulling in unrelated functions is usually addressed by the author of the library by moving each function (or small group of closely related functions) into a separate source file.
    In my opinion, having the linker perform code elimination (and thus change relative addresses within object files) is not the right approach, as this restricts compile time optimizations such as sharing epilogue code among functions.
    Just my 2c.

     
  • Kamikaze Dominic Fandrey

    @tecodev
    I must strongly disagree.

    How would I group these things? Isn't it reasonable to group hsk_can_enable() and hsk_can_disable() together? Yet an application does not always need both of them.

    If I split them up, I would have to expose private information like bit positions and data structures in a common header, which would be a severe layering violation.

    I could duplicate preprocessor defines, which isn't any better in my book. And I'd still have to expose data structures, allowing the application layer code to mess with them directly instead of using the provided methods.

     
  • Kamikaze Dominic Fandrey

    It appears I have to apologize: I just came across this sentence in the Overlay chapter of the SDCC manual:
    > If an explicit storage class is specified for a local variable, it will NOT be overlaid.

    It's not the first time I've read this, but its significance must have escaped me: I was actually blowing up my memory usage by using idata! I have now restricted my use of idata to:
    a) Function arguments from the 5th byte (this is rare)
    b) Globals, statics and rarely accessed locals in the context of the while(1) loop
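    To illustrate the manual's rule, here are two otherwise identical leaf functions: one with plain locals, which may be overlaid, and one whose locals carry an explicit idata storage class, which per the quoted sentence are not. The `IDATA` macro and both function names are hypothetical; the macro only expands to SDCC's `__idata` keyword when SDCC is compiling, so the file also builds on a host compiler for testing.

```c
/* Expand to SDCC's storage class only under SDCC; a host build sees a
   plain local variable. */
#if defined(__SDCC) || defined(SDCC)
#define IDATA __idata
#else
#define IDATA
#endif

/* Plain locals in a leaf function: eligible for the overlay segment. */
unsigned char checksum(const unsigned char *buf, unsigned char len)
{
    unsigned char i, acc = 0;
    for (i = 0; i < len; i++)
        acc += buf[i];  /* wraps modulo 256 */
    return acc;
}

/* Same code, but the explicit storage class keeps these locals out of
   the overlay segment, so they permanently occupy idata. */
unsigned char checksum_idata(const unsigned char *buf, unsigned char len)
{
    IDATA unsigned char i;
    IDATA unsigned char acc = 0;
    for (i = 0; i < len; i++)
        acc += buf[i];
    return acc;
}
```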

    I now have plenty of free stack space. Not as much as with C51/LX51, but a fairly large amount nonetheless:
    SDCC: 99 bytes stack
    C51: 158 bytes stack

    So, you see, these might still be handy:
    - a call tree,
    - the ability to manipulate it and remove unused functions (when I turn that off, C51 still leaves me with 138 bytes of stack),
    - overlaying for all memory types (even explicitly specified ones).

    Anyway, I'm sorry for exaggerating the issue.

    Now I've got to overhaul my documentation.

     
  • Kamikaze Dominic Fandrey

    One thing that could be done without building a call tree is overlaying leaf functions that share a register bank (using directive), because a common bank normally implies they are called at the same interrupt priority and thus cannot interrupt each other.

    This would also eliminate a lot of the nooverlay pragmas in my code.

    I tried this in µVision, where you can do something similar manually with the OVERLAY linker directive, and gained a couple of extra bytes for the stack. Not much, but that's to be expected: ISRs are short and rarely use local variables.
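    The register-bank idea can be sketched without any call tree: group the leaf functions by bank and give each group one shared area sized for its largest member. This rests on the poster's assumption that functions sharing a bank run at the same interrupt priority and cannot preempt each other; `isr_leaf` and `bank_overlay_ram` are hypothetical names.

```c
#include <stddef.h>

#define NUM_BANKS 4  /* the 8051 has register banks 0..3 */

/* Hypothetical summary of a leaf function and the bank it runs in. */
struct isr_leaf {
    unsigned bank;    /* register bank, as selected with 'using' */
    unsigned locals;  /* bytes of locals and parameters */
};

/* One shared area per bank, sized for the largest leaf in that bank.
   Fills size_per_bank[] and returns the total bytes needed. */
unsigned bank_overlay_ram(const struct isr_leaf *f, size_t n,
                          unsigned size_per_bank[NUM_BANKS])
{
    unsigned total = 0;
    for (unsigned b = 0; b < NUM_BANKS; b++)
        size_per_bank[b] = 0;
    for (size_t i = 0; i < n; i++)
        if (f[i].bank < NUM_BANKS && f[i].locals > size_per_bank[f[i].bank])
            size_per_bank[f[i].bank] = f[i].locals;
    for (unsigned b = 0; b < NUM_BANKS; b++)
        total += size_per_bank[b];
    return total;
}
```

    With three bank-1 leaves of 3, 5 and 1 bytes and two bank-2 leaves of 2 and 4 bytes, the shared areas need 9 bytes instead of 15, which matches the small but real gain reported above.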

     
