#60 SLOC Overlay Option

open
nobody
None
5
2004-01-25
2004-01-25
No

Here's a crazy idea that might allow SDCC to be much
more useful for building large projects. The main
problem that large projects run into is that the number of
SLOC variables allocated grows and grows, until there
is no internal data memory (mcs51) remaining.

What if SLOCs could be overlaid, perhaps using an
option like --sloc-overlay? The code generation for
calling a function would need to push any in-use SLOCs
onto the stack, just as it currently does for
registers, and pop them back off after the function
call. The SLOC naming would probably change to a
generic format so that every function within one file
would share the same SLOCs rather than allocating a new
batch. And by allocating them into their own memory
segment and using the linker's overlay feature, the
same space could be shared by all files in the entire
project. For a large project with hundreds of
functions, the total SLOC usage would be only the
maximum number used by any single function, and the
total precious internal RAM used would be that plus
only the in-use SLOCs pushed onto the stack during the
deepest nesting of subroutines.
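
The arithmetic behind that estimate can be sketched in plain C. This is only a model under stated assumptions: the function names are hypothetical, and the caller is assumed to know how many SLOC bytes are still live at each call along the deepest call chain.

```c
#include <stddef.h>

/* Current scheme (as described above): every function owns its SLOCs,
 * so the cost is the sum over all functions. */
unsigned sloc_total_separate(const unsigned *sloc, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += sloc[i];
    return sum;
}

/* Proposed scheme: one shared pool sized for the hungriest function,
 * plus the SLOC bytes pushed at each call along the deepest call chain.
 * live_at_call[i] is a hypothetical input: bytes still in use at call i. */
unsigned sloc_total_overlaid(const unsigned *sloc, size_t n,
                             const unsigned *live_at_call, size_t depth)
{
    unsigned pool = 0, pushed = 0;
    for (size_t i = 0; i < n; i++)
        if (sloc[i] > pool)
            pool = sloc[i];
    for (size_t i = 0; i < depth; i++)
        pushed += live_at_call[i];
    return pool + pushed;
}
```

With invented per-function SLOC sizes of 6, 9, 4, 7, 5, 8, 3 and 6 bytes, and 2 and 3 bytes live at the two calls of the deepest chain, the first function gives 48 bytes and the second gives 14. Real savings depend entirely on how many SLOCs are live across calls.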

This could free up a LOT of the very limited internal
RAM, allowing SDCC to be used on large projects.

Discussion

  • Maarten Brock

    Maarten Brock - 2004-01-26

    Logged In: YES
    user_id=888171

    Hi Paul,

    Did this come out of the blue or were you inspired by my
    Open Discussion Forum question titled Overlaying?

    What I have observed is that SLOCs do start out in the
    overlaid segment, but that the overlaying scheme cannot
    overlay variables unless they are in a function that is a leaf in
    the function calling tree. In other words: only variables
    (including SLOCs) used in functions that do not call any other
    function get overlaid. This is a pity and calls for an
    improvement (RFE).

    So currently, I think, SDCC does try to put SLOCs in the
    overlay segment. It's just not so good at overlaying.

    What you propose, is kind of what happens when you use the
    keyword reentrant. SDCC then allocates the SLOCs on the
    stack right away. This is what I had to resort to when my
    large project created too many SLOCs for data memory to
    contain.

    To conclude, I think SLOC allocation in overlaid data is
    usually the best, unless you need reentrancy. And the RFE
    should be to optimize the overlaying scheme.

    Greets,
    Maarten

     
  • Paul Stoffregen

    Paul Stoffregen - 2004-01-26

    Logged In: YES
    user_id=104682

    Maarten,

    Actually, I wrote a lengthy post to the developer mail list,
    with essentially this same SLOC overlay request... about 1
    year ago. I was inspired to write about it again, partly
    due to a private email exchange with another SDCC user
    facing (unrelated) problems, partly due to seeing discussion
    about linker memory allocation because some of the
    regression tests are running into memory shortage (perhaps
    from excessive SLOCs?), and partly because I'm getting back
    into my mp3 player project again and the recent live range
    bug fix introduced more conservative register allocation
    that increases SLOC usage. I read your post in the
    sourceforge sdcc open discussion forum only just now.

    Right now, there are three segments that correspond to the
    8051's internal ram: DSEG, OSEG and ISEG. When compiling in
    small model, SDCC places all of a function's directly
    addressed variables into either DSEG or OSEG, depending on
    whether other functions are called, or the function is a
    leaf. (Technically, there are linker segments for the
    register banks too.) It is true that a more sophisticated
    approach could be used to analyze which functions call each
    other and make better use of overlaying (as Keil claims to
    do). But that would require a lot of work. I'm envisioning
    something much simpler.
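
To make the difference between leaf-only overlaying and call-tree overlaying concrete, here is a small model in standard C. It is a sketch, not how SDCC or any other compiler implements it: `struct func`, `need_leaf_only` and `need_call_tree` are hypothetical names, and the model assumes no recursion, no interrupts and no calls through function pointers.

```c
#include <stddef.h>

#define MAX_CALLEES 4

struct func {
    unsigned bytes;                /* directly addressed locals + SLOCs */
    size_t   n_callees;
    size_t   callee[MAX_CALLEES];  /* indices into the function table */
};

/* Leaf-only overlay: each non-leaf function keeps private storage;
 * all leaves share one area sized for the largest leaf. */
unsigned need_leaf_only(const struct func *f, size_t n)
{
    unsigned fixed = 0, leaf_max = 0;
    for (size_t i = 0; i < n; i++) {
        if (f[i].n_callees > 0)
            fixed += f[i].bytes;
        else if (f[i].bytes > leaf_max)
            leaf_max = f[i].bytes;
    }
    return fixed + leaf_max;
}

/* Call-tree overlay: functions that are never active at the same time
 * may share storage, so the need is the heaviest root-to-leaf path. */
unsigned need_call_tree(const struct func *f, size_t root)
{
    unsigned deepest = 0;
    for (size_t i = 0; i < f[root].n_callees; i++) {
        unsigned d = need_call_tree(f, f[root].callee[i]);
        if (d > deepest)
            deepest = d;
    }
    return f[root].bytes + deepest;
}
```

For an invented five-function program (a 4-byte root calling a 6-byte and a 5-byte function, which in turn call 3-byte and 7-byte leaves), the leaf-only model needs 22 bytes and the call-tree model needs 16.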

    The basic idea is to create a special SLOC segment...
    perhaps "SSEG" or "SLOC_SEG". All functions, except perhaps
    interrupt routines and definitely reentrant functions, could
    allocate their SLOCs to this segment rather than DSEG or
    OSEG. It would not matter what functions are called,
    because genCall would be changed to push all the in-use
    SLOCs onto the stack, as it currently does for in-use
    registers, and then pop them back off afterwards. Actually,
    the function prolog/epilog would also be changed to
    save/restore whatever SLOCs were used if CALLEE-SAVES is
    specified (or it's an interrupt), and genCall would not save
    the SLOCs when calling CALLEE-SAVES functions. I'm
    envisioning that SLOCs would be handled as closely as
    possible to registers, when it comes to saving and restoring
    them between functions.
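
The caller-saves half of this idea can be simulated in standard C. This is a sketch only: `sseg`, `push` and `pop` are stand-ins invented for illustration, and the real change would be 8051 push/pop instructions emitted by genCall, not C code. The callee-saves variant is not shown.

```c
#include <stdint.h>

#define SSEG_SIZE 4
static uint8_t  sseg[SSEG_SIZE];  /* the one shared SLOC pool (hypothetical "SSEG") */
static uint8_t  stack[32];        /* stands in for the 8051 hardware stack */
static unsigned sp;

static void    push(uint8_t v) { stack[sp++] = v; }
static uint8_t pop(void)       { return stack[--sp]; }

/* The callee reuses the same pool slots for its own temporaries. */
static uint8_t callee(uint8_t x)
{
    sseg[0] = (uint8_t)(x * 2);
    sseg[1] = (uint8_t)(x + 1);
    return (uint8_t)(sseg[0] + sseg[1]);
}

uint8_t caller(uint8_t a, uint8_t b)
{
    uint8_t r;

    sseg[0] = a;                  /* spilled temporaries, live across the call */
    sseg[1] = b;

    push(sseg[0]);                /* what genCall would emit before the call */
    push(sseg[1]);
    r = callee(a);
    sseg[1] = pop();              /* ...and after it, in reverse order */
    sseg[0] = pop();

    return (uint8_t)(sseg[0] + sseg[1] + r);
}
```

Without the push/pop pair, `caller` would read back the values `callee` left in the pool. With it, the cost is two stack bytes for the duration of the call instead of two permanently reserved bytes per function.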

    This approach is completely separate from the current
    overlaying strategy, where only leaf functions are overlaid,
    at least in the compiler. All SLOCs would be overlaid in
    one common pool of memory. To the linker, it's just another
    segment that's overlaid. From a big picture perspective,
    the idea is to treat the allocation of SLOCs more like
    registers and less like global/static memory allocation.

    For large projects that quickly run out of DSEG due to all
    those spill locations (distributed over many functions),
    it's a huge win and it will even allow some critical xdata
    variables to be declared data for a significant improvement
    that offsets the tiny increase due to the extra SLOC pushes
    and pops when calling functions. But it should probably
    remain an option, since some projects may not want those
    extra pushes and pops. The linker should probably have a
    bit of code to check for excessive slocs when DSEG is full,
    and give the user a suggestion to try this sloc overlay option.

    I wanted to at least get this idea recorded into the feature
    request tracker, rather than long-lost on a mail list or web
    forum, and collect some feedback (or have someone point out
    obvious problems I'm not foreseeing). I too will soon run
    out of DSEG. Many times I've tweaked for hours, trying to
    find a way to get SDCC to not need so many temporaries and
    eliminate a few bytes of SLOCs. Sooner or later, tweaking
    in gen.c is going to be my path of least resistance to
    finally escaping the SLOC optimization hell.

    I'm glad you took the time to comment. Thanks, Paul

     
  • Frieder Ferlemann

    Logged In: YES
    user_id=589052

    Hi,
    I'd like to be able to explicitly specify an overlay segment
    for local variables and generated slocs.

    In the example below this would allow functions f() and g()
    to share temporary memory:

    #pragma save
    #pragma oseg 5          /* pragma proposed in this comment */
    void f()
    {
        char i; char xdata j;
        ..
        some_other_function();
    }

    void g()
    {
        char k; char xdata l;
        ..
        some_other_function();
    }
    #pragma restore

    The variables i and k and the slocs (in small memory model)
    would then be located in ODSEG_5.
    j, l would end up in OXSEG_5. ODSEG_5 would be defined as
    .area ODSEG_5 (DATA,OVR).

    This proposal can in some cases significantly reduce the
    amount of needed memory but as it requires "building (parts
    of) a call tree by hand" it is an error prone process.
    This is probably done for only a few functions so 5 to 10
    overlay segments should be enough.
    Compared to the proposals above this approach makes no
    difference between local variables and slocs and it wouldn't
    need to push slocs to the stack (pro: less invasive for the
    compiler, less work for cpu, con: more work for the programmer).

    I'd like my proposal even more if it would be usable for a
    compiler generated call tree, but I feel it doesn't scale
    well. There might be better ideas?

    Greetings

    Frieder

     
  • Maarten Brock

    Maarten Brock - 2007-05-05

    Logged In: YES
    user_id=888171
    Originator: NO

    Looking at it again, I like the idea of putting all SLOCs or even all local variables in an overlay segment. Instead of pushing/popping this segment on the stack it might also pay to move it to / retrieve it from xdata. I guess it all depends on how much space it needs.

    It probably can use some helper functions for the transfers.

    But wait a minute... this sounds kinda familiar. Apart from parameter passing this scheme makes all functions reentrant with the advantages of direct access on top of it.
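
The helper functions for such transfers might look roughly like the following, written as a standard C simulation. Everything here is assumed for illustration: `pool` stands in for the overlaid segment in internal RAM, `xsave` for a save area in xdata, and the sizes and names are invented.

```c
#include <stdint.h>
#include <string.h>

#define POOL_SIZE  8
#define SAVE_DEPTH 4

static uint8_t  pool[POOL_SIZE];               /* overlaid segment (internal RAM) */
static uint8_t  xsave[SAVE_DEPTH][POOL_SIZE];  /* save area (would live in xdata) */
static unsigned xlevel;                        /* number of saved copies */

/* Copy the pool out before a call. Returns 0 on success, -1 if full. */
int pool_save(void)
{
    if (xlevel == SAVE_DEPTH)
        return -1;
    memcpy(xsave[xlevel++], pool, POOL_SIZE);
    return 0;
}

/* Copy the most recent saved pool back. Returns 0 on success, -1 if empty. */
int pool_restore(void)
{
    if (xlevel == 0)
        return -1;
    memcpy(pool, xsave[--xlevel], POOL_SIZE);
    return 0;
}
```

This version always copies the whole pool. Copying only the bytes in use would cost less time per call but needs the compiler to pass that count to the helpers.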

     
