
#950 [Z80] Add end-of-function marker

None
closed
nobody
5
2025-12-14
2024-10-30
Aoineko
No

Hello,

The biggest flaw with SDCC at the moment is the lack of an unused-code removal feature.
In a library like MSXgl with thousands of functions, this is very problematic.
I work around this by splitting my code into files as much as possible and adding lots of defines to activate/deactivate code, but:
1) it makes using the library more complex (for me when I want to add functions, but also for users),
2) there's still a lot of mess as I haven't created a define for every single function.

As a result, I'm planning to add an unused-code removal feature myself:

  • Compile all modules in assembler
  • Make an analysis pass to build a function dependency network
  • Solve the network from main
  • Eliminate all unused functions from assembler files before generating RELs and making the link
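
The "solve the network from main" step above amounts to a graph reachability search. A minimal sketch in Python, assuming the dependency network has already been extracted into a caller-to-callees dictionary (the function names below are hypothetical; the leading underscore follows the symbol naming seen in the assembler listings in this thread):

```python
from collections import deque

def reachable_functions(calls, root="_main"):
    """Return every function reachable from `root` in the call graph."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for callee in calls.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

# Hypothetical dependency network: caller -> set of callees.
calls = {
    "_main": {"_init", "_draw"},
    "_init": {"_vdp_setup"},
    "_draw": {"_vdp_setup"},
    "_unused_helper": {"_draw"},
}
used = reachable_functions(calls)
unused = set(calls) - used  # candidates for removal before assembling
```

Anything reached only through an interrupt vector or a function pointer would have to be added as an extra root, which this sketch does not attempt.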

It's on this last point that I'd like to ask for a feature which, in my opinion, would be very simple for you to add, but which would make my task much easier: add a label to the assembler code marking the end of a function.

For example, void foo() function may generate:

_foo:: ; function start
/* ... */ ; function content
_foo__END__: ; function end
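
With such markers, removing a function from an assembler file becomes a simple line filter. A minimal sketch in Python, assuming the `_name::` / `_name__END__:` pair proposed above (SDCC does not emit this end label today, so the format is hypothetical, as is the helper name `strip_functions`):

```python
import re

# A function start is a global label such as "_foo::".
START = re.compile(r"^(_\w+)::")

def strip_functions(asm_text, unused):
    """Drop every line from `_name::` through `_name__END__:` for names in `unused`."""
    kept = []
    end_label = None  # end label we are currently skipping towards, if any
    for line in asm_text.splitlines():
        if end_label is not None:
            if line.strip().startswith(end_label):
                end_label = None  # the marker itself is dropped too
            continue
        match = START.match(line.strip())
        if match and match.group(1) in unused:
            end_label = match.group(1) + "__END__:"
            continue
        kept.append(line)
    return "\n".join(kept)

sample = """_foo:: ; function start
    ret
_foo__END__: ; function end
_table:
    .db 0x01
_bar:: ; function start
    ret
_bar__END__: ; function end"""
result = strip_functions(sample, {"_foo"})
```

Data placed between the two functions (`_table` here) is left untouched, which is the point of delimiting the function explicitly rather than cutting up to the next function label.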

At first, I thought I'd use the label at the start of the next function or the end of the file to delimit the content of the functions to be deleted, but it's not that simple as some information essential to the assembler is sometimes stored between functions.

It could be a compile option if you don't want it to be default behavior.

And for those of you who might be wondering “why don't you add the deletion of unused code directly into SDCC yourself”, the answer is that I don't know SDCC well enough to have any idea how to do it. I'm going to do it in a context I'm familiar with (MSXgl's Build tool) but I will share my script here if it may help the SDCC team to implement the feature directly in their tool.

For now, all I need is just an end-of-function marker.

Discussion

  • Aoineko

    Aoineko - 2024-10-30

    I can't edit my post so here is the right formatting for the provided example:

    _foo:: ; function start
      ... ; function content
      ...
      ...
    _foo__END__: ; function end
    
     
    • Janko Stamenović

      I understand your needs, but have you tried to "simply" (I know it can appear "less than simple" for you) split the .c files to a single function per file, compile each separately and add each .rel separately to the .lib file? The linker will link only what's actually used.
      The "standard C" library in SDCC is also split to a function per file.
      And I've also created my own "specific target library" that way: to depend on the linker (I know, it's surely much, much smaller than yours) and I produce a few lib files, but it works all together and keeps the minimal resulting binary size perfectly.
      That's the traditional approach of creating C libraries, for many decades already.
      Even if it seems like a big task, it's still conceptually many orders of magnitude simpler than developing "an analysis pass to build a function dependency network" without any bugs (by scripting the most trivial steps, it could probably be "solved" in a day or two, even with your code base). It's just the functions you've already written, but in more .c files. Nothing complicated in any way.
      You also avoid a dependency on custom tools of yours, additional processing by such tools during the compilation, maintaining the tools etc. It's simply "everything always prepared" to be linked down to the minimally needed code, just by having more .c files.

       
      • Aoineko

        Aoineko - 2024-10-30

        I know, but no way I split my library into several hundreds of files. ^^
        It's not just a question of wasting time once cutting everything out.
        I don't think you realize how much more complex and time-consuming it is to have to manage such a large number of files.
        Now I have about sixty source files and when I need to make global changes, it already takes a lot of time.
        If SDCC prefers to stay old school on that matter, that's fine with me, but for my part I'll find more modern solutions to make life easier for myself and my lib's users. :)

         

        Last edit: Aoineko 2024-10-30
        • Janko Stamenović

          In my case, at least, I worked on quite big projects a few decades ago already (megabytes of code), but I understand that not everybody is used to such workflows. The first thing I learned was never to look for anything by file name, but to search through the files instead, so I never needed to remember the file names or where exactly something was, and my workflow is still the same regardless of where some code is (e.g. the output of grep -r -n asinf * has the subfolders, the file names and the lines where asinf is mentioned, in all folders and subfolders). So "but you have to change many files to change that" was for me "OK, no problem with that". Even then I also used "patch" tools (with checksums, made by me, to be sure that what's patched is only what it should be) to organize the team's collaboration, and that again made sure that the names of the files where some code lives didn't have to be fixed in any way: the unit of change was the patch, no matter how many files it touched.

          On the other hand, I also understand the need for "dead code elimination" across the whole binary. The last time I looked, there was an open feature request for SDCC for that, and I can imagine it will eventually be implemented.

          I'm just suggesting that specifically for libraries, splitting to more .c files is, from my point of view, very reasonable.

           
          • Under4Mhz

            Under4Mhz - 2024-11-05

            I personally prefer to have multiple functions per file.

            For my code, I've tended to have a single source file per hardware platform (C64, NES, MSX etc) and find this more convenient to maintain. I'm in the source file for the platform and I can scroll up and down the file to make changes to the platform.

            I'm usually at the limit of the 32K RAM usage for all my games, so a few extra kilobytes make a difference, but I really have resisted moving to a single-file-per-function strategy.

            An unused code elimination feature in SDCC would be great.

             
  • Benedikt Freisen

    Would it help if SDCC were able to e.g. output a list of global symbols defined in a translation unit and a list of global symbols referenced in a translation unit and accepted another list as input which lets you mark certain global symbols as globally unused?
    That would allow you to initially compile all translation units once, then do your magic on the lists and finally compile everything once more.
    The modifications to SDCC would remain relatively minor, because most of the relevant information is already there, internally.
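
The "magic on the lists" between the two compilation runs could be as small as a set difference. A minimal sketch in Python, assuming the per-translation-unit lists have already been loaded into dictionaries (SDCC has no such list output today; the file names, symbol names and the helper `globally_unused` are all hypothetical):

```python
def globally_unused(defined, referenced, roots=("_main",)):
    """Symbols defined in some translation unit but referenced in none."""
    all_defined = set().union(*defined.values())
    all_referenced = set().union(*referenced.values())
    # Roots such as the entry point are never referenced by C code, so keep them.
    return all_defined - all_referenced - set(roots)

# Hypothetical lists from a first compilation run: unit -> global symbols.
defined = {
    "main.c": {"_main", "_tick"},
    "gfx.c": {"_draw", "_scroll"},
}
referenced = {
    "main.c": {"_draw", "_tick"},
    "gfx.c": set(),
}
unused = globally_unused(defined, referenced)
```

Because the lists are per translation unit rather than per function, a symbol referenced only from code that is itself unused still counts as referenced here, so the compile-and-filter cycle may need repeating until the result stops changing.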

    P.S.: I have fixed the formatting for you.

     
    • Janko Stamenović

      As far as I understand, it's really about "where the function ends" in an .asm file, which is what Aoineko would like to have marked in a way that makes it convenient for him to split the assembly files into smaller parts. But, IMO, it's possible that his problem would not be solved even if these labels existed: right now, one .c is, per the C definition, a "single translation unit", with the static variables being valid inside the translation unit. If he already has static variables, as soon as he split the .c he'd have to solve "where these belong", and a static variable can be shared by more than one function. That's why I'm suggesting that, for a library, the cleanest approach is for the author to decide about all this at the C-file level by creating the files "the right way".

       
      • Benedikt Freisen

        My idea was to have a second compilation run and to let the compiler know which symbols were globally unused across all compilation units in the first compilation run.
        In the second compilation run, the compiler would then omit emitting code for them, eliminating any need to remove unused code from the output.

        This approach could later be expanded to full-fledged whole program optimization with constant propagation in function calls between compilation units, control flow based assessment whether reentrant code is needed etc.

         
      • Aoineko

        Aoineko - 2024-10-30

        My plan is to:

        • Compile all my .C files to corresponding .ASM files.
        • Parse all the .ASM files to create a graph with all functions and all their dependencies on other functions.
        • Solve the graph from the main() function to get all needed functions.
        • Parse again the .ASM files and remove from code all unused functions (it's where I needed a reliable way to detect function boundaries in assembler).
        • Assemble all the .ASM to .REL.
        • Link all .REL together.
         

        Last edit: Aoineko 2024-10-30
        • Janko Stamenović

          If you need to know "where the _foo__END__ would appear", do you have some example where the rule "it ends at the first next non-temporary label which starts with _ " fails? I think the only exception would be some label in an assembly you've written inside of a .c, and I think you can change that.
          I'm aware that some functions use some constants, and these constants aren't, from the compiler's point of view, strictly part of the function code, as, AFAIK, the constants could be shared by more than one function. That means that you'd anyway have to consider and handle everything that, from the compiler's point of view, wouldn't be between _foo and _foo__END__, and you'd probably also want to track the use of the constants and remove all those that aren't used by the functions actually called?

          Edit: I think this discussion helps clearing up strategies to implement the feature in SDCC too, so thanks all.

           

          Last edit: Janko Stamenović 2024-10-30
          • Janko Stamenović

            Example for what I've described, this input:

            const char msg1[] = "Hello";
            const char msg2[] = "World";
            const char msg3[] = "!";
            void p( const char* s );
            void f1( void ) { p( msg1 ); }
            void f2( void ) { p( msg2 ); }
            void f3( void ) { p( msg3 ); }
            

            generates

            ;   ---------------------------------
            ; Function f1
            ; ---------------------------------
            _f1::
                ld  hl, #_msg1
                jp  _p
            _msg1:
                .ascii "Hello"
                .db 0x00
            _msg2:
                .ascii "World"
                .db 0x00
            _msg3:
                .ascii "!"
                .db 0x00
            ;t.c:7: void f2( void ) { p( msg2 ); }
            ;   ---------------------------------
            ; Function f2
            ; ---------------------------------
            _f2::
                ld  hl, #_msg2
                jp  _p
            ;t.c:8: void f3( void ) { p( msg3 ); }
            ;   ---------------------------------
            ; Function f3
            ; ---------------------------------
            _f3::
                ld  hl, #_msg3
                jp  _p
              .area _CODE    
             ...
            

            Note that _f1 ends as soon as _msg1 appears, and that all three constants are together before _f2, and that you can't know there which constant is used by which function unless you track that usage yourself.
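
Tracking that usage could look roughly like the following. A minimal sketch in Python, assuming the "; Function name" comment header seen in the listing above announces each function, and applying the rule suggested earlier that a function ends at the next label starting with an underscore (the helper name `references_by_function` is hypothetical):

```python
import re

FUNC_HEADER = re.compile(r"^;\s*Function\s+(\w+)")  # "; Function f1"
LABEL = re.compile(r"^(_\w+)::?")                   # "_f1::" or "_msg1:"
SYMBOL = re.compile(r"\b_\w+")                      # "_msg1" in "#_msg1"

def references_by_function(asm_text):
    """Map each function label to the underscore symbols its body mentions."""
    refs = {}
    pending = None  # function announced by a "; Function" comment
    current = None  # function whose body is being scanned
    for raw in asm_text.splitlines():
        line = raw.strip()
        header = FUNC_HEADER.match(line)
        if header:
            pending, current = "_" + header.group(1), None
            continue
        label = LABEL.match(line)
        if label:
            # Any other underscore label ends the current function.
            current = label.group(1) if label.group(1) == pending else None
            pending = None
            if current:
                refs[current] = set()
            continue
        if current and not line.startswith((";", ".")):
            code = line.split(";", 1)[0]  # drop trailing comments
            refs[current].update(SYMBOL.findall(code))
    return refs

listing = """\
; Function f1
_f1::
    ld  hl, #_msg1
    jp  _p
_msg1:
    .ascii "Hello"
    .db 0x00
_msg2:
    .ascii "World"
    .db 0x00
; Function f2
_f2::
    ld  hl, #_msg2
    jp  _p
"""
usage = references_by_function(listing)
```

This heuristic only recognises labels that end in a colon. A line that is not a label, such as a symbol assignment emitted right after a function's `ret`, would be scanned as part of the preceding function, so the sketch is not a reliable boundary detector.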

             
          • Aoineko

            Aoineko - 2024-10-30

            You already gave an example of assembler code added next to a function even though it is not related to it (this is the case with all constant data). There are also some definitions that are sometimes added between functions, like:

            ;--------------------------------------------------------
            ; code
            ;--------------------------------------------------------
                .area _CODE
            ;./main.c:175: void VDP_InterruptHandler()
            ;   ---------------------------------
            ; Function VDP_InterruptHandler
            ; ---------------------------------
            _VDP_InterruptHandler::
            ;./main.c:177: g_VBlank = 1;
                ld  hl, #_g_VBlank
                ld  (hl), #0x01
            ;./main.c:178: }
                ret
            _g_RDPRIM   =   0xf380 ; <----- Here!
            _g_WRPRIM   =   0xf385
            _g_CLPRIM   =   0xf38c
            _g_USRTAB   =   0xf39a
            _g_CNSDFG   =   0xf3de
            _g_RG0SAV   =   0xf3df
            _g_RG1SAV   =   0xf3e0
            _g_RG2SAV   =   0xf3e1
            [...]
            

            In fact, adding a marker (assembly label) at the end of a function would have another benefit: in MSXgl's Build tool, I have an analysis tool that gives the user statistics on the size of the code in the various modules, the list of the largest functions, and so on. The size is sometimes incorrect precisely because of the static data stored between the functions.
            The end-of-function marker would give the exact size of each function.

             

            Last edit: Aoineko 2024-10-30
            • Janko Stamenović

              Still, my point is that you'd have to have a "smart enough" parsing of assembly anyway, and that it has to be so advanced that you actually don't need the "end markers" to achieve the goals of knowing where the function is, as you'd anyway have to track all the labels, their appearance and their usage -- so the first non-function-local label is already also the end of the previous function.
              Also note that the constants I've written there are actually global constants:

              ;--------------------------------------------------------
              ; Public variables in this module
              ;--------------------------------------------------------
                  .globl _f3
                  .globl _f2
                  .globl _f1
                  .globl _p
                  .globl _msg3
                  .globl _msg2
                  .globl _msg1
              

              To sum up, it seems to me that to really solve the problem the way you are attempting to, you would "get" the function ends simply by implementing everything you have to implement anyway. Except for the separate goal of "knowing exactly how big the function is", which is independent of that label-tracking .asm processing, "end" labels aren't necessary.

               

              Last edit: Janko Stamenović 2024-10-30
              • Aoineko

                Aoineko - 2024-10-31

                Sorry but I really don't get you.

                Having a function-end marker is a very simple and reliable way to know exactly where each function starts and ends. Nothing smart needed here.

                Sure, I can try to figure out by myself where the function ends by interpreting every single line of the assembler code, but:
                1) it's quite a difficult task, for example to know whether a given data definition is outside a function (static C variable) or inside the function (defined in assembler in the function),
                2) it's not reliable, since I can't be sure I track all possible end conditions (for now and the future).

                Adding a function-end marker seems simple to me and opens up many possibilities for parsing assembly code.

                That said, if the SDCC team doesn't want to add this feature, no problem, I'll manage to find another solution that satisfies my needs.

                 

                Last edit: Aoineko 2024-10-31
  • Ragozini Arturo

    Ragozini Arturo - 2024-10-31

    Placing an "end label" after each function should be easy for the compiler. The problem is to include the constant data used by the function itself in the block. The critical part is when a table/string/constant is used by multiple functions.

    The optimal solution would be to compute the coverage of each data structure and include it selectively with its functions, removing duplications.
    Actually, SDCC already has a coverage test, as it issues a warning for unused variables and data structures.

     
    • Philipp Klaus Krause

      SDCC currently only warns about unused local variables, not unused global ones; and those unused local ones can be (and AFAIK are) omitted by the compiler anyway, so they won't end up in the assembler code.

       
  • Benedikt Freisen

    Does sdas support asxxxx's conditional assembly directives, i.e. .if, .else and .endif?
    If it does, having the compiler wrap functions in them and leaving the actual filtering to the assembler could be an elegant solution.

     
    • Philipp Klaus Krause

      I don't think so. Omitting unused non-static functions and objects is fundamentally a link-time optimization: [feature-requests:#452], [feature-requests:#413]. Doing so for static ones could be done either at compile time (which would require making SDCC a two-pass compiler) or at link time.

       

      Related

      Feature Requests: #413
      Feature Requests: #452


      Last edit: Philipp Klaus Krause 2024-11-07
  • Aoineko

    Aoineko - 2025-12-14

    You can close this request.
    I'll put my hope in Feature Requests: #452.

     
  • Benedikt Freisen

    • status: open --> closed
     
  • Benedikt Freisen

    Closed as requested. See [feature-requests:#452].

     

    Related

    Feature Requests: #452

