Small Device C Compiler (SDCC) / Feature Requests / #950 [Z80] Add end-of-function marker

Aoineko - 2024-10-30

I can't edit my post so here is the right formatting for the provided example:

    _foo::           ; function start
        ...          ; function content
        ...
        ...
    _foo__END__:     ; function end

- Janko Stamenović - 2024-10-30
  
  I understand your needs, but have you tried to "simply" (I know it may seem "less than simple" to you) split the .c files into a single function per file, compile each one separately and add each .rel separately to the .lib file? The linker will then link only what's actually used.
  The "standard C" library in SDCC is also split into one function per file.
  I've also built my own target-specific library that way, relying on the linker (I know, it's surely much, much smaller than yours). I produce a few .lib files, everything works together, and the resulting binary stays minimal.
  That has been the traditional approach to creating C libraries for many decades.
  Even if it seems like a big task, it's still conceptually many orders of magnitude simpler than developing "an analysis pass to build a function dependency network" without any bugs (by scripting the most trivial steps, it could probably be "solved" in a day or two, even with your code base). It's just the functions you've already written, spread over more .c files. Nothing complicated in any way.
  You also avoid a dependency on custom tools of your own, the additional processing by such tools during compilation, maintaining the tools, etc. Everything is simply always prepared to be linked into the minimally needed code, just by having more .c files.
  
  - Aoineko - 2024-10-30
    
    I know, but there's no way I'll split my library into several hundred files. ^^
    It's not just a question of spending the time once to cut everything up.
    I don't think you realize how much more complex and time-consuming it is to manage such a large number of files.
    Right now I have about sixty source files, and when I need to make global changes it already takes a lot of time.
    If SDCC prefers to stay old school on that matter, that's fine with me, but for my part I'll find more modern solutions to make life easier for myself and my lib's users. :)
    
    Last edit: Aoineko 2024-10-30
    
    - Janko Stamenović - 2024-10-30
      
      In my case, at least, I worked on quite big projects a few decades ago already (megabytes of code), but I understand that not everybody is used to such workflows. The first thing I learned was never to search for anything by file name, but to just search through the files. That way I never needed to remember file names or where exactly something was, and my workflow is still the same regardless of where some code lives (e.g. the output of grep -r -n asinf * shows the subfolders, the file names and the lines where asinf is mentioned, across all folders and subfolders). So "but you have to change many files to change that" was, for me, "OK, no problem with that". Even then I also used "patch" tools (with checksums, made by me, to be sure that only what should be patched was patched) to organize the team's collaboration. That again meant the names of the files containing some code didn't have to be fixed in any way: the unit of change was the patch, no matter how many files it touched.
      
      On the other hand, I also understand the need for "dead code elimination" across the whole binary. The last time I looked, there was an open SDCC feature request for that, and I can imagine it will eventually be implemented.
      
      I'm just suggesting that specifically for libraries, splitting to more .c files is, from my point of view, very reasonable.
      
      - Under4Mhz - 2024-11-05
        
        I personally prefer to have multiple functions per file.
        
        For my code, I've tended to have a single source file per hardware platform (C64, NES, MSX, etc.) and find this more convenient to maintain. I'm in the source file for the platform and can scroll up and down the file to make changes to that platform.
        
        I'm usually at the limit of the 32K RAM usage for all my games, so a few extra kilobytes make a difference, but I've really resisted moving to a one-function-per-file strategy.
        
        An unused code elimination feature in SDCC would be great.
        

Benedikt Freisen - 2024-10-30

Would it help if SDCC were able to e.g. output a list of global symbols defined in a translation unit and a list of global symbols referenced in a translation unit and accepted another list as input which lets you mark certain global symbols as globally unused?
That would allow you to initially compile all translation units once, then do your magic on the lists and finally compile everything once more.
The modifications to SDCC would remain relatively minor, because most of the relevant information is already there, internally.
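
The "magic on the lists" step could look roughly like the sketch below. It assumes the compiler emitted, per translation unit, one set of defined and one set of referenced global symbols; the dictionary layout, the `globally_unused` name and the sample symbols are all hypothetical, since SDCC has no such output today.

```python
def globally_unused(units, roots=("_main",)):
    """Return symbols defined in some unit but referenced in none.

    `units` maps a translation-unit name to a (defined, referenced) pair of
    sets of global symbol names. This layout is hypothetical.
    """
    defined, referenced = set(), set()
    for unit_defined, unit_referenced in units.values():
        defined |= unit_defined
        referenced |= unit_referenced
    # Entry points are usually not referenced from C code, so keep them.
    return defined - referenced - set(roots)


# Hypothetical lists for two translation units.
units = {
    "main.c": ({"_main"}, {"_print"}),
    "lib.c": ({"_print", "_unused_helper"}, set()),
}
unused = globally_unused(units)  # {"_unused_helper"}
```

A single round like this only finds symbols that nothing references. A symbol referenced solely by an unused function would only show up after repeating the compile-and-compare cycle.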

P.S.: I have fixed the formatting for you.

- Janko Stamenović - 2024-10-30
  
  As far as I understand, it's really about "where the function ends" in an .asm file, which Aoineko would like to know exactly, in a form convenient for splitting the assembly files into smaller parts. But, IMO, his problem might not be solved even if these labels existed: right now, one .c file is, by C's definition, a single translation unit, and static variables are valid within that translation unit. If he already has static variables, then as soon as he split the .c file he'd have to decide where these belong, and a static variable can be shared by more than one function. That's why I'm suggesting that, for a library, the cleanest approach is for the author to decide all this at the C file level by creating the files "the right way".
  
  - Benedikt Freisen - 2024-10-30
    
    My idea was to have a second compilation run and to let the compiler know which symbols were globally unused across all compilation units in the first compilation run.
    In the second compilation run, the compiler would then omit emitting code for them, eliminating any need to remove unused code from the output.
    
    This approach could later be expanded to full-fledged whole program optimization with constant propagation in function calls between compilation units, control flow based assessment whether reentrant code is needed etc.
    
  - Aoineko - 2024-10-30
    
    My plan is to:
    
    1) Compile all my .C files to corresponding .ASM files.
    2) Parse all the .ASM files to create a graph of all functions and their dependencies on other functions.
    3) Solve the graph from the main() function to get all needed functions.
    4) Parse the .ASM files again and remove all unused functions from the code (this is where I need a reliable way to detect function boundaries in assembler).
    5) Assemble all the .ASM files to .REL.
    6) Link all the .REL files together.
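
The graph-building and graph-solving steps of the plan above might be sketched as follows. This is only an illustration under simplifying assumptions: every `_name::` label is treated as a function start, every `_name` token after it counts as a dependency, and comments are cut at the first `;` (which would mishandle a `;` inside a string). The function names `build_call_graph` and `reachable` are made up for this sketch.

```python
import re

FUNC_LABEL = re.compile(r"^(_\w+)::")   # global label, as SDCC emits for functions
SYMBOL_REF = re.compile(r"\b_\w+\b")    # any token that looks like a C symbol


def build_call_graph(asm_text):
    """Map each '_name::' label to the set of '_symbols' mentioned after it."""
    graph, current = {}, None
    for line in asm_text.splitlines():
        code = line.split(";", 1)[0].strip()  # drop comments (naive)
        if not code:
            continue
        label = FUNC_LABEL.match(code)
        if label:
            current = label.group(1)
            graph[current] = set()
        elif current is not None:
            graph[current].update(SYMBOL_REF.findall(code))
    return graph


def reachable(graph, root="_main"):
    """Return every node of the graph that can be reached from root."""
    seen, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name in seen or name not in graph:
            continue
        seen.add(name)
        stack.extend(graph[name] - seen)
    return seen


sample = """
_main::
    call _used
    ret
_used::
    ret
_unused::
    call _used
    ret
"""
needed = reachable(build_call_graph(sample))  # {"_main", "_used"}
```

Because anything following a function label is attributed to that function, the sketch over-approximates dependencies; that is the boundary problem the request is about.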
    
    
    Last edit: Aoineko 2024-10-30
    
    - Janko Stamenović - 2024-10-30
      
      If you need to know "where the _foo__END__ would appear", do you have an example where the rule "it ends at the next non-temporary label that starts with _" fails? I think the only exception would be a label in assembly you've written inside a .c file, and I think you can change that.
      I'm aware that some functions use constants, and from the compiler's point of view these constants aren't strictly part of the function code since, AFAIK, they could be shared by more than one function. That means you'd have to consider and handle everything that, from the compiler's point of view, wouldn't lie between _foo and _foo__END__ anyway, and you'd probably also want to track the use of the constants and remove all those that aren't used by the functions actually called?
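
A rough sketch of the "ends at the next label starting with _" rule, for illustration only. It assumes the `; Function name` comment header that appears in the listings in this thread announces each function, that the header and labels start in column 0, and that the C name `name` maps to the label `_name`. The `split_functions` helper is hypothetical.

```python
import re

FUNC_HEADER = re.compile(r"^;\s*Function\s+(\w+)\s*$")
UNDERSCORE_LABEL = re.compile(r"^(_\w+)")


def split_functions(asm_text):
    """Collect the instruction lines of each function announced by a header."""
    bodies = {}
    pending = None  # label announced by the last "; Function x" header
    current = None  # function whose body is being collected
    for line in asm_text.splitlines():
        header = FUNC_HEADER.match(line)
        label = UNDERSCORE_LABEL.match(line)
        if header:
            pending, current = "_" + header.group(1), None
        elif label:
            # Any column-0 label starting with "_" ends the previous function.
            current = label.group(1) if label.group(1) == pending else None
            if current:
                bodies[current] = []
            pending = None
        elif current and line.strip() and not line.lstrip().startswith(";"):
            bodies[current].append(line.strip())
    return bodies


sample = """\
; Function f1
_f1::
    ld hl, #_msg1
    jp _p
_msg1:
    .ascii "Hello"
    .db 0x00
; Function f2
_f2::
    ld hl, #_msg2
    jp _p
"""
bodies = split_functions(sample)
```

As noted above, a `_`-prefixed label written in inline assembly inside a function would cut that function short under this rule.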
      
      Edit: I think this discussion also helps clarify strategies for implementing the feature in SDCC, so thanks all.
      
      Last edit: Janko Stamenović 2024-10-30
      
      - Janko Stamenović - 2024-10-30
        
        Example for what I've described, this input:
        
            const char msg1[] = "Hello";
            const char msg2[] = "World";
            const char msg3[] = "!";

            void p( const char* s );
            void f1( void ) { p( msg1 ); }
            void f2( void ) { p( msg2 ); }
            void f3( void ) { p( msg3 ); }
        
        generates
        
            ; ---------------------------------
            ; Function f1
            ; ---------------------------------
            _f1::
                ld hl, #_msg1
                jp _p
            _msg1:
                .ascii "Hello"
                .db 0x00
            _msg2:
                .ascii "World"
                .db 0x00
            _msg3:
                .ascii "!"
                .db 0x00
            ;t.c:7: void f2( void ) { p( msg2 ); }
            ; ---------------------------------
            ; Function f2
            ; ---------------------------------
            _f2::
                ld hl, #_msg2
                jp _p
            ;t.c:8: void f3( void ) { p( msg3 ); }
            ; ---------------------------------
            ; Function f3
            ; ---------------------------------
            _f3::
                ld hl, #_msg3
                jp _p
                .area _CODE
            ...
        
        Note that _f1 ends as soon as _msg1 appears, and that all three constants are together before _f2, and that you can't know there which constant is used by which function unless you track that usage yourself.
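
Tracking that usage yourself could start from something like the sketch below. It assumes functions carry a `_name::` label and constants a single-colon `_name:` label, as in the listing above, and that comments can be cut at the first `;`. The `constant_usage` helper is hypothetical and is not part of SDCC.

```python
import re

FUNC_LABEL = re.compile(r"^(_\w+)::")
DATA_LABEL = re.compile(r"^(_\w+):(?!:)")  # single colon only
SYMBOL = re.compile(r"\b_\w+\b")


def constant_usage(asm_text):
    """Map each function label to the set of data labels it references."""
    lines = [l.split(";", 1)[0].rstrip() for l in asm_text.splitlines()]
    data_labels = {m.group(1) for l in lines if (m := DATA_LABEL.match(l))}
    usage, current = {}, None
    for l in lines:
        if (m := FUNC_LABEL.match(l)):
            current = m.group(1)
            usage[current] = set()
        elif DATA_LABEL.match(l):
            current = None  # a data label ends the function body
        elif current:
            usage[current] |= set(SYMBOL.findall(l)) & data_labels
    return usage


sample = """\
_f1::
    ld hl, #_msg1
    jp _p
_msg1:
    .ascii "Hello"
    .db 0x00
_msg2:
    .ascii "World"
    .db 0x00
_msg3:
    .ascii "!"
    .db 0x00
_f2::
    ld hl, #_msg2
    jp _p
_f3::
    ld hl, #_msg3
    jp _p
"""
usage = constant_usage(sample)
```

With such a map, a constant can be dropped once none of the functions that remain after pruning reference it.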
        
      - Aoineko - 2024-10-30
        
        You already gave an example of assembler code placed next to a function even though it is unrelated to it (this is the case for all constant data). There are also definitions that are sometimes added between functions, like:
        
            ;--------------------------------------------------------
            ; code
            ;--------------------------------------------------------
                .area _CODE
            ;./main.c:175: void VDP_InterruptHandler()
            ; ---------------------------------
            ; Function VDP_InterruptHandler
            ; ---------------------------------
            _VDP_InterruptHandler::
            ;./main.c:177: g_VBlank = 1;
                ld hl, #_g_VBlank
                ld (hl), #0x01
            ;./main.c:178: }
                ret
            _g_RDPRIM = 0xf380 ; <----- Here!
            _g_WRPRIM = 0xf385
            _g_CLPRIM = 0xf38c
            _g_USRTAB = 0xf39a
            _g_CNSDFG = 0xf3de
            _g_RG0SAV = 0xf3df
            _g_RG1SAV = 0xf3e0
            _g_RG2SAV = 0xf3e1
            [...]
        
        In fact, adding a marker (assembly label) at the end of a function would have another benefit: in MSXgl's Build tool, I have an analysis tool that gives the user statistics on the size of the code in the various modules, the list of the largest functions, and so on. The size is sometimes incorrect precisely because of the static data stored between the functions.
        The end-of-function marker would give the exact size of each function.
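
To illustrate the size calculation: if end markers existed and a tool had a table of resolved symbol addresses, the size would be a plain subtraction. The sketch assumes the `__END__` suffix from the example at the top of this request and a hypothetical, hand-written symbol table; SDCC does not emit such markers.

```python
def function_sizes(symbols, end_suffix="__END__"):
    """Return {function: size in bytes} for every symbol that has an end marker."""
    return {
        name: symbols[name + end_suffix] - address
        for name, address in symbols.items()
        if not name.endswith(end_suffix) and name + end_suffix in symbols
    }


# Hypothetical addresses; _msg1 is constant data stored right after _foo.
symbols = {
    "_foo": 0x4000,
    "_foo__END__": 0x4012,
    "_msg1": 0x4012,
    "_bar": 0x4018,
    "_bar__END__": 0x4020,
}
sizes = function_sizes(symbols)  # {"_foo": 18, "_bar": 8}
```

Measuring `_foo` up to the next function (`_bar`) instead would give 24 bytes here, because the 6 bytes of `_msg1` would be counted as code.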
        
        Last edit: Aoineko 2024-10-30
        
        
        Janko Stamenović - 2024-10-30
        
        Still, my point is that you'd need "smart enough" parsing of the assembly anyway, and it has to be advanced enough that you don't actually need the "end markers" to know where a function is: you'd have to track all the labels, where they appear and where they're used anyway, so the first non-function-local label already marks the end of the previous function.
        Also note that the constants I've written there are actually global constants:
        
            ;--------------------------------------------------------
            ; Public variables in this module
            ;--------------------------------------------------------
                .globl _f3
                .globl _f2
                .globl _f1
                .globl _p
                .globl _msg3
                .globl _msg2
                .globl _msg1
        
        To sum up, it seems to me that to really solve the problem the way you're attempting to, you would "get" the function ends simply by implementing everything you have to implement anyway. Apart from the separate goal of knowing exactly how big each function is, which is independent of that label-tracking .asm processing, "end" labels aren't necessary.
        
        Last edit: Janko Stamenović 2024-10-30
        
        
        Aoineko - 2024-10-31
        
        Sorry but I really don't get you.
        
        Having a function-end marker is a very simple and reliable way to know exactly where each function starts and ends. Nothing smart needed here.
        
        Sure, I can try to figure out by myself where the function ends by interpreting every single line of the assembler code, but:
        1) it's quite a difficult task, for example to know whether a given data definition is outside a function (static C variable) or inside it (defined in assembler within the function),
        2) it's not reliable, since I can't be sure I'm tracking all possible end conditions (now and in the future).
        
        Adding a function-end marker seems simple to me and opens up many possibilities for parsing assembly code.
        
        That said, if the SDCC team doesn't want to add this feature, no problem, I'll manage to find another solution that satisfies my needs.
        
        Last edit: Aoineko 2024-10-31
        

Ragozini Arturo - 2024-10-31

Placing an "end label" after each function should be easy for the compiler. The problem is to include the constant data used by the function itself in the block. The critical part is when a table/string/constant is used by multiple functions.

The optimal solution would be to compute the coverage of each data structure and include it selectively with its functions, removing duplications.
Actually, SDCC already has a coverage test, as it issues a warning for unused variables and data structures.

- Philipp Klaus Krause - 2024-11-07
  
  SDCC currently only warns about unused local variables, not unused global ones; and those unused local ones can be (and AFAIK are) omitted by the compiler anyway, so they won't end up in the assembler code.
  

Benedikt Freisen - 2024-11-05

Does sdas support asxxxx's conditional assembly directives, i.e. .if, .else and .endif?
If it does, having the compiler wrap functions in them and leaving the actual filtering to the assembler could be an elegant solution.

- Philipp Klaus Krause - 2024-11-07
  
  I don't think so. Omitting unused non-static functions and objects is fundamentally a link-time optimization: see [feature-requests:#452] and [feature-requests:#413]. For static ones, it could be done either at compile time (which would require making SDCC a two-pass compiler) or at link time.
  
  Related
  
  Feature Requests: #413
  Feature Requests: #452
  
  Last edit: Philipp Klaus Krause 2024-11-07
  

Aoineko - 2025-12-14

You can close this request.
I'll put my hope in Feature Requests: #452.


Benedikt Freisen - 2025-12-14

status: open --> closed

Benedikt Freisen - 2025-12-14

Closed as requested. See [feature-requests:#452].

Related

Feature Requests: #452

