Hello,
The biggest flaw with SDCC at the moment is the lack of an unused code removal feature.
In a library like MSXgl with thousands of functions, this is very problematic.
I get around this by splitting my code into separate files as much as possible and adding lots of defines to enable/disable code, but:
1) it makes the library more complex to use (for me when I want to add functions, but also for users),
2) there's still a lot of clutter, as I haven't created a define for every single function.
As a result, I'm planning to add an unused code removal feature myself:
It's on this last point that I'd like to request a feature which, in my opinion, would be very simple for you to add but would make my task much easier: adding a label to the assembler code at the end of each function.
For example, a function void foo() might generate:
_foo:: ; function start
/* ... */ ; function content
_foo__END__: ; function end
At first, I thought I'd use the label at the start of the next function, or the end of the file, to delimit the content of the functions to be deleted. It's not that simple, though, because some information essential to the assembler is sometimes stored between functions.
It could be a compile option if you don't want it to be the default behavior.
And for those of you who might be wondering “why don't you add unused code removal directly into SDCC yourself?”, the answer is that I don't know SDCC well enough to have any idea how to do it. I'm going to do it in a context I'm familiar with (MSXgl's Build tool), but I will share my script here in case it helps the SDCC team implement the feature directly in their tool.
For now, all I need is an end-of-function marker.
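Here is a minimal C sketch of how a post-processing script could use such markers. It rests on a few assumptions. The marker follows the format proposed above: a "_name__END__:" label placed right after the function's last instruction. SDCC does not emit this today. The caller also supplies a list of functions already known to be unused. filter_asm and its helpers are hypothetical names. Anything outside a start/end pair, such as static data or directives emitted between functions, is copied unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* If line starts with a global label such as "_foo::", copy "_foo" into
   name and return true. */
static bool parse_start_label(const char *line, char *name, size_t cap)
{
    const char *colons = strstr(line, "::");
    size_t len;

    if (line[0] != '_' || colons == NULL)
        return false;
    len = (size_t)(colons - line);
    /* the label must be the first token on the line */
    if (strcspn(line, " \t;") < len || len + 1 > cap)
        return false;
    memcpy(name, line, len);
    name[len] = '\0';
    return true;
}

/* True if line is the proposed end marker "<name>__END__:". */
static bool is_end_label(const char *line, const char *name)
{
    size_t len = strlen(name);
    return strncmp(line, name, len) == 0 &&
           strncmp(line + len, "__END__:", 8) == 0;
}

static bool in_list(const char *name, const char **list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0)
            return true;
    return false;
}

/* Copies pointers to the surviving lines into out[] and returns their
   count. Lines from "_name::" through "_name__END__:" are skipped when
   name is in drop[]. If the end marker is missing, the rest of the file
   is skipped, so the marker must really be emitted for every function. */
size_t filter_asm(const char **lines, size_t n,
                  const char **drop, size_t ndrop, const char **out)
{
    char current[64];
    bool dropping = false;
    size_t count = 0;

    assert(lines != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        const char *line = lines[i];

        if (!dropping && parse_start_label(line, current, sizeof current)
            && in_list(current, drop, ndrop))
            dropping = true;            /* entering an unused function */

        if (!dropping)
            out[count++] = line;        /* kept code, data or directive */
        else if (is_end_label(line, current))
            dropping = false;           /* marker consumed, resume copying */
    }
    return count;
}
```

The script passes the list of functions to drop rather than a list to keep. As a result, global data labels such as "_msg1::", which have no end marker, are never swallowed by accident.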
I can't edit my post, so here is the correct formatting for the example provided:
I understand your needs, but have you tried to "simply" (I know it may seem "less than simple" to you) split the .c files into one function per file, compile each one separately and add each .rel separately to the .lib file? The linker will then link only what's actually used.
The "standard C" library in SDCC is also split into one function per file.
I've also created my own "target-specific library" that way, relying on the linker (I know, it's surely much, much smaller than yours). I produce a few .lib files, and it all works together and keeps the resulting binary minimal.
That has been the traditional approach to creating C libraries for many decades.
Even if it seems like a big task, it's still conceptually many orders of magnitude simpler than developing "an analysis pass to build a function dependency network" without any bugs (by scripting the most trivial steps, it could probably be "solved" in a day or two, even with your code base). It's just the functions you've already written, spread over more .c files. Nothing complicated in any way.
You also avoid depending on custom tools of your own, the extra processing by those tools during compilation, maintaining the tools, etc. Everything is simply "always ready" to be linked with only the minimally needed code, just by having more .c files.
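The linker behavior this approach relies on can be modelled in a few lines of C. This is a simplified, hypothetical model of how a linker treats a library, not sdld's actual code. A member is pulled in only if it defines a symbol that is still needed, and its own references then become needed too. The model assumes each symbol is defined by exactly one member. struct module and select_modules are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 8
#define MAX_NEEDED 64

/* Hypothetical model of one library member (.rel file). */
struct module {
    const char *name;
    const char *defines[MAX_SYMS];  /* global symbols it defines, NULL-terminated */
    const char *refs[MAX_SYMS];     /* external symbols it uses, NULL-terminated */
};

static bool defines_symbol(const struct module *m, const char *sym)
{
    for (size_t i = 0; i < MAX_SYMS && m->defines[i] != NULL; i++)
        if (strcmp(m->defines[i], sym) == 0)
            return true;
    return false;
}

/* Sets linked[m] for every module pulled in to resolve the symbols in
   roots (the references of the user's own object files), repeating until
   no new module is needed. Returns the number of linked modules. */
size_t select_modules(const struct module *lib, size_t nmod,
                      const char **roots, size_t nroots, bool *linked)
{
    const char *needed[MAX_NEEDED];
    size_t nneeded = 0, count = 0;
    bool changed = true;

    assert(nroots <= MAX_NEEDED);
    for (size_t i = 0; i < nroots; i++)
        needed[nneeded++] = roots[i];
    for (size_t m = 0; m < nmod; m++)
        linked[m] = false;

    while (changed) {
        changed = false;
        for (size_t m = 0; m < nmod; m++) {
            bool wanted = false;

            if (linked[m])
                continue;
            for (size_t i = 0; i < nneeded && !wanted; i++)
                wanted = defines_symbol(&lib[m], needed[i]);
            if (!wanted)
                continue;               /* never referenced: not linked */

            linked[m] = true;           /* the whole module comes in... */
            count++;
            changed = true;
            for (size_t r = 0; r < MAX_SYMS && lib[m].refs[r] != NULL; r++) {
                assert(nneeded < MAX_NEEDED);
                needed[nneeded++] = lib[m].refs[r]; /* ...with its own needs */
            }
        }
    }
    return count;
}
```

The unit of selection is the whole member, so two functions sharing one .rel are always linked together. That is why this approach depends on having one function per file.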
I know, but there's no way I'm splitting my library into several hundred files. ^^
It's not just a question of wasting time cutting everything up once.
I don't think you realize how much more complex and time-consuming it is to manage such a large number of files.
Right now I have about sixty source files, and when I need to make global changes, it already takes a lot of time.
If SDCC prefers to stay old school on that matter, that's fine with me, but for my part I'll find more modern solutions to make life easier for myself and my lib's users. :)
Last edit: Aoineko 2024-10-30
In my case, at least, I already worked on quite big projects a few decades ago (megabytes of code), but I understand not everybody is used to such workflows. The first thing I learned was never to look for anything by file name, but instead to just search through the files. That way I never needed to remember file names or where exactly something is, and my workflow stays the same regardless of where some code is (e.g. the output of grep -r -n asinf * lists the subfolders, file names and lines where asinf is mentioned, across all folders and subfolders). So "but you have to change many files to change that" was, for me, "OK, no problem with that". Back then I also used "patch" tools (with checksums, made by me, to be sure that only what should be patched gets patched) to organize the team's collaboration. Again, that meant the names of the files where some code lived didn't have to be fixed in any way: the unit of change was the patch, no matter how many files it touched.
On the other hand, I also understand the need for "dead code elimination" across the whole binary. The last time I looked, there was an open feature request for SDCC for that, and I can imagine it will eventually be implemented.
I'm just suggesting that, specifically for libraries, splitting into more .c files is, from my point of view, very reasonable.
I personally prefer to have multiple functions per file.
For my code, I've tended to have a single source file per hardware platform (C64, NES, MSX, etc.) and find this more convenient to maintain: I'm in the source file for the platform and can scroll up and down the file to make changes for that platform.
I'm usually at the limit of the 32K of RAM in all my games, so a few extra kilobytes make a difference, but I have really resisted moving to a single-file-per-function strategy.
An unused code elimination feature in SDCC would be great.
Would it help if SDCC were able to, e.g., output a list of the global symbols defined in a translation unit and a list of the global symbols referenced in a translation unit, and accept another list as input that lets you mark certain global symbols as globally unused?
That would allow you to initially compile all translation units once, then do your magic on the lists and finally compile everything once more.
The modifications to SDCC would remain relatively minor, because most of the relevant information is already there, internally.
P.S.: I have fixed the formatting for you.
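Here is a minimal sketch of the "magic" on such lists. It assumes a hypothetical format where the first run produces, per translation unit, a list of defined global symbols and a list of referenced global symbols. SDCC has no such option today; struct tu_symbols and globally_unused are illustrative names. A symbol is reported as unused if no translation unit references it and it is not an entry point such as _main.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16

/* Hypothetical per-translation-unit lists, as they might be emitted by
   the first compilation run. Both arrays are NULL-terminated. */
struct tu_symbols {
    const char *defined[MAX_SYMS];
    const char *referenced[MAX_SYMS];
};

static bool in_list(const char *const *list, const char *sym)
{
    for (size_t i = 0; i < MAX_SYMS && list[i] != NULL; i++)
        if (strcmp(list[i], sym) == 0)
            return true;
    return false;
}

/* Writes to unused[] every symbol that is defined somewhere but referenced
   by no translation unit, skipping entry points listed in roots.
   Returns the number of symbols written (at most cap). */
size_t globally_unused(const struct tu_symbols *tus, size_t ntu,
                       const char *const *roots, size_t nroots,
                       const char **unused, size_t cap)
{
    size_t count = 0;

    assert(unused != NULL || cap == 0);
    for (size_t t = 0; t < ntu; t++) {
        for (size_t d = 0; d < MAX_SYMS && tus[t].defined[d] != NULL; d++) {
            const char *sym = tus[t].defined[d];
            bool used = false;

            for (size_t r = 0; r < nroots && !used; r++)
                used = strcmp(roots[r], sym) == 0;      /* entry point */
            for (size_t u = 0; u < ntu && !used; u++)
                used = in_list(tus[u].referenced, sym); /* referenced anywhere */
            if (!used && count < cap)
                unused[count++] = sym;
        }
    }
    return count;
}
```

With per-translation-unit lists, a symbol used only by a function that is itself unused still looks referenced in the first round (e.g. a _helper called only from an unused _bar). Re-running the analysis after each recompilation until the list stops changing would catch such transitively unused symbols.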
As far as I understand, it's really about "where the function ends" in an .asm file, which Aoineko would like to have marked exactly, in a way convenient for him to split the assembly into smaller parts. But, IMO, his problem might not be solved even if these labels existed. Right now, one .c file is, per the C definition, a "single translation unit", and static variables are valid inside that translation unit. If he already has static variables, as soon as he split a .c file he'd have to work out "where these belong", and a static variable can be shared by more than one function. That's why I'm suggesting that, for a library, the cleanest approach is for the author to decide about all this at the C-file level by creating the files "the right way".
My idea was to have a second compilation run and to let the compiler know which symbols were globally unused across all compilation units in the first compilation run.
In the second compilation run, the compiler would then omit emitting code for them, eliminating any need to remove unused code from the output.
This approach could later be expanded into full-fledged whole-program optimization, with constant propagation in function calls between compilation units, a control-flow-based assessment of whether reentrant code is needed, etc.
My plan is to:
Last edit: Aoineko 2024-10-30
If you need to know "where _foo__END__ would appear", do you have an example where the rule "it ends at the first following non-temporary label that starts with _" fails? I think the only exception would be some label in assembly you've written inside a .c file, and I think you can change that.
I'm aware that some functions use constants. From the compiler's point of view, these constants aren't strictly part of the function code, since, AFAIK, they could be shared by more than one function. That means you'd have to consider and handle everything that, from the compiler's point of view, wouldn't be between _foo and _foo__END__ anyway. You'd probably also want to track the use of the constants and remove all those that aren't used by the functions actually called?
Edit: I think this discussion helps clarify strategies for implementing the feature in SDCC too, so thanks all.
Last edit: Janko Stamenović 2024-10-30
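Janko's rule can be written down directly. This is a minimal sketch that assumes SDCC's usual naming: C-level global symbols are prefixed with '_', and compiler-generated local labels look like 00101$. function_end is a hypothetical helper. It returns the index of the first later line that starts with such an underscore label.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if line begins with a non-temporary label whose name starts with
   '_' (e.g. "_foo::" or "_msg1:"). Local labels such as "00101$:" do
   not start with '_' and are therefore ignored. */
static bool is_underscore_label(const char *line)
{
    size_t len = strcspn(line, ": \t;");
    return line[0] == '_' && line[len] == ':';
}

/* Returns the index of the line that ends the function starting at
   lines[start], using the rule "the function ends at the next
   non-temporary label starting with '_'", or n if none follows. */
size_t function_end(const char **lines, size_t n, size_t start)
{
    assert(start < n);
    for (size_t i = start + 1; i < n; i++)
        if (is_underscore_label(lines[i]))
            return i;
    return n;
}
```

Under this rule, anything emitted between functions without such a label (directives, for instance) is counted as part of the preceding function. That is the weak spot raised later in the thread.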
Example of what I've described; this input:
generates
Note that _f1 ends as soon as _msg1 appears, that all three constants are grouped together before _f2, and that you can't tell there which constant is used by which function unless you track that usage yourself.
You already gave an example of assembler code added next to a function even though it isn't related to it (this is the case with all constant data). There are also some definitions that are sometimes added between functions, like:
In fact, adding a marker (assembly label) at the end of a function would have another benefit: in MSXgl's Build tool, I have an analysis tool that gives the user statistics on the size of the code in the various modules, the list of the largest functions, and so on. The size is sometimes incorrect precisely because of the static data stored between the functions.
The end-of-function marker would give the exact size of each function.
Last edit: Aoineko 2024-10-30
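With the proposed marker in place, the size statistics become a simple subtraction. This minimal sketch assumes the addresses of the start and end labels have already been read from the linker's map or symbol output. That parsing is not shown, and struct symbol and function_size are hypothetical names. Static data placed after the marker is excluded automatically.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct symbol {
    const char *name;
    unsigned addr;   /* address as read from a map/symbol file (parsing not shown) */
};

static long find_addr(const struct symbol *syms, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(syms[i].name, name) == 0)
            return (long)syms[i].addr;
    return -1;
}

/* Size in bytes of function `name`, computed as the distance between its
   start label and the proposed "<name>__END__" marker; -1 if either is
   missing. */
long function_size(const struct symbol *syms, size_t n, const char *name)
{
    char end_name[80];
    long start, end;
    int len;

    assert(name != NULL);
    len = snprintf(end_name, sizeof end_name, "%s__END__", name);
    if (len < 0 || (size_t)len >= sizeof end_name)
        return -1;
    start = find_addr(syms, n, name);
    end = find_addr(syms, n, end_name);
    if (start < 0 || end < start)
        return -1;
    return end - start;
}
```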
Still, my point is that you'd need "smart enough" parsing of the assembly anyway. It has to be so advanced that you don't actually need the "end markers" to know where a function is, since you'd have to track all the labels, where they appear and how they're used anyway. So the first non-function-local label already marks the end of the previous function.
Also note that the constants I've written there are actually global constants:
To sum up, it seems to me that to really solve the problem the way you're attempting to, you would "get" the "function ends" simply by implementing everything you have to implement anyway. The one exception is the separate goal of "knowing exactly how big the function is". Apart from that, once you have that "good", all-labels-tracking .asm processing, which finds the ends on its own, "end" labels aren't necessary.
Last edit: Janko Stamenović 2024-10-30
Sorry, but I really don't get you.
Having a function-end marker is a very simple and reliable way to know exactly where each function starts and ends. Nothing smart needed here.
Sure, I can try to figure out by myself where the function ends by interpreting every single line of the assembler code, but:
1) it's quite a difficult task, for example to know whether a given data definition is outside a function (a static C variable) or inside the function (defined in assembler within the function),
2) it's not reliable, since I can't be sure I track all possible end conditions (now and in the future).
Adding a function-end marker seems simple to me and opens up many possibilities for parsing assembly code.
That said, if the SDCC team doesn't want to add this feature, no problem, I'll manage to find another solution that satisfies my needs.
Last edit: Aoineko 2024-10-31
Placing an "end label" after each function should be easy for the compiler. The problem is including in the block the constant data used by the function itself. The critical case is a table/string/constant used by multiple functions.
The optimal solution would be to compute the coverage of each data structure and include it selectively with its functions, removing duplicates.
Actually, SDCC already has a coverage test, as it issues a warning for unused variables and data structures.
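The coverage idea can be sketched as follows. The sketch assumes a hypothetical table that records, for each function, whether it is kept and which data labels its code references. A shared table or string is emitted once if at least one kept function uses it, and dropped otherwise. This simplified version does not handle data referenced only from other data, such as a table of pointers to strings. struct func_info and data_needed are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_REFS 8

/* Hypothetical record of which data labels (tables, strings, constants)
   a function's code refers to. */
struct func_info {
    const char *name;
    bool kept;                      /* reachable from the program */
    const char *data[MAX_REFS];     /* NULL-terminated */
};

/* A data label must be kept if at least one kept function uses it, so
   shared data is emitted once, and only when one of its users survives. */
bool data_needed(const struct func_info *funcs, size_t n, const char *label)
{
    assert(label != NULL);
    for (size_t f = 0; f < n; f++) {
        if (!funcs[f].kept)
            continue;
        for (size_t d = 0; d < MAX_REFS && funcs[f].data[d] != NULL; d++)
            if (strcmp(funcs[f].data[d], label) == 0)
                return true;
    }
    return false;
}
```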
SDCC currently only warns for unused local variables, not for unused global ones; and those unused local ones can (and AFAIK are) omitted by the compiler anyway, so they won't end up in the assembler code.
Does sdas support asxxxx's conditional assembly directives, i.e. .if, .else and .endif?
If it does, having the compiler wrap functions in them and leaving the actual filtering to the assembler could be an elegant solution.
I don't think so. Omitting unused non-static functions and objects is fundamentally a link-time optimization: [feature-requests:#452]. [feature-requests:#413]. Doing so for static ones could be done either at compile time (which would require making SDCC a two-pass compiler) or at link time.
Related
Feature Requests: #413
Feature Requests: #452
Last edit: Philipp Klaus Krause 2024-11-07
You can close this request.
I'll put my hope in Feature Requests: #452.
Closed as requested. See [feature-requests:#452].