I've had some success getting my STM32F0 project to compile using Link Time Optimisation with -flto.
My first attempts, passing the options "-flto -Ox" (where x was 1, 2 or 3) to the compile stage (the linker stage will work it out), appeared to compile and link correctly, but the result wouldn't actually run because the Reset and ISR handlers were being optimised away by the LTO process, as explained here: http://www.coocox.org/forum/topic.php?id=3002
To fix this, I added the attribute "used" to all the functions in cortexm/exception_handlers.c.
This now works when I compile with "-flto -O1" and "-flto -O2", and I see a great improvement in code size.
However it fails for "-flto -O3" with the error (shortened for post):
In function `Reset_Handler':
../system/src/cortexm/exception_handlers.c:36:(.after_vectors+0x10): relocation truncated to fit: R_ARM_THM_JUMP11 against symbol `_start' defined in .after_vectors section in /tmp/ccgJ4h7q.ltrans0.ltrans.o
I'm not sure why.
I should point out that although my project was originally based on a gnuarmeclipse STM32F031 template project it is using a custom Makefile, so may differ from a default setup.
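For anyone unfamiliar with the attribute, the change described above looks roughly like the sketch below. The handler bodies, the counters and the two accessor functions are placeholders made up for illustration, not the actual contents of cortexm/exception_handlers.c; the only point is where "used" goes.

```c
#include <stdint.h>

/* Placeholder state so the handlers have something observable to do. */
static volatile uint32_t systick_count;
static volatile uint32_t fault_count;

/* "used" asks GCC to emit the function even if it appears unreferenced.
   The handlers are reached through the vector table, never by a direct
   call from the program. */
void __attribute__((used)) SysTick_Handler(void)
{
    systick_count++;
}

/* A real fault handler would normally spin or reset; this placeholder
   only counts, so it can be exercised on a host. */
void __attribute__((used)) HardFault_Handler(void)
{
    fault_count++;
}

/* Hypothetical accessors, for illustration only. */
uint32_t systick_calls(void) { return systick_count; }
uint32_t fault_calls(void) { return fault_count; }
```

The attribute can sit before or after the return type; GCC accepts both placements on a function definition.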
I ran into the Reset_Handler "relocation truncated to fit: R_ARM_THM_JUMP11 against symbol `_start'" error again, even with -O1 and -O2.
There are two versions of Reset_Handler depending on whether you are compiling with DEBUG defined or not.
The DEBUG version is just a pure C call to _start().
The non-DEBUG (Release) version contains embedded assembly:
"b _start \n"
With -flto enabled at least, _start may be too far away for a direct branch (b) to reach it, depending on your program.
I think the pure C version allows the linker to compensate for this (a quick look at the disassembly shows Reset_Handler contains push, bl, and nop), which it cannot do with the assembly.
The easy fix is to ensure the pure c version is always used.
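To put a number on "too far away": R_ARM_THM_JUMP11 is the relocation for the 16-bit Thumb "b" instruction, which only carries a signed 11-bit halfword offset. The helper below is a sketch of that range check; the function name is made up for illustration, and the addresses are assumed to be plain byte addresses with the Thumb bit already cleared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a 16-bit Thumb "b" at branch_addr can reach target_addr.
   The offset is relative to the PC, which reads as the branch address + 4
   in Thumb state, and must be an even value in -2048..+2046 bytes. */
bool fits_thm_jump11(uint32_t branch_addr, uint32_t target_addr)
{
    int64_t offset = (int64_t)target_addr - ((int64_t)branch_addr + 4);

    return offset >= -2048 && offset <= 2046 && (offset % 2) == 0;
}
```

So once the linker places _start more than about 2 KB away from Reset_Handler, the plain "b _start" can no longer be encoded, which would match the error above.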
Before giving up the assembly jump optimisation, can you check what the code would look like when using a long jump in assembly, for example loading the pc via another register?
The C call is there simply to help the debugger identify the proper stack frame; otherwise it is just a waste of stack space.
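One way such a long jump could look is sketched below. This is an assumption about the general shape, not the code that ended up in the project: on the target, the address of _start is loaded into a register and branched to, so the distance no longer matters. The host branch and its names (host_start, reset_start_calls) exist only so the sketch compiles off-target.

```c
#if defined(__arm__)

/* Target build: load the address of _start from a literal pool and
   branch through the register instead of using a short "b". */
void __attribute__((section(".after_vectors"), naked, used))
Reset_Handler(void)
{
    __asm__ volatile(
        " ldr r0, =_start \n" /* address comes from a literal pool */
        " bx  r0          \n" /* register-indirect branch          */
    );
}

#else

/* Host build: a call through a function pointer stands in for the
   register-indirect branch. host_start is a hypothetical replacement
   for _start. */
static int start_calls;

static void host_start(void)
{
    start_calls++;
}

void Reset_Handler(void)
{
    void (*volatile entry)(void) = host_start;
    entry();
}

int reset_start_calls(void) { return start_calls; }

#endif
```

The cost compared with the short branch is one extra instruction plus a literal-pool word, but no stack frame is set up as in the C call.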
I'm using arm-none-eabi-gcc 4.8.4 20140526 from https://launchpad.net/gcc-arm-embedded.
Forgot to mention, I also changed the line:
__attribute__ ((section(".isr_vector")))
to:
__attribute__ ((used,section(".isr_vector")))
in cmsis/vectors_stm32f0xx.c
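In context, that attribute sits on the vector table definition. The sketch below uses a shortened placeholder table with made-up names (handler_t, isr_vectors, VECTORS_ATTR); the real table in cmsis/vectors_stm32f0xx.c starts with the initial stack pointer and has many more entries. The section name is only applied on an ARM build so the sketch also compiles on a host.

```c
typedef void (*handler_t)(void); /* hypothetical typedef name */

#if defined(__arm__)
#define VECTORS_ATTR __attribute__((used, section(".isr_vector")))
#else
/* Host build: keep "used" but skip the target-specific section name. */
#define VECTORS_ATTR __attribute__((used))
#endif

static int nmi_calls;

/* Placeholder handlers, for illustration only. */
void Reset_Handler(void) {}
void NMI_Handler(void) { nmi_calls++; }

int nmi_call_count(void) { return nmi_calls; }

/* Nothing in the program refers to the table by name; the hardware reads
   it by address. "used" asks the compiler to keep it anyway. */
VECTORS_ATTR
const handler_t isr_vectors[] = {
    Reset_Handler,
    NMI_Handler,
};
```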
Thank you for your suggestions, I'll consider them when testing the -flto option.
I just added -flto as an explicit configuration option in the common Optimizations section.
I also explicitly marked the vectors as "used" and I updated the branch in Reset_Handler to a long jump.
Could you test the beta version available from updates-test?
Great, I will test this as soon as I can and get back to you.
implemented since 2.4.1-201410142110.
Sorry for the delay in following up, just to confirm that I updated the system folder in my custom-makefile STM32F0 project with the latest release and LTO seems to be working fine with no errors.
Thanks for implementing!
Please be aware that LTO support is not yet fully functional, and you may still encounter cases where the linker fails.
also please see: https://bugs.launchpad.net/gcc-arm-embedded/+bug/1383856
Last edit: Liviu Ionescu (ilg) 2014-11-06