
#1855 Z80: Very long compilation time for certain files


I'm compiling the Petit FatFs source (attached) with the following command:

sdcc -mz80 --data-loc 0xC001 --no-std-crt0 --no-peep --code-loc 0x8000 pff.c

My SDCC version is:

mcs51/gbz80/z80/z180/ds390/pic16/pic14/TININative/ds400/hc08 3.0.5 #6899
(Oct 2 2011) (MINGW32)

The problem is that compiling this particular file takes several minutes on a 2 GHz Core2 T7200 (running 32-bit XP SP2), which seems absurd considering that it's only ~1000 lines of code plus a couple of hundred lines of header files. I have other source files of about the same size that compile in a few seconds with the same version of SDCC on the same computer.


  • Philipp Klaus Krause

    I already noticed that compiling FatFS seems to be a relatively hard task for the new register allocator, so it is not surprising that this carries over to Petit FatFS. I'll have a look at the sources and try to find out what's so special about the FatFS source.


  • Philipp Klaus Krause

    The chk_mounted() function from ff.c is a monster. It has 770 bytes of local variables and rather complex control flow. This results in the register allocator taking a long time.
    As a workaround for this issue you can use --oldralloc.


  • Brian Ruthven

    Brian Ruthven - 2011-10-17

    I think I've noticed this since the new register allocator too. My project went from a compile time of 20-30 seconds up to about 5-6 minutes. I've also noticed that building sdcc from source stalls for a long time on printf_large.c. Here's my data for this latter point, using sdcc-src-20111015-6966:

    % pwd
    % /usr/bin/time ../../../bin/sdcc -mz80 -I./../../include -I. --std-c99 -c --oldralloc ../printf_large.c

    real 0.9
    user 0.8
    sys 0.0

    Compiling the same file on the same system without --oldralloc gives a rather large 26 minutes:

    % /usr/bin/time ../../../bin/sdcc -mz80 -I./../../include -I. --std-c99 -c ../printf_large.c

    real 26:26.4
    user 26:22.1
    sys 0.3

    The effect is obviously magnified if building z80 + z180 + gbz80 as each builds a copy for their own device lib.

    I think the new register allocator does a good job (from what I've seen of it so far), but I wonder if the algorithm can be tightened up to improve the speed.
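To repeat the comparison above on another file, the two timings can be collected with a small script. This is only a sketch: the helper names (`build_sdcc_argv`, `time_compile`, `compare_allocators`) are made up for illustration, it assumes an `sdcc` binary on the PATH, and it only uses the `-mz80`, `-c` and `--oldralloc` options already shown in this thread.

```python
import shutil
import subprocess
import time


def build_sdcc_argv(source, use_old_allocator, sdcc="sdcc"):
    """Return a compile-only Z80 command line (hypothetical helper)."""
    argv = [sdcc, "-mz80", "-c"]
    if use_old_allocator:
        # Workaround mentioned in this thread: fall back to the old allocator.
        argv.append("--oldralloc")
    argv.append(source)
    return argv


def time_compile(argv):
    """Run one compilation and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def compare_allocators(source, sdcc="sdcc"):
    """Time the same source with the new and the old register allocator."""
    if shutil.which(sdcc) is None:
        raise FileNotFoundError(f"{sdcc} not found on PATH")
    return {
        "new allocator": time_compile(build_sdcc_argv(source, False, sdcc)),
        "--oldralloc": time_compile(build_sdcc_argv(source, True, sdcc)),
    }
```

Calling `compare_allocators("printf_large.c")` from the directory holding the source returns both wall-clock times in a dictionary. Include paths and other options from the original command lines would have to be added to the argument list by hand.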

  • Philipp Klaus Krause

    I currently do not see an easy way to fix this. However, there seems to be a problem with obtaining the tree decomposition of the control-flow graph here: it has a higher width than it should have, which can result in longer compilation time. Maybe Thorup's heuristic, which we currently use, is not optimal here (or there's a bug in the implementation). Alternatives will be investigated, but this may take time.
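For readers unfamiliar with the term, the width of a decomposition can be illustrated on small graphs. The sketch below is not Thorup's heuristic and not what sdcc implements; it is the generic greedy minimum-degree elimination heuristic, which yields an upper bound on the treewidth of an undirected graph. The function names and the example graphs are made up for illustration. Algorithms that work over a tree decomposition generally do more work per node as the width grows, which is why a poor decomposition can hurt compilation time.

```python
def graph_from_edges(edges):
    """Build a symmetric adjacency map from a list of undirected edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def elimination_width(adjacency):
    """Upper bound on treewidth via greedy minimum-degree elimination.

    Repeatedly removes a vertex of smallest degree and connects its
    remaining neighbours to each other. The largest neighbourhood seen
    at removal time is the width of the resulting decomposition.
    """
    graph = {v: set(ns) for v, ns in adjacency.items()}
    width = 0
    while graph:
        # Pick a minimum-degree vertex; ties are broken by vertex label.
        v = min(graph, key=lambda u: (len(graph[u]), u))
        neighbours = graph.pop(v)
        width = max(width, len(neighbours))
        for a in neighbours:
            graph[a].discard(v)
            # Turn the neighbourhood of v into a clique.
            graph[a] |= neighbours - {a}
    return width


# Straight-line code resembles a path; a loop adds a back edge (a cycle).
path = graph_from_edges([(0, 1), (1, 2), (2, 3)])
loop = graph_from_edges([(0, 1), (1, 2), (2, 3), (3, 0)])
print(elimination_width(path), elimination_width(loop))
```

A heuristic that returns a width larger than necessary for a given control-flow graph corresponds to the situation described above.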


  • Philipp Klaus Krause

    • priority: 5 --> 3
  • Philipp Klaus Krause

    Reducing priority a bit, since this is just an sdcc performance problem.


  • Philipp Klaus Krause

    • Category: --> Z80
  • Philipp Klaus Krause

    • Category: Z80 --> other
    • Priority: 3 --> 5
  • Philipp Klaus Krause

    Bug #2296 is a duplicate of this one. Again, Thorup's heuristic gives a decomposition of a width much higher than we would want. The example in bug #2296 is interesting as it has a very regular structure.

    Since more ports than the Z80 use tree-decomposition-based algorithms now, I'm removing the category. I am also increasing the priority again, since this issue is important for the compilation time / code quality trade-off. AFAIR, printf_large() in the standard library is affected as well.


  • Philipp Klaus Krause

    It seems that the problem is most pronounced on Windows: on my Core i7 running Debian GNU/Linux, pff.c takes about 4 seconds to compile.


  • Philipp Klaus Krause

    • status: open --> closed-rejected
    • assigned_to: Philipp Klaus Krause
  • Philipp Klaus Krause

    The new methods to obtain tree-decompositions, while giving decompositions of lower width, so far unfortunately don't seem to have a big impact on sdcc runtime.

    The new register allocator makes a lot of calls to "new" (the C++ equivalent of malloc()). "new" is slow on Windows XP. I guess the easiest solution is for sdcc users to use an OS with a faster "new", such as GNU/Linux or Windows 7. I will investigate the tree-decomposition aspect further, but I don't expect a big speed improvement on Windows XP anytime soon.


    P.S.: Another workaround for now is using --oldralloc, which will reduce the number of invocations of "new", but might affect code size.

  • Philipp Klaus Krause

    With the recent fix for deeply nested branches in tree-decompositions, the new methods perform better. While Dhrystone and Whetstone seem unaffected, the compilation time for Coremark is reduced by 40-60% (depending on target) on GNU/Linux. I would expect similar improvements on other OSes.


