
#1855 Z80: Very long compilation time for certain files

Status: closed-rejected
Labels: z80 port (188)
Category: other
Priority: 5
Updated: 2015-02-18
Created: 2011-10-03
Creator: Anonymous
Private: No

I'm compiling the Petit FatFs source (attached) with the following command:

sdcc -mz80 --data-loc 0xC001 --no-std-crt0 --no-peep --code-loc 0x8000 pff.c

My SDCC version is:

mcs51/gbz80/z80/z180/ds390/pic16/pic14/TININative/ds400/hc08 3.0.5 #6899
(Oct 2 2011) (MINGW32)

The problem is that compiling this particular file takes several minutes on a 2 GHz Core2 T7200 (running 32-bit XP SP2), which seems absurd considering that it's only ~1000 lines of code plus a couple of hundred lines of header files. I have other source files of about the same size that compile in a few seconds with the same version of SDCC on the same computer.

Discussion


  • Anonymous
    2011-10-03

    demo code

     
    Last edit: Anonymous 2013-09-25
    Attachments
  • I already noticed that compiling FatFS seems to be a relatively hard task for the new register allocator, so it is not surprising that this carries over to Petit FatFS. I'll have a look at the sources and try to find out what's so special about the FatFS source.

    Philipp

     
  • The chk_mounted() function from ff.c is a monster. It has 770 bytes of local variables and rather complex control flow. This results in the register allocator taking a long time.
    As a workaround for this issue you can use --oldralloc.

    Philipp

     
  • Brian Ruthven
    2011-10-17

    I think I've noticed this too since the new register allocator was introduced. My project went from a compile time of 20-30 seconds up to about 5-6 minutes. I've also noticed that building sdcc from source stalls for a long time on printf_large.c. Here's my data for this latter point, using sdcc-src-20111015-6966:

    % pwd
    /build/sdcc/device/lib/z80
    % /usr/bin/time ../../../bin/sdcc -mz80 -I./../../include -I. --std-c99 -c --oldralloc ../printf_large.c

    real 0.9
    user 0.8
    sys 0.0

    Compiling the same file on the same system without --oldralloc gives a rather large 26 minutes:

    % /usr/bin/time ../../../bin/sdcc -mz80 -I./../../include -I. --std-c99 -c ../printf_large.c

    real 26:26.4
    user 26:22.1
    sys 0.3

    The effect is obviously magnified if building z80 + z180 + gbz80 as each builds a copy for their own device lib.

    I think the new register allocator does a good job (from what I've seen of it so far), but I wonder if the algorithm can be tightened up to improve the speed.

     
  • I currently do not see an easy way to fix this. However, there seems to be a problem with obtaining the tree decomposition of the control-flow graph here; it has a higher width than it should have, which can result in longer compilation time. Maybe Thorup's heuristic, which we currently use, is not optimal here (or there's a bug in the implementation). Alternatives will be investigated, but this may take time.

    Philipp
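To make the "width" mentioned above concrete, here is a minimal C++ sketch. It is not taken from sdcc; the names `Bag`, `decomposition_width`, and `states_per_bag` are hypothetical. It uses the standard definition of width (size of the largest bag minus one) and, as a purely illustrative assumption, models the work per bag as `options` raised to the bag size, which is typical of dynamic programming over tree decompositions but is not sdcc's actual cost model.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical representation: a bag is a set of control-flow-graph node ids.
// Only the bags matter for the width, so the tree edges are omitted here.
using Bag = std::set<int>;

// Width of a tree decomposition: size of the largest bag, minus one.
int decomposition_width(const std::vector<Bag>& bags) {
    std::size_t largest = 0;
    for (const Bag& bag : bags) {
        largest = std::max(largest, bag.size());
    }
    return static_cast<int>(largest) - 1;
}

// Illustrative assumption (not sdcc's cost model): if every node in a bag
// can independently take one of `options` states, a bag of `bag_size`
// nodes has options^bag_size combinations to consider.
unsigned long long states_per_bag(unsigned options, std::size_t bag_size) {
    unsigned long long states = 1;
    for (std::size_t i = 0; i < bag_size; ++i) {
        states *= options;
    }
    return states;
}
```

For the path graph 0-1-2-3, the bags {0,1}, {1,2}, {2,3} form a decomposition of width 1, while the single bag {0,1,2,3} is also valid but has width 3. Under the illustrative model with two options per node, that is 4 versus 16 combinations per bag, which hints at why a heuristic that returns a wider decomposition than necessary can cost compilation time.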

     
    • priority: 5 --> 3
     
  • Reducing priority a bit, since this is just an sdcc performance problem.

    Philipp

     
    • Category: --> Z80
     
    • Category: Z80 --> other
    • Priority: 3 --> 5
     
  • Bug #2296 is a duplicate of this one. Again, Thorup's heuristic gives a decomposition of a width much higher than we would want. The example in bug #2296 is interesting as it has a very regular structure.

    Since more ports than the Z80 use tree-decomposition-based algorithms now, I'm removing the category. I am also increasing the priority again, since this issue is important for the compilation time / code quality trade-off. AFAIR, printf_large() in the standard library is affected as well.

    Philipp

     
  • It seems that the problem is most pronounced on Windows: on my Core i7 with Debian GNU/Linux, pff.c takes about 4 seconds to compile.

    Philipp

     
    • status: open --> closed-rejected
    • assigned_to: Philipp Klaus Krause
     
  • The new methods to obtain tree-decompositions, while giving decompositions of lower width, so far unfortunately don't seem to have a big impact on sdcc runtime.

    The new register allocator makes a lot of calls to "new" (the C++ equivalent of malloc()). "new" is slow on Windows XP. I guess the easiest workaround is for sdcc users to use an OS with a faster "new", such as GNU/Linux or Windows 7. I will investigate the tree-decomposition aspect further, but I don't expect a big speed improvement on Windows XP anytime soon.

    Philipp

    P.S.: Another workaround for now is using --oldralloc, which will reduce the number of invocations of "new", but might affect code size.
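As a rough illustration of why the number of calls to "new" matters, the following C++ sketch builds the same data twice: once with one heap allocation per element, and once from a single pre-reserved buffer. This is not sdcc code; `Node`, `sum_with_individual_new`, and `sum_with_pool` are hypothetical names, and the sketch only shows the general idea of reducing allocator calls, not how the register allocator is actually structured.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical small object, standing in for whatever an allocator-heavy
// algorithm creates many times.
struct Node {
    long value;
};

// One call to "new" per element (plus one allocation for the pointer array).
long sum_with_individual_new(std::size_t n) {
    std::vector<std::unique_ptr<Node>> nodes;
    nodes.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        nodes.push_back(std::unique_ptr<Node>(new Node{static_cast<long>(i)}));
    }
    long sum = 0;
    for (const auto& node : nodes) {
        sum += node->value;
    }
    return sum;
}

// A single up-front allocation: all elements live in one contiguous buffer.
long sum_with_pool(std::size_t n) {
    std::vector<Node> pool;
    pool.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        pool.push_back(Node{static_cast<long>(i)});
    }
    long sum = 0;
    for (const Node& node : pool) {
        sum += node.value;
    }
    return sum;
}
```

Both functions compute the same result; the second simply asks the allocator for memory once instead of once per element. On a platform where each "new" is expensive, the difference in allocator calls is where any time saving would come from, though how much is saved depends on the OS and runtime library.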

     
  • With the recent fix for deeply nested branches in tree-decompositions, the new methods perform better. While Dhrystone and Whetstone seem unaffected, the compilation time for Coremark is reduced by 40-60% (depending on target) on GNU/Linux. I would expect similar improvements on other OSes.

    Philipp