I ran into a known bug that makes SDCC completely unusable for me, so I thought I'd provide details in case they help fix it eventually. It occurs in the latest release version and in snapshot builds.
The code was ~400 lines long when it started to take ~10 seconds to compile. At ~550 lines it takes about a minute. With --oldralloc it compiles fast, but one of the functions does not work properly (it works correctly when compiled without the option).
The code contains a long while(1) block with a conditional break (main.c:531). If I remove the condition, the compile time drops greatly. If I move the condition into the while(...) header, the problem comes back.
sdcc -mz80 --code-loc 0x0006 --data-loc 0 --no-std-crt0 -I..\evosdk ..\evosdk\crt0.rel ..\evosdk\evo.rel --opt-code-speed --fomit-frame-pointer main.c -o %temp%\out.ihx
I am providing all the files involved in the compile; hopefully I didn't miss anything.
As a side note, in the past I successfully used 2.9.0 for a larger (2300 lines) and more complex project, so perhaps the problem was introduced later. I also attempted to compile the current project with 3.0.0; it compiled fast, but the program didn't start at all. I haven't investigated why.
Project files
When compilation speed matters, an alternative to --oldralloc is using --max-allocs-per-node with an argument lower than the default of 3000.
Philipp
Forgot to mention this: I also attempted to use --max-allocs-per-node instead of --oldralloc with values 2000, 1000, 500, 100, and even 10. With 100 and 10 it takes ~10 seconds, which is still unacceptable for such a small project. With greater values it takes much longer; I didn't measure how long because it is impractical anyway.
I fixed two --oldralloc bugs in revision #7393. Can you test if this fixes --oldralloc for you?
Philipp
I can't test it with exactly the same code because the project is in active development (I moved back to 2.9.0 for now). However, in its current state the code works as it should when compiled with --oldralloc from revision #7393.
OK, I'll leave this open as a reminder that the compile time is too long even for low values of --max-allocs-per-node.
Philipp
So far progress on this issue looks good: An issue regarding maximal I-chains in Thorup's heuristic for obtaining tree-decompositions has been identified as the most likely cause. An evaluation of techniques from "Treewidth computations I" and the unpublished "Treewidth computations III" on control-flow graphs obtained from various benchmarks so far gave promising results regarding both the width of the resulting tree-decompositions and the runtime of the algorithms for obtaining tree-decompositions.
However, the impact on compiler runtime and code quality has not been evaluated yet (this requires integrating the new techniques into sdcc; so far the evaluation was only done on the dumped CFGs obtained using --dump-graphs).
Philipp
After years in the shadows, work on fixing this bug now continues visible to all in the nothorup branch:
https://svn.code.sf.net/p/sdcc/code/branches/nothorup/sdcc
Philipp
As of [r10185], the algorithm for computing tree-decompositions is chosen per function. This avoids some compilation time regressions from the previous change, and also further increases the compilation time reductions for some workloads.
However, there is further potential for improvement. Still, the current situation seems good enough to me to close the bug report for now. I have attached the change in compilation time for the Whetstone, Dhrystone, and Coremark benchmarks when using default optimization options. The effects on code size and benchmark score are rather small (below 3%) and look mostly like noise, though the general tendency is towards improvement.
At other optimization settings, the compile-time improvement might be smaller or larger, and there might be a bigger improvement in code size / speed.
Philipp