From: Julian S. <js...@ac...> - 2017-12-07 18:10:20
tl;dr: Unavoidable but small Memcheck slowdowns, to reduce false positives.

As compilers have become more aggressive over the past few years,
Memcheck's false-positive rate for undefined-value-error reports has been
creeping up. This has particularly become a problem with GCC 6 and LLVM 4.

There are various causes of these false errors. Many of them have to do
with the fact that Memcheck does not track definedness exactly through some
kinds of integer arithmetic operations. For example, for addition, it
assumes worst-case carry propagation, so it believes that any undefined
bits in either operand will make all higher-order bits in the result
undefined. There is similar shortcutting in many other places too.

These shortcuts all have the property that they are "safe", in the sense
that they underestimate the definedness of computations. This means they
won't lead to false negatives. But they are now causing false positives.

The trunk has for a while had --expensive-definedness-checks=no|yes [no],
which, when enabled, gives exactly correct behaviour for integer add,
subtract, and equality comparison. I propose to enable this by default.
As it stands, this gives a performance hit of up to 30%. At
https://bugs.kde.org/show_bug.cgi?id=387664 I have a patch which reduces
that performance hit substantially, by using a couple of optimisations
described in that bug.

It is a shame to lose performance, but I consider that maintaining a very
low false positive rate is of higher priority. I believe the actual perf
loss for run-of-the-mill integer code will be in the range 1.5% to 3%.

As a benchmark, I measured gcc-8 cc1 compiling a 6500 line program
(perf/bz2.c) (command line below). Results are

  --expensive-definedness-checks=
  setting                time(s)          code bytes produced    ratio to orig

  no   [old default]     73.2 (+ 0.0%)    60,529,262 (+ 0.0%)    16.7:1
  auto [new default]     74.6 (+ 1.9%)    62,544,067 (+ 3.3%)    17.3:1
  yes                    82.1 (+11.2%)    72,180,432 (+19.2%)    19.9:1

So for gcc-8 the perf hit is 1.9%.
I also measured with the perf/vg_perf framework. The numbers are below.
They are varied but on the whole quite a bit worse than 1.9%. I think this
is partly the cost of the new instrumentation-time analysis pass (see bug
387664), partly a result of different frequencies of integer add and
compare in the different benchmarks, and partly dependent on how effective
the above-mentioned analysis pass is at finding operations for which the
expensive and cheap instrumentation schemes give the same result.

I regret the loss of performance, but I cannot see any other feasible way
to proceed.

J


gcc test run command:

  ./vg-in-place --stats=yes \
    /home/sewardj/Tools/InstGcc8X/libexec/gcc/x86_64-pc-linux-gnu/8.0.0/cc1 \
    -quiet -I . perf/bz2.c -quiet -dumpbase bz2.c -mtune=generic \
    -march=x86-64 -auxbase-strip bz2.c -O -o bz2.s


perf test results (--reps=10, very quiet machine):

-- Running tests in ./perf --------------------------------------------
-- bigcode1 --
bigcode1 trunk             :0.07s  no: 1.3s (18.7x, -----)  me: 2.4s (34.9x, -----)
bigcode1 patched           :0.07s  no: 1.3s (18.7x,  0.0%)  me: 2.7s (38.6x,-10.7%)
-- bigcode2 --
bigcode2 trunk             :0.08s  no: 2.8s (34.4x, -----)  me: 5.7s (71.1x, -----)
bigcode2 patched           :0.08s  no: 2.8s (34.4x,  0.0%)  me: 6.6s (82.8x,-16.3%)
-- bz2 --
bz2 trunk                  :0.52s  no: 1.7s ( 3.2x, -----)  me: 4.8s ( 9.2x, -----)
bz2 patched                :0.52s  no: 1.7s ( 3.2x,  0.0%)  me: 5.2s (10.1x, -9.2%)
-- fbench --
fbench trunk               :0.16s  no: 0.9s ( 5.4x, -----)  me: 3.0s (18.6x, -----)
fbench patched             :0.16s  no: 0.9s ( 5.4x,  0.0%)  me: 3.0s (18.7x, -0.3%)
-- ffbench --
ffbench trunk              :0.17s  no: 0.9s ( 5.6x, -----)  me: 2.7s (16.1x, -----)
ffbench patched            :0.17s  no: 0.9s ( 5.6x,  0.0%)  me: 2.9s (16.8x, -4.0%)
-- heap --
heap trunk                 :0.06s  no: 0.7s (11.0x, -----)  me: 4.0s (66.5x, -----)
heap patched               :0.06s  no: 0.7s (10.8x,  1.5%)  me: 4.0s (67.0x, -0.8%)
-- heap_pdb4 --
heap_pdb4 trunk            :0.07s  no: 0.7s (10.3x, -----)  me: 6.3s (89.7x, -----)
heap_pdb4 patched          :0.07s  no: 0.7s (10.3x,  0.0%)  me: 6.3s (90.3x, -0.6%)
-- many-loss-records --
many-loss-records trunk    :0.01s  no: 0.2s (20.0x, -----)  me: 1.0s (97.0x, -----)
many-loss-records patched  :0.01s  no: 0.2s (20.0x,  0.0%)  me: 1.0s (98.0x, -1.0%)
-- many-xpts --
many-xpts trunk            :0.03s  no: 0.3s ( 8.7x, -----)  me: 1.2s (40.3x, -----)
many-xpts patched          :0.03s  no: 0.3s ( 8.7x,  0.0%)  me: 1.2s (41.3x, -2.5%)
-- memrw --
memrw trunk                :0.04s  no: 0.3s ( 8.8x, -----)  me: 0.8s (20.5x, -----)
memrw patched              :0.04s  no: 0.4s ( 9.0x, -2.9%)  me: 0.9s (23.5x,-14.6%)
-- sarp --
sarp trunk                 :0.02s  no: 0.2s (12.0x, -----)  me: 1.7s (83.5x, -----)
sarp patched               :0.02s  no: 0.2s (11.5x,  4.2%)  me: 1.8s (88.5x, -6.0%)
-- tinycc --
tinycc trunk               :0.12s  no: 1.0s ( 8.3x, -----)  me: 7.1s (59.4x, -----)
tinycc patched             :0.12s  no: 1.0s ( 8.3x,  0.0%)  me: 7.2s (60.4x, -1.7%)
-- Finished tests in ./perf --------------------------------------------
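

Illustration of cheap vs exact definedness tracking:

The difference between the "worst-case carry propagation" shortcut for
addition and an exact scheme, as described near the top of this message,
can be modelled in a few lines. What follows is only a sketch in Python,
not Memcheck's instrumentation code. It assumes 32-bit values and a shadow
word per value in which a 1 bit means "undefined". All function names are
made up for the illustration, and the exact schemes shown are one plausible
way to get precise results, not necessarily what the patch does.

```python
MASK = 0xFFFFFFFF  # model 32-bit integers (assumption for this sketch)


def cheap_add_vbits(va, vb):
    """Pessimistic scheme: the lowest undefined bit in either operand
    makes that bit and every higher bit of the sum undefined."""
    v = (va | vb) & MASK
    return (v | -v) & MASK  # smear the lowest set bit upward


def exact_add_vbits(a, va, b, vb):
    """Tighter scheme: compare the smallest and largest sums the operands
    could produce; bits that are the same in both, and defined in both
    inputs, are reported as defined."""
    a_min, a_max = a & ~va & MASK, (a | va) & MASK
    b_min, b_max = b & ~vb & MASK, (b | vb) & MASK
    varying = ((a_min + b_min) ^ (a_max + b_max)) & MASK
    return (va | vb | varying) & MASK


def cheap_eq_vbit(va, vb):
    """Pessimistic scheme: any undefined input bit makes a == b undefined."""
    return 1 if ((va | vb) & MASK) else 0


def exact_eq_vbit(a, va, b, vb):
    """Tighter scheme: if the operands differ in a bit that is defined in
    both, they are certainly unequal, so the result is defined."""
    undefined = (va | vb) & MASK
    if undefined == 0:
        return 0
    differ_in_defined_bits = (a ^ b) & ~undefined & MASK
    return 0 if differ_in_defined_bits else 1


if __name__ == "__main__":
    # Only bit 0 of a is undefined; b is fully defined.
    a, va, b, vb = 0x00, 0x1, 0x10, 0x0
    print(hex(cheap_add_vbits(va, vb)))        # all 32 bits flagged
    print(hex(exact_add_vbits(a, va, b, vb)))  # only bit 0 flagged
    print(cheap_eq_vbit(va, vb), exact_eq_vbit(a, va, b, vb))
```

In the example at the bottom, the cheap addition scheme flags the whole
result as undefined although no carry out of bit 0 is possible, and the
cheap comparison flags a == b as undefined although bit 4 already proves
the operands differ. These are the kinds of over-approximation that turn
into false positives.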