From: Philippe W. <phi...@sk...> - 2017-11-11 22:27:41
|
This mail gives some measurements of the perf impact of building
valgrind with link time optimisation (lto). NB: some hacks,
documented below, were needed to build with -flto.
A summary of the perf impact:
* callgrind : all perf tests are faster (by roughly 4% to 13%).
* memcheck : many tests are faster, some are unchanged, one is slower
(memrw; when I retried it later, there was no degradation).
* helgrind : many tests are faster, a few (4 of 12) are slower.
The regression tests seem basically OK (about 30 failures, mostly
due to stack trace differences, as the tests were also
compiled with -flto).
This experiment was done on Debian 9/amd64 with gcc 6.3.0.
The build was done using:
export LD=/usr/bin/gold
./autogen.sh
export CFLAGS="-flto -fuse-linker-plugin"
CFLAGS="$CFLAGS" ./configure --enable-only64bit --prefix=`pwd`/Inst
nice make -j4 2>&1 | tee m.out
The make then failed with several errors, which were (hackily)
bypassed as follows:
* a compilation fails because generating libvex_guest_offsets.h
itself fails. IIUC, this file is generated by post-processing a .o
file, but with -flto, the .o file does not contain the relevant
information.
Bypassed by copying the .h file from a normal build.
* ar and ranlib commands fail, complaining about a missing
plugin.
Bypassed by manually editing coregrind/Makefile and VEX/Makefile,
replacing AR = /usr/bin/ar with AR = gcc-ar and RANLIB = ranlib
with RANLIB = gcc-ranlib.
* then linking the tools failed due to undefined symbols
VG_MINIMAL_SETJMP and VG_MINIMAL_LONGJMP.
Bypassed by copying libcoregrind_amd64_linux_a-m_libcsetjmp.o
from a normal build, and then re-running
make libcoregrind-amd64-linux.a
* the linker complained that it could not find _start, and set
a default (non-working) start address.
Bypassed by copying libcoregrind_amd64_linux_a-m_main.o
from a normal build, rebuilding the coregrind library again,
and relaunching make.
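A possibly less hacky alternative for the ar/ranlib problem (untested on the valgrind tree): GNU make gives a variable set on its command line precedence over the assignment in the Makefile, and forwards it to recursive sub-makes through MAKEFLAGS. So something like make AR=gcc-ar RANLIB=gcc-ranlib should avoid editing coregrind/Makefile and VEX/Makefile by hand. The sketch below only demonstrates that override rule on a hypothetical scratch Makefile, assuming GNU make is installed:

```shell
#!/bin/sh
# Hypothetical scratch Makefile mimicking the "AR = /usr/bin/ar" line
# that configure writes into the generated Makefiles.
dir=$(mktemp -d)
printf 'AR = /usr/bin/ar\nshow-ar:\n\t@echo $(AR)\n' > "$dir/Makefile"

# Without an override, the value from the Makefile is used.
default_ar=$(make -s --no-print-directory -C "$dir" show-ar)

# A command-line assignment takes precedence over the Makefile one.
# In the valgrind tree this would be: make AR=gcc-ar RANLIB=gcc-ranlib
override_ar=$(make -s --no-print-directory -C "$dir" show-ar AR=gcc-ar)

echo "default:  $default_ar"
echo "override: $override_ar"
rm -rf "$dir"
```

This leaves configure's output untouched, so a re-run of configure does not undo the workaround.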
I guess it should not be too difficult to fix the above properly
in the build system (e.g. by not using -flto for the 3 files
causing problems, and for the tests).
There are some drawbacks to using -flto: link time is significantly
longer, as code generation mostly happens at link time, and so
is repeated for each tool. The installed coregrind/VEX libs
also contain lto objects, which are not usable by users building
their own tools on top of the VEX lib.
So we certainly need an --enable-lto configure option (off by default?).
And even with lto enabled, it might be better to compile
the libraries and the tools twice, once without lto and once with
lto (and e.g. have a --lto=yes|no valgrind option to
choose which version of the tool to use).
Feedback ?
Philippe
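Perf results (trunk_untouched is the normal build, smallthing the -flto build;
no/me/he/ca = none/memcheck/helgrind/callgrind; a positive % means faster
than trunk_untouched):
perl perf/vg_perf --vg=../trunk_untouched --vg=../smallthing --tools=none,memcheck,helgrind,callgrind --reps=5 perf/ |& tee perf.out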
-- Running tests in perf ----------------------------------------------
-- bigcode1 --
bigcode1 trunk_untouched:0.07s no: 1.2s (17.4x, -----) me: 2.2s (32.1x, -----) he: 1.7s (23.7x, -----) ca: 9.1s (129.7x, -----)
bigcode1 smallthing:0.07s no: 1.2s (17.1x, 1.6%) me: 2.2s (32.0x, 0.4%) he: 1.6s (23.4x, 1.2%) ca: 8.3s (119.1x, 8.1%)
-- bigcode2 --
bigcode2 trunk_untouched:0.07s no: 2.5s (35.7x, -----) me: 5.1s (72.3x, -----) he: 3.2s (46.0x, -----) ca:18.8s (269.1x, -----)
bigcode2 smallthing:0.07s no: 2.5s (35.1x, 1.6%) me: 5.0s (71.4x, 1.2%) he: 3.2s (45.1x, 1.9%) ca:18.0s (257.7x, 4.2%)
-- bz2 --
bz2 trunk_untouched:0.43s no: 1.5s ( 3.5x, -----) me: 4.5s (10.5x, -----) he: 6.7s (15.6x, -----) ca:10.4s (24.2x, -----)
bz2 smallthing:0.43s no: 1.5s ( 3.5x, -0.7%) me: 4.4s (10.2x, 2.7%) he: 6.5s (15.2x, 2.2%) ca: 9.3s (21.7x, 10.1%)
-- fbench --
fbench trunk_untouched:0.14s no: 0.8s ( 5.9x, -----) me: 2.8s (19.8x, -----) he: 1.9s (13.4x, -----) ca: 4.0s (28.4x, -----)
fbench smallthing:0.14s no: 0.8s ( 5.9x, 0.0%) me: 2.8s (19.8x, 0.0%) he: 1.8s (12.8x, 4.8%) ca: 3.5s (25.2x, 11.3%)
-- ffbench --
ffbench trunk_untouched:0.15s no: 0.9s ( 5.9x, -----) me: 2.6s (17.2x, -----) he: 3.4s (22.8x, -----) ca: 1.5s (10.2x, -----)
ffbench smallthing:0.15s no: 0.9s ( 5.9x, 0.0%) me: 2.6s (17.2x, 0.0%) he: 3.3s (22.3x, 2.3%) ca: 1.4s ( 9.6x, 5.9%)
-- heap --
heap trunk_untouched:0.05s no: 0.6s (11.8x, -----) me: 3.7s (73.2x, -----) he: 5.0s (100.6x, -----) ca: 4.9s (98.0x, -----)
heap smallthing:0.05s no: 0.6s (11.8x, 0.0%) me: 3.5s (69.2x, 5.5%) he: 5.2s (104.0x, -3.4%) ca: 4.3s (86.4x, 11.8%)
-- heap_pdb4 --
heap_pdb4 trunk_untouched:0.06s no: 0.6s (10.5x, -----) me: 5.9s (98.2x, -----) he: 5.7s (94.3x, -----) ca: 5.2s (87.3x, -----)
heap_pdb4 smallthing:0.06s no: 0.6s (10.7x, -1.6%) me: 5.5s (91.5x, 6.8%) he: 5.8s (96.0x, -1.8%) ca: 4.7s (78.8x, 9.7%)
-- many-loss-records --
many-loss-records trunk_untouched:0.01s no: 0.2s (22.0x, -----) me: 1.0s (104.0x, -----) he: 0.8s (83.0x, -----) ca: 0.8s (77.0x, -----)
many-loss-records smallthing:0.01s no: 0.2s (21.0x, 4.5%) me: 0.9s (94.0x, 9.6%) he: 0.9s (89.0x, -7.2%) ca: 0.7s (70.0x, 9.1%)
-- many-xpts --
many-xpts trunk_untouched:0.02s no: 0.3s (13.5x, -----) me: 1.2s (58.0x, -----) he: 1.4s (69.5x, -----) ca: 1.9s (94.0x, -----)
many-xpts smallthing:0.02s no: 0.3s (13.0x, 3.7%) me: 1.1s (53.5x, 7.8%) he: 1.4s (71.0x, -2.2%) ca: 1.6s (82.0x, 12.8%)
-- memrw --
memrw trunk_untouched:0.04s no: 0.4s ( 9.2x, -----) me: 0.9s (21.5x, -----) he: 2.3s (58.2x, -----) ca: 1.9s (47.0x, -----)
memrw smallthing:0.04s no: 0.3s ( 8.8x, 5.4%) me: 0.9s (22.0x, -2.3%) he: 2.2s (55.5x, 4.7%) ca: 1.7s (41.5x, 11.7%)
-- sarp --
sarp trunk_untouched:0.02s no: 0.2s (12.0x, -----) me: 1.5s (77.0x, -----) he: 3.4s (169.0x, -----) ca: 1.3s (63.0x, -----)
sarp smallthing:0.02s no: 0.2s (12.0x, 0.0%) me: 1.5s (77.0x, 0.0%) he: 3.3s (166.0x, 1.8%) ca: 1.1s (56.0x, 11.1%)
-- tinycc --
tinycc trunk_untouched:0.10s no: 0.9s ( 9.2x, -----) me: 6.7s (66.8x, -----) he: 6.7s (66.6x, -----) ca: 7.3s (72.8x, -----)
tinycc smallthing:0.10s no: 0.9s ( 9.2x, 0.0%) me: 6.5s (65.5x, 1.9%) he: 6.5s (65.1x, 2.3%) ca: 6.6s (66.0x, 9.3%)
-- Finished tests in perf ----------------------------------------------
== 12 programs, 96 timings =================
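To check a summary like the one at the top of this mail against perf.out, the per-tool speedup range can be extracted with awk. A minimal sketch, assuming the vg_perf line format shown above and that the second --vg is named smallthing; the function name summarise is hypothetical:

```shell
#!/bin/sh
# Per-tool min/max speedup of the "smallthing" (lto) build, read from
# vg_perf output on stdin. Assumption: a positive percentage means faster
# than the first --vg (trunk_untouched), as in the results above.
summarise() {
  awk '
    / smallthing:/ {
      for (i = 1; i <= NF; i++) {
        # Tool fields look like "no:", "me:", "he:" or "ca:18.0s".
        if ($i ~ /^(no|me|he|ca):/) tool = substr($i, 1, 2)
        # Speedup fields look like "1.6%)" or "-3.4%)".
        if ($i ~ /%\)$/) {
          pct = $i; sub(/%\)$/, "", pct); pct += 0
          if (!(tool in lo) || pct < lo[tool]) lo[tool] = pct
          if (!(tool in hi) || pct > hi[tool]) hi[tool] = pct
        }
      }
    }
    END { for (t in lo) printf "%s min %.1f%% max %.1f%%\n", t, lo[t], hi[t] }
  ' | sort
}

# Three lines copied from the results above; normally: summarise < perf.out
result=$(summarise <<'EOF'
bigcode1 smallthing:0.07s no: 1.2s (17.1x, 1.6%) me: 2.2s (32.0x, 0.4%) he: 1.6s (23.4x, 1.2%) ca: 8.3s (119.1x, 8.1%)
heap smallthing:0.05s no: 0.6s (11.8x, 0.0%) me: 3.5s (69.2x, 5.5%) he: 5.2s (104.0x, -3.4%) ca: 4.3s (86.4x, 11.8%)
memrw smallthing:0.04s no: 0.3s ( 8.8x, 5.4%) me: 0.9s (22.0x, -2.3%) he: 2.2s (55.5x, 4.7%) ca: 1.7s (41.5x, 11.7%)
EOF
)
echo "$result"
```

Run on the full perf.out, the ca line gives the callgrind range quoted in the summary.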