From: Philippe W. <phi...@sk...> - 2014-07-23 18:19:18
Find below some comparison numbers for a recent svn valgrind compiled on x86 (an old Pentium).

In general, gcc 4.9.1 produces faster code, but not always (e.g. memcheck bz2 is significantly faster, but helgrind bz2 slows down significantly).

callgrind benefits consistently from lto. This means callgrind can probably be optimised by having more inlining done between (some) source files.

It might be that the numbers below are heavily influenced by the small Pentium cache, and the figures might differ on a more modern CPU.

Link time optimisation was relatively easy to set up: only a few hacks were needed. I have attached the (hacky) patch to have valgrind compiled with lto. (It is of course not usable for commit as is, e.g. it is missing configure tests and similar.)
I could not use -fno-fat-lto-objects: I had problems at link time, so I am not completely sure that lto was used to its full power.

Philippe

-- Running tests in perf ----------------------------------------------
-- bigcode1 --
bigcode1       trunk_untouched:0.18s  no: 3.7s ( 20.3x, -----)  me: 6.7s ( 37.1x, -----)  ca:35.7s (198.4x, -----)  he: 4.7s ( 25.8x, -----)
bigcode1    491trunk_untouched:0.18s  no: 3.8s ( 20.9x, -3.0%)  me: 6.6s ( 36.7x,  0.9%)  ca:34.1s (189.2x,  4.6%)  he: 4.7s ( 25.9x, -0.4%)
bigcode1 lto491trunk_untouched:0.18s  no: 3.6s ( 20.0x,  1.6%)  me: 6.4s ( 35.7x,  3.7%)  ca:32.7s (181.6x,  8.5%)  he: 4.4s ( 24.7x,  4.5%)
-- bigcode2 --
bigcode2       trunk_untouched:0.18s  no: 8.7s ( 48.3x, -----)  me:18.0s ( 99.9x, -----)  ca:58.5s (325.2x, -----)  he:12.4s ( 68.7x, -----)
bigcode2    491trunk_untouched:0.18s  no: 8.6s ( 48.0x,  0.7%)  me:17.1s ( 95.3x,  4.6%)  ca:56.7s (315.2x,  3.1%)  he:11.8s ( 65.4x,  4.8%)
bigcode2 lto491trunk_untouched:0.18s  no: 8.2s ( 45.4x,  6.1%)  me:16.7s ( 92.8x,  7.1%)  ca:54.2s (301.1x,  7.4%)  he:11.3s ( 62.8x,  8.6%)
-- bz2 --
bz2       trunk_untouched:1.19s  no: 5.1s (  4.3x, -----)  me:20.1s ( 16.9x, -----)  ca:48.6s ( 40.8x, -----)  he:35.7s ( 30.0x, -----)
bz2    491trunk_untouched:1.19s  no: 4.8s (  4.0x,  5.9%)  me:18.4s ( 15.4x,  8.7%)  ca:46.3s ( 38.9x,  4.7%)  he:42.5s ( 35.7x,-18.9%)
bz2 lto491trunk_untouched:1.19s  no: 4.8s (  4.0x,  7.0%)  me:17.3s ( 14.6x, 13.7%)  ca:43.3s ( 36.4x, 10.7%)  he:40.2s ( 33.8x,-12.8%)
-- fbench --
fbench       trunk_untouched:0.64s  no: 5.6s (  8.8x, -----)  me:13.1s ( 20.5x, -----)  ca:19.6s ( 30.7x, -----)  he:14.2s ( 22.2x, -----)
fbench    491trunk_untouched:0.64s  no: 5.7s (  9.0x, -1.8%)  me:12.7s ( 19.8x,  3.6%)  ca:19.4s ( 30.4x,  1.0%)  he:15.3s ( 24.0x, -7.9%)
fbench lto491trunk_untouched:0.64s  no: 5.6s (  8.8x,  0.4%)  me:13.2s ( 20.6x, -0.5%)  ca:18.6s ( 29.1x,  5.3%)  he:15.0s ( 23.4x, -5.3%)
-- ffbench --
ffbench       trunk_untouched:2.06s  no: 5.5s (  2.7x, -----)  me: 9.4s (  4.6x, -----)  ca: 8.3s (  4.1x, -----)  he:30.8s ( 14.9x, -----)
ffbench    491trunk_untouched:2.06s  no: 5.7s (  2.8x, -4.0%)  me: 9.8s (  4.8x, -4.7%)  ca: 8.6s (  4.2x, -3.4%)  he:29.9s ( 14.5x,  2.6%)
ffbench lto491trunk_untouched:2.06s  no: 5.6s (  2.7x, -2.6%)  me: 9.7s (  4.7x, -3.1%)  ca: 8.2s (  4.0x,  1.8%)  he:27.6s ( 13.4x, 10.3%)
-- heap --
heap       trunk_untouched:0.21s  no: 2.2s ( 10.5x, -----)  me:11.9s ( 56.6x, -----)  ca:22.8s (108.6x, -----)  he:24.9s (118.6x, -----)
heap    491trunk_untouched:0.21s  no: 2.6s ( 12.6x,-19.5%)  me:11.0s ( 52.3x,  7.7%)  ca:23.1s (110.1x, -1.4%)  he:24.4s (116.0x,  2.2%)
heap lto491trunk_untouched:0.21s  no: 2.4s ( 11.6x,-10.4%)  me:11.1s ( 53.0x,  6.4%)  ca:21.6s (102.8x,  5.3%)  he:23.7s (112.9x,  4.9%)
-- heap_pdb4 --
heap_pdb4       trunk_untouched:0.25s  no: 2.4s (  9.4x, -----)  me:23.1s ( 92.2x, -----)  ca:24.6s ( 98.6x, -----)  he:27.9s (111.6x, -----)
heap_pdb4    491trunk_untouched:0.25s  no: 2.8s ( 11.2x,-19.1%)  me:21.7s ( 86.7x,  6.0%)  ca:25.2s (100.8x, -2.2%)  he:26.9s (107.7x,  3.5%)
heap_pdb4 lto491trunk_untouched:0.25s  no: 2.6s ( 10.3x, -9.4%)  me:21.4s ( 85.5x,  7.3%)  ca:22.8s ( 91.1x,  7.6%)  he:29.8s (119.2x, -6.7%)
-- many-loss-records --
many-loss-records       trunk_untouched:0.02s  no: 0.7s ( 34.5x, -----)  me: 3.0s (152.5x, -----)  ca: 3.3s (163.5x, -----)  he: 3.7s (184.0x, -----)
many-loss-records    491trunk_untouched:0.02s  no: 0.9s ( 45.0x,-30.4%)  me: 3.1s (154.0x, -1.0%)  ca: 3.4s (171.0x, -4.6%)  he: 4.0s (199.5x, -8.4%)
many-loss-records lto491trunk_untouched:0.02s  no: 0.8s ( 39.0x,-13.0%)  me: 2.9s (145.5x,  4.6%)  ca: 3.1s (153.0x,  6.4%)  he: 3.6s (181.0x,  1.6%)
-- many-xpts --
many-xpts       trunk_untouched:0.09s  no: 1.0s ( 11.1x, -----)  me: 4.7s ( 52.1x, -----)  ca: 8.9s ( 98.6x, -----)  he: 9.9s (110.4x, -----)
many-xpts    491trunk_untouched:0.09s  no: 1.2s ( 13.4x,-21.0%)  me: 4.5s ( 49.6x,  4.9%)  ca: 9.0s ( 99.8x, -1.2%)  he:10.4s (115.2x, -4.3%)
many-xpts lto491trunk_untouched:0.09s  no: 1.1s ( 12.0x, -8.0%)  me: 5.0s ( 55.9x, -7.2%)  ca: 8.2s ( 91.4x,  7.2%)  he: 9.9s (109.8x,  0.6%)
-- sarp --
sarp       trunk_untouched:0.06s  no: 1.0s ( 16.0x, -----)  me: 5.7s ( 94.7x, -----)  ca: 5.4s ( 89.5x, -----)  he:27.3s (455.0x, -----)
sarp    491trunk_untouched:0.06s  no: 1.0s ( 17.3x, -8.3%)  me: 5.5s ( 92.3x,  2.5%)  ca: 5.6s ( 92.8x, -3.7%)  he:27.9s (465.3x, -2.3%)
sarp lto491trunk_untouched:0.06s  no: 1.0s ( 16.3x, -2.1%)  me: 5.3s ( 87.7x,  7.4%)  ca: 5.1s ( 85.2x,  4.8%)  he:27.3s (455.5x, -0.1%)
-- tinycc --
tinycc       trunk_untouched:0.40s  no: 3.5s (  8.6x, -----)  me:24.4s ( 60.9x, -----)  ca:32.4s ( 81.0x, -----)  he:40.5s (101.1x, -----)
tinycc    491trunk_untouched:0.40s  no: 3.6s (  9.1x, -5.5%)  me:23.3s ( 58.2x,  4.5%)  ca:32.9s ( 82.2x, -1.5%)  he:44.2s (110.4x, -9.2%)
tinycc lto491trunk_untouched:0.40s  no: 3.5s (  8.8x, -1.4%)  me:24.8s ( 62.1x, -1.8%)  ca:30.9s ( 77.2x,  4.7%)  he:43.0s (107.4x, -6.2%)
-- Finished tests in perf ----------------------------------------------

== 11 programs, 132 timings =================
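
For anyone who wants to re-derive the percentages, here is a minimal sketch of how the figures in brackets seem to relate to each other. It is not taken from the perf script. It assumes that no/me/ca/he stand for the none, memcheck, callgrind and helgrind tools, that the "x" figure is the slowdown relative to the native run, and that the percentage is the speed-up relative to the first (trunk_untouched) row, with positive meaning faster. The names parse_row and speedup_percent are made up for this illustration. Because the printed values are rounded, the recomputed percentages can differ from the printed ones in the last digit.

```python
import re

# One tool field, e.g. "me:18.4s (15.4x, 8.7%)" or "he: 4.7s (25.8x, -----)".
FIELD = re.compile(
    r"\b(no|me|ca|he):\s*([\d.]+)s\s*\(\s*([\d.]+)x,\s*(?:-----|-?[\d.]+%)\)"
)

def parse_row(row):
    """Return {tool: (seconds, slowdown_vs_native)} for one output row."""
    return {tool: (float(secs), float(ratio))
            for tool, secs, ratio in FIELD.findall(row)}

def speedup_percent(baseline, candidate):
    """Assumed formula: percent speed-up over the baseline, positive = faster."""
    return 100.0 * (baseline - candidate) / baseline

# Two bz2 rows copied from the output above.
base = parse_row(
    "bz2 trunk_untouched:1.19s no: 5.1s ( 4.3x, -----) me:20.1s (16.9x, -----) "
    "ca:48.6s (40.8x, -----) he:35.7s (30.0x, -----)"
)
lto = parse_row(
    "bz2 lto491trunk_untouched:1.19s no: 4.8s ( 4.0x, 7.0%) me:17.3s (14.6x, 13.7%) "
    "ca:43.3s (36.4x, 10.7%) he:40.2s (33.8x,-12.8%)"
)

for tool in ("no", "me", "ca", "he"):
    # Compare the slowdown ratios, which agree best with the printed percentages.
    print(tool, round(speedup_percent(base[tool][1], lto[tool][1]), 1))
```

With these two rows, memcheck comes out as a speed-up and helgrind as a slowdown, matching the signs in the output above.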