From: Josef W. <Jos...@gm...> - 2011-11-18 16:16:59
Hi,

I am currently playing with different strategies to make the cache
simulator faster (not the topic of this email). The ugly, huge macro
currently used in cachegrind makes that a little difficult.

The attached patch converts the simulation routine from the macro into
regular functions that the compiler can inline. Absolutely nothing else
is changed.

On my system (gcc 4.6.1, amd64), it actually gets a little bit faster
sometimes. I have to say that the results are a bit unstable between
runs. I would be interested to know whether this is similar on other
systems.

Before (Valgrind 3.7.0):

> perl perf/vg_perf --tools=cachegrind perf/
-- Running tests in perf ----------------------------------------------
bigcode1 valgrind :0.14s ca: 7.1s (50.5x, -----)
bigcode2 valgrind :0.13s ca:10.9s (84.2x, -----)
bz2      valgrind :0.67s ca:19.2s (28.7x, -----)
fbench   valgrind :0.29s ca: 5.5s (19.1x, -----)
ffbench  valgrind :0.26s ca: 6.2s (24.0x, -----)
heap     valgrind :0.09s ca: 5.8s (64.9x, -----)
sarp     valgrind :0.04s ca: 1.5s (37.5x, -----)
tinycc   valgrind :0.24s ca:13.5s (56.2x, -----)
-- Finished tests in perf ----------------------------------------------

With the attached patch applied:

> perl perf/vg_perf --tools=cachegrind perf/
-- Running tests in perf ----------------------------------------------
bigcode1 valgrind :0.15s ca: 6.9s (45.7x, -----)
bigcode2 valgrind :0.15s ca:10.9s (72.5x, -----)
bz2      valgrind :0.66s ca:19.9s (30.1x, -----)
fbench   valgrind :0.28s ca: 5.5s (19.5x, -----)
ffbench  valgrind :0.27s ca: 6.5s (24.2x, -----)
heap     valgrind :0.11s ca: 5.7s (52.1x, -----)
sarp     valgrind :0.04s ca: 1.4s (34.5x, -----)
tinycc   valgrind :0.22s ca:13.4s (60.7x, -----)
-- Finished tests in perf ----------------------------------------------

Josef
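
For readers who have not seen the attached patch, the kind of conversion described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself: the toy direct-mapped cache and every name in it (`toy_cache`, `TOY_CACHESIM`, `toy_cache_ref`, `styles_agree`) are hypothetical and do not come from cachegrind. The macro stamps out a per-level state variable and lookup function by token pasting; the function version carries the same logic in a `static inline` function that receives the cache state as a parameter.

```c
#include <stdint.h>

/* Toy direct-mapped cache model. All names are hypothetical and are
 * not taken from cachegrind. */
enum { N_SETS = 4, LINE_BITS = 6 };

typedef struct {
    uint64_t      tag[N_SETS];
    int           valid[N_SETS];
    unsigned long misses;
} toy_cache;

/* Macro style: generates one state variable and one lookup function
 * per cache level through token pasting. */
#define TOY_CACHESIM(L)                                            \
    static toy_cache L##_state;                                    \
    static int L##_macro_ref(uint64_t addr)                        \
    {                                                              \
        uint64_t block = addr >> LINE_BITS;                        \
        unsigned set = (unsigned)(block % N_SETS);                 \
        if (L##_state.valid[set] && L##_state.tag[set] == block)   \
            return 0; /* hit */                                    \
        L##_state.valid[set] = 1;                                  \
        L##_state.tag[set] = block;                                \
        L##_state.misses++;                                        \
        return 1; /* miss */                                       \
    }

TOY_CACHESIM(D1)

/* Function style: the same logic, with the cache state passed in
 * explicitly. The compiler is free to inline this at each call site. */
static inline int toy_cache_ref(toy_cache *c, uint64_t addr)
{
    uint64_t block = addr >> LINE_BITS;
    unsigned set = (unsigned)(block % N_SETS);
    if (c->valid[set] && c->tag[set] == block)
        return 0; /* hit */
    c->valid[set] = 1;
    c->tag[set] = block;
    c->misses++;
    return 1; /* miss */
}

/* Feeds the same address trace to both variants, starting from empty
 * caches, and returns 1 if every access and the miss totals agree. */
static int styles_agree(const uint64_t *trace, int n)
{
    toy_cache d1 = {0};
    D1_state = d1; /* reset the macro-generated state */
    for (int i = 0; i < n; i++) {
        if (D1_macro_ref(trace[i]) != toy_cache_ref(&d1, trace[i]))
            return 0;
    }
    return D1_state.misses == d1.misses;
}
```

Calling `styles_agree` on a short address trace from a `main` function is one way to check that such a conversion leaves the simulated behavior unchanged. The function form is also easier to step through in a debugger and to modify than a multi-line macro.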