|
From: Nicholas N. <n.n...@gm...> - 2023-04-03 09:29:43
|
Hi,

Cachegrind has an option `--cache-sim`. If you run with `--cache-sim=yes` (the default) it tells Cachegrind to do a full cache simulation with lots of events: Ir, I1mr, ILmr, Dr, D1mr, DLmr, Dw, D1mw, DLmw. If you run with `--cache-sim=no` then the cache simulation is disabled and you just get one event: Ir. (This is "instruction cache reads", which is equivalent to "instructions executed".)

I have been using `--cache-sim=no` almost exclusively for a long time. The cache simulation done by Valgrind is an approximation of the memory hierarchy of a 2002 AMD Athlon processor. Its accuracy for a modern memory hierarchy with three levels of cache, prefetching, non-LRU replacement, and who-knows-what-else is likely to be low. If you want to accurately know about cache behaviour you'd be much better off using hardware counters via `perf` or some other profiler.

But `--cache-sim=no` is still very useful because instruction execution counts are still very useful.

Therefore, I propose changing the default to `--cache-sim=no`. Does anyone have any objections to this?

Thanks.

Nick |
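For readers who want to try both modes, here is a minimal Python sketch that builds the corresponding `valgrind` command lines. The helper names `cachegrind_argv` and `run_cachegrind` are hypothetical and not part of Valgrind; the sketch assumes `valgrind` is on `PATH` and relies only on the `--tool=cachegrind`, `--cache-sim` and `--cachegrind-out-file` options.

```python
import shutil
import subprocess


def cachegrind_argv(program_argv, cache_sim=False, out_file=None):
    """Build a valgrind command line for Cachegrind (hypothetical helper).

    cache_sim=False gives only the Ir event; cache_sim=True enables the
    full cache simulation described in the message above.
    """
    argv = [
        "valgrind",
        "--tool=cachegrind",
        "--cache-sim=" + ("yes" if cache_sim else "no"),
    ]
    if out_file is not None:
        argv.append("--cachegrind-out-file=" + out_file)
    return argv + list(program_argv)


def run_cachegrind(program_argv, **kwargs):
    """Run a program under Cachegrind; needs valgrind installed."""
    if shutil.which("valgrind") is None:
        raise RuntimeError("valgrind not found on PATH")
    return subprocess.run(cachegrind_argv(program_argv, **kwargs), check=True)
```

Passing the flag explicitly, as this sketch always does, keeps a script's behaviour the same whichever default the installed Valgrind version uses.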
|
From: David F. <fa...@kd...> - 2023-04-03 11:56:17
|
[removing valgrind-developers, since I guess I can't post there]

On Monday, 3 April 2023 11:29:25 CEST Nicholas Nethercote wrote:
> I have been using `--cache-sim=no` almost exclusively for a long time. The
> cache simulation done by Valgrind is an approximation of the memory
> hierarchy of a 2002 AMD Athlon processor. Its accuracy for a modern memory
> hierarchy with three levels of cache, prefetching, non-LRU replacement, and
> who-knows-what-else is likely to be low. If you want to accurately know
> about cache behaviour you'd be much better off using hardware counters via
> `perf` or some other profiler.
>
> But `--cache-sim=no` is still very useful because instruction execution
> counts are still very useful.
>
> Therefore, I propose changing the default to `--cache-sim=no`. Does anyone
> have any objections to this?

I agree that simulating a cache from 2002 isn't very useful.

But then, what's the difference between `cachegrind --cache-sim=no` and `callgrind`?

https://accu.org/journals/overload/20/111/floyd_1886/ says "The main differences are that Callgrind has more information about the callstack whilst cachegrind gives more information about cache hit rates."

Wouldn't one want call stacks? (If this means stack traces.) I know I must be missing something, thanks for enlightening me.

--
David Faure, fa...@kd..., http://www.davidfaure.fr
Working on KDE Frameworks 5 |
|
From: Nicholas N. <n.n...@gm...> - 2023-04-03 21:47:06
|
On Mon, 3 Apr 2023 at 21:36, David Faure <fa...@kd...> wrote:
>
> But then, what's the difference between `cachegrind --cache-sim=no`
> and `callgrind`?
>
> https://accu.org/journals/overload/20/111/floyd_1886/ says
> "The main differences are that Callgrind has more information about the
> callstack whilst cachegrind gives more information about cache hit rates."
>
> Wouldn't one want callstacks? (if this means stack traces).
> I know I must be missing something, thanks for enlightening me.
>

Callgrind is a forked and extended version of Cachegrind. It also simulates a cache, with a slightly different simulation to Cachegrind's. The fact that both tools exist is due to historical reasons; if starting from scratch today you wouldn't deliberately split them.

Call stacks are often useful (I regularly use Callgrind as well as Cachegrind) but they aren't always necessary. Without them, Cachegrind runs faster than Callgrind and produces smaller data files. Cachegrind also supports diffing and merging different files, while Callgrind does not.

Nick |
|
From: David F. <fa...@kd...> - 2023-04-04 09:25:06
|
On Monday, 3 April 2023 23:46:46 CEST Nicholas Nethercote wrote:
> On Mon, 3 Apr 2023 at 21:36, David Faure <fa...@kd...> wrote:
> > But then, what's the difference between `cachegrind --cache-sim=no`
> > and `callgrind`?
> >
> > https://accu.org/journals/overload/20/111/floyd_1886/ says
> > "The main differences are that Callgrind has more information about the
> > callstack whilst cachegrind gives more information about cache hit rates."
> >
> > Wouldn't one want callstacks? (if this means stack traces).
> > I know I must be missing something, thanks for enlightening me.
>
> Callgrind is a forked and extended version of Cachegrind. It also simulates
> a cache, with a slightly different simulation to Cachegrind's. The fact
> that both tools exist is due to historical reasons; if starting from
> scratch today you wouldn't deliberately split them.

Thanks for the information. This is indeed confusing, like anything that is "due to historical reasons" ;-)

> Call stacks are often useful (I regularly use Callgrind as well as
> Cachegrind) but they aren't always necessary. Without them, Cachegrind runs
> faster than Callgrind and produces smaller data files. Cachegrind also
> supports diffing and merging different files, while Callgrind does not.

OK. I thought call stacks were mandatory for any tool to be useful (they certainly are for KCachegrind (*)), but I have now found the documentation on cg_annotate.

But then, with no cache simulation and no call stacks, what's left in `cachegrind --cache-sim=no`?

(*) This naming adds to the confusion: kcachegrind requires callgrind, it can't work with cachegrind... I know, historical reasons :-)

--
David Faure, fa...@kd..., http://www.davidfaure.fr
Working on KDE Frameworks 5 |
|
From: Nicholas N. <n.n...@gm...> - 2023-04-04 10:11:36
|
On Tue, 4 Apr 2023 at 19:24, David Faure <fa...@kd...> wrote:
>
> But then, with no cache simulation and no call stacks, what's left in
> `cachegrind --cache-sim=no`?
>

From the email that started this thread:

> If you run with `--cache-sim=no` then the cache simulation is disabled and
> you just get one event: Ir. (This is "instruction cache reads", which is
> equivalent to "instructions executed".) |
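To make "just one event: Ir" concrete, the following Python sketch sums Ir per function from text in the Cachegrind output-file layout. It is an illustration under stated assumptions, not a replacement for `cg_annotate`: it assumes the `events:`, `fl=`, `fn=` and `summary:` line types and that Ir is the first count column. The function name `ir_per_function` and the `SAMPLE` data are invented for this example and are not real profiler output.

```python
from collections import defaultdict

# Invented sample in the assumed Cachegrind output layout (not real data).
SAMPLE = """\
cmd: ./prog
events: Ir
fl=prog.c
fn=main
3 10
4 25
fn=helper
9 7
summary: 42
"""


def ir_per_function(text):
    """Sum the first event column (assumed to be Ir) per (file, function)."""
    totals = defaultdict(int)
    events = []
    summary = None
    current_file = current_fn = "???"
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("desc:", "cmd:")):
            continue  # header lines carry no counts
        if line.startswith("events:"):
            events = line.split()[1:]
        elif line.startswith("summary:"):
            summary = int(line.split()[1])
        elif line.startswith("fl="):
            current_file = line[3:]
        elif line.startswith("fn="):
            current_fn = line[3:]
        elif line[0].isdigit():
            # Count line: source line number followed by event counts.
            fields = line.split()
            ir = int(fields[1]) if len(fields) > 1 else 0
            totals[(current_file, current_fn)] += ir
    return events, dict(totals), summary


events, totals, summary = ir_per_function(SAMPLE)
for (filename, fn), ir in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{ir:>6}  {filename}:{fn}")
```

In this sample the per-function totals (35 for `main`, 7 for `helper`) add up to the `summary` value of 42. That is the kind of flat, per-function and per-line instruction count that remains when there is neither a cache simulation nor call stacks.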
|
From: David F. <fa...@kd...> - 2023-04-04 10:49:32
|
On Tuesday, 4 April 2023 12:11:18 CEST Nicholas Nethercote wrote:
> On Tue, 4 Apr 2023 at 19:24, David Faure <fa...@kd...> wrote:
> > But then, with no cache simulation and no call stacks, what's left in
> > `cachegrind --cache-sim=no`?
>
> From the email that started this thread:
> > If you run with `--cache-sim=no` then the cache simulation is disabled and
> > you just get one event: Ir. (This is "instruction cache reads", which is
> > equivalent to "instructions executed".)

Ah, right, sorry.

So to summarize the big picture:

cachegrind -> instruction counts, without call stacks, useful for overall numbers or with cg_annotate
callgrind -> instruction counts, with call stacks, best viewed in kcachegrind

I wish those two could do cycles and not just instructions, but I guess this requires a good cache simulator again, back to square one ;)

(perf does cycles, but doesn't give the exact number of method calls; that's one benefit of cachegrind/callgrind)

--
David Faure, fa...@kd..., http://www.davidfaure.fr
Working on KDE Frameworks 5 |