|
From: Nicholas N. <n.n...@gm...> - 2023-04-03 09:29:43
|
Hi,

Cachegrind has an option `--cache-sim`.

If you run with `--cache-sim=yes` (the default) it tells Cachegrind to do a full cache simulation with lots of events: Ir, I1mr, ILmr, Dr, D1mr, DLmr, Dw, D1mw, DLmw.

If you run with `--cache-sim=no` then the cache simulation is disabled and you just get one event: Ir. (This is "instruction cache reads", which is equivalent to "instructions executed".)

I have been using `--cache-sim=no` almost exclusively for a long time. The cache simulation done by Valgrind is an approximation of the memory hierarchy of a 2002 AMD Athlon processor. Its accuracy for a modern memory hierarchy with three levels of cache, prefetching, non-LRU replacement, and who-knows-what-else is likely to be low. If you want to accurately know about cache behaviour you'd be much better off using hardware counters via `perf` or some other profiler.

But `--cache-sim=no` is still very useful because instruction execution counts are still very useful.

Therefore, I propose changing the default to `--cache-sim=no`. Does anyone have any objections to this?

Thanks.

Nick |
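A minimal sketch of how the two modes described above might be invoked, assuming `valgrind` is on the PATH. The helper name `cachegrind_cmd` and the program name `./myprog` are hypothetical and used only for illustration. The flag is always passed explicitly, so the resulting command does not depend on which value is the default in a given Valgrind release.

```python
import shlex


def cachegrind_cmd(program_args, cache_sim=False):
    """Build a Cachegrind command line (hypothetical helper, for illustration).

    cache_sim=False requests instruction counts only (the Ir event);
    cache_sim=True requests the full cache simulation (Ir, I1mr, ILmr, ...).
    """
    flag = "--cache-sim=yes" if cache_sim else "--cache-sim=no"
    return ["valgrind", "--tool=cachegrind", flag, *program_args]


if __name__ == "__main__":
    # ./myprog is a placeholder for the program being profiled.
    print(shlex.join(cachegrind_cmd(["./myprog"])))
    print(shlex.join(cachegrind_cmd(["./myprog"], cache_sim=True)))
```

The printed commands can be pasted into a shell, or the returned list can be handed to `subprocess.run`.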
|
From: Roger L. <ro...@at...> - 2023-04-03 10:33:02
|
Hi,

Whether or not you do this, it might be worth updating the description on https://valgrind.org/info/tools.html with some of the information in your email.

Cheers,
Roger |
|
From: Floyd, P. <pj...@wa...> - 2023-04-03 16:12:30
|
On 03/04/2023 11:29, Nicholas Nethercote wrote:
> Hi,
>
> Therefore, I propose changing the default to `--cache-sim=no`. Does
> anyone have any objections to this?

No objection. I tend to use Linux perf at work because the things we want to optimize have runtimes of hours to days with 8 threads.

A+
Paul |