|
From: martin <iom...@io...> - 2015-09-25 21:04:21
|
Hello,

I'm new to Cachegrind (and cache simulation in general).

I'm trying to do a memory trace of my application, but only for operations that go to DRAM, that is, only if there was an LLC miss should I care. Is it possible to achieve that with Cachegrind? AFAICS, it only counts the number of events (and where they happened), but maybe it wouldn't be too hard to modify it to print the address every time an LLC miss happens. If anyone could point me to the right place to look, I would appreciate it.

Thank you,
Martin
|
From: Milian W. <ma...@mi...> - 2015-09-27 18:02:27
|
On Friday, 25 September 2015 20:27:45 CEST martin wrote:
> Hello,
>
> I'm new to Cachegrind (and cache simulation in general).
>
> I'm trying to do a memory trace of my application, but only for
> operations that go to DRAM, that is, only if there was a LLC miss
> should I care. Is it possible to achieve that with Cachegrind? AFAICS,
> it only counts the number of events (and where they happened), but
> maybe it wouldn't be too hard to modify it to print the address every
> time a LLC miss happens. If anyone could point me to the right place
> to look, I would appreciate it.

You could also use perf for this use case, if your CPU has the required
performance counters. This is also going to be much faster and more accurate,
as you don't need to simulate anything, but get the real counters directly
from hardware:

    perf record --event cache-misses --call-graph dwarf <your application>

Then visualize the result either with a FlameGraph or directly with perf report.

HTH
--
Milian Wolff
ma...@mi...
http://milianw.de
|
From: Josef W. <Jos...@gm...> - 2015-09-28 16:07:38
|
On 27.09.2015 at 20:02, Milian Wolff wrote:
> On Friday, 25 September 2015 20:27:45 CEST martin wrote:
>> Hello,
>>
>> I'm new to Cachegrind (and cache simulation in general).
>>
>> I'm trying to do a memory trace of my application, but only for
>> operations that go to DRAM, that is, only if there was a LLC miss
>> should I care. Is it possible to achieve that with Cachegrind? AFAICS,
>> it only counts the number of events (and where they happened), but
>> maybe it wouldn't be too hard to modify it to print the address every
>> time a LLC miss happens. If anyone could point me to the right place
>> to look, I would appreciate it.
>
> You could also use perf for this use case, if your CPU has the required
> performance counters. This is also going to be much faster and more accurate,
> as you don't need to simulate anything, but get the real counters directly
> from hardware:
>
> perf record --event cache-misses --call-graph dwarf <your application>

Yes, using perf for real measurement is another option if you are fine with
sampled results (not every access). This way you get the behavior of the real
cache. But if I remember right, it is trickier to get the data addresses
printed out. On Intel, something like

    perf record -d -e cpu/mem-loads,ldlat=100/pp <app>

"-d" writes data addresses into the output, "/pp" requests PEBS so that the
hardware provides data addresses in the first place, and "ldlat=100" only
counts events with a load latency of at least 100 cycles, which should be
memory accesses. To get the addresses, you could use

    perf report -D | grep addr

There is a field "period" in the output which shows you how many events you
lost due to sampling. Theoretically, one can ask for every event via
"perf record -c 1 ...", but I suppose most events are then missed due to
buffer overrun in the kernel or other effects (at least the machine does not
lock up here :-).

Josef
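As a rough illustration of post-processing the "perf report -D | grep addr" output mentioned above, the following Python sketch sums the "period" of each sample per data address. The exact line layout of the dump is an assumption here: the code only assumes that a sample line contains an "addr: <hex>" token and, optionally, a "period: <decimal>" token. The function name weighted_addresses is made up for this example.

```python
import re
from collections import Counter

# Assumption: sample lines contain "addr: <hex>" and "period: <decimal>" tokens.
# The real dump layout may differ between perf versions; adjust the patterns.
ADDR_RE = re.compile(r"\baddr:\s*(0x[0-9a-fA-F]+|0)\b")
PERIOD_RE = re.compile(r"\bperiod:\s*(\d+)\b")


def weighted_addresses(lines):
    """Return a Counter mapping data address -> summed sample period.

    Lines without an "addr:" token are skipped. A sample without a
    "period:" token is counted with weight 1.
    """
    totals = Counter()
    for line in lines:
        addr = ADDR_RE.search(line)
        if addr is None:
            continue
        period = PERIOD_RE.search(line)
        weight = int(period.group(1)) if period else 1
        totals[int(addr.group(1), 16)] += weight
    return totals
```

One way to use it would be to pipe the grep output into a small script that calls weighted_addresses(sys.stdin) and prints totals.most_common(20). Weighting by period treats each sample as standing in for that many events, which is only an estimate of the real access counts.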
|
From: Josef W. <Jos...@gm...> - 2015-09-28 14:47:51
|
On 25.09.2015 at 22:27, martin wrote:
> I'm trying to do a memory trace of my application, but only for
> operations that go to DRAM, that is, only if there was a LLC miss
> should I care. Is it possible to achieve that with Cachegrind? AFAICS,
> it only counts the number of events (and where they happened), but
> maybe it wouldn't be too hard to modify it to print the address every
> time a LLC miss happens. If anyone could point me to the right place
> to look, I would appreciate it.
See cachegrind/cg_sim.c, function "cachesim_D1_doref". The "*(mL)++"
increments the counter for a last-level cache miss. At this point,
you can use VG_(printf) to print out the address ("a").
If you also want to print out the address of the instruction doing the
memory access, or whether it is a read or a write, you need to modify the
callers of cachesim_D1_doref (and change the return value to tell whether
it was an LLC miss).
Josef
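To make the idea concrete without reproducing Cachegrind's source, here is a toy model in Python of the same control flow: look up the first-level data cache, fall through to the last-level cache only on a D1 miss, and record the address (plus read/write) only when the last level misses too. Everything in it (SetAssocCache, trace_ll_misses, the LRU policy, the cache geometry) is an assumption made for illustration, and it ignores accesses that straddle two cache lines.

```python
from collections import OrderedDict


class SetAssocCache:
    """Toy set-associative cache with LRU replacement (not Cachegrind's code)."""

    def __init__(self, size, assoc, line_size):
        self.line_size = line_size
        self.assoc = assoc
        self.num_sets = size // (assoc * line_size)
        # One OrderedDict per set; insertion order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss; update LRU state either way."""
        block = addr // self.line_size
        lines = self.sets[block % self.num_sets]
        if block in lines:
            lines.move_to_end(block)  # mark as most recently used
            return True
        if len(lines) >= self.assoc:
            lines.popitem(last=False)  # evict the least recently used line
        lines[block] = True
        return False


def trace_ll_misses(accesses, d1, ll):
    """Return the (addr, is_write) pairs that miss in both D1 and LL.

    `accesses` is an iterable of (addr, is_write) tuples. The LL cache is
    consulted only after a D1 miss, mirroring the order described above.
    """
    missed = []
    for addr, is_write in accesses:
        if d1.access(addr):
            continue
        if not ll.access(addr):
            # This is the point where the LL-miss counter would be bumped;
            # here the address is recorded instead.
            missed.append((addr, is_write))
    return missed
```

Returning the miss information to the caller, rather than printing inside the cache lookup, corresponds to the second suggestion above: the caller is the place that knows whether the access was a read or a write.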
>
> Thank you,
> Martin
|