|
From: Abdel-Hameed Abdel-S. B. <sha...@gm...> - 2007-10-26 16:54:29
|
Below is the detailed output of a valgrind-cachegrind run. I am questioning the miss rate results for the L2 cache. The relevant results are relisted here for clarity:

==18315== I refs: 561,873
==18315== I1 misses: 3,089
==18315== L2i misses: 1,497
==18315== I1 miss rate: 0.54%
==18315== L2i miss rate: 0.26%
==18315== D refs: 295,619 (209,070 rd + 86,549 wr)
==18315== D1 misses: 4,752 ( 3,948 rd + 804 wr)
==18315== L2d misses: 2,584 ( 1,960 rd + 624 wr)
==18315== D1 miss rate: 1.6% ( 1.8% + 0.9% )
==18315== L2d miss rate: 0.8% ( 0.9% + 0.7% )
==18315== L2 refs: 7,841 ( 7,037 rd + 804 wr)
==18315== L2 misses: 4,081 ( 3,457 rd + 624 wr)
==18315== L2 miss rate: 0.4% ( 0.4% + 0.7% )

L2 refs = D1 misses + I1 misses.
L2 misses = L2d misses + L2i misses.

The L2 miss rate equation that I know is (please correct me if I am wrong on this):

L2 miss rate = L2 misses / L2 refs * 100

So the L2 miss rate should be 52% in this case. The L2d miss rate should likewise be 54.4% if we look only at the data misses and refs.

Why are these not the numbers reported by valgrind?

$ valgrind --tool=cachegrind -v ls
==18315== Cachegrind, an I1/D1/L2 cache profiler.
==18315== Copyright (C) 2002-2006, and GNU GPL'd, by Nicholas Nethercote et al.
==18315== Using LibVEX rev 1658, a library for dynamic binary translation.
==18315== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==18315== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==18315== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==18315==
--18315-- Command line
--18315-- ls
--18315-- Startup, with flags:
--18315-- --tool=cachegrind
--18315-- -v
--18315-- Contents of /proc/version:
--18315-- Linux version 2.6.18-8.1.4.el5 (bre...@hs...) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)) #1 SMP Fri May 4 22:15:13 EDT 2007
--18315-- Arch and hwcaps: X86, x86-sse1-sse2
--18315-- Valgrind library directory: /usr/lib/valgrind
--18315-- warning: Pentium 4 with 12 KB micro-op instruction trace cache
--18315-- Simulating a 16 KB I-cache with 32 B lines
==18315== Cache configuration used:
==18315== I1: 16384B, 8-way, 32B lines
==18315== D1: 16384B, 8-way, 64B lines
==18315== L2: 262144B, 4-way, 64B lines
--18315-- Reading syms from /bin/ls (0x8048000)
--18315-- object doesn't have a symbol table
--18315-- Reading syms from /usr/lib/valgrind/x86-linux/cachegrind (0x38000000)
--18315-- object doesn't have a dynamic symbol table
--18315-- Reading syms from /lib/ld-2.5.so (0x4303D000)
--18315-- Reading syms from /usr/lib/valgrind/x86-linux/vgpreload_core.so (0x4001000)
--18315-- Reading syms from /lib/librt-2.5.so (0x43DF3000)
--18315-- Reading syms from /lib/libacl.so.1.1.0 (0x44706000)
--18315-- object doesn't have a symbol table
--18315-- Reading syms from /lib/libselinux.so.1 (0x43073000)
--18315-- object doesn't have a symbol table
--18315-- Reading syms from /lib/libc-2.5.so (0x43A0C000)
--18315-- Reading syms from /lib/libpthread-2.5.so (0x43B7A000)
--18315-- Reading syms from /lib/libattr.so.1.1.0 (0x43A02000)
--18315-- object doesn't have a symbol table
--18315-- Reading syms from /lib/libdl-2.5.so (0x43B74000)
--18315-- Reading syms from /lib/libsepol.so.1 (0x430D7000)
--18315-- object doesn't have a symbol table
==18315==
==18315== I refs: 561,873
==18315== I1 misses: 3,089
==18315== L2i misses: 1,497
==18315== I1 miss rate: 0.54%
==18315== L2i miss rate: 0.26%
==18315==
==18315== D refs: 295,619 (209,070 rd + 86,549 wr)
==18315== D1 misses: 4,752 ( 3,948 rd + 804 wr)
==18315== L2d misses: 2,584 ( 1,960 rd + 624 wr)
==18315== D1 miss rate: 1.6% ( 1.8% + 0.9% )
==18315== L2d miss rate: 0.8% ( 0.9% + 0.7% )
==18315==
==18315== L2 refs: 7,841 ( 7,037 rd + 804 wr)
==18315== L2 misses: 4,081 ( 3,457 rd + 624 wr)
==18315== L2 miss rate: 0.4% ( 0.4% + 0.7% )
--18315--
--18315-- cachegrind: distinct files: 1
--18315-- cachegrind: distinct fns: 220
--18315-- cachegrind: distinct lines: 220
--18315-- cachegrind: distinct instrs:3365
--18315-- cachegrind: debug lookups : 20807
--18315-- cachegrind: with full info: 0.0% (0)
--18315-- cachegrind: with file/line info: 0.0% (0)
--18315-- cachegrind: with fn name info: 81.5% (16971)
--18315-- cachegrind: with zero info: 18.4% (3836)
--18315-- cachegrind: string table size: 220
--18315-- cachegrind: CC table size: 220
--18315-- cachegrind: InstrInfo table size: 3365
--18315-- translate: fast SP updates identified: 0 ( --%)
--18315-- translate: generic_known SP updates identified: 0 ( --%)
--18315-- translate: generic_unknown SP updates identified: 0 ( --%)
--18315-- tt/tc: 6,747 tt lookups requiring 6,894 probes
--18315-- tt/tc: 6,747 fast-cache updates, 2 flushes
--18315-- transtab: new 3,365 (71,099 -> 863,098; ratio 121:10) [0 scs]
--18315-- transtab: dumped 0 (0 -> ??)
--18315-- transtab: discarded 0 (0 -> ??)
--18315-- scheduler: 104,818 jumps (bb entries).
--18315-- scheduler: 1/3,507 major/minor sched events.
--18315-- sanity: 2 cheap, 1 expensive checks.
--18315-- exectx: 30,011 lists, 0 contexts (avg 0 per list)
--18315-- exectx: 0 searches, 0 full compares (0 per 1000)
--18315-- exectx: 0 cmp2, 0 cmp4, 0 cmpAll

-- Hameed.
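The arithmetic in the message above can be checked with a minimal Python sketch. The counts are copied from the cachegrind summary; the helper name `local_miss_rate` is only illustrative and is not part of valgrind.

```python
# Counts copied from the cachegrind summary in the message above.
i1_misses, d1_misses = 3_089, 4_752
l2i_misses, l2d_misses = 1_497, 2_584
l2_refs, l2_misses = 7_841, 4_081


def local_miss_rate(misses: int, refs: int) -> float:
    """Misses as a percentage of the references that reached this cache level."""
    return 100.0 * misses / refs


# Every L1 miss becomes an L2 reference, so the totals should add up.
assert i1_misses + d1_misses == l2_refs
assert l2i_misses + l2d_misses == l2_misses

# L2 misses relative to L2 refs (all accesses, then data accesses only).
l2_local = local_miss_rate(l2_misses, l2_refs)
l2d_local = local_miss_rate(l2d_misses, d1_misses)

print(f"L2 local miss rate:  {l2_local:.1f}%")
print(f"L2d local miss rate: {l2d_local:.1f}%")
```

Under these definitions the two values come out at about 52% and 54.4%, which matches the figures quoted in the message.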
|
From: Josef W. <Jos...@gm...> - 2007-10-26 23:21:52
|
On Friday 26 October 2007, Abdel-Hameed Abdel-Salam Badawy wrote:
> ==18315== I refs: 561,873
> ==18315== I1 misses: 3,089
> ==18315== L2i misses: 1,497
> ==18315== I1 miss rate: 0.54%
> ==18315== L2i miss rate: 0.26%
> ==18315== D refs: 295,619 (209,070 rd + 86,549 wr)
> ==18315== D1 misses: 4,752 ( 3,948 rd + 804 wr)
> ==18315== L2d misses: 2,584 ( 1,960 rd + 624 wr)
> ==18315== D1 miss rate: 1.6% ( 1.8% + 0.9% )
> ==18315== L2d miss rate: 0.8% ( 0.9% + 0.7% )
> ==18315== L2 refs: 7,841 ( 7,037 rd + 804 wr)
> ==18315== L2 misses: 4,081 ( 3,457 rd + 624 wr)
> ==18315== L2 miss rate: 0.4% ( 0.4% + 0.7% )
> ...
> So, L2 miss rate should be 52% in this case.
> L2d miss rate should be also 54.4% assuming only we look at the data misses
> and refs only.
>
> Why in the world, these are no the numbers reported by the valgrind?

Hi,

I think your point is valid; "miss rate" is "misses/refs".

However, I think we should keep the above numbers, but could add the real "L2 miss rate" in addition. The current output actually describes the efficiency of the combined cache levels. So we should write e.g. "I1+L2 miss rate" instead of "L2i miss rate" (the current output is kind of bogus, as our cache model does not have a separate L2 instruction cache).

The reason for using the miss rate of the combined cache levels at all is that the value is also used in sorted lists of functions and in source annotation to pinpoint code with bad cache efficiency, and for this to be useful, you really need combined figures.

Nick: If we change this (and IMHO we should), this should be done exactly the same for cachegrind and callgrind.

Josef
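One way to read the argument about sorted function lists is sketched below in Python. The two functions and all of their counts are invented purely for illustration; they do not come from the run discussed in this thread, and the helper names are hypothetical.

```python
# Hypothetical per-function counts: (name, total refs, L1 misses, L2 misses).
# These numbers are invented for illustration only.
functions = [
    ("rarely_misses", 1_000_000, 100, 90),
    ("often_misses", 1_000_000, 100_000, 20_000),
]


def local_l2_rate(refs: int, l1_misses: int, l2_misses: int) -> float:
    # L2 misses relative to the references that reached L2 (the L1 misses).
    return 100.0 * l2_misses / l1_misses


def combined_rate(refs: int, l1_misses: int, l2_misses: int) -> float:
    # L2 misses relative to all references issued by the function.
    return 100.0 * l2_misses / refs


# Pick the "worst" function under each metric.
by_local = max(functions, key=lambda f: local_l2_rate(*f[1:]))
by_combined = max(functions, key=lambda f: combined_rate(*f[1:]))

print("worst by local L2 rate:", by_local[0])
print("worst by combined rate:", by_combined[0])
```

With these made-up counts, sorting by the local L2 rate would put the function with only 90 L2 misses at the top, while the combined figure points at the function with 20,000 of them.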
|
From: Abdel-Hameed Abdel-S. B. <sha...@gm...> - 2007-10-29 00:54:53
|
This is in reply to both Josef's and Nicholas's emails.

If the metric you are showing is L2 misses per reference or per instruction, then I think it should be the standardized "MPKI", or misses per kilo-instruction. Naming it the L2 miss rate is misleading at first, since it calls one thing by another thing's name. You can easily report both, since only the division differs.

I came across valgrind as a cache profiler through the microarchitecture webpage, and I am sure anyone with an architecture background would at first be misled by the numbers, although it is easy to figure out that the reported miss rate is not computed from the misses and refs of the same cache level, for either the L1 or the L2.

Thanks.

--Hameed.

On 10/26/07, Josef Weidendorfer <Jos...@gm...> wrote:
>
> On Friday 26 October 2007, Abdel-Hameed Abdel-Salam Badawy wrote:
> > ==18315== I refs: 561,873
> > ==18315== I1 misses: 3,089
> > ==18315== L2i misses: 1,497
> > ==18315== I1 miss rate: 0.54%
> > ==18315== L2i miss rate: 0.26%
> > ==18315== D refs: 295,619 (209,070 rd + 86,549 wr)
> > ==18315== D1 misses: 4,752 ( 3,948 rd + 804 wr)
> > ==18315== L2d misses: 2,584 ( 1,960 rd + 624 wr)
> > ==18315== D1 miss rate: 1.6% ( 1.8% + 0.9% )
> > ==18315== L2d miss rate: 0.8% ( 0.9% + 0.7% )
> > ==18315== L2 refs: 7,841 ( 7,037 rd + 804 wr)
> > ==18315== L2 misses: 4,081 ( 3,457 rd + 624 wr)
> > ==18315== L2 miss rate: 0.4% ( 0.4% + 0.7% )
> > ...
> > So, L2 miss rate should be 52% in this case.
> > L2d miss rate should be also 54.4% assuming only we look at the data misses
> > and refs only.
> >
> > Why in the world, these are no the numbers reported by the valgrind?
>
> Hi,
>
> I think your point is valid; "miss rate" is "misses/refs".
>
> However, I think we should keep above numbers, but could add the real
> "L2 miss rate" in addition. The current output actually talks about
> the efficiency of combined cache levels.
> So we should write e.g. "I1+L2 miss rate" instead of "L2i miss rate"
> (the current output is kind of bogus as our cache model does not have
> a separate L2 instruction cache).
>
> The reason for using the miss rate of combined cache levels at all
> is because the value is also to be used in sorted lists of functions and
> source annotation to pinpoint at code with bad cache efficiency, and
> for this to be useful, you really need combined figures.
>
> Nick: If we change this (and IMHO we should), this should be done
> exactly the same for cachegrind and callgrind.
>
> Josef
>

-- Hameed.
|
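For reference, misses per kilo-instruction for the run in this thread can be worked out as in the following minimal Python sketch. It assumes that cachegrind's "I refs" count can stand in for the number of instructions executed; the function name `mpki` is illustrative only.

```python
# Counts copied from the cachegrind summary quoted in this thread.
i_refs = 561_873     # "I refs", taken here as the instruction count (assumption)
l2_misses = 4_081    # "L2 misses", instruction and data misses combined


def mpki(misses: int, instructions: int) -> float:
    """Misses per thousand instructions."""
    return 1000.0 * misses / instructions


l2_mpki = mpki(l2_misses, i_refs)
print(f"L2 MPKI: {l2_mpki:.2f}")
```

This gives roughly 7.26 L2 misses per thousand instructions for the `ls` run.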
From: Nicholas N. <nj...@cs...> - 2007-10-27 02:04:44
|
On Fri, 26 Oct 2007, Abdel-Hameed Abdel-Salam Badawy wrote:
> ==18315== I refs: 561,873
> ==18315== I1 misses: 3,089
> ==18315== L2i misses: 1,497
> ==18315== I1 miss rate: 0.54%
> ==18315== L2i miss rate: 0.26%
> ==18315== D refs: 295,619 (209,070 rd + 86,549 wr)
> ==18315== D1 misses: 4,752 ( 3,948 rd + 804 wr)
> ==18315== L2d misses: 2,584 ( 1,960 rd + 624 wr)
> ==18315== D1 miss rate: 1.6% ( 1.8% + 0.9% )
> ==18315== L2d miss rate: 0.8% ( 0.9% + 0.7% )
> ==18315== L2 refs: 7,841 ( 7,037 rd + 804 wr)
> ==18315== L2 misses: 4,081 ( 3,457 rd + 624 wr)
> ==18315== L2 miss rate: 0.4% ( 0.4% + 0.7% )
>
> L2 refs =D1misses + I1misses.
> L2 misses = L2d misses + L2i misses.
>
> The L2 miss rate equation that I know is (Please correct me if I am wrong on
> this):
> L2 miss rate = L2 misses/ L2 refs * 100

As Josef said, the computed rate is actually:

L2 miss rate = L2 misses / total refs * 100

I believe this is a more useful metric than the one you suggest, because it gives an overall sense of how frequent L2 misses are. The number you suggest must be considered in combination with the D1/I1 miss rates in order to know what it means.

Nick
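The relationship between the two definitions can be sketched in a few lines of Python, using the counts from the quoted summary. It assumes that "total refs" means I refs plus D refs; the variable names are illustrative.

```python
# Counts copied from the cachegrind summary quoted above.
i_refs, d_refs = 561_873, 295_619
l2_refs, l2_misses = 7_841, 4_081

total_refs = i_refs + d_refs  # assumption: "total refs" = I refs + D refs

# Rate as described in the reply: L2 misses over all references.
global_l2_rate = 100.0 * l2_misses / total_refs

# The misses/refs rate must be combined with the L1 miss rate to get there.
l1_miss_fraction = l2_refs / total_refs   # share of refs that miss I1 or D1
local_l2_fraction = l2_misses / l2_refs   # share of L2 refs that miss L2
product = 100.0 * l1_miss_fraction * local_l2_fraction

print(f"L2 misses / total refs: {global_l2_rate:.3f}%")
print(f"L1 rate x local L2 rate: {product:.3f}%")
```

Both lines print about 0.476%, so the overall figure is the L1 miss rate multiplied by the misses/refs rate for L2. The log shows 0.4%, which would be consistent with the percentage being truncated rather than rounded.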