From: Mehmet B. <mb...@gm...> - 2007-11-19 22:10:59
Hi Everyone,

I thought it would be interesting to compare Valgrind (cachegrind) cache event results to those of PAPI, which uses real HW counters. Despite all my efforts to configure Valgrind properly, I am unable to get comparable results. I hope a Valgrind guru might give me a hand here :)

I run Valgrind using:

    valgrind --tool=cachegrind --I1=65536,2,64 --D1=65536,2,64 --L2=1048576,16,16384 ./mycode

The characteristics of the machine I use, a 64-bit Opteron, are listed at the bottom of this message. I believe my settings in the valgrind line are correct.

Here's what I get from PAPI for the code section that I am exploring:

    total: L1 Access= 30582440, Hit= 28659298, Miss= 1923142
    total: L2 Access= 1923365, Hit= 1823712, Miss= 99653

And here's what I read from Valgrind (please also see the attached image below):

    1,918,926 total L1 misses (compare to 1,923,142 in PAPI) ---- Close enough!! ----
    6,360 total L2 misses (compare to 99,653 in PAPI) ---- NOT even close?? ----

Am I doing anything wrong while configuring the L2 cache? I will appreciate any comments...

Thanks a lot in advance!
-Memo

============= ADDITIONAL INFO ============

The specs of the Opteron are as follows:

OPTERON Test case: Memory Information.
------------------------------------------------------------------------
L1 Instruction TLB:
  Number of Entries: 512; Associativity: 4
L1 Data TLB:
  Number of Entries: 512; Associativity: 4
L1 Instruction Cache:
  Total size: 64KB
  Line size: 64B
  Number of Lines: 1024
  Associativity: 2
L1 Data Cache:
  Total size: 64KB
  Line size: 64B
  Number of Lines: 1024
  Associativity: 2
L2 Unified Cache:
  Total size: 1024KB
  Line size: 64B
  Number of Lines: 16384
  Associativity: 16
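
For reference, here is a minimal Python sketch of how the three cache options could be derived from the specs listed above. It assumes that each cachegrind cache option takes the form size,associativity,line size (all in bytes except associativity), as the Cachegrind manual describes. The helper names `cache_option` and `line_count` are hypothetical and exist only for this illustration. Under that assumption the third L2 field would be the 64-byte line size, not the 16384 line count used in the command above. Whether that explains the L2 miss discrepancy has not been verified here.

```python
def line_count(size_bytes, line_size_bytes):
    """Number of cache lines implied by a total size and a line size."""
    return size_bytes // line_size_bytes


def cache_option(name, size_bytes, associativity, line_size_bytes):
    """Build one cache option string (hypothetical helper).

    Assumes the option format is <size>,<associativity>,<line size>.
    """
    # A set-associative cache must split evenly into sets of `associativity` lines.
    if size_bytes % (associativity * line_size_bytes) != 0:
        raise ValueError(f"{name}: size is not a multiple of assoc * line size")
    return f"--{name}={size_bytes},{associativity},{line_size_bytes}"


# Values taken from the Opteron memory information in this message.
specs = {
    "I1": (64 * 1024, 2, 64),
    "D1": (64 * 1024, 2, 64),
    "L2": (1024 * 1024, 16, 64),
}

options = [cache_option(name, *spec) for name, spec in specs.items()]
command = "valgrind --tool=cachegrind " + " ".join(options) + " ./mycode"

# Cross-check the derived line counts against the reported ones.
l1_lines = line_count(64 * 1024, 64)
l2_lines = line_count(1024 * 1024, 64)

print(command)
print("L1 lines:", l1_lines, "L2 lines:", l2_lines)
```

The line counts serve only as a consistency check against the reported specs (1024 and 16384 lines); they are not passed to cachegrind.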