Hi all,

I'm evaluating program performance on an AMD Opteron 270 with OProfile. I am following "Basic Performance Measurements for AMD Athlon™ 64, AMD Opteron™ and AMD Phenom™ Processors" by Paul J. Drongowski.

Paul's article introduces two methods for measuring L2 cache behavior: a direct method and an indirect method.

Direct method:
L2 request rate = (L2_requests + L2_fill_write) / Ret_instructions
L2 miss ratio = L2_misses / (L2_requests + L2_fill_write)

Indirect method:
IC_misses = IC_refills_L2 + IC_refills_sys
DC_misses = DC_refills_L2 + DC_refills_sys
L2_requests = IC_misses + DC_misses + L2_requests_TLB
L2 request rate = L2_requests / Ret_instructions
L2_misses = IC_refills_sys + DC_refills_sys + L2_misses_TLB
L2 miss ratio = L2_misses / L2_requests
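
For reference, the two sets of formulas above can be written out as a short Python sketch. The function and parameter names are my own, not from Paul's article, and the numbers in the example calls are hypothetical values chosen only to exercise the arithmetic.

```python
def l2_direct(l2_requests, l2_fill_write, l2_misses, ret_instructions):
    """Direct method: return (L2 request rate, L2 miss ratio)."""
    total_requests = l2_requests + l2_fill_write
    return total_requests / ret_instructions, l2_misses / total_requests


def l2_indirect(ic_refills_l2, ic_refills_sys, dc_refills_l2, dc_refills_sys,
                l2_requests_tlb, l2_misses_tlb, ret_instructions):
    """Indirect method: return (L2 request rate, L2 miss ratio)."""
    ic_misses = ic_refills_l2 + ic_refills_sys
    dc_misses = dc_refills_l2 + dc_refills_sys
    l2_requests = ic_misses + dc_misses + l2_requests_tlb
    l2_misses = ic_refills_sys + dc_refills_sys + l2_misses_tlb
    return l2_requests / ret_instructions, l2_misses / l2_requests


# Hypothetical counts, not measurements.
direct_rate, direct_miss_ratio = l2_direct(800, 200, 100, 10000)
indirect_rate, indirect_miss_ratio = l2_indirect(300, 50, 500, 100, 50, 10, 10000)
print(direct_rate, direct_miss_ratio)
print(indirect_rate, indirect_miss_ratio)
```

All inputs must be on the same scale (raw event counts, or sample counts taken with the same sampling period) for the ratios to be meaningful.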

I have some questions about L2 Cache measurement with OProfile.

1. How should L2_requests_TLB be computed in the indirect method?

My understanding is that L2_requests_TLB equals the sum of L1_ITLB_MISS_AND_L2_ITLB_MISS and L1_DTLB_AND_L2_DTLB_MISS. However, the REQUESTS_TO_L2 event also has a unit mask bit (0x4) for TLB requests. I measured both on mcf and vortex from SPEC2000.

opcontrol --event=REQUESTS_TO_L2:50003:0x4 --event=L1_ITLB_MISS_AND_L2_ITLB_MISS:50003 --event=L1_DTLB_AND_L2_DTLB_MISS:50003 --image=mcf.exe,vortex.exe

L1_DTLB_AND_L2_DTLB_MISS|REQUESTS_TO_L2:0x4|L1_ITLB_MISS_AND_L2_ITLB_MISS|
  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------
     1377 100.000      1664 100.000         0 100.000 mcf.exe
     1192 100.000        10 100.000      1816 100.000 vortex.exe

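There is a big discrepancy between REQUESTS_TO_L2:0x4 and (L1_ITLB_MISS_AND_L2_ITLB_MISS + L1_DTLB_AND_L2_DTLB_MISS). For vortex, for example, REQUESTS_TO_L2:0x4 gives 10 samples while the two TLB miss events give 3008 in total. Which one is the appropriate measure of L2_requests_TLB?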
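
The comparison can be made explicit with a small Python sketch. It assumes the columns are read in the order printed in the table header, and that each sample stands for roughly one sampling period (50003) of events, which is only an approximation. The dictionary keys are names I made up for illustration.

```python
PERIOD = 50003  # sampling count passed to opcontrol for every event

# Sample counts copied from the opreport table above.
samples = {
    "mcf.exe": {"dtlb_miss": 1377, "l2_req_tlb": 1664, "itlb_miss": 0},
    "vortex.exe": {"dtlb_miss": 1192, "l2_req_tlb": 10, "itlb_miss": 1816},
}


def tlb_comparison(row):
    """Return (ITLB + DTLB miss samples, REQUESTS_TO_L2:0x4 samples, difference)."""
    miss_sum = row["itlb_miss"] + row["dtlb_miss"]
    return miss_sum, row["l2_req_tlb"], miss_sum - row["l2_req_tlb"]


for image, row in samples.items():
    miss_sum, l2_tlb, diff = tlb_comparison(row)
    # Scale samples to approximate event counts.
    print(image, miss_sum * PERIOD, l2_tlb * PERIOD, diff)
```

The sign of the difference is not even the same for the two programs: the TLB miss sum is lower than REQUESTS_TO_L2:0x4 for mcf and far higher for vortex.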

2. How should L2_requests be computed?

Direct method: L2_requests + L2_fill_write
Indirect method: IC_misses(IC_refills_L2 + IC_refills_sys) + DC_misses(DC_refills_L2 + DC_refills_sys) + L2_requests_TLB

1) Direct method
opcontrol --event=L2_CACHE_FILL_WRITEBACK:50003 --event=REQUESTS_TO_L2:50003:0x7 --image=mcf.exe,vortex.exe

L2_CACHE_FILL_WRITEBACK|REQUESTS_TO_L2:0x7|
  samples|      %|  samples|      %|
------------------------------------
    16402 100.000     15920 100.000 mcf.exe
    11610 100.000     13761 100.000 vortex.exe

L2_request_mcf_direct = 16402 + 15920 = 32322
L2_request_vortex_direct = 11610 + 13761 = 25371

2) Indirect method
opcontrol --event=DATA_CACHE_REFILLS_FROM_L2_OR_SYSTEM:50003 --event=INSTRUCTION_CACHE_REFILLS_FROM_L2:50003 --event=INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM:50003 --event=REQUESTS_TO_L2:50003:0x4 --image=mcf.exe,vortex.exe

INSTRUCTION_CACHE_REFILLS_FROM_L2|INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM|REQUESTS_TO_L2:0x4|DATA_CACHE_REFILLS_FROM_L2_OR_SYSTEM|
  samples|      %|  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------------------------
     2251 100.000        15 100.000      2160 100.000      7587 100.000 vortex.exe
        1 100.000         0 100.000      1660 100.000     10491 100.000 mcf.exe

L2_request_mcf_indirect = 1 + 0 + 1660 + 10491 = 12152
L2_request_vortex_indirect = 2251 + 15 + 2160 + 7587 = 12013

There is a very big discrepancy between the two: for both programs, L2_request computed with the direct method is more than twice the value from the indirect method. Why?
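
To put a number on the gap, here is a small Python sketch that recomputes both totals from the two tables, taking each row's values under the image name printed in that row. All events used the same sampling count (50003), so sample counts are compared directly. The variable names are mine.

```python
# (L2_CACHE_FILL_WRITEBACK, REQUESTS_TO_L2:0x7) samples, direct method
direct = {
    "mcf.exe": (16402, 15920),
    "vortex.exe": (11610, 13761),
}

# (IC refills from L2, IC refills from system, REQUESTS_TO_L2:0x4,
#  DC refills from L2 or system) samples, indirect method
indirect = {
    "mcf.exe": (1, 0, 1660, 10491),
    "vortex.exe": (2251, 15, 2160, 7587),
}


def compare(image):
    """Return (direct total, indirect total, direct / indirect)."""
    d = sum(direct[image])
    i = sum(indirect[image])
    return d, i, d / i


for image in direct:
    d, i, ratio = compare(image)
    print(f"{image}: direct={d} indirect={i} ratio={ratio:.2f}")
```

The ratio comes out at roughly 2.7 for mcf and 2.1 for vortex.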

3. Are the following statements right?

1) INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM is equal to L2_CACHE_MISS:0x1.
2) DATA_CACHE_REFILLS_FROM_SYSTEM is equal to L2_CACHE_MISS:0x2.

Any suggestions are welcome! Choosing the right events and unit masks is essential for meaningful results, so it would be good to agree on one recommended set of formulas.

--
Regards,
Paul Yuan (袁鹏)