From: Paul Y. <yin...@gm...> - 2009-10-20 01:47:53
Hi all,

I'm evaluating program performance on an AMD Opteron 270 with OProfile, following "Basic Performance Measurements for AMD Athlon™ 64, AMD Opteron™ and AMD Phenom™ Processors" by Paul J. Drongowski. The article describes two methods for measuring L2 cache behavior: a direct method and an indirect method.

Direct method:

    L2 request rate = (L2_requests + L2_fill_write) / Ret_instructions
    L2 miss ratio   = L2_misses / (L2_requests + L2_fill_write)

Indirect method:

    IC_misses       = IC_refills_L2 + IC_refills_sys
    DC_misses       = DC_refills_L2 + DC_refills_sys
    L2_requests     = IC_misses + DC_misses + L2_requests_TLB
    L2 request rate = L2_requests / Ret_instructions
    L2_misses       = IC_refills_sys + DC_refills_sys + L2_misses_TLB
    L2 miss ratio   = L2_misses / L2_requests

I have some questions about L2 cache measurement with OProfile.

1. How should L2_requests_TLB in the indirect method be computed?

My understanding is that L2_requests_TLB equals the sum of L1_ITLB_MISS_AND_L2_ITLB_MISS and L1_DTLB_AND_L2_DTLB_MISS. However, the REQUESTS_TO_L2 event also has a unit mask bit (0x4) for TLB requests. I measured both mcf and vortex from SPEC2000:

    opcontrol --event=REQUESTS_TO_L2:50003:0x4 --event=L1_ITLB_MISS_AND_L2_ITLB_MISS:50003 --event=L1_DTLB_AND_L2_DTLB_MISS:50003 --image=mcf.exe,vortex.exe

    L1_DTLB_AND_L2_DTLB_MISS|REQUESTS_TO_L2:0x4|L1_ITLB_MISS_AND_L2_ITLB_MISS:50003|
      samples|      %|  samples|      %|  samples|      %|
    -------------------------------------------------------------------------
         1377 100.000     1664 100.000        0 100.000 mcf.exe
         1192 100.000       10 100.000     1816 100.000 vortex.exe

There is a big discrepancy between REQUESTS_TO_L2:0x4 and (L1_ITLB_MISS_AND_L2_ITLB_MISS + L1_DTLB_AND_L2_DTLB_MISS). Which one is appropriate?

2. How should L2_requests be computed?
Direct method:   L2_requests + L2_fill_write
Indirect method: IC_misses (IC_refills_L2 + IC_refills_sys) + DC_misses (DC_refills_L2 + DC_refills_sys) + L2_requests_TLB

1) Direct method

    opcontrol --event=L2_CACHE_FILL_WRITEBACK:50003 --event=REQUESTS_TO_L2:50003:0x7 --image=mcf.exe,vortex.exe

    L2_CACHE_FILL_WRITEBACK|REQUESTS_TO_L2:0x7|
      samples|      %|  samples|      %|
    ------------------------------------
        16402 100.000    15920 100.000 mcf.exe
        11610 100.000    13761 100.000 vortex.exe

    L2_request_mcf_direct    = 16402 + 15920 = 32322
    L2_request_vortex_direct = 11610 + 13761 = 25371

2) Indirect method

    opcontrol --event=DATA_CACHE_REFILLS_FROM_L2_OR_SYSTEM:50003 --event=INSTRUCTION_CACHE_REFILLS_FROM_L2:50003 --event=INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM:50003 --event=REQUESTS_TO_L2:50003:0x4 --image=mcf.exe,vortex.exe

    INSTRUCTION_CACHE_REFILLS_FROM_L2|INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM|REQUESTS_TO_L2:0x4|DATA_CACHE_REFILLS_FROM_L2_OR_SYSTEM|
      samples|      %|  samples|      %|  samples|      %|  samples|      %|
    ------------------------------------------------------------------------
         2251 100.000       15 100.000     2160 100.000     7587 100.000 vortex.exe
            1 100.000        0 100.000     1660 100.000    10491 100.000 mcf.exe

    L2_request_vortex_indirect = 2251 + 15 + 2160 + 7587 = 12013
    L2_request_mcf_indirect    = 1 + 0 + 1660 + 10491 = 12152

There is a very big discrepancy between the L2_requests values computed with the direct and indirect methods. Why?

3. Are the following statements right?

    1) INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM is equal to L2_CACHE_MISS:0x1.
    2) DATA_CACHE_REFILLS_FROM_SYSTEM is equal to L2_CACHE_MISS:0x2.

Any suggestions are welcome. Using appropriate measurement parameters is important, and we should agree on a unified set.

--
Regards,
Paul Yuan (袁鹏)
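
P.S. The sums in question 2 can be reproduced with a short Python sketch. It is only a cross-check of the arithmetic, not an OProfile tool. The function and variable names are illustrative labels of my own, the sample counts are copied from the opreport tables in this message with the row labels as printed there, and treating the combined DATA_CACHE_REFILLS_FROM_L2_OR_SYSTEM count as DC_refills_L2 + DC_refills_sys is an assumption.

```python
# Sample counts copied from the opreport tables in this message
# (sampling period 50003 for every event). Row labels follow the tables.
# All names below are illustrative labels, not OProfile identifiers.

DIRECT = {
    # program: (L2_CACHE_FILL_WRITEBACK, REQUESTS_TO_L2:0x7)
    "mcf.exe": (16402, 15920),
    "vortex.exe": (11610, 13761),
}

INDIRECT = {
    # program: (INSTRUCTION_CACHE_REFILLS_FROM_L2,
    #           INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM,
    #           REQUESTS_TO_L2:0x4,
    #           DATA_CACHE_REFILLS_FROM_L2_OR_SYSTEM)
    "vortex.exe": (2251, 15, 2160, 7587),
    "mcf.exe": (1, 0, 1660, 10491),
}


def l2_requests_direct(fill_writeback, requests):
    """Direct method: L2_requests + L2_fill_write."""
    return requests + fill_writeback


def l2_requests_indirect(ic_l2, ic_sys, tlb, dc_l2_or_sys):
    """Indirect method: IC_misses + DC_misses + L2_requests_TLB.

    Assumption: the combined DATA_CACHE_REFILLS_FROM_L2_OR_SYSTEM count
    stands in for DC_refills_L2 + DC_refills_sys.
    """
    ic_misses = ic_l2 + ic_sys
    dc_misses = dc_l2_or_sys
    return ic_misses + dc_misses + tlb


direct = {prog: l2_requests_direct(*counts) for prog, counts in DIRECT.items()}
indirect = {prog: l2_requests_indirect(*counts) for prog, counts in INDIRECT.items()}

for prog in DIRECT:
    # The ratio shows how far apart the two methods are for each program.
    print(f"{prog}: direct={direct[prog]} indirect={indirect[prog]} "
          f"ratio={direct[prog] / indirect[prog]:.2f}")
```

Since every event here was sampled with the same period (50003), the sample counts should be directly comparable and can be summed without rescaling. With different periods, each count would first need to be multiplied by its own period.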