From: John S. Jr <jsa...@gm...> - 2009-06-06 12:00:10
I have a laptop with an Intel Core Duo processor (Intel T2600 @ 2.16 GHz) and use oprofile 0.9.3. I need to measure L2 cache hits and L2 cache misses for a C program I am profiling. Which events do I need to specify to oprofile to do this? I have looked at the available events from 'opcontrol --list-events' (listed at the end of this post), but cannot find an event that counts L2 hits or misses directly.

Thanks!
--john

===========================
oprofile: available events for CPU type "Core Solo / Duo"

See Intel Architecture Developer's Manual Volume 3, Appendix A and
Intel Architecture Optimization Reference Manual (730795-001)

CPU_CLK_UNHALTED: (counter: all)
    Unhalted clock cycles (min count: 6000)
    Unit masks (default 0x0)
    ----------
    0x00: Unhalted core cycles
    0x01: Unhalted bus cycles
    0x02: Unhalted bus cycles of this core while the other core is halted
INST_RETIRED: (counter: all)
    number of instructions retired (min count: 6000)
L2_RQSTS: (counter: all)
    number of L2 requests (min count: 6000)
    Unit masks (default 0xf)
    ----------
    0x08: (M)odified cache state
    0x04: (E)xclusive cache state
    0x02: (S)hared cache state
    0x01: (I)nvalid cache state
    0x0f: All cache states
    0x10: HW prefetched line only
    0x20: all prefetched line w/o regarding mask 0x10.
LD_BLOCKS: (counter: all)
    number of store buffer blocks (min count: 500)
SB_DRAINS: (counter: all)
    number of store buffer drain cycles (min count: 500)
MISALIGN_MEM_REF: (counter: all)
    number of misaligned data memory references (min count: 500)
SEGMENT_REG_LOADS: (counter: all)
    number of segment register loads (min count: 500)
EMON_KNI_PREF_DISPATCHED: (counter: all)
    number of SSE pre-fetch/weakly ordered insns retired (min count: 500)
    Unit masks (default 0x0)
    ----------
    0x00: prefetch NTA
    0x01: prefetch T1
    0x02: prefetch T2
    0x03: weakly-ordered stores
FLOPS: (counter: 0)
    number of computational FP operations executed (min count: 3000)
FP_ASSIST: (counter: 1)
    number of FP exceptions handled by microcode (min count: 500)
MUL: (counter: 1)
    number of multiplies (min count: 1000)
DIV: (counter: 1)
    number of divides (min count: 500)
CYCLES_DIV_BUSY: (counter: 0)
    cycles divider is busy (min count: 1000)
L2_ADS: (counter: all)
    number of L2 address strobes (min count: 500)
L2_DBUS_BUSY: (counter: all)
    number of cycles data bus was busy (min count: 500)
L2_DBUS_BUSY_RD: (counter: all)
    cycles data bus was busy in xfer from L2 to CPU (min count: 500)
L2_LINES_IN: (counter: all)
    number of allocated lines in L2 (min count: 500)
L2_M_LINES_INM: (counter: all)
    number of modified lines allocated in L2 (min count: 500)
L2_LINES_OUT: (counter: all)
    number of recovered lines from L2 (min count: 500)
L2_M_LINES_OUTM: (counter: all)
    number of modified lines removed from L2 (min count: 500)
L2_IFETCH: (counter: all)
    number of L2 instruction fetches (min count: 500)
    Unit masks (default 0xf)
    ----------
    0x08: (M)odified cache state
    0x04: (E)xclusive cache state
    0x02: (S)hared cache state
    0x01: (I)nvalid cache state
    0x0f: All cache states
    0x10: HW prefetched line only
    0x20: all prefetched line w/o regarding mask 0x10.
L2_LD: (counter: all)
    number of L2 data loads (min count: 500)
    Unit masks (default 0xf)
    ----------
    0x08: (M)odified cache state
    0x04: (E)xclusive cache state
    0x02: (S)hared cache state
    0x01: (I)nvalid cache state
    0x0f: All cache states
    0x10: HW prefetched line only
    0x20: all prefetched line w/o regarding mask 0x10.
L2_ST: (counter: all)
    number of L2 data stores (min count: 500)
    Unit masks (default 0xf)
    ----------
    0x08: (M)odified cache state
    0x04: (E)xclusive cache state
    0x02: (S)hared cache state
    0x01: (I)nvalid cache state
    0x0f: All cache states
    0x10: HW prefetched line only
    0x20: all prefetched line w/o regarding mask 0x10.
L2_REJECT_CYCLES: (counter: all)
    Cycles L2 is busy and rejecting new requests (min count: 500)
    Unit masks (default 0xf)
    ----------
    0x08: (M)odified cache state
    0x04: (E)xclusive cache state
    0x02: (S)hared cache state
    0x01: (I)nvalid cache state
    0x0f: All cache states
    0x10: HW prefetched line only
    0x20: all prefetched line w/o regarding mask 0x10.
L2_NO_REQUEST_CYCLES: (counter: all)
    Cycles there is no request to access L2 (min count: 500)
    Unit masks (default 0xf)
    ----------
    0x08: (M)odified cache state
    0x04: (E)xclusive cache state
    0x02: (S)hared cache state
    0x01: (I)nvalid cache state
    0x0f: All cache states
    0x10: HW prefetched line only
    0x20: all prefetched line w/o regarding mask 0x10.
EST_TRANS_ALL: (counter: all)
    Intel(tm) Enhanced SpeedStep(r) Technology transitions (min count: 500)
    Unit masks (default 0x0)
    ----------
    0x00: any transitions
    0x10: Intel(tm) Enhanced SpeedStep(r) Technology frequency transitions
    0x20: any transactions
THERMAL_TRIP: (counter: all)
    Duration in a thremal trip based on the current core clock (min count: 500)
    Unit masks (default 0xc0)
    ----------
    0xc0: No unit mask
DCACHE_CACHE_LD: (counter: all)
    L1 cacheable data read operations (min count: 500)
    Unit masks (default 0xf)
    ----------
    0x08: (M)odified cache state
    0x04: (E)xclusive cache state
    0x02: (S)hared cache state
    0x01: (I)nvalid cache state
    0x0f: All cache states
    0x10: HW prefetched line only
    0x20: all prefetched line w/o regarding mask 0x10.
DCACHE_CACHE_ST: (counter: all)
    L1 cacheable data write operations (min count: 500)
    Unit masks (default 0xf)
    ----------
    0x08: (M)odified cache state
    0x04: (E)xclusive cache state
    0x02: (S)hared cache state
    0x01: (I)nvalid cache state
    0x0f: All cache states
    0x10: HW prefetched line only
    0x20: all prefetched line w/o regarding mask 0x10.
DCACHE_CACHE_LOCK: (counter: all)
    L1 cacheable lock read operations to invalid state (min count: 500)
    Unit masks (default 0xf)
    ----------
    0x08: (M)odified cache state
    0x04: (E)xclusive cache state
    0x02: (S)hared cache state
    0x01: (I)nvalid cache state
    0x0f: All cache states
    0x10: HW prefetched line only
    0x20: all prefetched line w/o regarding mask 0x10.
DATA_MEM_REFS: (counter: all)
    all L1 memory references, cachable and non (min count: 500)
    Unit masks (default 0x1)
    ----------
    0x01: No unit mask
DATA_MEM_CACHE_REFS: (counter: all)
    L1 data cacheable read and write operations (min count: 500)
    Unit masks (default 0x2)
    ----------
    0x02: No unit mask
DCACHE_REPL: (counter: all)
    L1 data cache line replacements (min count: 500)
    Unit masks (default 0xf)
    ----------
    0x0f: No unit mask
DCACHE_M_REPL: (counter: all)
    L1 data M-state cache line allocated (min count: 500)
DCACHE_M_EVICT: (counter: all)
    L1 data M-state cache line evicted (min count: 500)
DCACHE_PEND_MISS: (counter: all)
    Weighted cycles of L1 miss outstanding (min count: 500)
    Unit masks (default 0x0)
    ----------
    0x00: Weighted cycles
    0x01: Duration of cycles
DTLB_MISS: (counter: all)
    Data references that missed TLB (min count: 500)
SSE_PREF_MISS: (counter: all)
    SSE instructions that missed all caches (min count: 500)
    Unit masks (default 0x0)
    ----------
    0x00: PREFETCHNTA
    0x01: PREFETCHT1
    0x02: PREFETCHT2
    0x03: SSE streaming store instructions
L1_PREF_REQ: (counter: all)
    L1 prefetch requests due to DCU cache misses (min count: 500)
BUS_REQ_OUTSTANDING: (counter: all)
    weighted number of outstanding bus requests (min count: 500)
BUS_BNR_DRV: (counter: all)
    External bus cycles this processor is driving BNR pin (min count: 500)
BUS_DRDY_CLOCKS: (counter: all)
    External bus cycles DRDY is asserted (min count: 500)
BUS_LOCK_CLOCKS: (counter: all)
    External bus cycles LOCK is asserted (min count: 500)
BUS_DATA_RCV: (counter: all)
    External bus cycles this processor is receiving data (min count: 500)
    Unit masks (default 0x40)
    ----------
    0x40: No unit mask
BUS_TRAN_BRD: (counter: all)
    number of burst read transactions (min count: 500)
BUS_TRAN_RFO: (counter: all)
    number of completed read for ownership transactions (min count: 500)
BUS_TRAN_WB: (counter: all)
    number of completed writeback transactions (min count: 500)
    Unit masks (default 0xc0)
    ----------
    0xc0: No unit mask
BUS_TRAN_IFETCH: (counter: all)
    number of completed instruction fetch transactions (min count: 500)
BUS_TRAN_INVAL: (counter: all)
    number of completed invalidate transactions (min count: 500)
BUS_TRAN_PWR: (counter: all)
    number of completed partial write transactions (min count: 500)
BUS_TRANS_P: (counter: all)
    number of completed partial transactions (min count: 500)
BUS_TRANS_IO: (counter: all)
    number of completed I/O transactions (min count: 500)
BUS_TRANS_DEF: (counter: all)
    number of completed defer transactions (min count: 500)
    Unit masks (default 0x20)
    ----------
    0x20: No unit mask
BUS_TRAN_BURST: (counter: all)
    number of completed burst transactions (min count: 500)
    Unit masks (default 0xc0)
    ----------
    0xc0: No unit mask
BUS_TRAN_MEM: (counter: all)
    number of completed memory transactions (min count: 500)
    Unit masks (default 0xc0)
    ----------
    0xc0: No unit mask
BUS_TRAN_ANY: (counter: all)
    number of any completed bus transactions (min count: 500)
    Unit masks (default 0xc0)
    ----------
    0xc0: No unit mask
BUS_SNOOPS: (counter: all)
    External bus cycles (min count: 500)
DCU_SNOOP_TO_SHARE: (counter: all)
    DCU snoops to share-state L1 cache line due to L1 misses (min count: 500)
    Unit masks (default 0x1)
    ----------
    0x01: No unit mask
BUS_NOT_IN_USE: (counter: all)
    Number of cycles there is no transaction from the core (min count: 500)
BUS_SNOOP_STALL: (counter: all)
    Number of bus cycles during bus snoop stall (min count: 500)
ICACHE_READS: (counter: all)
    number of instruction fetches (min count: 500)
ICACHE_MISSES: (counter: all)
    number of instruction fetch misses (min count: 500)
ITLB_MISS: (counter: all)
    number of ITLB misses (min count: 500)
IFU_MEM_STALL: (counter: all)
    cycles instruction fetch pipe is stalled (min count: 500)
ILD_STALL: (counter: all)
    cycles instruction length decoder is stalled (min count: 500)
BR_INST_EXEC: (counter: all)
    Branch instructions executed (not necessarily retired) (min count: 3000)
BR_MISSP_EXEC: (counter: all)
    Branch instructions executed that were mispredicted at execution (min count: 3000)
BR_BAC_MISSP_EXEC: (counter: all)
    Branch instructions executed that were mispredicted at Front End (BAC) (min count: 3000)
BR_CND_EXEC: (counter: all)
    Conditional Branch instructions executed (min count: 3000)
BR_CND_MISSP_EXEC: (counter: all)
    Conditional Branch instructions executed that were mispredicted (min count: 3000)
BR_IND_EXEC: (counter: all)
    Indirect Branch instructions executed (min count: 3000)
BR_IND_MISSP_EXEC: (counter: all)
    Indirect Branch instructions executed that were mispredicted (min count: 3000)
BR_RET_EXEC: (counter: all)
    Return Branch instructions executed (min count: 3000)
BR_RET_MISSP_EXEC: (counter: all)
    Return Branch instructions executed that were mispredicted at Execution (min count: 3000)
BR_RET_BAC_MISSP_EXEC: (counter: all)
    Branch instructions executed that were mispredicted at Front End (BAC) (min count: 3000)
BR_CALL_EXEC: (counter: all)
    CALL instruction executed (min count: 3000)
BR_CALL_MISSP_EXEC: (counter: all)
    CALL instruction executed and miss predicted (min count: 3000)
BR_IND_CALL_EXEC: (counter: all)
    Indirect CALL instruction executed (min count: 3000)
RESOURCE_STALLS: (counter: all)
    cycles during resource related stalls (min count: 500)
MMX_INSTR_EXEC: (counter: all)
    number of MMX instructions executed (not MOVQ and MOVD) (min count: 500)
SIMD_SAT_INSTR_EXEC: (counter: all)
    number of SIMD saturating instructions executed (min count: 3000)
MMX_INSTR_TYPE_EXEC: (counter: all)
    number of MMX packing instructions (min count: 3000)
    Unit masks (default 0x3f)
    ----------
    0x01: MMX packed multiplies
    0x02: MMX packed shifts
    0x04: MMX pack operations
    0x08: MMX unpack operations
    0x10: MMX packed logical
    0x20: MMX packed arithmetic
    0x3f: all of the above
COMP_FLOP_RET: (counter: 0)
    number of computational FP operations retired (min count: 3000)
UOPS_RETIRED: (counter: all)
    number of UOPs retired (min count: 6000)
SMC_DETECTED: (counter: all)
    number of times self-modifying code condition is detected (min count: 500)
BR_INST_RETIRED: (counter: all)
    number of branch instructions retired (min count: 500)
BR_MISS_PRED_RETIRED: (counter: all)
    number of mispredicted branches retired (min count: 500)
CYCLES_INT_MASKED: (counter: all)
    cycles interrupts are disabled (min count: 500)
CYCLES_INT_PENDING_AND_MASKED: (counter: all)
    cycles interrupts are disabled with pending interrupts (min count: 500)
HW_INT_RX: (counter: all)
    number of hardware interrupts received (min count: 500)
BR_TAKEN_RETIRED: (counter: all)
    number of taken branches retired (min count: 500)
BR_MISS_PRED_TAKEN_RET: (counter: all)
    number of taken mispredictions branches retired (min count: 500)
FP_MMX_TRANS: (counter: all)
    MMX-floating point transitions (min count: 3000)
    Unit masks (default 0x0)
    ----------
    0x00: MMX->float operations
    0x01: float->MMX operations
MMX_ASSIST: (counter: all)
    number of EMMS instructions executed (min count: 500)
MMX_INSTR_RET: (counter: all)
    number of MMX instructions retired (min count: 3000)
INST_DECODED: (counter: all)
    number of instructions decoded (min count: 6000)
ESP_UOPS: (counter: all)
    Number of ESP folding instructions decoded (min count: 3000)
EMON_SSE_SSE2_INST_RETIRED: (counter: all)
    Streaming SIMD Extensions Instructions Retired (min count: 3000)
    Unit masks (default 0x0)
    ----------
    0x00: SSE Packed Single
    0x01: SSE Scalar-Single
    0x02: SSE2 Packed-Double
    0x03: SSE2 Scalar-Double
EMON_SSE_SSE2_COMP_INST_RETIRED: (counter: all)
    Computational SSE Instructions Retired (min count: 3000)
    Unit masks (default 0x0)
    ----------
    0x00: SSE Packed Single
    0x01: SSE Scalar-Single
    0x02: SSE2 Packed-Double
    0x03: SSE2 Scalar-Double
EMON_FUSED_UOPS_RET: (counter: all)
    Number of retired fused micro-ops (min count: 3000)
    Unit masks (default 0x0)
    ----------
    0x00: All fused micro-ops
    0x01: Only load+Op micro-ops
    0x02: Only std+sta micro-ops
EMON_UNFUSION: (counter: all)
    Number of unfusion events in the ROB, happened on a FP exception to a fused uOp (min count: 3000)
BR_INST_DECODED: (counter: all)
    number of branch instructions decoded (min count: 500)
BTB_MISSES: (counter: all)
    number of branches that miss the BTB (min count: 500)
BR_BOGUS: (counter: all)
    number of bogus branches (min count: 500)
BACLEARS: (counter: all)
    number of times BACLEAR is asserted (min count: 500)
EMON_PREF_RQSTS_UP: (counter: all)
    Number of upward prefetches issued (min count: 3000)
EMON_PREF_RQSTS_DN: (counter: all)
    Number of downward prefetches issued (min count: 3000)
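Since none of the events above is literally named "L2 hit" or "L2 miss", here is a minimal sketch of one way the question could be approached, assuming (this is not stated in the event descriptions) that L2_RQSTS with unit mask 0x0f approximates total L2 requests and L2_LINES_IN approximates L2 misses. oprofile records one sample every COUNT events, where COUNT is the value given in --event=NAME:COUNT, so samples * COUNT estimates the raw event total. The struct and function names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical summary of L2 behaviour derived from two oprofile events. */
struct l2_estimate {
    uint64_t requests;  /* estimated L2_RQSTS events (all MESI states, mask 0x0f) */
    uint64_t misses;    /* estimated L2_LINES_IN events, used as a miss proxy */
    uint64_t hits;      /* requests - misses, clamped at zero */
    double miss_ratio;  /* misses / requests, or 0.0 when there were no requests */
};

/* oprofile takes one sample per `count` events, so samples * count
 * approximates the total number of events that occurred. */
uint64_t events_from_samples(uint64_t samples, uint64_t count)
{
    return samples * count;
}

/* ASSUMPTION: L2_LINES_IN counts lines filled into L2 after a miss.
 * Hardware prefetch fills also allocate lines, so this may overstate
 * demand misses; treat the result as an estimate, not a measurement. */
struct l2_estimate estimate_l2(uint64_t rqsts_samples, uint64_t rqsts_count,
                               uint64_t lines_in_samples, uint64_t lines_in_count)
{
    struct l2_estimate e;

    e.requests = events_from_samples(rqsts_samples, rqsts_count);
    e.misses = events_from_samples(lines_in_samples, lines_in_count);

    /* Sampling error can make the miss estimate exceed the request
     * estimate; clamp instead of letting the unsigned value wrap. */
    e.hits = e.requests > e.misses ? e.requests - e.misses : 0;
    e.miss_ratio = e.requests ? (double)e.misses / (double)e.requests : 0.0;
    return e;
}
```

For example, after profiling with 'opcontrol --event=L2_RQSTS:6000:0x0f --event=L2_LINES_IN:500', the per-binary sample totals that opreport shows for each event could be passed to estimate_l2 together with the COUNT values 6000 and 500.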