From: Benjamin L. <bc...@kv...> - 2006-08-22 19:30:59
|
Hello,

This patch adds events for the Core 2 CPUs to oprofile. It is diffed against 0.9.1, but I can rebase if that is desired. The events list should correspond to those listed in the upcoming revision of the Intel systems programming manuals. This code has been approved for public release under the GPL.

-ben

Signed-off-by: Benjamin LaHaise <ben...@in...>

diff -urN oprofile-0.9.1/events/i386/core_2/events oprofile-0.9.1.core_2/events/i386/core_2/events
--- oprofile-0.9.1/events/i386/core_2/events 1969-12-31 19:00:00.000000000 -0500
+++ oprofile-0.9.1.core_2/events/i386/core_2/events 2006-08-18 16:57:30.000000000 -0400
@@ -0,0 +1,133 @@
+# Core 2 events
+#
+# Architectural events
+#
+event:0x3c counters:0,1 um:nonhlt minimum:6000 name:CPU_CLK_UNHALTED : Clock cycles when not halted
+event:0xc0 counters:0,1 um:zero minimum:6000 name:INST_RETIRED.ANY_P : number of instructions retired
+event:0x2e counters:0,1 um:mesi minimum:6000 name:L2_RQSTS : number of L2 requests
+event:0x2e counters:0,1 um:x41 minimum:6000 name:L2_RQSTS.SELF.DEMAND.I_STATE : L2 cache demand requests from this core that missed the L2
+event:0x2e counters:0,1 um:x4f minimum:6000 name:L2_RQSTS.SELF.DEMAND.I_STATE : L2 cache demand requests from this core
+#
+# Model specific events
+#
+event:0x03 counters:0,1 um:load_block minimum:500 name:LOAD_BLOCK : events pertaining to loads
+event:0x04 counters:0,1 um:store_block minimum:500 name:STORE_BLOCK : events pertaining to stores
+event:0x05 counters:0,1 um:zero minimum:500 name:MISALIGN_MEM_REF : number of misaligned data memory references
+event:0x06 counters:0,1 um:zero minimum:500 name:SEGMENT_REG_LOADS : number of segment register loads
+event:0x07 counters:0,1 um:sse_prefetch minimum:500 name:SSE_PRE_EXEC : number of SSE pre-fetch/weakly ordered insns retired
+event:0x08 counters:0,1 um:dtlb_miss minimum:500 name:DTLB_MISSES : DTLB miss events
+event:0x09 counters:0,1 um:memory_dis minimum:1000 name:MEMORY_DISAMBIGUATION : Memory disambiguation reset cycles.
+event:0x0c counters:0,1 um:page_walks minimum:500 name:PAGE_WALKS : Page table walk events
+event:0x10 counters:0,1 um:zero minimum:3000 name:FLOPS : number of FP computational micro-ops executed
+event:0x11 counters:0,1 um:zero minimum:500 name:FP_ASSIST : number of FP assists
+event:0x12 counters:0,1 um:zero minimum:1000 name:MUL : number of multiplies
+event:0x13 counters:0,1 um:zero minimum:500 name:DIV : number of divides
+event:0x14 counters:0,1 um:zero minimum:1000 name:CYCLES_DIV_BUSY : cycles divider is busy
+event:0x18 counters:0,1 um:zero minimum:1000 name:IDLE_DURING_DIV : cycles divider is busy and all other execution units are idle.
+event:0x19 counters:0,1 um:delayed_bypass minimum:1000 name:DELAYED_BYPASS : Delayed bypass events
+event:0x21 counters:0,1 um:core minimum:500 name:L2_ADS : Cycles the L2 address bus is in use.
+event:0x23 counters:0,1 um:core minimum:500 name:L2_DBUS_BUSY_RD : Cycles the L2 transfers data to the core.
+event:0x24 counters:0,1 um:core_prefetch minimum:500 name:L2_LINES_IN : number of allocated lines in L2
+event:0x25 counters:0,1 um:core_prefetch minimum:500 name:L2_M_LINES_IN : number of modified lines allocated in L2
+event:0x26 counters:0,1 um:core_prefetch minimum:500 name:L2_LINES_OUT : number of recovered lines from L2
+event:0x27 counters:0,1 um:core_prefetch minimum:500 name:L2_M_LINES_OUT : number of modified lines removed from L2
+event:0x28 counters:0,1 um:core_mesi minimum:500 name:L2_IFETCH : number of L2 cacheable instruction fetches
+event:0x29 counters:0,1 um:core_prefetch_mesi minimum:500 name:L2_LD : number of L2 data loads
+event:0x2a counters:0,1 um:core_mesi minimum:500 name:L2_ST : number of L2 data stores
+event:0x2b counters:0,1 um:core_mesi minimum:500 name:L2_LOCK : number of locked L2 data accesses
+event:0x2e counters:0,1 um:core_prefetch_mesi minimum:500 name:L2_RQSTS : number of L2 cache requests
+event:0x30 counters:0,1 um:core_prefetch_mesi minimum:500 name:L2_REJECT_BUSQ : Rejected L2 cache requests
+event:0x32 counters:0,1 um:core minimum:500 name:L2_NO_REQ : Cycles no L2 cache requests are pending
+event:0x3a counters:0,1 um:zero minimum:500 name:EIST_TRANS_ALL : Intel(tm) Enhanced SpeedStep(r) Technology transitions
+event:0x3b counters:0,1 um:xc0 minimum:500 name:THERMAL_TRIP : Number of thermal trips
+event:0x40 counters:0,1 um:mesi minimum:500 name:L1D_CACHE_LD : L1 cacheable data read operations
+event:0x41 counters:0,1 um:mesi minimum:500 name:L1D_CACHE_ST : L1 cacheable data write operations
+event:0x42 counters:0,1 um:mesi minimum:500 name:L1D_CACHE_LOCK : L1 cacheable lock read operations
+event:0x42 counters:0,1 um:x10 minimum:500 name:L1D_CACHE_LOCK_DURATION : Duration of L1 data cacheable locked operations
+event:0x43 counters:0,1 um:x10 minimum:500 name:L1D_ALL_REF : All references to the L1 data cache
+event:0x43 counters:0,1 um:two minimum:500 name:L1D_ALL_CACHE_REF : L1 data cacheable reads and writes
+event:0x45 counters:0,1 um:x0f minimum:500 name:L1D_REPL : Cache lines allocated in the L1 data cache
+event:0x46 counters:0,1 um:zero minimum:500 name:L1D_M_REPL : Modified cache lines allocated in the L1 data cache
+event:0x47 counters:0,1 um:zero minimum:500 name:L1D_M_EVICT : Modified cache lines evicted from the L1 data cache
+event:0x48 counters:0,1 um:dc_pend_miss minimum:500 name:L1D_PEND_MISS : Weighted cycles of L1 miss outstanding
+event:0x49 counters:0,1 um:l1d_split minimum:500 name:L1D_SPLIT : Cache line split load/stores
+event:0x4b counters:0,1 um:sse_miss minimum:500 name:SSE_PREF_MISS : SSE instructions that missed all caches
+event:0x4c counters:0,1 um:zero minimum:500 name:LOAD_HIT_PRE : Load operations conflicting with a software prefetch to the same address
+event:0x4e counters:0,1 um:x10 minimum:500 name:L1D_PREFETCH : L1 data cache prefetch requests
+#
+event:0x60 counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_REQ_OUTSTANDING : Outstanding cacheable data read bus requests duration
+event:0x61 counters:0,1 um:bus_agents minimum:500 name:BUS_BNR_DRV : Number of Bus Not Ready signals asserted
+event:0x62 counters:0,1 um:bus_agents minimum:500 name:BUS_DRDY_CLOCKS : Bus cycles when data is sent on the bus
+event:0x63 counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_LOCK_CLOCKS : Bus cycles when a LOCK signal is asserted
+event:0x64 counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_DATA_RCV : Bus cycles while processor receives data
+event:0x65 counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_TRAN_BRD : Burst read bus transactions
+event:0x66 counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_TRAN_RFO : number of completed read for ownership transactions
+event:0x67 counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_TRAN_WB : number of explicit writeback bus transactions
+event:0x68 counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_TRAN_IFETCH : number of instruction fetch transactions
+event:0x69 counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_TRAN_INVAL : number of invalidate transactions
+event:0x6a counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_TRAN_PWR : number of partial write bus transactions
+event:0x6b counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_TRANS_P : number of partial bus transactions
+event:0x6c counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_TRANS_IO : number of I/O bus transactions
+event:0x6d counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_TRANS_DEF : number of completed defer transactions
+event:0x6e counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_TRAN_BURST : number of completed burst transactions
+event:0x6f counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_TRAN_MEM : number of completed memory transactions
+event:0x70 counters:0,1 um:core_and_bus_agents minimum:500 name:BUS_TRAN_ANY : number of any completed bus transactions
+event:0x77 counters:0,1 um:bus_agents_and_snoop minimum:500 name:EXT_SNOOP : External snoops
+event:0x78 counters:0,1 um:core_and_snoop minimum:500 name:CMP_SNOOP : L1 data cache is snooped by other core
+event:0x7a counters:0,1 um:bus_agents minimum:500 name:BUS_HIT_DRV : HIT signal asserted
+event:0x7b counters:0,1 um:bus_agents minimum:500 name:BUS_HITM_DRV : HITM signal asserted
+event:0x7d counters:0,1 um:core minimum:500 name:BUSQ_EMPTY : Bus queue is empty
+event:0x7e counters:0,1 um:core_and_bus_agents minimum:500 name:SNOOP_STALL_DRV : Bus stalled for snoops
+event:0x7f counters:0,1 um:core minimum:500 name:BUS_IO_WAIT : IO requests waiting in the bus queue
+event:0x80 counters:0,1 um:zero minimum:500 name:L1I_READS : number of instruction fetches
+event:0x81 counters:0,1 um:zero minimum:500 name:L1I_MISSES : number of instruction fetch misses
+event:0x82 counters:0,1 um:itlb_miss minimum:500 name:ITLB : number of ITLB misses
+event:0x83 counters:0,1 um:two minimum:500 name:INST_QUEUE.FULL : cycles during which the instruction queue is full
+event:0x86 counters:0,1 um:zero minimum:500 name:IFU_MEM_STALL : cycles instruction fetch pipe is stalled
+event:0x87 counters:0,1 um:zero minimum:500 name:ILD_STALL : cycles instruction length decoder is stalled
+event:0x88 counters:0,1 um:zero minimum:3000 name:BR_INST_EXEC : Branch instructions executed (not necessarily retired)
+event:0x89 counters:0,1 um:zero minimum:3000 name:BR_MISSP_EXEC : Branch instructions executed that were mispredicted at execution
+event:0x8a counters:0,1 um:zero minimum:3000 name:BR_BAC_MISSP_EXEC : Branch instructions executed that were mispredicted at Front End (BAC)
+event:0x8b counters:0,1 um:zero minimum:3000 name:BR_CND_EXEC : Conditional Branch instructions executed
+event:0x8c counters:0,1 um:zero minimum:3000 name:BR_CND_MISSP_EXEC : Conditional Branch instructions executed that were mispredicted
+event:0x8d counters:0,1 um:zero minimum:3000 name:BR_IND_EXEC : Indirect Branch instructions executed
+event:0x8e counters:0,1 um:zero minimum:3000 name:BR_IND_MISSP_EXEC : Indirect Branch instructions executed that were mispredicted
+event:0x8f counters:0,1 um:zero minimum:3000 name:BR_RET_EXEC : Return Branch instructions executed
+event:0x90 counters:0,1 um:zero minimum:3000 name:BR_RET_MISSP_EXEC : Return Branch instructions executed that were mispredicted at Execution
+event:0x91 counters:0,1 um:zero minimum:3000 name:BR_RET_BAC_MISSP_EXEC :Return Branch instructions executed that were mispredicted at Front End (BAC)
+event:0x92 counters:0,1 um:zero minimum:3000 name:BR_CALL_EXEC : CALL instruction executed
+event:0x93 counters:0,1 um:zero minimum:3000 name:BR_CALL_MISSP_EXEC : CALL instruction executed and miss predicted
+event:0x94 counters:0,1 um:zero minimum:3000 name:BR_IND_CALL_EXEC : Indirect CALL instruction executed
+event:0x97 counters:0,1 um:zero minimum:3000 name:BR_TKN_BUBBLE_1 : Branch predicted taken with bubble 1
+event:0x98 counters:0,1 um:zero minimum:3000 name:BR_TKN_BUBBLE_2 : Branch predicted taken with bubble 2
+event:0xa0 counters:0,1 um:zero minimum:1000 name:RS_UOPS_DISPATCHED : Micro-ops dispatched for execution
+event:0xaa counters:0,1 um:macro_insts minimum:500 name:MACRO_INSTS : instructions decoded
+event:0xab counters:0,1 um:esp minimum:500 name:ESP : ESP register events
+event:0xb0 counters:0,1 um:zero minimum:500 name:SIMD_UOPS_EXEC : SIMD micro-ops executed (excluding stores)
+event:0xb1 counters:0,1 um:zero minimum:3000 name:SIMD_SAT_UOP_EXEC : number of SIMD saturating instructions executed
+event:0xb3 counters:0,1 um:simd_instr_type_exec minimum:3000 name:SIMD_UOP_TYPE_EXEC : number of SIMD packing instructions
+event:0xc0 counters:0,1 um:inst_retired minimum:6000 name:INST_RETIRED : number of instructions retired
+event:0xc1 counters:0,1 um:x87_ops_retired minimum:500 name:X87_OPS_RETIRED : number of computational FP operations retired
+event:0xc2 counters:0,1 um:uops_retired minimum:6000 name:UOPS_RETIRED : number of UOPs retired
+event:0xc3 counters:0,1 um:machine_nukes minimum:500 name:MACHINE_NUKES.SMC : number of pipeline flushing events
+event:0xc4 counters:0,1 um:br_inst_retired minimum:500 name:BR_INST_RETIRED : number of branch instructions retired
+event:0xc5 counters:0,1 um:zero minimum:500 name:BR_MISS_PRED_RETIRED : number of mispredicted branches retired (precise)
+event:0xc6 counters:0,1 um:cycles_int_masked minimum:500 name:CYCLES_INT_MASKED : cycles interrupts are disabled
+event:0xc7 counters:0,1 um:simd_inst_retired minimum:500 name:SIMD_INST_RETIRED : SSE/SSE2 instructions retired
+event:0xc8 counters:0,1 um:zero minimum:500 name:HW_INT_RCV : number of hardware interrupts received
+event:0xc9 counters:0,1 um:zero minimum:500 name:ITLB_MISS_RETIRED : Retired instructions that missed the ITLB
+event:0xca counters:0,1 um:simd_comp_inst_retired minimum:500 name:SIMD_COMP_INST_RETIRED : Retired computational SSE/SSE2 instructions
+event:0xcb counters:0,1 um:mem_load_retired minimum:500 name:MEM_LOAD_RETIRED : Retired loads
+event:0xcc counters:0,1 um:mmx_trans minimum:3000 name:FP_MMX_TRANS : MMX-floating point transitions
+event:0xcd counters:0,1 um:zero minimum:500 name:MMX_ASSIST : number of EMMS instructions executed
+event:0xce counters:0,1 um:zero minimum:500 name:SIMD_INSTR_RET : number of SIMD instructions retired
+event:0xcf counters:0,1 um:zero minimum:500 name:SIMD_SAT_INSTR_RET : number of saturated arithmetic instructions retired
+event:0xd2 counters:0,1 um:rat_stalls minimum:6000 name:RAT_STALLS : Partial register stall cycles
+event:0xd4 counters:0,1 um:seg_regs minimum:500 name:SEG_RENAME_STALLS : Segment rename stalls
+event:0xd5 counters:0,1 um:seg_regs minimum:500 name:SEG_RENAMES : Segment renames
+event:0xdc counters:0,1 um:resource_stalls minimum:3000 name:RESOURCE_STALLS : Cycles during which resource stalls occur
+event:0xe0 counters:0,1 um:zero minimum:500 name:BR_INST_DECODED : number of branch instructions decoded
+event:0xe4 counters:0,1 um:zero minimum:500 name:BR_BOGUS : number of bogus branches
+event:0xe6 counters:0,1 um:zero minimum:500 name:BACLEARS : number of times BACLEAR is asserted
+event:0xf0 counters:0,1 um:zero minimum:3000 name:PREF_RQSTS_UP : Number of upward prefetches issued
+event:0xf8 counters:0,1 um:zero minimum:3000 name:PREF_RQSTS_DN : Number of downward prefetches issued
diff -urN oprofile-0.9.1/events/i386/core_2/unit_masks oprofile-0.9.1.core_2/events/i386/core_2/unit_masks
--- oprofile-0.9.1/events/i386/core_2/unit_masks 1969-12-31 19:00:00.000000000 -0500
+++ oprofile-0.9.1.core_2/events/i386/core_2/unit_masks 2006-08-18 16:57:35.000000000 -0400
@@ -0,0 +1,197 @@
+# Core 2 possible unit masks
+#
+name:zero type:mandatory default:0x0
+	0x0 No unit mask
+#name:one type:mandatory default:0x1
+#	0x1 No unit mask
+name:two type:mandatory default:0x2
+	0x2 No unit mask
+name:x0f type:mandatory default:0xf
+	0xf No unit mask
+name:x10 type:mandatory default:0x10
+	0x10 No unit mask
+#name:x20 type:mandatory default:0x20
+#	0x20 No unit mask
+#name:x40 type:mandatory default:0x40
+#	0x40 No unit mask
+name:x41 type:mandatory default:0x41
+	0x41 No unit mask
+name:x4f type:mandatory default:0x4f
+	0x4f No unit mask
+name:xc0 type:mandatory default:0xc0
+	0xc0 No unit mask
+name:nonhlt type:exclusive default:0x0
+	0x0 Unhalted core cycles
+	0x1 Unhalted bus cycles
+	0x2 Unhalted bus cycles of this core while the other core is halted
+name:mesi type:bitmask default:0x0f00
+	0x0800 (M)ESI: Modified
+	0x0400 M(E)SI: Exclusive
+	0x0200 ME(S)I: Shared
+	0x0100 MES(I): Invalid
+name:sse_prefetch type:exclusive default:0x0
+	0x00 prefetch NTA instructions executed.
+	0x01 prefetch T1 instructions executed.
+	0x02 prefetch T1 and T2 instructions executed.
+	0x03 SSE weakly-ordered stores
+name:simd_instr_type_exec type:bitmask default:0x3f
+	0x01 SIMD packed multiplies
+	0x02 SIMD packed shifts
+	0x04 SIMD pack operations
+	0x08 SIMD unpack operations
+	0x10 SIMD packed logical
+	0x20 SIMD packed arithmetic
+	0x3f all of the above
+name:mmx_trans type:exclusive default:0x0
+	0x00 MMX->float operations
+	0x01 float->MMX operations
+name:dc_pend_miss type:exclusive default:0x0
+	0x00 Weighted cycles
+	0x01 Duration of cycles
+name:sse_miss type:exclusive default:0x0
+	0x00 PREFETCHNTA
+	0x01 PREFETCHT0
+	0x02 PREFETCHT1/PREFETCHT2
+name:load_block type:bitmask default:0x3e
+	0x02 STA Loads blocked by a preceding store with unknown address.
+	0x04 STD Loads blocked by a preceding store with unknown data.
+	0x08 OVERLAP_STORE Loads that partially overlap an earlier store, or 4K aliased with a previous store.
+	0x10 UNTIL_RETIRE Loads blocked until retirement.
+	0x20 L1D Loads blocked by the L1 data cache.
+name:store_block type:bitmask default:0x0b
+	0x01 SB_DRAIN_CYCLES Cycles while stores are blocked due to store buffer drain.
+	0x02 ORDER Cycles while store is waiting for a preceding store to be globally observed.
+	0x08 NOOP A store is blocked due to a conflict with an external or internal snoop.
+name:dtlb_miss type:bitmask default:0x0f
+	0x01 ANY Memory accesses that missed the DTLB.
+	0x02 MISS_LD DTLB misses due to load operations.
+	0x04 L0_MISS_LD L0 DTLB misses due to load operations.
+	0x08 MISS_ST TLB misses due to store operations.
+name:memory_dis type:exclusive default:0x01
+	0x01 RESET Memory disambiguation reset cycles.
+	0x02 SUCCESS Number of loads that were successfully disambiguated.
+name:page_walks type:exclusive default:0x02
+	0x01 COUNT Number of page-walks executed.
+	0x02 CYCLES Duration of page-walks in core cycles.
+name:delayed_bypass type:exclusive default:0x00
+	0x00 FP Delayed bypass to FP operation.
+	0x01 SIMD Delayed bypass to SIMD operation.
+	0x02 LOAD Delayed bypass to load operation.
+name:core type:exclusive default:0x4000
+	0xc000 All cores
+	0x4000 This core
+name:core_prefetch type:bitmask default:0xf000
+	0xc000 core: all cores
+	0x4000 core: this core
+	0x3000 prefetch: all inclusive
+	0x1000 prefetch: Hardware prefetch only
+	0x0000 prefetch: exclude hardware prefetch
+name:core_mesi type:bitmask default:0xcf00
+	0xc000 core: all cores
+	0x4000 core: this core
+	0x0800 (M)ESI: Modified
+	0x0400 M(E)SI: Exclusive
+	0x0200 ME(S)I: Shared
+	0x0100 MES(I): Invalid
+name:core_prefetch_mesi type:bitmask default:0xff00
+	0xc000 core: all cores
+	0x4000 core: this core
+	0x3000 prefetch: all inclusive
+	0x1000 prefetch: Hardware prefetch only
+	0x0000 prefetch: exclude hardware prefetch
+	0x0800 (M)ESI: Modified
+	0x0400 M(E)SI: Exclusive
+	0x0200 ME(S)I: Shared
+	0x0100 MES(I): Invalid
+name:l1d_split type:exclusive default:0x1
+	0x1 split loads
+	0x2 split stores
+name:bus_agents type:exclusive default:0x0000
+	0x0000 this agent
+	0x1000 include all agents
+name:core_and_bus_agents type:bitmask default:0xc000
+	0xc000 core: all cores
+	0x4000 core: this core
+	0x0000 bus: this agent
+	0x1000 bus: include all agents
+name:bus_agents_and_snoop type:bitmask default:0x1300
+	0x0000 bus: this agent
+	0x1000 bus: include all agents
+	0x0100 snoop: CMP2I snoops
+	0x0200 snoop: CMP2S snoops
+name:core_and_snoop type:bitmask default:0xc000
+	0xc000 core: all cores
+	0x4000 core: this core
+	0x0100 snoop: CMP2I snoops
+	0x0200 snoop: CMP2S snoops
+name:itlb_miss type:bitmask default:0x12
+	0x02 ITLB small page misses
+	0x10 ITLB large page misses
+	0x40 ITLB flushes
+name:macro_insts type:bitmask default:0x09
+	0x01 Instructions decoded
+	0x08 CISC Instructions decoded
+name:esp type:bitmask default:0x01
+	0x01 ESP register content synchronizations
+	0x02 ESP register automatic additions
+name:inst_retired type:bitmask default:0x00
+	0x00 Any
+	0x01 Loads
+	0x02 Stores
+	0x04 Other
+name:x87_ops_retired type:exclusive default:0xfe
+	0x01 FXCH instructions retired
+	0xfe Retired floating-point computational operations (precise)
+name:uops_retired type:bitmask default:0x0f
+	0x01 Fused load+op or load+indirect branch retired
+	0x02 Fused store address + data retired
+	0x04 Retired instruction pairs fused into one micro-op
+	0x07 Fused micro-ops retired
+	0x08 Non-fused micro-ops retired
+	0x0f Micro-ops retired
+name:machine_nukes type:bitmask default:0x05
+	0x01 Self-Modifying Code detected
+	0x04 Execution pipeline restart due to memory ordering conflict or memory disambiguation misprediction
+name:br_inst_retired type:bitmask default:0xa
+	0x01 predicted not-taken
+	0x02 mispredicted not-taken
+	0x04 predicted taken
+	0x08 mispredicted taken
+name:cycles_int_masked type:exclusive default:0x02
+	0x01 Interrupts disabled
+	0x02 Interrupts pending and disabled
+name:simd_inst_retired type:bitmask default:0x1f
+	0x01 Retired SSE packed-single instructions
+	0x02 Retired SSE scalar-single instructions
+	0x04 Retired SSE2 packed-double instructions
+	0x08 Retired SSE2 scalar-double instructions
+	0x10 Retired SSE2 vector integer instructions
+	0x1f Retired Streaming SIMD instructions (precise event)
+name:simd_comp_inst_retired type:bitmask default:0xf
+	0x01 Retired computational SSE packed-single instructions
+	0x02 Retired computational SSE scalar-single instructions
+	0x04 Retired computational SSE2 packed-double instructions
+	0x08 Retired computational SSE2 scalar-double instructions
+name:mem_load_retired type:exclusive default:0x01
+	0x01 Retired loads that miss the L1 data cache (precise event)
+	0x02 L1 data cache line missed by retired loads (precise event)
+	0x04 Retired loads that miss the L2 cache (precise event)
+	0x08 L2 cache line missed by retired loads (precise event)
+	0x10 Retired loads that miss the DTLB (precise event)
+name:rat_stalls type:bitmask default:0xf
+	0x01 ROB read port
+	0x02 Partial register
+	0x04 Flag
+	0x08 FPU status word
+	0x0f All RAT
+name:seg_regs type:bitmask default:0x0f
+	0x01 ES
+	0x02 DS
+	0x04 FS
+	0x08 GS
+name:resource_stalls type:bitmask default:0x0f
+	0x01 when the ROB is full
+	0x02 during which the RS is full
+	0x04 during which the pipeline has exceeded the load or store limit or is waiting to commit all stores
+	0x08 due to FPU control word write
+	0x10 due to branch misprediction
diff -urN oprofile-0.9.1/events/Makefile.am oprofile-0.9.1.core_2/events/Makefile.am
--- oprofile-0.9.1/events/Makefile.am 2005-07-11 16:46:23.000000000 -0400
+++ oprofile-0.9.1.core_2/events/Makefile.am 2006-07-27 07:11:13.000000000 -0400
@@ -5,6 +5,7 @@
 	alpha/ev6/events alpha/ev6/unit_masks \
 	alpha/pca56/events alpha/pca56/unit_masks \
 	i386/athlon/events i386/athlon/unit_masks \
+	i386/core_2/events i386/core_2/unit_masks \
 	i386/p4/events i386/p4-ht/events \
 	i386/p4-ht/unit_masks i386/p4/unit_masks \
 	i386/pii/events i386/pii/unit_masks \
diff -urN oprofile-0.9.1/events/Makefile.in oprofile-0.9.1.core_2/events/Makefile.in
--- oprofile-0.9.1/events/Makefile.in 2006-08-18 16:59:54.000000000 -0400
+++ oprofile-0.9.1.core_2/events/Makefile.in 2006-07-27 07:11:17.000000000 -0400
@@ -186,6 +186,7 @@
 	alpha/ev6/events alpha/ev6/unit_masks \
 	alpha/pca56/events alpha/pca56/unit_masks \
 	i386/athlon/events i386/athlon/unit_masks \
+	i386/core_2/events i386/core_2/unit_masks \
 	i386/p4/events i386/p4-ht/events \
 	i386/p4-ht/unit_masks i386/p4/unit_masks \
 	i386/pii/events i386/pii/unit_masks \
@@ -254,7 +255,7 @@
 distdir: $(DISTFILES)
-	$(mkdir_p) $(distdir)/alpha/ev4 $(distdir)/alpha/ev5 $(distdir)/alpha/ev6 $(distdir)/alpha/ev67 $(distdir)/alpha/pca56 $(distdir)/arm/xscale1 $(distdir)/arm/xscale2 $(distdir)/i386/athlon $(distdir)/i386/p4 $(distdir)/i386/p4-ht $(distdir)/i386/p6_mobile $(distdir)/i386/pii $(distdir)/i386/piii $(distdir)/i386/ppro $(distdir)/ia64/ia64 $(distdir)/ia64/itanium $(distdir)/ia64/itanium2 $(distdir)/mips/24K $(distdir)/mips/r10000 $(distdir)/mips/r12000 $(distdir)/mips/rm7000 $(distdir)/mips/rm9000 $(distdir)/mips/sb1 $(distdir)/mips/vr5432 $(distdir)/mips/vr5500 $(distdir)/ppc/e500 $(distdir)/ppc64/970 $(distdir)/ppc64/power4 $(distdir)/ppc64/power5 $(distdir)/rtc $(distdir)/x86-64/hammer
+	$(mkdir_p) $(distdir)/alpha/ev4 $(distdir)/alpha/ev5 $(distdir)/alpha/ev6 $(distdir)/alpha/ev67 $(distdir)/alpha/pca56 $(distdir)/arm/xscale1 $(distdir)/arm/xscale2 $(distdir)/i386/athlon $(distdir)/i386/core_2 $(distdir)/i386/p4 $(distdir)/i386/p4-ht $(distdir)/i386/p6_mobile $(distdir)/i386/pii $(distdir)/i386/piii $(distdir)/i386/ppro $(distdir)/ia64/ia64 $(distdir)/ia64/itanium $(distdir)/ia64/itanium2 $(distdir)/mips/24K $(distdir)/mips/r10000 $(distdir)/mips/r12000 $(distdir)/mips/rm7000 $(distdir)/mips/rm9000 $(distdir)/mips/sb1 $(distdir)/mips/vr5432 $(distdir)/mips/vr5500 $(distdir)/ppc/e500 $(distdir)/ppc64/970 $(distdir)/ppc64/power4 $(distdir)/ppc64/power5 $(distdir)/rtc $(distdir)/x86-64/hammer
 	@srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \
 	topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \
 	list='$(DISTFILES)'; for file in $$list; do \
diff -urN oprofile-0.9.1/libop/op_cpu_type.c oprofile-0.9.1.core_2/libop/op_cpu_type.c
--- oprofile-0.9.1/libop/op_cpu_type.c 2005-07-11 16:46:23.000000000 -0400
+++ oprofile-0.9.1.core_2/libop/op_cpu_type.c 2006-07-27 07:11:13.000000000 -0400
@@ -55,6 +55,7 @@
 	{ "NEC VR5432", "mips/vr5432", CPU_MIPS_VR5432, 2 },
 	{ "NEC VR5500", "mips/vr5500", CPU_MIPS_VR5500, 2 },
 	{ "e500", "ppc/e500", CPU_PPC_E500, 4 },
+	{ "Core 2", "i386/core_2", CPU_CORE_2, 2 },
 };
 
 static size_t const nr_cpu_descrs = sizeof(cpu_descrs) / sizeof(struct cpu_descr);
diff -urN oprofile-0.9.1/libop/op_cpu_type.h oprofile-0.9.1.core_2/libop/op_cpu_type.h
--- oprofile-0.9.1/libop/op_cpu_type.h 2005-07-11 16:46:23.000000000 -0400
+++ oprofile-0.9.1.core_2/libop/op_cpu_type.h 2006-07-27 07:11:13.000000000 -0400
@@ -51,6 +51,7 @@
 	CPU_MIPS_VR5432, /**< NEC VR5432 */
 	CPU_MIPS_VR5500, /**< MIPS VR5500, VR5532 and VR7701 */
 	CPU_PPC_E500, /**< e500 */
+	CPU_CORE_2, /**< Intel Core 2 */
 	MAX_CPU_TYPE
 } op_cpu;
|
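Each line of the events file in this patch uses a fixed field order: `event:` (hex event code), `counters:`, `um:` (the `name:` of an entry in `unit_masks`), `minimum:` (minimum sample count), `name:`, and a free-text description after ` : `. The C sketch below splits one such line into those fields. It is a hypothetical helper written for illustration, not oprofile's own parser in libop, and it assumes the single-space, fixed-order layout used above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one line of events/i386/core_2/events. */
struct event_entry {
    unsigned int code;     /* event:0x.. (hex event code) */
    char counters[16];     /* counters:0,1 */
    char um[32];           /* um:<unit mask name from unit_masks> */
    unsigned int minimum;  /* minimum:<minimum sample count> */
    char name[64];         /* name:<EVENT_NAME> */
    char desc[128];        /* free text after the " : " separator */
};

/*
 * Parse a line such as
 *   event:0x3c counters:0,1 um:nonhlt minimum:6000 name:CPU_CLK_UNHALTED : Clock cycles when not halted
 * Returns 0 on success, -1 if a field is missing or out of order.
 */
static int parse_event_line(const char *line, struct event_entry *e)
{
    int desc_off = 0;

    assert(line != NULL && e != NULL);
    memset(e, 0, sizeof *e);

    /* %x accepts the optional 0x prefix; %n records where the description starts. */
    if (sscanf(line, "event:%x counters:%15s um:%31s minimum:%u name:%63s : %n",
               &e->code, e->counters, e->um, &e->minimum, e->name, &desc_off) != 5
        || desc_off == 0)
        return -1;

    /* Copy the description; the memset above guarantees NUL termination. */
    strncpy(e->desc, line + desc_off, sizeof e->desc - 1);
    return 0;
}
```

Comment lines starting with `#` simply fail to parse here; a real loader would skip them explicitly.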
From: William C. <wc...@re...> - 2006-09-06 14:44:21
|
Benjamin LaHaise wrote:
> Hello,
>
> This patch adds events for the Core 2 CPUs to oprofile. It is diffed against
> 0.9.1, but I can rebase if that is desired. The events list should
> correspond to those listed in the upcoming revision of the Intel systems
> programming manuals. This code has been approved for public release under
> the GPL.
>
> -ben

Hi Ben,

How much testing have these files gotten? I don't seem to be getting samples for many of the events, e.g.

opcontrol --setup --no-vmlinux --event=L1D_CACHE_LD:6000:0x0f00 --event=L1D_CACHE_ST:6000:0x0f00

opcontrol --setup --no-vmlinux --event=L2_LD:6000:0x4f00 --event=L2_ST:6000:0x4f00

The sampling mechanism is working; the following setup definitely records data:

opcontrol --setup --no-vmlinux --event=INST_RETIRED:500000:0 --event=CPU_CLK_UNHALTED:500000:0

> Signed-off-by: Benjamin LaHaise <ben...@in...>
> diff -urN oprofile-0.9.1/events/i386/core_2/events oprofile-0.9.1.core_2/events/i386/core_2/events
> --- oprofile-0.9.1/events/i386/core_2/events 1969-12-31 19:00:00.000000000 -0500
> +++ oprofile-0.9.1.core_2/events/i386/core_2/events 2006-08-18 16:57:30.000000000 -0400
> +event:0x2e counters:0,1 um:mesi minimum:6000 name:L2_RQSTS : number of L2 requests
> +event:0x2e counters:0,1 um:x41 minimum:6000 name:L2_RQSTS.SELF.DEMAND.I_STATE : L2 cache demand requests from this core that missed the L2
> +event:0x2e counters:0,1 um:x4f minimum:6000 name:L2_RQSTS.SELF.DEMAND.I_STATE : L2 cache demand requests from this core

Events should have unique names. The second L2_RQSTS.SELF.DEMAND.I_STATE is not going to be seen in the file. Were these two events supposed to have different names?

-Will
|
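The unit masks in Will's commands are built from the bit values in the patch's `unit_masks` file. For `L2_ST`, whose entry uses `um:core_mesi`, 0x4f00 is the "this core" bit (0x4000) ORed with all four MESI bits (0x0f00), and 0xcf00 (all cores plus all MESI states) is that entry's default. The C sketch below only reproduces that arithmetic with the values copied from the patch; the macro and function names are hypothetical, and the sketch says nothing about whether the hardware accepts these encodings.

```c
#include <assert.h>

/* Bit values copied from the core_mesi entry of the patch's unit_masks file.
   The macro names are hypothetical. */
#define UM_CORE_ALL   0xc000u /* core: all cores */
#define UM_CORE_SELF  0x4000u /* core: this core */
#define UM_MESI_M     0x0800u /* (M)ESI: Modified */
#define UM_MESI_E     0x0400u /* M(E)SI: Exclusive */
#define UM_MESI_S     0x0200u /* ME(S)I: Shared */
#define UM_MESI_I     0x0100u /* MES(I): Invalid */
#define UM_MESI_ALL   (UM_MESI_M | UM_MESI_E | UM_MESI_S | UM_MESI_I)

/* OR one core selector with any subset of the MESI state bits. */
static unsigned int core_mesi_mask(unsigned int core, unsigned int mesi)
{
    assert(core == UM_CORE_ALL || core == UM_CORE_SELF);
    assert((mesi & ~UM_MESI_ALL) == 0); /* reject bits outside the MESI field */
    return core | mesi;
}
```

For example, `core_mesi_mask(UM_CORE_SELF, UM_MESI_ALL)` yields 0x4f00, the value passed in `--event=L2_ST:6000:0x4f00`.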
From: Benjamin L. <bc...@kv...> - 2006-09-06 15:25:16
|
On Wed, Sep 06, 2006 at 10:44:18AM -0400, William Cohen wrote:
> How much testing have these files gotten? I don't seem to be getting
> sample for many of the events, e.g.

Some -- I checked a random set of events on the machine I had and went over the spec a couple of times to make sure the events were correctly numbered.

> opcontrol --setup --no-vmlinux --event=L2_LD:6000:0x4f00
> --event=L2_ST:6000:0x4f00

According to the docs I have, that event should be correct, but I'm seeing the same behaviour you are. I'll ask around.

> Events should have unique names The second L2_RQSTS.SELF.DEMAND.I_STATE
> is not going to be seen in the file. Were these two events suppose to
> have different names?

Whoops, I noticed that the first one should be _L2. The . -> _ patch has a fix for that one, too.

-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <do...@kv...>.
|
From: Benjamin L. <bc...@kv...> - 2006-09-06 15:01:56
|
Below is an incremental patch over the Core 2 events, as further testing showed that those events could not be enabled.

Signed-off-by: Benjamin LaHaise <ben...@in...>

diff -ur oprofile-0.9.1.old/events/i386/core_2/events oprofile-0.9.1/events/i386/core_2/events
--- oprofile-0.9.1.old/events/i386/core_2/events 2006-09-06 10:33:34.000000000 -0400
+++ oprofile-0.9.1/events/i386/core_2/events 2006-09-06 10:43:22.000000000 -0400
@@ -3,10 +3,10 @@
 # Architectural events
 #
 event:0x3c counters:0,1 um:nonhlt minimum:6000 name:CPU_CLK_UNHALTED : Clock cycles when not halted
-event:0xc0 counters:0,1 um:zero minimum:6000 name:INST_RETIRED.ANY_P : number of instructions retired
+event:0xc0 counters:0,1 um:zero minimum:6000 name:INST_RETIRED_ANY_P : number of instructions retired
 event:0x2e counters:0,1 um:mesi minimum:6000 name:L2_RQSTS : number of L2 requests
-event:0x2e counters:0,1 um:x41 minimum:6000 name:L2_RQSTS.SELF.DEMAND.I_STATE : L2 cache demand requests from this core that missed the L2
-event:0x2e counters:0,1 um:x4f minimum:6000 name:L2_RQSTS.SELF.DEMAND.I_STATE : L2 cache demand requests from this core
+event:0x2e counters:0,1 um:x41 minimum:6000 name:L2_RQSTS_SELF_DEMAND_I_STATE_L2 : L2 cache demand requests from this core that missed the L2
+event:0x2e counters:0,1 um:x4f minimum:6000 name:L2_RQSTS_SELF_DEMAND_I_STATE : L2 cache demand requests from this core
 #
 # Model specific events
 #
@@ -82,7 +82,7 @@
 event:0x80 counters:0,1 um:zero minimum:500 name:L1I_READS : number of instruction fetches
 event:0x81 counters:0,1 um:zero minimum:500 name:L1I_MISSES : number of instruction fetch misses
 event:0x82 counters:0,1 um:itlb_miss minimum:500 name:ITLB : number of ITLB misses
-event:0x83 counters:0,1 um:two minimum:500 name:INST_QUEUE.FULL : cycles during which the instruction queue is full
+event:0x83 counters:0,1 um:two minimum:500 name:INST_QUEUE_FULL : cycles during which the instruction queue is full
 event:0x86 counters:0,1 um:zero minimum:500 name:IFU_MEM_STALL : cycles instruction fetch pipe is stalled
 event:0x87 counters:0,1 um:zero minimum:500 name:ILD_STALL : cycles instruction length decoder is stalled
 event:0x88 counters:0,1 um:zero minimum:3000 name:BR_INST_EXEC : Branch instructions executed (not necessarily retired)
@@ -109,7 +109,7 @@
 event:0xc0 counters:0,1 um:inst_retired minimum:6000 name:INST_RETIRED : number of instructions retired
 event:0xc1 counters:0,1 um:x87_ops_retired minimum:500 name:X87_OPS_RETIRED : number of computational FP operations retired
 event:0xc2 counters:0,1 um:uops_retired minimum:6000 name:UOPS_RETIRED : number of UOPs retired
-event:0xc3 counters:0,1 um:machine_nukes minimum:500 name:MACHINE_NUKES.SMC : number of pipeline flushing events
+event:0xc3 counters:0,1 um:machine_nukes minimum:500 name:MACHINE_NUKES_SMC : number of pipeline flushing events
 event:0xc4 counters:0,1 um:br_inst_retired minimum:500 name:BR_INST_RETIRED : number of branch instructions retired
 event:0xc5 counters:0,1 um:zero minimum:500 name:BR_MISS_PRED_RETIRED : number of mispredicted branches retired (precise)
 event:0xc6 counters:0,1 um:cycles_int_masked minimum:500 name:CYCLES_INT_MASKED : cycles interrupts are disabled
|
From: John L. <le...@mo...> - 2006-09-07 00:30:13
|
On Wed, Sep 06, 2006 at 11:01:41AM -0400, Benjamin LaHaise wrote:
> Below is an incremental patch over the Core 2 events, as further testing
> showed that those events could not be enabled.
>

Will - this is OK for 0.9.2 if it works OK in testing.

(I wonder why 'make check' isn't picking up the naming problems? It probably should.)

regards
john
|
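One hypothetical form of the check John asks about is a duplicate-name scan over the `name:` fields of an events file, sketched below in C. Run over the original patch, it would report the second `L2_RQSTS.SELF.DEMAND.I_STATE`; it would also report `L2_RQSTS`, which appears in both the architectural section (`um:mesi`) and the model-specific section (`um:core_prefetch_mesi`), even after the incremental patch. The function name is made up, and whether `make check` should run something like this is left open in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the first name that repeats an earlier one, or -1 if
   all names are unique. Quadratic, which is fine for ~130 entries. */
static long first_duplicate(const char *const names[], size_t n)
{
    assert(names != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (strcmp(names[i], names[j]) == 0)
                return (long)i;
    return -1;
}
```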
From: William C. <wc...@re...> - 2006-09-07 15:37:00
|
John Levon wrote: > On Wed, Sep 06, 2006 at 11:01:41AM -0400, Benjamin LaHaise wrote: > > >>Below is an incremental patch over the Core 2 events, as further testing >>showed that those events could not be enabled. >> > > > Will - this is OK for 0.9.2 if it works OK in testing. This patch doesn't seem to be working on my test machine: /usr/local/bin/opcontrol --verbose=all --setup --no-vmlinux \ --event=L2_RQSTS_SELF_DEMAND_I_STATE:6000 Invalid unit mask 0x4f for event L2_RQSTS_SELF_DEMAND_I_STATE I have tried other events with mandatory unit masks and they seem to work (e.g. THERMAL_TRIP). It looks like ophelp is having a problem. opcontrol passes the following in for L2_RQSTS_SELF_DEMAND_I_STATE /usr/local/bin/ophelp --check-events L2_RQSTS_SELF_DEMAND_I_STATE:6000:79:1:1 --callgraph=0 Invalid unit mask 0x4f for event L2_RQSTS_SELF_DEMAND_I_STATE 79 is 0x4f, so that appears to be right. -Will > (I wonder why 'make check' isn't picking up the naming problems? It > probably should.) Does the 'make check' have a test to look at the names in the event files? I didn't think it did that type of checking. > > regards > john > > ------------------------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? > Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > oprofile-list mailing list > opr...@li... > https://lists.sourceforge.net/lists/listinfo/oprofile-list |
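The argument that opcontrol passes to ophelp packs all of the event settings into colon-separated fields. A small parser makes that layout explicit. This sketch assumes the field order name:count:unitmask:kernel:user, which matches the example above; the names parse_event_spec and EventSpec are hypothetical.

```python
from typing import NamedTuple


class EventSpec(NamedTuple):
    name: str
    count: int
    unit_mask: int
    kernel: bool
    user: bool


def parse_event_spec(spec):
    """Split an assumed 'name:count:unitmask:kernel:user' string."""
    name, count, unit_mask, kernel, user = spec.split(":")
    return EventSpec(
        name=name,
        count=int(count),
        # base 0 accepts both decimal ("79") and hex ("0x4f") input
        unit_mask=int(unit_mask, 0),
        kernel=kernel == "1",
        user=user == "1",
    )


spec = parse_event_spec("L2_RQSTS_SELF_DEMAND_I_STATE:6000:79:1:1")
print(hex(spec.unit_mask))  # 0x4f
```

The decimal 79 in the third field is the same value as the 0x4f in the error message, so the unit mask itself is passed through correctly.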
From: Benjamin L. <bc...@kv...> - 2006-09-07 16:03:09
|
On Thu, Sep 07, 2006 at 11:53:14AM -0400, William Cohen wrote: > These unit masks don't match, causing the failure. Given the multiple > event names with different unit masks mappings to the same event number. > This is going to be a problem. The entries do indeed have the same event number. The entry+unit masks in question are the architectural definitions, which are defined for any future CPUs with family == 6. That they overlap with another currently defined event is intentional, but I'm not sure if this is the best way to handle things. -ben -- "Time is of no importance, Mr. President, only life is important." Don't Email: <do...@kv...>. |
From: William C. <wc...@re...> - 2006-09-08 21:18:16
Attachments:
core_2_8bit_um_names.diff
|
Benjamin LaHaise wrote: > On Thu, Sep 07, 2006 at 03:00:04PM -0400, William Cohen wrote: > >>The driver assembles the various bits fields from the files into the >>complete configuration. Any adjustment to position the bits is done in >>the driver, > > > Ahhh, I see. You're right, the masks should only be in only the lower > 8 bits. The way the documentation jumps back and forth between 8 bit > and 16 bit values is decidedly confusing. Something like the below > version of the events file is what I've got now, and it actually generates > events for the mesi and core_mesi using events. > > >>The mesi unit mask is only used on L1 cache events. Is the L1 cache >>private to each processor? In that case I could see why it isn't needed. > > > Yup, the L1 cache is per-core, with the L2 cache shared between both cores. > > -ben Hi Ben, Here is the revision I have that corrects the unit masks and reverts the names to the originals. With these unit masks the events seem to be getting samples, and the counts seem to make sense, e.g. about twice as many load blocks as store blocks. The unit masks should be okay, but could you look over the names and check whether they are the official event names? If it is okay, I will check the changes into oprofile CVS. Is there a public Intel document that lists these events? If so, could you give a pointer to it and tell us where we should be looking in it? -Will |
From: Benjamin L. <bc...@kv...> - 2006-09-11 13:40:15
|
On Fri, Sep 08, 2006 at 05:18:06PM -0400, William Cohen wrote: > Here is the revision that I have to correct the unit mask and revert the > names back to the original. With these unit masks the events seem to be > getting samples and they seemd to be making seens, e.g. about twice the > number of loads block as store blocks. The unit masks should be okay, > but could you look over the names and see the event names are the > official names. If it is okay, I will check the changes into oprofile cvs. That looks pretty good. > Is there a public Intel document that lists these events? If so, could > you give a pointer to it and where we should be looking in there? The docs are still going through review, and are not out just yet. When they get published, it should be as an updated version of the System Programming Guide under technical documents for the Core 2. -ben |
From: William C. <wc...@re...> - 2006-09-11 14:07:51
|
Benjamin LaHaise wrote: > On Fri, Sep 08, 2006 at 05:18:06PM -0400, William Cohen wrote: > >>Here is the revision that I have to correct the unit mask and revert the >>names back to the original. With these unit masks the events seem to be >>getting samples and they seemd to be making seens, e.g. about twice the >>number of loads block as store blocks. The unit masks should be okay, >>but could you look over the names and see the event names are the >>official names. If it is okay, I will check the changes into oprofile cvs. > > > That looks pretty good. Okay, it has been checked into the CVS repository. >>Is there a public Intel document that lists these events? If so, could >>you give a pointer to it and where we should be looking in there? > > > The docs are still going through review, and are not out just yet. When > they get published, it should be as an updated version of the System > Programming Guide under technical documents for the Core 2. > > -ben Thanks, I was just making sure that I didn't miss the document that had that information. -Will |
From: William C. <wc...@re...> - 2006-09-07 15:53:19
|
William Cohen wrote: > John Levon wrote: > >>On Wed, Sep 06, 2006 at 11:01:41AM -0400, Benjamin LaHaise wrote: >> >> >> >>>Below is an incremental patch over the Core 2 events, as further testing >>>showed that those events could not be enabled. >>> >> >> >>Will - this is OK for 0.9.2 if it works OK in testing. > > > This patch doesn't seem to be working on my test machine: > > /usr/local/bin/opcontrol --verbose=all --setup --no-vmlinux \ > --event=L2_RQSTS_SELF_DEMAND_I_STATE:6000 > Invalid unit mask 0x4f for event L2_RQSTS_SELF_DEMAND_I_STATE > > I have tried other events with mandatory unit masks and they seem to > work (e.g. THERMAL_TRIP). > > It looks like ophelp is having a problem. opcontrol passes the following > in for L2_RQSTS_SELF_DEMAND_I_STATE > > > /usr/local/bin/ophelp --check-events > L2_RQSTS_SELF_DEMAND_I_STATE:6000:79:1:1 --callgraph=0 > Invalid unit mask 0x4f for event L2_RQSTS_SELF_DEMAND_I_STATE > > > 79 is 0x4f, so that appears to be right. I see what is going wrong. The checking routine, op_check_events, looks the event up by its numerical value, in this case 0x2e. There are multiple event names that share that number, and the first event encountered in the list is the one selected. Here L2_RQSTS is selected, which has the mesi mask:
name:mesi type:bitmask default:0x0f00
	0x0800 (M)ESI: Modified
	0x0400 M(E)SI: Exclusive
	0x0200 ME(S)I: Shared
	0x0100 MES(I): Invalid
L2_RQSTS_SELF_DEMAND_I_STATE has:
name:x4f type:mandatory default:0x4f
	0x4f No unit mask
These unit masks don't match, causing the failure. Since multiple event names with different unit masks map to the same event number, this is going to be a problem. -Will > > > > > -Will > > >>(I wonder why 'make check' isn't picking up the naming problems? It >>probably should.) > > > Does the 'make check' have a test to look at the names in the event > files? I didn't think it did that type of checking. 
> > > >>regards >>john |
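William's diagnosis can be reproduced with a toy model of the event table. The sketch below is hypothetical Python, not oprofile's op_check_events. Each entry carries a predicate for acceptable unit masks: the pre-shifted mesi bitmask for L2_RQSTS, and exact matches for the two mandatory masks. Looking up by event number returns the first 0x2e entry, so 0x4f is judged against L2_RQSTS's mask and rejected. Looking up by name finds the intended entry.

```python
# Toy model of the event table: (name, event number, unit-mask check).
EVENTS = [
    # L2_RQSTS with the old, pre-shifted mesi bitmask (0x0800..0x0100)
    ("L2_RQSTS", 0x2E, lambda um: (um & ~0x0F00) == 0),
    ("L2_RQSTS_SELF_DEMAND_I_STATE_L2", 0x2E, lambda um: um == 0x41),
    ("L2_RQSTS_SELF_DEMAND_I_STATE", 0x2E, lambda um: um == 0x4F),
]


def find_by_number(number):
    """Return the first entry whose event number matches."""
    for entry in EVENTS:
        if entry[1] == number:
            return entry
    raise KeyError(hex(number))


def find_by_name(name):
    """Return the entry whose name matches."""
    for entry in EVENTS:
        if entry[0] == name:
            return entry
    raise KeyError(name)


def check(entry, um):
    """Return (selected event name, whether it accepts the unit mask)."""
    name, _number, accepts = entry
    return name, accepts(um)


print(check(find_by_number(0x2E), 0x4F))  # ('L2_RQSTS', False)
```

Looking the event up by name instead, `check(find_by_name("L2_RQSTS_SELF_DEMAND_I_STATE"), 0x4F)` selects the intended entry and accepts the mask.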
From: William C. <wc...@re...> - 2006-09-07 16:13:33
|
Benjamin LaHaise wrote: > On Thu, Sep 07, 2006 at 11:53:14AM -0400, William Cohen wrote: > >>These unit masks don't match, causing the failure. Given the multiple >>event names with different unit masks mappings to the same event number. >>This is going to be a problem. > > > The entries do indeed have the same event number. The entry+unit masks in > question are the architectural definitions, which are defined for any > future CPUs with family == 6. That they overlap with another currently > defined event is intentional, but I'm not sure if this is the best way > to handle things. > > -ben Would it be okay to superset the L2_RQSTS unit mask to include the architected events? Also, how wide is the unit mask on the Core processors? Is the performance monitoring hardware similar to that of the P6-based processors? I am wondering whether there is some confusion in the unit masks, where some of them have been shifted left by 8 bits to align them. OProfile handles the alignment, so the shift shouldn't be there. That would explain why the events using the mesi unit mask are not working. -Will |
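The superset proposal and the pre-shifting question can both be checked mechanically. This sketch assumes a bitmask unit mask accepts any bitwise OR of its listed values. That rule is an assumption about ophelp, not its actual code, and the helper names are hypothetical.

With the old pre-shifted mesi values, 0x4f does not fit. Shifting the values right by 8 bits repairs the alignment, but 0x41 is still rejected because bit 0x40 is missing. Adding a 0x40 "this core" value, as the later mesi definition in this thread does, makes both architectural masks acceptable.

```python
# Unit-mask values taken from the mesi definitions in this thread.
OLD_MESI = [0x0800, 0x0400, 0x0200, 0x0100]     # pre-shifted by 8 bits
SUPERSET_MESI = [0x40, 0x08, 0x04, 0x02, 0x01]  # 8-bit values plus "this core"


def bitmask_accepts(values, um):
    """Assumed rule: a bitmask unit mask accepts any OR of its listed values."""
    allowed = 0
    for v in values:
        allowed |= v
    return (um & ~allowed) == 0


def remove_preshift(values):
    """Shift pre-aligned values back down into the 8-bit unit-mask field."""
    return [v >> 8 for v in values]


print(bitmask_accepts(OLD_MESI, 0x4F))                   # False
print(bitmask_accepts(remove_preshift(OLD_MESI), 0x41))  # False
print(bitmask_accepts(SUPERSET_MESI, 0x41))              # True
```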
From: William C. <wc...@re...> - 2006-09-07 16:30:17
Attachments:
core_2_8bit_um.diff
|
Benjamin LaHaise wrote: > On Thu, Sep 07, 2006 at 11:53:14AM -0400, William Cohen wrote: > >>These unit masks don't match, causing the failure. Given the multiple >>event names with different unit masks mappings to the same event number. >>This is going to be a problem. > > > The entries do indeed have the same event number. The entry+unit masks in > question are the architectural definitions, which are defined for any > future CPUs with family == 6. That they overlap with another currently > defined event is intentional, but I'm not sure if this is the best way > to handle things. > > -ben Hi Ben, Does the following patch look reasonable? It trims the low 8 bits from the larger unit masks. Should the other events using "um:mesi" also use core_mesi? -Will |
From: Benjamin L. <bc...@kv...> - 2006-09-07 16:58:06
|
On Thu, Sep 07, 2006 at 12:30:11PM -0400, William Cohen wrote: > Does the following patch look reasonable? It trims the low 8bits from > larger unit masks. Should the other events using "um:mesi" also use > core_mesi? Exactly how is oprofile adjusting these values? I'm assuming the mask is getting written into the IA32_PERFEVTSELx registers, in which case the bit positions are as the masks provided state. As for mesi vs core_mesi, the docs do not include the core selector (self, all) for those events. If they seem to be required, the docs will need to be fixed. -ben |
From: Benjamin L. <bc...@kv...> - 2006-09-07 17:00:24
|
On Thu, Sep 07, 2006 at 12:13:19PM -0400, William Cohen wrote: > Would it be okay to superset the L2_RQSTS unit mask to include the > architected events? Sounds like a kludge, but it would work. > Also how wide is the unit mask on the core processors? Is the > performance monitoring hardware design similar to the p6 based > processors? I am wondering if there is some confusion in the unit masks > where some of them have the lower 8bit added to align them. OProfile > handles alignment, so they shouldn't be there. That would explain why > things using the mesi unit mask are not working. It seems to be implied that the low 16 bits of the IA32_PERFEVTSELx MSRs are being used. Is oprofile shifting these values? -ben |
From: William C. <wc...@re...> - 2006-09-07 18:34:23
|
Benjamin LaHaise wrote: > On Thu, Sep 07, 2006 at 12:13:19PM -0400, William Cohen wrote: > >>Would it be okay to superset the L2_RQSTS unit mask to include the >>architected events? > > > Sounds like a kludge, but it would work. > > >>Also how wide is the unit mask on the core processors? Is the >>performance monitoring hardware design similar to the p6 based >>processors? I am wondering if there is some confusion in the unit masks >>where some of them have the lower 8bit added to align them. OProfile >>handles alignment, so they shouldn't be there. That would explain why >>things using the mesi unit mask are not working. > > > It seems to be implied that the low 16 bits of the IA32_PERFEVTSELx MSRs > are being used. Is oprofile shifting these values? > > -ben Hi Ben, The low 8 bits are the actual event number on P6-based hardware. OProfile takes care of the alignment, so the unit masks should not include those low 8 bits (i.e. they should not be pre-shifted). Given that the problem with finding the unit mask was caused by looking the event up by number, rather than by "." vs "_", what are the preferred names for the events that had a "." in them? I would like consistent naming so that all performance monitoring software (e.g. PAPI and perfmon) uses the same names for the same events. I took a quick look at the event names in perfmon2's libpfm to see whether the names matched up. There are some differences from libpfm/lib/coreduo.h. Do the events using the mesi unit mask need to use the core_mesi unit mask? -Will |
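The alignment point can be made concrete by composing an IA32_PERFEVTSELx value. The sketch below uses the architectural layout: event select in bits 7:0, unit mask in bits 15:8, USR at bit 16, OS at bit 17, INT at bit 20 and EN at bit 22. The function name and the choice to always set INT and EN are assumptions for illustration; this is not oprofile's driver code.

A pre-shifted mask such as 0x0f00 is rejected. Shifting it a second time would spill into bits 16-19 (USR, OS, edge detect, pin control) instead of landing in the unit-mask field.

```python
# Architectural IA32_PERFEVTSELx bit positions.
USR = 1 << 16  # count in user mode (CPL > 0)
OS = 1 << 17   # count in kernel mode (CPL 0)
INT = 1 << 20  # raise an interrupt on counter overflow
EN = 1 << 22   # enable the counter


def perfevtsel(event, unit_mask, user=True, kernel=True):
    """Compose a PERFEVTSEL value from an 8-bit event and 8-bit unit mask.

    Sketch only: per the thread, the driver, not the events file, does
    this shifting.
    """
    if not (0 <= event <= 0xFF and 0 <= unit_mask <= 0xFF):
        raise ValueError("event and unit mask must each fit in 8 bits")
    value = event | (unit_mask << 8) | INT | EN
    if user:
        value |= USR
    if kernel:
        value |= OS
    return value


print(hex(perfevtsel(0x2E, 0x4F)))  # 0x534f2e
```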
From: William C. <wc...@re...> - 2006-09-07 19:00:14
|
Benjamin LaHaise wrote: > On Thu, Sep 07, 2006 at 12:30:11PM -0400, William Cohen wrote: > >>Does the following patch look reasonable? It trims the low 8bits from >>larger unit masks. Should the other events using "um:mesi" also use >>core_mesi? > > > Exactly how is oprofile adjusting these value? I'm assuming the mask is > getting written into the IA32_PERFEVTSELx registers, in which case the > bit positions are as the masks provided state. As for mesi vs core_mesi, > the docs do not include the core selector (self, all) for those events. > If they seem to be required, the docs will need to be fixed. > > -ben The OProfile driver creates a directory, /dev/oprofile, to communicate between the kernel and user space:
# ls /dev/oprofile/
0/      buffer_size       cpu_type  pointer_size
1/      buffer_watershed  dump      stats/
buffer  cpu_buffer_size   enable
The two numbered directories, "0" and "1", hold the settings for the two counters. The unit_mask and event are separate files in those directories:
# ls /dev/oprofile/0
count  enabled  event  kernel  unit_mask  user
The driver assembles the various bit fields from these files into the complete configuration; any adjustment to position the bits is done in the driver. The mesi unit mask is only used on L1 cache events. Is the L1 cache private to each processor? In that case I could see why it isn't needed. -Will |
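The per-counter files above suggest how user space hands a configuration to the driver: one value per file, with the raw 8-bit unit mask written on its own. This is a minimal sketch under stated assumptions. The values are assumed to be written as decimal text, and the helper configure_counter is hypothetical. On a real system the root would be /dev/oprofile, which needs a mounted oprofilefs and root privileges. The demo uses a scratch directory instead.

```python
import os
import tempfile


def configure_counter(root, counter, event, unit_mask, count,
                      kernel=True, user=True):
    """Write one counter's settings into an oprofilefs-style tree.

    Hypothetical helper; values are written as decimal text (assumed).
    """
    ctr_dir = os.path.join(root, str(counter))
    os.makedirs(ctr_dir, exist_ok=True)  # already present under oprofilefs
    settings = {
        "event": event,
        "unit_mask": unit_mask,  # raw 8-bit mask; the driver positions it
        "count": count,
        "kernel": int(kernel),
        "user": int(user),
        "enabled": 1,            # written last, after the other settings
    }
    for fname, value in settings.items():
        with open(os.path.join(ctr_dir, fname), "w") as f:
            f.write(f"{value}\n")
    return settings


if __name__ == "__main__":
    # Try it against a scratch directory instead of the real /dev/oprofile.
    with tempfile.TemporaryDirectory() as scratch:
        configure_counter(scratch, 0, 0x2E, 0x4F, 6000)
        with open(os.path.join(scratch, "0", "unit_mask")) as f:
            print(f.read().strip())  # 79
```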
From: Benjamin L. <bc...@kv...> - 2006-09-07 19:15:02
|
On Thu, Sep 07, 2006 at 03:00:04PM -0400, William Cohen wrote: > The driver assembles the various bits fields from the files into the > complete configuration. Any adjustment to position the bits is done in > the driver, Ahhh, I see. You're right, the masks should only be in the lower 8 bits. The way the documentation jumps back and forth between 8-bit and 16-bit values is decidedly confusing. Something like the version of the events file below is what I've got now, and it actually produces samples for the events that use the mesi and core_mesi unit masks. > The mesi unit mask is only used on L1 cache events. Is the L1 cache > private to each processor? In that case I could see why it isn't needed. Yup, the L1 cache is per-core, with the L2 cache shared between both cores. -ben
... unit_masks ...
# Core 2 possible unit masks
#
name:zero type:mandatory default:0x0
	0x0 No unit mask
#name:one type:mandatory default:0x1
#	0x1 No unit mask
name:two type:mandatory default:0x2
	0x2 No unit mask
name:x0f type:mandatory default:0xf
	0xf No unit mask
name:x10 type:mandatory default:0x10
	0x10 No unit mask
#name:x20 type:mandatory default:0x20
#	0x20 No unit mask
#name:x40 type:mandatory default:0x40
#	0x40 No unit mask
name:x41 type:mandatory default:0x41
	0x41 No unit mask
name:x4f type:mandatory default:0x4f
	0x4f No unit mask
name:xc0 type:mandatory default:0xc0
	0xc0 No unit mask
name:nonhlt type:exclusive default:0x0
	0x0 Unhalted core cycles
	0x1 Unhalted bus cycles
	0x2 Unhalted bus cycles of this core while the other core is halted
name:mesi type:bitmask default:0x0f
	0x40 core: this core
	0x08 (M)ESI: Modified
	0x04 M(E)SI: Exclusive
	0x02 ME(S)I: Shared
	0x01 MES(I): Invalid
name:sse_prefetch type:exclusive default:0x0
	0x00 prefetch NTA instructions executed.
	0x01 prefetch T1 instructions executed.
	0x02 prefetch T1 and T2 instructions executed.
	0x03 SSE weakly-ordered stores
name:simd_instr_type_exec type:bitmask default:0x3f
	0x01 SIMD packed multiplies
	0x02 SIMD packed shifts
	0x04 SIMD pack operations
	0x08 SIMD unpack operations
	0x10 SIMD packed logical
	0x20 SIMD packed arithmetic
	0x3f all of the above
name:mmx_trans type:exclusive default:0x0
	0x00 MMX->float operations
	0x01 float->MMX operations
name:dc_pend_miss type:exclusive default:0x0
	0x00 Weighted cycles
	0x01 Duration of cycles
name:sse_miss type:exclusive default:0x0
	0x00 PREFETCHNTA
	0x01 PREFETCHT0
	0x02 PREFETCHT1/PREFETCHT2
name:load_block type:bitmask default:0x3e
	0x02 STA Loads blocked by a preceding store with unknown address.
	0x04 STD Loads blocked by a preceding store with unknown data.
	0x08 OVERLAP_STORE Loads that partially overlap an earlier store, or 4K aliased with a previous store.
	0x10 UNTIL_RETIRE Loads blocked until retirement.
	0x20 L1D Loads blocked by the L1 data cache.
name:store_block type:bitmask default:0x0b
	0x01 SB_DRAIN_CYCLES Cycles while stores are blocked due to store buffer drain.
	0x02 ORDER Cycles while store is waiting for a preceding store to be globally observed.
	0x08 NOOP A store is blocked due to a conflict with an external or internal snoop.
name:dtlb_miss type:bitmask default:0x0f
	0x01 ANY Memory accesses that missed the DTLB.
	0x02 MISS_LD DTLB misses due to load operations.
	0x04 L0_MISS_LD L0 DTLB misses due to load operations.
	0x08 MISS_ST TLB misses due to store operations.
name:memory_dis type:exclusive default:0x01
	0x01 RESET Memory disambiguation reset cycles.
	0x02 SUCCESS Number of loads that were successfully disambiguated.
name:page_walks type:exclusive default:0x02
	0x01 COUNT Number of page-walks executed.
	0x02 CYCLES Duration of page-walks in core cycles.
name:delayed_bypass type:exclusive default:0x00
	0x00 FP Delayed bypass to FP operation.
	0x01 SIMD Delayed bypass to SIMD operation.
	0x02 LOAD Delayed bypass to load operation.
name:core type:exclusive default:0x40
	0xc0 All cores
	0x40 This core
name:core_prefetch type:bitmask default:0xf0
	0xc0 core: all cores
	0x40 core: this core
	0x30 prefetch: all inclusive
	0x10 prefetch: Hardware prefetch only
	0x00 prefetch: exclude hardware prefetch
name:core_mesi type:bitmask default:0xcf
	0xc0 core: all cores
	0x40 core: this core
	0x08 (M)ESI: Modified
	0x04 M(E)SI: Exclusive
	0x02 ME(S)I: Shared
	0x01 MES(I): Invalid
name:core_prefetch_mesi type:bitmask default:0xff
	0xc0 core: all cores
	0x40 core: this core
	0x30 prefetch: all inclusive
	0x10 prefetch: Hardware prefetch only
	0x00 prefetch: exclude hardware prefetch
	0x08 (M)ESI: Modified
	0x04 M(E)SI: Exclusive
	0x02 ME(S)I: Shared
	0x01 MES(I): Invalid
name:l1d_split type:exclusive default:0x1
	0x1 split loads
	0x2 split stores
name:bus_agents type:exclusive default:0x00
	0x00 this agent
	0x10 include all agents
name:core_and_bus_agents type:bitmask default:0xc0
	0xc0 core: all cores
	0x40 core: this core
	0x00 bus: this agent
	0x10 bus: include all agents
name:bus_agents_and_snoop type:bitmask default:0x13
	0x00 bus: this agent
	0x10 bus: include all agents
	0x01 snoop: CMP2I snoops
	0x02 snoop: CMP2S snoops
name:core_and_snoop type:bitmask default:0xc0
	0xc0 core: all cores
	0x40 core: this core
	0x01 snoop: CMP2I snoops
	0x02 snoop: CMP2S snoops
name:itlb_miss type:bitmask default:0x12
	0x02 ITLB small page misses
	0x10 ITLB large page misses
	0x40 ITLB flushes
name:macro_insts type:bitmask default:0x09
	0x01 Instructions decoded
	0x08 CISC Instructions decoded
name:esp type:bitmask default:0x01
	0x01 ESP register content synchronizations
	0x02 ESP register automatic additions
name:inst_retired type:bitmask default:0x00
	0x00 Any
	0x01 Loads
	0x02 Stores
	0x04 Other
name:x87_ops_retired type:exclusive default:0xfe
	0x01 FXCH instructions retired
	0xfe Retired floating-point computational operations (precise)
name:uops_retired type:bitmask default:0x0f
	0x01 Fused load+op or load+indirect branch retired
	0x02 Fused store address + data retired
	0x04 Retired instruction pairs fused into one micro-op
	0x07 Fused micro-ops retired
	0x08 Non-fused micro-ops retired
	0x0f Micro-ops retired
name:machine_nukes type:bitmask default:0x05
	0x01 Self-Modifying Code detected
	0x04 Execution pipeline restart due to memory ordering conflict or memory disambiguation misprediction
name:br_inst_retired type:bitmask default:0xa
	0x01 predicted not-taken
	0x02 mispredicted not-taken
	0x04 predicted taken
	0x08 mispredicted taken
name:cycles_int_masked type:exclusive default:0x02
	0x01 Interrupts disabled
	0x02 Interrupts pending and disabled
name:simd_inst_retired type:bitmask default:0x1f
	0x01 Retired SSE packed-single instructions
	0x02 Retired SSE scalar-single instructions
	0x04 Retired SSE2 packed-double instructions
	0x08 Retired SSE2 scalar-double instructions
	0x10 Retired SSE2 vector integer instructions
	0x1f Retired Streaming SIMD instructions (precise event)
name:simd_comp_inst_retired type:bitmask default:0xf
	0x01 Retired computational SSE packed-single instructions
	0x02 Retired computational SSE scalar-single instructions
	0x04 Retired computational SSE2 packed-double instructions
	0x08 Retired computational SSE2 scalar-double instructions
name:mem_load_retired type:exclusive default:0x01
	0x01 Retired loads that miss the L1 data cache (precise event)
	0x02 L1 data cache line missed by retired loads (precise event)
	0x04 Retired loads that miss the L2 cache (precise event)
	0x08 L2 cache line missed by retired loads (precise event)
	0x10 Retired loads that miss the DTLB (precise event)
name:rat_stalls type:bitmask default:0xf
	0x01 ROB read port
	0x02 Partial register
	0x04 Flag
	0x08 FPU status word
	0x0f All RAT
name:seg_regs type:bitmask default:0x0f
	0x01 ES
	0x02 DS
	0x04 FS
	0x08 GS
name:resource_stalls type:bitmask default:0x0f
	0x01 when the ROB is full
	0x02 during which the RS is full
	0x04 during which the pipeline has exceeded the load or store limit or is waiting to commit all stores
	0x08 due to FPU control word write
	0x10 due to branch misprediction |
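The core_prefetch_mesi definition above appears to pack three fields into one byte:

- a core selector in bits 7:6 (0xc0 all cores, 0x40 this core)
- a prefetch qualifier in bits 5:4
- MESI state bits in 3:0

That field split is inferred from the listed values. The decoder below is a hypothetical helper, not anything shipped with oprofile.

```python
# Field values as listed in the core_prefetch_mesi unit mask above.
CORE = {0xC0: "all cores", 0x40: "this core"}
PREFETCH = {
    0x30: "all inclusive",
    0x10: "hardware prefetch only",
    0x00: "exclude hardware prefetch",
}
MESI = [(0x08, "M"), (0x04, "E"), (0x02, "S"), (0x01, "I")]


def decode_core_prefetch_mesi(um):
    """Split an 8-bit unit mask into (core, prefetch, MESI states)."""
    core = CORE.get(um & 0xC0, f"reserved ({um & 0xC0:#04x})")
    prefetch = PREFETCH.get(um & 0x30, f"reserved ({um & 0x30:#04x})")
    states = "".join(state for bit, state in MESI if um & bit)
    return core, prefetch, states


print(decode_core_prefetch_mesi(0x41))
# ('this core', 'exclude hardware prefetch', 'I')
```

Under this reading, the architectural masks decode sensibly. 0x41 means this core, demand (non-prefetch) requests, I state only, which fits the "missed the L2" description of the x41 entry. 0x4f is the same selection across all four MESI states.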