From: Andi K. <an...@fi...> - 2013-06-18 23:41:58
This includes Suravee's patch for named default unit masks, plus another patch that makes all Intel default unit masks unique.

-Andi
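To make the motivation concrete, look at the haswell l1d_pend_miss block touched by the patch. Three entries share the value 0x1 and differ only in their extra: qualifiers and names, so default:0x1 cannot say which one is meant, while a name can. The Python sketch below is a minimal illustration under stated assumptions. It reads only the layout visible in the patch hunks: a name:... type:... default:... header followed by <value> extra:<qualifiers> <name> <description> lines. parse_block and resolve_default are hypothetical helpers. They are not OProfile's own parsing code, and they do not describe how Suravee's patch resolves names.

```python
import re
from dataclasses import dataclass


@dataclass
class UnitMask:
    value: int   # numeric unit mask, e.g. 0x1
    extra: str   # extra qualifiers such as "cmask=1,edge" ("" if none)
    name: str    # symbolic name, e.g. "pending"


# Assumed entry layout: "<value> extra:<qualifiers> <name> <description...>"
MASK_LINE = re.compile(r"\s*(0x[0-9a-fA-F]+)\s+extra:(\S*)\s+(\S+)")


def parse_block(text):
    """Split one event block into its header fields and unit mask entries."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    # Header such as "name:l1d_pend_miss type:exclusive default:0x1"
    header = dict(field.split(":", 1) for field in lines[0].split())
    masks = []
    for line in lines[1:]:
        m = MASK_LINE.match(line)
        if m:
            masks.append(UnitMask(int(m.group(1), 16), m.group(2), m.group(3)))
    return header, masks


def resolve_default(header, masks):
    """Return the single unit mask selected by the default, or raise ValueError."""
    default = header["default"]
    if default.lower().startswith("0x"):
        # Numeric default: every entry carrying that value is a candidate.
        matches = [m for m in masks if m.value == int(default, 16)]
    else:
        # Named default: entry names are expected to be unique.
        matches = [m for m in masks if m.name == default]
    if len(matches) != 1:
        raise ValueError(
            f"{header['name']}: default {default!r} matches {len(matches)} unit masks"
        )
    return matches[0]


# Excerpt of the haswell l1d_pend_miss block before the patch, and the patched header.
BEFORE = """
name:l1d_pend_miss type:exclusive default:0x1
0x1 extra: pending L1D miss oustandings duration in cycles
0x1 extra:cmask=1 pending_cycles Cycles with L1D load Misses outstanding.
0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding, using an edge detect to count transitions.
"""
AFTER = BEFORE.replace("default:0x1", "default:pending")

if __name__ == "__main__":
    for text in (BEFORE, AFTER):
        header, masks = parse_block(text)
        try:
            chosen = resolve_default(header, masks)
            print(f"{header['default']} -> {chosen.name} ({chosen.extra or 'no extra'})")
        except ValueError as err:
            print("ambiguous:", err)
```

Run as a script, it reports the pre-patch block as ambiguous because there are three candidates for 0x1. It resolves the patched block to the single entry named pending.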
From: Andi K. <an...@fi...> - 2013-06-18 23:40:42
From: Andi Kleen <ak...@li...>

Use names to make all non unique Intel default unit masks
unique
---
 events/i386/haswell/unit_masks     | 24 ++++++++++++------------
 events/i386/ivybridge/unit_masks   | 22 +++++++++++-----------
 events/i386/sandybridge/unit_masks | 20 ++++++++++----------
 3 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/events/i386/haswell/unit_masks b/events/i386/haswell/unit_masks
index e594e9f..32e1c1e 100644
--- a/events/i386/haswell/unit_masks
+++ b/events/i386/haswell/unit_masks
@@ -27,7 +27,7 @@ name:dtlb_load_misses type:exclusive default:0x1
 0x20 extra: stlb_hit_4k Load misses that miss the DTLB and hit the STLB (4K)
 0x40 extra: stlb_hit_2m Load misses that miss the DTLB and hit the STLB (2M)
 0x80 extra: pde_cache_miss DTLB demand load misses with low part of linear-to-physical address translation missed
-name:uops_issued type:exclusive default:0x1
+name:uops_issued type:exclusive default:any
 0x1 extra: any Uops that Resource Allocation Table (RAT) issues to Reservation Station (RS)
 0x10 extra: flags_merge Number of flags-merge uops being allocated. Such uops considered perf sensitive; added by GSR u-arch.
 0x20 extra: slow_lea Number of slow LEA uops being allocated. A uop is generally considered SlowLea if it has 3 sources (e.g. 2 sources + immediate) regardless if as a result of LEA instruction or not.
@@ -51,7 +51,7 @@ name:l2_rqsts type:exclusive default:0x21
 0xe7 extra: all_demand_references Demand requests to L2 cache
 0x3f extra: miss All requests that miss L2 cache
 0xff extra: references All L2 requests
-name:l1d_pend_miss type:exclusive default:0x1
+name:l1d_pend_miss type:exclusive default:pending
 0x1 extra: pending L1D miss oustandings duration in cycles
 0x1 extra:cmask=1 pending_cycles Cycles with L1D load Misses outstanding.
 0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding, using an edge detect to count transitions.
@@ -81,7 +81,7 @@ name:move_elimination type:exclusive default:0x1
 0x2 extra: simd_eliminated Number of SIMD Move Elimination candidate uops that were eliminated.
 0x4 extra: int_not_eliminated Number of integer Move Elimination candidate uops that were not eliminated.
 0x8 extra: simd_not_eliminated Number of SIMD Move Elimination candidate uops that were not eliminated.
-name:cpl_cycles type:exclusive default:0x1
+name:cpl_cycles type:exclusive default:ring0
 0x1 extra: ring0 Unhalted core cycles when the thread is in ring 0
 0x2 extra: ring123 Unhalted core cycles when thread is in rings 1, 2, or 3
 0x1 extra:cmask=1,edge ring0_trans Number of intervals between processor halts while thread is in ring 0
@@ -145,14 +145,14 @@ name:br_misp_exec type:exclusive default:0xff
 0xc1 extra: all_conditional Speculative and retired mispredicted macro conditional branches
 0xc4 extra: all_indirect_jump_non_call_ret Mispredicted indirect branches excluding calls and returns
 0xa0 extra: taken_indirect_near_call Taken speculative and retired mispredicted indirect calls
-name:idq_uops_not_delivered type:exclusive default:0x1
+name:idq_uops_not_delivered type:exclusive default:core
 0x1 extra: core Uops not delivered to Resource Allocation Table (RAT) per thread when backend of the machine is not stalled
 0x1 extra:cmask=4 cycles_0_uops_deliv_core Cycles per thread when 4 or more uops are not delivered to Resource Allocation Table (RAT) when backend of the machine is not stalled
 0x1 extra:cmask=3 cycles_le_1_uop_deliv_core Cycles per thread when 3 or more uops are not delivered to Resource Allocation Table (RAT) when backend of the machine is not stalled
 0x1 extra:cmask=2 cycles_le_2_uop_deliv_core Cycles with less than 2 uops delivered by the front end.
 0x1 extra:cmask=1 cycles_le_3_uop_deliv_core Cycles with less than 3 uops delivered by the front end.
 0x1 extra:cmask=1,inv cycles_fe_was_ok Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) was stalling FE.
-name:uops_executed_port type:exclusive default:0x1
+name:uops_executed_port type:exclusive default:port_0
 0x1 extra: port_0 Cycles per thread when uops are executed in port 0
 0x2 extra: port_1 Cycles per thread when uops are executed in port 1
 0x4 extra: port_2 Cycles per thread when uops are executed in port 2
@@ -183,7 +183,7 @@ name:offcore_requests type:exclusive default:0x2
 0x2 extra: demand_code_rd Cacheable and noncachaeble code read requests
 0x4 extra: demand_rfo Demand RFO requests including regular RFOs, locks, ItoM
 0x8 extra: all_data_rd Demand and prefetch data reads
-name:uops_executed type:exclusive default:0x1
+name:uops_executed type:exclusive default:thread
 0x1 extra: thread Counts the number of uops to be executed per-thread each cycle.
 0x2 extra: core Number of uops executed on the core.
 0x1 extra:cmask=1,inv stall_cycles Counts number of cycles no uops were dispatched to be executed on this thread.
@@ -207,7 +207,7 @@ name:other_assists type:exclusive default:0x8
 0x8 extra: avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable.
 0x10 extra: sse_to_avx Number of transitions from SSE to AVX-256 when penalty applicable.
 0x40 extra: any_wb_assist Number of times any microcode assist is invoked by HW upon uop writeback.
-name:uops_retired type:exclusive default:0x1
+name:uops_retired type:exclusive default:all
 0x1 extra: all Actually retired uops.
 0x2 extra: retire_slots Retirement slots used.
 0x1 extra:pebs all_ps Actually retired uops. (Precise Event - PEBS)
@@ -219,7 +219,7 @@ name:machine_clears type:exclusive default:0x2
 0x2 extra: memory_ordering Counts the number of machine clears due to memory order conflicts.
 0x4 extra: smc Self-modifying code (SMC) detected.
 0x20 extra: maskmov This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0.
-name:br_inst_retired type:exclusive default:0x1
+name:br_inst_retired type:exclusive default:all_branches_ps
 0x1 extra: conditional Conditional branch instructions retired.
 0x2 extra: near_call Direct and indirect near call instructions retired.
 0x8 extra: near_return Return instructions retired.
@@ -233,7 +233,7 @@ name:br_inst_retired type:exclusive default:0x1
 0x20 extra:pebs near_taken_ps Taken branch instructions retired. (Precise Event - PEBS)
 0x2 extra: near_call_r3 Direct and indirect macro near call instructions retired (captured in ring 3).
 0x2 extra:pebs near_call_r3_ps Direct and indirect macro near call instructions retired (captured in ring 3). (Precise Event - PEBS)
-name:br_misp_retired type:exclusive default:0x1
+name:br_misp_retired type:exclusive default:all_branches_ps
 0x1 extra: conditional Mispredicted conditional branch instructions retired.
 0x1 extra:pebs conditional_ps Mispredicted conditional branch instructions retired. (Precise Event - PEBS)
 0x4 extra:pebs all_branches_ps Mispredicted macro branch instructions retired. (Precise Event - PEBS)
@@ -263,7 +263,7 @@ name:fp_assist type:exclusive default:0x1e
 0x4 extra: x87_input Number of X87 assists due to input value.
 0x8 extra: simd_output Number of SIMD FP assists due to Output values
 0x10 extra: simd_input Number of SIMD FP assists due to input values
-name:mem_uops_retired type:exclusive default:0x11
+name:mem_uops_retired type:exclusive default:all_loads
 0x11 extra: stlb_miss_loads Load uops with true STLB miss retired to architected path.
 0x12 extra: stlb_miss_stores Store uops with true STLB miss retired to architected path.
 0x21 extra: lock_loads Load uops with locked access retired to architected path.
@@ -278,7 +278,7 @@ name:mem_uops_retired type:exclusive default:0x11
 0x42 extra:pebs split_stores_ps Line-splitted store uops retired to architected path. (Precise Event - PEBS)
 0x81 extra:pebs all_loads_ps Load uops retired to architected path with filter on bits 0 and 1 applied. (Precise Event - PEBS)
 0x82 extra:pebs all_stores_ps Store uops retired to architected path with filter on bits 0 and 1 applied. (Precise Event - PEBS)
-name:mem_load_uops_retired type:exclusive default:0x1
+name:mem_load_uops_retired type:exclusive default:l1_hit
 0x1 extra: l1_hit Retired load uops with L1 cache hits as data sources.
 0x2 extra: l2_hit Retired load uops with L2 cache hits as data sources.
 0x4 extra: l3_hit Retired load uops which data sources were data hits in LLC without snoops required.
@@ -288,7 +288,7 @@ name:mem_load_uops_retired type:exclusive default:0x1
 0x2 extra:pebs l2_hit_ps Retired load uops with L2 cache hits as data sources. (Precise Event - PEBS)
 0x4 extra:pebs l3_hit_ps Miss in last-level (L3) cache. Excludes Unknown data-source. (Precise Event - PEBS)
 0x40 extra:pebs hit_lfb_ps Retired load uops which data sources were load uops missed L1 but hit FB due to preceding miss to the same cache line with data not ready. (Precise Event - PEBS)
-name:mem_load_uops_l3_hit_retired type:exclusive default:0x1
+name:mem_load_uops_l3_hit_retired type:exclusive default:xsnp_miss
 0x1 extra: xsnp_miss Retired load uops which data sources were LLC hit and cross-core snoop missed in on-pkg core cache.
 0x2 extra: xsnp_hit Retired load uops which data sources were LLC and cross-core snoop hits in on-pkg core cache.
 0x4 extra: xsnp_hitm Retired load uops which data sources were HitM responses from shared LLC.
diff --git a/events/i386/ivybridge/unit_masks b/events/i386/ivybridge/unit_masks
index ffee1fa..7ff16e3 100644
--- a/events/i386/ivybridge/unit_masks
+++ b/events/i386/ivybridge/unit_masks
@@ -15,17 +15,17 @@ name:dtlb_load_misses type:exclusive default:0x81
 0x81 extra: demand_ld_miss_causes_a_walk Demand load Miss in all translation lookaside buffer (TLB) levels causes an page walk of any page size.
 0x82 extra: demand_ld_walk_completed Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes of any page size.
 0x84 extra: demand_ld_walk_duration Demand load cycles page miss handler (PMH) is busy with this walk.
-name:int_misc type:exclusive default:0x3
+name:int_misc type:exclusive default:recovery_cycles
 0x3 extra:cmask=1 recovery_cycles Number of cycles waiting for the checkpoints in Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory disambiguation, etc...)
 0x3 extra:cmask=1,edge recovery_stalls_count Number of occurences waiting for the checkpoints in Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory disambiguation, etc...)
-name:uops_issued type:exclusive default:0x1
+name:uops_issued type:exclusive default:any
 0x1 extra: any Uops that Resource Allocation Table (RAT) issues to Reservation Station (RS)
 0x1 extra:cmask=1,inv stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for the thread
 0x1 extra:cmask=1,inv,any core_stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for all threads
 0x10 extra: flags_merge Number of flags-merge uops being allocated.
 0x20 extra: slow_lea Number of slow LEA uops being allocated. A uop is generally considered SlowLea if it has 3 sources (e.g. 2 sources + immediate) regardless if as a result of LEA instruction or not.
 0x40 extra: single_mul Number of Multiply packed/scalar single precision uops allocated
-name:arith type:bitmask default:0x1
+name:arith type:bitmask default:fpu_div_active
 0x1 extra: fpu_div_active Cycles when divider is busy executing divide operations
 0x4 extra:cmask=1,edge fpu_div Divide operations executed
 name:l2_rqsts type:exclusive default:0x1
@@ -49,7 +49,7 @@ name:l2_l1d_wb_rqsts type:exclusive default:0x1
 0x4 extra: hit_e Not rejected writebacks from L1D to L2 cache lines in E state
 0x8 extra: hit_m Not rejected writebacks from L1D to L2 cache lines in M state
 0xf extra: all Not rejected writebacks from L1D to L2 cache lines in any state.
-name:l1d_pend_miss type:exclusive default:0x1
+name:l1d_pend_miss type:exclusive default:pending_cycles
 0x1 extra: pending L1D miss oustandings duration in cycles
 0x1 extra:cmask=1 pending_cycles Cycles with L1D load Misses outstanding.
 0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding, using an edge detect to count transitions.
@@ -68,7 +68,7 @@ name:move_elimination type:bitmask default:0x1
 0x2 extra: simd_not_eliminated Number of SIMD Move Elimination candidate uops that were not eliminated.
 0x4 extra: int_eliminated Number of integer Move Elimination candidate uops that were eliminated.
 0x8 extra: simd_eliminated Number of SIMD Move Elimination candidate uops that were eliminated.
-name:cpl_cycles type:exclusive default:0x1
+name:cpl_cycles type:exclusive default:ring0
 0x1 extra: ring0 Unhalted core cycles when the thread is in ring 0
 0x1 extra:cmask=1,edge ring0_trans Number of intervals between processor halts while thread is in ring 0
 0x2 extra: ring123 Unhalted core cycles when thread is in rings 1, 2, or 3
@@ -76,7 +76,7 @@ name:rs_events type:mandatory default:0x1
 0x1 extra: empty_cycles Cycles when Reservation Station (RS) is empty for the thread
 name:tlb_access type:mandatory default:0x4
 0x4 extra: load_stlb_hit Load operations that miss the first DTLB level but hit the second and do not cause page walks
-name:offcore_requests_outstanding type:exclusive default:0x1
+name:offcore_requests_outstanding type:exclusive default:cycles_with_demand_data_rd
 0x1 extra: demand_data_rd Offcore outstanding Demand Data Read transactions in uncore queue.
 0x1 extra:cmask=1 cycles_with_demand_data_rd Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore
 0x2 extra: demand_code_rd Offcore outstanding code reads transactions in SuperQueue (SQ), queue to uncore, every cycle
@@ -87,7 +87,7 @@ name:offcore_requests_outstanding type:exclusive default:0x1
 name:lock_cycles type:bitmask default:0x1
 0x1 extra: split_lock_uc_lock_duration Cycles when L1 and L2 are locked due to UC or split lock
 0x2 extra: cache_lock_duration Cycles when L1D is locked
-name:idq type:exclusive default:0x2
+name:idq type:exclusive default:empty
 0x2 extra: empty Instruction Decode Queue (IDQ) empty cycles
 0x4 extra: mite_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path
 0x4 extra:cmask=1 mite_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path
@@ -137,14 +137,14 @@ name:br_misp_exec type:exclusive default:0x41
 0xc1 extra: all_conditional Speculative and retired mispredicted macro conditional branches
 0xc4 extra: all_indirect_jump_non_call_ret Mispredicted indirect branches excluding calls and returns
 0xff extra: all_branches Speculative and retired mispredicted macro conditional branches
-name:idq_uops_not_delivered type:exclusive default:0x1
+name:idq_uops_not_delivered type:exclusive default:core
 0x1 extra: core Uops not delivered by the Frontend to the Backend of the machine, while there is no Backend stall
 0x1 extra:cmask=1 cycles_le_3_uop_deliv.core Cycles with 3 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall
 0x1 extra:cmask=1,inv cycles_fe_was_ok Cycles with 4 uops delivered by the Frontend to the Backend of the machine, or the Backend was stalling
 0x1 extra:cmask=2 cycles_le_2_uop_deliv.core Cycles with 2 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall
 0x1 extra:cmask=3 cycles_le_1_uop_deliv.core Cycles with 1 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall
 0x1 extra:cmask=4 cycles_0_uops_deliv.core Cycles with no uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall
-name:uops_dispatched_port type:exclusive default:0x1
+name:uops_dispatched_port type:exclusive default:port_0
 0x1 extra: port_0 Cycles per thread when uops are dispatched to port 0
 0x1 extra:any port_0_core Cycles per core when uops are dispatched to port 0
 0x2 extra: port_1 Cycles per thread when uops are dispatched to port 1
@@ -181,7 +181,7 @@ name:offcore_requests type:bitmask default:0x1
 0x2 extra: demand_code_rd Cacheable and noncachaeble code read requests
 0x4 extra: demand_rfo Demand RFO requests including regular RFOs, locks, ItoM
 0x8 extra: all_data_rd Demand and prefetch data reads
-name:uops_executed type:exclusive default:0x1
+name:uops_executed type:exclusive default:thread
 0x1 extra: thread Counts the number of uops to be executed per-thread each cycle.
 0x1 extra:cmask=1 cycles_ge_1_uop_exec Cycles where at least 1 uop was executed per-thread
 0x1 extra:cmask=1,inv stall_cycles Counts number of cycles no uops were dispatched to be executed on this thread.
@@ -196,7 +196,7 @@ name:other_assists type:bitmask default:0x8
 0x8 extra: avx_store Number of AVX memory assist for stores. AVX microcode assist is being invoked whenever the hardware is unable to properly handle AVX-256b operations.
 0x10 extra: avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable.
 0x20 extra: sse_to_avx Number of transitions from SSE to AVX-256 when penalty applicable.
-name:uops_retired type:exclusive default:0x1
+name:uops_retired type:exclusive default:all
 0x1 extra: all Actually retired uops.
 0x1 extra:cmask=1,inv stall_cycles Cycles without actually retired uops.
 0x1 extra:cmask=1,inv,any core_stall_cycles Cycles without actually retired uops.
diff --git a/events/i386/sandybridge/unit_masks b/events/i386/sandybridge/unit_masks
index 3235c09..f35f32d 100644
--- a/events/i386/sandybridge/unit_masks
+++ b/events/i386/sandybridge/unit_masks
@@ -30,10 +30,10 @@ name:int_misc type:bitmask default:0x40
 0x40 extra: rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is sent to Instruction Decode Queue (IDQ) for this thread.
 0x3 extra:cmask=1 recovery_cycles Number of cycles waiting to be recover after Nuke due to all other cases except JEClear.
 0x3 extra:cmask=1,edge recovery_stalls_count Edge applied to recovery_cycles, thus counts occurrences.
-name:uops_issued type:bitmask default:0x1
+name:uops_issued type:bitmask default:any
 0x1 extra: any Number of Uops issued by the Resource Allocation Table (RAT) to the Reservation Station (RS)
 0x1 extra:cmask=1,inv stall_cycles cycles no uops issued by this thread.
-name:arith type:bitmask default:0x1
+name:arith type:bitmask default:fpu_div_active
 0x1 extra: fpu_div_active Cycles that the divider is busy with any divide or sqrt operation.
 0x1 extra:cmask=1,edge fpu_div Number of times that the divider is actived, includes INT, SIMD and FP.
 name:l2_rqsts type:bitmask default:0x1
@@ -56,7 +56,7 @@ name:l2_store_lock_rqsts type:bitmask default:0xf
 name:l2_l1d_wb_rqsts type:bitmask default:0x4
 0x4 extra: hit_e writebacks from L1D to L2 cache lines in E state
 0x8 extra: hit_m writebacks from L1D to L2 cache lines in M state
-name:l1d_pend_miss type:bitmask default:0x1
+name:l1d_pend_miss type:bitmask default:pending
 0x1 extra: pending Cycles with L1D load Misses outstanding.
 0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding occurences.
 name:dtlb_store_misses type:bitmask default:0x1
@@ -72,7 +72,7 @@ name:l1d type:bitmask default:0x1
 0x2 extra: allocated_in_m L1D M-state Data Cache Lines Allocated
 0x4 extra: eviction L1D M-state Data Cache Lines Evicted due to replacement (only)
 0x8 extra: all_m_replacement All Modified lines evicted out of L1D
-name:partial_rat_stalls type:bitmask default:0x20
+name:partial_rat_stalls type:bitmask default:flags_merge_uop
 0x20 extra: flags_merge_uop Number of perf sensitive flags-merge uops added by Sandy Bridge u-arch.
 0x40 extra: slow_lea_window Number of cycles with at least 1 slow Load Effective Address (LEA) uop being allocated.
 0x80 extra: mul_single_uop Number of Multiply packed/scalar single precision uops allocated
@@ -82,11 +82,11 @@ name:resource_stalls2 type:bitmask default:0x40
 0xf extra: all_prf_control Resource stalls2 control structures full for physical registers
 0xc extra: all_fl_empty Cycles with either free list is empty
 0x4f extra: ooo_rsrc Resource stalls2 control structures full Physical Register Reclaim Table (PRRT), Physical History Table (PHT), INT or SIMD Free List (FL), Branch Order Buffer (BOB)
-name:cpl_cycles type:bitmask default:0x1
+name:cpl_cycles type:bitmask default:ring0
 0x1 extra: ring0 Unhalted core cycles the Thread was in Rings 0.
 0x1 extra:cmask=1,edge ring0_trans Transitions from ring123 to Ring0.
 0x2 extra: ring123 Unhalted core cycles the Thread was in Rings 1/2/3.
-name:offcore_requests_outstanding type:bitmask default:0x1
+name:offcore_requests_outstanding type:bitmask default:cycles_with_demand_data_rd
 0x1 extra: demand_data_rd Offcore outstanding Demand Data Read transactions in the SuperQueue (SQ), queue to uncore, every cycle. Includes L1D data hardware prefetches.
 0x1 extra:cmask=1 cycles_with_demand_data_rd cycles there are Offcore outstanding RD data transactions in the SuperQueue (SQ), queue to uncore.
 0x2 extra: demand_code_rd Offcore outstanding Code Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle.
@@ -148,7 +148,7 @@ name:br_misp_exec type:bitmask default:0xff
 0xc1 extra: all_conditional All mispredicted macro conditional branch instructions.
 0xc4 extra: all_indirect_jump_non_call_ret All mispredicted indirect branches that are not calls nor returns.
 0xd0 extra: all_direct_near_call All mispredicted non-indirect calls
-name:idq_uops_not_delivered type:bitmask default:0x1
+name:idq_uops_not_delivered type:bitmask default:core
 0x1 extra: core Count number of non-delivered uops to Resource Allocation Table (RAT).
 0x1 extra:cmask=4 cycles_0_uops_deliv.core Counts the cycles no uops were delivered
 0x1 extra:cmask=3 cycles_le_1_uop_deliv.core Counts the cycles less than 1 uops were delivered
@@ -190,21 +190,21 @@ name:offcore_requests type:bitmask default:0x1
 0x2 extra: demand_code_rd Offcore Code read requests. Includes Cacheable and Un-cacheables.
 0x4 extra: demand_rfo Offcore Demand RFOs. Includes regular RFO, Locks, ItoM.
 0x8 extra: all_data_rd Offcore Demand and prefetch data reads returned to the core.
-name:uops_dispatched type:bitmask default:0x1
+name:uops_dispatched type:bitmask default:thread
 0x1 extra: thread Counts total number of uops to be dispatched per-thread each cycle.
 0x1 extra:cmask=1,inv stall_cycles Counts number of cycles no uops were dispatced to be executed on this thread.
 0x2 extra: core Counts total number of uops dispatched from any thread
 name:tlb_flush type:bitmask default:0x1
 0x1 extra: dtlb_thread Count number of DTLB flushes of thread-specific entries.
 0x20 extra: stlb_any Count number of any STLB flushes
-name:l1d_blocks type:bitmask default:0x1
+name:l1d_blocks type:bitmask default:bank_conflict_cycles
 0x1 extra: ld_bank_conflict Any dispatched loads cancelled due to DCU bank conflict
 0x5 extra:cmask=1 bank_conflict_cycles Cycles with l1d blocks due to bank conflicts
 name:other_assists type:bitmask default:0x2
 0x2 extra: itlb_miss_retired Instructions that experienced an ITLB miss. Non Pebs
 0x10 extra: avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable Non Pebs
 0x20 extra: sse_to_avx Number of transitions from legacy SSE to AVX-256 when penalty applicable Non Pebs
-name:uops_retired type:bitmask default:0x1
+name:uops_retired type:bitmask default:all
 0x1 extra: all All uops that actually retired.
 0x2 extra: retire_slots number of retirement slots used non PEBS
 0x1 extra:cmask=1,inv stall_cycles Cycles no executable uops retired
-- 
1.8.1.4
From: Maynard J. <may...@us...> - 2013-06-19 22:23:56
On 06/18/2013 06:07 PM, Andi Kleen wrote:
> From: Andi Kleen <ak...@li...>
>
> Use names to make all non unique Intel default unit masks
> unique

Andi,

Hunks 6 & 8 of the ivybridge changes failed to apply. I couldn't figure out why, so I just added those few changes by hand. If you can figure out what's going on, please submit a revised patch. Otherwise, once I commit the patch set, you'll need to double-check my hand-editing.

-Maynard
> -name:uops_issued type:exclusive default:0x1 > +name:uops_issued type:exclusive default:any > 0x1 extra: any Uops that Resource Allocation Table (RAT) issues to Reservation Station (RS) > 0x1 extra:cmask=1,inv stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for the thread > 0x1 extra:cmask=1,inv,any core_stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for all threads > 0x10 extra: flags_merge Number of flags-merge uops being allocated. > 0x20 extra: slow_lea Number of slow LEA uops being allocated. A uop is generally considered SlowLea if it has 3 sources (e.g. 2 sources + immediate) regardless if as a result of LEA instruction or not. > 0x40 extra: single_mul Number of Multiply packed/scalar single precision uops allocated > -name:arith type:bitmask default:0x1 > +name:arith type:bitmask default:fpu_div_active > 0x1 extra: fpu_div_active Cycles when divider is busy executing divide operations > 0x4 extra:cmask=1,edge fpu_div Divide operations executed > name:l2_rqsts type:exclusive default:0x1 > @@ -49,7 +49,7 @@ name:l2_l1d_wb_rqsts type:exclusive default:0x1 > 0x4 extra: hit_e Not rejected writebacks from L1D to L2 cache lines in E state > 0x8 extra: hit_m Not rejected writebacks from L1D to L2 cache lines in M state > 0xf extra: all Not rejected writebacks from L1D to L2 cache lines in any state. > -name:l1d_pend_miss type:exclusive default:0x1 > +name:l1d_pend_miss type:exclusive default:pending_cycles > 0x1 extra: pending L1D miss oustandings duration in cycles > 0x1 extra:cmask=1 pending_cycles Cycles with L1D load Misses outstanding. > 0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding, using an edge detect to count transitions. > @@ -68,7 +68,7 @@ name:move_elimination type:bitmask default:0x1 > 0x2 extra: simd_not_eliminated Number of SIMD Move Elimination candidate uops that were not eliminated. 
> 0x4 extra: int_eliminated Number of integer Move Elimination candidate uops that were eliminated. > 0x8 extra: simd_eliminated Number of SIMD Move Elimination candidate uops that were eliminated. > -name:cpl_cycles type:exclusive default:0x1 > +name:cpl_cycles type:exclusive default:ring0 > 0x1 extra: ring0 Unhalted core cycles when the thread is in ring 0 > 0x1 extra:cmask=1,edge ring0_trans Number of intervals between processor halts while thread is in ring 0 > 0x2 extra: ring123 Unhalted core cycles when thread is in rings 1, 2, or 3 > @@ -76,7 +76,7 @@ name:rs_events type:mandatory default:0x1 > 0x1 extra: empty_cycles Cycles when Reservation Station (RS) is empty for the thread > name:tlb_access type:mandatory default:0x4 > 0x4 extra: load_stlb_hit Load operations that miss the first DTLB level but hit the second and do not cause page walks > -name:offcore_requests_outstanding type:exclusive default:0x1 > +name:offcore_requests_outstanding type:exclusive default:cycles_with_demand_data_rd > 0x1 extra: demand_data_rd Offcore outstanding Demand Data Read transactions in uncore queue. 
> 0x1 extra:cmask=1 cycles_with_demand_data_rd Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore > 0x2 extra: demand_code_rd Offcore outstanding code reads transactions in SuperQueue (SQ), queue to uncore, every cycle > @@ -87,7 +87,7 @@ name:offcore_requests_outstanding type:exclusive default:0x1 > name:lock_cycles type:bitmask default:0x1 > 0x1 extra: split_lock_uc_lock_duration Cycles when L1 and L2 are locked due to UC or split lock > 0x2 extra: cache_lock_duration Cycles when L1D is locked > -name:idq type:exclusive default:0x2 > +name:idq type:exclusive default:empty > 0x2 extra: empty Instruction Decode Queue (IDQ) empty cycles > 0x4 extra: mite_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path > 0x4 extra:cmask=1 mite_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path > @@ -137,14 +137,14 @@ name:br_misp_exec type:exclusive default:0x41 > 0xc1 extra: all_conditional Speculative and retired mispredicted macro conditional branches > 0xc4 extra: all_indirect_jump_non_call_ret Mispredicted indirect branches excluding calls and returns > 0xff extra: all_branches Speculative and retired mispredicted macro conditional branches > -name:idq_uops_not_delivered type:exclusive default:0x1 > +name:idq_uops_not_delivered type:exclusive default:core > 0x1 extra: core Uops not delivered by the Frontend to the Backend of the machine, while there is no Backend stall > 0x1 extra:cmask=1 cycles_le_3_uop_deliv.core Cycles with 3 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall > 0x1 extra:cmask=1,inv cycles_fe_was_ok Cycles with 4 uops delivered by the Frontend to the Backend of the machine, or the Backend was stalling > 0x1 extra:cmask=2 cycles_le_2_uop_deliv.core Cycles with 2 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall > 0x1 extra:cmask=3 
cycles_le_1_uop_deliv.core Cycles with 1 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall > 0x1 extra:cmask=4 cycles_0_uops_deliv.core Cycles with no uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall > -name:uops_dispatched_port type:exclusive default:0x1 > +name:uops_dispatched_port type:exclusive default:port_0 > 0x1 extra: port_0 Cycles per thread when uops are dispatched to port 0 > 0x1 extra:any port_0_core Cycles per core when uops are dispatched to port 0 > 0x2 extra: port_1 Cycles per thread when uops are dispatched to port 1 > @@ -181,7 +181,7 @@ name:offcore_requests type:bitmask default:0x1 > 0x2 extra: demand_code_rd Cacheable and noncachaeble code read requests > 0x4 extra: demand_rfo Demand RFO requests including regular RFOs, locks, ItoM > 0x8 extra: all_data_rd Demand and prefetch data reads > -name:uops_executed type:exclusive default:0x1 > +name:uops_executed type:exclusive default:thread > 0x1 extra: thread Counts the number of uops to be executed per-thread each cycle. > 0x1 extra:cmask=1 cycles_ge_1_uop_exec Cycles where at least 1 uop was executed per-thread > 0x1 extra:cmask=1,inv stall_cycles Counts number of cycles no uops were dispatched to be executed on this thread. > @@ -196,7 +196,7 @@ name:other_assists type:bitmask default:0x8 > 0x8 extra: avx_store Number of AVX memory assist for stores. AVX microcode assist is being invoked whenever the hardware is unable to properly handle AVX-256b operations. > 0x10 extra: avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable. > 0x20 extra: sse_to_avx Number of transitions from SSE to AVX-256 when penalty applicable. > -name:uops_retired type:exclusive default:0x1 > +name:uops_retired type:exclusive default:all > 0x1 extra: all Actually retired uops. > 0x1 extra:cmask=1,inv stall_cycles Cycles without actually retired uops. 
> 0x1 extra:cmask=1,inv,any core_stall_cycles Cycles without actually retired uops. > diff --git a/events/i386/sandybridge/unit_masks b/events/i386/sandybridge/unit_masks > index 3235c09..f35f32d 100644 > --- a/events/i386/sandybridge/unit_masks > +++ b/events/i386/sandybridge/unit_masks > @@ -30,10 +30,10 @@ name:int_misc type:bitmask default:0x40 > 0x40 extra: rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is sent to Instruction Decode Queue (IDQ) for this thread. > 0x3 extra:cmask=1 recovery_cycles Number of cycles waiting to be recover after Nuke due to all other cases except JEClear. > 0x3 extra:cmask=1,edge recovery_stalls_count Edge applied to recovery_cycles, thus counts occurrences. > -name:uops_issued type:bitmask default:0x1 > +name:uops_issued type:bitmask default:any > 0x1 extra: any Number of Uops issued by the Resource Allocation Table (RAT) to the Reservation Station (RS) > 0x1 extra:cmask=1,inv stall_cycles cycles no uops issued by this thread. > -name:arith type:bitmask default:0x1 > +name:arith type:bitmask default:fpu_div_active > 0x1 extra: fpu_div_active Cycles that the divider is busy with any divide or sqrt operation. > 0x1 extra:cmask=1,edge fpu_div Number of times that the divider is actived, includes INT, SIMD and FP. > name:l2_rqsts type:bitmask default:0x1 > @@ -56,7 +56,7 @@ name:l2_store_lock_rqsts type:bitmask default:0xf > name:l2_l1d_wb_rqsts type:bitmask default:0x4 > 0x4 extra: hit_e writebacks from L1D to L2 cache lines in E state > 0x8 extra: hit_m writebacks from L1D to L2 cache lines in M state > -name:l1d_pend_miss type:bitmask default:0x1 > +name:l1d_pend_miss type:bitmask default:pending > 0x1 extra: pending Cycles with L1D load Misses outstanding. > 0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding occurences. 
> name:dtlb_store_misses type:bitmask default:0x1 > @@ -72,7 +72,7 @@ name:l1d type:bitmask default:0x1 > 0x2 extra: allocated_in_m L1D M-state Data Cache Lines Allocated > 0x4 extra: eviction L1D M-state Data Cache Lines Evicted due to replacement (only) > 0x8 extra: all_m_replacement All Modified lines evicted out of L1D > -name:partial_rat_stalls type:bitmask default:0x20 > +name:partial_rat_stalls type:bitmask default:flags_merge_uop > 0x20 extra: flags_merge_uop Number of perf sensitive flags-merge uops added by Sandy Bridge u-arch. > 0x40 extra: slow_lea_window Number of cycles with at least 1 slow Load Effective Address (LEA) uop being allocated. > 0x80 extra: mul_single_uop Number of Multiply packed/scalar single precision uops allocated > @@ -82,11 +82,11 @@ name:resource_stalls2 type:bitmask default:0x40 > 0xf extra: all_prf_control Resource stalls2 control structures full for physical registers > 0xc extra: all_fl_empty Cycles with either free list is empty > 0x4f extra: ooo_rsrc Resource stalls2 control structures full Physical Register Reclaim Table (PRRT), Physical History Table (PHT), INT or SIMD Free List (FL), Branch Order Buffer (BOB) > -name:cpl_cycles type:bitmask default:0x1 > +name:cpl_cycles type:bitmask default:ring0 > 0x1 extra: ring0 Unhalted core cycles the Thread was in Rings 0. > 0x1 extra:cmask=1,edge ring0_trans Transitions from ring123 to Ring0. > 0x2 extra: ring123 Unhalted core cycles the Thread was in Rings 1/2/3. > -name:offcore_requests_outstanding type:bitmask default:0x1 > +name:offcore_requests_outstanding type:bitmask default:cycles_with_demand_data_rd > 0x1 extra: demand_data_rd Offcore outstanding Demand Data Read transactions in the SuperQueue (SQ), queue to uncore, every cycle. Includes L1D data hardware prefetches. > 0x1 extra:cmask=1 cycles_with_demand_data_rd cycles there are Offcore outstanding RD data transactions in the SuperQueue (SQ), queue to uncore. 
> 0x2 extra: demand_code_rd Offcore outstanding Code Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle. > @@ -148,7 +148,7 @@ name:br_misp_exec type:bitmask default:0xff > 0xc1 extra: all_conditional All mispredicted macro conditional branch instructions. > 0xc4 extra: all_indirect_jump_non_call_ret All mispredicted indirect branches that are not calls nor returns. > 0xd0 extra: all_direct_near_call All mispredicted non-indirect calls > -name:idq_uops_not_delivered type:bitmask default:0x1 > +name:idq_uops_not_delivered type:bitmask default:core > 0x1 extra: core Count number of non-delivered uops to Resource Allocation Table (RAT). > 0x1 extra:cmask=4 cycles_0_uops_deliv.core Counts the cycles no uops were delivered > 0x1 extra:cmask=3 cycles_le_1_uop_deliv.core Counts the cycles less than 1 uops were delivered > @@ -190,21 +190,21 @@ name:offcore_requests type:bitmask default:0x1 > 0x2 extra: demand_code_rd Offcore Code read requests. Includes Cacheable and Un-cacheables. > 0x4 extra: demand_rfo Offcore Demand RFOs. Includes regular RFO, Locks, ItoM. > 0x8 extra: all_data_rd Offcore Demand and prefetch data reads returned to the core. > -name:uops_dispatched type:bitmask default:0x1 > +name:uops_dispatched type:bitmask default:thread > 0x1 extra: thread Counts total number of uops to be dispatched per-thread each cycle. > 0x1 extra:cmask=1,inv stall_cycles Counts number of cycles no uops were dispatced to be executed on this thread. > 0x2 extra: core Counts total number of uops dispatched from any thread > name:tlb_flush type:bitmask default:0x1 > 0x1 extra: dtlb_thread Count number of DTLB flushes of thread-specific entries. 
> 0x20 extra: stlb_any Count number of any STLB flushes > -name:l1d_blocks type:bitmask default:0x1 > +name:l1d_blocks type:bitmask default:bank_conflict_cycles > 0x1 extra: ld_bank_conflict Any dispatched loads cancelled due to DCU bank conflict > 0x5 extra:cmask=1 bank_conflict_cycles Cycles with l1d blocks due to bank conflicts > name:other_assists type:bitmask default:0x2 > 0x2 extra: itlb_miss_retired Instructions that experienced an ITLB miss. Non Pebs > 0x10 extra: avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable Non Pebs > 0x20 extra: sse_to_avx Number of transitions from legacy SSE to AVX-256 when penalty applicable Non Pebs > -name:uops_retired type:bitmask default:0x1 > +name:uops_retired type:bitmask default:all > 0x1 extra: all All uops that actually retired. > 0x2 extra: retire_slots number of retirement slots used non PEBS > 0x1 extra:cmask=1,inv stall_cycles Cycles no executable uops retired > |
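The ivybridge uops_issued group above shows why the numeric defaults had to go. Three entries (any, stall_cycles, core_stall_cycles) share the value 0x1 and differ only in their extra: modifiers, so default:0x1 cannot say which one is meant, while default:any names exactly one. The minimal C sketch below counts matches both ways. The struct um_entry type and the count_by_value()/count_by_name() helpers are hypothetical simplifications. They are not libop's struct op_described_um, which stores the extra: field as decoded bits rather than raw text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified unit-mask entry (not libop's real struct). */
struct um_entry {
	unsigned value;      /* unit mask value, e.g. 0x1 */
	const char *extra;   /* raw text after "extra:", "" if none */
	const char *name;    /* symbolic name, e.g. "any" */
};

/* Entries of the ivybridge uops_issued group from the patch above. */
static const struct um_entry uops_issued[] = {
	{ 0x1,  "",                "any" },
	{ 0x1,  "cmask=1,inv",     "stall_cycles" },
	{ 0x1,  "cmask=1,inv,any", "core_stall_cycles" },
	{ 0x10, "",                "flags_merge" },
	{ 0x20, "",                "slow_lea" },
	{ 0x40, "",                "single_mul" },
};

#define NUM_UOPS_ISSUED (sizeof(uops_issued) / sizeof(uops_issued[0]))

/* How many entries a numeric default such as default:0x1 could mean. */
static size_t count_by_value(const struct um_entry *um, size_t n,
			     unsigned value)
{
	size_t i, hits = 0;

	for (i = 0; i < n; ++i)
		if (um[i].value == value)
			++hits;
	return hits;
}

/* How many entries a named default such as default:any could mean. */
static size_t count_by_name(const struct um_entry *um, size_t n,
			    const char *name)
{
	size_t i, hits = 0;

	assert(name != NULL);
	for (i = 0; i < n; ++i)
		if (strcmp(um[i].name, name) == 0)
			++hits;
	return hits;
}
```

With this data, count_by_value(uops_issued, NUM_UOPS_ISSUED, 0x1) finds three candidates and count_by_name(uops_issued, NUM_UOPS_ISSUED, "any") finds one. That single match is what makes default:any unambiguous.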
From: Andi K. <an...@fi...> - 2013-06-18 23:40:43
|
From: Suravee Suthikulpanit <sur...@am...> Current libop does not allow named mask to be used as default mask. This causes the tools to fail when user does not specify the named mask of an event and relying on the numerical default unit mask which could have duplication. Signed-off-by: Suravee Suthikulpanit <sur...@am...> --- libop/op_events.c | 84 ++++++++++++++++++++++++++++++++++++++++++------------- libop/op_events.h | 2 ++ 2 files changed, 66 insertions(+), 20 deletions(-) diff --git a/libop/op_events.c b/libop/op_events.c index 276386d..ea23f23 100644 --- a/libop/op_events.c +++ b/libop/op_events.c @@ -197,7 +197,12 @@ static void parse_um(struct op_unit_mask * um, char const * line) if (seen_default) parse_error("duplicate default: tag"); seen_default = 1; - um->default_mask = parse_hex(tagend); + if (0 != strncmp(tagend, "0x", 2)) { + um->default_mask_name = op_xstrndup( + tagend, valueend - tagend); + } else { + um->default_mask = parse_hex(tagend); + } } else { parse_error("invalid unit mask tag"); } @@ -215,32 +220,50 @@ static void parse_um(struct op_unit_mask * um, char const * line) /* \t0x08 (M)odified cache state */ -/* \t0x08 extra:inv,cmask=... (M)odified cache state */ +/* \t0x08 extra:inv,cmask=... mod_cach_state (M)odified cache state */ static void parse_um_entry(struct op_described_um * entry, char const * line) { char const * c = line; + /* value */ c = skip_ws(c); entry->value = parse_hex(c); - c = skip_nonws(c); + /* extra: */ + c = skip_nonws(c); c = skip_ws(c); + if (!*c) + goto invalid_out; + if (strisprefix(c, "extra:")) { c += 6; entry->extra = parse_extra(c); + /* named mask */ c = skip_nonws(c); - } else - entry->extra = 0; - - if (!*c) - parse_error("invalid unit mask entry"); + c = skip_ws(c); + if (!*c) + goto invalid_out; - c = skip_ws(c); + /* "extra:" !!ALWAYS!! 
followed by named mask */ + entry->name= op_xstrndup(c, strcspn(c, " ")); + c = skip_nonws(c); + c = skip_ws(c); + } else { + entry->extra = 0; + } - if (!*c) - parse_error("invalid unit mask entry"); + /* desc */ + if (!*c) { + /* This is a corner case where the named unit mask entry + * only has one word. This should really be fixed in the + * unit_mask file */ + entry->desc = xstrdup(entry->name); + } else + entry->desc = xstrdup(c); + return; - entry->desc = xstrdup(c); +invalid_out: + parse_error("invalid unit mask entry"); } @@ -605,10 +628,17 @@ static int check_unit_mask(struct op_unit_mask const * um, "(%s)\n", um->name, cpu_name); err = EXIT_FAILURE; } - } else { - for (i = 0; i < um->num; ++i) { - if (um->default_mask == um->um[i].value) - break; + } else if (um->unit_type_mask == utm_exclusive) { + if (um->default_mask_name) { + for (i = 0; i < um->num; ++i) { + if (0 == strcmp(um->default_mask_name, um->um[i].name)) + break; + } + } else { + for (i = 0; i < um->num; ++i) { + if (um->default_mask == um->um[i].value) + break; + } } if (i == um->num) { @@ -708,6 +738,7 @@ static void load_events(op_cpu cpu_type) unit_mask->unit_type_mask = utm_mandatory; unit_mask->um[0].extra = 0; unit_mask->um[0].value = 0; + unit_mask->um[0].name = xstrdup(""); unit_mask->um[0].desc = xstrdup("No unit mask"); unit_mask->used = 1; @@ -1335,10 +1366,18 @@ static void extra_check(struct op_event *e, char *name, unsigned w) } } -static void do_resolve_unit_mask(struct op_event *e, struct parsed_event *pe, u32 *extra) +static void do_resolve_unit_mask(struct op_event *e, + struct parsed_event *pe, u32 *extra) { unsigned i; + /* If not specified um and the default um is name type + * we populate pe unitmask name with default name */ + if ((e->unit->default_mask_name != NULL) && + (pe->unit_mask_name == NULL)) { + pe->unit_mask_name = xstrdup(e->unit->default_mask_name); + } + for (;;) { if (pe->unit_mask_name == NULL) { /* For numerical unit mask */ @@ -1374,9 +1413,14 @@ 
static void do_resolve_unit_mask(struct op_event *e, struct parsed_event *pe, u3 } else { /* For named unit mask */ for (i = 0; i < e->unit->num; i++) { - int len = strcspn(e->unit->um[i].desc, " \t"); - if (!strncmp(pe->unit_mask_name, e->unit->um[i].desc, - len) && pe->unit_mask_name[len] == '\0') + int len = 0; + + if (e->unit->um[i].name) + len = strlen(e->unit->um[i].name); + + if (len && + !strncmp(pe->unit_mask_name, e->unit->um[i].name, len) && + pe->unit_mask_name[len] == '\0') break; } if (i == e->unit->num) { diff --git a/libop/op_events.h b/libop/op_events.h index a437f2a..33114f7 100644 --- a/libop/op_events.h +++ b/libop/op_events.h @@ -56,9 +56,11 @@ struct op_unit_mask { u32 num; /**< number of possible unit masks */ enum unit_mask_type unit_type_mask; u32 default_mask; /**< only the gui use it */ + char * default_mask_name; struct op_described_um { u32 extra; u32 value; + char * name; char * desc; } um[MAX_UNIT_MASK]; struct list_head um_next; /**< next um in list */ -- 1.8.1.4 |
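The patch adds two rules, which can be sketched in isolation. First, a default: value that does not begin with 0x is kept as a name. Second, when the user gives no unit mask, that name is used as if the user had typed it. Name lookup then requires an exact match against the entry's new name field instead of the first word of the description. The helpers below (struct um_default, parse_default, effective_um_name, um_name_matches) are hypothetical and only mirror the logic of parse_um() and do_resolve_unit_mask(). The sketch also assumes a NUL-terminated default value that fits in a 64-byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the default_mask / default_mask_name pair
 * that the patch adds to struct op_unit_mask. */
struct um_default {
	int is_named;      /* 1 if the default: tag held a name */
	unsigned value;    /* numeric default, valid when !is_named */
	char name[64];     /* named default, valid when is_named */
};

/* Same test as the patch: anything not starting with "0x" is a name. */
static void parse_default(const char *val, struct um_default *d)
{
	assert(val != NULL && d != NULL);
	memset(d, 0, sizeof(*d));
	if (strncmp(val, "0x", 2) != 0) {
		d->is_named = 1;
		strncpy(d->name, val, sizeof(d->name) - 1);
	} else {
		/* strtoul() with base 16 accepts the leading "0x" */
		d->value = (unsigned)strtoul(val, NULL, 16);
	}
}

/* Name to resolve: the user's choice, else the named default,
 * else NULL, meaning "fall back to the numeric path". */
static const char *effective_um_name(const char *user_name,
				     const struct um_default *d)
{
	if (user_name != NULL)
		return user_name;
	return d->is_named ? d->name : NULL;
}

/* Exact match against an entry name, as in the new named-mask loop;
 * entries with a missing or empty name never match. */
static int um_name_matches(const char *requested, const char *entry_name)
{
	size_t len;

	if (entry_name == NULL)
		return 0;
	len = strlen(entry_name);
	return len != 0 && strncmp(requested, entry_name, len) == 0 &&
	       requested[len] == '\0';
}
```

For non-empty names, strncmp() over strlen(entry_name) followed by the terminator check behaves like strcmp(). The len != 0 guard keeps the empty name that load_events() now gives the mandatory "No unit mask" entry from matching anything.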
From: Maynard J. <may...@us...> - 2013-06-19 22:23:38
|
On 06/18/2013 06:07 PM, Andi Kleen wrote: > From: Suravee Suthikulpanit <sur...@am...> Hi, Andi, Your inclusion of Suravee's patch below has somehow dropped the changes to ophelp.c. So I'll just use Suravee's original patch. Thanks. -Maynard > > Current libop does not allow named mask to be used as default mask. > This causes the tools to fail when user does not specify the named > mask of an event and relying on the numerical default unit mask > which could have duplication. > > Signed-off-by: Suravee Suthikulpanit <sur...@am...> > --- > libop/op_events.c | 84 ++++++++++++++++++++++++++++++++++++++++++------------- > libop/op_events.h | 2 ++ > 2 files changed, 66 insertions(+), 20 deletions(-) > > diff --git a/libop/op_events.c b/libop/op_events.c > index 276386d..ea23f23 100644 > --- a/libop/op_events.c > +++ b/libop/op_events.c > @@ -197,7 +197,12 @@ static void parse_um(struct op_unit_mask * um, char const * line) > if (seen_default) > parse_error("duplicate default: tag"); > seen_default = 1; > - um->default_mask = parse_hex(tagend); > + if (0 != strncmp(tagend, "0x", 2)) { > + um->default_mask_name = op_xstrndup( > + tagend, valueend - tagend); > + } else { > + um->default_mask = parse_hex(tagend); > + } > } else { > parse_error("invalid unit mask tag"); > } > @@ -215,32 +220,50 @@ static void parse_um(struct op_unit_mask * um, char const * line) > > > /* \t0x08 (M)odified cache state */ > -/* \t0x08 extra:inv,cmask=... (M)odified cache state */ > +/* \t0x08 extra:inv,cmask=... 
mod_cach_state (M)odified cache state */ > static void parse_um_entry(struct op_described_um * entry, char const * line) > { > char const * c = line; > > + /* value */ > c = skip_ws(c); > entry->value = parse_hex(c); > - c = skip_nonws(c); > > + /* extra: */ > + c = skip_nonws(c); > c = skip_ws(c); > + if (!*c) > + goto invalid_out; > + > if (strisprefix(c, "extra:")) { > c += 6; > entry->extra = parse_extra(c); > + /* named mask */ > c = skip_nonws(c); > - } else > - entry->extra = 0; > - > - if (!*c) > - parse_error("invalid unit mask entry"); > + c = skip_ws(c); > + if (!*c) > + goto invalid_out; > > - c = skip_ws(c); > + /* "extra:" !!ALWAYS!! followed by named mask */ > + entry->name= op_xstrndup(c, strcspn(c, " ")); > + c = skip_nonws(c); > + c = skip_ws(c); > + } else { > + entry->extra = 0; > + } > > - if (!*c) > - parse_error("invalid unit mask entry"); > + /* desc */ > + if (!*c) { > + /* This is a corner case where the named unit mask entry > + * only has one word. This should really be fixed in the > + * unit_mask file */ > + entry->desc = xstrdup(entry->name); > + } else > + entry->desc = xstrdup(c); > + return; > > - entry->desc = xstrdup(c); > +invalid_out: > + parse_error("invalid unit mask entry"); > } > > > @@ -605,10 +628,17 @@ static int check_unit_mask(struct op_unit_mask const * um, > "(%s)\n", um->name, cpu_name); > err = EXIT_FAILURE; > } > - } else { > - for (i = 0; i < um->num; ++i) { > - if (um->default_mask == um->um[i].value) > - break; > + } else if (um->unit_type_mask == utm_exclusive) { > + if (um->default_mask_name) { > + for (i = 0; i < um->num; ++i) { > + if (0 == strcmp(um->default_mask_name, um->um[i].name)) > + break; > + } > + } else { > + for (i = 0; i < um->num; ++i) { > + if (um->default_mask == um->um[i].value) > + break; > + } > } > > if (i == um->num) { > @@ -708,6 +738,7 @@ static void load_events(op_cpu cpu_type) > unit_mask->unit_type_mask = utm_mandatory; > unit_mask->um[0].extra = 0; > unit_mask->um[0].value = 0; > 
+ unit_mask->um[0].name = xstrdup(""); > unit_mask->um[0].desc = xstrdup("No unit mask"); > unit_mask->used = 1; > > @@ -1335,10 +1366,18 @@ static void extra_check(struct op_event *e, char *name, unsigned w) > } > } > > -static void do_resolve_unit_mask(struct op_event *e, struct parsed_event *pe, u32 *extra) > +static void do_resolve_unit_mask(struct op_event *e, > + struct parsed_event *pe, u32 *extra) > { > unsigned i; > > + /* If not specified um and the default um is name type > + * we populate pe unitmask name with default name */ > + if ((e->unit->default_mask_name != NULL) && > + (pe->unit_mask_name == NULL)) { > + pe->unit_mask_name = xstrdup(e->unit->default_mask_name); > + } > + > for (;;) { > if (pe->unit_mask_name == NULL) { > /* For numerical unit mask */ > @@ -1374,9 +1413,14 @@ static void do_resolve_unit_mask(struct op_event *e, struct parsed_event *pe, u3 > } else { > /* For named unit mask */ > for (i = 0; i < e->unit->num; i++) { > - int len = strcspn(e->unit->um[i].desc, " \t"); > - if (!strncmp(pe->unit_mask_name, e->unit->um[i].desc, > - len) && pe->unit_mask_name[len] == '\0') > + int len = 0; > + > + if (e->unit->um[i].name) > + len = strlen(e->unit->um[i].name); > + > + if (len && > + !strncmp(pe->unit_mask_name, e->unit->um[i].name, len) && > + pe->unit_mask_name[len] == '\0') > break; > } > if (i == e->unit->num) { > diff --git a/libop/op_events.h b/libop/op_events.h > index a437f2a..33114f7 100644 > --- a/libop/op_events.h > +++ b/libop/op_events.h > @@ -56,9 +56,11 @@ struct op_unit_mask { > u32 num; /**< number of possible unit masks */ > enum unit_mask_type unit_type_mask; > u32 default_mask; /**< only the gui use it */ > + char * default_mask_name; > struct op_described_um { > u32 extra; > u32 value; > + char * name; > char * desc; > } um[MAX_UNIT_MASK]; > struct list_head um_next; /**< next um in list */ > |
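After this patch, parse_um_entry() expects each entry line to contain, in order:

- a hex value,
- an optional extra: field,
- the named mask, which the patch assumes always follows extra:,
- the description.

A one-word entry reuses its name as the description. An entry without extra: therefore gets no name at all. The self-contained C sketch below follows that grammar, assuming whitespace-separated fields. struct um_line, parse_um_line() and the fixed buffer sizes are hypothetical. Unlike libop, the sketch keeps the extra: text raw instead of decoding it with parse_extra().

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one unit-mask entry line. */
struct um_line {
	unsigned value;
	char extra[64];    /* raw text after "extra:", may be empty */
	char name[64];     /* empty when the line has no "extra:" */
	char desc[256];
};

static const char *skip_ws(const char *c)
{
	while (*c && isspace((unsigned char)*c))
		++c;
	return c;
}

static const char *skip_nonws(const char *c)
{
	while (*c && !isspace((unsigned char)*c))
		++c;
	return c;
}

/* Copy [begin, end) into dst of size n, truncating if needed. */
static void copy_span(char *dst, size_t n, const char *begin, const char *end)
{
	size_t len = (size_t)(end - begin);

	if (len >= n)
		len = n - 1;
	memcpy(dst, begin, len);
	dst[len] = '\0';
}

/* Returns 0 on success, -1 for a malformed entry. */
static int parse_um_line(const char *line, struct um_line *e)
{
	const char *c, *end;

	assert(line != NULL && e != NULL);
	memset(e, 0, sizeof(*e));
	c = skip_ws(line);
	e->value = (unsigned)strtoul(c, NULL, 16);
	c = skip_ws(skip_nonws(c));
	if (!*c)
		return -1;

	if (strncmp(c, "extra:", 6) == 0) {
		c += 6;
		end = skip_nonws(c);   /* empty for "extra: name" */
		copy_span(e->extra, sizeof(e->extra), c, end);
		c = skip_ws(end);
		if (!*c)
			return -1;
		end = skip_nonws(c);   /* the named mask */
		copy_span(e->name, sizeof(e->name), c, end);
		c = skip_ws(end);
	}

	if (!*c)   /* one-word entry: reuse the name, as the patch does */
		copy_span(e->desc, sizeof(e->desc), e->name,
			  e->name + strlen(e->name));
	else
		copy_span(e->desc, sizeof(e->desc), c, c + strlen(c));
	return 0;
}
```

The patch cuts the name at the first space with strcspn(c, " "). This sketch instead splits on any whitespace, so it also accepts a tab between the name and the description.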
From: Andi K. <an...@fi...> - 2013-06-18 23:40:46
|
From: Andi Kleen <ak...@li...> Add empty extra: fields to every Intel event with a unique first word in the description. That word can then be used to specify the unit mask symbolically. Haswell already had the empty extra: fields. Core 2 did not have unique first words for everything, so there I only added them to the events that did. v2: Make the changes for Atom too Signed-off-by: Andi Kleen <ak...@li...> --- events/i386/atom/unit_masks | 154 +++++------ events/i386/core_2/unit_masks | 62 ++--- events/i386/ivybridge/unit_masks | 330 +++++++++++----------- events/i386/nehalem/unit_masks | 552 ++++++++++++++++++------------------- events/i386/sandybridge/unit_masks | 374 ++++++++++++------------- events/i386/westmere/unit_masks | 480 ++++++++++++++++---------------- 6 files changed, 976 insertions(+), 976 deletions(-) diff --git a/events/i386/atom/unit_masks b/events/i386/atom/unit_masks index acaec23..4802ddb 100644 --- a/events/i386/atom/unit_masks +++ b/events/i386/atom/unit_masks @@ -3,118 +3,118 @@ # include:i386/arch_perfmon name:store_forwards type:mandatory default:0x81 - 0x81 good Good store forwards + 0x81 extra: good Good store forwards name:segment_reg_loads type:mandatory default:0x00 - 0x00 any Number of segment register loads + 0x00 extra: any Number of segment register loads name:simd_prefetch type:bitmask default:0x01 - 0x01 prefetcht0 Streaming SIMD Extensions (SSE) PrefetchT0 instructions executed - 0x06 sw_l2 Streaming SIMD Extensions (SSE) PrefetchT1 and PrefetchT2 instructions executed - 0x08 prefetchnta Streaming SIMD Extensions (SSE) Prefetch NTA instructions executed + 0x01 extra: prefetcht0 Streaming SIMD Extensions (SSE) PrefetchT0 instructions executed + 0x06 extra: sw_l2 Streaming SIMD Extensions (SSE) PrefetchT1 and PrefetchT2 instructions executed + 0x08 extra: prefetchnta Streaming SIMD Extensions (SSE) Prefetch NTA instructions executed name:data_tlb_misses type:bitmask default:0x07 - 0x07 dtlb_miss Memory accesses that missed the DTLB - 0x05 dtlb_miss_ld
DTLB misses due to load operations - 0x09 l0_dtlb_miss_ld L0_DTLB misses due to load operations - 0x06 dtlb_miss_st DTLB misses due to store operations + 0x07 extra: dtlb_miss Memory accesses that missed the DTLB + 0x05 extra: dtlb_miss_ld DTLB misses due to load operations + 0x09 extra: l0_dtlb_miss_ld L0_DTLB misses due to load operations + 0x06 extra: dtlb_miss_st DTLB misses due to store operations name:page_walks type:bitmask default:0x03 - 0x03 walks Number of page-walks executed - 0x03 cycles Duration of page-walks in core cycles + 0x03 extra: walks Number of page-walks executed + 0x03 extra: cycles Duration of page-walks in core cycles name:x87_comp_ops_exe type:bitmask default:0x81 - 0x01 s Floating point computational micro-ops executed - 0x81 ar Floating point computational micro-ops retired + 0x01 extra: s Floating point computational micro-ops executed + 0x81 extra: ar Floating point computational micro-ops retired name:fp_assist type:mandatory default:0x81 - 0x81 ar Floating point assists + 0x81 extra: ar Floating point assists name:mul type:bitmask default:0x01 - 0x01 s Multiply operations executed - 0x81 ar Multiply operations retired + 0x01 extra: s Multiply operations executed + 0x81 extra: ar Multiply operations retired name:div type:bitmask default:0x01 - 0x01 s Divide operations executed - 0x81 ar Divide operations retired + 0x01 extra: s Divide operations executed + 0x81 extra: ar Divide operations retired name:l2_rqsts type:bitmask default:0x41 - 0x41 i_state L2 cache demand requests from this core that missed the L2 + 0x41 extra: i_state L2 cache demand requests from this core that missed the L2 0x4F mesi L2 cache demand requests from this core name:cpu_clk_unhalted type:bitmask default:0x00 - 0x00 core_p Core cycles when core is not halted - 0x01 bus Bus cycles when core is not halted - 0x02 no_other Bus cycles when core is active and the other is halted + 0x00 extra: core_p Core cycles when core is not halted + 0x01 extra: bus Bus cycles 
when core is not halted
+ 0x02 extra: no_other Bus cycles when core is active and the other is halted
 name:l1d_cache type:bitmask default:0x21
- 0x21 ld L1 Cacheable Data Reads
- 0x22 st L1 Cacheable Data Writes
+ 0x21 extra: ld L1 Cacheable Data Reads
+ 0x22 extra: st L1 Cacheable Data Writes
 name:icache type:bitmask default:0x03
- 0x03 accesses Instruction fetches
- 0x02 misses Icache miss
+ 0x03 extra: accesses Instruction fetches
+ 0x02 extra: misses Icache miss
 name:itlb type:bitmask default:0x04
- 0x04 flush ITLB flushes
- 0x02 misses ITLB misses
+ 0x04 extra: flush ITLB flushes
+ 0x02 extra: misses ITLB misses
 name:macro_insts type:exclusive default:0x03
- 0x02 cisc_decoded CISC macro instructions decoded
- 0x03 all_decoded All Instructions decoded
+ 0x02 extra: cisc_decoded CISC macro instructions decoded
+ 0x03 extra: all_decoded All Instructions decoded
 name:simd_uops_exec type:exclusive default:0x80
- 0x00 s SIMD micro-ops executed (excluding stores)
- 0x80 ar SIMD micro-ops retired (excluding stores)
+ 0x00 extra: s SIMD micro-ops executed (excluding stores)
+ 0x80 extra: ar SIMD micro-ops retired (excluding stores)
 name:simd_sat_uop_exec type:bitmask default:0x00
- 0x00 s SIMD saturated arithmetic micro-ops executed
- 0x80 ar SIMD saturated arithmetic micro-ops retired
+ 0x00 extra: s SIMD saturated arithmetic micro-ops executed
+ 0x80 extra: ar SIMD saturated arithmetic micro-ops retired
 name:simd_uop_type_exec type:bitmask default:0x01
- 0x01 s SIMD packed multiply microops executed
- 0x81 ar SIMD packed multiply microops retired
- 0x02 s SIMD packed shift micro-ops executed
- 0x82 ar SIMD packed shift micro-ops retired
- 0x04 s SIMD pack micro-ops executed
- 0x84 ar SIMD pack micro-ops retired
- 0x08 s SIMD unpack micro-ops executed
- 0x88 ar SIMD unpack micro-ops retired
- 0x10 s SIMD packed logical microops executed
- 0x90 ar SIMD packed logical microops retired
- 0x20 s SIMD packed arithmetic micro-ops executed
+ 0x01 extra: s SIMD packed multiply microops executed
+ 0x81 extra: ar SIMD packed multiply microops retired
+ 0x02 extra: s SIMD packed shift micro-ops executed
+ 0x82 extra: ar SIMD packed shift micro-ops retired
+ 0x04 extra: s SIMD pack micro-ops executed
+ 0x84 extra: ar SIMD pack micro-ops retired
+ 0x08 extra: s SIMD unpack micro-ops executed
+ 0x88 extra: ar SIMD unpack micro-ops retired
+ 0x10 extra: s SIMD packed logical microops executed
+ 0x90 extra: ar SIMD packed logical microops retired
+ 0x20 extra: s SIMD packed arithmetic micro-ops executed
  0xA0 ar SIMD packed arithmetic micro-ops retired
 name:uops_retired type:mandatory default:0x10
- 0x10 any Micro-ops retired
+ 0x10 extra: any Micro-ops retired
 name:br_inst_retired type:bitmask default:0x00
- 0x00 any Retired branch instructions
- 0x01 pred_not_taken Retired branch instructions that were predicted not-taken
- 0x02 mispred_not_taken Retired branch instructions that were mispredicted not-taken
- 0x04 pred_taken Retired branch instructions that were predicted taken
- 0x08 mispred_taken Retired branch instructions that were mispredicted taken
+ 0x00 extra: any Retired branch instructions
+ 0x01 extra: pred_not_taken Retired branch instructions that were predicted not-taken
+ 0x02 extra: mispred_not_taken Retired branch instructions that were mispredicted not-taken
+ 0x04 extra: pred_taken Retired branch instructions that were predicted taken
+ 0x08 extra: mispred_taken Retired branch instructions that were mispredicted taken
  0x0A mispred Retired mispredicted branch instructions (precise event)
  0x0C taken Retired taken branch instructions
  0x0F any1 Retired branch instructions
 name:cycles_int_masked type:bitmask default:0x01
- 0x01 cycles_int_masked Cycles during which interrupts are disabled
- 0x02 cycles_int_pending_and_masked Cycles during which interrupts are pending and disabled
+ 0x01 extra: cycles_int_masked Cycles during which interrupts are disabled
+ 0x02 extra: cycles_int_pending_and_masked Cycles during which interrupts are pending and disabled
 name:simd_inst_retired type:bitmask default:0x01
- 0x01 packed_single Retired Streaming SIMD Extensions (SSE) packed-single instructions
- 0x02 scalar_single Retired Streaming SIMD Extensions (SSE) scalar-single instructions
- 0x04 packed_double Retired Streaming SIMD Extensions 2 (SSE2) packed-double instructions
- 0x08 scalar_double Retired Streaming SIMD Extensions 2 (SSE2) scalar-double instructions
- 0x10 vector Retired Streaming SIMD Extensions 2 (SSE2) vector instructions
+ 0x01 extra: packed_single Retired Streaming SIMD Extensions (SSE) packed-single instructions
+ 0x02 extra: scalar_single Retired Streaming SIMD Extensions (SSE) scalar-single instructions
+ 0x04 extra: packed_double Retired Streaming SIMD Extensions 2 (SSE2) packed-double instructions
+ 0x08 extra: scalar_double Retired Streaming SIMD Extensions 2 (SSE2) scalar-double instructions
+ 0x10 extra: vector Retired Streaming SIMD Extensions 2 (SSE2) vector instructions
  0x1F any Retired Streaming SIMD instructions
 name:simd_comp_inst_retired type:bitmask default:0x01
- 0x01 packed_single Retired computational Streaming SIMD Extensions (SSE) packed-single instructions
- 0x02 scalar_single Retired computational Streaming SIMD Extensions (SSE) scalar-single instructions
- 0x04 packed_double Retired computational Streaming SIMD Extensions 2 (SSE2) packed-double instructions
- 0x08 scalar_double Retired computational Streaming SIMD Extensions 2 (SSE2) scalar-double instructions
+ 0x01 extra: packed_single Retired computational Streaming SIMD Extensions (SSE) packed-single instructions
+ 0x02 extra: scalar_single Retired computational Streaming SIMD Extensions (SSE) scalar-single instructions
+ 0x04 extra: packed_double Retired computational Streaming SIMD Extensions 2 (SSE2) packed-double instructions
+ 0x08 extra: scalar_double Retired computational Streaming SIMD Extensions 2 (SSE2) scalar-double instructions
 name:mem_load_retired type:bitmask default:0x01
- 0x01 l2_hit Retired loads that hit the L2 cache (precise event)
- 0x02 l2_miss Retired loads that miss the L2 cache (precise event)
- 0x04 dtlb_miss Retired loads that miss the DTLB (precise event)
+ 0x01 extra: l2_hit Retired loads that hit the L2 cache (precise event)
+ 0x02 extra: l2_miss Retired loads that miss the L2 cache (precise event)
+ 0x04 extra: dtlb_miss Retired loads that miss the DTLB (precise event)
 name:thermal_trip type:mandatory default:0xc0
- 0xc0 thermal_trip Number of thermal trips.
+ 0xc0 extra: thermal_trip Number of thermal trips.
 # 18-11
 name:core type:bitmask default:0x180
- 0x180 all All cores.
- 0x080 this This Core.
+ 0x180 extra: all All cores.
+ 0x080 extra: this This Core.
 # 18-12
 name:agent type:bitmask default:0x00
- 0x00 this This agent
- 0x40 any Include any agents
+ 0x00 extra: this This agent
+ 0x40 extra: any Include any agents
 # 18-13
 name:prefetch type:bitmask default:0x60
- 0x60 all All inclusive
- 0x20 hw Hardware prefetch only
- 0x00 exclude_hw Exclude hardware prefetch
+ 0x60 extra: all All inclusive
+ 0x20 extra: hw Hardware prefetch only
+ 0x00 extra: exclude_hw Exclude hardware prefetch
 # 18-14
 name:mesi type:bitmask default:0x0f
- 0x08 modified Counts modified state
- 0x04 exclusive Counts exclusive state
- 0x02 shared Counts shared state
- 0x01 invalid Counts invalid state
+ 0x08 extra: modified Counts modified state
+ 0x04 extra: exclusive Counts exclusive state
+ 0x02 extra: shared Counts shared state
+ 0x01 extra: invalid Counts invalid state
diff --git a/events/i386/core_2/unit_masks b/events/i386/core_2/unit_masks
index d528f17..f1d64eb 100644
--- a/events/i386/core_2/unit_masks
+++ b/events/i386/core_2/unit_masks
@@ -33,7 +33,7 @@ name:sse_prefetch type:exclusive default:0x0
  0x00 prefetch NTA instructions executed.
  0x01 prefetch T1 instructions executed.
  0x02 prefetch T1 and T2 instructions executed.
- 0x03 SSE weakly-ordered stores
+ 0x03 extra: SSE weakly-ordered stores
 name:simd_instr_type_exec type:bitmask default:0x3f
  0x01 SIMD packed multiplies
  0x02 SIMD packed shifts
@@ -41,7 +41,7 @@ name:simd_instr_type_exec type:bitmask default:0x3f
  0x08 SIMD unpack operations
  0x10 SIMD packed logical
  0x20 SIMD packed arithmetic
- 0x3f all of the above
+ 0x3f extra: all of the above
 name:mmx_trans type:bitmask default:0x3
  0x01 float->MMX transitions
  0x02 MMX->float transitions
@@ -50,30 +50,30 @@ name:sse_miss type:exclusive default:0x0
  0x01 PREFETCHT0
  0x02 PREFETCHT1/PREFETCHT2
 name:load_block type:bitmask default:0x3e
- 0x02 STA Loads blocked by a preceding store with unknown address.
- 0x04 STD Loads blocked by a preceding store with unknown data.
- 0x08 OVERLAP_STORE Loads that partially overlap an earlier store, or 4K aliased with a previous store.
- 0x10 UNTIL_RETIRE Loads blocked until retirement.
- 0x20 L1D Loads blocked by the L1 data cache.
+ 0x02 extra: STA Loads blocked by a preceding store with unknown address.
+ 0x04 extra: STD Loads blocked by a preceding store with unknown data.
+ 0x08 extra: OVERLAP_STORE Loads that partially overlap an earlier store, or 4K aliased with a previous store.
+ 0x10 extra: UNTIL_RETIRE Loads blocked until retirement.
+ 0x20 extra: L1D Loads blocked by the L1 data cache.
 name:store_block type:bitmask default:0x0b
- 0x01 SB_DRAIN_CYCLES Cycles while stores are blocked due to store buffer drain.
- 0x02 ORDER Cycles while store is waiting for a preceding store to be globally observed.
- 0x08 NOOP A store is blocked due to a conflict with an external or internal snoop.
+ 0x01 extra: SB_DRAIN_CYCLES Cycles while stores are blocked due to store buffer drain.
+ 0x02 extra: ORDER Cycles while store is waiting for a preceding store to be globally observed.
+ 0x08 extra: NOOP A store is blocked due to a conflict with an external or internal snoop.
 name:dtlb_miss type:bitmask default:0x0f
- 0x01 ANY Memory accesses that missed the DTLB.
- 0x02 MISS_LD DTLB misses due to load operations.
- 0x04 L0_MISS_LD L0 DTLB misses due to load operations.
- 0x08 MISS_ST TLB misses due to store operations.
+ 0x01 extra: ANY Memory accesses that missed the DTLB.
+ 0x02 extra: MISS_LD DTLB misses due to load operations.
+ 0x04 extra: L0_MISS_LD L0 DTLB misses due to load operations.
+ 0x08 extra: MISS_ST TLB misses due to store operations.
 name:memory_dis type:exclusive default:0x01
- 0x01 RESET Memory disambiguation reset cycles.
- 0x02 SUCCESS Number of loads that were successfully disambiguated.
+ 0x01 extra: RESET Memory disambiguation reset cycles.
+ 0x02 extra: SUCCESS Number of loads that were successfully disambiguated.
 name:page_walks type:exclusive default:0x02
- 0x01 COUNT Number of page-walks executed.
- 0x02 CYCLES Duration of page-walks in core cycles.
+ 0x01 extra: COUNT Number of page-walks executed.
+ 0x02 extra: CYCLES Duration of page-walks in core cycles.
 name:delayed_bypass type:exclusive default:0x00
- 0x00 FP Delayed bypass to FP operation.
- 0x01 SIMD Delayed bypass to SIMD operation.
- 0x02 LOAD Delayed bypass to load operation.
+ 0x00 extra: FP Delayed bypass to FP operation.
+ 0x01 extra: SIMD Delayed bypass to SIMD operation.
+ 0x02 extra: LOAD Delayed bypass to load operation.
 name:core type:exclusive default:0x40
  0xc0 All cores
  0x40 This core
@@ -133,13 +133,13 @@ name:esp type:bitmask default:0x01
  0x01 ESP register content synchronizations
  0x02 ESP register automatic additions
 name:inst_retired type:bitmask default:0x00
- 0x00 Any
- 0x01 Loads
- 0x02 Stores
- 0x04 Other
+ 0x00 extra: Any
+ 0x01 extra: Loads
+ 0x02 extra: Stores
+ 0x04 extra: Other
 name:x87_ops_retired type:exclusive default:0xfe
- 0x01 FXCH instructions retired
- 0xfe Retired floating-point computational operations (precise)
+ 0x01 extra: FXCH instructions retired
+ 0xfe extra: Retired floating-point computational operations (precise)
 name:uops_retired type:bitmask default:0x0f
  0x01 Fused load+op or load+indirect branch retired
  0x02 Fused store address + data retired
@@ -183,10 +183,10 @@ name:rat_stalls type:bitmask default:0xf
  0x08 FPU status word
  0x0f All RAT
 name:seg_regs type:bitmask default:0x0f
- 0x01 ES
- 0x02 DS
- 0x04 FS
- 0x08 GS
+ 0x01 extra: ES
+ 0x02 extra: DS
+ 0x04 extra: FS
+ 0x08 extra: GS
 name:resource_stalls type:bitmask default:0x0f
  0x01 when the ROB is full
  0x02 during which the RS is full
diff --git a/events/i386/ivybridge/unit_masks b/events/i386/ivybridge/unit_masks
index ddb59a0..ffee1fa 100644
--- a/events/i386/ivybridge/unit_masks
+++ b/events/i386/ivybridge/unit_masks
@@ -5,163 +5,163 @@
 #
 include:i386/arch_perfmon
 name:ld_blocks type:mandatory default:0x2
- 0x2 store_forward loads blocked by overlapping with store buffer that cannot be forwarded
+ 0x2 extra: store_forward loads blocked by overlapping with store buffer that cannot be forwarded
 name:misalign_mem_ref type:bitmask default:0x1
- 0x1 loads Speculative cache line split load uops dispatched to L1 cache
- 0x2 stores Speculative cache line split STA uops dispatched to L1 cache
+ 0x1 extra: loads Speculative cache line split load uops dispatched to L1 cache
+ 0x2 extra: stores Speculative cache line split STA uops dispatched to L1 cache
 name:ld_blocks_partial type:mandatory default:0x1
- 0x1 address_alias False dependencies in MOB due to partial compare on address
+ 0x1 extra: address_alias False dependencies in MOB due to partial compare on address
 name:dtlb_load_misses type:exclusive default:0x81
- 0x81 demand_ld_miss_causes_a_walk Demand load Miss in all translation lookaside buffer (TLB) levels causes an page walk of any page size.
- 0x82 demand_ld_walk_completed Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes of any page size.
- 0x84 demand_ld_walk_duration Demand load cycles page miss handler (PMH) is busy with this walk.
+ 0x81 extra: demand_ld_miss_causes_a_walk Demand load Miss in all translation lookaside buffer (TLB) levels causes an page walk of any page size.
+ 0x82 extra: demand_ld_walk_completed Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes of any page size.
+ 0x84 extra: demand_ld_walk_duration Demand load cycles page miss handler (PMH) is busy with this walk.
 name:int_misc type:exclusive default:0x3
  0x3 extra:cmask=1 recovery_cycles Number of cycles waiting for the checkpoints in Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory disambiguation, etc...)
  0x3 extra:cmask=1,edge recovery_stalls_count Number of occurences waiting for the checkpoints in Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory disambiguation, etc...)
 name:uops_issued type:exclusive default:0x1
- 0x1 any Uops that Resource Allocation Table (RAT) issues to Reservation Station (RS)
+ 0x1 extra: any Uops that Resource Allocation Table (RAT) issues to Reservation Station (RS)
  0x1 extra:cmask=1,inv stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for the thread
  0x1 extra:cmask=1,inv,any core_stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for all threads
- 0x10 flags_merge Number of flags-merge uops being allocated.
- 0x20 slow_lea Number of slow LEA uops being allocated. A uop is generally considered SlowLea if it has 3 sources (e.g. 2 sources + immediate) regardless if as a result of LEA instruction or not.
- 0x40 single_mul Number of Multiply packed/scalar single precision uops allocated
+ 0x10 extra: flags_merge Number of flags-merge uops being allocated.
+ 0x20 extra: slow_lea Number of slow LEA uops being allocated. A uop is generally considered SlowLea if it has 3 sources (e.g. 2 sources + immediate) regardless if as a result of LEA instruction or not.
+ 0x40 extra: single_mul Number of Multiply packed/scalar single precision uops allocated
 name:arith type:bitmask default:0x1
- 0x1 fpu_div_active Cycles when divider is busy executing divide operations
+ 0x1 extra: fpu_div_active Cycles when divider is busy executing divide operations
  0x4 extra:cmask=1,edge fpu_div Divide operations executed
 name:l2_rqsts type:exclusive default:0x1
- 0x1 demand_data_rd_hit Demand Data Read requests that hit L2 cache
- 0x3 all_demand_data_rd Demand Data Read requests
- 0x4 rfo_hit RFO requests that hit L2 cache
- 0x8 rfo_miss RFO requests that miss L2 cache
- 0xc all_rfo RFO requests to L2 cache
- 0x10 code_rd_hit L2 cache hits when fetching instructions, code reads.
- 0x20 code_rd_miss L2 cache misses when fetching instructions
- 0x30 all_code_rd L2 code requests
- 0x40 pf_hit Requests from the L2 hardware prefetchers that hit L2 cache
- 0x80 pf_miss Requests from the L2 hardware prefetchers that miss L2 cache
- 0xc0 all_pf Requests from L2 hardware prefetchers
+ 0x1 extra: demand_data_rd_hit Demand Data Read requests that hit L2 cache
+ 0x3 extra: all_demand_data_rd Demand Data Read requests
+ 0x4 extra: rfo_hit RFO requests that hit L2 cache
+ 0x8 extra: rfo_miss RFO requests that miss L2 cache
+ 0xc extra: all_rfo RFO requests to L2 cache
+ 0x10 extra: code_rd_hit L2 cache hits when fetching instructions, code reads.
+ 0x20 extra: code_rd_miss L2 cache misses when fetching instructions
+ 0x30 extra: all_code_rd L2 code requests
+ 0x40 extra: pf_hit Requests from the L2 hardware prefetchers that hit L2 cache
+ 0x80 extra: pf_miss Requests from the L2 hardware prefetchers that miss L2 cache
+ 0xc0 extra: all_pf Requests from L2 hardware prefetchers
 name:l2_store_lock_rqsts type:exclusive default:0x1
- 0x1 miss RFOs that miss cache lines
- 0x8 hit_m RFOs that hit cache lines in M state
- 0xf all RFOs that access cache lines in any state
+ 0x1 extra: miss RFOs that miss cache lines
+ 0x8 extra: hit_m RFOs that hit cache lines in M state
+ 0xf extra: all RFOs that access cache lines in any state
 name:l2_l1d_wb_rqsts type:exclusive default:0x1
- 0x1 miss Count the number of modified Lines evicted from L1 and missed L2. (Non-rejected WBs from the DCU.)
- 0x4 hit_e Not rejected writebacks from L1D to L2 cache lines in E state
- 0x8 hit_m Not rejected writebacks from L1D to L2 cache lines in M state
- 0xf all Not rejected writebacks from L1D to L2 cache lines in any state.
+ 0x1 extra: miss Count the number of modified Lines evicted from L1 and missed L2. (Non-rejected WBs from the DCU.)
+ 0x4 extra: hit_e Not rejected writebacks from L1D to L2 cache lines in E state
+ 0x8 extra: hit_m Not rejected writebacks from L1D to L2 cache lines in M state
+ 0xf extra: all Not rejected writebacks from L1D to L2 cache lines in any state.
 name:l1d_pend_miss type:exclusive default:0x1
- 0x1 pending L1D miss oustandings duration in cycles
+ 0x1 extra: pending L1D miss oustandings duration in cycles
  0x1 extra:cmask=1 pending_cycles Cycles with L1D load Misses outstanding.
  0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding, using an edge detect to count transitions.
 name:dtlb_store_misses type:bitmask default:0x1
- 0x1 miss_causes_a_walk Store misses in all DTLB levels that cause page walks
- 0x2 walk_completed Store misses in all DTLB levels that cause completed page walks
- 0x4 walk_duration Cycles when PMH is busy with page walks
- 0x10 stlb_hit Store operations that miss the first TLB level but hit the second and do not cause page walks
+ 0x1 extra: miss_causes_a_walk Store misses in all DTLB levels that cause page walks
+ 0x2 extra: walk_completed Store misses in all DTLB levels that cause completed page walks
+ 0x4 extra: walk_duration Cycles when PMH is busy with page walks
+ 0x10 extra: stlb_hit Store operations that miss the first TLB level but hit the second and do not cause page walks
 name:load_hit_pre type:bitmask default:0x1
- 0x1 sw_pf Not software-prefetch load dispatches that hit forward buffer allocated for software prefetch
- 0x2 hw_pf Not software-prefetch load dispatches that hit forward buffer allocated for hardware prefetch
+ 0x1 extra: sw_pf Not software-prefetch load dispatches that hit forward buffer allocated for software prefetch
+ 0x2 extra: hw_pf Not software-prefetch load dispatches that hit forward buffer allocated for hardware prefetch
 name:l1d type:mandatory default:0x1
- 0x1 replacement L1D data line replacements
+ 0x1 extra: replacement L1D data line replacements
 name:move_elimination type:bitmask default:0x1
- 0x1 int_not_eliminated Number of integer Move Elimination candidate uops that were not eliminated.
- 0x2 simd_not_eliminated Number of SIMD Move Elimination candidate uops that were not eliminated.
- 0x4 int_eliminated Number of integer Move Elimination candidate uops that were eliminated.
- 0x8 simd_eliminated Number of SIMD Move Elimination candidate uops that were eliminated.
+ 0x1 extra: int_not_eliminated Number of integer Move Elimination candidate uops that were not eliminated.
+ 0x2 extra: simd_not_eliminated Number of SIMD Move Elimination candidate uops that were not eliminated.
+ 0x4 extra: int_eliminated Number of integer Move Elimination candidate uops that were eliminated.
+ 0x8 extra: simd_eliminated Number of SIMD Move Elimination candidate uops that were eliminated.
 name:cpl_cycles type:exclusive default:0x1
- 0x1 ring0 Unhalted core cycles when the thread is in ring 0
+ 0x1 extra: ring0 Unhalted core cycles when the thread is in ring 0
  0x1 extra:cmask=1,edge ring0_trans Number of intervals between processor halts while thread is in ring 0
- 0x2 ring123 Unhalted core cycles when thread is in rings 1, 2, or 3
+ 0x2 extra: ring123 Unhalted core cycles when thread is in rings 1, 2, or 3
 name:rs_events type:mandatory default:0x1
- 0x1 empty_cycles Cycles when Reservation Station (RS) is empty for the thread
+ 0x1 extra: empty_cycles Cycles when Reservation Station (RS) is empty for the thread
 name:tlb_access type:mandatory default:0x4
- 0x4 load_stlb_hit Load operations that miss the first DTLB level but hit the second and do not cause page walks
+ 0x4 extra: load_stlb_hit Load operations that miss the first DTLB level but hit the second and do not cause page walks
 name:offcore_requests_outstanding type:exclusive default:0x1
- 0x1 demand_data_rd Offcore outstanding Demand Data Read transactions in uncore queue.
+ 0x1 extra: demand_data_rd Offcore outstanding Demand Data Read transactions in uncore queue.
  0x1 extra:cmask=1 cycles_with_demand_data_rd Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore
- 0x2 demand_code_rd Offcore outstanding code reads transactions in SuperQueue (SQ), queue to uncore, every cycle
- 0x4 demand_rfo Offcore outstanding RFO store transactions in SuperQueue (SQ), queue to uncore
+ 0x2 extra: demand_code_rd Offcore outstanding code reads transactions in SuperQueue (SQ), queue to uncore, every cycle
+ 0x4 extra: demand_rfo Offcore outstanding RFO store transactions in SuperQueue (SQ), queue to uncore
  0x4 extra:cmask=1 cycles_with_demand_rfo Offcore outstanding demand rfo reads transactions in SuperQueue (SQ), queue to uncore, every cycle
- 0x8 all_data_rd Offcore outstanding cacheable Core Data Read transactions in SuperQueue (SQ), queue to uncore
+ 0x8 extra: all_data_rd Offcore outstanding cacheable Core Data Read transactions in SuperQueue (SQ), queue to uncore
  0x8 extra:cmask=1 cycles_with_data_rd Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore
 name:lock_cycles type:bitmask default:0x1
- 0x1 split_lock_uc_lock_duration Cycles when L1 and L2 are locked due to UC or split lock
- 0x2 cache_lock_duration Cycles when L1D is locked
+ 0x1 extra: split_lock_uc_lock_duration Cycles when L1 and L2 are locked due to UC or split lock
+ 0x2 extra: cache_lock_duration Cycles when L1D is locked
 name:idq type:exclusive default:0x2
- 0x2 empty Instruction Decode Queue (IDQ) empty cycles
- 0x4 mite_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path
+ 0x2 extra: empty Instruction Decode Queue (IDQ) empty cycles
+ 0x4 extra: mite_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path
  0x4 extra:cmask=1 mite_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path
- 0x8 dsb_uops Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path
+ 0x8 extra: dsb_uops Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path
  0x8 extra:cmask=1 dsb_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path
- 0x10 ms_dsb_uops Uops initiated by Decode Stream Buffer (DSB) that are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
+ 0x10 extra: ms_dsb_uops Uops initiated by Decode Stream Buffer (DSB) that are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
  0x10 extra:cmask=1 ms_dsb_cycles Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
  0x10 extra:cmask=1,edge ms_dsb_occur Deliveries to Instruction Decode Queue (IDQ) initiated by Decode Stream Buffer (DSB) while Microcode Sequenser (MS) is busy
  0x18 extra:cmask=1 all_dsb_cycles_any_uops Cycles Decode Stream Buffer (DSB) is delivering any Uop
  0x18 extra:cmask=4 all_dsb_cycles_4_uops Cycles Decode Stream Buffer (DSB) is delivering 4 Uops
- 0x20 ms_mite_uops Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
+ 0x20 extra: ms_mite_uops Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
  0x24 extra:cmask=1 all_mite_cycles_any_uops Cycles MITE is delivering any Uop
  0x24 extra:cmask=4 all_mite_cycles_4_uops Cycles MITE is delivering 4 Uops
- 0x30 ms_uops Uops delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
+ 0x30 extra: ms_uops Uops delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
  0x30 extra:cmask=1 ms_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
- 0x3c mite_all_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path
+ 0x3c extra: mite_all_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path
 name:icache type:mandatory default:0x2
- 0x2 misses Instruction cache, streaming buffer and victim cache misses
+ 0x2 extra: misses Instruction cache, streaming buffer and victim cache misses
 name:itlb_misses type:bitmask default:0x1
- 0x1 miss_causes_a_walk Misses at all ITLB levels that cause page walks
- 0x2 walk_completed Misses in all ITLB levels that cause completed page walks
- 0x4 walk_duration Cycles when PMH is busy with page walks
- 0x10 stlb_hit Operations that miss the first ITLB level but hit the second and do not cause any page walks
+ 0x1 extra: miss_causes_a_walk Misses at all ITLB levels that cause page walks
+ 0x2 extra: walk_completed Misses in all ITLB levels that cause completed page walks
+ 0x4 extra: walk_duration Cycles when PMH is busy with page walks
+ 0x10 extra: stlb_hit Operations that miss the first ITLB level but hit the second and do not cause any page walks
 name:ild_stall type:bitmask default:0x1
- 0x1 lcp Stalls caused by changing prefix length of the instruction.
- 0x4 iq_full Stall cycles because IQ is full
+ 0x1 extra: lcp Stalls caused by changing prefix length of the instruction.
+ 0x4 extra: iq_full Stall cycles because IQ is full
 name:br_inst_exec type:exclusive default:0x41
- 0x41 nontaken_conditional Not taken macro-conditional branches
- 0x81 taken_conditional Taken speculative and retired macro-conditional branches
- 0x82 taken_direct_jump Taken speculative and retired macro-conditional branch instructions excluding calls and indirects
- 0x84 taken_indirect_jump_non_call_ret Taken speculative and retired indirect branches excluding calls and returns
- 0x88 taken_indirect_near_return Taken speculative and retired indirect branches with return mnemonic
- 0x90 taken_direct_near_call Taken speculative and retired direct near calls
- 0xa0 taken_indirect_near_call Taken speculative and retired indirect calls
- 0xc1 all_conditional Speculative and retired macro-conditional branches
- 0xc2 all_direct_jmp Speculative and retired macro-unconditional branches excluding calls and indirects
- 0xc4 all_indirect_jump_non_call_ret Speculative and retired indirect branches excluding calls and returns
- 0xc8 all_indirect_near_return Speculative and retired indirect return branches.
- 0xd0 all_direct_near_call Speculative and retired direct near calls
- 0xff all_branches Speculative and retired branches
+ 0x41 extra: nontaken_conditional Not taken macro-conditional branches
+ 0x81 extra: taken_conditional Taken speculative and retired macro-conditional branches
+ 0x82 extra: taken_direct_jump Taken speculative and retired macro-conditional branch instructions excluding calls and indirects
+ 0x84 extra: taken_indirect_jump_non_call_ret Taken speculative and retired indirect branches excluding calls and returns
+ 0x88 extra: taken_indirect_near_return Taken speculative and retired indirect branches with return mnemonic
+ 0x90 extra: taken_direct_near_call Taken speculative and retired direct near calls
+ 0xa0 extra: taken_indirect_near_call Taken speculative and retired indirect calls
+ 0xc1 extra: all_conditional Speculative and retired macro-conditional branches
+ 0xc2 extra: all_direct_jmp Speculative and retired macro-unconditional branches excluding calls and indirects
+ 0xc4 extra: all_indirect_jump_non_call_ret Speculative and retired indirect branches excluding calls and returns
+ 0xc8 extra: all_indirect_near_return Speculative and retired indirect return branches.
+ 0xd0 extra: all_direct_near_call Speculative and retired direct near calls
+ 0xff extra: all_branches Speculative and retired branches
 name:br_misp_exec type:exclusive default:0x41
- 0x41 nontaken_conditional Not taken speculative and retired mispredicted macro conditional branches
- 0x81 taken_conditional Taken speculative and retired mispredicted macro conditional branches
- 0x84 taken_indirect_jump_non_call_ret Taken speculative and retired mispredicted indirect branches excluding calls and returns
- 0x88 taken_return_near Taken speculative and retired mispredicted indirect branches with return mnemonic
- 0xa0 taken_indirect_near_call Taken speculative and retired mispredicted indirect calls
- 0xc1 all_conditional Speculative and retired mispredicted macro conditional branches
- 0xc4 all_indirect_jump_non_call_ret Mispredicted indirect branches excluding calls and returns
- 0xff all_branches Speculative and retired mispredicted macro conditional branches
+ 0x41 extra: nontaken_conditional Not taken speculative and retired mispredicted macro conditional branches
+ 0x81 extra: taken_conditional Taken speculative and retired mispredicted macro conditional branches
+ 0x84 extra: taken_indirect_jump_non_call_ret Taken speculative and retired mispredicted indirect branches excluding calls and returns
+ 0x88 extra: taken_return_near Taken speculative and retired mispredicted indirect branches with return mnemonic
+ 0xa0 extra: taken_indirect_near_call Taken speculative and retired mispredicted indirect calls
+ 0xc1 extra: all_conditional Speculative and retired mispredicted macro conditional branches
+ 0xc4 extra: all_indirect_jump_non_call_ret Mispredicted indirect branches excluding calls and returns
+ 0xff extra: all_branches Speculative and retired mispredicted macro conditional branches
 name:idq_uops_not_delivered type:exclusive default:0x1
- 0x1 core Uops not delivered by the Frontend to the Backend of the machine, while there is no Backend stall
+ 0x1 extra: core Uops not delivered by the Frontend to the Backend of the machine, while there is no Backend stall
  0x1 extra:cmask=1 cycles_le_3_uop_deliv.core Cycles with 3 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall
  0x1 extra:cmask=1,inv cycles_fe_was_ok Cycles with 4 uops delivered by the Frontend to the Backend of the machine, or the Backend was stalling
  0x1 extra:cmask=2 cycles_le_2_uop_deliv.core Cycles with 2 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall
  0x1 extra:cmask=3 cycles_le_1_uop_deliv.core Cycles with 1 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall
  0x1 extra:cmask=4 cycles_0_uops_deliv.core Cycles with no uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall
 name:uops_dispatched_port type:exclusive default:0x1
- 0x1 port_0 Cycles per thread when uops are dispatched to port 0
+ 0x1 extra: port_0 Cycles per thread when uops are dispatched to port 0
  0x1 extra:any port_0_core Cycles per core when uops are dispatched to port 0
- 0x2 port_1 Cycles per thread when uops are dispatched to port 1
+ 0x2 extra: port_1 Cycles per thread when uops are dispatched to port 1
  0x2 extra:any port_1_core Cycles per core when uops are dispatched to port 1
- 0xc port_2 Cycles per thread when load or STA uops are dispatched to port 2
+ 0xc extra: port_2 Cycles per thread when load or STA uops are dispatched to port 2
  0xc extra:any port_2_core Cycles per core when load or STA uops are dispatched to port 2
- 0x30 port_3 Cycles per thread when load or STA uops are dispatched to port 3
+ 0x30 extra: port_3 Cycles per thread when load or STA uops are dispatched to port 3
  0x30 extra:any port_3_core Cycles per core when load or STA uops are dispatched to port 3
- 0x40 port_4 Cycles per thread when uops are dispatched to port 4
+ 0x40 extra: port_4 Cycles per thread when uops are dispatched to port 4
  0x40 extra:any port_4_core Cycles per core when uops are dispatched to port 4
- 0x80 port_5 Cycles per thread when uops are dispatched to port 5
+ 0x80 extra: port_5 Cycles per thread when uops are dispatched to port 5
  0x80 extra:any port_5_core Cycles per core when uops are dispatched to port 5
 name:resource_stalls type:bitmask default:0x1
- 0x1 any Resource-related stall cycles
- 0x4 rs Cycles stalled due to no eligible RS entry available.
- 0x8 sb Cycles stalled due to no store buffers available. (not including draining form sync).
- 0x10 rob Cycles stalled due to re-order buffer full.
+ 0x1 extra: any Resource-related stall cycles
+ 0x4 extra: rs Cycles stalled due to no eligible RS entry available.
+ 0x8 extra: sb Cycles stalled due to no store buffers available. (not including draining form sync).
+ 0x10 extra: rob Cycles stalled due to re-order buffer full.
 name:cycle_activity type:exclusive default:0x1
  0x1 extra:cmask=1 cycles_l2_pending Cycles with pending L2 cache miss loads.
  0x2 extra:cmask=2 cycles_ldm_pending Cycles with pending memory loads.
@@ -171,99 +171,99 @@ name:cycle_activity type:exclusive default:0x1
  0x8 extra:cmask=8 cycles_l1d_pending Cycles with pending L1 cache miss loads.
  0xc extra:cmask=c stalls_l1d_pending Execution stalls due to L1 data cache misses
 name:dsb2mite_switches type:mandatory default:0x1
- 0x1 count Decode Stream Buffer (DSB)-to-MITE switches
+ 0x1 extra: count Decode Stream Buffer (DSB)-to-MITE switches
 name:dsb_fill type:mandatory default:0x8
- 0x8 exceed_dsb_lines Cycles when Decode Stream Buffer (DSB) fill encounter more than 3 Decode Stream Buffer (DSB) lines
+ 0x8 extra: exceed_dsb_lines Cycles when Decode Stream Buffer (DSB) fill encounter more than 3 Decode Stream Buffer (DSB) lines
 name:itlb type:mandatory default:0x1
- 0x1 itlb_flush Flushing of the Instruction TLB (ITLB) pages, includes 4k/2M/4M pages.
+ 0x1 extra: itlb_flush Flushing of the Instruction TLB (ITLB) pages, includes 4k/2M/4M pages.
 name:offcore_requests type:bitmask default:0x1
- 0x1 demand_data_rd Demand Data Read requests sent to uncore
- 0x2 demand_code_rd Cacheable and noncachaeble code read requests
- 0x4 demand_rfo Demand RFO requests including regular RFOs, locks, ItoM
- 0x8 all_data_rd Demand and prefetch data reads
+ 0x1 extra: demand_data_rd Demand Data Read requests sent to uncore
+ 0x2 extra: demand_code_rd Cacheable and noncachaeble code read requests
+ 0x4 extra: demand_rfo Demand RFO requests including regular RFOs, locks, ItoM
+ 0x8 extra: all_data_rd Demand and prefetch data reads
 name:uops_executed type:exclusive default:0x1
- 0x1 thread Counts the number of uops to be executed per-thread each cycle.
+ 0x1 extra: thread Counts the number of uops to be executed per-thread each cycle.
  0x1 extra:cmask=1 cycles_ge_1_uop_exec Cycles where at least 1 uop was executed per-thread
  0x1 extra:cmask=1,inv stall_cycles Counts number of cycles no uops were dispatched to be executed on this thread.
  0x1 extra:cmask=2 cycles_ge_2_uops_exec Cycles where at least 2 uops were executed per-thread
  0x1 extra:cmask=3 cycles_ge_3_uops_exec Cycles where at least 3 uops were executed per-thread
  0x1 extra:cmask=4 cycles_ge_4_uops_exec Cycles where at least 4 uops were executed per-thread
- 0x2 core Number of uops executed on the core.
+ 0x2 extra: core Number of uops executed on the core.
 name:tlb_flush type:bitmask default:0x1
- 0x1 dtlb_thread DTLB flush attempts of the thread-specific entries
- 0x20 stlb_any STLB flush attempts
+ 0x1 extra: dtlb_thread DTLB flush attempts of the thread-specific entries
+ 0x20 extra: stlb_any STLB flush attempts
 name:other_assists type:bitmask default:0x8
- 0x8 avx_store Number of AVX memory assist for stores. AVX microcode assist is being invoked whenever the hardware is unable to properly handle AVX-256b operations.
- 0x10 avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable. - 0x20 sse_to_avx Number of transitions from SSE to AVX-256 when penalty applicable. + 0x8 extra: avx_store Number of AVX memory assist for stores. AVX microcode assist is being invoked whenever the hardware is unable to properly handle AVX-256b operations. + 0x10 extra: avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable. + 0x20 extra: sse_to_avx Number of transitions from SSE to AVX-256 when penalty applicable. name:uops_retired type:exclusive default:0x1 - 0x1 all Actually retired uops. + 0x1 extra: all Actually retired uops. 0x1 extra:cmask=1,inv stall_cycles Cycles without actually retired uops. 0x1 extra:cmask=1,inv,any core_stall_cycles Cycles without actually retired uops. 0x1 extra:cmask=10,inv total_cycles Cycles with less than 10 actually retired uops. - 0x2 retire_slots Retirement slots used. + 0x2 extra: retire_slots Retirement slots used. name:machine_clears type:bitmask default:0x2 - 0x2 memory_ordering Counts the number of machine clears due to memory order conflicts. - 0x4 smc Self-modifying code (SMC) detected. - 0x20 maskmov This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0. + 0x2 extra: memory_ordering Counts the number of machine clears due to memory order conflicts. + 0x4 extra: smc Self-modifying code (SMC) detected. + 0x20 extra: maskmov This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0. name:br_inst_retired type:exclusive default:0x1 - 0x1 conditional Conditional branch instructions retired. - 0x2 near_call_r3 Direct and indirect macro near call instructions retired (captured in ring 3). - 0x2 near_call Direct and indirect near call instructions retired. - 0x8 near_return Return instructions retired. 
- 0x10 not_taken Not taken branch instructions retired. - 0x20 near_taken Taken branch instructions retired. - 0x40 far_branch Far branch instructions retired. + 0x1 extra: conditional Conditional branch instructions retired. + 0x2 extra: near_call_r3 Direct and indirect macro near call instructions retired (captured in ring 3). + 0x2 extra: near_call Direct and indirect near call instructions retired. + 0x8 extra: near_return Return instructions retired. + 0x10 extra: not_taken Not taken branch instructions retired. + 0x20 extra: near_taken Taken branch instructions retired. + 0x40 extra: far_branch Far branch instructions retired. name:br_misp_retired type:bitmask default:0x1 - 0x1 conditional Mispredicted conditional branch instructions retired. - 0x20 near_taken number of near branch instructions retired that were mispredicted and taken. + 0x1 extra: conditional Mispredicted conditional branch instructions retired. + 0x20 extra: near_taken number of near branch instructions retired that were mispredicted and taken. name:fp_assist type:exclusive default:0x1e - 0x2 x87_output Number of X87 assists due to output value. - 0x4 x87_input Number of X87 assists due to input value. - 0x8 simd_output Number of SIMD FP assists due to Output values - 0x10 simd_input Number of SIMD FP assists due to input values + 0x2 extra: x87_output Number of X87 assists due to output value. + 0x4 extra: x87_input Number of X87 assists due to input value. + 0x8 extra: simd_output Number of SIMD FP assists due to Output values + 0x10 extra: simd_input Number of SIMD FP assists due to input values 0x1e extra:cmask=1 any Cycles with any input/output SSE or FP assist name:rob_misc_events type:mandatory default:0x20 - 0x20 lbr_inserts Count cases of saving new LBR + 0x20 extra: lbr_inserts Count cases of saving new LBR name:mem_uops_retired type:exclusive default:0x81 - 0x11 stlb_miss_loads Load uops with true STLB miss retired to architected path. 
- 0x12 stlb_miss_stores Store uops with true STLB miss retired to architected path. - 0x21 lock_loads Load uops with locked access retired to architected path. - 0x41 split_loads Line-splitted load uops retired to architected path. - 0x42 split_stores Line-splitted store uops retired to architected path. - 0x81 all_loads Load uops retired to architected path with filter on bits 0 and 1 applied. - 0x82 all_stores Store uops retired to architected path with filter on bits 0 and 1 applied. + 0x11 extra: stlb_miss_loads Load uops with true STLB miss retired to architected path. + 0x12 extra: stlb_miss_stores Store uops with true STLB miss retired to architected path. + 0x21 extra: lock_loads Load uops with locked access retired to architected path. + 0x41 extra: split_loads Line-splitted load uops retired to architected path. + 0x42 extra: split_stores Line-splitted store uops retired to architected path. + 0x81 extra: all_loads Load uops retired to architected path with filter on bits 0 and 1 applied. + 0x82 extra: all_stores Store uops retired to architected path with filter on bits 0 and 1 applied. name:mem_load_uops_retired type:bitmask default:0x1 - 0x1 l1_hit Retired load uops with L1 cache hits as data sources. - 0x2 l2_hit Retired load uops with L2 cache hits as data sources. - 0x4 llc_hit Retired load uops which data sources were data hits in LLC without snoops required. - 0x40 hit_lfb Retired load uops which data sources were load uops missed L1 but hit forward buffer due to preceding miss to the same cache line with data not ready. + 0x1 extra: l1_hit Retired load uops with L1 cache hits as data sources. + 0x2 extra: l2_hit Retired load uops with L2 cache hits as data sources. + 0x4 extra: llc_hit Retired load uops which data sources were data hits in LLC without snoops required. + 0x40 extra: hit_lfb Retired load uops which data sources were load uops missed L1 but hit forward buffer due to preceding miss to the same cache line with data not ready. 
name:mem_load_uops_llc_hit_retired type:bitmask default:0x1 - 0x1 xsnp_miss Retired load uops which data sources were LLC hit and cross-core snoop missed in on-pkg core cache. - 0x2 xsnp_hit Retired load uops which data sources were LLC and cross-core snoop hits in on-pkg core cache. - 0x4 xsnp_hitm Retired load uops which data sources were HitM responses from shared LLC. - 0x8 xsnp_none Retired load uops which data sources were hits in LLC without snoops required. + 0x1 extra: xsnp_miss Retired load uops which data sources were LLC hit and cross-core snoop missed in on-pkg core cache. + 0x2 extra: xsnp_hit Retired load uops which data sources were LLC and cross-core snoop hits in on-pkg core cache. + 0x4 extra: xsnp_hitm Retired load uops which data sources were HitM responses from shared LLC. + 0x8 extra: xsnp_none Retired load uops which data sources were hits in LLC without snoops required. name:mem_load_uops_llc_miss_retired type:mandatory default:0x1 - 0x1 local_dram Data from local DRAM either Snoop not needed or Snoop Miss (RspI) + 0x1 extra: local_dram Data from local DRAM either Snoop not needed or Snoop Miss (RspI) name:baclears type:mandatory default:0x1f - 0x1f any Counts the total number when the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end. + 0x1f extra: any Counts the total number when the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end. 
name:l2_trans type:bitmask default:0x80 - 0x1 demand_data_rd Demand Data Read requests that access L2 cache - 0x2 rfo RFO requests that access L2 cache - 0x4 code_rd L2 cache accesses when fetching instructions - 0x8 all_pf L2 or LLC HW prefetches that access L2 cache - 0x10 l1d_wb L1D writebacks that access L2 cache - 0x20 l2_fill L2 fill requests that access L2 cache - 0x40 l2_wb L2 writebacks that access L2 cache - 0x80 all_requests Transactions accessing L2 pipe + 0x1 extra: demand_data_rd Demand Data Read requests that access L2 cache + 0x2 extra: rfo RFO requests that access L2 cache + 0x4 extra: code_rd L2 cache accesses when fetching instructions + 0x8 extra: all_pf L2 or LLC HW prefetches that access L2 cache + 0x10 extra: l1d_wb L1D writebacks that access L2 cache + 0x20 extra: l2_fill L2 fill requests that access L2 cache + 0x40 extra: l2_wb L2 writebacks that access L2 cache + 0x80 extra: all_requests Transactions accessing L2 pipe name:l2_lines_in type:exclusive default:0x7 - 0x1 i L2 cache lines in I state filling L2 - 0x2 s L2 cache lines in S state filling L2 - 0x4 e L2 cache lines in E state filling L2 - 0x7 all L2 cache lines filling L2 + 0x1 extra: i L2 cache lines in I state filling L2 + 0x2 extra: s L2 cache lines in S state filling L2 + 0x4 extra: e L2 cache lines in E state filling L2 + 0x7 extra: all L2 cache lines filling L2 name:l2_lines_out type:exclusive default:0x1 - 0x1 demand_clean Clean L2 cache lines evicted by demand - 0x2 demand_dirty Dirty L2 cache lines evicted by demand - 0x4 pf_clean Clean L2 cache lines evicted by L2 prefetch - 0x8 pf_dirty Dirty L2 cache lines evicted by L2 prefetch - 0xa dirty_all Dirty L2 cache lines filling the L2 + 0x1 extra: demand_clean Clean L2 cache lines evicted by demand + 0x2 extra: demand_dirty Dirty L2 cache lines evicted by demand + 0x4 extra: pf_clean Clean L2 cache lines evicted by L2 prefetch + 0x8 extra: pf_dirty Dirty L2 cache lines evicted by L2 prefetch + 0xa extra: dirty_all Dirty L2 
cache lines filling the L2 diff --git a/events/i386/nehalem/unit_masks b/events/i386/nehalem/unit_masks index d800e5d..8f60292 100644 --- a/events/i386/nehalem/unit_masks +++ b/events/i386/nehalem/unit_masks @@ -4,369 +4,369 @@ # include:i386/arch_perfmon name:sb_forward type:mandatory default:0x01 - 0x01 any Counts the number of store forwards + 0x01 extra: any Counts the number of store forwards name:load_block type:bitmask default:0x01 - 0x01 std Counts the number of loads blocked by a preceding store with unknown data - 0x04 address_offset Counts the number of loads blocked by a preceding store address + 0x01 extra: std Counts the number of loads blocked by a preceding store with unknown data + 0x04 extra: address_offset Counts the number of loads blocked by a preceding store address name:sb_drain type:mandatory default:0x01 - 0x01 cycles Counts the cycles of store buffer drains + 0x01 extra: cycles Counts the cycles of store buffer drains name:misalign_mem_ref type:bitmask default:0x03 - 0x01 load Counts the number of misaligned load references - 0x02 store Counts the number of misaligned store references - 0x03 any Counts the number of misaligned memory references + 0x01 extra: load Counts the number of misaligned load references + 0x02 extra: store Counts the number of misaligned store references + 0x03 extra: any Counts the number of misaligned memory references name:store_blocks type:bitmask default:0x0f - 0x01 not_sta This event counts the number of load operations delayed caused by preceding stores whose addresses are known but whose data is unknown, and preceding stores that conflict with the load but which incompletely overlap the load - 0x02 sta This event counts load operations delayed caused by preceding stores whose addresses are unknown (STA block) - 0x04 at_ret Counts number of loads delayed with at-Retirement block code - 0x08 l1d_block Cacheable loads delayed with L1D block code + 0x01 extra: not_sta This event counts the number of load 
operations delayed caused by preceding stores whose addresses are known but whose data is unknown, and preceding stores that conflict with the load but which incompletely overlap the load + 0x02 extra: sta This event counts load operations delayed caused by preceding stores whose addresses are unknown (STA block) + 0x04 extra: at_ret Counts number of loads delayed with at-Retirement block code + 0x08 extra: l1d_block Cacheable loads delayed with L1D block code 0x0F any All loads delayed due to store blocks name:dtlb_load_misses type:bitmask default:0x01 - 0x01 any Counts all load misses that cause a page walk - 0x02 walk_completed Counts number of completed page walks due to load miss in the STLB - 0x10 stlb_hit Number of cache load STLB hits - 0x20 pde_miss Number of DTLB cache load misses where the low part of the linear to physical address translation was missed - 0x40 pdp_miss Number of DTLB cache load misses where the high part of the linear to physical address translation was missed - 0x80 large_walk_completed Counts number of completed large page walks due to load miss in the STLB + 0x01 extra: any Counts all load misses that cause a page walk + 0x02 extra: walk_completed Counts number of completed page walks due to load miss in the STLB + 0x10 extra: stlb_hit Number of cache load STLB hits + 0x20 extra: pde_miss Number of DTLB cache load misses where the low part of the linear to physical address translation was missed + 0x40 extra: pdp_miss Number of DTLB cache load misses where the high part of the linear to physical address translation was missed + 0x80 extra: large_walk_completed Counts number of completed large page walks due to load miss in the STLB name:memory_disambiguration type:bitmask default:0x01 - 0x01 reset Counts memory disambiguration reset cycles - 0x02 success Counts the number of loads that memory disambiguration succeeded - 0x04 watchdog Counts the number of times the memory disambiguration watchdog kicked in - 0x08 watch_cycles Counts 
the cycles that the memory disambiguration watchdog is active + 0x01 extra: reset Counts memory disambiguration reset cycles + 0x02 extra: success Counts the number of loads that memory disambiguration succeeded + 0x04 extra: watchdog Counts the number of times the memory disambiguration watchdog kicked in + 0x08 extra: watch_cycles Counts the cycles that the memory disambiguration watchdog is active name:mem_inst_retired type:bitmask default:0x01 - 0x01 loads Counts the number of instructions with an architecturally-visible store retired on the architected path - 0x02 stores Counts the number of instructions with an architecturally-visible store retired on the architected path + 0x01 extra: loads Counts the number of instructions with an architecturally-visible store retired on the architected path + 0x02 extra: stores Counts the number of instructions with an architecturally-visible store retired on the architected path name:mem_store_retired type:mandatory default:0x01 - 0x01 dtlb_miss The event counts the number of retired stores that missed the DTLB + 0x01 extra: dtlb_miss The event counts the number of retired stores that missed the DTLB name:uops_issued type:bitmask default:0x01 - 0x01 any Counts the number of Uops issued by the Register Allocation Table to the Reservation Station, i - 0x01 stalled_cycles Counts the number of cycles no Uops issued by the Register Allocation Table to the Reservation Station, i - 0x02 fused Counts the number of fused Uops that were issued from the Register Allocation Table to the Reservation Station + 0x01 extra: any Counts the number of Uops issued by the Register Allocation Table to the Reservation Station, i + 0x01 extra: stalled_cycles Counts the number of cycles no Uops issued by the Register Allocation Table to the Reservation Station, i + 0x02 extra: fused Counts the number of fused Uops that were issued from the Register Allocation Table to the Reservation Station name:mem_uncore_retired type:bitmask default:0x02 - 
0x02 other_core_l2_hitm Counts number of memory load instructions retired where the memory reference hit modified data in a sibling core residing on the same socket - 0x08 remote_cache_local_home_hit Counts number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and HIT in a remote socket's cache - 0x10 remote_dram Counts number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and was remotely homed - 0x20 local_dram Counts number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and required a local socket memory reference + 0x02 extra: other_core_l2_hitm Counts number of memory load instructions retired where the memory reference hit modified data in a sibling core residing on the same socket + 0x08 extra: remote_cache_local_home_hit Counts number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and HIT in a remote socket's cache + 0x10 extra: remote_dram Counts number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and was remotely homed + 0x20 extra: local_dram Counts number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and required a local socket memory reference name:fp_comp_ops_exe type:bitmask default:0x01 - 0x01 x87 Counts the number of FP Computational Uops Executed - 0x02 mmx Counts number of MMX Uops executed - 0x04 sse_fp Counts number of SSE and SSE2 FP uops executed - 0x08 sse2_integer Counts number of SSE2 integer uops executed - 0x10 sse_fp_packed Counts number of SSE FP packed uops executed - 0x20 sse_fp_scalar Counts number of SSE FP scalar uops executed - 0x40 sse_single_precision Counts number of SSE* FP single precision uops executed - 0x80 sse_double_precision Counts number of SSE* FP double precision uops executed + 0x01 extra: x87 Counts the number of FP 
Computational Uops Executed + 0x02 extra: mmx Counts number of MMX Uops executed + 0x04 extra: sse_fp Counts number of SSE and SSE2 FP uops executed + 0x08 extra: sse2_integer Counts number of SSE2 integer uops executed + 0x10 extra: sse_fp_packed Counts number of SSE FP packed uops executed + 0x20 extra: sse_fp_scalar Counts number of SSE FP scalar uops executed + 0x40 extra: sse_single_precision Counts number of SSE* FP single precision uops executed + 0x80 extra: sse_double_precision Counts number of SSE* FP double precision uops executed name:simd_int_128 type:bitmask default:0x01 - 0x01 packed_mpy Counts number of 128 bit SIMD integer multiply operations - 0x02 packed_shift Counts number of 128 bit SIMD integer shift operations - 0x04 pack Counts number of 128 bit SIMD integer pack operations - 0x08 unpack Counts number of 128 bit SIMD integer unpack operations - 0x10 packed_logical Counts number of 128 bit SIMD integer logical operations - 0x20 packed_arith Counts number of 128 bit SIMD integer arithmetic operations - 0x40 shuffle_move Cou... [truncated message content] |
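These patches aim to make every symbolic unit-mask name unique within its event, so that a mask can be selected by name without ambiguity. That property can be checked mechanically instead of by reading each diff. Below is a minimal Python sketch; it is not part of oprofile. It assumes the `unit_masks` layout visible in these diffs: a `name:<event> ...` line followed by indented `<value> extra:[flags] <name> <description>` lines. It reports any symbolic name that appears more than once under the same event. The function `duplicate_unit_mask_names` and the script itself are hypothetical.

```python
import sys
from collections import Counter
from typing import Dict, Iterable, List


def duplicate_unit_mask_names(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each event to the symbolic unit-mask names it defines more than once.

    Assumed layout (as seen in the oprofile unit_masks diffs):
        name:<event> type:<type> default:<default>
            <value> extra:[flags] <symbolic_name> <description>
    Mask lines without an "extra:" field (old format) carry no symbolic
    name and are skipped.
    """
    per_event: Dict[str, Counter] = {}
    event = None
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments and include directives.
        if not line or line.startswith(("#", "include:")):
            continue
        if line.startswith("name:"):
            # "name:foo type:... default:..." -> "foo"
            event = line.split()[0][len("name:"):]
            per_event.setdefault(event, Counter())
            continue
        tokens = line.split()
        # Both "extra:" and "extra:cmask=1,inv" put the symbolic name third.
        if event is not None and len(tokens) >= 3 and tokens[1].startswith("extra:"):
            per_event[event][tokens[2]] += 1
    return {
        ev: sorted(name for name, n in counts.items() if n > 1)
        for ev, counts in per_event.items()
        if any(n > 1 for n in counts.values())
    }


if __name__ == "__main__":
    # Usage: python check_unit_masks.py events/i386/*/unit_masks
    for path in sys.argv[1:]:
        with open(path) as f:
            for ev, names in sorted(duplicate_unit_mask_names(f).items()):
                print(f"{path}: {ev}: duplicate {', '.join(names)}")
```

Names are only compared within one event, since different events (for example `mul` and `div` on Atom) may legitimately reuse `s` and `ar`. Entries that still lack an `extra:` field, such as several Core 2 masks, have no symbolic name and are ignored rather than flagged.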
From: Maynard J. <may...@us...> - 2013-06-19 22:23:41
|
On 06/18/2013 06:07 PM, Andi Kleen wrote: > From: Andi Kleen <ak...@li...> > > Add empty extra: lines to every Intel event with a unique first word Hi, Andi, I found some cases where the first word is not unique. See below -- search for "Duplicate". -Maynard > in the description. This can then be used to specify the unit mask > symbolically. > > Haswell already had the empty extra masks. > Core 2 did not have unique words for everything. I only add it there > to the events which had. > > v2: Do changes for Atom too > Signed-off-by: Andi Kleen <ak...@li...> > --- > events/i386/atom/unit_masks | 154 +++++------ > events/i386/core_2/unit_masks | 62 ++--- > events/i386/ivybridge/unit_masks | 330 +++++++++++----------- > events/i386/nehalem/unit_masks | 552 ++++++++++++++++++------------------- > events/i386/sandybridge/unit_masks | 374 ++++++++++++------------- > events/i386/westmere/unit_masks | 480 ++++++++++++++++---------------- > 6 files changed, 976 insertions(+), 976 deletions(-) > > diff --git a/events/i386/atom/unit_masks b/events/i386/atom/unit_masks > index acaec23..4802ddb 100644 > --- a/events/i386/atom/unit_masks > +++ b/events/i386/atom/unit_masks > @@ -3,118 +3,118 @@ > # > include:i386/arch_perfmon > name:store_forwards type:mandatory default:0x81 > - 0x81 good Good store forwards > + 0x81 extra: good Good store forwards > name:segment_reg_loads type:mandatory default:0x00 > - 0x00 any Number of segment register loads > + 0x00 extra: any Number of segment register loads > name:simd_prefetch type:bitmask default:0x01 > - 0x01 prefetcht0 Streaming SIMD Extensions (SSE) PrefetchT0 instructions executed > - 0x06 sw_l2 Streaming SIMD Extensions (SSE) PrefetchT1 and PrefetchT2 instructions executed > - 0x08 prefetchnta Streaming SIMD Extensions (SSE) Prefetch NTA instructions executed > + 0x01 extra: prefetcht0 Streaming SIMD Extensions (SSE) PrefetchT0 instructions executed > + 0x06 extra: sw_l2 Streaming SIMD Extensions (SSE) PrefetchT1 and PrefetchT2 
instructions executed > + 0x08 extra: prefetchnta Streaming SIMD Extensions (SSE) Prefetch NTA instructions executed > name:data_tlb_misses type:bitmask default:0x07 > - 0x07 dtlb_miss Memory accesses that missed the DTLB > - 0x05 dtlb_miss_ld DTLB misses due to load operations > - 0x09 l0_dtlb_miss_ld L0_DTLB misses due to load operations > - 0x06 dtlb_miss_st DTLB misses due to store operations > + 0x07 extra: dtlb_miss Memory accesses that missed the DTLB > + 0x05 extra: dtlb_miss_ld DTLB misses due to load operations > + 0x09 extra: l0_dtlb_miss_ld L0_DTLB misses due to load operations > + 0x06 extra: dtlb_miss_st DTLB misses due to store operations > name:page_walks type:bitmask default:0x03 > - 0x03 walks Number of page-walks executed > - 0x03 cycles Duration of page-walks in core cycles > + 0x03 extra: walks Number of page-walks executed > + 0x03 extra: cycles Duration of page-walks in core cycles > name:x87_comp_ops_exe type:bitmask default:0x81 > - 0x01 s Floating point computational micro-ops executed > - 0x81 ar Floating point computational micro-ops retired > + 0x01 extra: s Floating point computational micro-ops executed > + 0x81 extra: ar Floating point computational micro-ops retired > name:fp_assist type:mandatory default:0x81 > - 0x81 ar Floating point assists > + 0x81 extra: ar Floating point assists > name:mul type:bitmask default:0x01 > - 0x01 s Multiply operations executed > - 0x81 ar Multiply operations retired > + 0x01 extra: s Multiply operations executed > + 0x81 extra: ar Multiply operations retired > name:div type:bitmask default:0x01 > - 0x01 s Divide operations executed > - 0x81 ar Divide operations retired > + 0x01 extra: s Divide operations executed > + 0x81 extra: ar Divide operations retired > name:l2_rqsts type:bitmask default:0x41 > - 0x41 i_state L2 cache demand requests from this core that missed the L2 > + 0x41 extra: i_state L2 cache demand requests from this core that missed the L2 > 0x4F mesi L2 cache demand requests from 
this core > name:cpu_clk_unhalted type:bitmask default:0x00 > - 0x00 core_p Core cycles when core is not halted > - 0x01 bus Bus cycles when core is not halted > - 0x02 no_other Bus cycles when core is active and the other is halted > + 0x00 extra: core_p Core cycles when core is not halted > + 0x01 extra: bus Bus cycles when core is not halted > + 0x02 extra: no_other Bus cycles when core is active and the other is halted > name:l1d_cache type:bitmask default:0x21 > - 0x21 ld L1 Cacheable Data Reads > - 0x22 st L1 Cacheable Data Writes > + 0x21 extra: ld L1 Cacheable Data Reads > + 0x22 extra: st L1 Cacheable Data Writes > name:icache type:bitmask default:0x03 > - 0x03 accesses Instruction fetches > - 0x02 misses Icache miss > + 0x03 extra: accesses Instruction fetches > + 0x02 extra: misses Icache miss > name:itlb type:bitmask default:0x04 > - 0x04 flush ITLB flushes > - 0x02 misses ITLB misses > + 0x04 extra: flush ITLB flushes > + 0x02 extra: misses ITLB misses > name:macro_insts type:exclusive default:0x03 > - 0x02 cisc_decoded CISC macro instructions decoded > - 0x03 all_decoded All Instructions decoded > + 0x02 extra: cisc_decoded CISC macro instructions decoded > + 0x03 extra: all_decoded All Instructions decoded > name:simd_uops_exec type:exclusive default:0x80 > - 0x00 s SIMD micro-ops executed (excluding stores) > - 0x80 ar SIMD micro-ops retired (excluding stores) > + 0x00 extra: s SIMD micro-ops executed (excluding stores) > + 0x80 extra: ar SIMD micro-ops retired (excluding stores) > name:simd_sat_uop_exec type:bitmask default:0x00 > - 0x00 s SIMD saturated arithmetic micro-ops executed > - 0x80 ar SIMD saturated arithmetic micro-ops retired > + 0x00 extra: s SIMD saturated arithmetic micro-ops executed > + 0x80 extra: ar SIMD saturated arithmetic micro-ops retired > name:simd_uop_type_exec type:bitmask default:0x01 > - 0x01 s SIMD packed multiply microops executed > - 0x81 ar SIMD packed multiply microops retired > - 0x02 s SIMD packed shift 
micro-ops executed > - 0x82 ar SIMD packed shift micro-ops retired > - 0x04 s SIMD pack micro-ops executed > - 0x84 ar SIMD pack micro-ops retired > - 0x08 s SIMD unpack micro-ops executed > - 0x88 ar SIMD unpack micro-ops retired > - 0x10 s SIMD packed logical microops executed > - 0x90 ar SIMD packed logical microops retired > - 0x20 s SIMD packed arithmetic micro-ops executed > + 0x01 extra: s SIMD packed multiply microops executed > + 0x81 extra: ar SIMD packed multiply microops retired > + 0x02 extra: s SIMD packed shift micro-ops executed > + 0x82 extra: ar SIMD packed shift micro-ops retired > + 0x04 extra: s SIMD pack micro-ops executed > + 0x84 extra: ar SIMD pack micro-ops retired > + 0x08 extra: s SIMD unpack micro-ops executed > + 0x88 extra: ar SIMD unpack micro-ops retired > + 0x10 extra: s SIMD packed logical microops executed > + 0x90 extra: ar SIMD packed logical microops retired > + 0x20 extra: s SIMD packed arithmetic micro-ops executed Duplicate "s" and "ar" names. > 0xA0 ar SIMD packed arithmetic micro-ops retired > name:uops_retired type:mandatory default:0x10 > - 0x10 any Micro-ops retired > + 0x10 extra: any Micro-ops retired > name:br_inst_retired type:bitmask default:0x00 > - 0x00 any Retired branch instructions > - 0x01 pred_not_taken Retired branch instructions that were predicted not-taken > - 0x02 mispred_not_taken Retired branch instructions that were mispredicted not-taken > - 0x04 pred_taken Retired branch instructions that were predicted taken > - 0x08 mispred_taken Retired branch instructions that were mispredicted taken > + 0x00 extra: any Retired branch instructions > + 0x01 extra: pred_not_taken Retired branch instructions that were predicted not-taken > + 0x02 extra: mispred_not_taken Retired branch instructions that were mispredicted not-taken > + 0x04 extra: pred_taken Retired branch instructions that were predicted taken > + 0x08 extra: mispred_taken Retired branch instructions that were mispredicted taken > 0x0A mispred 
Retired mispredicted branch instructions (precise event) > 0x0C taken Retired taken branch instructions > 0x0F any1 Retired branch instructions > name:cycles_int_masked type:bitmask default:0x01 > - 0x01 cycles_int_masked Cycles during which interrupts are disabled > - 0x02 cycles_int_pending_and_masked Cycles during which interrupts are pending and disabled > + 0x01 extra: cycles_int_masked Cycles during which interrupts are disabled > + 0x02 extra: cycles_int_pending_and_masked Cycles during which interrupts are pending and disabled > name:simd_inst_retired type:bitmask default:0x01 > - 0x01 packed_single Retired Streaming SIMD Extensions (SSE) packed-single instructions > - 0x02 scalar_single Retired Streaming SIMD Extensions (SSE) scalar-single instructions > - 0x04 packed_double Retired Streaming SIMD Extensions 2 (SSE2) packed-double instructions > - 0x08 scalar_double Retired Streaming SIMD Extensions 2 (SSE2) scalar-double instructions > - 0x10 vector Retired Streaming SIMD Extensions 2 (SSE2) vector instructions > + 0x01 extra: packed_single Retired Streaming SIMD Extensions (SSE) packed-single instructions > + 0x02 extra: scalar_single Retired Streaming SIMD Extensions (SSE) scalar-single instructions > + 0x04 extra: packed_double Retired Streaming SIMD Extensions 2 (SSE2) packed-double instructions > + 0x08 extra: scalar_double Retired Streaming SIMD Extensions 2 (SSE2) scalar-double instructions > + 0x10 extra: vector Retired Streaming SIMD Extensions 2 (SSE2) vector instructions > 0x1F any Retired Streaming SIMD instructions > name:simd_comp_inst_retired type:bitmask default:0x01 > - 0x01 packed_single Retired computational Streaming SIMD Extensions (SSE) packed-single instructions > - 0x02 scalar_single Retired computational Streaming SIMD Extensions (SSE) scalar-single instructions > - 0x04 packed_double Retired computational Streaming SIMD Extensions 2 (SSE2) packed-double instructions > - 0x08 scalar_double Retired computational Streaming SIMD 
Extensions 2 (SSE2) scalar-double instructions > + 0x01 extra: packed_single Retired computational Streaming SIMD Extensions (SSE) packed-single instructions > + 0x02 extra: scalar_single Retired computational Streaming SIMD Extensions (SSE) scalar-single instructions > + 0x04 extra: packed_double Retired computational Streaming SIMD Extensions 2 (SSE2) packed-double instructions > + 0x08 extra: scalar_double Retired computational Streaming SIMD Extensions 2 (SSE2) scalar-double instructions > name:mem_load_retired type:bitmask default:0x01 > - 0x01 l2_hit Retired loads that hit the L2 cache (precise event) > - 0x02 l2_miss Retired loads that miss the L2 cache (precise event) > - 0x04 dtlb_miss Retired loads that miss the DTLB (precise event) > + 0x01 extra: l2_hit Retired loads that hit the L2 cache (precise event) > + 0x02 extra: l2_miss Retired loads that miss the L2 cache (precise event) > + 0x04 extra: dtlb_miss Retired loads that miss the DTLB (precise event) > name:thermal_trip type:mandatory default:0xc0 > - 0xc0 thermal_trip Number of thermal trips. > + 0xc0 extra: thermal_trip Number of thermal trips. > # 18-11 > name:core type:bitmask default:0x180 > - 0x180 all All cores. > - 0x080 this This Core. > + 0x180 extra: all All cores. > + 0x080 extra: this This Core. 
> # 18-12 > name:agent type:bitmask default:0x00 > - 0x00 this This agent > - 0x40 any Include any agents > + 0x00 extra: this This agent > + 0x40 extra: any Include any agents > # 18-13 > name:prefetch type:bitmask default:0x60 > - 0x60 all All inclusive > - 0x20 hw Hardware prefetch only > - 0x00 exclude_hw Exclude hardware prefetch > + 0x60 extra: all All inclusive > + 0x20 extra: hw Hardware prefetch only > + 0x00 extra: exclude_hw Exclude hardware prefetch > # 18-14 > name:mesi type:bitmask default:0x0f > - 0x08 modified Counts modified state > - 0x04 exclusive Counts exclusive state > - 0x02 shared Counts shared state > - 0x01 invalid Counts invalid state > + 0x08 extra: modified Counts modified state > + 0x04 extra: exclusive Counts exclusive state > + 0x02 extra: shared Counts shared state > + 0x01 extra: invalid Counts invalid state > diff --git a/events/i386/core_2/unit_masks b/events/i386/core_2/unit_masks > index d528f17..f1d64eb 100644 > --- a/events/i386/core_2/unit_masks > +++ b/events/i386/core_2/unit_masks > @@ -33,7 +33,7 @@ name:sse_prefetch type:exclusive default:0x0 > 0x00 prefetch NTA instructions executed. > 0x01 prefetch T1 instructions executed. > 0x02 prefetch T1 and T2 instructions executed. > - 0x03 SSE weakly-ordered stores > + 0x03 extra: SSE weakly-ordered stores > name:simd_instr_type_exec type:bitmask default:0x3f > 0x01 SIMD packed multiplies > 0x02 SIMD packed shifts > @@ -41,7 +41,7 @@ name:simd_instr_type_exec type:bitmask default:0x3f > 0x08 SIMD unpack operations > 0x10 SIMD packed logical > 0x20 SIMD packed arithmetic > - 0x3f all of the above > + 0x3f extra: all of the above > name:mmx_trans type:bitmask default:0x3 > 0x01 float->MMX transitions > 0x02 MMX->float transitions > @@ -50,30 +50,30 @@ name:sse_miss type:exclusive default:0x0 > 0x01 PREFETCHT0 > 0x02 PREFETCHT1/PREFETCHT2 > name:load_block type:bitmask default:0x3e > - 0x02 STA Loads blocked by a preceding store with unknown address. 
> - 0x04 STD Loads blocked by a preceding store with unknown data. > - 0x08 OVERLAP_STORE Loads that partially overlap an earlier store, or 4K aliased with a previous store. > - 0x10 UNTIL_RETIRE Loads blocked until retirement. > - 0x20 L1D Loads blocked by the L1 data cache. > + 0x02 extra: STA Loads blocked by a preceding store with unknown address. > + 0x04 extra: STD Loads blocked by a preceding store with unknown data. > + 0x08 extra: OVERLAP_STORE Loads that partially overlap an earlier store, or 4K aliased with a previous store. > + 0x10 extra: UNTIL_RETIRE Loads blocked until retirement. > + 0x20 extra: L1D Loads blocked by the L1 data cache. > name:store_block type:bitmask default:0x0b > - 0x01 SB_DRAIN_CYCLES Cycles while stores are blocked due to store buffer drain. > - 0x02 ORDER Cycles while store is waiting for a preceding store to be globally observed. > - 0x08 NOOP A store is blocked due to a conflict with an external or internal snoop. > + 0x01 extra: SB_DRAIN_CYCLES Cycles while stores are blocked due to store buffer drain. > + 0x02 extra: ORDER Cycles while store is waiting for a preceding store to be globally observed. > + 0x08 extra: NOOP A store is blocked due to a conflict with an external or internal snoop. > name:dtlb_miss type:bitmask default:0x0f > - 0x01 ANY Memory accesses that missed the DTLB. > - 0x02 MISS_LD DTLB misses due to load operations. > - 0x04 L0_MISS_LD L0 DTLB misses due to load operations. > - 0x08 MISS_ST TLB misses due to store operations. > + 0x01 extra: ANY Memory accesses that missed the DTLB. > + 0x02 extra: MISS_LD DTLB misses due to load operations. > + 0x04 extra: L0_MISS_LD L0 DTLB misses due to load operations. > + 0x08 extra: MISS_ST TLB misses due to store operations. > name:memory_dis type:exclusive default:0x01 > - 0x01 RESET Memory disambiguation reset cycles. > - 0x02 SUCCESS Number of loads that were successfully disambiguated. > + 0x01 extra: RESET Memory disambiguation reset cycles. 
> + 0x02 extra: SUCCESS Number of loads that were successfully disambiguated.
> name:page_walks type:exclusive default:0x02
> - 0x01 COUNT Number of page-walks executed.
> - 0x02 CYCLES Duration of page-walks in core cycles.
> + 0x01 extra: COUNT Number of page-walks executed.
> + 0x02 extra: CYCLES Duration of page-walks in core cycles.
> name:delayed_bypass type:exclusive default:0x00
> - 0x00 FP Delayed bypass to FP operation.
> - 0x01 SIMD Delayed bypass to SIMD operation.
> - 0x02 LOAD Delayed bypass to load operation.
> + 0x00 extra: FP Delayed bypass to FP operation.
> + 0x01 extra: SIMD Delayed bypass to SIMD operation.
> + 0x02 extra: LOAD Delayed bypass to load operation.
> name:core type:exclusive default:0x40
> 0xc0 All cores
> 0x40 This core
> @@ -133,13 +133,13 @@ name:esp type:bitmask default:0x01
> 0x01 ESP register content synchronizations
> 0x02 ESP register automatic additions
> name:inst_retired type:bitmask default:0x00
> - 0x00 Any
> - 0x01 Loads
> - 0x02 Stores
> - 0x04 Other
> + 0x00 extra: Any
> + 0x01 extra: Loads
> + 0x02 extra: Stores
> + 0x04 extra: Other
> name:x87_ops_retired type:exclusive default:0xfe
> - 0x01 FXCH instructions retired
> - 0xfe Retired floating-point computational operations (precise)
> + 0x01 extra: FXCH instructions retired
> + 0xfe extra: Retired floating-point computational operations (precise)
> name:uops_retired type:bitmask default:0x0f
> 0x01 Fused load+op or load+indirect branch retired
> 0x02 Fused store address + data retired
> @@ -183,10 +183,10 @@ name:rat_stalls type:bitmask default:0xf
> 0x08 FPU status word
> 0x0f All RAT
> name:seg_regs type:bitmask default:0x0f
> - 0x01 ES
> - 0x02 DS
> - 0x04 FS
> - 0x08 GS
> + 0x01 extra: ES
> + 0x02 extra: DS
> + 0x04 extra: FS
> + 0x08 extra: GS
> name:resource_stalls type:bitmask default:0x0f
> 0x01 when the ROB is full
> 0x02 during which the RS is full
> diff --git a/events/i386/ivybridge/unit_masks b/events/i386/ivybridge/unit_masks
> index ddb59a0..ffee1fa 100644
> --- a/events/i386/ivybridge/unit_masks
> +++ b/events/i386/ivybridge/unit_masks
> @@ -5,163 +5,163 @@
> #
> include:i386/arch_perfmon
> name:ld_blocks type:mandatory default:0x2
> - 0x2 store_forward loads blocked by overlapping with store buffer that cannot be forwarded
> + 0x2 extra: store_forward loads blocked by overlapping with store buffer that cannot be forwarded
> name:misalign_mem_ref type:bitmask default:0x1
> - 0x1 loads Speculative cache line split load uops dispatched to L1 cache
> - 0x2 stores Speculative cache line split STA uops dispatched to L1 cache
> + 0x1 extra: loads Speculative cache line split load uops dispatched to L1 cache
> + 0x2 extra: stores Speculative cache line split STA uops dispatched to L1 cache
> name:ld_blocks_partial type:mandatory default:0x1
> - 0x1 address_alias False dependencies in MOB due to partial compare on address
> + 0x1 extra: address_alias False dependencies in MOB due to partial compare on address
> name:dtlb_load_misses type:exclusive default:0x81
> - 0x81 demand_ld_miss_causes_a_walk Demand load Miss in all translation lookaside buffer (TLB) levels causes an page walk of any page size.
> - 0x82 demand_ld_walk_completed Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes of any page size.
> - 0x84 demand_ld_walk_duration Demand load cycles page miss handler (PMH) is busy with this walk.
> + 0x81 extra: demand_ld_miss_causes_a_walk Demand load Miss in all translation lookaside buffer (TLB) levels causes an page walk of any page size.
> + 0x82 extra: demand_ld_walk_completed Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes of any page size.
> + 0x84 extra: demand_ld_walk_duration Demand load cycles page miss handler (PMH) is busy with this walk.
> name:int_misc type:exclusive default:0x3
> 0x3 extra:cmask=1 recovery_cycles Number of cycles waiting for the checkpoints in Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory disambiguation, etc...)
> 0x3 extra:cmask=1,edge recovery_stalls_count Number of occurences waiting for the checkpoints in Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory disambiguation, etc...)
> name:uops_issued type:exclusive default:0x1
> - 0x1 any Uops that Resource Allocation Table (RAT) issues to Reservation Station (RS)
> + 0x1 extra: any Uops that Resource Allocation Table (RAT) issues to Reservation Station (RS)
> 0x1 extra:cmask=1,inv stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for the thread
> 0x1 extra:cmask=1,inv,any core_stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for all threads
> - 0x10 flags_merge Number of flags-merge uops being allocated.
> - 0x20 slow_lea Number of slow LEA uops being allocated. A uop is generally considered SlowLea if it has 3 sources (e.g. 2 sources + immediate) regardless if as a result of LEA instruction or not.
> - 0x40 single_mul Number of Multiply packed/scalar single precision uops allocated
> + 0x10 extra: flags_merge Number of flags-merge uops being allocated.
> + 0x20 extra: slow_lea Number of slow LEA uops being allocated. A uop is generally considered SlowLea if it has 3 sources (e.g. 2 sources + immediate) regardless if as a result of LEA instruction or not.
> + 0x40 extra: single_mul Number of Multiply packed/scalar single precision uops allocated
> name:arith type:bitmask default:0x1
> - 0x1 fpu_div_active Cycles when divider is busy executing divide operations
> + 0x1 extra: fpu_div_active Cycles when divider is busy executing divide operations
> 0x4 extra:cmask=1,edge fpu_div Divide operations executed
> name:l2_rqsts type:exclusive default:0x1
> - 0x1 demand_data_rd_hit Demand Data Read requests that hit L2 cache
> - 0x3 all_demand_data_rd Demand Data Read requests
> - 0x4 rfo_hit RFO requests that hit L2 cache
> - 0x8 rfo_miss RFO requests that miss L2 cache
> - 0xc all_rfo RFO requests to L2 cache
> - 0x10 code_rd_hit L2 cache hits when fetching instructions, code reads.
> - 0x20 code_rd_miss L2 cache misses when fetching instructions
> - 0x30 all_code_rd L2 code requests
> - 0x40 pf_hit Requests from the L2 hardware prefetchers that hit L2 cache
> - 0x80 pf_miss Requests from the L2 hardware prefetchers that miss L2 cache
> - 0xc0 all_pf Requests from L2 hardware prefetchers
> + 0x1 extra: demand_data_rd_hit Demand Data Read requests that hit L2 cache
> + 0x3 extra: all_demand_data_rd Demand Data Read requests
> + 0x4 extra: rfo_hit RFO requests that hit L2 cache
> + 0x8 extra: rfo_miss RFO requests that miss L2 cache
> + 0xc extra: all_rfo RFO requests to L2 cache
> + 0x10 extra: code_rd_hit L2 cache hits when fetching instructions, code reads.
> + 0x20 extra: code_rd_miss L2 cache misses when fetching instructions
> + 0x30 extra: all_code_rd L2 code requests
> + 0x40 extra: pf_hit Requests from the L2 hardware prefetchers that hit L2 cache
> + 0x80 extra: pf_miss Requests from the L2 hardware prefetchers that miss L2 cache
> + 0xc0 extra: all_pf Requests from L2 hardware prefetchers
> name:l2_store_lock_rqsts type:exclusive default:0x1
> - 0x1 miss RFOs that miss cache lines
> - 0x8 hit_m RFOs that hit cache lines in M state
> - 0xf all RFOs that access cache lines in any state
> + 0x1 extra: miss RFOs that miss cache lines
> + 0x8 extra: hit_m RFOs that hit cache lines in M state
> + 0xf extra: all RFOs that access cache lines in any state
> name:l2_l1d_wb_rqsts type:exclusive default:0x1
> - 0x1 miss Count the number of modified Lines evicted from L1 and missed L2. (Non-rejected WBs from the DCU.)
> - 0x4 hit_e Not rejected writebacks from L1D to L2 cache lines in E state
> - 0x8 hit_m Not rejected writebacks from L1D to L2 cache lines in M state
> - 0xf all Not rejected writebacks from L1D to L2 cache lines in any state.
> + 0x1 extra: miss Count the number of modified Lines evicted from L1 and missed L2. (Non-rejected WBs from the DCU.)
> + 0x4 extra: hit_e Not rejected writebacks from L1D to L2 cache lines in E state
> + 0x8 extra: hit_m Not rejected writebacks from L1D to L2 cache lines in M state
> + 0xf extra: all Not rejected writebacks from L1D to L2 cache lines in any state.
> name:l1d_pend_miss type:exclusive default:0x1
> - 0x1 pending L1D miss oustandings duration in cycles
> + 0x1 extra: pending L1D miss oustandings duration in cycles
> 0x1 extra:cmask=1 pending_cycles Cycles with L1D load Misses outstanding.
> 0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding, using an edge detect to count transitions.
> name:dtlb_store_misses type:bitmask default:0x1
> - 0x1 miss_causes_a_walk Store misses in all DTLB levels that cause page walks
> - 0x2 walk_completed Store misses in all DTLB levels that cause completed page walks
> - 0x4 walk_duration Cycles when PMH is busy with page walks
> - 0x10 stlb_hit Store operations that miss the first TLB level but hit the second and do not cause page walks
> + 0x1 extra: miss_causes_a_walk Store misses in all DTLB levels that cause page walks
> + 0x2 extra: walk_completed Store misses in all DTLB levels that cause completed page walks
> + 0x4 extra: walk_duration Cycles when PMH is busy with page walks
> + 0x10 extra: stlb_hit Store operations that miss the first TLB level but hit the second and do not cause page walks
> name:load_hit_pre type:bitmask default:0x1
> - 0x1 sw_pf Not software-prefetch load dispatches that hit forward buffer allocated for software prefetch
> - 0x2 hw_pf Not software-prefetch load dispatches that hit forward buffer allocated for hardware prefetch
> + 0x1 extra: sw_pf Not software-prefetch load dispatches that hit forward buffer allocated for software prefetch
> + 0x2 extra: hw_pf Not software-prefetch load dispatches that hit forward buffer allocated for hardware prefetch
> name:l1d type:mandatory default:0x1
> - 0x1 replacement L1D data line replacements
> + 0x1 extra: replacement L1D data line replacements
> name:move_elimination type:bitmask default:0x1
> - 0x1 int_not_eliminated Number of integer Move Elimination candidate uops that were not eliminated.
> - 0x2 simd_not_eliminated Number of SIMD Move Elimination candidate uops that were not eliminated.
> - 0x4 int_eliminated Number of integer Move Elimination candidate uops that were eliminated.
> - 0x8 simd_eliminated Number of SIMD Move Elimination candidate uops that were eliminated.
> + 0x1 extra: int_not_eliminated Number of integer Move Elimination candidate uops that were not eliminated.
> + 0x2 extra: simd_not_eliminated Number of SIMD Move Elimination candidate uops that were not eliminated.
> + 0x4 extra: int_eliminated Number of integer Move Elimination candidate uops that were eliminated.
> + 0x8 extra: simd_eliminated Number of SIMD Move Elimination candidate uops that were eliminated.
> name:cpl_cycles type:exclusive default:0x1
> - 0x1 ring0 Unhalted core cycles when the thread is in ring 0
> + 0x1 extra: ring0 Unhalted core cycles when the thread is in ring 0
> 0x1 extra:cmask=1,edge ring0_trans Number of intervals between processor halts while thread is in ring 0
> - 0x2 ring123 Unhalted core cycles when thread is in rings 1, 2, or 3
> + 0x2 extra: ring123 Unhalted core cycles when thread is in rings 1, 2, or 3
> name:rs_events type:mandatory default:0x1
> - 0x1 empty_cycles Cycles when Reservation Station (RS) is empty for the thread
> + 0x1 extra: empty_cycles Cycles when Reservation Station (RS) is empty for the thread
> name:tlb_access type:mandatory default:0x4
> - 0x4 load_stlb_hit Load operations that miss the first DTLB level but hit the second and do not cause page walks
> + 0x4 extra: load_stlb_hit Load operations that miss the first DTLB level but hit the second and do not cause page walks
> name:offcore_requests_outstanding type:exclusive default:0x1
> - 0x1 demand_data_rd Offcore outstanding Demand Data Read transactions in uncore queue.
> + 0x1 extra: demand_data_rd Offcore outstanding Demand Data Read transactions in uncore queue.
> 0x1 extra:cmask=1 cycles_with_demand_data_rd Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore
> - 0x2 demand_code_rd Offcore outstanding code reads transactions in SuperQueue (SQ), queue to uncore, every cycle
> - 0x4 demand_rfo Offcore outstanding RFO store transactions in SuperQueue (SQ), queue to uncore
> + 0x2 extra: demand_code_rd Offcore outstanding code reads transactions in SuperQueue (SQ), queue to uncore, every cycle
> + 0x4 extra: demand_rfo Offcore outstanding RFO store transactions in SuperQueue (SQ), queue to uncore
> 0x4 extra:cmask=1 cycles_with_demand_rfo Offcore outstanding demand rfo reads transactions in SuperQueue (SQ), queue to uncore, every cycle
> - 0x8 all_data_rd Offcore outstanding cacheable Core Data Read transactions in SuperQueue (SQ), queue to uncore
> + 0x8 extra: all_data_rd Offcore outstanding cacheable Core Data Read transactions in SuperQueue (SQ), queue to uncore
> 0x8 extra:cmask=1 cycles_with_data_rd Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore
> name:lock_cycles type:bitmask default:0x1
> - 0x1 split_lock_uc_lock_duration Cycles when L1 and L2 are locked due to UC or split lock
> - 0x2 cache_lock_duration Cycles when L1D is locked
> + 0x1 extra: split_lock_uc_lock_duration Cycles when L1 and L2 are locked due to UC or split lock
> + 0x2 extra: cache_lock_duration Cycles when L1D is locked
> name:idq type:exclusive default:0x2
> - 0x2 empty Instruction Decode Queue (IDQ) empty cycles
> - 0x4 mite_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path
> + 0x2 extra: empty Instruction Decode Queue (IDQ) empty cycles
> + 0x4 extra: mite_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path
> 0x4 extra:cmask=1 mite_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path
> - 0x8 dsb_uops Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path
> + 0x8 extra: dsb_uops Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path
> 0x8 extra:cmask=1 dsb_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path
> - 0x10 ms_dsb_uops Uops initiated by Decode Stream Buffer (DSB) that are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
> + 0x10 extra: ms_dsb_uops Uops initiated by Decode Stream Buffer (DSB) that are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
> 0x10 extra:cmask=1 ms_dsb_cycles Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
> 0x10 extra:cmask=1,edge ms_dsb_occur Deliveries to Instruction Decode Queue (IDQ) initiated by Decode Stream Buffer (DSB) while Microcode Sequenser (MS) is busy
> 0x18 extra:cmask=1 all_dsb_cycles_any_uops Cycles Decode Stream Buffer (DSB) is delivering any Uop
> 0x18 extra:cmask=4 all_dsb_cycles_4_uops Cycles Decode Stream Buffer (DSB) is delivering 4 Uops
> - 0x20 ms_mite_uops Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
> + 0x20 extra: ms_mite_uops Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
> 0x24 extra:cmask=1 all_mite_cycles_any_uops Cycles MITE is delivering any Uop
> 0x24 extra:cmask=4 all_mite_cycles_4_uops Cycles MITE is delivering 4 Uops
> - 0x30 ms_uops Uops delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
> + 0x30 extra: ms_uops Uops delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
> 0x30 extra:cmask=1 ms_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy
> - 0x3c mite_all_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path
> + 0x3c extra: mite_all_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path
> name:icache type:mandatory default:0x2
> - 0x2 misses Instruction cache, streaming buffer and victim cache misses
> + 0x2 extra: misses Instruction cache, streaming buffer and victim cache misses
> name:itlb_misses type:bitmask default:0x1
> - 0x1 miss_causes_a_walk Misses at all ITLB levels that cause page walks
> - 0x2 walk_completed Misses in all ITLB levels that cause completed page walks
> - 0x4 walk_duration Cycles when PMH is busy with page walks
> - 0x10 stlb_hit Operations that miss the first ITLB level but hit the second and do not cause any page walks
> + 0x1 extra: miss_causes_a_walk Misses at all ITLB levels that cause page walks
> + 0x2 extra: walk_completed Misses in all ITLB levels that cause completed page walks
> + 0x4 extra: walk_duration Cycles when PMH is busy with page walks
> + 0x10 extra: stlb_hit Operations that miss the first ITLB level but hit the second and do not cause any page walks
> name:ild_stall type:bitmask default:0x1
> - 0x1 lcp Stalls caused by changing prefix length of the instruction.
> - 0x4 iq_full Stall cycles because IQ is full
> + 0x1 extra: lcp Stalls caused by changing prefix length of the instruction.
> + 0x4 extra: iq_full Stall cycles because IQ is full
> name:br_inst_exec type:exclusive default:0x41
> - 0x41 nontaken_conditional Not taken macro-conditional branches
> - 0x81 taken_conditional Taken speculative and retired macro-conditional branches
> - 0x82 taken_direct_jump Taken speculative and retired macro-conditional branch instructions excluding calls and indirects
> - 0x84 taken_indirect_jump_non_call_ret Taken speculative and retired indirect branches excluding calls and returns
> - 0x88 taken_indirect_near_return Taken speculative and retired indirect branches with return mnemonic
> - 0x90 taken_direct_near_call Taken speculative and retired direct near calls
> - 0xa0 taken_indirect_near_call Taken speculative and retired indirect calls
> - 0xc1 all_conditional Speculative and retired macro-conditional branches
> - 0xc2 all_direct_jmp Speculative and retired macro-unconditional branches excluding calls and indirects
> - 0xc4 all_indirect_jump_non_call_ret Speculative and retired indirect branches excluding calls and returns
> - 0xc8 all_indirect_near_return Speculative and retired indirect return branches.
> - 0xd0 all_direct_near_call Speculative and retired direct near calls
> - 0xff all_branches Speculative and retired branches
> + 0x41 extra: nontaken_conditional Not taken macro-conditional branches
> + 0x81 extra: taken_conditional Taken speculative and retired macro-conditional branches
> + 0x82 extra: taken_direct_jump Taken speculative and retired macro-conditional branch instructions excluding calls and indirects
> + 0x84 extra: taken_indirect_jump_non_call_ret Taken speculative and retired indirect branches excluding calls and returns
> + 0x88 extra: taken_indirect_near_return Taken speculative and retired indirect branches with return mnemonic
> + 0x90 extra: taken_direct_near_call Taken speculative and retired direct near calls
> + 0xa0 extra: taken_indirect_near_call Taken speculative and retired indirect calls
> + 0xc1 extra: all_conditional Speculative and retired macro-conditional branches
> + 0xc2 extra: all_direct_jmp Speculative and retired macro-unconditional branches excluding calls and indirects
> + 0xc4 extra: all_indirect_jump_non_call_ret Speculative and retired indirect branches excluding calls and returns
> + 0xc8 extra: all_indirect_near_return Speculative and retired indirect return branches.
> + 0xd0 extra: all_direct_near_call Speculative and retired direct near calls
> + 0xff extra: all_branches Speculative and retired branches
> name:br_misp_exec type:exclusive default:0x41
> - 0x41 nontaken_conditional Not taken speculative and retired mispredicted macro conditional branches
> - 0x81 taken_conditional Taken speculative and retired mispredicted macro conditional branches
> - 0x84 taken_indirect_jump_non_call_ret Taken speculative and retired mispredicted indirect branches excluding calls and returns
> - 0x88 taken_return_near Taken speculative and retired mispredicted indirect branches with return mnemonic
> - 0xa0 taken_indirect_near_call Taken speculative and retired mispredicted indirect calls
> - 0xc1 all_conditional Speculative and retired mispredicted macro conditional branches
> - 0xc4 all_indirect_jump_non_call_ret Mispredicted indirect branches excluding calls and returns
> - 0xff all_branches Speculative and retired mispredicted macro conditional branches
> + 0x41 extra: nontaken_conditional Not taken speculative and retired mispredicted macro conditional branches
> + 0x81 extra: taken_conditional Taken speculative and retired mispredicted macro conditional branches
> + 0x84 extra: taken_indirect_jump_non_call_ret Taken speculative and retired mispredicted indirect branches excluding calls and returns
> + 0x88 extra: taken_return_near Taken speculative and retired mispredicted indirect branches with return mnemonic
> + 0xa0 extra: taken_indirect_near_call Taken speculative and retired mispredicted indirect calls
> + 0xc1 extra: all_conditional Speculative and retired mispredicted macro conditional branches
> + 0xc4 extra: all_indirect_jump_non_call_ret Mispredicted indirect branches excluding calls and returns
> + 0xff extra: all_branches Speculative and retired mispredicted macro conditional branches
> name:idq_uops_not_delivered type:exclusive default:0x1
> - 0x1 core Uops not delivered by the Frontend to the Backend of the machine, while there is no Backend stall
> + 0x1 extra: core Uops not delivered by the Frontend to the Backend of the machine, while there is no Backend stall
> 0x1 extra:cmask=1 cycles_le_3_uop_deliv.core Cycles with 3 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall
> 0x1 extra:cmask=1,inv cycles_fe_was_ok Cycles with 4 uops delivered by the Frontend to the Backend of the machine, or the Backend was stalling
> 0x1 extra:cmask=2 cycles_le_2_uop_deliv.core Cycles with 2 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall
> 0x1 extra:cmask=3 cycles_le_1_uop_deliv.core Cycles with 1 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall
> 0x1 extra:cmask=4 cycles_0_uops_deliv.core Cycles with no uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall
> name:uops_dispatched_port type:exclusive default:0x1
> - 0x1 port_0 Cycles per thread when uops are dispatched to port 0
> + 0x1 extra: port_0 Cycles per thread when uops are dispatched to port 0
> 0x1 extra:any port_0_core Cycles per core when uops are dispatched to port 0
> - 0x2 port_1 Cycles per thread when uops are dispatched to port 1
> + 0x2 extra: port_1 Cycles per thread when uops are dispatched to port 1
> 0x2 extra:any port_1_core Cycles per core when uops are dispatched to port 1
> - 0xc port_2 Cycles per thread when load or STA uops are dispatched to port 2
> + 0xc extra: port_2 Cycles per thread when load or STA uops are dispatched to port 2
> 0xc extra:any port_2_core Cycles per core when load or STA uops are dispatched to port 2
> - 0x30 port_3 Cycles per thread when load or STA uops are dispatched to port 3
> + 0x30 extra: port_3 Cycles per thread when load or STA uops are dispatched to port 3
> 0x30 extra:any port_3_core Cycles per core when load or STA uops are dispatched to port 3
> - 0x40 port_4 Cycles per thread when uops are dispatched to port 4
> + 0x40 extra: port_4 Cycles per thread when uops are dispatched to port 4
> 0x40 extra:any port_4_core Cycles per core when uops are dispatched to port 4
> - 0x80 port_5 Cycles per thread when uops are dispatched to port 5
> + 0x80 extra: port_5 Cycles per thread when uops are dispatched to port 5
> 0x80 extra:any port_5_core Cycles per core when uops are dispatched to port 5
> name:resource_stalls type:bitmask default:0x1
> - 0x1 any Resource-related stall cycles
> - 0x4 rs Cycles stalled due to no eligible RS entry available.
> - 0x8 sb Cycles stalled due to no store buffers available. (not including draining form sync).
> - 0x10 rob Cycles stalled due to re-order buffer full.
> + 0x1 extra: any Resource-related stall cycles
> + 0x4 extra: rs Cycles stalled due to no eligible RS entry available.
> + 0x8 extra: sb Cycles stalled due to no store buffers available. (not including draining form sync).
> + 0x10 extra: rob Cycles stalled due to re-order buffer full.
> name:cycle_activity type:exclusive default:0x1
> 0x1 extra:cmask=1 cycles_l2_pending Cycles with pending L2 cache miss loads.
> 0x2 extra:cmask=2 cycles_ldm_pending Cycles with pending memory loads.
> @@ -171,99 +171,99 @@ name:cycle_activity type:exclusive default:0x1
> 0x8 extra:cmask=8 cycles_l1d_pending Cycles with pending L1 cache miss loads.
> 0xc extra:cmask=c stalls_l1d_pending Execution stalls due to L1 data cache misses
> name:dsb2mite_switches type:mandatory default:0x1
> - 0x1 count Decode Stream Buffer (DSB)-to-MITE switches
> + 0x1 extra: count Decode Stream Buffer (DSB)-to-MITE switches
> name:dsb_fill type:mandatory default:0x8
> - 0x8 exceed_dsb_lines Cycles when Decode Stream Buffer (DSB) fill encounter more than 3 Decode Stream Buffer (DSB) lines
> + 0x8 extra: exceed_dsb_lines Cycles when Decode Stream Buffer (DSB) fill encounter more than 3 Decode Stream Buffer (DSB) lines
> name:itlb type:mandatory default:0x1
> - 0x1 itlb_flush Flushing of the Instruction TLB (ITLB) pages, includes 4k/2M/4M pages.
> + 0x1 extra: itlb_flush Flushing of the Instruction TLB (ITLB) pages, includes 4k/2M/4M pages.
> name:offcore_requests type:bitmask default:0x1
> - 0x1 demand_data_rd Demand Data Read requests sent to uncore
> - 0x2 demand_code_rd Cacheable and noncachaeble code read requests
> - 0x4 demand_rfo Demand RFO requests including regular RFOs, locks, ItoM
> - 0x8 all_data_rd Demand and prefetch data reads
> + 0x1 extra: demand_data_rd Demand Data Read requests sent to uncore
> + 0x2 extra: demand_code_rd Cacheable and noncachaeble code read requests
> + 0x4 extra: demand_rfo Demand RFO requests including regular RFOs, locks, ItoM
> + 0x8 extra: all_data_rd Demand and prefetch data reads
> name:uops_executed type:exclusive default:0x1
> - 0x1 thread Counts the number of uops to be executed per-thread each cycle.
> + 0x1 extra: thread Counts the number of uops to be executed per-thread each cycle.
> 0x1 extra:cmask=1 cycles_ge_1_uop_exec Cycles where at least 1 uop was executed per-thread
> 0x1 extra:cmask=1,inv stall_cycles Counts number of cycles no uops were dispatched to be executed on this thread.
> 0x1 extra:cmask=2 cycles_ge_2_uops_exec Cycles where at least 2 uops were executed per-thread
> 0x1 extra:cmask=3 cycles_ge_3_uops_exec Cycles where at least 3 uops were executed per-thread
> 0x1 extra:cmask=4 cycles_ge_4_uops_exec Cycles where at least 4 uops were executed per-thread
> - 0x2 core Number of uops executed on the core.
> + 0x2 extra: core Number of uops executed on the core.
> name:tlb_flush type:bitmask default:0x1
> - 0x1 dtlb_thread DTLB flush attempts of the thread-specific entries
> - 0x20 stlb_any STLB flush attempts
> + 0x1 extra: dtlb_thread DTLB flush attempts of the thread-specific entries
> + 0x20 extra: stlb_any STLB flush attempts
> name:other_assists type:bitmask default:0x8
> - 0x8 avx_store Number of AVX memory assist for stores. AVX microcode assist is being invoked whenever the hardware is unable to properly handle AVX-256b operations.
> - 0x10 avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable.
> - 0x20 sse_to_avx Number of transitions from SSE to AVX-256 when penalty applicable.
> + 0x8 extra: avx_store Number of AVX memory assist for stores. AVX microcode assist is being invoked whenever the hardware is unable to properly handle AVX-256b operations.
> + 0x10 extra: avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable.
> + 0x20 extra: sse_to_avx Number of transitions from SSE to AVX-256 when penalty applicable.
> name:uops_retired type:exclusive default:0x1
> - 0x1 all Actually retired uops.
> + 0x1 extra: all Actually retired uops.
> 0x1 extra:cmask=1,inv stall_cycles Cycles without actually retired uops.
> 0x1 extra:cmask=1,inv,any core_stall_cycles Cycles without actually retired uops.
> 0x1 extra:cmask=10,inv total_cycles Cycles with less than 10 actually retired uops.
> - 0x2 retire_slots Retirement slots used.
> + 0x2 extra: retire_slots Retirement slots used.
> name:machine_clears type:bitmask default:0x2
> - 0x2 memory_ordering Counts the number of machine clears due to memory order conflicts.
> - 0x4 smc Self-modifying code (SMC) detected.
> - 0x20 maskmov This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0.
> + 0x2 extra: memory_ordering Counts the number of machine clears due to memory order conflicts.
> + 0x4 extra: smc Self-modifying code (SMC) detected.
> + 0x20 extra: maskmov This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0.
> name:br_inst_retired type:exclusive default:0x1
> - 0x1 conditional Conditional branch instructions retired.
> - 0x2 near_call_r3 Direct and indirect macro near call instructions retired (captured in ring 3).
> - 0x2 near_call Direct and indirect near call instructions retired.
> - 0x8 near_return Return instructions retired.
> - 0x10 not_taken Not taken branch instructions retired.
> - 0x20 near_taken Taken branch instructions retired.
> - 0x40 far_branch Far branch instructions retired.
> + 0x1 extra: conditional Conditional branch instructions retired.
> + 0x2 extra: near_call_r3 Direct and indirect macro near call instructions retired (captured in ring 3).
> + 0x2 extra: near_call Direct and indirect near call instructions retired.
> + 0x8 extra: near_return Return instructions retired.
> + 0x10 extra: not_taken Not taken branch instructions retired.
> + 0x20 extra: near_taken Taken branch instructions retired.
> + 0x40 extra: far_branch Far branch instructions retired.
> name:br_misp_retired type:bitmask default:0x1
> - 0x1 conditional Mispredicted conditional branch instructions retired.
> - 0x20 near_taken number of near branch instructions retired that were mispredicted and taken.
> + 0x1 extra: conditional Mispredicted conditional branch instructions retired.
> + 0x20 extra: near_taken number of near branch instructions retired that were mispredicted and taken.
> name:fp_assist type:exclusive default:0x1e
> - 0x2 x87_output Number of X87 assists due to output value.
> - 0x4 x87_input Number of X87 assists due to input value.
> - 0x8 simd_output Number of SIMD FP assists due to Output values
> - 0x10 simd_input Number of SIMD FP assists due to input values
> + 0x2 extra: x87_output Number of X87 assists due to output value.
> + 0x4 extra: x87_input Number of X87 assists due to input value.
> + 0x8 extra: simd_output Number of SIMD FP assists due to Output values
> + 0x10 extra: simd_input Number of SIMD FP assists due to input values
> 0x1e extra:cmask=1 any Cycles with any input/output SSE or FP assist
> name:rob_misc_events type:mandatory default:0x20
> - 0x20 lbr_inserts Count cases of saving new LBR
> + 0x20 extra: lbr_inserts Count cases of saving new LBR
> name:mem_uops_retired type:exclusive default:0x81
> - 0x11 stlb_miss_loads Load uops with true STLB miss retired to architected path.
> - 0x12 stlb_miss_stores Store uops with true STLB miss retired to architected path.
> - 0x21 lock_loads Load uops with locked access retired to architected path.
> - 0x41 split_loads Line-splitted load uops retired to architected path.
> - 0x42 split_stores Line-splitted store uops retired to architected path.
> - 0x81 all_loads Load uops retired to architected path with filter on bits 0 and 1 applied.
> - 0x82 all_stores Store uops retired to architected path with filter on bits 0 and 1 applied.
> + 0x11 extra: stlb_miss_loads Load uops with true STLB miss retired to architected path.
> + 0x12 extra: stlb_miss_stores Store uops with true STLB miss retired to architected path.
> + 0x21 extra: lock_loads Load uops with locked access retired to architected path.
> + 0x41 extra: split_loads Line-splitted load uops retired to architected path.
> + 0x42 extra: split_stores Line-splitted store uops retired to architected path.
> + 0x81 extra: all_loads Load uops retired to architected path with filter on bits 0 and 1 applied. > + 0x82 extra: all_stores Store uops retired to architected path with filter on bits 0 and 1 applied. > name:mem_load_uops_retired type:bitmask default:0x1 > - 0x1 l1_hit Retired load uops with L1 cache hits as data sources. > - 0x2 l2_hit Retired load uops with L2 cache hits as data sources. > - 0x4 llc_hit Retired load uops which data sources were data hits in LLC without snoops required. > - 0x40 hit_lfb Retired load uops which data sources were load uops missed L1 but hit forward buffer due to preceding miss to the same cache line with data not ready. > + 0x1 extra: l1_hit Retired load uops with L1 cache hits as data sources. > + 0x2 extra: l2_hit Retired load uops with L2 cache hits as data sources. > + 0x4 extra: llc_hit Retired load uops which data sources were data hits in LLC without snoops required. > + 0x40 extra: hit_lfb Retired load uops which data sources were load uops missed L1 but hit forward buffer due to preceding miss to the same cache line with data not ready. > name:mem_load_uops_llc_hit_retired type:bitmask default:0x1 > - 0x1 xsnp_miss Retired load uops which data sources were LLC hit and cross-core snoop missed in on-pkg core cache. > - 0x2 xsnp_hit Retired load uops which data sources were LLC and cross-core snoop hits in on-pkg core cache. > - 0x4 xsnp_hitm Retired load uops which data sources were HitM responses from shared LLC. > - 0x8 xsnp_none Retired load uops which data sources were hits in LLC without snoops required. > + 0x1 extra: xsnp_miss Retired load uops which data sources were LLC hit and cross-core snoop missed in on-pkg core cache. > + 0x2 extra: xsnp_hit Retired load uops which data sources were LLC and cross-core snoop hits in on-pkg core cache. > + 0x4 extra: xsnp_hitm Retired load uops which data sources were HitM responses from shared LLC. 
> + 0x8 extra: xsnp_none Retired load uops which data sources were hits in LLC without snoops required. > name:mem_load_uops_llc_miss_retired type:mandatory default:0x1 > - 0x1 local_dram Data from local DRAM either Snoop not needed or Snoop Miss (RspI) > + 0x1 extra: local_dram Data from local DRAM either Snoop not needed or Snoop Miss (RspI) > name:baclears type:mandatory default:0x1f > - 0x1f any Counts the total number when the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end. > + 0x1f extra: any Counts the total number when the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end. > name:l2_trans type:bitmask default:0x80 > - 0x1 demand_data_rd Demand Data Read requests that access L2 cache > - 0x2 rfo RFO requests that access L2 cache > - 0x4 code_rd L2 cache accesses when fetching instructions > - 0x8 all_pf L2 or LLC HW prefetches that access L2 cache > - 0x10 l1d_wb L1D writebacks that access L2 cache > - 0x20 l2_fill L2 fill requests that access L2 cache > - 0x40 l2_wb L2 writebacks that access L2 cache > - 0x80 all_requests Transactions accessing L2 pipe > + 0x1 extra: demand_data_rd Demand Data Read requests that access L2 cache > + 0x2 extra: rfo RFO requests that access L2 cache > + 0x4 extra: code_rd L2 cache accesses when fetching instructions > + 0x8 extra: all_pf L2 or LLC HW prefetches that access L2 cache > + 0x10 extra: l1d_wb L1D writebacks that access L2 cache > + 0x20 extra: l2_fill L2 fill requests that access L2 cache > + 0x40 extra: l2_wb L2 writebacks that access L2 cache > + 0x80 extra: all_requests Transactions accessing L2 pipe > name:l2_lines_in type:exclusive default:0x7 > - 0x1 i L2 cache lines in I state filling L2 > - 0x2 s L2 cache lines in S state filling L2 > - 0x4 e L2 cache lines in E state filling L2 > - 0x7 all L2 
cache lines filling L2 > + 0x1 extra: i L2 cache lines in I state filling L2 > + 0x2 extra: s L2 cache lines in S state filling L2 > + 0x4 extra: e L2 cache lines in E state filling L2 > + 0x7 extra: all L2 cache lines filling L2 > name:l2_lines_out type:exclusive default:0x1 > - 0x1 demand_clean Clean L2 cache lines evicted by demand > - 0x2 demand_dirty Dirty L2 cache lines evicted by demand > - 0x4 pf_clean Clean L2 cache lines evicted by L2 prefetch > - 0x8 pf_dirty Dirty L2 cache lines evicted by L2 prefetch > - 0xa dirty_all Dirty L2 cache lines filling the L2 > + 0x1 extra: demand_clean Clean L2 cache lines evicted by demand > + 0x2 extra: demand_dirty Dirty L2 cache lines evicted by demand > + 0x4 extra: pf_clean Clean L2 cache lines evicted by L2 prefetch > + 0x8 extra: pf_dirty Dirty L2 cache lines evicted by L2 prefetch > + 0xa extra: dirty_all Dirty L2 cache lines filling the L2 > diff --git a/events/i386/nehalem/unit_masks b/events/i386/nehalem/unit_masks > index d800e5d..8f60292 100644 > --- a/events/i386/nehalem/unit_masks > +++ b/events/i386/nehalem/unit_masks > @@ -4,369 +4,369 @@ > # > include:i386/arch_perfmon > name:sb_forward type:mandatory default:0x01 > - 0x01 any Counts the number of store forwards > + 0x01 extra: any Counts the number of store forwards > name:load_block type:bitmask default:0x01 > - 0x01 std Counts the number of loads blocked by a preceding store with unknown data > - 0x04 address_offset Counts the number of loads blocked by a preceding store address > + 0x01 extra: std Counts the number of loads blocked by a preceding store with unknown data > + 0x04 extra: address_offset Counts the number of loads blocked by a preceding store address > name:sb_drain type:mandatory default:0x01 > - 0x01 cycles Counts the cycles of store buffer drains > + 0x01 extra: cycles Counts the cycles of store buffer drains > name:misalign_mem_ref type:bitmask default:0x03 > - 0x01 load Counts the number of misaligned load references > - 0x02 store 
Counts the number of misaligned store references > - 0x03 any Counts the number of misaligned memory references > + 0x01 extra: load Counts the number of misaligned load references > + 0x02 extra: store Counts the number of misaligned store references > + 0x03 extra: any Counts the number of misaligned memory references > name:store_blocks type:bitmask default:0x0f > - 0x01 not_sta This event counts the number of load operations delayed caused by preceding stores whose addresses are known but whose data is unknown, and preceding stores that conflict with the load but which incompletely overlap the load > - 0x02 sta This event counts load operations delayed caused by preceding stores whose addresses are unknown (STA block) > - 0x04 at_ret Counts number of loads delayed with at-Retirement block code > - 0x08 l1d_block Cacheable loads delayed with L1D block code > + 0x01 extra: not_sta This event counts the number of load operations delayed caused by preceding stores whose addresses are known but whose data is unknown, and preceding stores that conflict with the load but which incompletely overlap the load > + 0x02 extra: sta This event counts load operations delayed caused by preceding stores whose addresses are unknown (STA block) > + 0x04 extra: at_ret Counts number of loads delayed with at-Retirement block code > + 0x08 extra: l1d_block Cacheable loads delayed with L1D block code > 0x0F any All loads delayed due to store blocks > name:dtlb_load_misses type:bitmask default:0x01 > - 0x01 any Counts all load misses that cause a page walk > - 0x02 walk_completed Counts number of completed page walks due to load miss in the STLB > - 0x10 stlb_hit Number of cache load STLB hits > - 0x20 pde_miss Number of DTLB cache load misses where the low part of the linear to physical address translation was missed > - 0x40 pdp_miss Number of DTLB cache load misses where the high part of the linear to physical address translation was missed > - 0x80 large_walk_completed Counts 
number of completed large page walks due to load miss in the STLB > + 0x01 extra: any Counts all load misses that cause a page walk > + 0x02 extra: walk_completed Counts number of completed page walks due to load miss in the STLB > + 0x10 extra: stlb_hit Number of cache load STLB hits > + 0x20 extra: pde_miss Number of DTLB cache load misses where the low part of the linear to physical address translation was missed > + 0x40 extra: pdp_miss Number of DTLB cache load misses where the high part of the linear to physical address translation was missed > + 0x80 extra: large_walk_completed Counts number of completed large page walks due to load miss in the STLB > name:memory_disambiguration type:bitmask default:0x01 > - 0x01 reset Counts memory disambiguration reset cycles > - 0x02 success Counts the number of loads that memory disambiguration succeeded > - 0x04 watchdog Counts the number of times the memory disambiguration watchdog kicked in > - 0x08 watch_cycles Counts the cycles that the memory disambiguration watchdog is active > + 0x01 extra: reset Counts memory disambiguration reset cycles > + 0x02 extra: success Counts the number of loads that memory disambiguration succeeded > + 0x04 extra: watchdog Counts the number of times the memory disambiguration watchdog kicked in > + 0x08 extra: watch_cycles Counts the cycles that the memory disambiguration watchdog is active > name:mem_inst_retired type:bitmask default:0x01 > - 0x01 loads Counts the number of instructions with an architecturally-visible store retired on the architected path > - 0x02 stores Counts the number of instructions with an architecturally-visible store retired on the architected path > + 0x01 extra: loads Counts the number of instructions with an architecturally-visible store retired on the architected path > + 0x02 extra: stores Counts the number of instructions with an architecturally-visible store retired on the architected path > name:mem_store_retired type:mandatory default:0x01 > - 0x01 
dtlb_miss The event counts the number of retired stores that missed the DTLB > + 0x01 extra: dtlb_miss The event counts the number of retired stores that missed the DTLB > name:uops_issued type:bitmask default:0x01 > - 0x01 any Counts the number of Uops issued by the Register Allocation Table to the Reservation Station, i > - 0x01 stalled_cycles Counts the number of cycles no Uops issued by the Register Allocation Table to the Reservation Station, i > - 0x02 fused Counts the number of fused Uops that were issued from the Register Allocation Table to the Reservation Station > + 0x01 extra: any Counts the number of Uops issued by the Register Allocation Table to the Reservation Station, i > + 0x01 extra: stalled_cycles Counts the number of cycles no Uops issued by the Register Allocation Table to the Reservation Station, i > + 0x02 extra: fused Counts the number of fused Uops that were issued from the Register Allocation Table to the Reservation Station > name:mem_uncore_retired type:bitmask default:0x02 > - 0x02 other_core_l2_hitm Counts number of memory load instructions retired where the memory reference hit modified data in a sibling core residing on the same socket > - 0x08 remote_cache_local_home_hit Counts number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and HIT in a remote socket's cache > - 0x10 remote_dram Counts number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and was remotely homed > - 0x20 local_dram Counts number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and required a local socket memory reference > + 0x02 extra: other_core_l2_hitm Counts number of memory load instructions retired where the memory reference hit modified data in a sibling core residing on the same socket > + 0x08 extra: remote_cache_local_home_hit Counts number of memory load instructions retired where the memory reference 
missed the L1, L2 and LLC caches and HIT in a remote socket's cache > + 0x10 extra: remote_dram Counts number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and was remotely homed... [truncated message content] |
From: Andi K. <an...@fi...> - 2013-06-18 23:40:49
|
From: Andi Kleen <ak...@li...>

Add a dummy extra flag that is set when any extra: (including an empty
one) field is present in a unit mask line. This can be used to express
that the unit mask has a unique name as the first word of the
description.

This is mapped to the x86 enabled flag, as we have to pass it around
as part of the event mask.

Signed-off-by: Andi Kleen <ak...@li...>
---
 libop/op_events.c | 3 ++-
 libop/op_events.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/libop/op_events.c b/libop/op_events.c
index 67e85f3..276386d 100644
--- a/libop/op_events.c
+++ b/libop/op_events.c
@@ -114,7 +114,8 @@ unsigned parse_extra(const char *s)
 	unsigned v, w;
 	int o;
 
-	v = 0;
+	/* This signifies that the first word of the description is unique */
+	v = EXTRA_NONE;
 	while (*s) {
 		if (isspace(*s))
 			break;
diff --git a/libop/op_events.h b/libop/op_events.h
index 425f767..a437f2a 100644
--- a/libop/op_events.h
+++ b/libop/op_events.h
@@ -26,6 +26,7 @@ extern "C" {
 #define EXTRA_CMASK_SHIFT 24
 #define EXTRA_CMASK_MASK 0xff
 #define EXTRA_PEBS (1U << 19) /* fake, mapped to pin control, but mapped back for perf */
+#define EXTRA_NONE (1U << 22) /* mapped to enabled */
 
 /*
  * For timer based sampling some targets (e.g. s390) use a virtual
-- 
1.8.1.4 |
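The idea of carrying a marker bit in the same word as the real extra fields can be illustrated with a small C sketch. The constants are copied from the op_events.h hunk in the patch (bit 22 is also the EN bit of the x86 IA32_PERFEVTSELx register, which fits the "mapped to enabled" comment). sketch_parse_extra(), has_named_mask() and extra_cmask() are hypothetical helpers: the parser only decodes cmask=N, skips every other keyword, and is not oprofile's real parse_extra().

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Bit layout copied from the libop/op_events.h hunk of the patch. */
#define EXTRA_CMASK_SHIFT 24
#define EXTRA_CMASK_MASK  0xff
#define EXTRA_PEBS        (1U << 19) /* fake, mapped to pin control */
#define EXTRA_NONE        (1U << 22) /* mapped to enabled */

/*
 * Hypothetical, simplified stand-in for parse_extra(). s points just past
 * "extra:"; only "cmask=N" is decoded, other keywords are skipped.
 */
unsigned sketch_parse_extra(const char *s)
{
	/* Like the patch: any extra: field, even an empty one, sets EXTRA_NONE. */
	unsigned v = EXTRA_NONE;

	while (*s && !isspace((unsigned char)*s)) {
		if (strncmp(s, "cmask=", 6) == 0) {
			char *end;
			unsigned long n = strtoul(s + 6, &end, 0);
			v |= (unsigned)(n & EXTRA_CMASK_MASK) << EXTRA_CMASK_SHIFT;
			s = end;
		} else {
			s++; /* skip characters of keywords this sketch ignores */
		}
		if (*s == ',')
			s++;
	}
	return v;
}

/* Nonzero if the unit mask line carried an extra: field, i.e. a named mask. */
int has_named_mask(unsigned extra)
{
	return (extra & EXTRA_NONE) != 0;
}

/* Counter mask stored in bits 24..31. */
unsigned extra_cmask(unsigned extra)
{
	return (extra >> EXTRA_CMASK_SHIFT) & EXTRA_CMASK_MASK;
}
```

Because EXTRA_NONE sits below the cmask field, an empty "extra:" and "extra:cmask=1,edge" both report a named mask while still decoding to different cmask values.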
From: Andi K. <an...@fi...> - 2013-06-18 23:40:50
|
From: Suravee Suthikulpanit <sur...@am...>

In the new unit mask parsing scheme, "extra:" should always be followed
by a named mask. This patch fixes the entries that do not follow the
scheme.

Signed-off-by: Suravee Suthikulpanit <sur...@am...>
---
 events/i386/core_2/unit_masks | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/events/i386/core_2/unit_masks b/events/i386/core_2/unit_masks
index f1d64eb..6bc0960 100644
--- a/events/i386/core_2/unit_masks
+++ b/events/i386/core_2/unit_masks
@@ -33,7 +33,7 @@ name:sse_prefetch type:exclusive default:0x0
 	0x00 prefetch NTA instructions executed.
 	0x01 prefetch T1 instructions executed.
 	0x02 prefetch T1 and T2 instructions executed.
-	0x03 extra: SSE weakly-ordered stores
+	0x03 SSE weakly-ordered stores
 name:simd_instr_type_exec type:bitmask default:0x3f
 	0x01 SIMD packed multiplies
 	0x02 SIMD packed shifts
@@ -41,7 +41,7 @@ name:simd_instr_type_exec type:bitmask default:0x3f
 	0x08 SIMD unpack operations
 	0x10 SIMD packed logical
 	0x20 SIMD packed arithmetic
-	0x3f extra: all of the above
+	0x3f all of the above
 name:mmx_trans type:bitmask default:0x3
 	0x01 float->MMX transitions
 	0x02 MMX->float transitions
@@ -138,8 +138,8 @@ name:inst_retired type:bitmask default:0x00
 	0x02 extra: Stores
 	0x04 extra: Other
 name:x87_ops_retired type:exclusive default:0xfe
-	0x01 extra: FXCH instructions retired
-	0xfe extra: Retired floating-point computational operations (precise)
+	0x01 FXCH instructions retired
+	0xfe Retired floating-point computational operations (precise)
 name:uops_retired type:bitmask default:0x0f
 	0x01 Fused load+op or load+indirect branch retired
 	0x02 Fused store address + data retired
-- 
1.8.1.4 |
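The scheme described here (a hex value, an optional extra: field, then the mask name, then free text) can be sketched in C. struct um_line and parse_um_line() are hypothetical; they assume whitespace-separated fields as seen in the unit_masks excerpts in this thread and are not oprofile's parser. Note that the sketch would happily take "SSE" as the name of the old "0x03 extra: SSE weakly-ordered stores" line, which is why such entries have to drop the extra: prefix.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of one unit mask value line. */
struct um_line {
	unsigned value;   /* e.g. 0x3 */
	int named;        /* 1 if an extra: field was present */
	char name[64];    /* first word after extra:[params] when named */
	const char *desc; /* remaining description text */
};

static const char *skip_ws(const char *s)
{
	while (isspace((unsigned char)*s))
		s++;
	return s;
}

/* Returns 0 on success, -1 if the line does not follow the scheme. */
int parse_um_line(const char *line, struct um_line *um)
{
	char *end;
	const char *s = skip_ws(line);

	um->value = (unsigned)strtoul(s, &end, 0);
	if (end == s)
		return -1; /* no numeric value at the start */
	s = skip_ws(end);
	um->named = 0;
	um->name[0] = '\0';

	if (strncmp(s, "extra:", 6) == 0) {
		um->named = 1;
		s += 6;
		/* skip extra parameters such as cmask=1,edge */
		while (*s && !isspace((unsigned char)*s))
			s++;
		s = skip_ws(s);
		size_t n = strcspn(s, " \t\n");
		if (n == 0 || n >= sizeof(um->name))
			return -1; /* extra: must be followed by a name */
		memcpy(um->name, s, n);
		um->name[n] = '\0';
		s = skip_ws(s + n);
	}
	um->desc = s;
	return 0;
}
```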
From: Maynard J. <may...@us...> - 2013-06-19 22:22:41
|
On 06/18/2013 06:07 PM, Andi Kleen wrote:
> Including Suravee's patch for named default masks, and
> another patch to make all Intel default unit masks unique.
>
Andi and Suravee,
Thank you very much for your collaboration so far in getting the unit mask situation cleaned up. The issues that I'm aware of that need to be fixed are:

1) User guide and man page updates to reflect the fact that named unit masks have a "name" field, and "extra" is no longer displayed. (I'll do this.)
2) A few Ivybridge unit masks have duplicate names (sending a separate message about this).
3) Related to #2, we should try to do some automatic sanity tests when parsing the unit mask file to ensure we don't have any duplicate names. The sanity tests should find such duplicates both during normal run time and during 'make distcheck'.
4) Specifying a named unit mask associated with a dummy extra field appears to *always* result in "No sample file found". However, given an appropriate workload, substituting the hex value works OK. For example:
On Sandybridge, profiling a testcase that does lots of arithmetic operations on large arrays of numbers as follows:
operf -e ld_blocks:100000:0x10 ./load_v2
results in about 30,000 samples. But if I use the "all_block" name for the unit mask instead of '0x10':
operf -e ld_blocks:100000:all_block ./load_v2
the result is "No sample file found".
5) Some named unit masks have hex values that are duplicates of others in the same UM entry, thus requiring the use of the name to disambiguate them. Specifying one of the hex values that are duplicated should result in an error -- but there are cases where it doesn't. For example:
Here's a properly working example on Sandybridge:
[mpjohn@oc1757000783 test-stuff]$ ophelp --check-events int_misc:2000000:0x3
Unit mask (0x3) is non unique.
Please specify the unit mask using the first word of the description

and here's a failing example:
[mpjohn@oc1757000783 test-stuff]$ ophelp --check-events l1d_pend_miss:2000000:0x1
2

The unit mask descriptions for these two cases are:
=============
BAD:
name:l1d_pend_miss type:bitmask default:pending
0x1 extra: pending Cycles with L1D load Misses outstanding.
0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding occurences.

GOOD:
name:int_misc type:bitmask default:0x40
0x40 extra: rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is sent to Instruction Decode Queue (IDQ) for this thread.
0x3 extra:cmask=1 recovery_cycles Number of cycles waiting to be recover after Nuke due to all other cases except JEClear.
0x3 extra:cmask=1,edge recovery_stalls_count Edge applied to recovery_cycles, thus counts occurrences.
=============

The obvious difference between these two is that in the BAD case, one of the "extra" fields is a dummy, but in the GOOD case, both "extra" fields have real extra values associated with them. I don't know if this difference is the cause of the problem, but it's certainly a good place to start looking.

Thanks.

-Maynard

> -Andi
>
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
>
> Build for Windows Store.
>
> http://p.sf.net/sfu/windows-dev2dev
> _______________________________________________
> oprofile-list mailing list
> opr...@li...
> https://lists.sourceforge.net/lists/listinfo/oprofile-list
> |
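Issue 3 asks for an automatic duplicate-name check. A minimal C sketch, assuming the names of one name: entry have already been collected into an array; first_duplicate_name() is a hypothetical helper, not part of oprofile, and a real check would also want to report the event name and file line.

```c
#include <string.h>

/*
 * Hypothetical sanity check: given the names of all named unit masks in one
 * "name:" entry, return the index of the first name that repeats an earlier
 * one, or -1 if every name is unique. Quadratic, which is fine for the
 * handful of masks a single event has.
 */
long first_duplicate_name(const char *const names[], size_t n)
{
	for (size_t i = 1; i < n; i++)
		for (size_t j = 0; j < i; j++)
			if (strcmp(names[i], names[j]) == 0)
				return (long)i;
	return -1;
}
```

Calling something like this once per name: entry from the unit mask parser would let the same check run both in normal use and under 'make distcheck'.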
From: Maynard J. <may...@us...> - 2013-06-24 15:15:05
|
On 06/19/2013 05:22 PM, Maynard Johnson wrote:
> On 06/18/2013 06:07 PM, Andi Kleen wrote:
>> Including Suravee's patch for named default masks, and
>> another patch to make all Intel default unit masks unique.

I've committed the 5-patch set that included patches from Suravee and Andi. Along with this, I also committed the patch I posted on June 21 to fix problem #4 described below. And finally, I also committed the doc patch that I posted this morning that corresponds with these changes.

*Suravee and Andi*
Please do 'git pull' and test/validate these changes. Thanks again for working together to get this done! :-)

The following open issues are relatively minor, but it would be ideal to have these things taken care of, too.

Known open issues
-----------------
1) A few Ivybridge unit masks have duplicate names (I posted a message to *Andi* about this on June 19 -- subject: "Re: [PATCH 2/5] Add empty extra: line to unique Intel events v2").
2) Related to #1, we should try to do some automatic sanity tests when parsing the unit mask file to ensure we don't have any duplicate names. The sanity tests should find such duplicates both during normal run time and during 'make distcheck'.
3) Some named unit masks have hex values that are duplicates of others in the same UM entry, thus requiring the use of the name to disambiguate them. Specifying one of the hex values that are duplicated should result in an error -- but there are cases where it doesn't. For example:
Here's a properly working example on Sandybridge:
[mpjohn@oc1757000783 test-stuff]$ ophelp --check-events int_misc:2000000:0x3
Unit mask (0x3) is non unique.
Please specify the unit mask using the first word of the description

and here's a failing example:
[mpjohn@oc1757000783 test-stuff]$ ophelp --check-events l1d_pend_miss:2000000:0x1
2

The unit mask descriptions for these two cases are:
=============
BAD:
name:l1d_pend_miss type:bitmask default:pending
0x1 extra: pending Cycles with L1D load Misses outstanding.
0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding occurences.

GOOD:
name:int_misc type:bitmask default:0x40
0x40 extra: rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is sent to Instruction Decode Queue (IDQ) for this thread.
0x3 extra:cmask=1 recovery_cycles Number of cycles waiting to be recover after Nuke due to all other cases except JEClear.
0x3 extra:cmask=1,edge recovery_stalls_count Edge applied to recovery_cycles, thus counts occurrences.
=============

The obvious difference between these two is that in the BAD case, one of the "extra" fields is a dummy, but in the GOOD case, both "extra" fields have real extra values associated with them. I don't know if this difference is the cause of the problem, but it's certainly a good place to start looking.
------------------------------------

-Maynard

>>
> Andi and Suravee,
> Thank you very much for your collaboration so far in getting the unit mask situation cleaned up. The issues that I'm aware of that need to be fixed are:
>
> 1) User guide and man page updates to reflect the fact that named unit masks have a "name" field, and "extra" is no longer displayed. (I'll do this.)
> 2) A few Ivybridge unit masks have duplicate names (sending separate message about this).
> 3) Related to #2, we should try to do some automatic sanity tests when parsing the unit mask file to ensure we don't have any duplicate names. The sanity tests should find such duplicates either during normal run time, but also during 'make distcheck'.
> 4) Specifying a named unit mask associated with a dummy extra field appears to *always* result in "No sample file found". However, given an appropriate workload, substituting the hex value works OK. For example:
> On Sandybridge, profiling a testcase that does lots of arithmetic operations on large arrays of numbers
> as follows:
> operf -e ld_blocks:100000:0x10 ./load_v2
> results in about 30,000 samples. But if I use the "all_block" name for the unit mask instead of '0x10':
> operf -e ld_blocks:100000:all_block ./load_v2
> the result is "No sample file found".
> 5) Some named unit masks have hex values that are duplicates of others in the same UM entry, thus requiring the use of the name to disambiguate them. Specifying one of the hex values that are duplicated should result in an error -- but there are cases where it doesn't. For example:
> Here's a properly working example on Sandybridge:
> [mpjohn@oc1757000783 test-stuff]$ ophelp --check-events int_misc:2000000:0x3
> Unit mask (0x3) is non unique.
> Please specify the unit mask using the first word of the description
>
> and here's a failing example:
> [mpjohn@oc1757000783 test-stuff]$ ophelp --check-events l1d_pend_miss:2000000:0x1
> 2
>
> The unit mask descriptions for these two cases are:
> =============
> BAD:
> name:l1d_pend_miss type:bitmask default:pending
> 0x1 extra: pending Cycles with L1D load Misses outstanding.
> 0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding occurences.
>
> GOOD:
> name:int_misc type:bitmask default:0x40
> 0x40 extra: rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is sent to Instruction Decode Queue (IDQ) for this thread.
> 0x3 extra:cmask=1 recovery_cycles Number of cycles waiting to be recover after Nuke due to all other cases except JEClear.
> 0x3 extra:cmask=1,edge recovery_stalls_count Edge applied to recovery_cycles, thus counts occurrences.
> =============
>
> The obvious difference between these two is that in the BAD case, one of the "extra" fields is a dummy,
> but in the GOOD case, both "extra" fields have real extra values associated with them. I don't know if
> this difference is the cause of the problem, but it's certainly a good place to start looking.
>
> Thanks.
>
> -Maynard
>
>> -Andi
> |
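Open issue 3 asks that a hex value shared by several lines of one entry be rejected, so that the user has to pick the line by name. A minimal C sketch of that lookup rule follows; struct um_value, resolve_unit_mask() and the UM_* codes are hypothetical names, not oprofile's code. The sketch deliberately ignores the extra: parameters when counting matches, so both 0x1 lines of l1d_pend_miss are treated as ambiguous, just like the two 0x3 lines of int_misc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one line of a unit mask entry. */
struct um_value {
	unsigned value;
	const char *name; /* NULL if the line has no extra: field */
};

enum { UM_NOT_FOUND = -1, UM_NON_UNIQUE = -2 };

/*
 * Resolve a user string such as "0x3" or "recovery_cycles" to an index into
 * vals[]. A numeric value that matches more than one line is rejected.
 */
int resolve_unit_mask(const struct um_value *vals, size_t n, const char *spec)
{
	char *end;
	unsigned long v = strtoul(spec, &end, 0);
	int found = UM_NOT_FOUND;

	if (end != spec && *end == '\0') {
		/* Numeric spec: it must match exactly one line. */
		for (size_t i = 0; i < n; i++) {
			if (vals[i].value != v)
				continue;
			if (found != UM_NOT_FOUND)
				return UM_NON_UNIQUE;
			found = (int)i;
		}
		return found;
	}

	/* Otherwise treat the spec as a unit mask name. */
	for (size_t i = 0; i < n; i++)
		if (vals[i].name && strcmp(vals[i].name, spec) == 0)
			return (int)i;
	return UM_NOT_FOUND;
}
```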
From: Suthikulpanit, S. <Sur...@am...> - 2013-06-25 08:22:37
|
Maynard,

Thank you for the documentation updates and the commits. I have tested on AMD system and things look good. However, "make distcheck" fails and complains about events/i386/ivybridge/unit_masks at line 140 where there is an extra "+" at the beginning of the line. Once I take this out, it passes the check.

Suravee

-----Original Message-----
From: Maynard Johnson [mailto:may...@us...]
Sent: Monday, June 24, 2013 10:14 AM
To: Andi Kleen; Suthikulpanit, Suravee
Cc: oprofile-list
Subject: Re: New version of default unit masks patchkit
|
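A per-line check that would catch a stray "+" like the one 'make distcheck' tripped over can be sketched in C. um_line_ok() is hypothetical and assumes the unit_masks layout visible in this thread (blank lines, "#" comments, "include:" and "name:" headers, and indented lines that begin with a 0x value); it is not the check oprofile actually runs.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical line check for an events/.../unit_masks file.
 * Returns 1 if the line looks well formed, 0 otherwise.
 */
int um_line_ok(const char *line)
{
	const char *s = line;

	/* Empty lines and comments are always fine. */
	if (*s == '\0' || *s == '\n' || *s == '#')
		return 1;
	/* Entry headers start in column 0. */
	if (strncmp(s, "name:", 5) == 0 || strncmp(s, "include:", 8) == 0)
		return 1;
	/* Anything else in column 0, e.g. a leftover diff "+", is an error. */
	if (!isspace((unsigned char)*s))
		return 0;
	while (isspace((unsigned char)*s))
		s++;
	/* Indented lines must be blank or start with a hex value. */
	return *s == '\0' || strncmp(s, "0x", 2) == 0;
}
```

Running it over each line while counting line numbers would let a build-time test report the offending file and line, such as line 140 of the Ivybridge file.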
From: Maynard J. <may...@us...> - 2013-06-25 15:52:20
|
On 06/25/2013 03:22 AM, Suthikulpanit, Suravee wrote:
> Maynard,
>
> Thank you for the documentation updates and the commits. I have tested on an AMD system and things look good. However, "make distcheck" fails and complains about events/i386/ivybridge/unit_masks at line 140, where there is an extra "+" at the beginning of the line. Once I take this out, it passes the check.

Suravee,
Thanks for thinking of running 'make distcheck' and catching that problem. I fixed it and just committed the fix.

-Maynard

>
> Suravee
>
> [snip]
|
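Suravee's 'make distcheck' failure came from a stray "+" at the start of a line in events/i386/ivybridge/unit_masks, which looks like leftover patch text. A cheap pre-check for that kind of residue could look like the sketch below. The marker set ("+", "-", "@@") and the function name are assumptions, and nothing like this is claimed to exist in oprofile's build.

```python
from typing import Iterable, List, Tuple

# Prefixes assumed never to start a legitimate unit_masks line, but typical of
# diff/patch text pasted in by mistake (e.g. the stray "+" Suravee hit).
DIFF_MARKERS = ("+", "-", "@@")


def find_diff_residue(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Hypothetical lint: return (1-based line number, line) for suspicious lines."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith(DIFF_MARKERS):
            hits.append((lineno, line.rstrip("\n")))
    return hits
```

Run over each events/i386/*/unit_masks file (for example `find_diff_residue(open(path))`), it should flag a line like the one Suravee found; how it would be wired into 'make distcheck' is left open.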
From: Suthikulpanit, S. <Sur...@am...> - 2013-06-25 08:28:46
|
I can also help look at issue "3" below.

Suravee

-----Original Message-----
From: Suthikulpanit, Suravee [mailto:Sur...@am...]
Sent: Tuesday, June 25, 2013 3:22 AM
To: 'Maynard Johnson'; 'Andi Kleen'
Cc: 'oprofile-list'
Subject: RE: New version of default unit masks patchkit

Maynard,

Thank you for the documentation updates and the commits. I have tested on an AMD system and things look good. However, "make distcheck" fails and complains about events/i386/ivybridge/unit_masks at line 140, where there is an extra "+" at the beginning of the line. Once I take this out, it passes the check.

Suravee

-----Original Message-----
From: Maynard Johnson [mailto:may...@us...]
Sent: Monday, June 24, 2013 10:14 AM
To: Andi Kleen; Suthikulpanit, Suravee
Cc: oprofile-list
Subject: Re: New version of default unit masks patchkit

On 06/19/2013 05:22 PM, Maynard Johnson wrote:
> On 06/18/2013 06:07 PM, Andi Kleen wrote:
>> Including Suravee's patch for named default masks, and another patch
>> to make all Intel default unit masks unique.

I've committed the 5-patch set that included patches from Suravee and Andi. Along with this, I also committed the patch I posted on June 21 to fix problem #4 described below. And finally, I also committed the doc patch that I posted this morning that corresponds with these changes.

*Suravee and Andi*
Please do 'git pull' and test/validate these changes. Thanks again for working together to get this done! :-)

The following open issues are relatively minor, but it would be ideal to have these things taken care of, too.

Known open issues
-----------------
1) A few Ivybridge unit masks have duplicate names (I posted a message to *Andi* about this on June 19 -- subject: "Re: [PATCH 2/5] Add empty extra: line to unique Intel events v2").
2) Related to #1, we should try to do some automatic sanity tests when parsing the unit mask file to ensure we don't have any duplicate names. The sanity tests should find such duplicates both during normal run time and during 'make distcheck'.
3) Some named unit masks have hex values that are duplicates of others in the same UM entry, thus requiring the use of the name to disambiguate them. Specifying one of the hex values that are duplicated should result in an error -- but there are cases where it doesn't. For example:
Here's a properly working example on Sandybridge:
[mpjohn@oc1757000783 test-stuff]$ ophelp --check-events int_misc:2000000:0x3
Unit mask (0x3) is non unique.
Please specify the unit mask using the first word of the description

and here's a failing example:
[mpjohn@oc1757000783 test-stuff]$ ophelp --check-events l1d_pend_miss:2000000:0x1
2

The unit mask descriptions for these two cases are:
=============
BAD:
name:l1d_pend_miss type:bitmask default:pending
0x1 extra: pending Cycles with L1D load Misses outstanding.
0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding occurences.

GOOD:
name:int_misc type:bitmask default:0x40
0x40 extra: rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is sent to Instruction Decode Queue (IDQ) for this thread.
0x3 extra:cmask=1 recovery_cycles Number of cycles waiting to be recover after Nuke due to all other cases except JEClear.
0x3 extra:cmask=1,edge recovery_stalls_count Edge applied to recovery_cycles, thus counts occurrences.
=============

The obvious difference between these two is that in the BAD case, one of the "extra" fields is a dummy, but in the GOOD case, both "extra" fields have real extra values associated with them. I don't know if this difference is the cause of the problem, but it's certainly a good place to start looking.

------------------------------------

-Maynard

>>
> Andi and Suravee,
> Thank you very much for your collaboration so far in getting the unit mask situation cleaned up. The issues that I'm aware of that need to be fixed are:
>
> 1) User guide and man page updates to reflect the fact that named unit
> masks have a "name" field, and "extra" is no longer displayed. (I'll
> do this.)
> 2) A few Ivybridge unit masks have duplicate names (sending separate message about this).
> 3) Related to #2, we should try to do some automatic sanity tests when parsing the unit mask file to ensure we don't have any duplicate names. The sanity tests should find such duplicates both during normal run time and during 'make distcheck'.
> 4) Specifying a named unit mask associated with a dummy extra field appears to *always* result in "No sample file found". However, given an appropriate workload, substituting the hex value works OK. For example:
> On Sandybridge, profiling a testcase that does lots of arithmetic operations on large arrays of numbers
> as follows:
> operf -e ld_blocks:100000:0x10 ./load_v2
> results in about 30,000 samples. But if I use the "all_block" name for the unit mask instead of '0x10':
> operf -e ld_blocks:100000:all_block ./load_v2
> the result is "No sample file found".
> 5) Some named unit masks have hex values that are duplicates of others in the same UM entry, thus requiring the use of the name to disambiguate them. Specifying one of the hex values that are duplicated should result in an error -- but there are cases where it doesn't. For example:
> Here's a properly working example on Sandybridge:
> [mpjohn@oc1757000783 test-stuff]$ ophelp --check-events int_misc:2000000:0x3
> Unit mask (0x3) is non unique.
> Please specify the unit mask using the first word of the
> description
>
> and here's a failing example:
> [mpjohn@oc1757000783 test-stuff]$ ophelp --check-events l1d_pend_miss:2000000:0x1
> 2
>
> The unit mask descriptions for these two cases are:
> =============
> BAD:
> name:l1d_pend_miss type:bitmask default:pending
> 0x1 extra: pending Cycles with L1D load Misses outstanding.
> 0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding occurences.
>
> GOOD:
> name:int_misc type:bitmask default:0x40
> 0x40 extra: rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is sent to Instruction Decode Queue (IDQ) for this thread.
> 0x3 extra:cmask=1 recovery_cycles Number of cycles waiting to be recover after Nuke due to all other cases except JEClear.
> 0x3 extra:cmask=1,edge recovery_stalls_count Edge applied to recovery_cycles, thus counts occurrences.
> =============
>
> The obvious difference between these two is that in the BAD case, one of the "extra" fields is a dummy,
> but in the GOOD case, both "extra" fields have real extra values associated with them. I don't know if
> this difference is the cause of the problem, but it's certainly a good place to start looking.
>
> Thanks.
>
> -Maynard
>
>> -Andi
|
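Issue "3" boils down to a lookup rule: a name always identifies one value line, while a hex value is acceptable only if exactly one value line in the entry uses it. A minimal Python sketch of that rule, using the l1d_pend_miss and int_misc lines quoted above; `resolve_unit_mask` and the tuple layout are hypothetical (not ophelp's code), and OR-ing bitmask values together is deliberately not modeled.

```python
from typing import List, Tuple

# (value, extra, name) as they appear on the value lines of one unit mask entry
Entry = Tuple[int, str, str]


def resolve_unit_mask(entries: List[Entry], spec: str) -> Entry:
    """Hypothetical sketch: pick the value line meant by spec (a name or a hex value)."""
    by_name = [e for e in entries if e[2] == spec]
    if by_name:
        return by_name[0]  # names are assumed unique within an entry
    try:
        value = int(spec, 16)
    except ValueError:
        raise ValueError(f"unknown unit mask name {spec!r}") from None
    matches = [e for e in entries if e[0] == value]
    if len(matches) > 1:
        # same hex value, different extra: the hex value alone is ambiguous
        raise ValueError(f"Unit mask ({spec}) is non unique. "
                         "Please specify the unit mask by name")
    if not matches:
        raise ValueError(f"unit mask {spec} is not listed for this event")
    return matches[0]


L1D_PEND_MISS = [(0x1, "", "pending"), (0x1, "cmask=1,edge", "occurences")]
INT_MISC = [(0x40, "", "rat_stall_cycles"),
            (0x3, "cmask=1", "recovery_cycles"),
            (0x3, "cmask=1,edge", "recovery_stalls_count")]
```

With this rule both `int_misc:...:0x3` and `l1d_pend_miss:...:0x1` are rejected the same way, whether or not one of the colliding lines carries a dummy `extra:`.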
From: Maynard J. <may...@us...> - 2013-06-25 15:51:59
|
On 06/25/2013 03:28 AM, Suthikulpanit, Suravee wrote:
> I can also help look at issue "3" below.

Thanks! I really appreciate the help. :-)

-Maynard

>
> Suravee
>
> [snip]
|
From: Andi K. <an...@fi...> - 2013-06-25 22:19:10
|
> Known open issues
> -----------------
> 1) A few Ivybridge unit masks have duplicate names (I posted a message to *Andi* about this on June 19 -- subject: "Re: [PATCH 2/5] Add empty extra: line to unique Intel events v2").

Which ones? I wrote a script to look for them, but it didn't find any for IvyBridge (only in unit mask files without extra:).

#!/usr/bin/perl -n
# find duplicated unit masks

$line++;
if (/name:(.*?) /) {
    # a new unit mask entry starts: reset the per-entry name counts
    $name = $1;
    %umasks = ();
}
if (/^\s+0x[0-9a-f]+/) {
    @n = split;

    # skip values without an extra: field
    # (drop this line to also check files that have no extra: at all)
    next if ($n[1] !~ /extra/) ;

    # with extra:, the name follows it; otherwise it is the first
    # word of the description
    $um = ($n[1] =~ /extra/) ? $n[2] : $n[1];
    if (++$umasks{$um} > 1) {
        print "$line: $name.$um duplicated\n";
    }
}

-Andi
|
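For a check that could run from a test hook, or just to make the logic explicit, here is a rough Python equivalent of Andi's script. The `extra_only` flag is an addition mirroring his `next if` line: when set, only values that carry an `extra:` field are checked. The function and parameter names are hypothetical, and the test data below is a made-up fragment shaped like the quoted Nehalem and Sandybridge entries, not a real file.

```python
import re
from typing import Iterable, List, Optional, Set, Tuple

NAME_RE = re.compile(r"name:(\S+)")          # anchored via re.match below
VALUE_RE = re.compile(r"^\s+0x[0-9a-f]+")    # same test as the Perl script


def find_duplicate_names(lines: Iterable[str],
                         extra_only: bool = True) -> List[Tuple[int, str]]:
    """Hypothetical port: report (line number, 'event.name') for repeated names."""
    dups: List[Tuple[int, str]] = []
    event: Optional[str] = None
    seen: Set[str] = set()
    for lineno, line in enumerate(lines, start=1):
        m = NAME_RE.match(line)
        if m:
            event, seen = m.group(1), set()  # new unit mask entry
            continue
        if not VALUE_RE.match(line):
            continue
        tokens = line.split()
        if len(tokens) > 1 and tokens[1].startswith("extra:"):
            um = tokens[2] if len(tokens) > 2 else ""  # name follows extra:
        elif extra_only or len(tokens) < 2:
            continue                                   # mirrors the 'next if' line
        else:
            um = tokens[1]                             # first word of the description
        if um in seen:
            dups.append((lineno, f"{event}.{um}"))
        seen.add(um)
    return dups
```

With `extra_only=True` it reproduces Andi's result of finding nothing in entries that use `extra:`; with `extra_only=False` it also flags duplicates in files without `extra:`.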
From: Maynard J. <may...@us...> - 2013-06-25 22:54:53
|
On 06/25/2013 05:19 PM, Andi Kleen wrote:
>> Known open issues
>> -----------------
>> 1) A few Ivybridge unit masks have duplicate names (I posted a message to *Andi* about this on June 19 -- subject: "Re: [PATCH 2/5] Add empty extra: line to unique Intel events v2")
>

Andi, sorry to lead you on a wild goose chase. I don't know why I said "Ivybridge". I looked back at my June 19 message and it was Atom and Nehalem unit masks where I found the duplicates. Running 'ophelp' for those cpu-types, I found the following events having duplicate UM names:

Atom:
SIMD_UOP_TYPE_EXEC: (counter: all)
    SIMD packed microops executed (min count: 6000)
    Unit masks (default 0x1)
    ----------
    0x01: (name=s) SIMD packed multiply microops executed
    0x81: (name=ar) SIMD packed multiply microops retired
    0x02: (name=s) SIMD packed shift micro-ops executed
    0x82: (name=ar) SIMD packed shift micro-ops retired
    0x04: (name=s) SIMD pack micro-ops executed
    0x84: (name=ar) SIMD pack micro-ops retired
    0x08: (name=s) SIMD unpack micro-ops executed
    0x88: (name=ar) SIMD unpack micro-ops retired
    0x10: (name=s) SIMD packed logical microops executed
    0x90: (name=ar) SIMD packed logical microops retired
    0x20: (name=s) SIMD packed arithmetic micro-ops executed
    0xa0: ar SIMD packed arithmetic micro-ops retired

Nehalem:
BR_INST_RETIRED: (counter: 0, 1, 2, 3)
    number of branch instructions retired (min count: 500)
    Unit masks (default 0x0)
    ----------
    0x00: (name=all_branches) See Table A-1
    0x01: (name=conditional) Counts the number of conditional branch instructions retired
    0x02: (name=near_call) Counts the number of direct & indirect near unconditional calls retired
    0x04: (name=all_branches) Counts the number of branch instructions retired

-Maynard

> Which ones? I wrote a script to look for them, but it didn't find any
> for IvyBridge (only in unit mask files without extra:)
>
> [snip]
>
> -Andi
>
|
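These duplicates were found by reading 'ophelp' output. The same scan can be done mechanically on lines of the `0x01: (name=s) ...` form shown above. This sketch assumes that layout (an `EVENT: (counter: ...)` header line, then one line per value) and only recognizes the `(name=...)` form, so a line such as `0xa0: ar ...` is skipped; `duplicate_um_names` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed ophelp layout, taken from the listing above:
#   EVENT_NAME: (counter: ...)          starts an event
#       0x01: (name=s) description      one unit mask value
EVENT_LINE = re.compile(r"^([A-Z0-9_.]+):\s+\(counter:")
UM_LINE = re.compile(r"^\s*(0x[0-9a-fA-F]+):\s+\(name=([^)]+)\)")


def duplicate_um_names(lines: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Hypothetical helper: {event: {name: [hex values]}} for names used twice or more."""
    per_event: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    event = "<no event>"
    for line in lines:
        ev = EVENT_LINE.match(line)
        if ev:
            event = ev.group(1)
            continue
        um = UM_LINE.match(line)
        if um:  # lines without "(name=...)" are skipped
            per_event[event][um.group(2)].append(um.group(1))
    result = {}
    for ev_name, names in per_event.items():
        dups = {name: vals for name, vals in names.items() if len(vals) > 1}
        if dups:
            result[ev_name] = dups
    return result
```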
From: Andi K. <an...@fi...> - 2013-06-25 22:59:49
|
On Tue, Jun 25, 2013 at 05:54:40PM -0500, Maynard Johnson wrote:
> On 06/25/2013 05:19 PM, Andi Kleen wrote:
> >> Known open issues
> >> -----------------
> >> 1) A few Ivybridge unit masks have duplicate names (I posted a message to *Andi* about this on June 19 -- subject: "Re: [PATCH 2/5] Add empty extra: line to unique Intel events v2")
> >
> Andi, sorry to lead you on a wild goose chase. I don't know why I said "Ivybridge". I looked back at my June 19 message and it was Atom and Nehalem unit masks where I found the duplicates. Running 'ophelp' for those cpu-types, I found the following events having duplicate UM names:

Yes, but those don't have extra:, do they? The original plan was to require uniqueness only for unit masks with extra:. I fixed up Atom, but gave up on the others. Will send a patch.

-Andi
|
From: Maynard J. <may...@us...> - 2013-07-17 21:24:39
|
Hi, Suravee and Andi,

If I'm not mistaken, we have the following open issues relating to the unit masks. Issue #1 is more of a "nice-to-have", since it would have caught the duplicate Atom and Nehalem UM names listed below. But it's not a functional requirement, so I'd be OK with deferring that issue until after releasing 0.9.9. However, I really think the other two issues should be fixed. I need to tie up my own loose ends -- i.e., the new 'ocount' tool -- so I'm hoping to get some volunteers to help with these.

Thanks!

-Maynard

-----------------------------------------------------------------------------

1) We should try to do some automatic sanity tests when parsing the unit mask file to ensure we don't have any duplicate names. The sanity tests should find such duplicates both during normal run time and during 'make distcheck'.

2) Some named unit masks have hex values that are duplicates of others in the same UM entry, thus requiring the use of the name to disambiguate them. Specifying one of the hex values that are duplicated should result in an error -- but there are cases where it doesn't. For example:
Here's a properly working example on Sandybridge:
[mpjohn@oc1757000783 test-stuff]$ ophelp --check-events int_misc:2000000:0x3
Unit mask (0x3) is non unique.
Please specify the unit mask using the first word of the description

and here's a failing example:
[mpjohn@oc1757000783 test-stuff]$ ophelp --check-events l1d_pend_miss:2000000:0x1
2

Running 'operf -e l1d_pend_miss:2000000:0x1' works, but what unit mask value is really being used?

The unit mask descriptions for these two cases are:
=============
BAD:
name:l1d_pend_miss type:bitmask default:pending
0x1 extra: pending Cycles with L1D load Misses outstanding.
0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding occurences.

GOOD:
name:int_misc type:bitmask default:0x40
0x40 extra: rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is sent to Instruction Decode Queue (IDQ) for this thread.
0x3 extra:cmask=1 recovery_cycles Number of cycles waiting to be recover after Nuke due to all other cases except JEClear.
0x3 extra:cmask=1,edge recovery_stalls_count Edge applied to recovery_cycles, thus counts occurrences.
=============

The obvious difference between these two is that in the BAD case, one of the "extra" fields is a dummy, but in the GOOD case, both "extra" fields have real extra values associated with them. I don't know if this difference is the cause of the problem, but it's certainly a good place to start looking.

3) Atom and Nehalem unit masks have some duplicate names. Running 'ophelp' for those cpu-types, I found the following events having duplicate UM names:

Atom:
SIMD_UOP_TYPE_EXEC: (counter: all)
    SIMD packed microops executed (min count: 6000)
    Unit masks (default 0x1)
    ----------
    0x01: (name=s) SIMD packed multiply microops executed
    0x81: (name=ar) SIMD packed multiply microops retired
    0x02: (name=s) SIMD packed shift micro-ops executed
    0x82: (name=ar) SIMD packed shift micro-ops retired
    0x04: (name=s) SIMD pack micro-ops executed
    0x84: (name=ar) SIMD pack micro-ops retired
    0x08: (name=s) SIMD unpack micro-ops executed
    0x88: (name=ar) SIMD unpack micro-ops retired
    0x10: (name=s) SIMD packed logical microops executed
    0x90: (name=ar) SIMD packed logical microops retired
    0x20: (name=s) SIMD packed arithmetic micro-ops executed
    0xa0: ar SIMD packed arithmetic micro-ops retired

Nehalem:
BR_INST_RETIRED: (counter: 0, 1, 2, 3)
    number of branch instructions retired (min count: 500)
    Unit masks (default 0x0)
    ----------
    0x00: (name=all_branches) See Table A-1
    0x01: (name=conditional) Counts the number of conditional branch instructions retired
    0x02: (name=near_call) Counts the number of direct & indirect near unconditional calls retired
    0x04: (name=all_branches) Counts the number of branch instructions retired
=============================

On 06/19/2013 05:22 PM, Maynard Johnson wrote:
> On 06/18/2013 06:07 PM, Andi Kleen wrote:
>> Including Suravee's patch for named default masks, and
>> another patch to make all Intel default unit masks unique.
>>
> [snip]
|
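On the question of which value is "really being used": on Intel CPUs the unit mask is only one field of the IA32_PERFEVTSELx event select register, and cmask and edge live in separate fields (event select bits 0-7, unit mask 8-15, edge 18, inv 23, cmask 24-31). So the l1d_pend_miss variants share umask 0x1 but program different register values. In the sketch below the event code 0x48 is an assumption for illustration, the USR/OS/enable bits that the kernel or tool sets are left out, and `encode`/`parse_extra` are hypothetical helpers. This thread does not show how oprofile itself applies the extra fields.

```python
from typing import Dict, Union

# Field positions in IA32_PERFEVTSELx (Intel architectural performance monitoring).
EVENT_SHIFT, UMASK_SHIFT, CMASK_SHIFT = 0, 8, 24
EDGE_BIT, INV_BIT = 1 << 18, 1 << 23


def parse_extra(extra: str) -> Dict[str, Union[int, bool]]:
    """Hypothetical helper: turn an extra: field such as 'cmask=1,edge' into kwargs."""
    opts: Dict[str, Union[int, bool]] = {}
    for item in filter(None, extra.split(",")):
        if item.startswith("cmask="):
            opts["cmask"] = int(item[len("cmask="):], 0)
        elif item in ("edge", "inv"):
            opts[item] = True
        else:
            raise ValueError(f"extra option not handled by this sketch: {item}")
    return opts


def encode(event: int, umask: int, cmask: int = 0,
           edge: bool = False, inv: bool = False) -> int:
    """Pack event select, unit mask and modifiers (USR/OS/EN bits omitted)."""
    value = (event & 0xFF) << EVENT_SHIFT | (umask & 0xFF) << UMASK_SHIFT
    value |= (cmask & 0xFF) << CMASK_SHIFT
    if edge:
        value |= EDGE_BIT
    if inv:
        value |= INV_BIT
    return value


L1D_PEND_MISS_EVENT = 0x48  # assumed event code, illustration only
for um_name, extra in (("pending", ""), ("pending_cycles", "cmask=1"),
                       ("occurences", "cmask=1,edge")):
    config = encode(L1D_PEND_MISS_EVENT, 0x1, **parse_extra(extra))
    print(f"{um_name:<15} umask 0x1 -> {config:#010x}")
```

Under these assumptions the three names give 0x00000148, 0x01000148 and 0x01040148: same unit mask, three different counters, which is why a bare 0x1 cannot say which one the user meant.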
From: Maynard J. <may...@us...> - 2013-07-19 14:01:55
|
On 07/17/2013 04:24 PM, Maynard Johnson wrote:
> Hi, Suravee and Andi,
> [snip]
>
> 2) Some named unit masks have hex values that are duplicates of others in the same UM entry, thus requiring the use of the name to disambiguate them. Specifying one of the hex values that are duplicated should result in an error -- but there are cases where it doesn't. For example:
> Here's a properly working example on Sandybridge:
> [mpjohn@oc1757000783 test-stuff]$ ophelp --check-events int_misc:2000000:0x3
> Unit mask (0x3) is non unique.
> Please specify the unit mask using the first word of the description
>
> and here's a failing example:
> [mpjohn@oc1757000783 test-stuff]$ ophelp --check-events l1d_pend_miss:2000000:0x1

This bug was an easy fix. I'll post a patch momentarily.

-Maynard

> 2
> Running 'operf -e l1d_pend_miss:2000000:0x1' works, but what unit mask value is really being used?
>
> [snip]
|