From: Andi K. <an...@fi...> - 2011-03-11 19:00:18
From: Andi Kleen <ak...@li...>

The reference files from which the Intel event lists are generated
contain various events that use the CMASK, INV or EDGE flags of the
performance counters. These flags often make it possible to define a
"more natural" counter instead of a raw counter.

This patch adds the infrastructure to declare extra flags for a unit
mask event. A new extra:... field in the unit mask declaration declares
them. A patched kernel can then set these extra fields through a new
file in oprofilefs. I'm submitting the small kernel patch needed for
that separately.

The patch also adds some of these flags to the Sandy Bridge events
files.

Signed-off-by: Andi Kleen <ak...@li...>
---
 events/i386/sandybridge/unit_masks |   66 ++++++++++++++++++------------------
 libop/op_events.c                  |   38 ++++++++++++++++++++
 libop/op_events.h                  |    6 +++
 libop/op_xml_events.c              |    4 ++
 libop/op_xml_out.c                 |    1 +
 libop/op_xml_out.h                 |    3 +-
 utils/opcontrol                    |   14 ++++++++
 utils/ophelp.c                     |   54 +++++++++++++++++++++++++++++
 8 files changed, 152 insertions(+), 34 deletions(-)

diff --git a/events/i386/sandybridge/unit_masks b/events/i386/sandybridge/unit_masks
index 90b4bf5..c4e8823 100644
--- a/events/i386/sandybridge/unit_masks
+++ b/events/i386/sandybridge/unit_masks
@@ -28,15 +28,15 @@ name:dtlb_load_misses type:exclusive default:0x1
 	0x10 stlb_hit First level miss but second level hit; no page walk.
 name:int_misc type:exclusive default:0x40
 	0x40 rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is sent to Instruction Decode Queue (IDQ) for this thread.
-	0x3 recovery_cycles Number of cycles waiting to be recover after Nuke due to all other cases except JEClear.
-	0x3 recovery_stalls_count Number of occurrences / instances waiting to be recover after Nuke due to all other cases except JEClear.
+	0x3 extra:cmask=1 recovery_cycles Number of cycles waiting to be recover after Nuke due to all other cases except JEClear.
+	0x3 extra:cmask=1 recovery_stalls_count Number of occurrences / instances waiting to be recover after Nuke due to all other cases except JEClear.
 name:uops_issued type:exclusive default:0x1
 	0x1 any Number of Uops issued by the Resource Allocation Table (RAT) to the Reservation Station (RS)
-	0x1 stall_cycles cycles no uops issued by this thread.
-	0x1 core_stall_cycles cycles no uops issued on this core.
+	0x1 extra:cmask=1,inv stall_cycles cycles no uops issued by this thread.
+	0x1 extra:cmask=1,inv core_stall_cycles cycles no uops issued on this core.
 name:arith type:exclusive default:0x1
 	0x1 fpu_div_active Cycles that the divider is busy with any divide or sqrt operation.
-	0x1 fpu_div Number of times that the divider is actived, includes INT, SIMD and FP.
+	0x1 extra:cmask=1,edge fpu_div Number of times that the divider is actived, includes INT, SIMD and FP.
 name:l2_rqsts type:exclusive default:0x1
 	0x1 demand_data_rd_hit Demand Data Read hit L2, no rejects
 	0x4 rfo_hit RFO requests that hit L2 cache
@@ -59,7 +59,7 @@ name:l2_l1d_wb_rqsts type:exclusive default:0x4
 	0x8 hit_m writebacks from L1D to L2 cache lines in M state
 name:l1d_pend_miss type:exclusive default:0x1
 	0x1 pending Cycles with L1D load Misses outstanding.
-	0x1 occurences This event counts the number of L1D misses outstanding occurences.
+	0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding occurences.
 name:dtlb_store_misses type:exclusive default:0x1
 	0x1 miss_causes_a_walk Miss in all TLB levels causes an page walk of any page size (4K/2M/4M/1G)
 	0x2 walk_completed Miss in all TLB levels causes a page walk that completes of any page size (4K/2M/4M/1G)
@@ -77,7 +77,7 @@ name:partial_rat_stalls type:exclusive default:0x20
 	0x20 flags_merge_uop Number of perf sensitive flags-merge uops added by Sandy Bridge u-arch.
 	0x40 slow_lea_window Number of cycles with at least 1 slow Load Effective Address (LEA) uop being allocated.
 	0x80 mul_single_uop Number of Multiply packed/scalar single precision uops allocated
-	0x20 flags_merge_uop_cycles Cycles with perf sensitive flags-merge uops added by SandyBridge u-arch.
+	0x20 extra:cmask=1 flags_merge_uop_cycles Cycles with perf sensitive flags-merge uops added by SandyBridge u-arch.
 name:resource_stalls2 type:exclusive default:0x40
 	0x40 bob_full Cycles Allocator is stalled due Branch Order Buffer (BOB).
 	0xf all_prf_control Resource stalls2 control structures full for physical registers
@@ -85,17 +85,17 @@ name:resource_stalls2 type:exclusive default:0x40
 	0x4f ooo_rsrc Resource stalls2 control structures full Physical Register Reclaim Table (PRRT), Physical History Table (PHT), INT or SIMD Free List (FL), Branch Order Buffer (BOB)
 name:cpl_cycles type:exclusive default:0x1
 	0x1 ring0 Unhalted core cycles the Thread was in Rings 0.
-	0x1 ring0_trans Transitions from ring123 to Ring0.
+	0x1 extra:cmask=1,edge ring0_trans Transitions from ring123 to Ring0.
 	0x2 ring123 Unhalted core cycles the Thread was in Rings 1/2/3.
 name:offcore_requests_outstanding type:exclusive default:0x1
 	0x1 demand_data_rd Offcore outstanding Demand Data Read transactions in the SuperQueue (SQ), queue to uncore, every cycle. Includes L1D data hardware prefetches.
-	0x1 cycles_with_demand_data_rd cycles there are Offcore outstanding RD data transactions in the SuperQueue (SQ), queue to uncore.
+	0x1 extra:cmask=1 cycles_with_demand_data_rd cycles there are Offcore outstanding RD data transactions in the SuperQueue (SQ), queue to uncore.
 	0x2 demand_code_rd Offcore outstanding Code Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle.
 	0x4 demand_rfo Offcore outstanding RFO (store) transactions in the SuperQueue (SQ), queue to uncore, every cycle.
 	0x8 all_data_rd Offcore outstanding all cacheable Core Data Read transactions in the SuperQueue (SQ), queue to uncore, every cycle.
-	0x8 cycles_with_data_rd Cycles there are Offcore outstanding all Data read transactions in the SuperQueue (SQ), queue to uncore, every cycle.
-	0x2 cycles_with_demand_code_rd Cycles with offcore outstanding Code Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle.
-	0x4 cycles_with_demand_rfo Cycles with offcore outstanding demand RFO Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle.
+	0x8 extra:cmask=1 cycles_with_data_rd Cycles there are Offcore outstanding all Data read transactions in the SuperQueue (SQ), queue to uncore, every cycle.
+	0x2 extra:cmask=1 cycles_with_demand_code_rd Cycles with offcore outstanding Code Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle.
+	0x4 extra:cmask=1 cycles_with_demand_rfo Cycles with offcore outstanding demand RFO Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle.
 name:lock_cycles type:exclusive default:0x1
 	0x1 split_lock_uc_lock_duration Cycles in which the L1D and L2 are locked, due to a UC lock or split lock
 	0x2 cache_lock_duration cycles that theL1D is locked
@@ -106,15 +106,15 @@ name:idq type:exclusive default:0x2
 	0x10 ms_dsb_uops Number of Uops delivered into Instruction Decode Queue (IDQ) when MS_Busy, initiated by Decode Stream Buffer (DSB).
 	0x20 ms_mite_uops Number of Uops delivered into Instruction Decode Queue (IDQ) when MS_Busy, initiated by MITE.
 	0x30 ms_uops Number of Uops were delivered into Instruction Decode Queue (IDQ) from MS, initiated by Decode Stream Buffer (DSB) or MITE.
-	0x30 ms_cycles Number of cycles that Uops were delivered into Instruction Decode Queue (IDQ) when MS_Busy, initiated by Decode Stream Buffer (DSB) or MITE.
-	0x4 mite_cycles Cycles MITE is active
-	0x8 dsb_cycles Cycles Decode Stream Buffer (DSB) is active
-	0x10 ms_dsb_cycles Cycles Decode Stream Buffer (DSB) Microcode Sequenser (MS) is active
-	0x10 ms_dsb_occur Occurences of Decode Stream Buffer (DSB) Microcode Sequenser (MS) going active
-	0x18 all_dsb_cycles_any_uops Cycles Decode Stream Buffer (DSB) is delivering anything
-	0x18 all_dsb_cycles_4_uops Cycles Decode Stream Buffer (DSB) is delivering 4 Uops
-	0x24 all_mite_cycles_any_uops Cycles MITE is delivering anything
-	0x24 all_mite_cycles_4_uops Cycles MITE is delivering 4 Uops
+	0x30 extra:cmask=1 ms_cycles Number of cycles that Uops were delivered into Instruction Decode Queue (IDQ) when MS_Busy, initiated by Decode Stream Buffer (DSB) or MITE.
+	0x4 extra:cmask=1 mite_cycles Cycles MITE is active
+	0x8 extra:cmask=1 dsb_cycles Cycles Decode Stream Buffer (DSB) is active
+	0x10 extra:cmask=1 ms_dsb_cycles Cycles Decode Stream Buffer (DSB) Microcode Sequenser (MS) is active
+	0x10 extra:cmask=1,edge ms_dsb_occur Occurences of Decode Stream Buffer (DSB) Microcode Sequenser (MS) going active
+	0x18 extra:cmask=1 all_dsb_cycles_any_uops Cycles Decode Stream Buffer (DSB) is delivering anything
+	0x18 extra:cmask=4 all_dsb_cycles_4_uops Cycles Decode Stream Buffer (DSB) is delivering 4 Uops
+	0x24 extra:cmask=1 all_mite_cycles_any_uops Cycles MITE is delivering anything
+	0x24 extra:cmask=4 all_mite_cycles_4_uops Cycles MITE is delivering 4 Uops
 	0x3c mite_all_uops Number of uops delivered to Instruction Decode Queue (IDQ) from any path.
 name:itlb_misses type:exclusive default:0x1
 	0x1 miss_causes_a_walk Miss in all TLB levels causes an page walk of any page size (4K/2M/4M)
@@ -151,12 +151,12 @@ name:br_misp_exec type:exclusive default:0xff
 	0xd0 all_direct_near_call All mispredicted non-indirect calls
 name:idq_uops_not_delivered type:exclusive default:0x1
 	0x1 core Count number of non-delivered uops to Resource Allocation Table (RAT).
-	0x1 cycles_0_uops_deliv.core Counts the cycles no uops were delivered
-	0x1 cycles_le_1_uop_deliv.core Counts the cycles less than 1 uops were delivered
-	0x1 cycles_le_2_uop_deliv.core Counts the cycles less than 2 uops were delivered
-	0x1 cycles_le_3_uop_deliv.core Counts the cycles less than 3 uops were delivered
-	0x1 cycles_ge_1_uop_deliv.core Cycles when 1 or more uops were delivered to the by the front end.
-	0x1 cycles_fe_was_ok Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) was stalling FE.
+	0x1 extra:cmask=4 cycles_0_uops_deliv.core Counts the cycles no uops were delivered
+	0x1 extra:cmask=3 cycles_le_1_uop_deliv.core Counts the cycles less than 1 uops were delivered
+	0x1 extra:cmask=2 cycles_le_2_uop_deliv.core Counts the cycles less than 2 uops were delivered
+	0x1 extra:cmask=1 cycles_le_3_uop_deliv.core Counts the cycles less than 3 uops were delivered
+	0x1 extra:cmask=4,inv cycles_ge_1_uop_deliv.core Cycles when 1 or more uops were delivered to the by the front end.
+	0x1 extra:cmask=1,inv cycles_fe_was_ok Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) was stalling FE.
 name:uops_dispatched_port type:exclusive default:0x1
 	0x1 port_0 Cycles which a Uop is dispatched on port 0
 	0x2 port_1 Cycles which a Uop is dispatched on port 1
@@ -197,14 +197,14 @@ name:offcore_requests type:exclusive default:0x1
 	0x8 all_data_rd Offcore Demand and prefetch data reads returned to the core.
 name:uops_dispatched type:exclusive default:0x1
 	0x1 thread Counts total number of uops to be dispatched per-thread each cycle.
-	0x1 stall_cycles Counts number of cycles no uops were dispatced to be executed on this thread.
+	0x1 extra:cmask=1,inv stall_cycles Counts number of cycles no uops were dispatced to be executed on this thread.
 	0x2 core Counts total number of uops dispatched from any thread
 name:tlb_flush type:exclusive default:0x1
 	0x1 dtlb_thread Count number of DTLB flushes of thread-specific entries.
 	0x20 stlb_any Count number of any STLB flushes
 name:l1d_blocks type:exclusive default:0x1
 	0x1 ld_bank_conflict Any dispatched loads cancelled due to DCU bank conflict
-	0x5 bank_conflict_cycles Cycles with l1d blocks due to bank conflicts
+	0x5 extra:cmask=1 bank_conflict_cycles Cycles with l1d blocks due to bank conflicts
 name:other_assists type:exclusive default:0x2
 	0x2 itlb_miss_retired Instructions that experienced an ITLB miss. Non Pebs
 	0x10 avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable Non Pebs
@@ -212,9 +212,9 @@ name:other_assists type:exclusive default:0x2
 name:uops_retired type:exclusive default:0x1
 	0x1 all All uops that actually retired.
 	0x2 retire_slots number of retirement slots used non PEBS
-	0x1 stall_cycles Cycles no executable uops retired non PEBS
-	0x1 total_cycles Number of cycles using always true condition applied to non PEBS uops retired event.
-	0x1 core_stall_cycles Cycles no executable uops retired on core - non PEBS
+	0x1 extra:cmask=1,inv stall_cycles Cycles no executable uops retired non PEBS
+	0x1 extra:cmask=10,inv total_cycles Number of cycles using always true condition applied to non PEBS uops retired event.
+	0x1 extra:cmask=1,inv core_stall_cycles Cycles no executable uops retired on core - non PEBS
 name:machine_clears type:exclusive default:0x2
 	0x2 memory_ordering Number of Memory Ordering Machine Clears detected.
 	0x4 smc Number of Self-modifying code (SMC) Machine Clears detected.
@@ -236,7 +236,7 @@ name:br_misp_retired type:exclusive default:0x1
 	0x20 taken number of branch instructions retired that were mispredicted and taken.Non PEBS
 	0x4 all_branches_ps all macro branches (Precise Event)
 name:fp_assist type:exclusive default:0x1e
-	0x1e any Counts any FP_ASSIST umask was incrementing.
+	0x1e extra:cmask=1 any Counts any FP_ASSIST umask was incrementing.
 	0x2 x87_output output - Numeric Overflow, Numeric Underflow, Inexact Result Non Pebs
 	0x4 x87_input input - Invalid Operation, Denormal Operand, SNaN Operand Non Pebs
 	0x8 simd_output Any output SSE* FP Assist - Numeric Overflow, Numeric Underflow. Non Pebs
diff --git a/libop/op_events.c b/libop/op_events.c
index 61d1199..0ddc6bd 100644
--- a/libop/op_events.c
+++ b/libop/op_events.c
@@ -21,6 +21,7 @@
 #include <string.h>
 #include <stdlib.h>
 #include <stdio.h>
+#include <ctype.h>

 static LIST_HEAD(events_list);
 static LIST_HEAD(um_list);
@@ -106,6 +107,34 @@ static void include_um(const char *start, const char *end)
 	free(s);
 }

+/* extra:cmask=12,inv,edge */
+unsigned parse_extra(const char *s)
+{
+	unsigned v, w;
+	int o;
+
+	v = 0;
+	while (*s) {
+		if (isspace(*s))
+			break;
+		if (strisprefix(s, "edge")) {
+			v |= EXTRA_EDGE;
+			s += 4;
+		} else if (strisprefix(s, "inv")) {
+			v |= EXTRA_INV;
+			s += 3;
+		} else if (sscanf(s, "cmask=%x%n", &w, &o) >= 1) {
+			v |= (w & EXTRA_CMASK_MASK) << EXTRA_CMASK_SHIFT;
+			s += o;
+		} else {
+			parse_error("Illegal extra field modifier");
+		}
+		if (*s == ',')
+			++s;
+	}
+	return v;
+}
+
 /* name:MESI type:bitmask default:0x0f */
 static void parse_um(struct op_unit_mask * um, char const * line)
 {
@@ -178,6 +207,7 @@ static void parse_um(struct op_unit_mask * um, char const * line)


 /* \t0x08 (M)odified cache state */
+/* \t0x08 extra:inv,cmask=... (M)odified cache state */
 static void parse_um_entry(struct op_described_um * entry, char const * line)
 {
 	char const * c = line;
@@ -186,6 +216,14 @@ static void parse_um_entry(struct op_described_um * entry, char const * line)
 	entry->value = parse_hex(c);
 	c = skip_nonws(c);

+	c = skip_ws(c);
+	if (strisprefix(c, "extra:")) {
+		c += 6;
+		entry->extra = parse_extra(c);
+		c = skip_nonws(c);
+	} else
+		entry->extra = 0;
+
 	if (!*c)
 		parse_error("invalid unit mask entry");

diff --git a/libop/op_events.h b/libop/op_events.h
index 9ffdc49..3aaaba2 100644
--- a/libop/op_events.h
+++ b/libop/op_events.h
@@ -20,6 +20,11 @@ extern "C" {
 #include "op_types.h"
 #include "op_list.h"

+#define EXTRA_EDGE (1U << 18)
+#define EXTRA_INV (1U << 23)
+#define EXTRA_CMASK_SHIFT 24
+#define EXTRA_CMASK_MASK 0xff
+
 /** Describe an unit mask type. Events can optionally use a filter called
  * the unit mask. the mask type can be a bitmask or a discrete value */
 enum unit_mask_type {
@@ -39,6 +44,7 @@ struct op_unit_mask {
 	enum unit_mask_type unit_type_mask;
 	u32 default_mask; /**< only the gui use it */
 	struct op_described_um {
+		u32 extra;
 		u32 value;
 		char * desc;
 	} um[MAX_UNIT_MASK];
diff --git a/libop/op_xml_events.c b/libop/op_xml_events.c
index 1fcb01e..f573e02 100644
--- a/libop/op_xml_events.c
+++ b/libop/op_xml_events.c
@@ -103,6 +103,10 @@ void xml_help_for_event(struct op_event const * event)
 				init_xml_str_attr(HELP_UNIT_MASK_DESC,
 						  event->unit->um[i].desc,
 						  buffer, MAX_BUFFER);
+				if (event->unit->um[i].extra)
+					init_xml_int_attr(HELP_UNIT_EXTRA_VALUE,
+							  event->unit->um[i].extra,
+							  buffer, MAX_BUFFER);
 				close_xml_element(NONE, 0, buffer, MAX_BUFFER);
 			}
 			close_xml_element(HELP_UNIT_MASKS, 0, buffer, MAX_BUFFER);
diff --git a/libop/op_xml_out.c b/libop/op_xml_out.c
index f6d9042..cd09c87 100644
--- a/libop/op_xml_out.c
+++ b/libop/op_xml_out.c
@@ -84,6 +84,7 @@ char const * xml_tag_map[] = {
 	"unit_mask",
 	"mask",
 	"desc"
+	"extra"
 };

 #define MAX_BUF_LEN 2048
diff --git a/libop/op_xml_out.h b/libop/op_xml_out.h
index 4fb06df..544bd51 100644
--- a/libop/op_xml_out.h
+++ b/libop/op_xml_out.h
@@ -57,7 +57,8 @@ typedef enum {
 	HELP_UNIT_MASKS_CATEGORY,
 	HELP_UNIT_MASK,
 	HELP_UNIT_MASK_VALUE,
-	HELP_UNIT_MASK_DESC
+	HELP_UNIT_MASK_DESC,
+	HELP_UNIT_EXTRA_VALUE,
 } tag_t;

 char const * xml_tag_name(tag_t tag);
diff --git a/utils/opcontrol b/utils/opcontrol
index 8c64af9..d0cc569 100644
--- a/utils/opcontrol
+++ b/utils/opcontrol
@@ -1356,6 +1356,10 @@ do_param_setup()
 		set_ctr_param $f enabled 0
 		set_ctr_param $f event 0
 		set_ctr_param $f count 0
+
+		if test -d $MOUNT/extra ; then
+			set_ctr_param $f extra 0
+		fi
 	done

 	# Check if driver has IBS support
@@ -1440,6 +1444,16 @@ do_param_setup()
 			set_ctr_param $CTR kernel $KERNEL
 			set_ctr_param $CTR user $USER
 			set_ctr_param $CTR unit_mask $UNIT_MASK
+
+			EXTRA=`$OPHELP --extra-mask $GOTEVENT`
+			if test "$EXTRA" -ne 0 ; then
+				if ! test -d $MOUNT/extra ; then
+					echo >&2 "Warning: $GOTEVENT has extra mask, but kernel does not support extra field"
+					echo >&2 "Please update your kernel or use a different event. Will miscount"
+				else
+					set_ctr_param $CTR extra $EXTRA
+				fi
+			fi
 		fi
 		OPROFILED_EVENTS=${OPROFILED_EVENTS}$EVENT:$EVENT_VAL:
 		OPROFILED_EVENTS=${OPROFILED_EVENTS}$CTR:$COUNT:$UNIT_MASK:
diff --git a/utils/ophelp.c b/utils/ophelp.c
index 0f89d57..2453040 100644
--- a/utils/ophelp.c
+++ b/utils/ophelp.c
@@ -155,6 +155,21 @@ static void help_for_event(struct op_event * event)
 				event->unit->um[j].value);
 			column = 14;
 			word_wrap(14, &column, event->unit->um[j].desc);
+			if (event->unit->um[j].extra) {
+				u32 extra = event->unit->um[j].extra;
+
+				word_wrap(14, &column, "(extra:");
+				if (extra & EXTRA_EDGE)
+					word_wrap(14, &column, "edge");
+				if (extra & EXTRA_INV)
+					word_wrap(14, &column, "inv");
+				if ((extra >> EXTRA_CMASK_SHIFT) & EXTRA_CMASK_MASK) {
+					snprintf(buf, sizeof buf, "cmask=%x",
+						 (extra >> EXTRA_CMASK_SHIFT) & EXTRA_CMASK_MASK);
+					word_wrap(14, &column, buf);
+				}
+				word_wrap(14, &column, ")");
+			}
 			putchar('\n');
 		}
 	}
@@ -286,6 +301,37 @@ static void show_unit_mask(void)
 	printf("%d\n", event->unit->default_mask);
 }

+static void show_extra_mask(void)
+{
+	unsigned i;
+	struct op_event * event;
+	size_t count;
+	unsigned extra;
+
+	count = parse_events(parsed_events, num_chosen_events, chosen_events);
+	if (count > 1) {
+		fprintf(stderr, "More than one event specified.\n");
+		exit(EXIT_FAILURE);
+	}
+
+	event = find_event_by_name(parsed_events[0].name,
+				   parsed_events[0].unit_mask,
+				   1);
+	if (!event) {
+		fprintf(stderr, "No such event found.\n");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Not exact match is nothing */
+	extra = 0;
+	for (i = 0; i < event->unit->num; i++)
+		if (event->unit->um[i].value == (unsigned)parsed_events[0].unit_mask) {
+			extra = event->unit->um[i].extra;
+			break;
+		}
+
+	printf ("%d\n", extra);
+}

 static void show_default_event(void)
 {
@@ -305,6 +351,7 @@ static int get_cpu_type;
 static int check_events;
 static int unit_mask;
 static int get_default_event;
+static int extra_mask;

 static struct poptOption options[] = {
 	{ "cpu-type", 'c', POPT_ARG_STRING, &cpu_string, 0,
@@ -323,6 +370,8 @@ static struct poptOption options[] = {
 	  "show version", NULL, },
 	{ "xml", 'X', POPT_ARG_NONE, &want_xml, 0,
 	  "list events as XML", NULL, },
+	{ "extra-mask", 'E', POPT_ARG_NONE, &extra_mask, 0,
+	  "print extra mask for event", NULL, },
 	POPT_AUTOHELP
 	{ NULL, 0, 0, NULL, 0, NULL, NULL, },
 };
@@ -434,6 +483,11 @@ int main(int argc, char const * argv[])
 		exit(EXIT_SUCCESS);
 	}

+	if (extra_mask) {
+		show_extra_mask();
+		exit(EXIT_SUCCESS);
+	}
+
 	if (check_events) {
 		resolve_events();
 		exit(EXIT_SUCCESS);
--
1.7.4