From: <gr...@re...> - 2002-09-24 03:10:35
hi, with last week's cleanups in place, the P4 patch has shrunk a bit (and become quite a bit nicer to work with). I've tested it on a real live model-1 P4, and on an Athlon for good measure. It appears to work against what's in CVS now, GUI included (although the GUI occasionally resizes a little awkwardly with all these unit masks). If a PPro user could give it another whirl, I'd appreciate it; further comments or edits are welcome too. -graydon
--- ChangeLog Mon Sep 23 19:06:19 2002 +++ ChangeLog Mon Sep 23 19:05:50 2002 @@ -1,3 +1,24 @@ +2002-09-23 Graydon Hoare <gr...@re...> + + * dae/opd_sample_files.c: Change unit mask from 8 to 16 bits. + * gui/oprof_start.cpp: Change number of unit masks from 7 to 16. + * gui/ui/oprof_start.base.ui: Likewise. + * libop/op_cpu_type.c: Add P4 CPU type. + * libop/op_events.h: Change unit mask bit width, number. + * libop/op_events.c: Add P4 events, unit masks. + * libop/op_hw_config.h: Set OP_MAX_COUNTERS to 8. + * libop++/op_print_event.cpp: Change unit mask bit width. + * libop++/op_print_event.h: Likewise. + * module/oprofile.c: Add extra sysctls for counters 5-8. + * module/x86/Makefile.in: Add op_model_p4.o to obj list. + * module/x86/cpu_type.c: Change CPU identification to handle P4. + * module/x86/op_apic.c: (enable_apic): APIC_MAXLVT < 4, not != 4. + (check_cpu_ok): Accept CPU_P4. + * module/x86/op_model_p4.c: New file. + * module/x86/op_nmi.c: (get_model): Handle CPU_P4. + Add sysctl names for counters 5-8. + * module/x86/op_x86_model.h: Declare extern op_p4_spec. 
+ 2002-09-24 Philippe Elie <ph...@wa...> * dae/opd_image.c: --- dae/opd_sample_files.c Sat Sep 7 14:19:34 2002 +++ dae/opd_sample_files.c Mon Sep 23 15:44:48 2002 @@ -28,7 +28,7 @@ extern int separate_samples; extern u32 ctr_count[OP_MAX_COUNTERS]; extern u8 ctr_event[OP_MAX_COUNTERS]; -extern u8 ctr_um[OP_MAX_COUNTERS]; +extern u16 ctr_um[OP_MAX_COUNTERS]; extern double cpu_speed; extern op_cpu cpu_type; --- gui/oprof_start.cpp Mon Sep 23 17:34:52 2002 +++ gui/oprof_start.cpp Mon Sep 23 17:19:01 2002 @@ -640,6 +640,15 @@ get_unit_mask_part(descr, 4, check4->isChecked(), mask); get_unit_mask_part(descr, 5, check5->isChecked(), mask); get_unit_mask_part(descr, 6, check6->isChecked(), mask); + get_unit_mask_part(descr, 7, check7->isChecked(), mask); + get_unit_mask_part(descr, 8, check8->isChecked(), mask); + get_unit_mask_part(descr, 9, check9->isChecked(), mask); + get_unit_mask_part(descr, 10, check10->isChecked(), mask); + get_unit_mask_part(descr, 11, check11->isChecked(), mask); + get_unit_mask_part(descr, 12, check12->isChecked(), mask); + get_unit_mask_part(descr, 13, check13->isChecked(), mask); + get_unit_mask_part(descr, 14, check14->isChecked(), mask); + get_unit_mask_part(descr, 15, check15->isChecked(), mask); return mask; } @@ -652,6 +661,15 @@ check4->hide(); check5->hide(); check6->hide(); + check7->hide(); + check8->hide(); + check9->hide(); + check10->hide(); + check11->hide(); + check12->hide(); + check13->hide(); + check14->hide(); + check15->hide(); } void oprof_start::setup_unit_masks(op_event_descr const & descr) @@ -677,6 +695,15 @@ case 4: check = check4; break; case 5: check = check5; break; case 6: check = check6; break; + case 7: check = check7; break; + case 8: check = check8; break; + case 9: check = check9; break; + case 10: check = check10; break; + case 11: check = check11; break; + case 12: check = check12; break; + case 13: check = check13; break; + case 14: check = check14; break; + case 15: check = check15; break; } 
check->setText(um->um[i].desc); if (um->unit_type_mask == utm_exclusive) { --- gui/ui/oprof_start.base.ui Tue Jul 23 21:14:15 2002 +++ gui/ui/oprof_start.base.ui Mon Sep 23 15:47:36 2002 @@ -410,6 +410,105 @@ <string>check6</string> </property> </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check7</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check7</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check8</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check8</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check9</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check9</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check10</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check10</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check11</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check11</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check12</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check12</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check13</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check13</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check14</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check14</string> 
+ </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check15</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check15</string> + </property> + </widget> <spacer> <property> <name>name</name> --- libop/op_cpu_type.c Mon Sep 23 17:34:52 2002 +++ libop/op_cpu_type.c Mon Sep 23 15:48:41 2002 @@ -49,7 +49,8 @@ "PIII", "Athlon", "CPU with timer interrupt", - "CPU with RTC device" + "CPU with RTC device", + "P4 / Xeon" }; @@ -76,7 +77,8 @@ 2, /* PIII */ 4, /* Athlon */ 1, /* Timer interrupt */ - 1 /* RTC */ + 1, /* RTC */ + 8 /* P4 / Xeon */ }; /** --- libop/op_cpu_type.h Mon Sep 23 17:34:52 2002 +++ libop/op_cpu_type.h Mon Sep 23 15:49:07 2002 @@ -25,6 +25,7 @@ CPU_ATHLON, /**< AMD P6 series */ CPU_TIMER_INT, /**< CPU using the timer interrupt */ CPU_RTC, /**< other CPU to use the RTC */ + CPU_P4, /**< Pentium 4 / Xeon series */ MAX_CPU_TYPE } op_cpu; --- libop/op_events.c Thu Sep 19 19:25:28 2002 +++ libop/op_events.c Mon Sep 23 16:11:32 2002 @@ -93,6 +93,244 @@ {0x1, "(I)nvalid cache state"}, {0x1f, "all MOESI cache state"} } }; +/* pentium 4 events */ + +/* BRANCH_RETIRED */ +static struct op_unit_mask um_branch_retired = + {4, utm_bitmask, 0x0c, + { {0x01, "branch not-taken predicted"}, + {0x02, "branch not-taken mispredicted"}, + {0x04, "branch taken predicted"}, + {0x08, "branch taken mispredicted"} } }; + +/* MISPRED_BRANCH_RETIRED */ +static struct op_unit_mask um_mispred_branch_retired = + {1, utm_bitmask, 0x01, + { {0x01, "retired instruction is non-bogus"} } }; + +/* TC_DELIVER_MODE */ +static struct op_unit_mask um_tc_deliver_mode = + {8, utm_bitmask, 0x01, + { {0x01, "both logical processors in deliver mode"}, + {0x02, "logical processor 0 in deliver mode, 1 in build mode"}, + {0x04, "logical processor 0 in deliver mode, 1 in halt/clear/trans mode"}, + {0x08, "logical processor 0 in build mode, 1 in deliver mode"}, + {0x10, "both logical processors in build mode"}, 
+ {0x20, "logical processor 0 in build mode, 1 in halt/clear/trans mode"}, + {0x40, "logical processor 0 in halt/clear/trans mode, 1 in deliver mode"}, + {0x80, "logical processor 0 in halt/clear/trans mode, 1 in build mode"} } }; + +/* BPU_FETCH_REQUEST */ +static struct op_unit_mask um_bpu_fetch_request = + {1, utm_bitmask, 0x00, + {{0x01, "trace cache lookup miss"} } }; + +/* ITLB_REFERENCE */ +static struct op_unit_mask um_itlb_reference = + {3, utm_bitmask, 0x07, + { {0x01, "ITLB hit"}, + {0x02, "ITLB miss"}, + {0x04, "uncacheable ITLB hit"} } }; + +/* MEMORY_CANCEL */ +static struct op_unit_mask um_memory_cancel = + {2, utm_bitmask, 0x06, + { {0x02, "replayed because no store request buffer available"}, + {0x04, "conflicts due to 64k aliasing"} } }; + +/* MEMORY_COMPLETE */ +static struct op_unit_mask um_memory_complete = + {2, utm_bitmask, 0x03, + { {0x01, "load split completed, excluding UC/WC loads"}, + {0x02, "any split stores completed"} } }; + +/* LOAD_PORT_REPLAY */ +static struct op_unit_mask um_load_port_replay = + {1, utm_bitmask, 0x02, + { {0x02, "split load"} } }; + +/* STORE_PORT_REPLAY */ +static struct op_unit_mask um_store_port_replay = + {1, utm_bitmask, 0x02, + { {0x02, "split store"} } }; + +/* MOB_LOAD_REPLAY */ +static struct op_unit_mask um_mob_load_replay = + {4, utm_bitmask, 0x3a, + { {0x02, "replay cause: unknown store address"}, + {0x08, "replay cause: unknown store data"}, + {0x10, "replay cause: partial overlap between load and store"}, + {0x20, "replay cause: mismatched low 4 bits between load and store addr"} } }; + +/* PAGE_WALK_TYPE */ +static struct op_unit_mask um_page_walk_type = + {2, utm_bitmask, 0x03, + { {0x01, "page walk for data TLB miss"}, + {0x02, "page walk for instruction TLB miss"} } }; + +/* BSQ_CACHE_REFERENCE */ +static struct op_unit_mask um_bsq_cache_reference = + {9, utm_bitmask, 0x7ff, + { {0x01, "read 2nd level cache hit shared"}, + {0x02, "read 2nd level cache hit exclusive"}, + {0x04, "read 2nd level 
cache hit modified"}, + {0x08, "read 3rd level cache hit shared"}, + {0x10, "read 3rd level cache hit exclusive"}, + {0x20, "read 3rd level cache hit modified"}, + {0x100, "read 2nd level cache miss"}, + {0x200, "read 3rd level cache miss"}, + {0x400, "writeback lookup from DAC misses 2nd level cache"} } }; + +/* IOQ_ALLOCATION */ +/* IOQ_ACTIVE_ENTRIES */ +static struct op_unit_mask um_ioq = + {15, utm_bitmask, 0xefe1, + { {0x01, "bus request type bit 0"}, + {0x02, "bus request type bit 1"}, + {0x04, "bus request type bit 2"}, + {0x08, "bus request type bit 3"}, + {0x10, "bus request type bit 4"}, + {0x20, "count read entries"}, + {0x40, "count write entries"}, + {0x80, "count UC memory access entries"}, + {0x100, "count WC memory access entries"}, + {0x200, "count write-through memory access entries"}, + {0x400, "count write-protected memory access entries"}, + {0x800, "count WB memory access entries"}, + {0x2000, "count own store requests"}, + {0x4000, "count other / DMA store requests"}, + {0x8000, "count HW/SW prefetch requests"} } }; + +/* FSB_DATA_ACTIVITY */ +static struct op_unit_mask um_fsb_data_activity = + {6, utm_bitmask, 0x3f, + { {0x01, "count when this processor drives data onto bus"}, + {0x02, "count when this processor reads data from bus"}, + {0x04, "count when data is on bus but not sampled by this processor"}, + {0x08, "count when this processor reserves bus for driving"}, + {0x10, "count when other reserves bus and this processor will sample"}, + {0x20, "count when other reserves bus and this processor will not sample"} } }; + +/* BSQ_ALLOCATION */ +/* BSQ_ACTIVE_ENTRIES */ +static struct op_unit_mask um_bsq = + {13, utm_bitmask, 0x21, + { {0x01, "(r)eq (t)ype (e)ncoding, bit 0: see next event"}, + {0x02, "rte bit 1: 00=read, 01=read invalidate, 10=write, 11=writeback"}, + {0x04, "req len bit 0"}, + {0x08, "req len bit 1"}, + {0x20, "request type is input (0=output)"}, + {0x40, "request type is bus lock"}, + {0x80, "request type is 
cacheable"}, + {0x100, "request type is 8-byte chunk split across 8-byte boundary"}, + {0x200, "request type is demand (0=prefetch)"}, + {0x400, "request type is ordered"}, + {0x800, "(m)emory (t)ype (e)ncoding, bit 0: see next events"}, + {0x1000, "mte bit 1: see next event"}, + {0x2000, "mte bit 2: 000=UC, 001=USWC, 100=WT, 101=WP, 110=WB"} } }; + +/* X87_ASSIST */ +static struct op_unit_mask um_x87_assist = + {5, utm_bitmask, 0x1f, + { {0x01, "handle FP stack underflow"}, + {0x02, "handle FP stack overflow"}, + {0x04, "handle x87 output overflow"}, + {0x08, "handle x87 output underflow"}, + {0x10, "handle x87 input assist"} } }; + +/* SSE_INPUT_ASSIST */ +/* {PACKED,SCALAR}_{SP,DP}_UOP */ +/* {64,128}BIT_MMX_UOP */ +/* X87_FP_UOP */ +static struct op_unit_mask um_flame_uop = + {1, utm_bitmask, 0x8000, + { {0x8000, "count all uops of this type" } } }; + +/* X87_SIMD_MOVES_UOP */ +static struct op_unit_mask um_x87_simd_moves_uop = + {2, utm_bitmask, 0x18, + { { 0x08, "count all x87 SIMD store/move uops"}, + { 0x10, "count all x87 SIMD load uops"} } }; + +/* MACHINE_CLEAR */ +static struct op_unit_mask um_machine_clear = + {3, utm_bitmask, 0x1, + { {0x01, "count a portion of cycles the machine is cleared for any cause"}, + {0x40, "count cycles machine is cleared due to memory ordering issues"}, + {0x80, "count cycles machine is cleared due to self modifying code"} } }; + +/* GLOBAL_POWER_EVENTS */ +static struct op_unit_mask um_global_power_events = + {1, utm_bitmask, 0x1, + { {0x01, "count cycles when processor is active"} } }; + +/* TC_MS_XFER */ +static struct op_unit_mask um_tc_ms_xfer = + {1, utm_bitmask, 0x1, + { {0x01, "count TC to MS transfers"} } }; + +/* UOP_QUEUE_WRITES */ +static struct op_unit_mask um_uop_queue_writes = + {3, utm_bitmask, 0x7, + { {0x01, "count uops written to queue from TC build mode"}, + {0x02, "count uops written to queue from TC deliver mode"}, + {0x04, "count uops written to queue from microcode ROM" } } }; + +/* FRONT_END_EVENT 
*/ +static struct op_unit_mask um_front_end_event = + {2, utm_bitmask, 0x1, + { {0x01, "count marked uops which are non-bogus"}, + {0x02, "count marked uops which are bogus"} } }; + +/* EXECUTION_EVENT */ +static struct op_unit_mask um_execution_event = + {8, utm_bitmask, 0x1, + { {0x01, "count 1st marked uops which are non-bogus"}, + {0x02, "count 2nd marked uops which are non-bogus"}, + {0x04, "count 3rd marked uops which are non-bogus"}, + {0x08, "count 4th marked uops which are non-bogus"}, + {0x10, "count 1st marked uops which are bogus"}, + {0x20, "count 2nd marked uops which are bogus"}, + {0x40, "count 3rd marked uops which are bogus"}, + {0x80, "count 4th marked uops which are bogus"} } }; + +/* REPLAY_EVENT */ +static struct op_unit_mask um_replay_event = + {2, utm_bitmask, 0x1, + { {0x01, "count marked uops which are non-bogus"}, + {0x02, "count marked uops which are bogus"} } }; + +/* INSTR_RETIRED */ +static struct op_unit_mask um_instr_retired = + {4, utm_bitmask, 0x1, + { {0x01, "count non-bogus instructions which are not tagged"}, + {0x02, "count non-bogus instructions which are tagged"}, + {0x04, "count bogus instructions which are not tagged"}, + {0x08, "count bogus instructions which are tagged"} } }; + +/* UOPS_RETIRED */ +static struct op_unit_mask um_uops_retired = + {2, utm_bitmask, 0x1, + { {0x01, "count marked uops which are non-bogus"}, + {0x02, "count marked uops which are bogus"} } }; + +/* UOP_TYPE */ +static struct op_unit_mask um_uop_type = + {2, utm_bitmask, 0x2, + { {0x02, "count uops which are load operations"}, + {0x04, "count uops which are store operations"} } }; + +/* RETIRED_MISPRED_BRANCH_TYPE */ +/* RETIRED_BRANCH_TYPE */ +static struct op_unit_mask um_branch_type = + {4, utm_bitmask, 0x1e, + { {0x02, "count conditional jumps"}, + {0x04, "count indirect call branches"}, + {0x08, "count return branches"}, + {0x10, "count returns, indirect calls or indirect jumps"} } }; + + + /* the following are just short cut for filling the
table of event */ #define OP_RTC (1 << CPU_RTC) #define OP_ATHLON (1 << CPU_ATHLON) @@ -102,9 +340,33 @@ #define OP_PII_PIII (OP_PII | OP_PIII) #define OP_IA_ALL (OP_PII_PIII | OP_PPRO) +#define OP_P4 (1 << CPU_P4) + #define CTR_0 (1 << 0) #define CTR_1 (1 << 1) +/* the pentium 4 has a complex set of restrictions between its 18 + counters, so we simplify it a little and say there are 8 counters. these + 8 at least can be treated as entirely independent, although they can + each only count certain classes of events. these defines are also + present in module/x86/op_model_p4.c. */ + +#define CTR_BPU_0 (1 << 0) +#define CTR_BPU_2 (1 << 1) +#define CTR_BPU_ALL (CTR_BPU_0 | CTR_BPU_2) + +#define CTR_MS_0 (1 << 2) +#define CTR_MS_2 (1 << 3) +#define CTR_MS_ALL (CTR_MS_0 | CTR_MS_2) + +#define CTR_FLAME_0 (1 << 4) +#define CTR_FLAME_2 (1 << 5) +#define CTR_FLAME_ALL (CTR_FLAME_0 | CTR_FLAME_2) + +#define CTR_IQ_4 (1 << 6) /* #4 for compatibility with PEBS */ +#define CTR_IQ_5 (1 << 7) +#define CTR_IQ_ALL (CTR_IQ_4 | CTR_IQ_5) + /* ctr allowed, allowed cpus, Event #, unit mask, name, min event value */ /* event name must be in one word */ @@ -340,6 +602,86 @@ { CTR_ALL, OP_ATHLON, 0xcf, &um_empty, "HARDWARE_INTERRUPTS", "Number of taken hardware interrupts", 10,}, + /* pentium 4 events */ + { CTR_IQ_ALL, OP_P4, 0x01, &um_branch_retired, "BRANCH_RETIRED", + "retired branches", 3000}, + { CTR_IQ_ALL, OP_P4, 0x02, &um_mispred_branch_retired, "MISPRED_BRANCH_RETIRED", + "retired mispredicted branches", 3000}, + { CTR_MS_ALL, OP_P4, 0x03, &um_tc_deliver_mode, "TC_DELIVER_MODE", + "duration (in clock cycles) in the trace cache and decode engine", 3000}, + { CTR_BPU_ALL, OP_P4, 0x04, &um_bpu_fetch_request, "BPU_FETCH_REQUEST", + "instruction fetch requests from the branch predict unit", 3000}, + { CTR_BPU_ALL, OP_P4, 0x05, &um_itlb_reference, "ITLB_REFERENCE", + "translations using the instruction translation lookaside buffer", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x06, &um_memory_cancel,
"MEMORY_CANCEL", + "cancelled requesets in data cache address control unit", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x07, &um_memory_complete, "MEMORY_COMPLETE", + "completed load split, store split, uncacheable split, uncacheable load", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x08, &um_load_port_replay, "LOAD_PORT_REPLAY", + "replayed events at the load port", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x09, &um_store_port_replay, "STORE_PORT_REPLAY", + "replayed events at the store port", 3000}, + { CTR_BPU_ALL, OP_P4, 0x0a, &um_mob_load_replay, "MOB_LOAD_REPLAY", + "replayed loads from the memory order buffer", 3000}, + { CTR_BPU_ALL, OP_P4, 0x0b, &um_page_walk_type, "PAGE_WALK_TYPE", + "page walks by the page miss handler", 3000}, + { CTR_BPU_ALL, OP_P4, 0x0c, &um_bsq_cache_reference, "BSQ_CACHE_REFERENCE", + "cache references seen by the bus unit", 3000}, + { CTR_BPU_0, OP_P4, 0x0d, &um_ioq, "IOQ_ALLOCATION", + "bus transactions", 3000}, + { CTR_BPU_2, OP_P4, 0x0e, &um_ioq, "IOQ_ACTIVE_ENTRIES", + "number of entries in the IOQ which are active", 3000}, + { CTR_BPU_ALL, OP_P4, 0x0f, &um_fsb_data_activity, "FSB_DATA_ACTIVITY", + "DRDY or DBSY events on the front side bus", 3000}, + { CTR_BPU_0, OP_P4, 0x10, &um_bsq, "BSQ_ALLOCATION", + "allocations in the bus sequence unit", 3000}, + { CTR_BPU_2, OP_P4, 0x11, &um_bsq, "BSQ_ACTIVE_ENTRIES", + "number of entries in the bus sequence unit which are active", 3000}, + { CTR_IQ_ALL, OP_P4, 0x12, &um_x87_assist, "X87_ASSIST", + "retired x87 instructions which required special handling", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x13, &um_flame_uop, "SSE_INPUT_ASSIST", + "input assists requested for SSE or SSE2 operands", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x14, &um_flame_uop, "PACKED_SP_UOP", + "packed single precision uops", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x15, &um_flame_uop, "PACKED_DP_UOP", + "packed double precision uops", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x16, &um_flame_uop, "SCALAR_SP_UOP", + "scalar single precision uops", 3000}, + { 
CTR_FLAME_ALL, OP_P4, 0x17, &um_flame_uop, "SCALAR_DP_UOP", + "scalar double precision uops", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x18, &um_flame_uop, "64BIT_MMX_UOP", + "64 bit SIMD MMX instructions", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x19, &um_flame_uop, "128BIT_MMX_UOP", + "128 bit SIMD SSE2 instructions", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x1a, &um_flame_uop, "X87_FP_UOP", + "x87 floating point uops", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x1b, &um_x87_simd_moves_uop, "X87_SIMD_MOVES_UOP", + "x87 FPU, MMX, SSE, or SSE2 loads, stores and reg-to-reg moves", 3000}, + { CTR_IQ_ALL, OP_P4, 0x1c, &um_machine_clear, "MACHINE_CLEAR", + "cycles with entire machine pipeline cleared", 3000}, + { CTR_BPU_ALL, OP_P4, 0x1d, &um_global_power_events, "GLOBAL_POWER_EVENTS", + "time during which processor is not stopped", 3000}, + { CTR_MS_ALL, OP_P4, 0x1e, &um_tc_ms_xfer, "TC_MS_XFER", + "number of times uop delivery changed from TC to MS ROM", 3000}, + { CTR_MS_ALL, OP_P4, 0x1f, &um_uop_queue_writes, "UOP_QUEUE_WRITES", + "number of valid uops written to the uop queue", 3000}, + { CTR_IQ_ALL, OP_P4, 0x20, &um_front_end_event, "FRONT_END_EVENT", + "retired uops, tagged with front-end tagging", 3000}, + { CTR_IQ_ALL, OP_P4, 0x21, &um_execution_event, "EXECUTION_EVENT", + "retired uops, tagged with execution tagging", 3000}, + { CTR_IQ_ALL, OP_P4, 0x22, &um_replay_event, "REPLAY_EVENT", + "retired uops, tagged with replay tagging", 3000}, + { CTR_IQ_ALL, OP_P4, 0x23, &um_instr_retired, "INSTR_RETIRED", + "retired instructions", 3000}, + { CTR_IQ_ALL, OP_P4, 0x24, &um_uops_retired, "UOPS_RETIRED", + "retired uops", 3000}, + { CTR_IQ_ALL, OP_P4, 0x25, &um_uop_type, "UOP_TYPE", + "type of uop tagged by front-end tagging", 3000}, + { CTR_MS_ALL, OP_P4, 0x26, &um_branch_type, "RETIRED_MISPRED_BRANCH_TYPE", + "retired mispredicted branches, selected by type", 3000}, + { CTR_MS_ALL, OP_P4, 0x27, &um_branch_type, "RETIRED_BRANCH_TYPE", + "retired branches, selected by type", 3000}, + /* other CPUs
*/ { CTR_0, OP_RTC, 0xff, &um_empty, "RTC_Interrupts", "RTC interrupts/sec (rounded up to power of two)", 2,}, @@ -362,7 +704,7 @@ * > 0 otherwise, in this case allow->um[return value - 1] == um so the * caller can access to the description of the unit_mask. */ -int op_check_unit_mask(struct op_unit_mask const * allow, u8 um) +int op_check_unit_mask(struct op_unit_mask const * allow, u16 um) { u32 i, mask; @@ -439,10 +781,12 @@ * * 3 AMD Athlon * + * 6 Pentium 4 / Xeon + * * The function returns bitmask of failure cause * 0 otherwise */ -int op_check_events(int ctr, u8 ctr_type, u8 ctr_um, op_cpu cpu_type) +int op_check_events(int ctr, u8 ctr_type, u16 ctr_um, op_cpu cpu_type) { int ret = OP_OK_EVENT; u32 i; --- libop/op_events.h Thu Sep 19 19:25:29 2002 +++ libop/op_events.h Mon Sep 23 16:11:07 2002 @@ -40,12 +40,12 @@ struct op_unit_mask { u32 num; /**< number of possible unit masks */ enum unit_mask_type unit_type_mask; - u8 default_mask; /**< only the gui use it */ + u16 default_mask; /**< only the gui use it */ /** up to sixteen allowed unit masks */ struct op_described_um { - u8 value; + u16 value; char const * desc; - } um[7]; + } um[16]; }; /** Describe an event. */ @@ -81,7 +81,7 @@ * * \sa op_cpu, OP_EVENTS_OK */ -int op_check_events(int ctr, u8 ctr_type, u8 ctr_um, op_cpu cpu_type); +int op_check_events(int ctr, u8 ctr_type, u16 ctr_um, op_cpu cpu_type); /** * sanity check unit mask value @@ -99,7 +99,7 @@ * the unit_mask through op_unit_descs * \sa op_unit_descs */ -int op_check_unit_mask(struct op_unit_mask const * allow, u8 um); +int op_check_unit_mask(struct op_unit_mask const * allow, u16 um); /** a special constant meaning this event is available for all counters */ #define CTR_ALL (~0u) --- libop/op_hw_config.h Sat Sep 7 14:19:35 2002 +++ libop/op_hw_config.h Mon Sep 23 16:00:00 2002 @@ -16,7 +16,7 @@ * use of this variable is for static/local array dimension. 
Never use it in * loop or in array index access/index checking unless you know what you * made. Don't change it without updating OP_BITS_CTR! */ -#define OP_MAX_COUNTERS 4 +#define OP_MAX_COUNTERS 8 /** a plain unsigned int magic value to check against counter overflow */ #define OP_COUNT_MAX ~0u --- libop++/op_print_event.cpp Thu Sep 19 19:25:29 2002 +++ libop++/op_print_event.cpp Mon Sep 23 16:00:44 2002 @@ -23,7 +23,7 @@ using std::setfill; void op_print_event(ostream & out, int counter_nr, op_cpu cpu_type, - u8 type, u8 um, u32 count) + u8 type, u16 um, u32 count) { char const * typenamep; char const * typedescp; --- libop++/op_print_event.h Mon Sep 23 17:34:52 2002 +++ libop++/op_print_event.h Mon Sep 23 16:00:51 2002 @@ -23,6 +23,6 @@ * to the stream. */ void op_print_event(std::ostream & out, int counter_nr, - op_cpu cpu_type, u8 type, u8 um, u32 count); + op_cpu cpu_type, u8 type, u16 um, u32 count); #endif // OP_PRINT_EVENT --- module/oprofile.c Mon Sep 23 17:34:52 2002 +++ module/oprofile.c Mon Sep 23 16:01:58 2002 @@ -796,7 +796,7 @@ { 1, "nr_interrupts", &sysctl.nr_interrupts, sizeof(int), 0444, NULL, &get_nr_interrupts, NULL, }, { 1, "notesize", &sysctl_parms.note_size, sizeof(int), 0644, NULL, &lproc_dointvec, NULL, }, { 1, "cpu_type", &sysctl.cpu_type, sizeof(int), 0444, NULL, &lproc_dointvec, NULL, }, - { 0, }, { 0, }, { 0, }, { 0, }, + { 0, }, { 0, }, { 0, }, { 0, }, { 0, }, { 0, }, { 0, }, { 0, }, { 0, }, }; --- module/x86/Makefile.in Mon Sep 23 17:34:52 2002 +++ module/x86/Makefile.in Mon Sep 23 14:04:33 2002 @@ -19,7 +19,7 @@ obj-y := cpu_type.o op_apic.o op_fixmap.o op_rtc.o op_nmi.o \ op_syscalls.o op_model_ppro.o op_model_athlon.o \ - oprofile_nmi.o + op_model_p4.o oprofile_nmi.o obj-m := $(O_TARGET) O_OBJS := $(obj-y) M_OBJS := $(O_TARGET) --- module/x86/cpu_type.c Sat Sep 14 22:36:31 2002 +++ module/x86/cpu_type.c Mon Sep 23 16:11:46 2002 @@ -36,17 +36,25 @@ return CPU_ATHLON; case X86_VENDOR_INTEL: - /* Less than a P6-class processor */ - 
if (family != 6) + switch (family) { + default: return CPU_RTC; - - if (model > 5) - return CPU_PIII; - else if (model > 2) - return CPU_PII; - - return CPU_PPRO; - + case 6: + /* A P6-class processor */ + if (model > 5) + return CPU_PIII; + else if (model > 2) + return CPU_PII; + return CPU_PPRO; + case 0xf: + if (model <= 3) + /* a Pentium 4 processor */ + return CPU_P4; + else + /* Do not know what it is */ + return CPU_RTC; + } + default: return CPU_RTC; } --- module/x86/op_apic.c Sat Sep 7 14:19:39 2002 +++ module/x86/op_apic.c Mon Sep 23 16:08:45 2002 @@ -124,7 +124,7 @@ goto not_local_p6_apic; /* LVT0,LVT1,LVTT,LVTPC */ - if (GET_APIC_MAXLVT(apic_read(APIC_LVR)) != 4) + if (GET_APIC_MAXLVT(apic_read(APIC_LVR)) < 4) goto not_local_p6_apic; /* IA32 V3, 7.4.14.1 */ @@ -190,7 +190,8 @@ if (sysctl.cpu_type != CPU_PPRO && sysctl.cpu_type != CPU_PII && sysctl.cpu_type != CPU_PIII && - sysctl.cpu_type != CPU_ATHLON) + sysctl.cpu_type != CPU_ATHLON && + sysctl.cpu_type != CPU_P4) return 0; return 1; --- module/x86/op_model_p4.c Wed Dec 31 19:00:00 1969 +++ module/x86/op_model_p4.c Mon Sep 23 22:09:12 2002 @@ -0,0 +1,493 @@ +/** + * @file op_model_p4.c + * P4 model-specific MSR operations + * + * @remark Copyright 2002 OProfile authors + * @remark Read the file COPYING + * + * @author Graydon Hoare + */ + +#include "op_x86_model.h" +#include "op_msr.h" +#include "op_apic.h" + +#define NUM_COUNTERS 8 +#define NUM_ESCRS 45 +#define NUM_CCCRS 18 +#define NUM_CONTROLS (NUM_ESCRS + NUM_CCCRS) + +/* tables to simulate simplified hardware view of p4 registers */ +struct p4_counter_binding { + int virt_counter; + int counter_address; + int cccr_address; +}; + +struct p4_event_binding { + int escr_select; /* value to put in CCCR */ + int event_select; /* value to put in ESCR */ + struct { + int virt_counter; /* for this counter... */ + int escr_address; /* use this ESCR */ + } bindings[2]; +}; + +/* nb: these CTR_* defines are a duplicate of defines in + libop/op_events.c.
*/ + +#define CTR_BPU_0 (1 << 0) +#define CTR_BPU_2 (1 << 1) +#define CTR_MS_0 (1 << 2) +#define CTR_MS_2 (1 << 3) +#define CTR_FLAME_0 (1 << 4) +#define CTR_FLAME_2 (1 << 5) +#define CTR_IQ_4 (1 << 6) +#define CTR_IQ_5 (1 << 7) + +struct p4_counter_binding p4_counters [NUM_COUNTERS] = { + { CTR_BPU_0, MSR_P4_BPU_PERFCTR0, MSR_P4_BPU_CCCR0 }, + { CTR_BPU_2, MSR_P4_BPU_PERFCTR2, MSR_P4_BPU_CCCR2 }, + { CTR_MS_0, MSR_P4_MS_PERFCTR0, MSR_P4_MS_CCCR0 }, + { CTR_MS_2, MSR_P4_MS_PERFCTR2, MSR_P4_MS_CCCR2 }, + { CTR_FLAME_0, MSR_P4_FLAME_PERFCTR0, MSR_P4_FLAME_CCCR0 }, + { CTR_FLAME_2, MSR_P4_FLAME_PERFCTR2, MSR_P4_FLAME_CCCR2 }, + { CTR_IQ_4, MSR_P4_IQ_PERFCTR4, MSR_P4_IQ_CCCR4 }, + { CTR_IQ_5, MSR_P4_IQ_PERFCTR5, MSR_P4_IQ_CCCR5 }, +}; + +/* p4 event codes 0x01-0x27 in libop/op_events.c select the entries of this table, in order. */ + +struct p4_event_binding p4_events[0x27] = { + + { /* BRANCH_RETIRED */ + 0x05, 0x06, + { {CTR_IQ_4, MSR_P4_CRU_ESCR2}, + {CTR_IQ_5, MSR_P4_CRU_ESCR3} } + }, + + { /* MISPRED_BRANCH_RETIRED */ + 0x04, 0x03, + { { CTR_IQ_4, MSR_P4_CRU_ESCR0}, + { CTR_IQ_5, MSR_P4_CRU_ESCR1} } + }, + + { /* TC_DELIVER_MODE */ + 0x04, 0x01, + { { CTR_MS_0, MSR_P4_TC_ESCR0}, + { CTR_MS_2, MSR_P4_TC_ESCR1} } + }, + + { /* BPU_FETCH_REQUEST */ + 0x00, 0x03, + { { CTR_BPU_0, MSR_P4_BPU_ESCR0}, + { CTR_BPU_2, MSR_P4_BPU_ESCR1} } + }, + + { /* ITLB_REFERENCE */ + 0x03, 0x18, + { { CTR_BPU_0, MSR_P4_ITLB_ESCR0}, + { CTR_BPU_2, MSR_P4_ITLB_ESCR1} } + }, + + { /* MEMORY_CANCEL */ + 0x05, 0x02, + { { CTR_FLAME_0, MSR_P4_DAC_ESCR0}, + { CTR_FLAME_2, MSR_P4_DAC_ESCR1} } + }, + + { /* MEMORY_COMPLETE */ + 0x02, 0x08, + { { CTR_FLAME_0, MSR_P4_SAAT_ESCR0}, + { CTR_FLAME_2, MSR_P4_SAAT_ESCR1} } + }, + + { /* LOAD_PORT_REPLAY */ + 0x02, 0x04, + { { CTR_FLAME_0, MSR_P4_SAAT_ESCR0}, + { CTR_FLAME_2, MSR_P4_SAAT_ESCR1} } + }, + + { /* STORE_PORT_REPLAY */ + 0x02, 0x05, + { { CTR_FLAME_0, MSR_P4_SAAT_ESCR0}, + { CTR_FLAME_2, MSR_P4_SAAT_ESCR1} } + }, + + { /* MOB_LOAD_REPLAY */ + 0x02, 0x03, + { {
CTR_BPU_0, MSR_P4_MOB_ESCR0}, + { CTR_BPU_2, MSR_P4_MOB_ESCR1} } + }, + + { /* PAGE_WALK_TYPE */ + 0x04, 0x01, + { { CTR_BPU_0, MSR_P4_PMH_ESCR0}, + { CTR_BPU_2, MSR_P4_PMH_ESCR1} } + }, + + { /* BSQ_CACHE_REFERENCE */ + 0x07, 0x0c, + { { CTR_BPU_0, MSR_P4_BSU_ESCR0}, + { CTR_BPU_2, MSR_P4_BSU_ESCR1} } + }, + + { /* IOQ_ALLOCATION */ + 0x06, 0x03, + { { CTR_BPU_0, MSR_P4_FSB_ESCR0}, + {-1,-1} } + }, + + { /* IOQ_ACTIVE_ENTRIES */ + 0x06, 0x1a, + { { CTR_BPU_2, MSR_P4_FSB_ESCR1}, + {-1,-1} } + }, + + { /* FSB_DATA_ACTIVITY */ + 0x06, 0x17, + { { CTR_BPU_0, MSR_P4_FSB_ESCR0}, + { CTR_BPU_2, MSR_P4_FSB_ESCR1} } + }, + + { /* BSQ_ALLOCATION */ + 0x07, 0x05, + { { CTR_BPU_0, MSR_P4_BSU_ESCR0}, + {-1,-1} } + }, + + { /* BSQ_ACTIVE_ENTRIES */ + 0x07, 0x06, + { { CTR_BPU_2, MSR_P4_BSU_ESCR1 /* guess */}, + {-1,-1} } + }, + + { /* X87_ASSIST */ + 0x05, 0x03, + { { CTR_IQ_4, MSR_P4_CRU_ESCR2}, + { CTR_IQ_5, MSR_P4_CRU_ESCR3} } + }, + + { /* SSE_INPUT_ASSIST */ + 0x01, 0x34, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* PACKED_SP_UOP */ + 0x01, 0x08, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* PACKED_DP_UOP */ + 0x01, 0x0c, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* SCALAR_SP_UOP */ + 0x01, 0x0a, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* SCALAR_DP_UOP */ + 0x01, 0x0e, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* 64BIT_MMX_UOP */ + 0x01, 0x02, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* 128BIT_MMX_UOP */ + 0x01, 0x1a, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* X87_FP_UOP */ + 0x01, 0x04, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* X87_SIMD_MOVES_UOP */ + 0x01, 0x2e, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + 
{ CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* MACHINE_CLEAR */ + 0x05, 0x02, + { { CTR_IQ_4, MSR_P4_CRU_ESCR2}, + { CTR_IQ_5, MSR_P4_CRU_ESCR3} } + }, + + { /* GLOBAL_POWER_EVENTS */ + 0x06, 0x13 /* manual says 0x05 */, + { { CTR_BPU_0, MSR_P4_FSB_ESCR0}, + { CTR_BPU_2, MSR_P4_FSB_ESCR1} } + }, + + { /* TC_MS_XFER */ + 0x00, 0x05, + { { CTR_MS_0, MSR_P4_MS_ESCR0}, + { CTR_MS_2, MSR_P4_MS_ESCR1} } + }, + + { /* UOP_QUEUE_WRITES */ + 0x00, 0x09, + { { CTR_MS_0, MSR_P4_MS_ESCR0}, + { CTR_MS_2, MSR_P4_MS_ESCR1} } + }, + + { /* FRONT_END_EVENT */ + 0x05, 0x08, + { { CTR_IQ_4, MSR_P4_CRU_ESCR2}, + { CTR_IQ_5, MSR_P4_CRU_ESCR3} } + }, + + { /* EXECUTION_EVENT */ + 0x05, 0x0c, + { { CTR_IQ_4, MSR_P4_CRU_ESCR2}, + { CTR_IQ_5, MSR_P4_CRU_ESCR3} } + }, + + { /* REPLAY_EVENT */ + 0x05, 0x09, + { { CTR_IQ_4, MSR_P4_CRU_ESCR2}, + { CTR_IQ_5, MSR_P4_CRU_ESCR3} } + }, + + { /* INSTR_RETIRED */ + 0x04, 0x02, + { { CTR_IQ_4, MSR_P4_CRU_ESCR0}, + { CTR_IQ_5, MSR_P4_CRU_ESCR1} } + }, + + { /* UOPS_RETIRED */ + 0x04, 0x01, + { { CTR_IQ_4, MSR_P4_CRU_ESCR0}, + { CTR_IQ_5, MSR_P4_CRU_ESCR1} } + }, + + { /* UOP_TYPE */ + 0x02, 0x02, + { { CTR_IQ_4, MSR_P4_RAT_ESCR0}, + { CTR_IQ_5, MSR_P4_RAT_ESCR1} } + }, + + { /* RETIRED_MISPRED_BRANCH_TYPE */ + 0x02, 0x05, + { { CTR_MS_0, MSR_P4_TBPU_ESCR0}, + { CTR_MS_2, MSR_P4_TBPU_ESCR1} } + }, + + { /* RETIRED_BRANCH_TYPE */ + 0x02, 0x04, + { { CTR_MS_0, MSR_P4_TBPU_ESCR0}, + { CTR_MS_2, MSR_P4_TBPU_ESCR1} } + } +}; + + +#define MISC_PMC_ENABLED_P(x) ((x) & 1 << 7) + +#define ESCR_RESERVED_BITS 0x80000003 +#define ESCR_CLEAR(escr) ((escr) &= ESCR_RESERVED_BITS) +#define ESCR_SET_USR_0(escr, usr) ((escr) |= (((usr) & 1) << 2)) +#define ESCR_SET_OS_0(escr, os) ((escr) |= (((os) & 1) << 3)) +#define ESCR_SET_EVENT_SELECT(escr, sel) ((escr) |= (((sel) & 0x3f) << 25)) +#define ESCR_SET_EVENT_MASK(escr, mask) ((escr) |= (((mask) & 0xffff) << 9)) +#define ESCR_READ(escr,high,ev,i) do {rdmsr(ev->bindings[(i)].escr_address, (escr), (high));} while (0);
+#define ESCR_WRITE(escr,high,ev,i) do {wrmsr(ev->bindings[(i)].escr_address, (escr), (high));} while (0); + +#define CCCR_RESERVED_BITS 0x38030FFF +#define CCCR_CLEAR(cccr) ((cccr) &= CCCR_RESERVED_BITS) +#define CCCR_SET_REQUIRED_BITS(cccr) ((cccr) |= 0x00030000) +#define CCCR_SET_ESCR_SELECT(cccr, sel) ((cccr) |= (((sel) & 0x07) << 13)) +#define CCCR_SET_PMI_OVF(cccr) ((cccr) |= (1<<26)) +#define CCCR_SET_ENABLE(cccr) ((cccr) |= (1<<12)) +#define CCCR_SET_DISABLE(cccr) ((cccr) &= ~(1<<12)) +#define CCCR_READ(low, high, i) do {rdmsr (p4_counters[(i)].cccr_address, (low), (high));} while (0); +#define CCCR_WRITE(low, high, i) do {wrmsr (p4_counters[(i)].cccr_address, (low), (high));} while (0); +#define CCCR_OVF_P(cccr) ((cccr) & (1U<<31)) +#define CCCR_CLEAR_OVF(cccr) ((cccr) &= (~(1U<<31))) + +#define CTR_READ(l,h,i) do {rdmsr(p4_counters[(i)].counter_address, (l), (h));} while (0); +#define CTR_WRITE(l,i) do {wrmsr(p4_counters[(i)].counter_address, -(u32)(l), -1);} while (0); + + +static void p4_fill_in_addresses(struct op_msrs * const msrs) +{ + int i; + uint addr; + + /* the 8 counter registers we pay attention to */ + for (i = 0; i < NUM_COUNTERS; ++i) + msrs->counters.addrs[i] = p4_counters[i].counter_address; + + /* 18 CCCR registers */ + for (i=0, addr = MSR_P4_BPU_CCCR0; + addr <= MSR_P4_IQ_CCCR5; ++addr, ++i) + msrs->controls.addrs[i] = addr; + + /* 43 ESCR registers */ + for (addr = MSR_P4_BSU_ESCR0; + addr <= MSR_P4_SSU_ESCR0; ++addr, ++i){ + msrs->controls.addrs[i] = addr; + } + + for (addr = MSR_P4_MS_ESCR0; + addr <= MSR_P4_TC_ESCR1; ++addr, ++i){ + msrs->controls.addrs[i] = addr; + } + + for (addr = MSR_P4_IX_ESCR0; + addr <= MSR_P4_CRU_ESCR3; ++addr, ++i){ + msrs->controls.addrs[i] = addr; + } + + /* 2 remaining non-contiguously located ESCRs */ + msrs->controls.addrs[i++] = MSR_P4_CRU_ESCR4; + msrs->controls.addrs[i++] = MSR_P4_CRU_ESCR5; +} + +static void pmc_setup_one_p4_counter(uint ctr) +{ + int i; + const int maxbind = 2; + uint cccr = 0; + 
uint escr = 0; + uint high = 0; + uint counter_bit; + struct p4_event_binding *ev = NULL; + + /* convert from counter *number* to counter *bit* */ + counter_bit = 1 << ctr; + + /* find our event binding structure. + nb: virtual p4 "event codes" count from 0x01 .. 0x27 */ + if (sysctl.ctr[ctr].event < 0 || sysctl.ctr[ctr].event > 0x27) { + printk(KERN_ERR + "oprofile: P4 event code 0x%x out of range\n", + sysctl.ctr[ctr].event); + return; + } + + ev = & (p4_events[sysctl.ctr[ctr].event - 1]); + + for (i = 0; i < maxbind; i++) { + if (ev->bindings[i].virt_counter & counter_bit) { + + /* modify ESCR */ + ESCR_READ(escr, high, ev, i); + ESCR_CLEAR(escr); + ESCR_SET_USR_0(escr, sysctl.ctr[ctr].user); + ESCR_SET_OS_0(escr, sysctl.ctr[ctr].kernel); + ESCR_SET_EVENT_SELECT(escr, ev->event_select); + ESCR_SET_EVENT_MASK(escr, sysctl.ctr[ctr].unit_mask); + ESCR_WRITE(escr, high, ev, i); + + /* modify CCCR */ + CCCR_READ(cccr, high, ctr); + CCCR_CLEAR(cccr); + CCCR_SET_REQUIRED_BITS(cccr); + CCCR_SET_ESCR_SELECT(cccr, ev->escr_select); + CCCR_SET_PMI_OVF(cccr); + CCCR_WRITE(cccr, high, ctr); + return; + } + } + return; +} + + +static void p4_setup_ctrs(struct op_msrs const * const msrs) +{ + uint i; + uint low, high; + + rdmsr(MSR_P4_MISC, low, high); + if (! 
MISC_PMC_ENABLED_P(low)) { + printk(KERN_ERR "oprofile: P4 PMC not available"); + return; + } + + /* clear all cccrs */ + for (i = 0 ; i < NUM_COUNTERS ; ++i) { + CCCR_READ(low, high, i); + CCCR_CLEAR(low); + CCCR_SET_REQUIRED_BITS(low); + CCCR_WRITE(low, high, i); + } + + /* setup all counters */ + for (i = 0 ; i < NUM_COUNTERS ; ++i) { + if (sysctl.ctr[i].event) { + pmc_setup_one_p4_counter(i); + CTR_WRITE(sysctl.ctr[i].count, i); + } + } +} + +static void p4_check_ctrs(uint const cpu, + struct op_msrs const * const msrs, + struct pt_regs * const regs) +{ + ulong low, high; + int i; + + for (i = 0; i < NUM_COUNTERS; ++i) { + CCCR_READ(low, high, i); + if (CCCR_OVF_P(low)) { + op_do_profile(cpu, regs, i); + CCCR_CLEAR_OVF(low); + CTR_WRITE(oprof_data[cpu].ctr_count[i], i); + CCCR_WRITE(low, high, i); + } + } + // P4 quirk: you have to re-unmask the apic vector + apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED); +} + + +static void p4_start(struct op_msrs const * const msrs) +{ + uint low,high; + int i; + for (i = 0; i < NUM_COUNTERS; ++i) { + if (!sysctl.ctr[i].enabled) continue; + CCCR_READ(low, high, i); + CCCR_SET_ENABLE(low); + CCCR_WRITE(low, high, i); + } +} + +static void p4_stop(struct op_msrs const * const msrs) +{ + uint low,high; + int i; + for (i = 0; i < NUM_COUNTERS; ++i) { + if (!sysctl.ctr[i].enabled) continue; + CCCR_READ(low, high, i); + CCCR_SET_DISABLE(low); + CCCR_WRITE(low, high, i); + } +} + + +struct op_x86_model_spec op_p4_spec = { + .num_counters = NUM_COUNTERS, + .num_controls = NUM_CONTROLS, + .fill_in_addresses = &p4_fill_in_addresses, + .setup_ctrs = &p4_setup_ctrs, + .check_ctrs = &p4_check_ctrs, + .start = &p4_start, + .stop = &p4_stop +}; --- module/x86/op_msr.h Mon Sep 23 17:34:52 2002 +++ module/x86/op_msr.h Mon Sep 23 16:09:38 2002 @@ -61,4 +61,270 @@ #define MSR_K7_PERFCTR3 0xc0010007 #endif +/* There are *82* pentium 4 MSRs: + + - 1 misc register + + - 18 counters (PERFCTRs) + + - 18 counter configuration 
control registers (CCCRs) + + - 45 event selection control registers (ESCRs). */ + + +#ifndef MSR_P4_MISC +#define MSR_P4_MISC 0x1a0 +#endif + +#ifndef MSR_P4_BPU_PERFCTR0 +#define MSR_P4_BPU_PERFCTR0 0x300 +#endif +#ifndef MSR_P4_BPU_PERFCTR1 +#define MSR_P4_BPU_PERFCTR1 0x301 +#endif +#ifndef MSR_P4_BPU_PERFCTR2 +#define MSR_P4_BPU_PERFCTR2 0x302 +#endif +#ifndef MSR_P4_BPU_PERFCTR3 +#define MSR_P4_BPU_PERFCTR3 0x303 +#endif +#ifndef MSR_P4_MS_PERFCTR0 +#define MSR_P4_MS_PERFCTR0 0x304 +#endif +#ifndef MSR_P4_MS_PERFCTR1 +#define MSR_P4_MS_PERFCTR1 0x305 +#endif +#ifndef MSR_P4_MS_PERFCTR2 +#define MSR_P4_MS_PERFCTR2 0x306 +#endif +#ifndef MSR_P4_MS_PERFCTR3 +#define MSR_P4_MS_PERFCTR3 0x307 +#endif +#ifndef MSR_P4_FLAME_PERFCTR0 +#define MSR_P4_FLAME_PERFCTR0 0x308 +#endif +#ifndef MSR_P4_FLAME_PERFCTR1 +#define MSR_P4_FLAME_PERFCTR1 0x309 +#endif +#ifndef MSR_P4_FLAME_PERFCTR2 +#define MSR_P4_FLAME_PERFCTR2 0x30a +#endif +#ifndef MSR_P4_FLAME_PERFCTR3 +#define MSR_P4_FLAME_PERFCTR3 0x30b +#endif +#ifndef MSR_P4_IQ_PERFCTR0 +#define MSR_P4_IQ_PERFCTR0 0x30c +#endif +#ifndef MSR_P4_IQ_PERFCTR1 +#define MSR_P4_IQ_PERFCTR1 0x30d +#endif +#ifndef MSR_P4_IQ_PERFCTR2 +#define MSR_P4_IQ_PERFCTR2 0x30e +#endif +#ifndef MSR_P4_IQ_PERFCTR3 +#define MSR_P4_IQ_PERFCTR3 0x30f +#endif +#ifndef MSR_P4_IQ_PERFCTR4 +#define MSR_P4_IQ_PERFCTR4 0x310 +#endif +#ifndef MSR_P4_IQ_PERFCTR5 +#define MSR_P4_IQ_PERFCTR5 0x311 +#endif + + +#ifndef MSR_P4_BPU_CCCR0 +#define MSR_P4_BPU_CCCR0 0x360 +#endif +#ifndef MSR_P4_BPU_CCCR1 +#define MSR_P4_BPU_CCCR1 0x361 +#endif +#ifndef MSR_P4_BPU_CCCR2 +#define MSR_P4_BPU_CCCR2 0x362 +#endif +#ifndef MSR_P4_BPU_CCCR3 +#define MSR_P4_BPU_CCCR3 0x363 +#endif +#ifndef MSR_P4_MS_CCCR0 +#define MSR_P4_MS_CCCR0 0x364 +#endif +#ifndef MSR_P4_MS_CCCR1 +#define MSR_P4_MS_CCCR1 0x365 +#endif +#ifndef MSR_P4_MS_CCCR2 +#define MSR_P4_MS_CCCR2 0x366 +#endif +#ifndef MSR_P4_MS_CCCR3 +#define MSR_P4_MS_CCCR3 0x367 +#endif +#ifndef MSR_P4_FLAME_CCCR0 +#define 
MSR_P4_FLAME_CCCR0 0x368 +#endif +#ifndef MSR_P4_FLAME_CCCR1 +#define MSR_P4_FLAME_CCCR1 0x369 +#endif +#ifndef MSR_P4_FLAME_CCCR2 +#define MSR_P4_FLAME_CCCR2 0x36a +#endif +#ifndef MSR_P4_FLAME_CCCR3 +#define MSR_P4_FLAME_CCCR3 0x36b +#endif +#ifndef MSR_P4_IQ_CCCR0 +#define MSR_P4_IQ_CCCR0 0x36c +#endif +#ifndef MSR_P4_IQ_CCCR1 +#define MSR_P4_IQ_CCCR1 0x36d +#endif +#ifndef MSR_P4_IQ_CCCR2 +#define MSR_P4_IQ_CCCR2 0x36e +#endif +#ifndef MSR_P4_IQ_CCCR3 +#define MSR_P4_IQ_CCCR3 0x36f +#endif +#ifndef MSR_P4_IQ_CCCR4 +#define MSR_P4_IQ_CCCR4 0x370 +#endif +#ifndef MSR_P4_IQ_CCCR5 +#define MSR_P4_IQ_CCCR5 0x371 +#endif + + +#ifndef MSR_P4_ALF_ESCR0 +#define MSR_P4_ALF_ESCR0 0x3ca +#endif +#ifndef MSR_P4_ALF_ESCR1 +#define MSR_P4_ALF_ESCR1 0x3cb +#endif +#ifndef MSR_P4_BPU_ESCR0 +#define MSR_P4_BPU_ESCR0 0x3b2 +#endif +#ifndef MSR_P4_BPU_ESCR1 +#define MSR_P4_BPU_ESCR1 0x3b3 +#endif +#ifndef MSR_P4_BSU_ESCR0 +#define MSR_P4_BSU_ESCR0 0x3a0 +#endif +#ifndef MSR_P4_BSU_ESCR1 +#define MSR_P4_BSU_ESCR1 0x3a1 +#endif +#ifndef MSR_P4_CRU_ESCR0 +#define MSR_P4_CRU_ESCR0 0x3b8 +#endif +#ifndef MSR_P4_CRU_ESCR1 +#define MSR_P4_CRU_ESCR1 0x3b9 +#endif +#ifndef MSR_P4_CRU_ESCR2 +#define MSR_P4_CRU_ESCR2 0x3cc +#endif +#ifndef MSR_P4_CRU_ESCR3 +#define MSR_P4_CRU_ESCR3 0x3cd +#endif +#ifndef MSR_P4_CRU_ESCR4 +#define MSR_P4_CRU_ESCR4 0x3e0 +#endif +#ifndef MSR_P4_CRU_ESCR5 +#define MSR_P4_CRU_ESCR5 0x3e1 +#endif +#ifndef MSR_P4_DAC_ESCR0 +#define MSR_P4_DAC_ESCR0 0x3a8 +#endif +#ifndef MSR_P4_DAC_ESCR1 +#define MSR_P4_DAC_ESCR1 0x3a9 +#endif +#ifndef MSR_P4_FIRM_ESCR0 +#define MSR_P4_FIRM_ESCR0 0x3a4 +#endif +#ifndef MSR_P4_FIRM_ESCR1 +#define MSR_P4_FIRM_ESCR1 0x3a5 +#endif +#ifndef MSR_P4_FLAME_ESCR0 +#define MSR_P4_FLAME_ESCR0 0x3a6 +#endif +#ifndef MSR_P4_FLAME_ESCR1 +#define MSR_P4_FLAME_ESCR1 0x3a7 +#endif +#ifndef MSR_P4_FSB_ESCR0 +#define MSR_P4_FSB_ESCR0 0x3a2 +#endif +#ifndef MSR_P4_FSB_ESCR1 +#define MSR_P4_FSB_ESCR1 0x3a3 +#endif +#ifndef MSR_P4_IQ_ESCR0 +#define 
MSR_P4_IQ_ESCR0 0x3ba +#endif +#ifndef MSR_P4_IQ_ESCR1 +#define MSR_P4_IQ_ESCR1 0x3bb +#endif +#ifndef MSR_P4_IS_ESCR0 +#define MSR_P4_IS_ESCR0 0x3b4 +#endif +#ifndef MSR_P4_IS_ESCR1 +#define MSR_P4_IS_ESCR1 0x3b5 +#endif +#ifndef MSR_P4_ITLB_ESCR0 +#define MSR_P4_ITLB_ESCR0 0x3b6 +#endif +#ifndef MSR_P4_ITLB_ESCR1 +#define MSR_P4_ITLB_ESCR1 0x3b7 +#endif +#ifndef MSR_P4_IX_ESCR0 +#define MSR_P4_IX_ESCR0 0x3c8 +#endif +#ifndef MSR_P4_IX_ESCR1 +#define MSR_P4_IX_ESCR1 0x3c9 +#endif +#ifndef MSR_P4_MOB_ESCR0 +#define MSR_P4_MOB_ESCR0 0x3aa +#endif +#ifndef MSR_P4_MOB_ESCR1 +#define MSR_P4_MOB_ESCR1 0x3ab +#endif +#ifndef MSR_P4_MS_ESCR0 +#define MSR_P4_MS_ESCR0 0x3c0 +#endif +#ifndef MSR_P4_MS_ESCR1 +#define MSR_P4_MS_ESCR1 0x3c1 +#endif +#ifndef MSR_P4_PMH_ESCR0 +#define MSR_P4_PMH_ESCR0 0x3ac +#endif +#ifndef MSR_P4_PMH_ESCR1 +#define MSR_P4_PMH_ESCR1 0x3ad +#endif +#ifndef MSR_P4_RAT_ESCR0 +#define MSR_P4_RAT_ESCR0 0x3bc +#endif +#ifndef MSR_P4_RAT_ESCR1 +#define MSR_P4_RAT_ESCR1 0x3bd +#endif +#ifndef MSR_P4_SAAT_ESCR0 +#define MSR_P4_SAAT_ESCR0 0x3ae +#endif +#ifndef MSR_P4_SAAT_ESCR1 +#define MSR_P4_SAAT_ESCR1 0x3af +#endif +#ifndef MSR_P4_SSU_ESCR0 +#define MSR_P4_SSU_ESCR0 0x3be +#endif +#ifndef MSR_P4_SSU_ESCR1 +#define MSR_P4_SSU_ESCR1 0x3bf /* guess: not defined in manual */ +#endif +#ifndef MSR_P4_TBPU_ESCR0 +#define MSR_P4_TBPU_ESCR0 0x3c2 +#endif +#ifndef MSR_P4_TBPU_ESCR1 +#define MSR_P4_TBPU_ESCR1 0x3c3 +#endif +#ifndef MSR_P4_TC_ESCR0 +#define MSR_P4_TC_ESCR0 0x3c4 +#endif +#ifndef MSR_P4_TC_ESCR1 +#define MSR_P4_TC_ESCR1 0x3c5 +#endif +#ifndef MSR_P4_U2L_ESCR0 +#define MSR_P4_U2L_ESCR0 0x3b0 +#endif +#ifndef MSR_P4_U2L_ESCR1 +#define MSR_P4_U2L_ESCR1 0x3b1 +#endif + #endif /* OP_MSR_H */ --- module/x86/op_nmi.c Mon Sep 23 17:34:52 2002 +++ module/x86/op_nmi.c Mon Sep 23 16:44:43 2002 @@ -27,7 +27,9 @@ case CPU_ATHLON: model = &op_athlon_spec; break; - + case CPU_P4: + model = &op_p4_spec; + break; default: model = &op_ppro_spec; break; @@ -326,7 
+328,7 @@ } -static char *names[] = { "0", "1", "2", "3", "4", }; +static char *names[] = { "0", "1", "2", "3", "4", "5", "6", "7", "8" }; static int pmc_add_sysctls(ctl_table * next) { --- module/x86/op_x86_model.h Sun Sep 22 13:46:00 2002 +++ module/x86/op_x86_model.h Mon Sep 23 16:35:08 2002 @@ -44,5 +44,6 @@ extern struct op_x86_model_spec const op_ppro_spec; extern struct op_x86_model_spec const op_athlon_spec; +extern struct op_x86_model_spec const op_p4_spec; #endif /* OP_X86_MODEL_H */ |
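The ESCR_* and CCCR_* macros in op_model_p4.c build register values bit by bit. The following is a minimal user-space sketch of the same arithmetic, so a value can be checked by hand. It assumes the field positions those macros use: T0 USR at bit 2, T0 OS at bit 3, event mask at bits 24:9, event select at bits 30:25, ESCR select at bits 15:13, enable at bit 12, OVF_PMI at bit 26, and required bits 17:16. The helper names p4_escr_value() and p4_cccr_value() are hypothetical. Unlike ESCR_CLEAR()/CCCR_CLEAR(), the sketch starts from zero instead of preserving reserved bits read back from the MSR. The event select field is six bits wide, so the sketch masks it with 0x3f: codes such as 0x34 (SSE_INPUT_ASSIST) do not fit in five bits.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions assumed from the ESCR_* / CCCR_* macros in
 * op_model_p4.c; only the thread-0 USR/OS flags are handled. */
#define ESCR_USR_SHIFT          2
#define ESCR_OS_SHIFT           3
#define ESCR_EVENT_MASK_SHIFT   9   /* 16-bit field, bits 24:9 */
#define ESCR_EVENT_SELECT_SHIFT 25  /* 6-bit field, bits 30:25 */

#define CCCR_ENABLE_BIT        (1u << 12)
#define CCCR_ESCR_SELECT_SHIFT 13           /* 3-bit field, bits 15:13 */
#define CCCR_REQUIRED_BITS     0x00030000u  /* bits 17:16 must be set */
#define CCCR_OVF_PMI_BIT       (1u << 26)

/* Hypothetical helper: build an ESCR value from scratch (reserved bits 0). */
static uint32_t p4_escr_value(uint32_t event_select, uint32_t unit_mask,
                              int user, int kernel)
{
	uint32_t escr = 0;

	escr |= (uint32_t)(user & 1) << ESCR_USR_SHIFT;
	escr |= (uint32_t)(kernel & 1) << ESCR_OS_SHIFT;
	escr |= (event_select & 0x3f) << ESCR_EVENT_SELECT_SHIFT;
	escr |= (unit_mask & 0xffff) << ESCR_EVENT_MASK_SHIFT;
	return escr;
}

/* Hypothetical helper: build a CCCR value that routes escr_select to the
 * counter and raises a PMI on overflow; enable sets the enable bit. */
static uint32_t p4_cccr_value(uint32_t escr_select, int enable)
{
	uint32_t cccr = CCCR_REQUIRED_BITS | CCCR_OVF_PMI_BIT;

	assert(escr_select <= 0x07); /* the field is only three bits wide */
	cccr |= escr_select << CCCR_ESCR_SELECT_SHIFT;
	if (enable)
		cccr |= CCCR_ENABLE_BIT;
	return cccr;
}
```

For example, take INSTR_RETIRED (ESCR select 0x04, event select 0x02) with unit mask 0x1, counting in both user and kernel mode. The sketch gives an ESCR value of 0x0400020C and, with the enable bit set, a CCCR value of 0x04039000.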
From: John L. <le...@mo...> - 2002-09-24 13:35:58
On Mon, Sep 23, 2002 at 11:10:27PM -0400, gr...@re... wrote:
> become quite a bit nicer to work with). I've tested this on a Real
> Live model-1 P4, and an athlon for good measure. It appears to work

Does this mean op_model_athlon code is tested ?

> * use of this variable is for static/local array dimension. Never use it in
> * loop or in array index access/index checking unless you know what you
> * made. Don't change it without updating OP_BITS_CTR! */

You can remove the OP_BITS_CTR comment as discussed.

> +++ module/x86/op_model_p4.c Mon Sep 23 22:09:12 2002
> @@ -0,0 +1,493 @@
> +/**
> + * @file op_model_athlon.h

Change filename.

> +struct p4_counter_binding {
> + int virt_counter;
> + int counter_address;
> + int cccr_address;
> +};

Use tab for indentation.

> +struct p4_event_binding p4_events[0x27] = {

Can we use a define instead of 0x27 ?

> + const int maxbind = 2;

int const

> + struct p4_event_binding *ev = NULL;

* ev

> + return;
> +}

Remove return

The patch looks good to me. Haven't tested.

regards
john

-- 
"The only perfect circle on the human body is the eye. When a baby is born its so perfect but when it opens its eyes its just blinded by the corruption and everything else is a downward spiral." - Richey Edwards
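The review asks for a named constant in place of the literal 0x27. Below is a minimal sketch of the event lookup done in pmc_setup_one_p4_counter() under that suggestion. It assumes a hypothetical P4_NUM_EVENTS constant and simplified structs; the field names mirror the patch, but the struct definitions are illustrative. The virtual event codes are 1-based, so the range check has to reject code 0 as well as codes above the table size. A binding matches when its virt_counter bit set contains the requested counter's bit. The sketch marks an unused binding slot with 0 rather than {-1,-1}, because -1 has every bit set and would therefore match any counter_bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant replacing the literal 0x27 (number of virtual
 * P4 event codes, numbered 0x01 .. 0x27). */
#define P4_NUM_EVENTS   0x27
#define P4_MAX_BINDINGS 2

#define CTR_BPU_0 (1 << 0)
#define CTR_BPU_2 (1 << 1)

struct p4_binding {
	int virt_counter;   /* bit set of virtual counters; 0 = unused slot */
	int escr_address;
};

struct p4_event {
	int escr_select;
	int event_select;
	struct p4_binding bindings[P4_MAX_BINDINGS];
};

/* Two rows taken from the patch's table (IOQ_ALLOCATION = 0x0d,
 * IOQ_ACTIVE_ENTRIES = 0x0e, ESCRs 0x3a2/0x3a3); other rows stay zeroed. */
static const struct p4_event demo_events[P4_NUM_EVENTS] = {
	[0x0d - 1] = { 0x06, 0x03, { { CTR_BPU_0, 0x3a2 }, { 0, 0 } } },
	[0x0e - 1] = { 0x06, 0x1a, { { CTR_BPU_2, 0x3a3 }, { 0, 0 } } },
};

/* Return the ESCR address that event_code uses on virtual counter ctr,
 * or -1 if the code is out of range or the counter cannot count it. */
static int p4_find_escr(const struct p4_event *events,
                        unsigned int event_code, unsigned int ctr)
{
	const struct p4_event *ev;
	unsigned int counter_bit = 1u << ctr;
	int i;

	assert(events != NULL);
	if (event_code < 1 || event_code > P4_NUM_EVENTS)
		return -1;

	ev = &events[event_code - 1];
	for (i = 0; i < P4_MAX_BINDINGS; i++) {
		if (ev->bindings[i].virt_counter & counter_bit)
			return ev->bindings[i].escr_address;
	}
	return -1;
}
```

With this table, counter 0 (CTR_BPU_0) resolves IOQ_ALLOCATION to MSR 0x3a2 (MSR_P4_FSB_ESCR0). Counter 1 cannot count that event, so the lookup returns -1. Event code 0 is also rejected instead of indexing before the start of the array.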
From: <gr...@re...> - 2002-09-24 19:08:25
At Tue, 24 Sep 2002 14:32:21 +0100, John Levon wrote:
> Does this mean op_model_athlon code is tested ?

yes, as with this copy of the patch, which includes your and will's
corrections; p4 and athlon tested. I've also included some documentation
additions, to cover the p4.

-graydon

--- ChangeLog Mon Sep 23 19:06:19 2002 +++ ChangeLog Tue Sep 24 14:44:25 2002 @@ -1,3 +1,28 @@ +2002-09-23 Graydon Hoare <gr...@re...> + + * doc/oprofile.xml: Add some P4 documentation. + * configure.in: Add detection of different stylesheet paths. + * doc/xsl/xhtml.xsl.in: Parameterize by configure's result. + * doc/xsl/xhtml.xsl: Remove. + * dae/opd_sample_files.c: Change unit mask from 8 to 16 bits. + * gui/oprof_start.cpp: Change number of unit masks from 7 to 16. + * gui/ui/oprof_start.base.ui: Likewise. + * libop/op_cpu_type.c: Add P4 CPU type. + * libop/op_events.h: Change unit mask bit width, number. + * libop/op_events.c: Add P4 events, unit masks. + * libop/op_hw_config.h: Set OP_MAX_COUNTERS to 8. + * libop++/op_print_event.cpp: Change unit mask bit width. + * libop++/op_print_event.h: Likewise. + * module/oprofile.c: Add extra sysctls for counters 5-8. + * module/x86/Makefile.in: Add op_model_p4.o to obj list. + * module/x86/cpu_type.c: Change CPU identification to handle P4. + * module/x86/op_apic.c: (enable_apic): APIC_MAXLVT < 4, not != 4. + (check_cpu_ok): Accept CPU_P4. + * module/x86/op_model_p4.c: New file. + * module/x86/op_nmi.c: (get_model): Handle CPU_P4. + Add sysctl names for counters 5-8. + * module/x86/op_x86_model.h: Declare extern op_p4_spec.
+ 2002-09-24 Philippe Elie <ph...@wa...> * dae/opd_image.c: --- configure.in Mon Sep 23 17:34:51 2002 +++ configure.in Tue Sep 24 01:05:56 2002 @@ -260,6 +260,14 @@ AC_PATH_XTRA LIBS="$X_PRE_LIBS $LIBS $X_LIBS -lX11 $X_EXTRA_LIBS" QT_DO_IT_ALL + +# try to find the docbook stylesheets +if test -e /usr/share/sgml/docbook/stylesheet/xsl/nwalsh/xhtml/docbook.xsl; then + db_xsl=/usr/share/sgml/docbook/stylesheet/xsl/nwalsh/xhtml/docbook.xsl +else + db_xsl=/usr/share/sgml/docbook/xsl-stylesheets/xhtml/docbook.xsl +fi +AC_SUBST(db_xsl) # do NOT put tests here, they will fail in the case X is not installed ! @@ -281,6 +289,7 @@ daemon/Makefile \ utils/Makefile \ doc/Makefile \ + doc/xsl/xhtml.xsl \ doc/oprofile.1 \ pp/Makefile \ gui/Makefile \ --- dae/opd_sample_files.c Sat Sep 7 14:19:34 2002 +++ dae/opd_sample_files.c Mon Sep 23 15:44:48 2002 @@ -28,7 +28,7 @@ extern int separate_samples; extern u32 ctr_count[OP_MAX_COUNTERS]; extern u8 ctr_event[OP_MAX_COUNTERS]; -extern u8 ctr_um[OP_MAX_COUNTERS]; +extern u16 ctr_um[OP_MAX_COUNTERS]; extern double cpu_speed; extern op_cpu cpu_type; --- doc/oprofile.xml Mon Sep 2 08:48:06 2002 +++ doc/oprofile.xml Tue Sep 24 13:07:53 2002 @@ -56,12 +56,14 @@ </listitem> </varlistentry> <varlistentry> - <term>Intel P6 processor or AMD Athlon/Duron.</term> + <term>Intel P6 processor, Pentium 4 / Xeon, or AMD Athlon/Duron.</term> <listitem> - A CPU with a P6 generation core is required. In marketing terms this translates to anything between an Intel - Pentium Pro (NOT Pentium Classics) and a Pentium III, including all Celerons. - The AMD Athlon & Duron CPUs are also supported. - Other CPU types only support the RTC mode of oprofile, please see later in this manual for details. + + A CPU with either a P6 generation or Pentium 4 core is required. In marketing terms this + translates to anything between an Intel Pentium Pro (NOT Pentium Classics) and a Pentium 4 / Xeon, + including all Celerons. The AMD Athlon & Duron CPUs are also supported. 
Other CPU types only + support the RTC mode of oprofile, please see later in this manual for details. + </listitem> </varlistentry> <varlistentry> @@ -541,9 +543,10 @@ at a later time. </para> <para> -If we use an event such as <constant>CPU_CLK_UNHALTED</constant> or <constant>INST_RETIRED</constant>, we can -use the overflow counts as an estimate of actual time spent in each part of code. Alternatively we can -profile interesting data such as the cache behaviour of routines with the other available counters. +If we use an event such as <constant>CPU_CLK_UNHALTED</constant> or <constant>INST_RETIRED</constant> +(<constant>GLOBAL_POWER_EVENTS</constant> or <constant>INSTR_RETIRED</constant>, respectively, on the Pentium 4), we can +use the overflow counts as an estimate of actual time spent in each part of code. Alternatively we can profile interesting +data such as the cache behaviour of routines with the other available counters. </para> <para> However there are several caveats. Firstly there are those issues listed in the Intel manual. There is a delay @@ -587,7 +590,7 @@ <title>OProfile in RTC mode</title> <para> Some CPU types do not provide the needed hardware support to use the hardware performance counters. This includes -some laptops, classic Pentiums, and other CPU types not yet supported by OProfile (such as Cyrix and Pentium IV). +some laptops, classic Pentiums, and other CPU types not yet supported by OProfile (such as Cyrix). On these machines, OProfile falls back to using the real-time clock interrupt to collect samples. This interrupt is also used by the <command>rtc</command> module: you cannot have both the oprofile and rtc modules loaded nor the rtc support compiled in the kernel. 
@@ -612,6 +615,24 @@ </para> </sect2> +<sect2 id="p4"> +<title>Pentium 4 support</title> +<para> +The Pentium 4 / Xeon performance counters are organized around 3 types of model specific registers (MSRs): 45 event +selection control registers (ESCRs), 18 counter configuration control registers (CCCRs) and 18 counters. ESCRs describe a +particular set of events which are to be recorded, and CCCRs bind ESCRs to counters and configure their +operation. Unfortunately the relationship between these registers is quite complex; they cannot all be used with one +another at any time. There is, however, a subset of 8 counters, 8 ESCRs, and 8 CCCRs which can be used independently of +one another, so the OProfile module only accesses those registers, treating them as a bank of 8 "normal" counters, similar +to those in the P6 or Athlon families of CPU. +</para> +<para> +There is currently no support for Precision Event-Based Sampling (PEBS), nor any advanced uses of the Debug Store +(DS). Current support is limited to the conservative extension of OProfile's existing interrupt-based model described +above. +</para> +</sect2> + <sect2 id="sysctl"> <title><command>sysctl</command> tree</title> <para> @@ -891,7 +912,8 @@ <varlistentry> <term><option>--counter</option> nr</term> <listitem><para> - Which counter (0 - N) to extract information for. N is dependent on your cpu type: 1 for Intel CPUs, 3 for Athlon based CPUs. + Which counter (0 - N) to extract information for. N is dependent on your cpu type: 1 for P6 generation CPUs, + 3 for Athlon based CPUs, 7 for Pentium 4 / Xeon CPUs.
</para></listitem> </varlistentry> <varlistentry> --- doc/xsl/xhtml.xsl.in Wed Dec 31 19:00:00 1969 +++ doc/xsl/xhtml.xsl.in Tue Sep 24 01:05:46 2002 @@ -0,0 +1,14 @@ +<?xml version='1.0'?> +<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" +xmlns:doc="http://nwalsh.com/xsl/documentation/1.0" version="1.0"> + +<xsl:import href="@db_xsl@"/> +<xsl:import href="xhtml-common.xsl"/> + +<!-- this will give you the doctype on your chunks --> +<xsl:output method="xml" encoding="ISO-8859-1" indent="yes" +doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN" +doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" +/> + +</xsl:stylesheet> --- gui/oprof_start.cpp Mon Sep 23 17:34:52 2002 +++ gui/oprof_start.cpp Mon Sep 23 17:19:01 2002 @@ -640,6 +640,15 @@ get_unit_mask_part(descr, 4, check4->isChecked(), mask); get_unit_mask_part(descr, 5, check5->isChecked(), mask); get_unit_mask_part(descr, 6, check6->isChecked(), mask); + get_unit_mask_part(descr, 7, check7->isChecked(), mask); + get_unit_mask_part(descr, 8, check8->isChecked(), mask); + get_unit_mask_part(descr, 9, check9->isChecked(), mask); + get_unit_mask_part(descr, 10, check10->isChecked(), mask); + get_unit_mask_part(descr, 11, check11->isChecked(), mask); + get_unit_mask_part(descr, 12, check12->isChecked(), mask); + get_unit_mask_part(descr, 13, check13->isChecked(), mask); + get_unit_mask_part(descr, 14, check14->isChecked(), mask); + get_unit_mask_part(descr, 15, check15->isChecked(), mask); return mask; } @@ -652,6 +661,15 @@ check4->hide(); check5->hide(); check6->hide(); + check7->hide(); + check8->hide(); + check9->hide(); + check10->hide(); + check11->hide(); + check12->hide(); + check13->hide(); + check14->hide(); + check15->hide(); } void oprof_start::setup_unit_masks(op_event_descr const & descr) @@ -677,6 +695,15 @@ case 4: check = check4; break; case 5: check = check5; break; case 6: check = check6; break; + case 7: check = check7; break; + case 8: check = check8; break; + case 
9: check = check9; break; + case 10: check = check10; break; + case 11: check = check11; break; + case 12: check = check12; break; + case 13: check = check13; break; + case 14: check = check14; break; + case 15: check = check15; break; } check->setText(um->um[i].desc); if (um->unit_type_mask == utm_exclusive) { --- gui/ui/oprof_start.base.ui Tue Jul 23 21:14:15 2002 +++ gui/ui/oprof_start.base.ui Mon Sep 23 15:47:36 2002 @@ -410,6 +410,105 @@ <string>check6</string> </property> </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check7</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check7</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check8</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check8</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check9</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check9</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check10</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check10</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check11</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check11</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check12</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check12</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check13</cstring> + </property> + <property stdset="1"> + <name>text</name> + 
<string>check13</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check14</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check14</string> + </property> + </widget> + <widget> + <class>QCheckBox</class> + <property stdset="1"> + <name>name</name> + <cstring>check15</cstring> + </property> + <property stdset="1"> + <name>text</name> + <string>check15</string> + </property> + </widget> <spacer> <property> <name>name</name> --- libop/op_cpu_type.c Mon Sep 23 17:34:52 2002 +++ libop/op_cpu_type.c Mon Sep 23 15:48:41 2002 @@ -49,7 +49,8 @@ "PIII", "Athlon", "CPU with timer interrupt", - "CPU with RTC device" + "CPU with RTC device", + "P4 / Xeon" }; @@ -76,7 +77,8 @@ 2, /* PIII */ 4, /* Athlon */ 1, /* Timer interrupt */ - 1 /* RTC */ + 1, /* RTC */ + 8 /* P4 / Xeon */ }; /** --- libop/op_cpu_type.h Mon Sep 23 17:34:52 2002 +++ libop/op_cpu_type.h Mon Sep 23 15:49:07 2002 @@ -25,6 +25,7 @@ CPU_ATHLON, /**< AMD P6 series */ CPU_TIMER_INT, /**< CPU using the timer interrupt */ CPU_RTC, /**< other CPU to use the RTC */ + CPU_P4, /**< Pentium 4 / Xeon series */ MAX_CPU_TYPE } op_cpu; --- libop/op_events.c Thu Sep 19 19:25:28 2002 +++ libop/op_events.c Tue Sep 24 11:27:18 2002 @@ -93,6 +93,244 @@ {0x1, "(I)nvalid cache state"}, {0x1f, "all MOESI cache state"} } }; +/* pentium 4 events */ + +/* BRANCH_RETIRED */ +static struct op_unit_mask um_branch_retired = + {4, utm_bitmask, 0x0c, + { {0x01, "branch not-taken predicted"}, + {0x02, "branch not-taken mispredicted"}, + {0x04, "branch taken predicted"}, + {0x08, "branch taken mispredicted"} } }; + +/* MISPRED_BRANCH_RETIRED */ +static struct op_unit_mask um_mispred_branch_retired = + {1, utm_bitmask, 0x01, + { {0x01, "retired instruction is non-bogus"} } }; + +/* TC_DELIVER_MODE */ +static struct op_unit_mask um_tc_deliver_mode = + {8, utm_bitmask, 0x01, + { {0x01, "both logical processors in deliver mode"}, + {0x02, 
"logical processor 0 in deliver mode, 1 in build mode"}, + {0x04, "logical processor 0 in deliver mode, 1 in halt/clear/trans mode"}, + {0x08, "logical processor 0 in build mode, 1 in deliver mode"}, + {0x10, "both logical processors in build mode"}, + {0x20, "logical processor 0 in build mode, 1 in halt/clear/trans mode"}, + {0x40, "logical processor 0 in halt/clear/trans mode, 1 in deliver mode"}, + {0x80, "logical processor 0 in halt/clear/trans mode, 1 in build mode"} } }; + +/* BPU_FETCH_REQUEST */ +static struct op_unit_mask um_bpu_fetch_request = + {1, utm_bitmask, 0x00, + {{0x01, "trace cache lookup miss"} } }; + +/* ITLB_REFERENCE */ +static struct op_unit_mask um_itlb_reference = + {3, utm_bitmask, 0x07, + { {0x01, "ITLB hit"}, + {0x02, "ITLB miss"}, + {0x04, "uncacheable ITLB hit"} } }; + +/* MEMORY_CANCEL */ +static struct op_unit_mask um_memory_cancel = + {2, utm_bitmask, 0x06, + { {0x04, "replayed because no store request buffer available"}, + {0x08, "conflicts due to 64k aliasing"} } }; + +/* MEMORY_COMPLETE */ +static struct op_unit_mask um_memory_complete = + {2, utm_bitmask, 0x03, + { {0x01, "load split completed, excluding UC/WC loads"}, + {0x02, "any split stores completed"} } }; + +/* LOAD_PORT_REPLAY */ +static struct op_unit_mask um_load_port_replay = + {1, utm_bitmask, 0x02, + { {0x02, "split load"} } }; + +/* STORE_PORT_REPLAY */ +static struct op_unit_mask um_store_port_replay = + {1, utm_bitmask, 0x02, + { {0x02, "split store"} } }; + +/* MOB_LOAD_REPLAY */ +static struct op_unit_mask um_mob_load_replay = + {4, utm_bitmask, 0x3a, + { {0x02, "replay cause: unknown store address"}, + {0x08, "replay cause: unknown store data"}, + {0x10, "replay cause: partial overlap between load and store"}, + {0x20, "replay cause: mismatched low 4 bits between load and store addr"} } }; + +/* PAGE_WALK_TYPE */ +static struct op_unit_mask um_page_walk_type = + {2, utm_bitmask, 0x03, + { {0x01, "page walk for data TLB miss"}, + {0x02, "page walk for 
instruction TLB miss"} } }; + +/* BSQ_CACHE_REFERENCE */ +static struct op_unit_mask um_bsq_cache_reference = + {9, utm_bitmask, 0x7ff, + { {0x01, "read 2nd level cache hit shared"}, + {0x02, "read 2nd level cache hit exclusive"}, + {0x04, "read 2nd level cache hit modified"}, + {0x08, "read 3rd level cache hit shared"}, + {0x10, "read 3rd level cache hit exclusive"}, + {0x20, "read 3rd level cache hit modified"}, + {0x100, "read 2nd level cache miss"}, + {0x200, "read 3rd level cache miss"}, + {0x400, "writeback lookup from DAC misses 2nd level cache"} } }; + +/* IOQ_ALLOCATION */ +/* IOQ_ACTIVE_ENTRIES */ +static struct op_unit_mask um_ioq = + {15, utm_bitmask, 0xefe1, + { {0x01, "bus request type bit 0"}, + {0x02, "bus request type bit 1"}, + {0x04, "bus request type bit 2"}, + {0x08, "bus request type bit 3"}, + {0x10, "bus request type bit 4"}, + {0x20, "count read entries"}, + {0x40, "count write entries"}, + {0x80, "count UC memory access entries"}, + {0x100, "count WC memory access entries"}, + {0x200, "count write-through memory access entries"}, + {0x400, "count write-protected memory access entries"}, + {0x800, "count WB memory access entries"}, + {0x2000, "count own store requests"}, + {0x4000, "count other / DMA store requests"}, + {0x8000, "count HW/SW prefetch requests"} } }; + +/* FSB_DATA_ACTIVITY */ +static struct op_unit_mask um_fsb_data_activity = + {6, utm_bitmask, 0x3f, + { {0x01, "count when this processor drives data onto bus"}, + {0x02, "count when this processor reads data from bus"}, + {0x04, "count when data is on bus but not sampled by this processor"}, + {0x08, "count when this processor reserves bus for driving"}, + {0x10, "count when other reserves bus and this processor will sample"}, + {0x20, "count when other reserves bus and this processor will not sample"} } }; + +/* BSQ_ALLOCATION */ +/* BSQ_ACTIVE_ENTRIES */ +static struct op_unit_mask um_bsq = + {13, utm_bitmask, 0x21, + { {0x01, "(r)eq (t)ype (e)ncoding, bit 0: see next 
event"}, + {0x02, "rte bit 1: 00=read, 01=read invalidate, 10=write, 11=writeback"}, + {0x04, "req len bit 0"}, + {0x08, "req len bit 1"}, + {0x20, "request type is input (0=output)"}, + {0x40, "request type is bus lock"}, + {0x80, "request type is cacheable"}, + {0x100, "request type is 8-byte chunk split across 8-byte boundary"}, + {0x200, "request type is demand (0=prefetch)"}, + {0x400, "request type is ordered"}, + {0x800, "(m)emory (t)ype (e)ncoding, bit 0: see next events"}, + {0x1000, "mte bit 1: see next event"}, + {0x2000, "mte bit 2: 000=UC, 001=USWC, 100=WT, 101=WP, 110=WB"} } }; + +/* X87_ASSIST */ +static struct op_unit_mask um_x87_assist = + {5, utm_bitmask, 0x1f, + { {0x01, "handle FP stack underflow"}, + {0x02, "handle FP stack overflow"}, + {0x04, "handle x87 output overflow"}, + {0x08, "handle x87 output underflow"}, + {0x10, "handle x87 input assist"} } }; + +/* SSE_INPUT_ASSIST */ +/* {PACKED,SCALAR}_{SP,DP}_UOP */ +/* {64,128}BIT_MMX_UOP */ +/* X87_FP_UOP */ +static struct op_unit_mask um_flame_uop = + {1, utm_bitmask, 0x8000, + { {0x8000, "count all uops of this type" } } }; + +/* X87_SIMD_MOVES_UOP */ +static struct op_unit_mask um_x87_simd_moves_uop = + {2, utm_bitmask, 0x18, + { { 0x08, "count all x87 SIMD store/move uops"}, + { 0x10, "count all x87 SIMD load uops"} } }; + +/* MACHINE_CLEAR */ +static struct op_unit_mask um_machine_clear = + {3, utm_bitmask, 0x1, + { {0x01, "count a portion of cycles the machine is cleared for any cause"}, + {0x40, "count cycles machine is cleared due to memory ordering issues"}, + {0x80, "count cycles machine is cleared due to self modifying code"} } }; + +/* GLOBAL_POWER_EVENTS */ +static struct op_unit_mask um_global_power_events = + {1, utm_bitmask, 0x1, + { {0x01, "count cycles when processor is active"} } }; + +/* TC_MS_XFER */ +static struct op_unit_mask um_tc_ms_xfer = + {1, utm_bitmask, 0x1, + { {0x01, "count TC to MS transfers"} } }; + +/* UOP_QUEUE_WRITES */ +static struct op_unit_mask 
um_uop_queue_writes = + {3, utm_bitmask, 0x7, + { {0x01, "count uops written to queue from TC build mode"}, + {0x02, "count uops written to queue from TC deliver mode"}, + {0x04, "count uops written to queue from microcode ROM" } } }; + +/* FRONT_END_EVENT */ +static struct op_unit_mask um_front_end_event = + {2, utm_bitmask, 0x1, + { {0x01, "count marked uops which are non-bogus"}, + {0x02, "count marked uops which are bogus"} } }; + +/* EXECUTION_EVENT */ +static struct op_unit_mask um_execution_event = + {8, utm_bitmask, 0x1, + { {0x01, "count 1st marked uops which are non-bogus"}, + {0x02, "count 2nd marked uops which are non-bogus"}, + {0x04, "count 3rd marked uops which are non-bogus"}, + {0x08, "count 4th marked uops which are non-bogus"}, + {0x10, "count 1st marked uops which are bogus"}, + {0x20, "count 2nd marked uops which are bogus"}, + {0x40, "count 3rd marked uops which are bogus"}, + {0x80, "count 4th marked uops which are bogus"} } }; + +/* REPLAY_EVENT */ +static struct op_unit_mask um_replay_event = + {2, utm_bitmask, 0x1, + { {0x01, "count marked uops which are non-bogus"}, + {0x02, "count marked uops which are bogus"} } }; + +/* INSTR_RETIRED */ +static struct op_unit_mask um_instr_retired = + {4, utm_bitmask, 0x1, + { {0x01, "count non-bogus instructions which are not tagged"}, + {0x02, "count non-bogus instructions which are tagged"}, + {0x04, "count bogus instructions which are not tagged"}, + {0x08, "count bogus instructions which are tagged"} } }; + +/* UOPS_RETIRED */ +static struct op_unit_mask um_uops_retired = + {2, utm_bitmask, 0x1, + { {0x01, "count marked uops which are non-bogus"}, + {0x02, "count marked uops which are bogus"} } }; + +/* UOP_TYPE */ +static struct op_unit_mask um_uop_type = + {2, utm_bitmask, 0x2, + { {0x02, "count uops which are load operations"}, + {0x04, "count uops which are store operations"} } }; + +/* RETIRED_MISPRED_BRANCH_TYPE */ +/* RETIRED_BRANCH_TYPE */ +static struct op_unit_mask um_branch_type = + {4,
utm_bitmask, 0x1e, + { {0x02, "count conditional jumps"}, + {0x04, "count indirect call branches"}, + {0x08, "count return branches"}, + {0x10, "count returns, indirect calls or indirect jumps"} } }; + + + /* the following are just short cut for filling the table of event */ #define OP_RTC (1 << CPU_RTC) #define OP_ATHLON (1 << CPU_ATHLON) @@ -102,9 +340,33 @@ #define OP_PII_PIII (OP_PII | OP_PIII) #define OP_IA_ALL (OP_PII_PIII | OP_PPRO) +#define OP_P4 (1 << CPU_P4) + #define CTR_0 (1 << 0) #define CTR_1 (1 << 1) +/* the pentium 4 has a complex set of restrictions between its 18 + counters, so we simplify it a little and say there are 8 counters. these + 8 at least can be treated as entirely independent, although they can + each only count certain classes of events. these defines are also + present in module/x86/op_nmi.c. */ + +#define CTR_BPU_0 (1 << 0) +#define CTR_BPU_2 (1 << 1) +#define CTR_BPU_ALL (CTR_BPU_0 | CTR_BPU_2) + +#define CTR_MS_0 (1 << 2) +#define CTR_MS_2 (1 << 3) +#define CTR_MS_ALL (CTR_MS_0 | CTR_MS_2) + +#define CTR_FLAME_0 (1 << 4) +#define CTR_FLAME_2 (1 << 5) +#define CTR_FLAME_ALL (CTR_FLAME_0 | CTR_FLAME_2) + +#define CTR_IQ_4 (1 << 6) /* #4 for compatibility with PEBS */ +#define CTR_IQ_5 (1 << 7) +#define CTR_IQ_ALL (CTR_IQ_4 | CTR_IQ_5) + /* ctr allowed, allowed cpus, Event #, unit mask, name, min event value */ /* event name must be in one word */ @@ -340,6 +602,86 @@ { CTR_ALL, OP_ATHLON, 0xcf, &um_empty, "HARDWARE_INTERRUPTS", "Number of taken hardware interrupts", 10,}, + /* pentium 4 events */ + { CTR_IQ_ALL, OP_P4, 0x01, &um_branch_retired, "BRANCH_RETIRED", + "retired branches", 3000}, + { CTR_IQ_ALL, OP_P4, 0x02, &um_mispred_branch_retired, "MISPRED_BRANCH_RETIRED", + "retired mispredicted branches", 3000}, + { CTR_MS_ALL, OP_P4, 0x03, &um_tc_deliver_mode, "TC_DELIVER_MODE", + "duration (in clock cycles) in the trace cache and decode engine", 3000}, + { CTR_BPU_ALL, OP_P4, 0x04, &um_bpu_fetch_request, "BPU_FETCH_REQUEST", + 
"instruction fetch requests from the branch predict unit", 3000}, + { CTR_BPU_ALL, OP_P4, 0x05, &um_itlb_reference, "ITLB_REFERENCE", + "translations using the instruction translation lookaside buffer", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x06, &um_memory_cancel, "MEMORY_CANCEL", + "cancelled requesets in data cache address control unit", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x07, &um_memory_complete, "MEMORY_COMPLETE", + "completed load split, store split, uncacheable split, uncacheable load", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x08, &um_load_port_replay, "LOAD_PORT_REPLAY", + "replayed events at the load port", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x09, &um_store_port_replay, "STORE_PORT_REPLAY", + "replayed events at the store port", 3000}, + { CTR_BPU_ALL, OP_P4, 0x0a, &um_mob_load_replay, "MOB_LOAD_REPLAY", + "replayed loads from the memory order buffer", 3000}, + { CTR_BPU_ALL, OP_P4, 0x0b, &um_page_walk_type, "PAGE_WALK_TYPE", + "page walks by the page miss handler", 3000}, + { CTR_BPU_ALL, OP_P4, 0x0c, &um_bsq_cache_reference, "BSQ_CACHE_REFERENCE", + "cache references seen by the bus unit", 3000}, + { CTR_BPU_0, OP_P4, 0x0d, &um_ioq, "IOQ_ALLOCATION", + "bus transactions", 3000}, + { CTR_BPU_2, OP_P4, 0x0e, &um_ioq, "IOQ_ACTIVE_ENTRIES", + "number of entries in the IOQ which are active", 3000}, + { CTR_BPU_ALL, OP_P4, 0x0f, &um_fsb_data_activity, "FSB_DATA_ACTIVITY", + "DRDY or DBSY events on the front side bus", 3000}, + { CTR_BPU_0, OP_P4, 0x10, &um_bsq, "BSQ_ALLOCATION", + "allocations in the bus sequence unit", 3000}, + { CTR_BPU_2, OP_P4, 0x11, &um_bsq, "BSQ_ACTIVE_ENTRIES", + "number of entries in the bus sequence unit which are active", 3000}, + { CTR_IQ_ALL, OP_P4, 0x12, &um_x87_assist, "X87_ASSIST", + "retired x87 instructions which required special handling", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x13, &um_flame_uop, "SSE_INPUT_ASSIST", + "input assists requested for SSE or SSE2 operands", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x14, &um_flame_uop, "PACKED_SP_UOP", + 
"packed single precision uops", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x15, &um_flame_uop, "PACKED_DP_UOP", + "packed double precision uops", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x16, &um_flame_uop, "SCALAR_SP_UOP", + "scalar single precision uops", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x17, &um_flame_uop, "SCALAR_DP_UOP", + "scalar double presision uops", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x18, &um_flame_uop, "64BIT_MMX_UOP", + "64 bit SIMD MMX instructions", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x19, &um_flame_uop, "128BIT_MMX_UOP", + "128 bit SIMD SSE2 instructions", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x1a, &um_flame_uop, "X87_FP_UOP", + "x87 floating point uops", 3000}, + { CTR_FLAME_ALL, OP_P4, 0x1b, &um_x87_simd_moves_uop, "X87_SIMD_MOVES_UOP", + "x87 FPU, MMX, SSE, or SSE2 loads, stores and reg-to-reg moves", 3000}, + { CTR_IQ_ALL, OP_P4, 0x1c, &um_machine_clear, "MACHINE_CLEAR", + "cycles with entire machine pipeline cleared", 3000}, + { CTR_BPU_ALL, OP_P4, 0x1d, &um_global_power_events, "GLOBAL_POWER_EVENTS", + "time during which processor is not stopped", 3000}, + { CTR_MS_ALL, OP_P4, 0x1e, &um_tc_ms_xfer, "TC_MS_XFER", + "number of times uops deliver changed from TC to MS ROM", 3000}, + { CTR_MS_ALL, OP_P4, 0x1f, &um_uop_queue_writes, "UOP_QUEUE_WRITES", + "number of valid uops written to the uop queue", 3000}, + { CTR_IQ_ALL, OP_P4, 0x20, &um_front_end_event, "FRONT_END_EVENT", + "retired uops, tagged with front-end tagging", 3000}, + { CTR_IQ_ALL, OP_P4, 0x21, &um_execution_event, "EXECUTION_EVENT", + "retired uops, tagged with execution tagging", 3000}, + { CTR_IQ_ALL, OP_P4, 0x22, &um_replay_event, "REPLAY_EVENT", + "retired uops, tagged with replay tagging", 3000}, + { CTR_IQ_ALL, OP_P4, 0x23, &um_instr_retired, "INSTR_RETIRED", + "retired instructions", 3000}, + { CTR_IQ_ALL, OP_P4, 0x24, &um_uops_retired, "UOPS_RETIRED", + "retired uops", 3000}, + { CTR_IQ_ALL, OP_P4, 0x25, &um_uop_type, "UOP_TYPE", + "type of uop tagged by front-end tagging", 3000}, + { CTR_MS_ALL, 
OP_P4, 0x26, &um_branch_type, "RETIRED_MISPRED_BRANCH_TYPE", + "retired mispredicted branched, selected by type", 3000}, + { CTR_MS_ALL, OP_P4, 0x27, &um_branch_type, "RETIRED_BRANCH_TYPE", + "retired branches, selected by type", 3000}, + /* other CPUs */ { CTR_0, OP_RTC, 0xff, &um_empty, "RTC_Interrupts", "RTC interrupts/sec (rounded up to power of two)", 2,}, @@ -362,7 +704,7 @@ * > 0 otherwise, in this case allow->um[return value - 1] == um so the * caller can access to the description of the unit_mask. */ -int op_check_unit_mask(struct op_unit_mask const * allow, u8 um) +int op_check_unit_mask(struct op_unit_mask const * allow, u16 um) { u32 i, mask; @@ -439,10 +781,12 @@ * * 3 AMD Athlon * + * 6 Pentium 4 / Xeon + * * The function returns bitmask of failure cause * 0 otherwise */ -int op_check_events(int ctr, u8 ctr_type, u8 ctr_um, op_cpu cpu_type) +int op_check_events(int ctr, u8 ctr_type, u16 ctr_um, op_cpu cpu_type) { int ret = OP_OK_EVENT; u32 i; --- libop/op_events.h Thu Sep 19 19:25:29 2002 +++ libop/op_events.h Mon Sep 23 16:11:07 2002 @@ -40,12 +40,12 @@ struct op_unit_mask { u32 num; /**< number of possible unit masks */ enum unit_mask_type unit_type_mask; - u8 default_mask; /**< only the gui use it */ + u16 default_mask; /**< only the gui use it */ /** up to sixteen allowed unit masks */ struct op_described_um { - u8 value; + u16 value; char const * desc; - } um[7]; + } um[16]; }; /** Describe an event. 
*/ @@ -81,7 +81,7 @@ * * \sa op_cpu, OP_EVENTS_OK */ -int op_check_events(int ctr, u8 ctr_type, u8 ctr_um, op_cpu cpu_type); +int op_check_events(int ctr, u8 ctr_type, u16 ctr_um, op_cpu cpu_type); /** * sanity check unit mask value @@ -99,7 +99,7 @@ * the unit_mask through op_unit_descs * \sa op_unit_descs */ -int op_check_unit_mask(struct op_unit_mask const * allow, u8 um); +int op_check_unit_mask(struct op_unit_mask const * allow, u16 um); /** a special constant meaning this event is available for all counters */ #define CTR_ALL (~0u) --- libop/op_hw_config.h Sat Sep 7 14:19:35 2002 +++ libop/op_hw_config.h Tue Sep 24 11:15:08 2002 @@ -15,8 +15,8 @@ /** maximum number of counters, up to 4 for Athlon (18 for P4). The primary * use of this variable is for static/local array dimension. Never use it in * loop or in array index access/index checking unless you know what you - * made. Don't change it without updating OP_BITS_CTR! */ -#define OP_MAX_COUNTERS 4 + * made. */ +#define OP_MAX_COUNTERS 8 /** a plain unsigned int magic value to check against counter overflow */ #define OP_COUNT_MAX ~0u --- libop++/op_print_event.cpp Thu Sep 19 19:25:29 2002 +++ libop++/op_print_event.cpp Mon Sep 23 16:00:44 2002 @@ -23,7 +23,7 @@ using std::setfill; void op_print_event(ostream & out, int counter_nr, op_cpu cpu_type, - u8 type, u8 um, u32 count) + u8 type, u16 um, u32 count) { char const * typenamep; char const * typedescp; --- libop++/op_print_event.h Mon Sep 23 17:34:52 2002 +++ libop++/op_print_event.h Mon Sep 23 16:00:51 2002 @@ -23,6 +23,6 @@ * to the stream. 
*/ void op_print_event(std::ostream & out, int counter_nr, - op_cpu cpu_type, u8 type, u8 um, u32 count); + op_cpu cpu_type, u8 type, u16 um, u32 count); #endif // OP_PRINT_EVENT --- module/oprofile.c Mon Sep 23 17:34:52 2002 +++ module/oprofile.c Mon Sep 23 16:01:58 2002 @@ -796,7 +796,7 @@ { 1, "nr_interrupts", &sysctl.nr_interrupts, sizeof(int), 0444, NULL, &get_nr_interrupts, NULL, }, { 1, "notesize", &sysctl_parms.note_size, sizeof(int), 0644, NULL, &lproc_dointvec, NULL, }, { 1, "cpu_type", &sysctl.cpu_type, sizeof(int), 0444, NULL, &lproc_dointvec, NULL, }, - { 0, }, { 0, }, { 0, }, { 0, }, + { 0, }, { 0, }, { 0, }, { 0, }, { 0, }, { 0, }, { 0, }, { 0, }, { 0, }, }; --- module/x86/Makefile.in Mon Sep 23 17:34:52 2002 +++ module/x86/Makefile.in Mon Sep 23 14:04:33 2002 @@ -19,7 +19,7 @@ obj-y := cpu_type.o op_apic.o op_fixmap.o op_rtc.o op_nmi.o \ op_syscalls.o op_model_ppro.o op_model_athlon.o \ - oprofile_nmi.o + op_model_p4.o oprofile_nmi.o obj-m := $(O_TARGET) O_OBJS := $(obj-y) M_OBJS := $(O_TARGET) --- module/x86/cpu_type.c Sat Sep 14 22:36:31 2002 +++ module/x86/cpu_type.c Mon Sep 23 16:11:46 2002 @@ -36,17 +36,25 @@ return CPU_ATHLON; case X86_VENDOR_INTEL: - /* Less than a P6-class processor */ - if (family != 6) + switch (family) { + default: return CPU_RTC; - - if (model > 5) - return CPU_PIII; - else if (model > 2) - return CPU_PII; - - return CPU_PPRO; - + case 6: + /* A P6-class processor */ + if (model > 5) + return CPU_PIII; + else if (model > 2) + return CPU_PII; + return CPU_PPRO; + case 0xf: + if (model <= 3) + /* a Pentium 4 processor */ + return CPU_P4; + else + /* Do not know what it is */ + return CPU_RTC; + } + default: return CPU_RTC; } --- module/x86/op_apic.c Sat Sep 7 14:19:39 2002 +++ module/x86/op_apic.c Mon Sep 23 16:08:45 2002 @@ -124,7 +124,7 @@ goto not_local_p6_apic; /* LVT0,LVT1,LVTT,LVTPC */ - if (GET_APIC_MAXLVT(apic_read(APIC_LVR)) != 4) + if (GET_APIC_MAXLVT(apic_read(APIC_LVR)) < 4) goto not_local_p6_apic; /* IA32 V3, 
7.4.14.1 */ @@ -190,7 +190,8 @@ if (sysctl.cpu_type != CPU_PPRO && sysctl.cpu_type != CPU_PII && sysctl.cpu_type != CPU_PIII && - sysctl.cpu_type != CPU_ATHLON) + sysctl.cpu_type != CPU_ATHLON && + sysctl.cpu_type != CPU_P4) return 0; return 1; --- module/x86/op_model_p4.c Wed Dec 31 19:00:00 1969 +++ module/x86/op_model_p4.c Tue Sep 24 14:51:12 2002 @@ -0,0 +1,492 @@ +/** + * @file op_model_p4.c + * P4 model-specific MSR operations + * + * @remark Copyright 2002 OProfile authors + * @remark Read the file COPYING + * + * @author Graydon Hoare + */ + +#include "op_x86_model.h" +#include "op_msr.h" +#include "op_apic.h" + +#define NUM_EVENTS 39 +#define NUM_COUNTERS 8 +#define NUM_ESCRS 45 +#define NUM_CCCRS 18 +#define NUM_CONTROLS (NUM_ESCRS + NUM_CCCRS) + +/* tables to simulate simplified hardware view of p4 registers */ +struct p4_counter_binding { + int virt_counter; + int counter_address; + int cccr_address; +}; + +struct p4_event_binding { + int escr_select; /* value to put in CCCR */ + int event_select; /* value to put in ESCR */ + struct { + int virt_counter; /* for this counter... */ + int escr_address; /* use this ESCR */ + } bindings[2]; +}; + +/* nb: these CTR_* defines are a duplicate of defines in + libop/op_events.c.
*/ + +#define CTR_BPU_0 (1 << 0) +#define CTR_BPU_2 (1 << 1) +#define CTR_MS_0 (1 << 2) +#define CTR_MS_2 (1 << 3) +#define CTR_FLAME_0 (1 << 4) +#define CTR_FLAME_2 (1 << 5) +#define CTR_IQ_4 (1 << 6) +#define CTR_IQ_5 (1 << 7) + +struct p4_counter_binding p4_counters [NUM_COUNTERS] = { + { CTR_BPU_0, MSR_P4_BPU_PERFCTR0, MSR_P4_BPU_CCCR0 }, + { CTR_BPU_2, MSR_P4_BPU_PERFCTR2, MSR_P4_BPU_CCCR2 }, + { CTR_MS_0, MSR_P4_MS_PERFCTR0, MSR_P4_MS_CCCR0 }, + { CTR_MS_2, MSR_P4_MS_PERFCTR2, MSR_P4_MS_CCCR2 }, + { CTR_FLAME_0, MSR_P4_FLAME_PERFCTR0, MSR_P4_FLAME_CCCR0 }, + { CTR_FLAME_2, MSR_P4_FLAME_PERFCTR2, MSR_P4_FLAME_CCCR2 }, + { CTR_IQ_4, MSR_P4_IQ_PERFCTR4, MSR_P4_IQ_CCCR4 }, + { CTR_IQ_5, MSR_P4_IQ_PERFCTR5, MSR_P4_IQ_CCCR5 }, +}; + +/* p4 event codes in libop/op_event.h are indices into this table. */ + +struct p4_event_binding p4_events[NUM_EVENTS] = { + + { /* BRANCH_RETIRED */ + 0x05, 0x06, + { {CTR_IQ_4, MSR_P4_CRU_ESCR2}, + {CTR_IQ_5, MSR_P4_CRU_ESCR3} } + }, + + { /* MISPRED_BRANCH_RETIRED */ + 0x04, 0x03, + { { CTR_IQ_4, MSR_P4_CRU_ESCR0}, + { CTR_IQ_5, MSR_P4_CRU_ESCR1} } + }, + + { /* TC_DELIVER_MODE */ + 0x04, 0x01, + { { CTR_MS_0, MSR_P4_TC_ESCR0}, + { CTR_MS_2, MSR_P4_TC_ESCR1} } + }, + + { /* BPU_FETCH_REQUEST */ + 0x00, 0x03, + { { CTR_BPU_0, MSR_P4_BPU_ESCR0}, + { CTR_BPU_2, MSR_P4_BPU_ESCR1} } + }, + + { /* ITLB_REFERENCE */ + 0x03, 0x18, + { { CTR_BPU_0, MSR_P4_ITLB_ESCR0}, + { CTR_BPU_2, MSR_P4_ITLB_ESCR1} } + }, + + { /* MEMORY_CANCEL */ + 0x05, 0x02, + { { CTR_FLAME_0, MSR_P4_DAC_ESCR0}, + { CTR_FLAME_2, MSR_P4_DAC_ESCR1} } + }, + + { /* MEMORY_COMPLETE */ + 0x02, 0x08, + { { CTR_FLAME_0, MSR_P4_SAAT_ESCR0}, + { CTR_FLAME_2, MSR_P4_SAAT_ESCR1} } + }, + + { /* LOAD_PORT_REPLAY */ + 0x02, 0x04, + { { CTR_FLAME_0, MSR_P4_SAAT_ESCR0}, + { CTR_FLAME_2, MSR_P4_SAAT_ESCR1} } + }, + + { /* STORE_PORT_REPLAY */ + 0x02, 0x05, + { { CTR_FLAME_0, MSR_P4_SAAT_ESCR0}, + { CTR_FLAME_2, MSR_P4_SAAT_ESCR1} } + }, + + { /* MOB_LOAD_REPLAY */ + 0x02, 0x03, + { { 
CTR_BPU_0, MSR_P4_MOB_ESCR0}, + { CTR_BPU_2, MSR_P4_MOB_ESCR1} } + }, + + { /* PAGE_WALK_TYPE */ + 0x04, 0x01, + { { CTR_BPU_0, MSR_P4_PMH_ESCR0}, + { CTR_BPU_2, MSR_P4_PMH_ESCR1} } + }, + + { /* BSQ_CACHE_REFERENCE */ + 0x07, 0x0c, + { { CTR_BPU_0, MSR_P4_BSU_ESCR0}, + { CTR_BPU_2, MSR_P4_BSU_ESCR1} } + }, + + { /* IOQ_ALLOCATION */ + 0x06, 0x03, + { { CTR_BPU_0, MSR_P4_FSB_ESCR0}, + {-1,-1} } + }, + + { /* IOQ_ACTIVE_ENTRIES */ + 0x06, 0x1a, + { { CTR_BPU_2, MSR_P4_FSB_ESCR1}, + {-1,-1} } + }, + + { /* FSB_DATA_ACTIVITY */ + 0x06, 0x17, + { { CTR_BPU_0, MSR_P4_FSB_ESCR0}, + { CTR_BPU_2, MSR_P4_FSB_ESCR1} } + }, + + { /* BSQ_ALLOCATION */ + 0x07, 0x05, + { { CTR_BPU_0, MSR_P4_BSU_ESCR0}, + {-1,-1} } + }, + + { /* BSQ_ACTIVE_ENTRIES */ + 0x07, 0x06, + { { CTR_BPU_2, MSR_P4_BSU_ESCR1 /* guess */}, + {-1,-1} } + }, + + { /* X87_ASSIST */ + 0x05, 0x03, + { { CTR_IQ_4, MSR_P4_CRU_ESCR2}, + { CTR_IQ_5, MSR_P4_CRU_ESCR3} } + }, + + { /* SSE_INPUT_ASSIST */ + 0x01, 0x34, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* PACKED_SP_UOP */ + 0x01, 0x08, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* PACKED_DP_UOP */ + 0x01, 0x0c, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* SCALAR_SP_UOP */ + 0x01, 0x0a, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* SCALAR_DP_UOP */ + 0x01, 0x0e, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* 64BIT_MMX_UOP */ + 0x01, 0x02, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* 128BIT_MMX_UOP */ + 0x01, 0x1a, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* X87_FP_UOP */ + 0x01, 0x04, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* X87_SIMD_MOVES_UOP */ + 0x01, 0x2e, + { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0}, + 
{ CTR_FLAME_2, MSR_P4_FIRM_ESCR1} } + }, + + { /* MACHINE_CLEAR */ + 0x05, 0x02, + { { CTR_IQ_4, MSR_P4_CRU_ESCR2}, + { CTR_IQ_5, MSR_P4_CRU_ESCR3} } + }, + + { /* GLOBAL_POWER_EVENTS */ + 0x06, 0x13 /* manual says 0x05 */, + { { CTR_BPU_0, MSR_P4_FSB_ESCR0}, + { CTR_BPU_2, MSR_P4_FSB_ESCR1} } + }, + + { /* TC_MS_XFER */ + 0x00, 0x05, + { { CTR_MS_0, MSR_P4_MS_ESCR0}, + { CTR_MS_2, MSR_P4_MS_ESCR1} } + }, + + { /* UOP_QUEUE_WRITES */ + 0x00, 0x09, + { { CTR_MS_0, MSR_P4_MS_ESCR0}, + { CTR_MS_2, MSR_P4_MS_ESCR1} } + }, + + { /* FRONT_END_EVENT */ + 0x05, 0x08, + { { CTR_IQ_4, MSR_P4_CRU_ESCR2}, + { CTR_IQ_5, MSR_P4_CRU_ESCR3} } + }, + + { /* EXECUTION_EVENT */ + 0x05, 0x0c, + { { CTR_IQ_4, MSR_P4_CRU_ESCR2}, + { CTR_IQ_5, MSR_P4_CRU_ESCR3} } + }, + + { /* REPLAY_EVENT */ + 0x05, 0x09, + { { CTR_IQ_4, MSR_P4_CRU_ESCR2}, + { CTR_IQ_5, MSR_P4_CRU_ESCR3} } + }, + + { /* INSTR_RETIRED */ + 0x04, 0x02, + { { CTR_IQ_4, MSR_P4_CRU_ESCR0}, + { CTR_IQ_5, MSR_P4_CRU_ESCR1} } + }, + + { /* UOPS_RETIRED */ + 0x04, 0x01, + { { CTR_IQ_4, MSR_P4_CRU_ESCR0}, + { CTR_IQ_5, MSR_P4_CRU_ESCR1} } + }, + + { /* UOP_TYPE */ + 0x02, 0x02, + { { CTR_IQ_4, MSR_P4_RAT_ESCR0}, + { CTR_IQ_5, MSR_P4_RAT_ESCR1} } + }, + + { /* RETIRED_MISPRED_BRANCH_TYPE */ + 0x02, 0x05, + { { CTR_MS_0, MSR_P4_TBPU_ESCR0}, + { CTR_MS_2, MSR_P4_TBPU_ESCR1} } + }, + + { /* RETIRED_BRANCH_TYPE */ + 0x02, 0x04, + { { CTR_MS_0, MSR_P4_TBPU_ESCR0}, + { CTR_MS_2, MSR_P4_TBPU_ESCR1} } + } +}; + + +#define MISC_PMC_ENABLED_P(x) ((x) & 1 << 7) + +#define ESCR_RESERVED_BITS 0x80000003 +#define ESCR_CLEAR(escr) ((escr) &= ESCR_RESERVED_BITS) +#define ESCR_SET_USR_0(escr, usr) ((escr) |= (((usr) & 1) << 2)) +#define ESCR_SET_OS_0(escr, os) ((escr) |= (((os) & 1) << 3)) +#define ESCR_SET_EVENT_SELECT(escr, sel) ((escr) |= (((sel) & 0x3f) << 25)) +#define ESCR_SET_EVENT_MASK(escr, mask) ((escr) |= (((mask) & 0xffff) << 9)) +#define ESCR_READ(escr,high,ev,i) do {rdmsr(ev->bindings[(i)].escr_address, (escr), (high));} while (0);
+#define ESCR_WRITE(escr,high,ev,i) do {wrmsr(ev->bindings[(i)].escr_address, (escr), (high));} while (0); + +#define CCCR_RESERVED_BITS 0x38030FFF +#define CCCR_CLEAR(cccr) ((cccr) &= CCCR_RESERVED_BITS) +#define CCCR_SET_REQUIRED_BITS(cccr) ((cccr) |= 0x00030000) +#define CCCR_SET_ESCR_SELECT(cccr, sel) ((cccr) |= (((sel) & 0x07) << 13)) +#define CCCR_SET_PMI_OVF(cccr) ((cccr) |= (1<<26)) +#define CCCR_SET_ENABLE(cccr) ((cccr) |= (1<<12)) +#define CCCR_SET_DISABLE(cccr) ((cccr) &= ~(1<<12)) +#define CCCR_READ(low, high, i) do {rdmsr (p4_counters[(i)].cccr_address, (low), (high));} while (0); +#define CCCR_WRITE(low, high, i) do {wrmsr (p4_counters[(i)].cccr_address, (low), (high));} while (0); +#define CCCR_OVF_P(cccr) ((cccr) & (1U<<31)) +#define CCCR_CLEAR_OVF(cccr) ((cccr) &= (~(1U<<31))) + +#define CTR_READ(l,h,i) do {rdmsr(p4_counters[(i)].counter_address, (l), (h));} while (0); +#define CTR_WRITE(l,i) do {wrmsr(p4_counters[(i)].counter_address, -(u32)(l), -1);} while (0); + + +static void p4_fill_in_addresses(struct op_msrs * const msrs) +{ + int i; + uint addr; + + /* the 8 counter registers we pay attention to */ + for (i = 0; i < NUM_COUNTERS; ++i) + msrs->counters.addrs[i] = p4_counters[i].counter_address; + + /* 18 CCCR registers */ + for (i=0, addr = MSR_P4_BPU_CCCR0; + addr <= MSR_P4_IQ_CCCR5; ++addr, ++i) + msrs->controls.addrs[i] = addr; + + /* 43 ESCR registers */ + for (addr = MSR_P4_BSU_ESCR0; + addr <= MSR_P4_SSU_ESCR0; ++addr, ++i){ + msrs->controls.addrs[i] = addr; + } + + for (addr = MSR_P4_MS_ESCR0; + addr <= MSR_P4_TC_ESCR1; ++addr, ++i){ + msrs->controls.addrs[i] = addr; + } + + for (addr = MSR_P4_IX_ESCR0; + addr <= MSR_P4_CRU_ESCR3; ++addr, ++i){ + msrs->controls.addrs[i] = addr; + } + + /* 2 remaining non-contiguously located ESCRs */ + msrs->controls.addrs[i++] = MSR_P4_CRU_ESCR4; + msrs->controls.addrs[i++] = MSR_P4_CRU_ESCR5; +} + +static void pmc_setup_one_p4_counter(uint ctr) +{ + int i; + int const maxbind = 2; + uint cccr = 0; + 
uint escr = 0; + uint high = 0; + uint counter_bit; + struct p4_event_binding * ev = NULL; + + /* convert from counter *number* to counter *bit* */ + counter_bit = 1 << ctr; + + /* find our event binding structure. */ + if (sysctl.ctr[ctr].event < 0 || sysctl.ctr[ctr].event > NUM_EVENTS) { + printk(KERN_ERR + "oprofile: P4 event code 0x%x out of range\n", + sysctl.ctr[ctr].event); + return; + } + + ev = &(p4_events[sysctl.ctr[ctr].event - 1]); + + for (i = 0; i < maxbind; i++) { + if (ev->bindings[i].virt_counter & counter_bit) { + + /* modify ESCR */ + ESCR_READ(escr, high, ev, i); + ESCR_CLEAR(escr); + ESCR_SET_USR_0(escr, sysctl.ctr[ctr].user); + ESCR_SET_OS_0(escr, sysctl.ctr[ctr].kernel); + ESCR_SET_EVENT_SELECT(escr, ev->event_select); + ESCR_SET_EVENT_MASK(escr, sysctl.ctr[ctr].unit_mask); + ESCR_WRITE(escr, high, ev, i); + + /* modify CCCR */ + CCCR_READ(cccr, high, ctr); + CCCR_CLEAR(cccr); + CCCR_SET_REQUIRED_BITS(cccr); + CCCR_SET_ESCR_SELECT(cccr, ev->escr_select); + CCCR_SET_PMI_OVF(cccr); + CCCR_WRITE(cccr, high, ctr); + return; + } + } +} + + +static void p4_setup_ctrs(struct op_msrs const * const msrs) +{ + uint i; + uint low, high; + + rdmsr(MSR_P4_MISC, low, high); + if (! 
MISC_PMC_ENABLED_P(low)) { + printk(KERN_ERR "oprofile: P4 PMC not available\n"); + return; + } + + /* clear all cccrs */ + for (i = 0 ; i < NUM_COUNTERS ; ++i) { + CCCR_READ(low, high, i); + CCCR_CLEAR(low); + CCCR_SET_REQUIRED_BITS(low); + CCCR_WRITE(low, high, i); + } + + /* setup all counters */ + for (i = 0 ; i < NUM_COUNTERS ; ++i) { + if (sysctl.ctr[i].event) { + pmc_setup_one_p4_counter(i); + CTR_WRITE(sysctl.ctr[i].count, i); + } + } +} + +static void p4_check_ctrs(uint const cpu, + struct op_msrs const * const msrs, + struct pt_regs * const regs) +{ + ulong low, high; + int i; + + for (i = 0; i < NUM_COUNTERS; ++i) { + CCCR_READ(low, high, i); + if (CCCR_OVF_P(low)) { + op_do_profile(cpu, regs, i); + CCCR_CLEAR_OVF(low); + CTR_WRITE(oprof_data[cpu].ctr_count[i], i); + CCCR_WRITE(low, high, i); + } + } + // P4 quirk: you have to re-unmask the apic vector + apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED); +} + + +static void p4_start(struct op_msrs const * const msrs) +{ + uint low,high; + int i; + for (i = 0; i < NUM_COUNTERS; ++i) { + if (!sysctl.ctr[i].enabled) continue; + CCCR_READ(low, high, i); + CCCR_SET_ENABLE(low); + CCCR_WRITE(low, high, i); + } +} + +static void p4_stop(struct op_msrs const * const msrs) +{ + uint low,high; + int i; + for (i = 0; i < NUM_COUNTERS; ++i) { + if (!sysctl.ctr[i].enabled) continue; + CCCR_READ(low, high, i); + CCCR_SET_DISABLE(low); + CCCR_WRITE(low, high, i); + } +} + + +struct op_x86_model_spec const op_p4_spec = { + .num_counters = NUM_COUNTERS, + .num_controls = NUM_CONTROLS, + .fill_in_addresses = &p4_fill_in_addresses, + .setup_ctrs = &p4_setup_ctrs, + .check_ctrs = &p4_check_ctrs, + .start = &p4_start, + .stop = &p4_stop +}; --- module/x86/op_msr.h Mon Sep 23 17:34:52 2002 +++ module/x86/op_msr.h Mon Sep 23 16:09:38 2002 @@ -61,4 +61,270 @@ #define MSR_K7_PERFCTR3 0xc0010007 #endif +/* There are *82* pentium 4 MSRs: + + - 1 misc register + + - 18 counters (PERFCTRs) + + - 18 counter configuration
control registers (CCCRs) + + - 45 event selection control registers (ESCRs). */ + + +#ifndef MSR_P4_MISC +#define MSR_P4_MISC 0x1a0 +#endif + +#ifndef MSR_P4_BPU_PERFCTR0 +#define MSR_P4_BPU_PERFCTR0 0x300 +#endif +#ifndef MSR_P4_BPU_PERFCTR1 +#define MSR_P4_BPU_PERFCTR1 0x301 +#endif +#ifndef MSR_P4_BPU_PERFCTR2 +#define MSR_P4_BPU_PERFCTR2 0x302 +#endif +#ifndef MSR_P4_BPU_PERFCTR3 +#define MSR_P4_BPU_PERFCTR3 0x303 +#endif +#ifndef MSR_P4_MS_PERFCTR0 +#define MSR_P4_MS_PERFCTR0 0x304 +#endif +#ifndef MSR_P4_MS_PERFCTR1 +#define MSR_P4_MS_PERFCTR1 0x305 +#endif +#ifndef MSR_P4_MS_PERFCTR2 +#define MSR_P4_MS_PERFCTR2 0x306 +#endif +#ifndef MSR_P4_MS_PERFCTR3 +#define MSR_P4_MS_PERFCTR3 0x307 +#endif +#ifndef MSR_P4_FLAME_PERFCTR0 +#define MSR_P4_FLAME_PERFCTR0 0x308 +#endif +#ifndef MSR_P4_FLAME_PERFCTR1 +#define MSR_P4_FLAME_PERFCTR1 0x309 +#endif +#ifndef MSR_P4_FLAME_PERFCTR2 +#define MSR_P4_FLAME_PERFCTR2 0x30a +#endif +#ifndef MSR_P4_FLAME_PERFCTR3 +#define MSR_P4_FLAME_PERFCTR3 0x30b +#endif +#ifndef MSR_P4_IQ_PERFCTR0 +#define MSR_P4_IQ_PERFCTR0 0x30c +#endif +#ifndef MSR_P4_IQ_PERFCTR1 +#define MSR_P4_IQ_PERFCTR1 0x30d +#endif +#ifndef MSR_P4_IQ_PERFCTR2 +#define MSR_P4_IQ_PERFCTR2 0x30e +#endif +#ifndef MSR_P4_IQ_PERFCTR3 +#define MSR_P4_IQ_PERFCTR3 0x30f +#endif +#ifndef MSR_P4_IQ_PERFCTR4 +#define MSR_P4_IQ_PERFCTR4 0x310 +#endif +#ifndef MSR_P4_IQ_PERFCTR5 +#define MSR_P4_IQ_PERFCTR5 0x311 +#endif + + +#ifndef MSR_P4_BPU_CCCR0 +#define MSR_P4_BPU_CCCR0 0x360 +#endif +#ifndef MSR_P4_BPU_CCCR1 +#define MSR_P4_BPU_CCCR1 0x361 +#endif +#ifndef MSR_P4_BPU_CCCR2 +#define MSR_P4_BPU_CCCR2 0x362 +#endif +#ifndef MSR_P4_BPU_CCCR3 +#define MSR_P4_BPU_CCCR3 0x363 +#endif +#ifndef MSR_P4_MS_CCCR0 +#define MSR_P4_MS_CCCR0 0x364 +#endif +#ifndef MSR_P4_MS_CCCR1 +#define MSR_P4_MS_CCCR1 0x365 +#endif +#ifndef MSR_P4_MS_CCCR2 +#define MSR_P4_MS_CCCR2 0x366 +#endif +#ifndef MSR_P4_MS_CCCR3 +#define MSR_P4_MS_CCCR3 0x367 +#endif +#ifndef MSR_P4_FLAME_CCCR0 +#define 
MSR_P4_FLAME_CCCR0 0x368 +#endif +#ifndef MSR_P4_FLAME_CCCR1 +#define MSR_P4_FLAME_CCCR1 0x369 +#endif +#ifndef MSR_P4_FLAME_CCCR2 +#define MSR_P4_FLAME_CCCR2 0x36a +#endif +#ifndef MSR_P4_FLAME_CCCR3 +#define MSR_P4_FLAME_CCCR3 0x36b +#endif +#ifndef MSR_P4_IQ_CCCR0 +#define MSR_P4_IQ_CCCR0 0x36c +#endif +#ifndef MSR_P4_IQ_CCCR1 +#define MSR_P4_IQ_CCCR1 0x36d +#endif +#ifndef MSR_P4_IQ_CCCR2 +#define MSR_P4_IQ_CCCR2 0x36e +#endif +#ifndef MSR_P4_IQ_CCCR3 +#define MSR_P4_IQ_CCCR3 0x36f +#endif +#ifndef MSR_P4_IQ_CCCR4 +#define MSR_P4_IQ_CCCR4 0x370 +#endif +#ifndef MSR_P4_IQ_CCCR5 +#define MSR_P4_IQ_CCCR5 0x371 +#endif + + +#ifndef MSR_P4_ALF_ESCR0 +#define MSR_P4_ALF_ESCR0 0x3ca +#endif +#ifndef MSR_P4_ALF_ESCR1 +#define MSR_P4_ALF_ESCR1 0x3cb +#endif +#ifndef MSR_P4_BPU_ESCR0 +#define MSR_P4_BPU_ESCR0 0x3b2 +#endif +#ifndef MSR_P4_BPU_ESCR1 +#define MSR_P4_BPU_ESCR1 0x3b3 +#endif +#ifndef MSR_P4_BSU_ESCR0 +#define MSR_P4_BSU_ESCR0 0x3a0 +#endif +#ifndef MSR_P4_BSU_ESCR1 +#define MSR_P4_BSU_ESCR1 0x3a1 +#endif +#ifndef MSR_P4_CRU_ESCR0 +#define MSR_P4_CRU_ESCR0 0x3b8 +#endif +#ifndef MSR_P4_CRU_ESCR1 +#define MSR_P4_CRU_ESCR1 0x3b9 +#endif +#ifndef MSR_P4_CRU_ESCR2 +#define MSR_P4_CRU_ESCR2 0x3cc +#endif +#ifndef MSR_P4_CRU_ESCR3 +#define MSR_P4_CRU_ESCR3 0x3cd +#endif +#ifndef MSR_P4_CRU_ESCR4 +#define MSR_P4_CRU_ESCR4 0x3e0 +#endif +#ifndef MSR_P4_CRU_ESCR5 +#define MSR_P4_CRU_ESCR5 0x3e1 +#endif +#ifndef MSR_P4_DAC_ESCR0 +#define MSR_P4_DAC_ESCR0 0x3a8 +#endif +#ifndef MSR_P4_DAC_ESCR1 +#define MSR_P4_DAC_ESCR1 0x3a9 +#endif +#ifndef MSR_P4_FIRM_ESCR0 +#define MSR_P4_FIRM_ESCR0 0x3a4 +#endif +#ifndef MSR_P4_FIRM_ESCR1 +#define MSR_P4_FIRM_ESCR1 0x3a5 +#endif +#ifndef MSR_P4_FLAME_ESCR0 +#define MSR_P4_FLAME_ESCR0 0x3a6 +#endif +#ifndef MSR_P4_FLAME_ESCR1 +#define MSR_P4_FLAME_ESCR1 0x3a7 +#endif +#ifndef MSR_P4_FSB_ESCR0 +#define MSR_P4_FSB_ESCR0 0x3a2 +#endif +#ifndef MSR_P4_FSB_ESCR1 +#define MSR_P4_FSB_ESCR1 0x3a3 +#endif +#ifndef MSR_P4_IQ_ESCR0 +#define 
MSR_P4_IQ_ESCR0 0x3ba +#endif +#ifndef MSR_P4_IQ_ESCR1 +#define MSR_P4_IQ_ESCR1 0x3bb +#endif +#ifndef MSR_P4_IS_ESCR0 +#define MSR_P4_IS_ESCR0 0x3b4 +#endif +#ifndef MSR_P4_IS_ESCR1 +#define MSR_P4_IS_ESCR1 0x3b5 +#endif +#ifndef MSR_P4_ITLB_ESCR0 +#define MSR_P4_ITLB_ESCR0 0x3b6 +#endif +#ifndef MSR_P4_ITLB_ESCR1 +#define MSR_P4_ITLB_ESCR1 0x3b7 +#endif +#ifndef MSR_P4_IX_ESCR0 +#define MSR_P4_IX_ESCR0 0x3c8 +#endif +#ifndef MSR_P4_IX_ESCR1 +#define MSR_P4_IX_ESCR1 0x3c9 +#endif +#ifndef MSR_P4_MOB_ESCR0 +#define MSR_P4_MOB_ESCR0 0x3aa +#endif +#ifndef MSR_P4_MOB_ESCR1 +#define MSR_P4_MOB_ESCR1 0x3ab +#endif +#ifndef MSR_P4_MS_ESCR0 +#define MSR_P4_MS_ESCR0 0x3c0 +#endif +#ifndef MSR_P4_MS_ESCR1 +#define MSR_P4_MS_ESCR1 0x3c1 +#endif +#ifndef MSR_P4_PMH_ESCR0 +#define MSR_P4_PMH_ESCR0 0x3ac +#endif +#ifndef MSR_P4_PMH_ESCR1 +#define MSR_P4_PMH_ESCR1 0x3ad +#endif +#ifndef MSR_P4_RAT_ESCR0 +#define MSR_P4_RAT_ESCR0 0x3bc +#endif +#ifndef MSR_P4_RAT_ESCR1 +#define MSR_P4_RAT_ESCR1 0x3bd +#endif +#ifndef MSR_P4_SAAT_ESCR0 +#define MSR_P4_SAAT_ESCR0 0x3ae +#endif +#ifndef MSR_P4_SAAT_ESCR1 +#define MSR_P4_SAAT_ESCR1 0x3af +#endif +#ifndef MSR_P4_SSU_ESCR0 +#define MSR_P4_SSU_ESCR0 0x3be +#endif +#ifndef MSR_P4_SSU_ESCR1 +#define MSR_P4_SSU_ESCR1 0x3bf /* guess: not defined in manual */ +#endif +#ifndef MSR_P4_TBPU_ESCR0 +#define MSR_P4_TBPU_ESCR0 0x3c2 +#endif +#ifndef MSR_P4_TBPU_ESCR1 +#define MSR_P4_TBPU_ESCR1 0x3c3 +#endif +#ifndef MSR_P4_TC_ESCR0 +#define MSR_P4_TC_ESCR0 0x3c4 +#endif +#ifndef MSR_P4_TC_ESCR1 +#define MSR_P4_TC_ESCR1 0x3c5 +#endif +#ifndef MSR_P4_U2L_ESCR0 +#define MSR_P4_U2L_ESCR0 0x3b0 +#endif +#ifndef MSR_P4_U2L_ESCR1 +#define MSR_P4_U2L_ESCR1 0x3b1 +#endif + #endif /* OP_MSR_H */ --- module/x86/op_nmi.c Mon Sep 23 17:34:52 2002 +++ module/x86/op_nmi.c Mon Sep 23 16:44:43 2002 @@ -27,7 +27,9 @@ case CPU_ATHLON: model = &op_athlon_spec; break; - + case CPU_P4: + model = &op_p4_spec; + break; default: model = &op_ppro_spec; break; @@ -326,7 
+328,7 @@ } -static char *names[] = { "0", "1", "2", "3", "4", }; +static char *names[] = { "0", "1", "2", "3", "4", "5", "6", "7", "8" }; static int pmc_add_sysctls(ctl_table * next) { --- module/x86/op_x86_model.h Sun Sep 22 13:46:00 2002 +++ module/x86/op_x86_model.h Mon Sep 23 16:35:08 2002 @@ -44,5 +44,6 @@ extern struct op_x86_model_spec const op_ppro_spec; extern struct op_x86_model_spec const op_athlon_spec; +extern struct op_x86_model_spec const op_p4_spec; #endif /* OP_X86_MODEL_H */ |
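The ESCR and CCCR macros in op_model_p4.c pack several fields into the low word of each MSR. As a rough user-space sketch (not part of the patch, and touching no hardware), the functions below perform the same bit arithmetic so the resulting register values can be checked by hand. The helper names p4_escr_value, p4_cccr_value and p4_counter_preload are hypothetical. The field positions are taken from the patch's macros. The event select is masked with 0x3f because, as I understand Intel's documentation, that field is six bits wide (bits 30:25); SSE_INPUT_ASSIST (0x34) and X87_SIMD_MOVES_UOP (0x2e) in the event table need the sixth bit.

```c
#include <stdint.h>

/* Bits preserved when a register is "cleared"; values copied from the
 * ESCR_RESERVED_BITS and CCCR_RESERVED_BITS macros in op_model_p4.c. */
#define SKETCH_ESCR_RESERVED 0x80000003u
#define SKETCH_CCCR_RESERVED 0x38030FFFu

/* Low word of an ESCR, mirroring ESCR_CLEAR followed by the ESCR_SET_*
 * macros. The event select is assumed to be 6 bits (bits 30:25). */
uint32_t p4_escr_value(uint32_t old, unsigned usr, unsigned os,
                       unsigned event_select, unsigned unit_mask)
{
	uint32_t escr = old & SKETCH_ESCR_RESERVED;     /* ESCR_CLEAR */

	escr |= (uint32_t)(usr & 1u) << 2;               /* count user mode */
	escr |= (uint32_t)(os & 1u) << 3;                /* count kernel mode */
	escr |= (uint32_t)(event_select & 0x3fu) << 25;  /* event select */
	escr |= (uint32_t)(unit_mask & 0xffffu) << 9;    /* 16-bit unit mask */
	return escr;
}

/* Low word of a CCCR as left by pmc_setup_one_p4_counter(): bit 12
 * (enable) is cleared, so the counter stays off until p4_start(). */
uint32_t p4_cccr_value(uint32_t old, unsigned escr_select)
{
	uint32_t cccr = old & SKETCH_CCCR_RESERVED;     /* CCCR_CLEAR */

	cccr |= 0x00030000u;                             /* CCCR_SET_REQUIRED_BITS */
	cccr |= (uint32_t)(escr_select & 0x07u) << 13;   /* CCCR_SET_ESCR_SELECT */
	cccr |= 1u << 26;                                /* CCCR_SET_PMI_OVF */
	return cccr;
}

/* CTR_WRITE loads -(u32)count into the low word (and -1 into the high
 * word), so the counter overflows after `count` more events. */
uint32_t p4_counter_preload(uint32_t count)
{
	return (uint32_t)(0u - count);
}
```

For example, SSE_INPUT_ASSIST (escr_select 0x01, event_select 0x34 in the p4_events table) with USR and OS set and an arbitrary unit mask of 0x8000 gives p4_escr_value(0, 1, 1, 0x34, 0x8000) == 0x6900000c. Masking the select with 0x1f instead would silently turn 0x34 into 0x14. p4_cccr_value(0, 0x01) gives 0x04032000, and p4_counter_preload(3000) gives 0xfffff448.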
From: William C. <wc...@nc...> - 2002-09-25 13:48:09
|
I have been working on getting the P4 patch to compile on the dual Hyperthreaded P4 with the Linux 2.4.17 kernel, and I have run into a couple of problems. I think they are a result of not testing the code with an SMP-configured kernel.

module/x86/op_nmi.c:pmc_setup_ctr() needs to be declared as void. Trivial to fix.

OP_MAX_CPUS may not be a compile-time constant. In module/compat.h it is defined to be smp_num_cpus in some cases:

#if V_BEFORE(2, 5, 23)
#define OP_MAX_CPUS smp_num_cpus
#define for_each_online_cpu(i) for (i = 0 ; i < smp_num_cpus ; ++i)
#else
#define OP_MAX_CPUS NR_CPUS
static inline int next_cpu(int i)
{
	while (i < NR_CPUS && !cpu_online(i))
		++i;
	return i;
}
#define for_each_online_cpu(i) \
	for (i = next_cpu(0) ; i < NR_CPUS ; i = next_cpu(++i))
#endif

This causes the compiler to have problems with the following line in module/x86/op_nmi.c:

static struct op_msrs cpu_msrs[OP_MAX_CPUS];

gcc -D__KERNEL__ -I/usr/src/linux-2.4.17/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fomit-frame-pointer -fno-strict-aliasing -fno-common -pipe -mpreferred-stack-boundary=2 -march=i686 -DMODULE -DMODVERSIONS -include /usr/src/linux-2.4.17/include/linux/modversions.h -DHAVE_LINUX_SPINLOCK_HEADER -DRTC_LOCK -DEXPECT_OK -D__NO_VERSION__ -I/root/oprofile/oprofile-p4.20020925/ -I/root/oprofile/oprofile-p4.20020925/libutil -I/root/oprofile/oprofile-p4.20020925/libop -I/root/oprofile/oprofile-p4.20020925/module -Werror -c -o op_nmi.o op_nmi.c
op_nmi.c:19: variable-size type declared outside of any function
make[2]: *** [op_nmi.o] Error 1

-Will

gr...@re... wrote:
> At Tue, 24 Sep 2002 14:32:21 +0100,
> John Levon wrote:
>
>>Does this mean op_model_athlon code is tested ?
>
> yes, as with this copy of the patch, which includes your and will's
> corrections; p4 and athlon tested. I've also included some
> documentation additions, to cover the p4.
>
> -graydon
|
From: John L. <le...@mo...> - 2002-09-25 13:56:17
|
On Wed, Sep 25, 2002 at 08:58:02AM -0400, William Cohen wrote:
> module/x86/op_nmi.c:pmc_setup_ctr() needs to be declared as void.
> Trivial to fix.

I'm about to apply the P4 patch. I'll fix both (use NR_CPUS for the
array).

thanks
john

-- 
"The only perfect circle on the human body is the eye. When a baby is born its so perfect but when it opens its eyes its just blinded by the corruption and everything else is a downward spiral." - Richey Edwards
|
From: William C. <wc...@nc...> - 2002-09-25 18:30:18
|
I have checked out the current OProfile source code with the P4 support from the CVS repository and built oprofile on the following machines:

P4 (1proc)       Linux 2.4.18,     GCC 2.96
P4 HT (2 proc)   Linux 2.4.18smp,  GCC 3.2
Athlon (1proc)   Linux 2.4.9,      GCC 3.2

I was able to use oprof_start to start profiling on each of the platforms and verify that oprofile was sampling data. However, on the P4 HT machine oprofiled dies after running for a while. /var/lib/oprofile/oprofile.log has the following line at the end of the file:

oprofiled: db-insert.c:235: do_insert: Assertion `page->count != 0' failed.

I am guessing this is not a hyperthreading-related problem: oprofiled is a regular process, and it doesn't know about the 1 1/2 processors in each physical processor package.

-Will

John Levon wrote:
> On Wed, Sep 25, 2002 at 08:58:02AM -0400, William Cohen wrote:
>
>>module/x86/op_nmi.c:pmc_setup_ctr() needs to be declared as void.
>>Trivial to fix.
>
> I'm about to apply the P4 patch. I'll fix both (use NR_CPUS for the
> array)
>
> thanks
> john
|
From: Philippe E. <ph...@wa...> - 2002-09-25 19:14:27
|
William Cohen wrote:
> I have checked out the current OProfile source code with the P4 support
> from the cvs repository. I have built oprofile on the following machines.
>
> P4 (1proc) Linux 2.4.18, GCC 2.96
> P4 HT (2 proc) Linux 2.4.18smp, GCC 3.2
> Athlon (1proc) Linux 2.4.9, GCC 3.2
>
> I was able use oprof_start to start profiling on each of the platforms
> and verify that oprofile was sampling data. However, on the P4 HT
> machine I see that oprofiled is dying after running for a while.
> /var/lib/oprofile/oprofile.log has the following line at the end of the
> file:
>
> oprofiled: db-insert.c:235: do_insert: Assertion `page->count != 0' failed.
>
> I am guessing this is not a hyperthreading related problem. oprofiled is
> a regular process, it doesn't know about the 1 1/2 processors in each
> physical processor package.

Please go to the libdb subdir, run make db-test, and try:

$ db-test

and report the first 10 or 15 lines of errors, if any. More important:

$ db-test /var/lib/oprofile/samples/*

db-test should report the first failing samples file; then compress this file and put it on the web, or mail it to me.

John got the same error, but after a few tries the error disappeared, and I have never hit this problem myself. I've done a lot of stress testing on libdb without any problem.

Graydon, if you encounter the same problem, can you follow the steps above? I need feedback to understand why this occurs.

Phil
|
From: William C. <wc...@nc...> - 2002-09-25 19:37:58
|
Philippe Elie wrote:
> William Cohen wrote:
>
>> I have checked out the current OProfile source code with the P4
>> support from the cvs repository. I have built oprofile on the
>> following machines.
>>
>> P4 (1proc) Linux 2.4.18, GCC 2.96
>> P4 HT (2 proc) Linux 2.4.18smp, GCC 3.2
>> Athlon (1proc) Linux 2.4.9, GCC 3.2
>>
>> I was able use oprof_start to start profiling on each of the platforms
>> and verify that oprofile was sampling data. However, on the P4 HT
>> machine I see that oprofiled is dying after running for a while.
>> /var/lib/oprofile/oprofile.log has the following line at the end of
>> the file:
>>
>> oprofiled: db-insert.c:235: do_insert: Assertion `page->count != 0'
>> failed.
>>
>> I am guessing this is not a hyperthreading related problem. oprofiled
>> is a regular process, it doesn't know about the 1 1/2 processors in
>> each physical processor package.
>
> please go to the libdb subdir, make db-test, try
>
> $ db-test
>
> report the 10/15 first line of error if any, more important:

No errors are reported by ./db-test.

> $ db-test /var/lib/oprofile/samples/*

[root@dhcp59-231 libdb]# ./db-test /var/lib/oprofile/samples/*
db_add_page() mmap failure cause: No such device

> db-test would report the first failing samples files, then compress
> this file and put it on the web or mail me it

I tried a slightly different approach to find out which files were the problem:

[root@dhcp59-231 samples]# find -path "*0" -exec /root/oprofile/oprofile-build.20020925/libdb/db-test {} \; >& /tmp/libdb.res

Output of this test:
[root@dhcp59-231 samples]# more /tmp/libdb.res
checking file ./session-2/}usr}lib}gcc-lib}i386-redhat-linux}3.2}cc1#0 FAIL
checking file ./session-1/}bin}bash#0 FAIL
db-debug.c:72 invalid page number, max is 3 page_nr is 6
db-debug.c:72 invalid page number, max is 3 page_nr is 6
db-debug.c:79 child page number duplicated 1
db-debug.c:79 child page number duplicated 1
db-debug.c:72 invalid page number, max is 3 page_nr is 4
db-debug.c:79 child page number duplicated 1
db-debug.c:79 child page number duplicated 1
db-debug.c:79 child page number duplicated 2
db-debug.c:72 invalid page number, max is 3 page_nr is 3
db-debug.c:72 invalid page number, max is 3 page_nr is 11
db-debug.c:79 child page number duplicated 2
db-debug.c:79 child page number duplicated 1
db-debug.c:79 child page number duplicated 1
db-debug.c:72 invalid page number, max is 3 page_nr is 6
checking file ./session-1/}bin}rm}}}lib}i686}libc-2.2.93.so#0 FAIL

I have attached compressed versions of the files listed to this mail.

> John got the same error, but after a few try the error disappear
> and I never got this problem. I've made a lot of stress test on
> libdb w/o any problem.

It takes a while for the problem to pop up, and it appears to be data dependent. Maybe it is machine dependent. I only recall seeing this on SMP machines. What platforms are you and John running oprofile on?

> graydon if you encounter the same problem can you follow the step
> above, I need feedback to understand why this occur.
>
> Phil

-Will
|
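The db-debug output reports two kinds of corruption: a child pointer naming a page past the end of the file ("invalid page number"), and a page referenced as a child more than once ("child page number duplicated"). The sketch below illustrates that style of check on a hypothetical, simplified tree layout; struct page_sketch, MAX_CHILDREN, MAX_PAGES and check_tree() are illustrative names, libdb's real page format is not reproduced, and "max" is assumed to mean the number of pages in the file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CHILDREN 13   /* hypothetical fan-out per page */
#define MAX_PAGES    64   /* hypothetical upper bound on pages per file */

/* Hypothetical page: page 0 is the root, so child number 0 means "empty". */
struct page_sketch {
	int count;                          /* used entries in child[] */
	unsigned int child[MAX_CHILDREN];   /* child page numbers */
};

/* Walk every page and report both corruption patterns; returns the
 * number of problems found. */
static int check_tree(const struct page_sketch *pages, unsigned int nr_pages)
{
	unsigned char seen[MAX_PAGES];
	unsigned int p;
	int i, errors = 0;

	assert(nr_pages > 0 && nr_pages <= MAX_PAGES);
	memset(seen, 0, sizeof(seen));
	for (p = 0; p < nr_pages; ++p) {
		if (pages[p].count < 0 || pages[p].count > MAX_CHILDREN) {
			fprintf(stderr, "page %u: bad count %d\n", p, pages[p].count);
			++errors;
			continue;
		}
		for (i = 0; i < pages[p].count; ++i) {
			unsigned int nr = pages[p].child[i];

			if (nr == 0)
				continue;
			if (nr >= nr_pages) {
				fprintf(stderr, "invalid page number, max is %u page_nr is %u\n",
					nr_pages, nr);
				++errors;
			} else if (seen[nr]) {
				fprintf(stderr, "child page number duplicated %u\n", nr);
				++errors;
			} else {
				seen[nr] = 1;
			}
		}
	}
	return errors;
}
```

A checker like this only shows that a file is already damaged; it cannot tell which insert or split produced the damage.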
From: John L. <le...@mo...> - 2002-09-25 19:50:37
|
On Wed, Sep 25, 2002 at 02:41:51PM -0400, William Cohen wrote:
> It takes a while for the problems for it to pop up and it appears to be
> data dependent. Maybe it is machine dependent. I only recall seeing this
> on smp machines. What platforms are you and John running oprofile on?

I saw the problems on my UP machine too. I haven't tested recent
oprofile on my 2-way SMP box yet.

regards
john
|
From: William C. <wc...@nc...> - 2002-09-25 21:19:27
|
I had a similar failure on a uniprocessor Athlon this afternoon. I tried to find the offending sample file, but libdb-test didn't turn anything up. Here is the output in the /var/lib/oprofile/oprofile.log file:

oprofiled: db-insert.c:124: do_reorg: Assertion `page->count <= 6*2' failed.

Sorry, there is no output file corresponding to the assertion failure. Would it be possible for the assertions to print out the problem file name?

-Will

Philippe Elie wrote:
> William Cohen wrote:
>
>> I have checked out the current OProfile source code with the P4
>> support from the cvs repository. I have built oprofile on the
>> following machines.
>>
>> P4 (1proc) Linux 2.4.18, GCC 2.96
>> P4 HT (2 proc) Linux 2.4.18smp, GCC 3.2
>> Athlon (1proc) Linux 2.4.9, GCC 3.2
>>
>> I was able use oprof_start to start profiling on each of the platforms
>> and verify that oprofile was sampling data. However, on the P4 HT
>> machine I see that oprofiled is dying after running for a while.
>> /var/lib/oprofile/oprofile.log has the following line at the end of
>> the file:
>>
>> oprofiled: db-insert.c:235: do_insert: Assertion `page->count != 0'
>> failed.
>>
>> I am guessing this is not a hyperthreading related problem. oprofiled
>> is a regular process, it doesn't know about the 1 1/2 processors in
>> each physical processor package.
>
> please go to the libdb subdir, make db-test, try
>
> $ db-test
>
> report the 10/15 first line of error if any, more important:
>
> $ db-test /var/lib/oprofile/samples/*
>
> db-test would report the first failing samples files, then compress
> this file and put it on the web or mail me it
>
> John got the same error, but after a few try the error disappear
> and I never got this problem. I've made a lot of stress test on
> libdb w/o any problem.
>
> graydon if you encounter the same problem can you follow the step
> above, I need feedback to understand why this occur.
>
> Phil
|
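One way to get the samples file name into oprofile.log is an assertion wrapper that carries the database handle along with the failing expression. This is a minimal sketch, assuming the handle records the path it was opened on; struct db_sketch, DB_ASSERT, db_report_failure and db_abort_on_failure are hypothetical names, not libdb's API. It still calls abort() by default so that a core file is produced.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical database handle: only the fields the sketch needs. */
struct db_sketch {
	const char *path;     /* samples file this handle was opened on */
	int page_count;
};

/* Abort by default so the daemon still dumps core; cleared for testing. */
static int db_abort_on_failure = 1;

/* Details of failures seen so far, kept for inspection. */
static int db_failures;
static const char *db_last_path;

static void db_report_failure(const struct db_sketch *db, const char *expr,
			      const char *file, int line, const char *func)
{
	const char *path = (db && db->path) ? db->path : "(unknown)";

	assert(expr != NULL && file != NULL);
	++db_failures;
	db_last_path = path;
	/* Same shape as glibc's assert() message, plus the samples file. */
	fprintf(stderr, "%s:%d: %s: Assertion `%s' failed (samples file %s)\n",
		file, line, func, expr, path);
	fflush(stderr);
	if (db_abort_on_failure)
		abort();
}

/* Like assert(), but names the samples file in the message. */
#define DB_ASSERT(db, expr) \
	((expr) ? (void)0 : \
	 db_report_failure((db), #expr, __FILE__, __LINE__, __func__))
```

Unlike assert(), this check is not compiled out under NDEBUG, which is arguably what a long-running daemon wants for on-disk consistency checks.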
From: John L. <le...@mo...> - 2002-09-25 21:25:54
|
On Wed, Sep 25, 2002 at 04:20:40PM -0400, William Cohen wrote:
> I had a similar failure on uniprocessor athlon this afternoon. I tried
> to find the offending sample file, but libdb-test didn't turn anything
> up. Here is the output in the /var/lib/oprofile/oprofile.log file:
>
> oprofiled: db-insert.c:124: do_reorg: Assertion `page->count <= 6*2' failed.
>
> Sorry no output file corresponding to the assertion failure. Would it be
> be possible for the assertions to print out the problem file name?

It should have dumped /var/lib/oprofile/core, in which case you can gdb
it back to find the file.

regards
john
|
From: Philippe E. <ph...@wa...> - 2002-09-25 22:52:06
|
John Levon wrote:
> On Wed, Sep 25, 2002 at 04:20:40PM -0400, William Cohen wrote:
>
>>I had a similar failure on uniprocessor athlon this afternoon. I tried
>>to find the offending sample file, but libdb-test didn't turn anything
>>up. Here is the output in the /var/lib/oprofile/oprofile.log file:
>>
>>oprofiled: db-insert.c:124: do_reorg: Assertion `page->count <= 6*2' failed.
>>
>>Sorry no output file corresponding to the assertion failure. Would it be
>>be possible for the assertions to print out the problem file name?
>
> It should have dumped /var/lib/oprofile/core in which case you can gdb
> it back to find the file

That can't help: the real failure point is far before the assertion failure. I can now reproduce the problem. It seems to occur only (or perhaps just more quickly) with --separate-samples, which involves creating a lot of short-lived processes.

The first *visible* failure starts here:

DO_PUT_SAMPLE: c0, EIP 0x0807a185, pid 027449, count 000001
DO_PUT_SAMPLE : calc offset 0x00032185, map start 0x08048000, end 0x080c3000, offset 0x00000000, name "/bin/bash"
==25061== Invalid write of size 4
==25061==    at 0x804CA33: copy_item (db-insert.c:37)
==25061==    by 0x804CBE8: split_page (db-insert.c:93)
==25061==    by 0x804CD92: do_reorg (db-insert.c:151)
==25061==    by 0x804CEDE: do_insert (db-insert.c:273)
==25061==  Address 0x43D79018 is not stack'd, malloc'd or free'd

followed by dozens of other invalid writes. I'm instrumenting libdb ...

Phil
|
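Valgrind points at copy_item() writing outside any allocation during split_page(), the kind of corruption that only trips do_reorg()'s `page->count <= 6*2` assertion much later. The sketch below illustrates checking the page invariant at the write site instead, on a simplified sorted page holding at most 2*ORDER items. ORDER = 6 is inferred from the assertion text, and every name and structure here is hypothetical rather than libdb's actual code.

```c
#include <assert.h>

#define ORDER 6                    /* inferred from "page->count <= 6*2" */
#define PAGE_CAPACITY (2 * ORDER)  /* maximum items per page */

struct item {
	unsigned int key;
	unsigned int value;
};

struct page {
	int count;                         /* items in use */
	struct item items[PAGE_CAPACITY];
};

/* Copy one item, refusing to write outside the destination page. */
static void copy_item(struct page *dst, int dst_idx, const struct item *src)
{
	assert(dst_idx >= 0 && dst_idx < PAGE_CAPACITY);
	dst->items[dst_idx] = *src;
}

/* Move the upper ORDER items of a full page into an empty right page,
 * leaving both pages half full. */
static void split_page(struct page *left, struct page *right)
{
	int i;

	assert(left->count == PAGE_CAPACITY && right->count == 0);
	for (i = 0; i < ORDER; ++i)
		copy_item(right, i, &left->items[ORDER + i]);
	right->count = ORDER;
	left->count = ORDER;
}

/* Insert keeping keys sorted; the caller must split a full page first. */
static void insert_sorted(struct page *p, struct item it)
{
	int i;

	assert(p->count < PAGE_CAPACITY);
	for (i = p->count; i > 0 && p->items[i - 1].key > it.key; --i)
		copy_item(p, i, &p->items[i - 1]);
	copy_item(p, i, &it);
	p->count++;
}
```

With the bounds check inside copy_item(), an overrun aborts at the offending write, so the core file (or the Valgrind trace) points at the split rather than at a later, unrelated insert.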
From: William C. <wc...@nc...> - 2002-09-24 15:02:04
|
I tried out the patch on the current OProfile CVS. It still works and collects data on a P3 machine.

I looked through the unit masks, and the one for um_memory_cancel appears to be wrong:

bit 2 -> 2^2 -> 0x04
bit 3 -> 2^3 -> 0x08

Graydon, are you using emacs? It defaults to 2-space indentation. You might try setting the emacs variable 'c-basic-offset' to 8 to get the appropriate tabs.

-Will

gr...@re... wrote:
> hi,
>
> with last week's cleanups in place, the p4 patch has shrunk a bit (and
> become quite a bit nicer to work with). I've tested this on a Real
> Live model-1 P4, and an athlon for good measure. It appears to work
> against what's in CVS now; gui included (although it occasionally
> resizes the gui a little awkwardly with all these unit masks).
>
> if a ppro user might give it another whirl, I'd appreciate it, or any
> further comments / edits.
>
> -graydon
|
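The correction is the usual bit-to-value conversion: unit mask bit n contributes 2^n, i.e. 1 << n, and several selected bits are ORed together (bits 2 and 3 together give 0x0c). A small sketch with hypothetical helper names (um_bit_value and um_from_bits are illustrative, not oprofile functions); the 16-bit result type follows the patch's change of ctr_um from u8 to u16.

```c
#include <assert.h>
#include <stdint.h>

/* Value of a single unit-mask bit: bit n -> 2^n.
 * Only bits 0-15 fit in a 16-bit unit mask. */
static uint16_t um_bit_value(unsigned int bit)
{
	assert(bit < 16);
	return (uint16_t)(1u << bit);
}

/* OR together the values of the given bit numbers. */
static uint16_t um_from_bits(const unsigned int *bits, int n)
{
	uint16_t mask = 0;
	int i;

	for (i = 0; i < n; ++i)
		mask |= um_bit_value(bits[i]);
	return mask;
}
```

Deriving values from bit numbers this way avoids hand-computed hex constants, which is where this kind of off-by-one-bit slip tends to creep in.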
From: <gr...@re...> - 2002-09-24 15:12:25
|
At Tue, 24 Sep 2002 10:23:26 -0400,
William Cohen wrote:
> I looked through the unit masks. I found that the one for
> um_memory_cancel appears to be wrong.
>
> bit 2 -> 2^2 -> 0x04
> bit 3 -> 2^3 -> 0x08

ok, will fix.

> Graydon, using emacs? It defaults to 2 space indentation.
> You might try setting the emacs variable 'c-basic-offset' to 8 to get
> the appropriate tabs.

yeah, sorry. I made an "oprofile-c-mode" using 8 & tabs, for files in ~/src/oprofile/*, which I started using when I realized the oprofile sources are whitespaced differently. unfortunately a couple of lines here and there are still held over from the very first p4 patch I made, before I realized this, and it seems john's eyeballs are better at seeing them than I am. I'm too used to GNU style. I'll try to be more careful.

I've got to do another run of this patch anyway, since the documentation update isn't included in the one I posted.

-graydon
|
From: John L. <le...@mo...> - 2002-09-24 15:20:24
|
On Tue, Sep 24, 2002 at 11:12:22AM -0400, gr...@re... wrote:
> before I realized this, and it seems john's eyeballs are better at
> seeing them than I am.

Rather I have *really* big hangups about whitespace ;)

> I'm too used to GNU style. I'll try to be more
> careful.

Thanks (I know it's tedious to do so).

> I've got to do another run of this patch anyways since the
> documentation update isn't included in the one I posted.

OK great. My tests show it works on my machine in both PMC and RTC
mode, so it's good to go as far as I'm concerned.

regards
john

p.s. are you just sleeping in #oprofile or what ;)
|