From: Philippe E. <ph...@us...> - 2002-11-11 02:50:45
Update of /cvsroot/oprofile/oprofile-www
In directory usw-pr-cvs1:/tmp/cvs-serv17869

Modified Files:
	ChangeLog doc.php3
Added Files:
	intel-P4-events.php3
Log Message:
add intel P4 events

regards,
Phil

--- NEW FILE: intel-P4-events.php3 ---
<?php
require("start_page.php3");
start_page("intel-P4-events.php3", "Intel P4 performance counter events");
?>

<h2>Intel P4 events</h2>

<p>
This is a list of all performance counter event types for P4-core CPUs.
For details, see the IA-32 Intel Architecture Software Developer's Manual,
Volume 3, Appendix A. OProfile uses synthesised events and does not provide
low-level access to the P4 hardware, so the Intel manual is mainly useful for
people adding new events to OProfile rather than for end users.
</p>

<table class="eventtable">
<tr class="tablehead"><td>Name</td><td>Description</td><td>Counters usable</td><td>CPU needed</td>
<td>Unit mask options</td></tr>
<tr><td>GLOBAL_POWER_EVENTS</td><td> clocks processor is not halted </td><td> 0 1</td><td> P4</td><td> </td> </tr>
<tr><td>BRANCH_RETIRED</td><td> number of branch instructions retired </td><td> 6 7</td><td> P4</td><td>
01: branch not-taken predicted <br />
02: branch not-taken mispredicted <br />
04: branch taken predicted <br />
08: branch taken mispredicted <br />
</td> </tr>
<tr><td>MISPRED_BRANCH_RETIRED</td><td> retired mispredicted branches </td><td> 6 7</td><td> P4</td><td>
01: retired instruction is non-bogus <br />
</td> </tr>
<tr><td>TC_DELIVER_MODE</td><td> duration (in clock cycles) in the trace cache and decode engine </td><td> 2 3</td><td> P4</td><td>
01: both logical processors in deliver mode <br />
02: logical processor 0 in deliver mode, 1 in build mode <br />
04: logical processor 0 in deliver mode, 1 in halt/clear/trans mode <br />
08: logical processor 0 in build mode, 1 in deliver mode <br />
10: both logical processors in build mode <br />
20: logical processor 0 in build mode, 1 in halt/clear/trans mode <br />
40: logical processor 0 in halt/clear/trans mode, 1 in deliver mode <br />
80: logical processor 0 in halt/clear/trans mode, 1 in build mode <br />
</td> </tr>
<tr><td>BPU_FETCH_REQUEST</td><td> instruction fetch requests from the branch predict unit </td><td> 0 1</td><td> P4</td><td>
01: trace cache lookup miss <br />
</td> </tr>
<tr><td>ITLB_REFERENCE</td><td> translations using the instruction translation lookaside buffer </td><td> 0 1</td><td> P4</td><td>
01: ITLB hit <br />
02: ITLB miss <br />
04: uncacheable ITLB hit <br />
</td> </tr>
<tr><td>MEMORY_CANCEL</td><td> cancelled requests in the data cache address control unit </td><td> 4 5</td><td> P4</td><td>
04: replayed because no store request buffer available <br />
08: conflicts due to 64k aliasing <br />
</td> </tr>
<tr><td>MEMORY_COMPLETE</td><td> completed load split, store split, uncacheable split, uncacheable load </td><td> 4 5</td><td> P4</td><td>
01: load split completed, excluding UC/WC loads <br />
02: any split stores completed <br />
</td> </tr>
<tr><td>LOAD_PORT_REPLAY</td><td> replayed events at the load port </td><td> 4 5</td><td> P4</td><td>
02: split load <br />
</td> </tr>
<tr><td>STORE_PORT_REPLAY</td><td> replayed events at the store port </td><td> 4 5</td><td> P4</td><td>
02: split store <br />
</td> </tr>
<tr><td>MOB_LOAD_REPLAY</td><td> replayed loads from the memory order buffer </td><td> 0 1</td><td> P4</td><td>
02: replay cause: unknown store address <br />
08: replay cause: unknown store data <br />
10: replay cause: partial overlap between load and store <br />
20: replay cause: mismatched low 4 bits between load and store addr <br />
</td> </tr>
<tr><td>PAGE_WALK_TYPE</td><td> page walks by the page miss handler </td><td> 0 1</td><td> P4</td><td>
01: page walk for data TLB miss <br />
02: page walk for instruction TLB miss <br />
</td> </tr>
<tr><td>BSQ_CACHE_REFERENCE</td><td> cache references seen by the bus unit </td><td> 0 1</td><td> P4</td><td>
01: read 2nd level cache hit shared <br />
02: read 2nd level cache hit exclusive <br />
04: read 2nd level cache hit modified <br />
08: read 3rd level cache hit shared <br />
10: read 3rd level cache hit exclusive <br />
20: read 3rd level cache hit modified <br />
100: read 2nd level cache miss <br />
200: read 3rd level cache miss <br />
400: writeback lookup from DAC misses 2nd level cache <br />
</td> </tr>
<tr><td>IOQ_ALLOCATION</td><td> bus transactions </td><td> 0</td><td> P4</td><td>
01: bus request type bit 0 <br />
02: bus request type bit 1 <br />
04: bus request type bit 2 <br />
08: bus request type bit 3 <br />
10: bus request type bit 4 <br />
20: count read entries <br />
40: count write entries <br />
80: count UC memory access entries <br />
100: count WC memory access entries <br />
200: count write-through memory access entries <br />
400: count write-protected memory access entries <br />
800: count WB memory access entries <br />
2000: count own store requests <br />
4000: count other / DMA store requests <br />
8000: count HW/SW prefetch requests <br />
</td> </tr>
<tr><td>IOQ_ACTIVE_ENTRIES</td><td> number of entries in the IOQ which are active </td><td> 1</td><td> P4</td><td>
01: bus request type bit 0 <br />
02: bus request type bit 1 <br />
04: bus request type bit 2 <br />
08: bus request type bit 3 <br />
10: bus request type bit 4 <br />
20: count read entries <br />
40: count write entries <br />
80: count UC memory access entries <br />
100: count WC memory access entries <br />
200: count write-through memory access entries <br />
400: count write-protected memory access entries <br />
800: count WB memory access entries <br />
2000: count own store requests <br />
4000: count other / DMA store requests <br />
8000: count HW/SW prefetch requests <br />
</td> </tr>
<tr><td>FSB_DATA_ACTIVITY</td><td> DRDY or DBSY events on the front-side bus </td><td> 0 1</td><td> P4</td><td>
01: count when this processor drives data onto bus <br />
02: count when this processor reads data from bus <br />
04: count when data is on bus but not sampled by this processor <br />
08: count when this processor reserves bus for driving <br />
10: count when other reserves bus and this processor will sample <br />
20: count when other reserves bus and this processor will not sample <br />
</td> </tr>
<tr><td>BSQ_ALLOCATION</td><td> allocations in the bus sequence unit </td><td> 0</td><td> P4</td><td>
01: (r)eq (t)ype (e)ncoding, bit 0: see next event <br />
02: rte bit 1: 00=read, 01=read invalidate, 10=write, 11=writeback <br />
04: req len bit 0 <br />
08: req len bit 1 <br />
20: request type is input (0=output) <br />
40: request type is bus lock <br />
80: request type is cacheable <br />
100: request type is 8-byte chunk split across 8-byte boundary <br />
200: request type is demand (0=prefetch) <br />
400: request type is ordered <br />
800: (m)emory (t)ype (e)ncoding, bit 0: see next events <br />
1000: mte bit 1: see next event <br />
2000: mte bit 2: 000=UC, 001=USWC, 100=WT, 101=WP, 110=WB <br />
</td> </tr>
<tr><td>BSQ_ACTIVE_ENTRIES</td><td> number of entries in the bus sequence unit which are active </td><td> 1</td><td> P4</td><td>
01: (r)eq (t)ype (e)ncoding, bit 0: see next event <br />
02: rte bit 1: 00=read, 01=read invalidate, 10=write, 11=writeback <br />
04: req len bit 0 <br />
08: req len bit 1 <br />
20: request type is input (0=output) <br />
40: request type is bus lock <br />
80: request type is cacheable <br />
100: request type is 8-byte chunk split across 8-byte boundary <br />
200: request type is demand (0=prefetch) <br />
400: request type is ordered <br />
800: (m)emory (t)ype (e)ncoding, bit 0: see next events <br />
1000: mte bit 1: see next event <br />
2000: mte bit 2: 000=UC, 001=USWC, 100=WT, 101=WP, 110=WB <br />
</td> </tr>
<tr><td>X87_ASSIST</td><td> retired x87 instructions which required special handling </td><td> 6 7</td><td> P4</td><td>
01: handle FP stack underflow <br />
02: handle FP stack overflow <br />
04: handle x87 output overflow <br />
08: handle x87 output underflow <br />
10: handle x87 input assist <br />
</td> </tr>
<tr><td>SSE_INPUT_ASSIST</td><td> input assists requested for SSE or SSE2 operands </td><td> 4 5</td><td> P4</td><td>
8000: count all uops of this type <br />
</td> </tr>
<tr><td>PACKED_SP_UOP</td><td> packed single precision uops </td><td> 4 5</td><td> P4</td><td>
8000: count all uops of this type <br />
</td> </tr>
<tr><td>PACKED_DP_UOP</td><td> packed double precision uops </td><td> 4 5</td><td> P4</td><td>
8000: count all uops of this type <br />
</td> </tr>
<tr><td>SCALAR_SP_UOP</td><td> scalar single precision uops </td><td> 4 5</td><td> P4</td><td>
8000: count all uops of this type <br />
</td> </tr>
<tr><td>SCALAR_DP_UOP</td><td> scalar double precision uops </td><td> 4 5</td><td> P4</td><td>
8000: count all uops of this type <br />
</td> </tr>
<tr><td>64BIT_MMX_UOP</td><td> 64-bit SIMD MMX instructions </td><td> 4 5</td><td> P4</td><td>
8000: count all uops of this type <br />
</td> </tr>
<tr><td>128BIT_MMX_UOP</td><td> 128-bit SIMD SSE2 instructions </td><td> 4 5</td><td> P4</td><td>
8000: count all uops of this type <br />
</td> </tr>
<tr><td>X87_FP_UOP</td><td> x87 floating point uops </td><td> 4 5</td><td> P4</td><td>
8000: count all uops of this type <br />
</td> </tr>
<tr><td>MACHINE_CLEAR</td><td> cycles with entire machine pipeline cleared </td><td> 6 7</td><td> P4</td><td>
01: count a portion of cycles the machine is cleared for any cause <br />
40: count cycles machine is cleared due to memory ordering issues <br />
80: count cycles machine is cleared due to self modifying code <br />
</td> </tr>
<tr><td>TC_MS_XFER</td><td> number of times uop delivery changed from TC to MS ROM </td><td> 2 3</td><td> P4</td><td>
01: count TC to MS transfers <br />
</td> </tr>
<tr><td>UOP_QUEUE_WRITES</td><td> number of valid uops written to the uop queue </td><td> 2 3</td><td> P4</td><td>
01: count uops written to queue from TC build mode <br />
02: count uops written to queue from TC deliver mode <br />
04: count uops written to queue from microcode ROM <br />
</td> </tr>
<tr><td>FRONT_END_EVENT</td><td> retired uops, tagged with front-end tagging </td><td> 6 7</td><td> P4</td><td>
01: count marked uops which are non-bogus <br />
02: count marked uops which are bogus <br />
</td> </tr>
<tr><td>EXECUTION_EVENT</td><td> retired uops, tagged with execution tagging </td><td> 6 7</td><td> P4</td><td>
01: count 1st marked uops which are non-bogus <br />
02: count 2nd marked uops which are non-bogus <br />
04: count 3rd marked uops which are non-bogus <br />
08: count 4th marked uops which are non-bogus <br />
10: count 1st marked uops which are bogus <br />
20: count 2nd marked uops which are bogus <br />
40: count 3rd marked uops which are bogus <br />
80: count 4th marked uops which are bogus <br />
</td> </tr>
<tr><td>REPLAY_EVENT</td><td> retired uops, tagged with replay tagging </td><td> 6 7</td><td> P4</td><td>
01: count marked uops which are non-bogus <br />
02: count marked uops which are bogus <br />
</td> </tr>
<tr><td>INSTR_RETIRED</td><td> retired instructions </td><td> 6 7</td><td> P4</td><td>
01: count non-bogus instructions which are not tagged <br />
02: count non-bogus instructions which are tagged <br />
04: count bogus instructions which are not tagged <br />
08: count bogus instructions which are tagged <br />
</td> </tr>
<tr><td>UOPS_RETIRED</td><td> retired uops </td><td> 6 7</td><td> P4</td><td>
01: count marked uops which are non-bogus <br />
02: count marked uops which are bogus <br />
</td> </tr>
<tr><td>UOP_TYPE</td><td> type of uop tagged by front-end tagging </td><td> 6 7</td><td> P4</td><td>
02: count uops which are load operations <br />
04: count uops which are store operations <br />
</td> </tr>
<tr><td>RETIRED_MISPRED_BRANCH_TYPE</td><td> retired mispredicted branches, selected by type </td><td> 2 3</td><td> P4</td><td>
02: count conditional jumps <br />
04: count indirect call branches <br />
08: count return branches <br />
10: count returns, indirect calls or indirect jumps <br />
</td> </tr>
<tr><td>RETIRED_BRANCH_TYPE</td><td> retired branches, selected by type </td><td> 2 3</td><td> P4</td><td> </td> </tr>
</table>

<?php
require("end_page.php3");
end_page("intel-P4-events.php3");
?>

Index: ChangeLog
===================================================================
RCS file: /cvsroot/oprofile/oprofile-www/ChangeLog,v
retrieving revision 1.16
retrieving revision 1.17
diff -u -d -r1.16 -r1.17
--- ChangeLog	28 Aug 2002 23:36:23 -0000	1.16
+++ ChangeLog	11 Nov 2002 02:50:42 -0000	1.17
@@ -1,3 +1,8 @@
+2002-11-11  Philippe Elie  <ph...@wa...>
+
+	* intel-P4-events.php3: new, P4 events description
+	* doc.php3: link it
+
 2002-08-29  John Levon  <le...@mo...>
 
 	* faq.php3: add FAQ on RTC busy error

Index: doc.php3
===================================================================
RCS file: /cvsroot/oprofile/oprofile-www/doc.php3,v
retrieving revision 1.11
retrieving revision 1.12
diff -u -d -r1.11 -r1.12
--- doc.php3	17 Jun 2002 20:38:49 -0000	1.11
+++ doc.php3	11 Nov 2002 02:50:42 -0000	1.12
@@ -63,7 +63,7 @@
 Quick reference for the available event types for the performance counters.
 The same info can be retrieved by running <tt>op_help</tt>.
 <br />
-<a href="intel-events.php3">Intel</a> | <a href="amd-events.php3">AMD</a>
+<a href="intel-events.php3">Intel P6/PII/PIII</a> | <a href="intel-P4-events.php3">Intel P4</a> | <a href="amd-events.php3">AMD</a>
 </p>
 
 <?php require('end_page.php3'); end_page("doc.php3"); ?>
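The unit mask values in the P4 event table are hexadecimal bit values. The Python sketch below shows only the bit arithmetic, under two assumptions: independent flags (such as the four BRANCH_RETIRED options) are combined with a bitwise OR, and the BSQ_ALLOCATION/BSQ_ACTIVE_ENTRIES request type (rte, bits 0x01-0x02) and memory type (mte, bits 0x800-0x2000) are packed binary fields, as the listed encodings suggest. The helper names bsq_unit_mask and decode_bsq_unit_mask are hypothetical and not part of OProfile; which combinations the hardware actually counts is defined in the Intel manual.

```python
# Assumption: unit mask options combine as plain bit values, as listed in the table.

# BRANCH_RETIRED unit mask bits (from the table).
BRANCH_NOT_TAKEN_PREDICTED = 0x01
BRANCH_NOT_TAKEN_MISPREDICTED = 0x02
BRANCH_TAKEN_PREDICTED = 0x04
BRANCH_TAKEN_MISPREDICTED = 0x08

# BSQ_ALLOCATION / BSQ_ACTIVE_ENTRIES encoded fields (from the table):
# rte occupies bits 0-1 (0x01, 0x02); mte occupies bits 11-13 (0x800, 0x1000, 0x2000).
RTE_SHIFT, RTE_FIELD_MASK = 0, 0x3
MTE_SHIFT, MTE_FIELD_MASK = 11, 0x7

RTE_NAMES = {0b00: "read", 0b01: "read invalidate", 0b10: "write", 0b11: "writeback"}
MTE_NAMES = {0b000: "UC", 0b001: "USWC", 0b100: "WT", 0b101: "WP", 0b110: "WB"}


def bsq_unit_mask(rte: int, mte: int) -> int:
    """Hypothetical helper: pack an rte/mte pair into its unit mask bit positions."""
    if rte not in RTE_NAMES or mte not in MTE_NAMES:
        raise ValueError("unknown request or memory type encoding")
    return (rte << RTE_SHIFT) | (mte << MTE_SHIFT)


def decode_bsq_unit_mask(um: int) -> tuple:
    """Hypothetical helper: return (request type, memory type) names for a unit mask."""
    rte = (um >> RTE_SHIFT) & RTE_FIELD_MASK
    mte = (um >> MTE_SHIFT) & MTE_FIELD_MASK
    return RTE_NAMES[rte], MTE_NAMES.get(mte, "reserved")


# Independent flags: OR the taken-predicted and taken-mispredicted bits.
taken = BRANCH_TAKEN_PREDICTED | BRANCH_TAKEN_MISPREDICTED
print(f"BRANCH_RETIRED, all taken branches: 0x{taken:02x}")  # 0x0c

# Encoded fields: writeback request (rte 11) to WB memory (mte 110).
wb = bsq_unit_mask(0b11, 0b110)
print(f"BSQ_ALLOCATION: 0x{wb:x} -> {decode_bsq_unit_mask(wb)}")
```

Here 0x0c selects both taken-branch options of BRANCH_RETIRED, and 0x3003 packs rte = 11 into bits 0-1 and mte = 110 into bits 11-13.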