From: John L. <le...@mo...> - 2003-01-15 12:29:33
Looks trivial, thought I'd check first before applying though. OK ?

john


Index: libop/op_events.c
===================================================================
RCS file: /cvsroot/oprofile/oprofile/libop/op_events.c,v
retrieving revision 1.19
diff -u -r1.19 op_events.c
--- libop/op_events.c	13 Jan 2003 10:26:40 -0000	1.19
+++ libop/op_events.c	15 Jan 2003 12:22:10 -0000
@@ -105,7 +105,7 @@
 
 /* MISPRED_BRANCH_RETIRED */
 static struct op_unit_mask um_mispred_branch_retired =
-	{1, utm_bitmask, 0x01,
+	{1, utm_mandatory, 0x01,
 	  { {0x01, "retired instruction is non-bogus"} } };
 
 /* TC_DELIVER_MODE */
@@ -122,7 +122,7 @@
 
 /* BPU_FETCH_REQUEST */
 static struct op_unit_mask um_bpu_fetch_request =
-	{1, utm_bitmask, 0x00,
+	{1, utm_mandatory, 0x00,
 	 {{0x01, "trace cache lookup miss"} } };
 
 /* ITLB_REFERENCE */
@@ -146,12 +146,12 @@
 
 /* LOAD_PORT_REPLAY */
 static struct op_unit_mask um_load_port_replay =
-	{1, utm_bitmask, 0x02,
+	{1, utm_mandatory, 0x02,
 	  { {0x02, "split load"} } };
 
 /* STORE_PORT_REPLAY */
 static struct op_unit_mask um_store_port_replay =
-	{1, utm_bitmask, 0x02,
+	{1, utm_mandatory, 0x02,
 	  { {0x02, "split store"} } };
 
 /* MOB_LOAD_REPLAY */
@@ -243,7 +243,7 @@
 /* {64,128}BIT_MMX_UOP */
 /* X87_FP_UOP */
 static struct op_unit_mask um_flame_uop =
-	{1, utm_bitmask, 0x8000,
+	{1, utm_mandatory, 0x8000,
 	  { {0x8000, "count all uops of this type" } } };
 
 /* X87_SIMD_MOVES_UOP */
@@ -261,12 +261,12 @@
 
 /* GLOBAL_POWER_EVENTS */
 static struct op_unit_mask um_global_power_events =
-	{1, utm_bitmask, 0x1,
+	{1, utm_mandatory, 0x1,
 	  { {0x01, "count cycles when processor is active"} } };
 
 /* TC_MS_XFER */
 static struct op_unit_mask um_tc_ms_xfer =
-	{1, utm_bitmask, 0x1,
+	{1, utm_mandatory, 0x1,
 	  { {0x01, "count TC to MS transfers"} } };
 
 /* UOP_QUEUE_WRITES */
From: William C. <wc...@nc...> - 2003-01-15 21:53:27
Making some of these things mandatory isn't the right thing to do. There
are cases where one might want to count all instructions including the
bogus ones, e.g. to get an idea of the speculation.

The P4 attempts to aggressively move loads up in the scheduling to fetch
data from the L1 cache. However, there are cases where a load has to be
redone ("replayed") because the speculation wasn't correct.
LOAD_PORT_REPLAY counts the number of times operations are repeated. A
"split load" crosses cache lines. OProfile should not limit the replay
event to measuring just that subset of operations. The same applies to
STORE_PORT_REPLAY.

John Levon wrote:
> Looks trivial, thought I'd check first before applying though. OK ?
>
> john
>
>
> Index: libop/op_events.c
> ===================================================================
> RCS file: /cvsroot/oprofile/oprofile/libop/op_events.c,v
> retrieving revision 1.19
> diff -u -r1.19 op_events.c
> --- libop/op_events.c	13 Jan 2003 10:26:40 -0000	1.19
> +++ libop/op_events.c	15 Jan 2003 12:22:10 -0000
> @@ -105,7 +105,7 @@
>
>  /* MISPRED_BRANCH_RETIRED */
>  static struct op_unit_mask um_mispred_branch_retired =
> -	{1, utm_bitmask, 0x01,
> +	{1, utm_mandatory, 0x01,
>  	  { {0x01, "retired instruction is non-bogus"} } };
>
>  /* TC_DELIVER_MODE */
> @@ -122,7 +122,7 @@
>
>  /* BPU_FETCH_REQUEST */
>  static struct op_unit_mask um_bpu_fetch_request =
> -	{1, utm_bitmask, 0x00,
> +	{1, utm_mandatory, 0x00,
>  	 {{0x01, "trace cache lookup miss"} } };
>
>  /* ITLB_REFERENCE */
> @@ -146,12 +146,12 @@
>
>  /* LOAD_PORT_REPLAY */
>  static struct op_unit_mask um_load_port_replay =
> -	{1, utm_bitmask, 0x02,
> +	{1, utm_mandatory, 0x02,
>  	  { {0x02, "split load"} } };
>
>  /* STORE_PORT_REPLAY */
>  static struct op_unit_mask um_store_port_replay =
> -	{1, utm_bitmask, 0x02,
> +	{1, utm_mandatory, 0x02,
>  	  { {0x02, "split store"} } };
>
>  /* MOB_LOAD_REPLAY */
> @@ -243,7 +243,7 @@
>  /* {64,128}BIT_MMX_UOP */
>  /* X87_FP_UOP */
>  static struct op_unit_mask um_flame_uop =
> -	{1, utm_bitmask, 0x8000,
> +	{1, utm_mandatory, 0x8000,
>  	  { {0x8000, "count all uops of this type" } } };

I am not sure what the distinction is between "count all uops of this
type" on and off. Don't know about GLOBAL_POWER_EVENTS or TC_MS_XFER.

>  /* X87_SIMD_MOVES_UOP */
> @@ -261,12 +261,12 @@
>
>  /* GLOBAL_POWER_EVENTS */
>  static struct op_unit_mask um_global_power_events =
> -	{1, utm_bitmask, 0x1,
> +	{1, utm_mandatory, 0x1,
>  	  { {0x01, "count cycles when processor is active"} } };
>
>  /* TC_MS_XFER */
>  static struct op_unit_mask um_tc_ms_xfer =
> -	{1, utm_bitmask, 0x1,
> +	{1, utm_mandatory, 0x1,
>  	  { {0x01, "count TC to MS transfers"} } };
>
>  /* UOP_QUEUE_WRITES */

-Will
From: graydon h. <gr...@re...> - 2003-01-15 22:16:37
On Wed, 2003-01-15 at 16:53, William Cohen wrote:

> Making some of these things mandatory isn't the right thing to do.

well, perhaps. I'm not entirely clear on the semantics of some of these
things, to be honest. for example, as you say, there is only one bit
defined as a mask value for LOAD_PORT_REPLAY: split load. so, clearly if
I set that bit to 1, I'll be recording the split-load replays.

but I wonder: if I set that bit to 0, does that mean I'm going to record
non-split-load replays? or am I going to record nothing at all? if the
former, then certainly making it utm_mandatory is wrong; if the latter,
I don't see what the harm is.

(aside: is the "splitness" of the load what *causes* the replay, or is
the splitness just an interesting feature of replays you might want to
observe?)

-graydon
From: William C. <wc...@nc...> - 2003-01-15 22:37:06
The P4 documentation could be clearer in that area of the select bits.
For bits like non-bogus, the bit subsets the stuff sampled. For some of
the cache data collection, the bits need to be set to get any data at
all. It doesn't make sense to have the replay counters count only split
operations. Maybe Intel implemented it that way, but that wouldn't make
much sense.

Reading through some of the information about the microarchitecture: the
P4 tries to execute loads and stores to the L1 cache before all the data
dependencies are verified as resolved. The processor assumes the values
being used are not going to change. There are cases where the processor
finds out, after the load or store is speculatively executed, that the
assumptions about the values were incorrect; the P4 has to replay the
instruction in these cases.

A split load is a load of a value that crosses between cache lines (an
unaligned access). A split store is the same thing but with a store. The
store forwarding on P4 has problems with partial register writes. There
may be similar penalties with split loads and stores that make them
interesting to pick out, because they are more expensive.

graydon hoare wrote:
> On Wed, 2003-01-15 at 16:53, William Cohen wrote:
>
>> Making some of these things mandatory isn't the right thing to do.
>
> well, perhaps. I'm not entirely clear on the semantics of some of these
> things, to be honest. for example, as you say, there is only one bit
> defined as a mask value for LOAD_PORT_REPLAY: split load. so, clearly if
> I set that bit to 1, I'll be recording the split-load replays.
>
> but I wonder: if I set that bit to 0, does that mean I'm going to record
> non-split-load replays? or am I going to record nothing at all? if the
> former, then certainly making it utm_mandatory is wrong; if the latter,
> I don't see what the harm is.
>
> (aside: is the "splitness" of the load what *causes* the replay, or is
> the splitness just an interesting feature of replays you might want to
> observe?)
>
> -graydon

-Will
From: Philippe E. <ph...@wa...> - 2003-01-16 10:44:03
graydon hoare wrote:
> On Wed, 2003-01-15 at 16:53, William Cohen wrote:
>
>> Making some of these things mandatory isn't the right thing to do.
>
> well, perhaps. I'm not entirely clear on the semantics of some of these
> things, to be honest. for example, as you say, there is only one bit
> defined as a mask value for LOAD_PORT_REPLAY: split load. so, clearly if
> I set that bit to 1, I'll be recording the split-load replays.
>
> but I wonder: if I set that bit to 0, does that mean I'm going to record
> non-split-load replays? or am I going to record nothing at all? if the
> former, then certainly making it utm_mandatory is wrong; if the latter,
> I don't see what the harm is.

The same problem occurs for:

/* MISPRED_BRANCH_RETIRED */
static struct op_unit_mask um_mispred_branch_retired =
-	{1, utm_bitmask, 0x01,
+	{1, utm_mandatory, 0x01,
	  { {0x01, "retired instruction is non-bogus"} } };

Does this mean that 0 counts bogus instructions and there is no way to
count bogus and non-bogus retired insns together, or that only non-bogus
insns can be counted? If the latter, utm_mandatory is required. If the
documentation is not clear enough, tests are needed.

Anyway, for a unit mask with only one allowed bit, utm_bitmask is always
wrong: if the bit is required, utm_mandatory is needed; otherwise
utm_exclusive must be used and a unit mask == 0 allowed explicitly. So
either:

static struct op_unit_mask um_mispred_branch_retired =
	{1, utm_mandatory, 0x01,
	  { {0x01, "retired instruction is non-bogus"} } };

or:

static struct op_unit_mask um_mispred_branch_retired =
	{2, utm_exclusive, 0x01,
	  { {0x01, "retired instruction is non-bogus"},
	    {0x00, "bogus (or non-bogus and bogus ?)"} } };

libop/op_events.c:

int op_check_unit_mask(struct op_unit_mask const * allow, u16 um)
	...
	case utm_bitmask:
		/* must reject a 0 bit mask because it counts nothing */
		if (um == 0)
			break;
	...
	return -1;

regards,
Phil