Re: [perfmon2] Question about the plm field in perf_event_attr
From: stephane e. <er...@go...> - 2009-12-14 21:00:53
Corey,

On Mon, Dec 14, 2009 at 8:45 PM, Corey Ashford <cja...@li...> wrote:
> Thanks for the reply, Stephane,
>
> stephane eranian wrote:
>>>> Name : RS_UOPS_DISPATCHED_CYCLES
>>>> Desc : Cycles micro-ops dispatched for execution
>>>> Code : 0xa1
>>>> Umask-00 : 0x01 : [PORT_0] : on port 0
>>>> Umask-01 : 0x02 : [PORT_1] : on port 1
>>>> Umask-02 : 0x04 : [PORT_2] : on port 2
>>>> Umask-03 : 0x08 : [PORT_3] : on port 3
>>>> Umask-04 : 0x10 : [PORT_4] : on port 4
>>>> Umask-05 : 0x20 : [PORT_5] : on port 5
>>>> Umask-06 : 0x3f : [ANY] : on any port (DEFAULT)
>>>> Modif-00 : 0x00 : [u] : monitor at priv level 1, 2, 3 (boolean)
>>>> Modif-01 : 0x01 : [k] : monitor at priv level 0 (boolean)
>>>> Modif-02 : 0x02 : [i] : invert (boolean)
>>>> Modif-03 : 0x03 : [e] : edge level (boolean)
>>>> Modif-04 : 0x04 : [c] : counter-mask in range [0-255] (integer)
>>>>
>> To answer the underlying question, anything the user passes in the event
>> string has higher priority than dfl_plm. I can have dfl_plm = PLM3 (u=1)
>> and an event string with k=1, in which case the event will measure at the
>> kernel only.
>
> So it's a true default, then, not just an initial value. Thanks for that
> clarification.
>
Yes.

>> { .name = "RS_UOPS_DISPATCHED_NONE",
>>   .code = 0xa0 | (1 << 23) | (1 << 24),
>>   .cntmsk = 0x3,
>>   .modmsk = _INTEL_X86_ATTR_U|_INTEL_X86_ATTR_K,
>>   .flags = INTEL_X86_ALIAS,
>>   .alias = "RS_UOPS_DISPATCHED",
>>   .desc = "Number of cycles in which no micro-ops is dispatched for execution",
>> },
>>
>> This is an event that is an alias to another event but it has some
>> modifiers hardcoded (i=invert, c=counter-mask). So first, you notice that
>> they are not part of the modmsk (which is the attrmsk we talked about).
>> Second, you have the INTEL_X86_ALIAS flag, which indicates this is an
>> alias with the actual event pointed to by .alias. In this case, the fstr
>> coming out is:
>>
>> $ perf_examples/task -e rs_uops_dispatched_none date
>>
>> [0x1d100a0 event_sel=0xa0 umask=0x0 os=0 usr=1 en=1 int=1 inv=1 edge=0 cnt_mask=1]
>>
>> PERF[type=4 val=0x1d100a0 e_u=0 e_k=1 e_hv=1]
>> RS_UOPS_DISPATCHED:k=0:u=1:e=0:i=1:c=1
>>
>> FQSTR=RS_UOPS_DISPATCHED:k=0:u=1:e=0:i=1:c=1
>
> I'm guessing the "... | (1 << 23) | (1 << 24)" are the invert and counter
> mask bits?
>
True.

> Does this mean you have to have code that does a reverse translation from
> the code value to attribute strings?
>
Yes, that's how I have it at the moment. For modifiers, the code looks at
the raw register value and figures out the modifiers and their values. That
is quite simple. It does not do it for unit masks, but I think it would be
doable, assuming no unit mask bits overlap in case you can combine them.

>>> Sorry, I'm a little confused here. Do you mean hard-wired or hard-coded?
>>> It looks like a ucode of 0x1 is required (needs to be hard-wired), but
>>> there are also a couple of optional bits which can be turned on, and
>>> you've given a name to one such optional encoding. Would it be legit
>>> also to specify "UOPS_ISSUED:ANY:i=1" or "UOPS_ISSUED:ANY:c=1"? I would
>>> assume so.
>>>
>> Given what I just described above, UOPS_ISSUED:STALLED_CYCLES would now
>> come out as fstr=UOPS_ISSUED:ANY:c=1:i=1, which makes more sense, because
>> now you have both the logical event string (what you pass) and the actual
>> event string. The tool has both and can decide which one to use.
>> As for your question, the way the code is, it would not be legit. You
>> cannot set a hardwired modifier, even if it is to the same value.
>>
>
> Oh, I get it now. ANY is 0x3f...all PORT_* bits set. Somehow I was
> thinking it was a 0x0 value, which in retrospect may not make much sense
> for a umask anyway, since 0 is the default value.
>
> With that in mind, let's say I specify:
>
> UOPS_ISSUED:PORT_0:PORT_1:PORT_2:PORT_3:PORT_4:PORT_5:c=1:i=1
>
> would fqstr come back as
>
> UOPS_ISSUED:ANY:c=1:i=1 ?
>
That would be hard in the general case. How do you go from 0x3f back to
0x1, 0x2, 0x4, 0x8, 0x10, 0x20 if you don't know that each unit mask is just
one bit? There are two cases to consider here:
 - no unit mask combination possible: in that case you can do a perfect
   match, reg.sel_umask == table[i].umask[j].code
 - unit masks can be combined and each is one bit: then scan bit by bit
   until you find a perfect match

Don't get me wrong, I think that is possible if the tables are correctly
annotated to allow/deny combinations. I need to take a closer look at the
unit mask values on all X86 processors. But I think you would agree that
this translation would have to happen in the model specific layer because
there are many PMU specific details in here.

> If so, is there a way to accomplish this reverse translation in a generic,
> table-driven way, or must it be done per arch? I'm thinking it's the
> latter, because the architectures are so different.

Latter.
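To illustrate the modifier case discussed above (looking at the raw register value and recovering the modifiers and their values), here is a minimal C sketch. It is not the libpfm code. The struct and function names (evtsel_fields, decode_evtsel, format_modifiers) are made up for this example, and it assumes the standard Intel x86 PERFEVTSEL bit layout (usr=bit 16, os=bit 17, edge=bit 18, inv=bit 23, cnt_mask=bits 24-31), which is consistent with the "(1 << 23) | (1 << 24)" encoding quoted in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical decoded view of an Intel x86 PERFEVTSEL value.
 * This is an illustration only, not a libpfm data structure. */
struct evtsel_fields {
    unsigned event_sel; /* bits 0-7   */
    unsigned umask;     /* bits 8-15  */
    unsigned usr;       /* bit 16, modifier u */
    unsigned os;        /* bit 17, modifier k */
    unsigned edge;      /* bit 18, modifier e */
    unsigned inv;       /* bit 23, modifier i */
    unsigned cnt_mask;  /* bits 24-31, modifier c */
};

/* Split a raw register value into its fields. */
static struct evtsel_fields decode_evtsel(uint32_t val)
{
    struct evtsel_fields f;

    f.event_sel = val & 0xff;
    f.umask     = (val >> 8) & 0xff;
    f.usr       = (val >> 16) & 0x1;
    f.os        = (val >> 17) & 0x1;
    f.edge      = (val >> 18) & 0x1;
    f.inv       = (val >> 23) & 0x1;
    f.cnt_mask  = (val >> 24) & 0xff;
    return f;
}

/* Write "NAME:k=..:u=..:e=..:i=..:c=.." into buf.
 * Returns what snprintf returns: the length the full string needs. */
static int format_modifiers(const char *name, uint32_t val,
                            char *buf, size_t len)
{
    assert(name != NULL && buf != NULL && len > 0);

    struct evtsel_fields f = decode_evtsel(val);

    return snprintf(buf, len, "%s:k=%u:u=%u:e=%u:i=%u:c=%u",
                    name, f.os, f.usr, f.edge, f.inv, f.cnt_mask);
}
```

Calling format_modifiers("RS_UOPS_DISPATCHED", 0x1d100a0, buf, sizeof(buf)) with the register value from the example output reproduces the modifier part of the FQSTR shown in the thread (k=0:u=1:e=0:i=1:c=1). The unit mask part is deliberately left out here, since that is the harder case.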
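The two unit mask cases listed above (perfect match first, then a bit-by-bit scan when combinations are allowed) could be sketched roughly as follows. This is only an illustration under stated assumptions: the table layout, the "combinable" annotation and the names umask_desc, event_desc and umask_to_str are hypothetical and do not come from libpfm. The unit mask values are the RS_UOPS_DISPATCHED_CYCLES ones quoted in the thread, and marking that event as combinable is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct umask_desc {
    const char *name;
    unsigned code;
};

/* Hypothetical event table entry; "combinable" stands in for the
 * allow/deny-combination annotation mentioned in the thread. */
struct event_desc {
    const char *name;
    int combinable;
    size_t num_umasks;
    struct umask_desc umasks[8];
};

static const struct event_desc rs_uops_dispatched_cycles = {
    .name = "RS_UOPS_DISPATCHED_CYCLES",
    .combinable = 1, /* assumption for this example */
    .num_umasks = 7,
    .umasks = {
        { "PORT_0", 0x01 }, { "PORT_1", 0x02 }, { "PORT_2", 0x04 },
        { "PORT_3", 0x08 }, { "PORT_4", 0x10 }, { "PORT_5", 0x20 },
        { "ANY",    0x3f },
    },
};

/* Append ":<name>" at offset *pos. Returns 0, or -1 if buf is too small. */
static int append_umask(char *buf, size_t len, size_t *pos, const char *name)
{
    size_t n = strlen(name) + 1; /* ':' plus the name */

    if (*pos + n >= len)
        return -1;
    buf[*pos] = ':';
    memcpy(buf + *pos + 1, name, n - 1);
    *pos += n;
    buf[*pos] = '\0';
    return 0;
}

/* Translate a raw unit mask back into ":NAME[:NAME...]".
 * Returns 0 on success, -1 if the value cannot be expressed. */
static int umask_to_str(const struct event_desc *e, unsigned umask,
                        char *buf, size_t len)
{
    assert(e != NULL && buf != NULL && len > 0);

    size_t pos = 0;
    buf[0] = '\0';

    /* Case 1: perfect match against a single table entry. */
    for (size_t j = 0; j < e->num_umasks; j++)
        if (e->umasks[j].code == umask)
            return append_umask(buf, len, &pos, e->umasks[j].name);

    if (!e->combinable)
        return -1;

    /* Case 2: combine single-bit unit masks until no bits are left. */
    unsigned left = umask;
    for (size_t j = 0; j < e->num_umasks && left; j++) {
        unsigned code = e->umasks[j].code;
        int single_bit = code && !(code & (code - 1));

        if (single_bit && (left & code)) {
            if (append_umask(buf, len, &pos, e->umasks[j].name))
                return -1;
            left &= ~code;
        }
    }
    return left ? -1 : 0;
}
```

With this table, 0x3f comes back as ":ANY" because the perfect match is tried first, while 0x03 comes back as ":PORT_0:PORT_1". A value with bits that no table entry covers is rejected. Doing the perfect match first is a design choice of this sketch; a real per-PMU layer might prefer a different order or stricter annotations.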