Thread: [RFC] reserving perfctr registers from nmi before using them

oprofile-list

[RFC] reserving perfctr registers from nmi before using them

From: Don Z. <dz...@gm...> - 2006-04-03 21:32:53

Attachments: oprofile_reserve_nmi.patch

Hello,

I was trying to fix some functionality in the nmi watchdog but
unfortunately it broke how oprofile works.  As a result I was trying
to come up with a patch that would fix the oprofile issue.  The
problem revolves around the fact that oprofile uses the same
performance counter register as the nmi watchdog.  Thus if one were to
start oprofile, they would accidentally disable the watchdog, which in
turn would cause other nmi watchdog issues.

I have attached a patch which creates a reservation function so that
only one subsystem (oprofile or nmi watchdog) could use a particular
performance counter.  The patch only contains changes for the athlon
model (I'll get to the Pentium models in just a sec).  It has minimal
testing.  More importantly, I was hoping someone could look at the
patch and say it is an acceptable solution for oprofile to have the
restricted functionality.

If this is acceptable, then I was hoping to solict some feedback on a
good way of implementing this on the pentium models.  It seems in the
AMD systems, only a few registers are ever touched, thus making it
easy to hack in some code.  On the Intel systems, the code seems to
touch over 100 registers for no other reason than just to clear them.=20
This makes hacking more complicated.

My question is are all those registers being touched on the Pentuim
models really necessary?  I mean over a 100 hundred registers for a
handful of performance counters?  Am I missing something?

Cheers,
Don

Re: [RFC] reserving perfctr registers from nmi before using them

From: Andi K. <ak...@su...> - 2006-04-03 23:26:37

On Monday 03 April 2006 23:32, Don Zickus wrote:
> Hello,
>
> I was trying to fix some functionality in the nmi watchdog but
> unfortunately it broke how oprofile works.  As a result I was trying
> to come up with a patch that would fix the oprofile issue.  The
> problem revolves around the fact that oprofile uses the same
> performance counter register as the nmi watchdog.  Thus if one were to
> start oprofile, they would accidentally disable the watchdog, which in
> turn would cause other nmi watchdog issues.
>
> I have attached a patch which creates a reservation function so that
> only one subsystem (oprofile or nmi watchdog) could use a particular
> performance counter.  The patch only contains changes for the athlon
> model (I'll get to the Pentium models in just a sec).  It has minimal
> testing.  More importantly, I was hoping someone could look at the
> patch and say it is an acceptable solution for oprofile to have the
> restricted functionality.

Looks reasonable to me. Two comments:
- If you use the linux atomic bit operations (set_bit, clear_bit, 
test_and_set_bit etc.) you don't actually need the spinlock
- I'm not sure how oprofile userland reacts to a reserved
counter. Perhaps adding a printk for this case would be useful
and fixing the userland to give a sensible error message
(since it's effectively a change to the UI)

>
> If this is acceptable, then I was hoping to solict some feedback on a
> good way of implementing this on the pentium models.  It seems in the
> AMD systems, only a few registers are ever touched, thus making it
> easy to hack in some code.  On the Intel systems, the code seems to
> touch over 100 registers for no other reason than just to clear them.
> This makes hacking more complicated.
>
> My question is are all those registers being touched on the Pentuim
> models really necessary?  I mean over a 100 hundred registers for a
> handful of performance counters?  Am I missing something?

Intel decided to do it the complicated way for P4 (P3 and P-M are
nice and simple to). 

Essentially there are multiple
groups of performance counters belonging to different parts of the CPU
and each group has a couple of counters that can be assigned to an 
arbitary event in the group (I probably left out some details) 
HyperThreading makes it more complicated again.

The easy way to handle this mess I guess would be to just extend the bitmap to 
all possible counters. Bits are cheap.

-Andi

Re: [RFC] reserving perfctr registers from nmi before using them

From: Stephane E. <er...@hp...> - 2006-04-04 07:22:41

Don,

On Mon, Apr 03, 2006 at 04:32:46PM -0500, Don Zickus wrote:
> 
> I was trying to fix some functionality in the nmi watchdog but
> unfortunately it broke how oprofile works.  As a result I was trying
> to come up with a patch that would fix the oprofile issue.  The
> problem revolves around the fact that oprofile uses the same
> performance counter register as the nmi watchdog.  Thus if one were to
> start oprofile, they would accidentally disable the watchdog, which in
> turn would cause other nmi watchdog issues.
> 
I welcome your initiative. I have had problems dealing with this NMI
code with the perfmon2 subsystem.

I looked at the patch and I am not quite sure what the goal is
with the reserve_perftctr_nmi()/release_perfctr_nmi(). The logic
seems to be that Oprofile or perfmon2 would share the counters
with the NMI watchdog. If NMI watchdog is active, then they
have one less counter (a counter group for P4). I am sure this
could break some tools because most of the time they are not prepared
to deal with unavailable counter. You need to convery that the counter
is unavailable as opposed to implemented so that they can try requesting
the same set of events BUT with another event -> counter assignement.

The reservation works if you assume that NMI watchdog and Oprofile NMI-based
interruption can work together. I do not see where you can chain the interrupt
handler on NMI interrupt. The set_nmi_callback() replaces the existing handler,
it does not add a new one. Thus, you break one or the other. If you cannot
chain the NMI interrupt handler then what is the point of the reservation?

Did I miss anything?

> If this is acceptable, then I was hoping to solict some feedback on a
> good way of implementing this on the pentium models.  It seems in the
> AMD systems, only a few registers are ever touched, thus making it
> easy to hack in some code.  On the Intel systems, the code seems to
> touch over 100 registers for no other reason than just to clear them. 
> This makes hacking more complicated.

You have about 18 counters and they are implemented by about 80 registers.
Yes, they can all be used it depends on what you want to measure.

> 
> My question is are all those registers being touched on the Pentuim
> models really necessary?  I mean over a 100 hundred registers for a
> handful of performance counters?  Am I missing something?
> 

-- 
-Stephane

Re: [RFC] reserving perfctr registers from nmi before using them

From: Don Z. <dz...@gm...> - 2006-04-04 16:14:14

On 4/4/06, Stephane Eranian <er...@hp...> wrote:
> Don,
>
> On Mon, Apr 03, 2006 at 04:32:46PM -0500, Don Zickus wrote:
> >
> > I was trying to fix some functionality in the nmi watchdog but
> > unfortunately it broke how oprofile works.  As a result I was trying
> > to come up with a patch that would fix the oprofile issue.  The
> > problem revolves around the fact that oprofile uses the same
> > performance counter register as the nmi watchdog.  Thus if one were to
> > start oprofile, they would accidentally disable the watchdog, which in
> > turn would cause other nmi watchdog issues.
> >
> I welcome your initiative. I have had problems dealing with this NMI
> code with the perfmon2 subsystem.
>
> I looked at the patch and I am not quite sure what the goal is
> with the reserve_perftctr_nmi()/release_perfctr_nmi(). The logic
> seems to be that Oprofile or perfmon2 would share the counters
> with the NMI watchdog. If NMI watchdog is active, then they
> have one less counter (a counter group for P4). I am sure this
> could break some tools because most of the time they are not prepared
> to deal with unavailable counter. You need to convery that the counter
> is unavailable as opposed to implemented so that they can try requesting
> the same set of events BUT with another event -> counter assignement.

Thanks for the feedback (both Stephane and Andi).  I didn't realize my
patch would break the user space API.  :(  Initially I was trying to
figure out a way to make that counter unavailable but couldn't find
the right piece of code, so I went the other way and denying a
request.  If you could point me to the code which shows user space
which counters are available, I would happily re-work the patch.

>
> The reservation works if you assume that NMI watchdog and Oprofile NMI-ba=
sed
> interruption can work together. I do not see where you can chain the inte=
rrupt
> handler on NMI interrupt. The set_nmi_callback() replaces the existing ha=
ndler,
> it does not add a new one. Thus, you break one or the other. If you canno=
t
> chain the NMI interrupt handler then what is the point of the reservation=
?
>
> Did I miss anything?
>
Yes, my original patch to Andi reworked the NMI watchdog such that
set_nmi_callback() would _not_ disable it.  Instead set_nmi_callback()
would be used to register an NMI handler to capture interrupts the NMI
subsystem didn't know what to do with (it knows NMI watchdog, memory
parity errors, and i/o check errors).   However, the problem arises
over the new concept of the NMI watchdog never being allowed to be
disabled.  Since the NMI watchdog uses one of the performance
counters, this conflicts with how oprofile implements its
functionality.    Therefore, I was trying to find a way to prevent
oprofile from using whatever counter register NMI watchdog is using
(as long as NMI watchdog is enabled at boot time), hence my
reservation logic.

I am open to suggestions of alternate ideas but this is the crux of
the problem I am trying to solve.

> > If this is acceptable, then I was hoping to solict some feedback on a
> > good way of implementing this on the pentium models.  It seems in the
> > AMD systems, only a few registers are ever touched, thus making it
> > easy to hack in some code.  On the Intel systems, the code seems to
> > touch over 100 registers for no other reason than just to clear them.
> > This makes hacking more complicated.
>
> You have about 18 counters and they are implemented by about 80 registers=
.
> Yes, they can all be used it depends on what you want to measure.

*blinks* I was hoping I just misread the code.  This will cause my
approach to be very intrusive on P4 boxes.  Hopefully, we can come up
with a simpler way.

Cheers,
Don

Re: [RFC] reserving perfctr registers from nmi before using them

From: Stephane E. <er...@hp...> - 2006-04-05 08:38:03

Don,

On Tue, Apr 04, 2006 at 11:13:51AM -0500, Don Zickus wrote:
> On 4/4/06, Stephane Eranian <er...@hp...> wrote:
> > Don,
> >
> > On Mon, Apr 03, 2006 at 04:32:46PM -0500, Don Zickus wrote:
> > >
> > > I was trying to fix some functionality in the nmi watchdog but
> > > unfortunately it broke how oprofile works.  As a result I was trying
> > > to come up with a patch that would fix the oprofile issue.  The
> > > problem revolves around the fact that oprofile uses the same
> > > performance counter register as the nmi watchdog.  Thus if one were to
> > > start oprofile, they would accidentally disable the watchdog, which in
> > > turn would cause other nmi watchdog issues.
> > >
> > I welcome your initiative. I have had problems dealing with this NMI
> > code with the perfmon2 subsystem.
> >
> > I looked at the patch and I am not quite sure what the goal is
> > with the reserve_perftctr_nmi()/release_perfctr_nmi(). The logic
> > seems to be that Oprofile or perfmon2 would share the counters
> > with the NMI watchdog. If NMI watchdog is active, then they
> > have one less counter (a counter group for P4). I am sure this
> > could break some tools because most of the time they are not prepared
> > to deal with unavailable counter. You need to convery that the counter
> > is unavailable as opposed to implemented so that they can try requesting
> > the same set of events BUT with another event -> counter assignement.
> 
> Thanks for the feedback (both Stephane and Andi).  I didn't realize my
> patch would break the user space API.  :(  Initially I was trying to
> figure out a way to make that counter unavailable but couldn't find
> the right piece of code, so I went the other way and denying a
> request.  If you could point me to the code which shows user space
> which counters are available, I would happily re-work the patch.
> 

I don't know about the particular details of OProfile. I think counters
are exposed via /dev/oprofile filesystem. I don't know if what you did is
enough to make unavailable counters disappear from there. Supposing it does.
The issue is how is this taken into account? Typically you say I want to
measure EVT1, EVT2, EVT3. Then, the tool figures out the assignment:
EVT1 -> CNTR2, EVT2 -> CNTR1, EVT3 -> CNTR4 for instance. The key
is to figure out if the assignement algorithm takes into the list of
available counters as shown by the kernel or just as expected for this CPU.
In others words, does this code check /dev/oprofile/.. to verify what is actually
available. 

The other aspect to keep in mind here is that counters are not always symetrical.
Not all events can be measured on any counter. Few CPUs have that level of flexibility.
I think Pentium M and AMD Opteron do, but P4 does not. If you remove certain counters
it may not be possible to make certain measurements anymore.


> >
> > The reservation works if you assume that NMI watchdog and Oprofile NMI-based
> > interruption can work together. I do not see where you can chain the interrupt
> > handler on NMI interrupt. The set_nmi_callback() replaces the existing handler,
> > it does not add a new one. Thus, you break one or the other. If you cannot
> > chain the NMI interrupt handler then what is the point of the reservation?
> >
> > Did I miss anything?
> >
> Yes, my original patch to Andi reworked the NMI watchdog such that
> set_nmi_callback() would _not_ disable it.  Instead set_nmi_callback()
> would be used to register an NMI handler to capture interrupts the NMI
> subsystem didn't know what to do with (it knows NMI watchdog, memory
> parity errors, and i/o check errors).   However, the problem arises
> over the new concept of the NMI watchdog never being allowed to be
> disabled.  Since the NMI watchdog uses one of the performance
> counters, this conflicts with how oprofile implements its
> functionality.    Therefore, I was trying to find a way to prevent
> oprofile from using whatever counter register NMI watchdog is using
> (as long as NMI watchdog is enabled at boot time), hence my
> reservation logic.
> 
You want both NMI interrupt for NMI watchdog and for OProfile.
NMI watchdog is using one counter. Oprofile will use the others.
If set_nmi_callback() registers an ADDITIONAL handler, then you
have to dispatch between the various handlers. Watchdog and
OProfile will trigger NMI on counter overflow. The dispatcher
has to detect that this is an NMI from counters. Then, it needs to figure
out which counter(s), based on this it needs to call the corresponding
handler.

Is that your plan?

-- 
-Stephane

Re: [RFC] reserving perfctr registers from nmi before using them

From: Don Z. <dz...@gm...> - 2006-04-05 14:16:24

Stephane,

> > Thanks for the feedback (both Stephane and Andi).  I didn't realize my
> > patch would break the user space API.  :(  Initially I was trying to
> > figure out a way to make that counter unavailable but couldn't find
> > the right piece of code, so I went the other way and denying a
> > request.  If you could point me to the code which shows user space
> > which counters are available, I would happily re-work the patch.
> >
>
> I don't know about the particular details of OProfile. I think counters
> are exposed via /dev/oprofile filesystem. I don't know if what you did is
> enough to make unavailable counters disappear from there. Supposing it do=
es.
> The issue is how is this taken into account? Typically you say I want to
> measure EVT1, EVT2, EVT3. Then, the tool figures out the assignment:
> EVT1 -> CNTR2, EVT2 -> CNTR1, EVT3 -> CNTR4 for instance. The key
> is to figure out if the assignement algorithm takes into the list of
> available counters as shown by the kernel or just as expected for this CP=
U.
> In others words, does this code check /dev/oprofile/.. to verify what is =
actually
> available.
>
> The other aspect to keep in mind here is that counters are not always sym=
etrical.
> Not all events can be measured on any counter. Few CPUs have that level o=
f flexibility.
> I think Pentium M and AMD Opteron do, but P4 does not. If you remove cert=
ain counters
> it may not be possible to make certain measurements anymore.
>
Thanks, this gives me a better idea how this subsystem works.  I'll
look through the code again to find this logic.

>
> > Yes, my original patch to Andi reworked the NMI watchdog such that
> > set_nmi_callback() would _not_ disable it.  Instead set_nmi_callback()
> > would be used to register an NMI handler to capture interrupts the NMI
> > subsystem didn't know what to do with (it knows NMI watchdog, memory
> > parity errors, and i/o check errors).   However, the problem arises
> > over the new concept of the NMI watchdog never being allowed to be
> > disabled.  Since the NMI watchdog uses one of the performance
> > counters, this conflicts with how oprofile implements its
> > functionality.    Therefore, I was trying to find a way to prevent
> > oprofile from using whatever counter register NMI watchdog is using
> > (as long as NMI watchdog is enabled at boot time), hence my
> > reservation logic.
> >
> You want both NMI interrupt for NMI watchdog and for OProfile.
> NMI watchdog is using one counter. Oprofile will use the others.
> If set_nmi_callback() registers an ADDITIONAL handler, then you
> have to dispatch between the various handlers. Watchdog and
> OProfile will trigger NMI on counter overflow. The dispatcher
> has to detect that this is an NMI from counters. Then, it needs to figure
> out which counter(s), based on this it needs to call the corresponding
> handler.
>
> Is that your plan?
>
Yes, but not as complicated.  Basically, the NMI watchdog counter is
checked on every NMI interrupt (if enabled).   If the watchdog
overflowed, then clear, reset and return success.  Else if
set_nmi_callback is not NULL, execute the handler.  The logic doesn't
neccessarily have to understand the concept of counters.

Cheers,
Don

Re: [RFC] reserving perfctr registers from nmi before using them

From: Don Z. <dz...@gm...> - 2006-04-07 21:27:41

Attachments: oprofile_reserve_nmi2.patch

On 4/5/06, Don Zickus <dz...@gm...> wrote:
> Stephane,
>
> > > Thanks for the feedback (both Stephane and Andi).  I didn't realize m=
y
> > > patch would break the user space API.  :(  Initially I was trying to
> > > figure out a way to make that counter unavailable but couldn't find
> > > the right piece of code, so I went the other way and denying a
> > > request.  If you could point me to the code which shows user space
> > > which counters are available, I would happily re-work the patch.
> > >
> >
> > I don't know about the particular details of OProfile. I think counters
> > are exposed via /dev/oprofile filesystem. I don't know if what you did =
is
> > enough to make unavailable counters disappear from there. Supposing it =
does.
> > The issue is how is this taken into account? Typically you say I want t=
o
> > measure EVT1, EVT2, EVT3. Then, the tool figures out the assignment:
> > EVT1 -> CNTR2, EVT2 -> CNTR1, EVT3 -> CNTR4 for instance. The key
> > is to figure out if the assignement algorithm takes into the list of
> > available counters as shown by the kernel or just as expected for this =
CPU.
> > In others words, does this code check /dev/oprofile/.. to verify what i=
s actually
> > available.
> >

I reworked this patch a little bit.  This time I added a hack to _not_
expose performance counters (avail_to_resrv_perfctr_nmi() ) through
the /dev/oprofile interface if they are reserved.  Its an ugly hack
but that is the result of the way oprofile was designed.   There was
also a bunch of other cleanups too but more related to nmi.c.

I am hoping this doesn't break your user space tools.  As I don't
really run oprofile, I don't know the right way to test this.  Let me
know if this is a better solution.

Cheers,
Don

Re: [RFC] reserving perfctr registers from nmi before using them

From: Stephane E. <er...@hp...> - 2006-04-10 09:40:51

Don,

On Fri, Apr 07, 2006 at 05:27:38PM -0400, Don Zickus wrote:
> On 4/5/06, Don Zickus <dz...@gm...> wrote:
> > Stephane,
> >
> > > > Thanks for the feedback (both Stephane and Andi).  I didn't realize my
> > > > patch would break the user space API.  :(  Initially I was trying to
> > > > figure out a way to make that counter unavailable but couldn't find
> > > > the right piece of code, so I went the other way and denying a
> > > > request.  If you could point me to the code which shows user space
> > > > which counters are available, I would happily re-work the patch.
> > > >
> > >
> > > I don't know about the particular details of OProfile. I think counters
> > > are exposed via /dev/oprofile filesystem. I don't know if what you did is
> > > enough to make unavailable counters disappear from there. Supposing it does.
> > > The issue is how is this taken into account? Typically you say I want to
> > > measure EVT1, EVT2, EVT3. Then, the tool figures out the assignment:
> > > EVT1 -> CNTR2, EVT2 -> CNTR1, EVT3 -> CNTR4 for instance. The key
> > > is to figure out if the assignement algorithm takes into the list of
> > > available counters as shown by the kernel or just as expected for this CPU.
> > > In others words, does this code check /dev/oprofile/.. to verify what is actually
> > > available.
> > >
> 
> I reworked this patch a little bit.  This time I added a hack to _not_
> expose performance counters (avail_to_resrv_perfctr_nmi() ) through
> the /dev/oprofile interface if they are reserved.  Its an ugly hack
> but that is the result of the way oprofile was designed.   There was
> also a bunch of other cleanups too but more related to nmi.c.
> 
I think it looks okay. As Will pointed out, I think there is still
a problem with HyperThreading (HT). The counters are ALL Shared in HT.

Thus the P4_NMI_IQ_CCCR0 of thread0 and thread1 go to the same register,i.e,
overwritten. I wonder how NMI works on HT CPUs? I think it would make sense
to use two counters in HT mode, one for each CPU seen by the kernel. Yet
this may render certain user measurement impossible. 

> I am hoping this doesn't break your user space tools.  As I don't
> really run oprofile, I don't know the right way to test this.  Let me
> know if this is a better solution.

I don't own OProfile tools. I think as a result of this change, user level
code must take a look at /dev/oprofile to figure out what counters are actually
available. They must be prepared to operate with fewer counters than expected
for a given CPU. I don't know the internals of the OProfile tools, not sure
they already have support for this.

In the case of perfmon, the PMU description table can use your avail_to_resrv_perfctr_nmi()
function to mask off certain PMU registers that are reserved for NMI. The results on
user tools will be the same, i.e, they need to be prepared to cope with less counters
than expected. None do this today. But that could be fixed.

Thanks.

-- 
-Stephane

Re: [RFC] reserving perfctr registers from nmi before using them

From: Andi K. <ak...@su...> - 2006-04-10 14:44:39

On Monday 10 April 2006 11:34, Stephane Eranian wrote:

> Thus the P4_NMI_IQ_CCCR0 of thread0 and thread1 go to the same
> register,i.e, overwritten. I wonder how NMI works on HT CPUs? I think it
> would make sense to use two counters in HT mode, one for each CPU seen by
> the kernel. Yet this may render certain user measurement impossible

The watchdog code should always use the first counter even on HT.
This means it uses a shared counter essentially for both threads. Should 
be ok.

-Andi

Re: [RFC] reserving perfctr registers from nmi before using them

From: Don Z. <dz...@gm...> - 2006-04-10 18:22:15

> > I reworked this patch a little bit.  This time I added a hack to _not_
> > expose performance counters (avail_to_resrv_perfctr_nmi() ) through
> > the /dev/oprofile interface if they are reserved.  Its an ugly hack
> > but that is the result of the way oprofile was designed.   There was
> > also a bunch of other cleanups too but more related to nmi.c.
> >
> I think it looks okay. As Will pointed out, I think there is still
> a problem with HyperThreading (HT). The counters are ALL Shared in HT.
>
> Thus the P4_NMI_IQ_CCCR0 of thread0 and thread1 go to the same register,i=
.e,
> overwritten. I wonder how NMI works on HT CPUs? I think it would make sen=
se
> to use two counters in HT mode, one for each CPU seen by the kernel. Yet
> this may render certain user measurement impossible.
>
> > I am hoping this doesn't break your user space tools.  As I don't
> > really run oprofile, I don't know the right way to test this.  Let me
> > know if this is a better solution.
>
> I don't own OProfile tools. I think as a result of this change, user leve=
l
> code must take a look at /dev/oprofile to figure out what counters are ac=
tually
> available. They must be prepared to operate with fewer counters than expe=
cted
> for a given CPU. I don't know the internals of the OProfile tools, not su=
re
> they already have support for this.
>
Who owns the tool?  I just figured this mailing list was the right
place to post patches and ask questions about making internal changes?
 Perhaps I was wrong?

Cheers,
Don

Re: [RFC] reserving perfctr registers from nmi before using them

From: John L. <le...@mo...> - 2006-04-11 23:48:47

On Mon, Apr 10, 2006 at 02:34:37AM -0700, Stephane Eranian wrote:

> I don't own OProfile tools. I think as a result of this change, user level
> code must take a look at /dev/oprofile to figure out what counters are actually
> available. They must be prepared to operate with fewer counters than expected
> for a given CPU. I don't know the internals of the OProfile tools, not sure
> they already have support for this.

They don't, you'd need to add support for this. Otherwise it's yet more
userspace breakage.

john

Re: [RFC] reserving perfctr registers from nmi before using them

From: Don Z. <dz...@gm...> - 2006-04-12 15:08:07

What other changes do I need to do for my changes to be acceptable.  I
know I still have to port to the p4 (which I get sick every time I
think about the work needed) and some stubs for i386.  Other than
that, how do I go about getting the tools to accept this?  Should I
just post the patch and let them build off of it?

Cheers,
Don

On 4/11/06, John Levon <le...@mo...> wrote:
> On Mon, Apr 10, 2006 at 02:34:37AM -0700, Stephane Eranian wrote:
>
> > I don't own OProfile tools. I think as a result of this change, user le=
vel
> > code must take a look at /dev/oprofile to figure out what counters are =
actually
> > available. They must be prepared to operate with fewer counters than ex=
pected
> > for a given CPU. I don't know the internals of the OProfile tools, not =
sure
> > they already have support for this.
>
> They don't, you'd need to add support for this. Otherwise it's yet more
> userspace breakage.
>
> john
>

Re: [RFC] reserving perfctr registers from nmi before using them

From: John L. <le...@mo...> - 2006-04-12 15:09:46

On Wed, Apr 12, 2006 at 11:07:58AM -0400, Don Zickus wrote:

> What other changes do I need to do for my changes to be acceptable.  I
> know I still have to port to the p4 (which I get sick every time I
> think about the work needed) and some stubs for i386.  Other than
> that, how do I go about getting the tools to accept this?  Should I
> just post the patch and let them build off of it?

I'm not sure what you mean here. You have to send a patch to
opr...@li... that does the checking, and changes to the
docs.

regards
john

Re: [RFC] reserving perfctr registers from nmi before using them

From: Andi K. <ak...@su...> - 2006-04-12 15:13:23

On Wednesday 12 April 2006 17:07, Don Zickus wrote:
> What other changes do I need to do for my changes to be acceptable.  I
> know I still have to port to the p4 (which I get sick every time I
> think about the work needed) and some stubs for i386.  Other than
> that, how do I go about getting the tools to accept this?  Should I
> just post the patch and let them build off of it?

I think the tools should just display an error to the user right?

If there is something intelligible in dmesg that explains the error
that should be ok.

-Andi

Re: [RFC] reserving perfctr registers from nmi before using them

From: John L. <le...@mo...> - 2006-04-12 15:20:28

On Wed, Apr 12, 2006 at 05:12:56PM +0200, Andi Kleen wrote:

> > What other changes do I need to do for my changes to be acceptable.  I
> > know I still have to port to the p4 (which I get sick every time I
> > think about the work needed) and some stubs for i386.  Other than
> > that, how do I go about getting the tools to accept this?  Should I
> > just post the patch and let them build off of it?
> 
> I think the tools should just display an error to the user right?
> 
> If there is something intelligible in dmesg that explains the error
> that should be ok.

BTW, I still think this needs to be optional, or at the very least a way
to dynamically turn off the NMI watchdog so if we need the counter we
can use it.

john

Re: [RFC] reserving perfctr registers from nmi before using them

From: Stephane E. <er...@hp...> - 2006-04-14 13:11:46

Don,

I took a second look at your patch. You are assuming
that the naming scheme of Oprofile counters is the same
as yours. I think this is too constrained. It would
be better if avail_to_resrv_perfctr_nmi() would use
the MSR address of the register to check. This is guaranteed
to be the common piece of information. I understand
this impacts the way you maintain the ownership via a
bitmask. But the NMI is never going to use more than
one counter and worst case (P4) you need to track 2
MSR addresses.

What do you think?

On Fri, Apr 07, 2006 at 05:27:38PM -0400, Don Zickus wrote:
> On 4/5/06, Don Zickus <dz...@gm...> wrote:
> > Stephane,
> >
> > > > Thanks for the feedback (both Stephane and Andi).  I didn't realize my
> > > > patch would break the user space API.  :(  Initially I was trying to
> > > > figure out a way to make that counter unavailable but couldn't find
> > > > the right piece of code, so I went the other way and denying a
> > > > request.  If you could point me to the code which shows user space
> > > > which counters are available, I would happily re-work the patch.
> > > >
> > >
> > > I don't know about the particular details of OProfile. I think counters
> > > are exposed via /dev/oprofile filesystem. I don't know if what you did is
> > > enough to make unavailable counters disappear from there. Supposing it does.
> > > The issue is how is this taken into account? Typically you say I want to
> > > measure EVT1, EVT2, EVT3. Then, the tool figures out the assignment:
> > > EVT1 -> CNTR2, EVT2 -> CNTR1, EVT3 -> CNTR4 for instance. The key
> > > is to figure out if the assignement algorithm takes into the list of
> > > available counters as shown by the kernel or just as expected for this CPU.
> > > In others words, does this code check /dev/oprofile/.. to verify what is actually
> > > available.
> > >
> 
> I reworked this patch a little bit.  This time I added a hack to _not_
> expose performance counters (avail_to_resrv_perfctr_nmi() ) through
> the /dev/oprofile interface if they are reserved.  Its an ugly hack
> but that is the result of the way oprofile was designed.   There was
> also a bunch of other cleanups too but more related to nmi.c.
> 
> I am hoping this doesn't break your user space tools.  As I don't
> really run oprofile, I don't know the right way to test this.  Let me
> know if this is a better solution.
> 
> Cheers,
> Don



-- 

-Stephane

Re: [RFC] reserving perfctr registers from nmi before using them

From: Don Z. <dz...@gm...> - 2006-04-14 15:49:03

On 4/14/06, Stephane Eranian <er...@hp...> wrote:
> Don,
>
> I took a second look at your patch. You are assuming
> that the naming scheme of Oprofile counters is the same
> as yours. I think this is too constrained. It would
> be better if avail_to_resrv_perfctr_nmi() would use
> the MSR address of the register to check. This is guaranteed
> to be the common piece of information. I understand
> this impacts the way you maintain the ownership via a
> bitmask. But the NMI is never going to use more than
> one counter and worst case (P4) you need to track 2
> MSR addresses.
>
> What do you think?
>
The reason I hacked in avail_to_resrv_perfctr_nmi the way I did is
because I _didn't_ have the MSR address to play with.  In fact the
MSRs are even reserved yet.  Unless I am misreading the oprofile code,
it keeps two distinct structs:  one to hold the msrs and the other to
hold the file pointers to /dev/oprofile. It would be nice to have a
reference pointer to either struct from the other, that way the lookup
could be safer.

I'll take another look at the code and see I missed something.  But
its hard to do when oprofile blindly sets up all the counters under
/dev/oprofile then later checks to see if they are there and
initializes them.  Without re-writing half of the oprofile setup
logic, I couldn't find a real easy way to do what I needed to do.

Cheers,
Don

Re: [RFC] reserving perfctr registers from nmi before using them

From: Stephane E. <er...@hp...> - 2006-04-18 08:42:57

Don,

On Fri, Apr 14, 2006 at 11:48:59AM -0400, Don Zickus wrote:
> On 4/14/06, Stephane Eranian <er...@hp...> wrote:
> > Don,
> >
> > I took a second look at your patch. You are assuming
> > that the naming scheme of Oprofile counters is the same
> > as yours. I think this is too constrained. It would
> > be better if avail_to_resrv_perfctr_nmi() would use
> > the MSR address of the register to check. This is guaranteed
> > to be the common piece of information. I understand
> > this impacts the way you maintain the ownership via a
> > bitmask. But the NMI is never going to use more than
> > one counter and worst case (P4) you need to track 2
> > MSR addresses.
> >
> > What do you think?
> >
> The reason I hacked in avail_to_resrv_perfctr_nmi the way I did is
> because I _didn't_ have the MSR address to play with.  In fact the
> MSRs are even reserved yet.  Unless I am misreading the oprofile code,
> it keeps two distinct structs:  one to hold the msrs and the other to
> hold the file pointers to /dev/oprofile. It would be nice to have a
> reference pointer to either struct from the other, that way the lookup
> could be safer.
> 
> I'll take another look at the code and see I missed something.  But
> its hard to do when oprofile blindly sets up all the counters under
> /dev/oprofile then later checks to see if they are there and
> initializes them.  Without re-writing half of the oprofile setup
> logic, I couldn't find a real easy way to do what I needed to do.
> 

What about those *fill_in_*() routines? They fill in the MSR addresses
corresponding to counters. Couldn't you intervene at this level?

-- 
-Stephane

Re: [RFC] reserving perfctr registers from nmi before using them

From: Don Z. <dz...@gm...> - 2006-04-18 14:25:32

On 4/18/06, Stephane Eranian <er...@hp...> wrote:
> Don,
>
> > The reason I hacked in avail_to_resrv_perfctr_nmi the way I did is
> > because I _didn't_ have the MSR address to play with.  In fact the
> > MSRs are even reserved yet.  Unless I am misreading the oprofile code,
> > it keeps two distinct structs:  one to hold the msrs and the other to
> > hold the file pointers to /dev/oprofile. It would be nice to have a
> > reference pointer to either struct from the other, that way the lookup
> > could be safer.
> >
> > I'll take another look at the code and see I missed something.  But
> > its hard to do when oprofile blindly sets up all the counters under
> > /dev/oprofile then later checks to see if they are there and
> > initializes them.  Without re-writing half of the oprofile setup
> > logic, I couldn't find a real easy way to do what I needed to do.
> >
>
> What about those *fill_in_*() routines? They fill in the MSR addresses
> corresponding to counters. Couldn't you intervene at this level?
>

Unfortunately, I can not.  The fill_in-*() routines are called once a
user selects an event to profile.  This is waaaaay later than when the
/dev/oprofile/* files are created.  Hence my ugly hack.  The problem
(as I see it) was the code originally assumed it had full access to
that part of the hardware and that is was always available.  By adding
in the changes I want to make, it completely breaks that assumption.=20
Now things start to get ugly.

Like I said without re-writing much of the setup logic for oprofile, I
have failed to find a better approach.

Cheers,
Don

Re: [RFC] reserving perfctr registers from nmi before using them

From: Stephane E. <er...@hp...> - 2006-04-19 17:57:32

Don,

On Tue, Apr 18, 2006 at 10:25:26AM -0400, Don Zickus wrote:
> On 4/18/06, Stephane Eranian <er...@hp...> wrote:
> > Don,
> >
> > > The reason I hacked in avail_to_resrv_perfctr_nmi the way I did is
> > > because I _didn't_ have the MSR address to play with.  In fact the
> > > MSRs are even reserved yet.  Unless I am misreading the oprofile code,
> > > it keeps two distinct structs:  one to hold the msrs and the other to
> > > hold the file pointers to /dev/oprofile. It would be nice to have a
> > > reference pointer to either struct from the other, that way the lookup
> > > could be safer.
> > >
> > > I'll take another look at the code and see I missed something.  But
> > > its hard to do when oprofile blindly sets up all the counters under
> > > /dev/oprofile then later checks to see if they are there and
> > > initializes them.  Without re-writing half of the oprofile setup
> > > logic, I couldn't find a real easy way to do what I needed to do.
> > >
> >
> > What about those *fill_in_*() routines? They fill in the MSR addresses
> > corresponding to counters. Couldn't you intervene at this level?
> >
> 
> Unfortunately, I can not.  The fill_in-*() routines are called once a
> user selects an event to profile.  This is waaaaay later than when the
> /dev/oprofile/* files are created.  Hence my ugly hack.  The problem
> (as I see it) was the code originally assumed it had full access to
> that part of the hardware and that is was always available.  By adding
> in the changes I want to make, it completely breaks that assumption. 
> Now things start to get ugly.
> 
> Like I said without re-writing much of the setup logic for oprofile, I
> have failed to find a better approach.
> 
What about providing a second avail_() function that takes MSR addresses as input?
I think it would make your interface more useable by non OProfile code.
That would certainly make my job easier trying to interface perfmon2 to your NMI routines.
> Cheers,
> Don
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by xPML, a groundbreaking scripting language
> that extends applications into web and mobile media. Attend the live webcast
> and join the prime developer group breaking into this new coding territory!
> http://sel.as-us.falkag.net/sel?cmd_______________________________________________
> oprofile-list mailing list
> opr...@li...
> https://lists.sourceforge.net/lists/listinfo/oprofile-list

-- 

-Stephane

Re: [RFC] reserving perfctr registers from nmi before using them

From: Don Z. <dz...@gm...> - 2006-04-20 22:28:45

On 4/19/06, Stephane Eranian <er...@hp...> wrote:
> Don,
>
> >
> What about providing a second avail_() function that takes MSR addresses =
as input?
> I think it would make your interface more useable by non OProfile code.
> That would certainly make my job easier trying to interface perfmon2 to y=
our NMI routines.

I like it.  In fact it makes better sense for all the reserve/release
functions too.  I'll add it.
So now I have the following functions:
avail_to_resrv_perfctr_nmi( unsigned int MSR)
avail_to_resrv_perfctr_nmi_bit( unsigned int counter)  //hack version
for oprofile
resrv_perfctr_nmi( unsigned int MSR)
release_perfctr_nmi( unsigned int MSR)
resrv_evntsel_nmi( unsigned int MSR)
release_evntsel_nmi( unsigned int MSR)

Cheers,
Don

Re: [perfmon] Re: [RFC] reserving perfctr registers from nmi before using them

From: Stephane E. <er...@hp...> - 2006-04-21 16:09:27

Don,

On Thu, Apr 20, 2006 at 06:28:42PM -0400, Don Zickus wrote:
> On 4/19/06, Stephane Eranian <er...@hp...> wrote:
> > Don,
> >
> > >
> > What about providing a second avail_() function that takes MSR addresses as input?
> > I think it would make your interface more useable by non OProfile code.
> > That would certainly make my job easier trying to interface perfmon2 to your NMI routines.
> 
> I like it.  In fact it makes better sense for all the reserve/release
> functions too.  I'll add it.
> So now I have the following functions:
> avail_to_resrv_perfctr_nmi( unsigned int MSR)
> avail_to_resrv_perfctr_nmi_bit( unsigned int counter)  //hack version
> for oprofile
> resrv_perfctr_nmi( unsigned int MSR)
> release_perfctr_nmi( unsigned int MSR)
> resrv_evntsel_nmi( unsigned int MSR)
> release_evntsel_nmi( unsigned int MSR)

Very good. Now I can use this for perfmon2.
Thanks.

-- 
-Stephane

Re: [RFC] reserving perfctr registers from nmi before using them

From: Andi K. <ak...@su...> - 2006-04-05 18:48:39

On Wednesday 05 April 2006 10:31, Stephane Eranian wrote:

> The other aspect to keep in mind here is that counters are not always
> symetrical. Not all events can be measured on any counter. Few CPUs have
> that level of flexibility. I think Pentium M and AMD Opteron do, but P4
> does not. If you remove certain counters it may not be possible to make
> certain measurements anymore.

P4 has at least two counters per group, possible more. NMI watchdog
will never use more than one. So any counter should be still accessible.

> > > The reservation works if you assume that NMI watchdog and Oprofile
> > > NMI-based interruption can work together. I do not see where you can
> > > chain the interrupt handler on NMI interrupt. The set_nmi_callback()
> > > replaces the existing handler, it does not add a new one. Thus, you
> > > break one or the other. If you cannot chain the NMI interrupt handler
> > > then what is the point of the reservation?
> > >
> > > Did I miss anything?
> >
> > Yes, my original patch to Andi reworked the NMI watchdog such that
> > set_nmi_callback() would _not_ disable it.  Instead set_nmi_callback()
> > would be used to register an NMI handler to capture interrupts the NMI
> > subsystem didn't know what to do with (it knows NMI watchdog, memory
> > parity errors, and i/o check errors).   However, the problem arises
> > over the new concept of the NMI watchdog never being allowed to be
> > disabled.  Since the NMI watchdog uses one of the performance
> > counters, this conflicts with how oprofile implements its
> > functionality.    Therefore, I was trying to find a way to prevent
> > oprofile from using whatever counter register NMI watchdog is using
> > (as long as NMI watchdog is enabled at boot time), hence my
> > reservation logic.
>
> You want both NMI interrupt for NMI watchdog and for OProfile.
> NMI watchdog is using one counter. Oprofile will use the others.
> If set_nmi_callback() registers an ADDITIONAL handler,

Only oprofile uses set_nmi_callback, the other users
use DIE_NMI

> then you 
> have to dispatch between the various handlers. Watchdog and
> OProfile will trigger NMI on counter overflow. The dispatcher
> has to detect that this is an NMI from counters. Then, it needs to figure
> out which counter(s), based on this it needs to call the corresponding
> handler.

The point is that people want to get panics to reach the "panic on nmi"
check at the end. That only works if not all users before that 
eat them up. Currently oprofile and the nmi watchdog are that badly
behaved.
-Andi

Re: [RFC] reserving perfctr registers from nmi before using them

From: William C. <wc...@re...> - 2006-04-05 18:59:15

Andi Kleen wrote:
> On Wednesday 05 April 2006 10:31, Stephane Eranian wrote:
> 
> 
>>The other aspect to keep in mind here is that counters are not always
>>symetrical. Not all events can be measured on any counter. Few CPUs have
>>that level of flexibility. I think Pentium M and AMD Opteron do, but P4
>>does not. If you remove certain counters it may not be possible to make
>>certain measurements anymore.
> 
> 
> P4 has at least two counters per group, possible more. NMI watchdog
> will never use more than one. So any counter should be still accessible.

Is this still true when hyperthreading is being used? A number of the 
tools, e.g. split the perf counters between the threads, leaving only 
one register in a group on a processor.

-Will

Re: [RFC] reserving perfctr registers from nmi before using them

From: Stephane E. <er...@hp...> - 2006-04-05 19:08:24

Will,

On Wed, Apr 05, 2006 at 02:57:01PM -0400, William Cohen wrote:
> Andi Kleen wrote:
> >On Wednesday 05 April 2006 10:31, Stephane Eranian wrote:
> >
> >
> >>The other aspect to keep in mind here is that counters are not always
> >>symetrical. Not all events can be measured on any counter. Few CPUs have
> >>that level of flexibility. I think Pentium M and AMD Opteron do, but P4
> >>does not. If you remove certain counters it may not be possible to make
> >>certain measurements anymore.
> >
> >
> >P4 has at least two counters per group, possible more. NMI watchdog
> >will never use more than one. So any counter should be still accessible.
> 
> Is this still true when hyperthreading is being used? A number of the 
> tools, e.g. split the perf counters between the threads, leaving only 
> one register in a group on a processor.
> 
This is a good point.

Yes, perfmon does split the register file in two. The user sees on 8 counters
and those 8 are mapped into one half of the other depending of which HW thread
the program executes on. That is the only way to let two programs using the counters
run on the same CPU core.

-- 
-Stephane

1 2 > >> (Page 1 of 2)