From: Jiri K. <jk...@su...> - 2008-11-13 20:57:39
|
Hi, there has been a bug [1] recently which caused NMIs to be stuck on certain Intel CPUs, resultung in oprofile not working. This has been fixed by commit 7c64ade53a6f977d73f16243865c42ceae999aea Author: Andi Kleen <an...@fi...> Date: Fri Nov 7 14:02:49 2008 +0100 oprofile: Fix p6 counter overflow check but we are seeing the very same symptoms still on the following CPU, even with 2.6.28-rc4 (which includes 7c64ade5): processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 47 model name : AMD Athlon(tm) 64 Processor 3000+ stepping : 0 After starting oprofile, no NMI is delivered at all: vulture:~ # opcontrol --start Using 2.6+ OProfile kernel interface. Reading module info. Using log file /var/lib/oprofile/samples/oprofiled.log Daemon started. Profiler running. vulture:~ # grep NMI /proc/interrupts NMI: 0 Non-maskable interrupts vulture:~ # grep NMI /proc/interrupts NMI: 0 Non-maskable interrupts on a quite similar CPU, everything seems to be working smoothly though with the very same kernel -- NMIs are being delivered, and oprofile performs proper profiling: processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 4 model name : AMD Athlon(tm) 64 Processor 3000+ stepping : 8 I haven't yet found a time to start bisecting this. [1] http://lkml.org/lkml/2008/10/30/319 -- Jiri Kosina SUSE Labs |
From: Ingo M. <mi...@el...> - 2008-11-13 21:25:13
|
* Jiri Kosina <jk...@su...> wrote: > I haven't yet found a time to start bisecting this. Would be nice to identify a commit to revert - in case we run out of time fixing it. Ingo |
From: Jiri K. <jk...@su...> - 2008-11-13 21:27:58
|
On Thu, 13 Nov 2008, Ingo Molnar wrote: > > I haven't yet found a time to start bisecting this. > Would be nice to identify a commit to revert - in case we run out of > time fixing it. Yup, I first wanted to make this known to the public in hope that it will ring a bell somewhere. If noone sees an obvous reason for this, I will do my best to bisect this tomorrow. -- Jiri Kosina SUSE Labs |
From: Ingo M. <mi...@el...> - 2008-11-13 21:38:16
|
* Jiri Kosina <jk...@su...> wrote: > On Thu, 13 Nov 2008, Ingo Molnar wrote: > > > > I haven't yet found a time to start bisecting this. > > Would be nice to identify a commit to revert - in case we run out of > > time fixing it. > > Yup, I first wanted to make this known to the public in hope that it > will ring a bell somewhere. > > If noone sees an obvous reason for this, I will do my best to bisect > this tomorrow. We've got the one patch below pending, but that's not for AMD cpus so it shouldnt impact your case. But ... some change made it all much more fragile. I'm curious why things became more fragile. Ingo ---------------> Subject: oprofile: un-mask APIC before resetting counter in ppro_check_ctrs() From: Eric Dumazet <da...@co...> Date: Tue, 11 Nov 2008 09:32:12 +0100 While using oprofile on my HP BL460c G1, (two quad core intel E5450 CPU), I noticed that one CPU after the other could not get anymore NMI. After a while, all cores where blocked (ie not generating events for oprofile) I tried all major linux versions and all where affected by this freeze. I found that we have to un-mask APIC *before* writing to MSR counter when we get event notification, because we use APIC_LVTPC in edge triggered mode. Signed-off-by: Eric Dumazet <da...@co...> Signed-off-by: Ingo Molnar <mi...@el...> --- arch/x86/oprofile/op_model_ppro.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) Index: tip/arch/x86/oprofile/op_model_ppro.c =================================================================== --- tip.orig/arch/x86/oprofile/op_model_ppro.c +++ tip/arch/x86/oprofile/op_model_ppro.c @@ -126,6 +126,12 @@ static int ppro_check_ctrs(struct pt_reg u64 val; int i; + /* + * We need to unmask the apic vector *before* writing reset_value + * to msr counter, because we use edge trigger + */ + apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED); + for (i = 0 ; i < num_counters; ++i) { if (!reset_value[i]) continue; @@ -136,10 +142,6 @@ static int ppro_check_ctrs(struct pt_reg } } - /* Only P6 based Pentium M need to re-unmask the apic vector but it - * doesn't hurt other P6 variant */ - apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED); - /* We can't work out if we really handled an interrupt. We * might have caught a *second* counter just after overflowing * the interrupt for this counter then arrives |
From: Eric D. <da...@co...> - 2008-11-13 22:05:14
Attachments:
oprofile_ppro.patch
|
Ingo Molnar a écrit : > * Jiri Kosina <jk...@su...> wrote: > >> On Thu, 13 Nov 2008, Ingo Molnar wrote: >> >>>> I haven't yet found a time to start bisecting this. >>> Would be nice to identify a commit to revert - in case we run out of >>> time fixing it. >> Yup, I first wanted to make this known to the public in hope that it >> will ring a bell somewhere. >> >> If noone sees an obvous reason for this, I will do my best to bisect >> this tomorrow. > > We've got the one patch below pending, but that's not for AMD cpus so > it shouldnt impact your case. > > But ... some change made it all much more fragile. I'm curious why > things became more fragile. > > Ingo > > ---------------> > Subject: oprofile: un-mask APIC before resetting counter in ppro_check_ctrs() > From: Eric Dumazet <da...@co...> > Date: Tue, 11 Nov 2008 09:32:12 +0100 > > While using oprofile on my HP BL460c G1, (two quad core intel E5450 CPU), > I noticed that one CPU after the other could not get anymore NMI. > > After a while, all cores where blocked (ie not generating events for oprofile) > I tried all major linux versions and all where affected by this freeze. > > I found that we have to un-mask APIC *before* writing to MSR counter > when we get event notification, because we use APIC_LVTPC in edge triggered mode. > > Signed-off-by: Eric Dumazet <da...@co...> > Signed-off-by: Ingo Molnar <mi...@el...> > --- > arch/x86/oprofile/op_model_ppro.c | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-) > > Index: tip/arch/x86/oprofile/op_model_ppro.c > =================================================================== > --- tip.orig/arch/x86/oprofile/op_model_ppro.c > +++ tip/arch/x86/oprofile/op_model_ppro.c > @@ -126,6 +126,12 @@ static int ppro_check_ctrs(struct pt_reg > u64 val; > int i; > > + /* > + * We need to unmask the apic vector *before* writing reset_value > + * to msr counter, because we use edge trigger > + */ > + apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED); > + > for (i = 0 ; i < num_counters; ++i) { > if (!reset_value[i]) > continue; > @@ -136,10 +142,6 @@ static int ppro_check_ctrs(struct pt_reg > } > } > > - /* Only P6 based Pentium M need to re-unmask the apic vector but it > - * doesn't hurt other P6 variant */ > - apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED); > - > /* We can't work out if we really handled an interrupt. We > * might have caught a *second* counter just after overflowing > * the interrupt for this counter then arrives Just to clarify, I found this patch necessary for previous linux versions as well. Maybe new CPUS from intel triggers a software bug, I dont know. Also, I posted a patch about the kmalloc() of reset_value, I am not sure patch was pushed. This one is a real bug. [PATCH] oprofile: fix an overflow in ppro code reset_value was changed from long to u64 in commit b99170288421c79f0c2efa8b33e26e65f4bb7fb8 (oprofile: Implement Intel architectural perfmon support) But dynamic allocation of this array use a wrong type (long instead of u64) Signed-off-by: Eric Dumazet <da...@co...> --- arch/x86/oprofile/op_model_ppro.c | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) |
From: Eric D. <da...@co...> - 2008-11-14 15:26:30
|
Very strange results on my Intel E5450 CPU I found that I have to bias regs->ip by 16 bytes in order to have correct profiling. Does it rings a bell to anyone ? opcontrol --version opcontrol: oprofile 0.9.4 compiled on Nov 3 2008 06:45:57 Thank you diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c index 3f1b81a..051fb16 100644 --- a/arch/x86/oprofile/op_model_ppro.c +++ b/arch/x86/oprofile/op_model_ppro.c @@ -131,7 +131,9 @@ static int ppro_check_ctrs(struct pt_regs * const regs, continue; rdmsrl(msrs->counters[i].addr, val); if (CTR_OVERFLOWED(val)) { + regs->ip -= 0x10; oprofile_add_sample(regs, i); + regs->ip += 0x10; wrmsrl(msrs->counters[i].addr, -reset_value[i]); } } |
From: Andi K. <an...@fi...> - 2008-11-14 15:49:46
|
On Fri, Nov 14, 2008 at 04:25:51PM +0100, Eric Dumazet wrote: > Very strange results on my Intel E5450 CPU > > I found that I have to bias regs->ip by 16 bytes in order to have > correct profiling. Can you expand a bit how you determined if the profile is right or wrong? -Andi |
From: Eric D. <da...@co...> - 2008-11-14 16:03:26
|
Andi Kleen a écrit : > On Fri, Nov 14, 2008 at 04:25:51PM +0100, Eric Dumazet wrote: >> Very strange results on my Intel E5450 CPU >> >> I found that I have to bias regs->ip by 16 bytes in order to have >> correct profiling. > > Can you expand a bit how you determined if the profile is right or wrong? > > -Andi > > opannotate -a vmlinux > FILE And check results. With the patch I get : c041d270 <dst_release>: /* dst_release total: 16523 1.2978 */ 252 0.0198 :c041d270: push %ebp 654 0.0514 :c041d271: test %eax,%eax :c041d273: mov %esp,%ebp 47 0.0037 :c041d275: mov %eax,%edx 42 0.0033 :c041d277: je c041d289 <dst_release+0x19> :c041d279: mov $0xffffffff,%eax 3 2.4e-04 :c041d27e: lock xadd %eax,0x80(%edx) 14766 1.1598 :c041d286: dec %eax 1 7.9e-05 :c041d287: js c041d28b <dst_release+0x1b> 53 0.0042 :c041d289: pop %ebp 705 0.0554 :c041d28a: ret :c041d28b: mov $0x10e,%edx :c041d290: mov $0xc0561ed5,%eax :c041d295: call c022d1a0 <warn_on_slowpath> :c041d29a: pop %ebp :c041d29b: ret :c041d29c: lea 0x0(%esi),%esi This results please me : The cost is right after the "lock xadd" as expected. Without patch, all addresses were off by 16 bytes. On dst_release() it was on the call on warn_on_slowpath(), and I am pretty sure it is not possible at all this WARN_ON is hit |
From: Mikael P. <mi...@it...> - 2008-11-14 17:15:50
|
Eric Dumazet writes: > Very strange results on my Intel E5450 CPU > > I found that I have to bias regs->ip by 16 bytes in order to have > correct profiling. > > Does it rings a bell to anyone ? The term is "skid". All OO processors are susceptible to it, as was known already to the Alpha "continous profiling" performance counter folks. And no, blindly subtracting 16 from IP is not a fix. > > opcontrol --version > opcontrol: oprofile 0.9.4 compiled on Nov 3 2008 06:45:57 > > Thank you > > diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c > index 3f1b81a..051fb16 100644 > --- a/arch/x86/oprofile/op_model_ppro.c > +++ b/arch/x86/oprofile/op_model_ppro.c > @@ -131,7 +131,9 @@ static int ppro_check_ctrs(struct pt_regs * const regs, > continue; > rdmsrl(msrs->counters[i].addr, val); > if (CTR_OVERFLOWED(val)) { > + regs->ip -= 0x10; > oprofile_add_sample(regs, i); > + regs->ip += 0x10; > wrmsrl(msrs->counters[i].addr, -reset_value[i]); > } > } > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to maj...@vg... > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > |
From: Maynard J. <may...@us...> - 2008-11-13 21:38:08
|
Ingo Molnar wrote: > * Jiri Kosina <jk...@su...> wrote: > >> I haven't yet found a time to start bisecting this. > > Would be nice to identify a commit to revert - in case we run out of > time fixing it. Robert, Jason, Can one of you take a look at this? It appears to be an AMD issue right now. The first message in the thread can be found at: http://lkml.org/lkml/2008/11/13/336 Thanks. -Maynard > > Ingo > > ------------------------------------------------------------------------- > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge > Build the coolest Linux based applications with Moblin SDK & win great prizes > Grand prize is a trip for two to an Open Source event anywhere in the world > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > oprofile-list mailing list > opr...@li... > https://lists.sourceforge.net/lists/listinfo/oprofile-list |
From: Robert R. <rob...@am...> - 2008-11-14 10:10:41
|
On 13.11.08 22:27:52, Jiri Kosina wrote: > On Thu, 13 Nov 2008, Ingo Molnar wrote: > > > > I haven't yet found a time to start bisecting this. > > Would be nice to identify a commit to revert - in case we run out of > > time fixing it. > > Yup, I first wanted to make this known to the public in hope that it will > ring a bell somewhere. > > If noone sees an obvous reason for this, I will do my best to bisect this > tomorrow. Could you at least test with v2.6.27? I could not reproduce it on the hardware I am using. Thanks, -Robert -- Advanced Micro Devices, Inc. Operating System Research Center email: rob...@am... |
From: Jiri K. <jk...@su...> - 2008-11-14 11:12:08
|
On Fri, 14 Nov 2008, Robert Richter wrote: > Could you at least test with v2.6.27? Oprofile in 2.6.27 doesn't work on this system, as far as I know. > I could not reproduce it on the hardware I am using. Yes, it seems to be very system-specific. As I have said in my original mail, the machine we are seeing this problem on is vendor_id : AuthenticAMD cpu family : 15 model : 47 model name : AMD Athlon(tm) 64 Processor 3000+ stepping : 0 but on the other hand on the following CPU, the oprofile works with 2.6.27 and 2.6.28-rc4 without any problems, NMIs are coming flawlessly vendor_id : AuthenticAMD cpu family : 15 model : 4 model name : AMD Athlon(tm) 64 Processor 3000+ stepping : 8 -- Jiri Kosina SUSE Labs |
From: Eric D. <da...@co...> - 2008-11-14 17:21:57
|
Mikael Pettersson a écrit : > Eric Dumazet writes: > > Very strange results on my Intel E5450 CPU > > > > I found that I have to bias regs->ip by 16 bytes in order to have > > correct profiling. > > > > Does it rings a bell to anyone ? > > The term is "skid". All OO processors are susceptible to it, > as was known already to the Alpha "continous profiling" > performance counter folks. Cool. Maybe this can explain the other bug I have with oprofile. > > And no, blindly subtracting 16 from IP is not a fix. Who mentioned a fix ? I am only giving more fuel to Intel guys so they hopefully can give us a working oprofile. > > > > > opcontrol --version > > opcontrol: oprofile 0.9.4 compiled on Nov 3 2008 06:45:57 > > > > Thank you > > > > diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c > > index 3f1b81a..051fb16 100644 > > --- a/arch/x86/oprofile/op_model_ppro.c > > +++ b/arch/x86/oprofile/op_model_ppro.c > > @@ -131,7 +131,9 @@ static int ppro_check_ctrs(struct pt_regs * const regs, > > continue; > > rdmsrl(msrs->counters[i].addr, val); > > if (CTR_OVERFLOWED(val)) { > > + regs->ip -= 0x10; > > oprofile_add_sample(regs, i); > > + regs->ip += 0x10; > > wrmsrl(msrs->counters[i].addr, -reset_value[i]); > > } > > } > > > > -- |
From: Andi K. <an...@fi...> - 2008-11-14 17:41:38
|
> >And no, blindly subtracting 16 from IP is not a fix. > > Who mentioned a fix ? I am only giving more fuel to Intel guys so they > hopefully can give us a working oprofile. You would need to implement PEBS support to avoid that problem. But it's a big task. perfmon2 implements it already. -Andi |
From: Eric D. <da...@co...> - 2008-11-15 16:31:34
|
Andi Kleen a écrit : >>> And no, blindly subtracting 16 from IP is not a fix. >> Who mentioned a fix ? I am only giving more fuel to Intel guys so they >> hopefully can give us a working oprofile. > > You would need to implement PEBS support to avoid that problem. But it's a big > task. perfmon2 implements it already. > Thanks for the information. Hum, so I grabbed perfmon2 git tree, installed various tools... I am quite new to pfmon and tried : # pfmon --system-wide sizeof=64 44 <press ENTER to stop session> Then started "tbench 8", and got a kernel panic after 6 seconds. I was using oprofile like this opcontrol --vmlinux=/path/vmlinux --start // doing some benchmarking... opreport -l vmlinux | head -n 40 What would be a working equivalent for perfmon2 based tools ? Thanks |
From: Andi K. <an...@fi...> - 2008-11-15 18:27:04
|
On Sat, Nov 15, 2008 at 05:30:58PM +0100, Eric Dumazet wrote: > Andi Kleen a écrit : > >>>And no, blindly subtracting 16 from IP is not a fix. > >>Who mentioned a fix ? I am only giving more fuel to Intel guys so they > >>hopefully can give us a working oprofile. > > > >You would need to implement PEBS support to avoid that problem. But it's a > >big > >task. perfmon2 implements it already. > > > > Thanks for the information. > > Hum, so I grabbed perfmon2 git tree, installed various tools... > > I am quite new to pfmon and tried : > > # pfmon --system-wide > sizeof=64 44 > <press ENTER to stop session> > > Then started "tbench 8", and got a kernel panic after 6 seconds. > > > I was using oprofile like this > > opcontrol --vmlinux=/path/vmlinux --start > // doing some benchmarking... > opreport -l vmlinux | head -n 40 > > > What would be a working equivalent for perfmon2 based tools ? Probably getting a perfmon tree that works. I guess Stephane can help (cc'ed). Or just deal with imprecise events for now. -Andi -- ak...@li... |
From: stephane e. <er...@go...> - 2008-11-17 15:02:25
|
Hello, I have not seen the beginning of that discussion so my comments may be slightly off. It seems Eric has problems with accuracy of instruction addresses when sampling with the PMU. This is an inherent limitation of the PMU. It can be mitigated but not completely eliminated. The core issue is that it takes several cycles between the moment a counter overflows and posting of the PMU interrupt. During that time, the CPU keeps on executing instructions. The interrupt IP you get, reflects the place you were when it triggered. That can be far away from where it was posted and where the counter actually overflowed. Of course, if you are stalled that distance is usually 0 or off by a small number of instructions. But it can be very large when overflow happens during a kernel critical section where interrupts are off. There is nothing SW can do about all of this. Andi mentioned PEBS. I don't know if you are familiar with what it does. Let me summarize. This is a hardware/microcode feature which implements a hardware-managed buffer where samples are stored. The OS points the CPU to a memory region where PMU samples are stored. No PMU interrupt is generated until the buffer becomes full. That part addresses some of the overhead associated with interrupt-based sampling. Unfortunately, PEBS does not point to the instruction where the counter overflowed, it will still be a few instructions off. But this time, you get the machine state at the last retired instruction. Furthermore, PEBS can record samples while in kernel critical sections. A limitation of PEBS is that it does not work with all the PMU events. Only a handful are available. As for perfmon, if you pull from the perfmon2 GIT tree, this should work. Don't know what happen in you case. Perfmon and the pfmon can do simple counting or also collect profiles. $ pfmon date Counts cycles at the user level only for the process date $ pfmon --system-wide -t10 Counts elapsed cycles at user level for all CPU for 10s. Results are per-cpu $ pfmon --long-smpl-periods=240000 date Collect a flat profile of process date. Period is 240,000 elapsed user cycle $ pfmon --system-wide --long-smpl-periods=240000 -t 10 Collect a flat profile on each online CPU during 10s. Period is 240,000 user elapsed cycles. Results are per-cpu You have a lot more examples on the perfmon web site, Following the documentation and pfmon users' guide. Perfmon/pfmon can use PEBS on Intel Core processors. First step is to insert the kernel module for it: # modprobe perfmon_pebs_core_smpl Then use pfmon, we use instruction_retired because elapsed cycles does not support PEBS: $ pfmon --smpl-module=pebs -einstructions_retired --long-smpl-periods=120000 date Hope this helps. On Sat, Nov 15, 2008 at 7:36 PM, Andi Kleen <an...@fi...> wrote: > On Sat, Nov 15, 2008 at 05:30:58PM +0100, Eric Dumazet wrote: >> Andi Kleen a écrit : >> >>>And no, blindly subtracting 16 from IP is not a fix. >> >>Who mentioned a fix ? I am only giving more fuel to Intel guys so they >> >>hopefully can give us a working oprofile. >> > >> >You would need to implement PEBS support to avoid that problem. But it's a >> >big >> >task. perfmon2 implements it already. >> > >> >> Thanks for the information. >> >> Hum, so I grabbed perfmon2 git tree, installed various tools... >> >> I am quite new to pfmon and tried : >> >> # pfmon --system-wide >> sizeof=64 44 >> <press ENTER to stop session> >> >> Then started "tbench 8", and got a kernel panic after 6 seconds. >> >> >> I was using oprofile like this >> >> opcontrol --vmlinux=/path/vmlinux --start >> // doing some benchmarking... >> opreport -l vmlinux | head -n 40 >> >> >> What would be a working equivalent for perfmon2 based tools ? > > Probably getting a perfmon tree that works. I guess Stephane > can help (cc'ed). Or just deal with imprecise events for now. > > -Andi > -- > ak...@li... > |
From: Andi K. <an...@fi...> - 2008-11-17 15:15:02
|
On Mon, Nov 17, 2008 at 04:02:19PM +0100, stephane eranian wrote: Thanks for explaining it better than I could. Just a minor correction. > But it can be very large when overflow happens > during a kernel critical section > where interrupts are off. Actually oprofile uses the NMI to avoid that particular problem. It still has limits, e.g. no profiling of SMM sections. > There is nothing SW can do about all of this. Yes profiling on a out of order machine is hard. > Furthermore, PEBS can record samples while > in kernel critical sections. Again with oprofile that works too without PEBS. -Andi |
From: Eric D. <da...@co...> - 2008-11-17 15:16:59
|
stephane eranian a écrit : > Hello, > > I have not seen the beginning of that discussion so my comments may be > slightly off. > It seems Eric has problems with accuracy of instruction addresses when > sampling with the PMU. > > This is an inherent limitation of the PMU. It can be mitigated but not > completely eliminated. The core > issue is that it takes several cycles between the moment a counter > overflows and posting of the PMU > interrupt. During that time, the CPU keeps on executing instructions. > The interrupt IP you get, reflects > the place you were when it triggered. That can be far away from where > it was posted and where the > counter actually overflowed. Of course, if you are stalled that > distance is usually 0 or off by a small > number of instructions. But it can be very large when overflow happens > during a kernel critical section > where interrupts are off. There is nothing SW can do about all of this. > > Andi mentioned PEBS. I don't know if you are familiar with what it > does. Let me summarize. This is > a hardware/microcode feature which implements a hardware-managed > buffer where samples are > stored. The OS points the CPU to a memory region where PMU samples are > stored. No PMU > interrupt is generated until the buffer becomes full. That part > addresses some of the overhead > associated with interrupt-based sampling. Unfortunately, PEBS does > not point to the instruction where > the counter overflowed, it will still be a few instructions off. But > this time, you get the machine state at the > last retired instruction. Furthermore, PEBS can record samples while > in kernel critical sections. A limitation > of PEBS is that it does not work with all the PMU events. Only a > handful are available. > > As for perfmon, if you pull from the perfmon2 GIT tree, this should > work. Don't know what happen in > you case. > > Perfmon and the pfmon can do simple counting or also collect profiles. > > $ pfmon date > > Counts cycles at the user level only for the process date > > $ pfmon --system-wide -t10 > > Counts elapsed cycles at user level for all CPU for 10s. Results are per-cpu > > $ pfmon --long-smpl-periods=240000 date > > Collect a flat profile of process date. Period is 240,000 elapsed user cycle > > $ pfmon --system-wide --long-smpl-periods=240000 -t 10 > > Collect a flat profile on each online CPU during 10s. Period is > 240,000 user elapsed cycles. Results are per-cpu > > You have a lot more examples on the perfmon web site, Following the > documentation and pfmon users' guide. > > Perfmon/pfmon can use PEBS on Intel Core processors. First step is to > insert the kernel module for it: > # modprobe perfmon_pebs_core_smpl > > Then use pfmon, we use instruction_retired because elapsed cycles does > not support PEBS: > > $ pfmon --smpl-module=pebs -einstructions_retired > --long-smpl-periods=120000 date > > Hope this helps. > Sure it does ! Thanks a lot Stephane, I am going to try all this stuff. |