OK, none of the following helped increase the number of samples beyond 2675 per sec:
[CPU_CLK_UNHALTED, count is 100000, Intel P6 family, model is 15]
- running a spinloop on one cpu (affined)
- running a spinloop on 7 cpus (affined) out of 8
- removing the affinity and running the above
- using --separate=lib,kernel
- using opreport --symbols

I then added some instrumentation to the oprofile driver code. It shows that the number of
NMIs delivered per sec is itself much lower, only slightly above the sample count
above. On another (newer) P6 machine with a different model number, the same
instrumentation shows the expected number of NMIs (i.e. 24000 NMIs/sec; this is a
2.4 GHz machine).

So, I am trying to figure out why the expected number of NMIs is not being generated on a
P6 family, model 15 cpu. First, I am going to check whether the NMI watchdog itself fires
correctly with different 'nmi_hz' values.

Meanwhile, I have switched to 'timer' mode profiling, and with that I get about
1000 samples/sec per cpu (HZ is set to 1000). I did see some overflows when profiling
for, say, 100 secs; they went away after I increased the per-cpu sample buffer size. So
timer profiling is working as expected.

Thanks,
--Vasu



From: svasudevan@hotmail.com
To: maynardj@us.ibm.com
CC: oprofile-list@lists.sourceforge.net; suravee.suthikulpanit@amd.com
Subject: RE: oprofile fewer samples
Date: Tue, 31 Jul 2012 15:54:52 +0000



> Date: Tue, 31 Jul 2012 08:26:57 -0500
> From: maynardj@us.ibm.com
> To: svasudevan@hotmail.com
> CC: oprofile-list@lists.sourceforge.net; Suravee.Suthikulpanit@amd.com
> Subject: Re: oprofile fewer samples
>
> On 07/30/2012 07:30 PM, Vasudevan S wrote:
> > Hello Maynard,
> > Thanks for the reply!
> >
> > Initially, the profiling was done on our userspace file system process, which can generate quite a bit of cpu,
> > memory, and storage load. There is little to no idle time here. 'oprofile' does report samples for this
> > process, but far fewer than expected.
> >
> > Now, to understand this problem better, I created a single-threaded spin loop program and ran it for
> > 100 seconds after binding it to a cpu (taskset -c 1 spinloop). This machine has 8 cores with a clock of 2.6 GHz.
> >
> > By default, I see:
> > samples| %|
> > ------------------
> > 23602 99.8393 spinloop
> > 30 0.1269 vmlinux
> >
> > I then rebooted the system with the "idle=poll" option:
> The 'idle' boot parameter is an x86 thingy, and my expertise is mostly ppc64, so I added Suravee to cc.
> > samples| %|
> > ------------------
> > 238630 89.1911 vmlinux
> > 28879 10.7939 spinloop
> >
> > The samples have increased significantly, but there are still only about 2675 samples per second across all the cpus.
> > The CPU_CLK_UNHALTED count is set to 100000.
> The numbers seem reasonable for one process.

It is a single-process spin loop. I bound it to one cpu so that this cpu never enters idle and I can verify
the number of samples on one cpu over 100 secs.

Isn't 2675 samples per second still too low for a 2.6 GHz 8-core machine when idle=poll is set?

> So are you saying you started 8 of these spinloop jobs and ran oprofile during their run?

I will be doing this today.

>  I asked you previously what your opreport command line is, but you didn't respond.

I did list the .daemonrc output and my command lines :-).

I use opreport as: 'opreport --merge cpu'.

> Are you simply doing "opreport --symbols" and then picking
> and choosing which samples you *think* are associated with your spinloop? Or, since you're binding the spinloop to a specific CPU and you're
> running with --separate=cpu, are you using 'opreport cpu:<cpu#> [options]' ? I also see you are not using --separate=lib,kernel. To best see
> *all* samples (kernel, shared libs, and executable) associated with your spinloop program, I suggest you run with --separate=lib,kernel. Skip
> the binding to a specific CPU and skip the --separate=cpu. Then when you generate the report, do it like 'opreport --symbols ./spinloop'. If
> you're profiling an application that may use kernel modules, then add --image-path=/lib/modules/`uname -r` to your opreport options to make
> sure that samples taken in those modules are also included in the report.

Thanks much for this. Let me try this today and see how that goes.

--Vasu

> -Maynard
>
> >
> > It's probably impossible to get ~26000 samples per sec per cpu here because of the sample processing that
> > happens in NMI context, correct? But I'm trying to understand why the sample count is this low.
> >
> > /root/.oprofile/daemonrc:
> > SESSION_DIR=/var/lib/oprofile
> > NR_CHOSEN=0
> > SEPARATE_LIB=0
> > SEPARATE_KERNEL=0
> > SEPARATE_THREAD=0
> > SEPARATE_CPU=1
> > VMLINUX=/vmlinux
> > IMAGE_FILTER=
> > CPU_BUF_SIZE=0
> > CALLGRAPH=0
> > XENIMAGE=none
> >
> > Run the program like this:
> > # opcontrol --deinit; opcontrol --reset
> > # opcontrol --start --vmlinux=/vmlinux; taskset -c 1 spinloop 100; opcontrol --stop; opcontrol --dump
> >
> > Run the opreport like this:
> > # opreport --merge cpu
> >
> > Thanks,
> > --Vasu
> >
> >> Date: Mon, 30 Jul 2012 17:57:07 -0500
> >> From: maynardj@us.ibm.com
> >> To: svasudevan@hotmail.com
> >> CC: oprofile-list@lists.sourceforge.net
> >> Subject: Re: oprofile fewer samples
> >>
> >> On 07/29/2012 02:21 PM, Vasudevan S wrote:
> >> > Hi,
> >> > I am using oprofile (0.9.7) on a Linux system for the CPU_CLK_UNHALTED event
> >> > with a count of 100000.
> >> >
> >> > This is a machine with 8 cores (no HT), each at 2.66 GHz.
> >> >
> >> > Timer-based profiling is not on:
> >> > # cat /dev/oprofile/cpu_type
> >> > i386/core_2
> >> >
> >> > Now, when running a test that shows almost no idle time, oprofile shows that
> >> > only ~5000 samples were collected over a period of 7 mins. I expected to see
> >> > at least 20K samples per sec per cpu on this machine.
> >> >
> >> > A lot of idle time would have explained this, but that's not the case with this test.
> >> >
> >> > I am trying to debug this, but if any one has some ideas, that would help!
> >> Oh, this could be so many different things causing this. First, what exactly are you doing that "shows almost no idle" on your system? Is there a particular app you are trying to profile? If so, does opreport show *any* samples for that app? What are the contents of /root/.oprofile/daemonrc? What is the opreport command line you're using?
> >>
> >> -Maynard
> >> >
> >> > Thanks,
> >> > --Vasu
> >> >
> >> >
> >> > _______________________________________________
> >> > oprofile-list mailing list
> >> > oprofile-list@lists.sourceforge.net
> >> > https://lists.sourceforge.net/lists/listinfo/oprofile-list
> >>
>