Thanks Stephane. That works for me.


On Mon, Sep 9, 2013 at 6:58 PM, Stephane Eranian <eranian@googlemail.com> wrote:
Hi,

I get the same thing with my own little multithreaded benchmark.

I think it has to do with what happens in per-thread/per-cpu mode
with inheritance. I think the events of each child thread are propagated
to each parent. I get 12x the count on my 12-CPU workstation.
If you drop -i in task_cpu, then you only capture one thread's execution
across all the CPUs.

In per-thread mode + inheritance + cpu, it seems the kernel aggregates
per PID and not per TID. Not sure this was intended.

I think for what you are after, you need system-wide mode with a
per-CPU breakdown:

$ perf stat -a -A -e instructions ./binary

Then the counts will add up.
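
As a quick sanity check on the numbers quoted further down in this thread,
the per-CPU counts from task_cpu can be summed and compared with the single
count from task. This is only an illustrative sketch in Python; the variable
names are made up for the example, and the values are copied from the quoted
output:

```python
# Per-CPU INSTRUCTIONS_RETIRED counts reported by task_cpu -i (from the thread).
per_cpu = {
    "CPU0": 29_529_250_689,
    "CPU1": 26_991_683_307,
    "CPU2": 24_404_405_488,
    "CPU3": 30_442_586_081,
}

# Single aggregated count reported by task -i (from the thread).
task_total = 27_708_362_452

# Sum across CPUs and compare against the per-task count.
summed = sum(per_cpu.values())
ratio = summed / task_total

print(f"sum over CPUs: {summed}")
print(f"ratio to task: {ratio:.2f}")
```

The ratio comes out at roughly 4 for this 4-threaded run on 4 CPUs, which
is consistent with each CPU's event picking up the counts of all threads.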



On Mon, Sep 9, 2013 at 6:44 PM, Bhavishya Goel <bhavishya.goel@gmail.com> wrote:
> Yes, that's what I am doing. Below is an example of the counts I get from
> task vs task_cpu from a 4-threaded binary:
>
> $> task -i -e INSTRUCTIONS_RETIRED ./binary
>
> 27 708 362 452 INSTRUCTIONS_RETIRED
>
> $> task_cpu -i -e INSTRUCTIONS_RETIRED ./binary
> CPU0 29 529 250 689 INSTRUCTIONS_RETIRED
>
> CPU1 26 991 683 307 INSTRUCTIONS_RETIRED
>
> CPU2 24 404 405 488 INSTRUCTIONS_RETIRED
>
> CPU3 30 442 586 081 INSTRUCTIONS_RETIRED
>
> Adding up the task_cpu counts from all 4 CPUs gives me an instructions
> retired count that is almost four times what I get from task. And I know
> the count from task is correct because it is very close to what I get from
> the Intel PCM-TSX tool. Can you tell me what I am doing wrong?
>
>
>
> On Mon, Sep 9, 2013 at 4:46 PM, Stephane Eranian <eranian@googlemail.com>
> wrote:
>>
>> On Mon, Sep 9, 2013 at 4:35 PM, Bhavishya Goel <bhavishya.goel@gmail.com>
>> wrote:
>> > It is multi-threaded.
>> >
>> >
>> Then you need to add up all the counts from all the CPUs.
>>
>> > On Mon, Sep 9, 2013 at 4:26 PM, Stephane Eranian
>> > <eranian@googlemail.com>
>> > wrote:
>> >>
>> >> On Tue, Sep 3, 2013 at 8:51 AM, Bhavishya Goel
>> >> <bhavishya.goel@gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > I am using the "task_cpu" example in the perf_examples folder of
>> >> > libpfm4.4 (linux kernel 3.11-rc7, microarchitecture: haswell). The
>> >> > counter numbers that I get from task_cpu seem to be wrong, as they
>> >> > are vastly different from what I get from the "task" example and
>> >> > Intel's PCM-TSX tool. This is an example of the command line I use:
>> >> >
>> >> > task_cpu -i -e INSTRUCTIONS_RETIRED ./binary
>> >> >
>> >> > task -i -e INSTRUCTIONS_RETIRED ./binary
>> >> >
>> >> > The counter numbers from task_cpu are almost double the numbers from
>> >> > task, while the numbers from PCM-TSX match the numbers from task. Is
>> >> > it a bug or am I doing something wrong?
>> >> >
>> >> Sorry for the late reply.
>> >> Is your binary multi-process or multi-threaded?
>> >>
>> >> > --
>> >> > ಠ_ಠ
>> >> >
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > perfmon2-devel mailing list
>> >> > perfmon2-devel@lists.sourceforge.net
>> >> > https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
>> >> >
>> >
>> >
>> >
>> >
>> > --
>> > ಠ_ಠ
>
>
>
>
> --
> ಠ_ಠ



--
ಠ_ಠ