Re: [perfmon2] strange results with libpfm4
Status: Beta
Brought to you by:
seranian
From: stephane e. <er...@go...> - 2010-10-15 20:57:23
|
Victor, FYI, I posted the kernel patch to fix this today on LKML. See: http://lkml.org/lkml/2010/10/15/192 On Thu, Oct 14, 2010 at 9:17 PM, stephane eranian <er...@go...> wrote: > Victor, > > On Thu, Oct 14, 2010 at 3:05 PM, Victor Jimenez <vic...@bs...> wrote: >> Hello, >> >> I am interested in being able to sample the performance counters along >> the time (for instance, once every second). This is exactly what "task" >> example from libpfm4 does when the -p option is used. >> >> However, I am getting some strange results when monitoring programs that >> call the sleep function. Directly using the perf tool included in the >> kernel seems to provide correct results. >> >> Just for reference, I am using Ubuntu 10.10 (kernel version is >> 2.6.35-22) on an x86 system (please, let me know if you need more >> information). >> >> If I execute the following command: >> perf_2.6.35-22 stat -e cycles:uk,instructions:uk sleep 10 >> >> this is what I get: >> >> Performance counter stats for 'sleep 10': >> >> 1,119,863 cycles >> 564,811 instructions # 0.504 IPC >> >> 10.001751746 seconds time elapsed >> >> The number of cycles and instructions is really low, as expected, since >> the process is sleeping most of the time. >> > That's correct. > >> However, the output from "task" in libpfm4 gives me: >> >> task -p -e PERF_COUNT_HW_CPU_CYCLES,PERF_COUNT_HW_INSTRUCTIONS sleep 10 >> >> 1,121,784 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, >> ena=615,705, run=615,705) >> 480,523 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, >> ena=615,705, run=615,705) >> >> 0 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, >> ena=615,705, run=615,705) >> 0 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, >> ena=615,705, run=615,705) >> >> 0 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, >> ena=615,705, run=615,705) >> 0 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, >> ena=615,705, run=615,705) >> >> 0 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, >> ena=615,705, run=615,705) >> 0 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, >> ena=615,705, run=615,705) >> >> 0 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, >> ena=615,705, run=615,705) >> 0 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, >> ena=615,705, run=615,705) >> >> 0 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, >> ena=615,705, run=615,705) >> 0 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, >> ena=615,705, run=615,705) >> >> 0 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, >> ena=615,705, run=615,705) >> 0 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, >> ena=615,705, run=615,705) >> >> 0 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, >> ena=615,705, run=615,705) >> 0 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, >> ena=615,705, run=615,705) >> >> 0 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, >> ena=615,705, run=615,705) >> 0 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, >> ena=615,705, run=615,705) >> >> 14,724,919,451 PERF_COUNT_HW_CPU_CYCLES (99.99% scaling, >> ena=9,001,429,971, run=785,445) >> 6,363,156,689 PERF_COUNT_HW_INSTRUCTIONS (99.99% scaling, >> ena=9,001,429,971, run=785,445) >> >> The first sample approximately matches the result from "perf". However, >> with libpfm4 there is a sample at the end with a significant number of > > With -p, you print the deltas every second. It's thus normal to get zero > given the process is blocked in sleep(). > >> cycles and instructions that does not appear with "perf". Initially, I >> was thinking I could just drop the last sample, but if the sleep is >> called in the middle of the execution, I would definitely get "wrong" >> results, as there will be more samples after that. >> > what you are seeing in the end is the result of the counter scaling. > The tool is scaling the count based on time_enabled vs time_running. > Time_enabled captures duration from start to stop whereas time_running > captures duration of when each event was actually on the PMU HW. > > Now, I think there is a kernel bug with accounting of time_running because > if you run task without -p it works (this is what perf does). But if in between > you read() counts then the value of time_running at the end is very different. > > I will investigate. > Thanks for pointing this out to me. > >> Do you have any idea on what could be the problem? >> >> Thank you very much, >> Victor >> >> WARNING / LEGAL TEXT: This message is intended only for the use of the >> individual or entity to which it is addressed and may contain >> information which is privileged, confidential, proprietary, or exempt >> from disclosure under applicable law. If you are not the intended >> recipient or the person responsible for delivering the message to the >> intended recipient, you are strictly prohibited from disclosing, >> distributing, copying, or in any way using this message. If you have >> received this communication in error, please notify the sender and >> destroy and delete any copies you may have received. >> >> http://www.bsc.es/disclaimer.htm >> >> ------------------------------------------------------------------------------ >> Beautiful is writing same markup. Internet Explorer 9 supports >> standards for HTML5, CSS3, SVG 1.1, ECMAScript5, and DOM L2 & L3. >> Spend less time writing and rewriting code and more time creating great >> experiences on the web. Be a part of the beta today. >> http://p.sf.net/sfu/beautyoftheweb >> _______________________________________________ >> perfmon2-libpfm4-commits mailing list >> per...@li... >> https://lists.sourceforge.net/lists/listinfo/perfmon2-libpfm4-commits >> > |