Learn how easy it is to sync an existing GitHub or Google Code repo to a SourceForge project! See Demo

Close

#191 time discrepancy using oprofile

closed-fixed
None
5
2011-06-15
2010-09-17
PJ Nee
No

I'm using oprofile 0.9.3. I've noticed a time discrepancy which I can't understand. I'm running a function that I manually time at approx 50 secs. Yet when I run oprofile and do the maths on it, it works out at 23 secs. CPU speed is being reported as 1203MHz (estimated). (277796 * 100000)/1203MHz = 23secs. I'm running on a Core 2 with clock speed of 2.4GHz.

root@pjnee-desktop:/home/pjnee/CRAY/libitpdriver/version_003/Linux_build/Test# opreport -c
CPU: Core 2, speed 1203 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples % symbol name
-------------------------------------------------------------------------------
277796 100.000 ai_ExecuteUserDiag
2 0.2002 CNehalemII::GetUserCodeStatus()
277793 99.9917 CNehalemII::CheckInDebugMode(bool*)
12 0.0043 CNehalemII::DebugMode(int)
3 0.0011 CNehalemII::UpdateLogicalCPU()
2 7.2e-04 CNehalemII::TapStatus(int, int*)
2 7.2e-04 CNehalemII::SetEnterExitMode(int, int)
2 7.2e-04 CNehalemII::GetUserCodeStatus() [self]
1 3.6e-04 .plt
1 3.6e-04 __i686.get_pc_thunk.bx
-------------------------------------------------------------------------------

root@pjnee-desktop:/home/pjnee/CRAY/libitpdriver/version_003/Linux_build/Test#

Discussion

  • PJ Nee
    PJ Nee
    2010-09-17

     
    Attachments
    • assigned_to: nobody --> ssuthiku
     
  • Suravee, can you find time to investigate this bugzilla, please? Thanks.

     
  • There are many factors which could affect the behavior reported here.

    - User used CPU_CLK_UNHALTED event and try to co-relate the profile result to the wall clock time. This only make sense if the target application is CPU-bound.

    - If the CPU power state changes at some point during the profile which normally affects the CPU frequency , this will affect the amount samples collected here.

    I don't have Intel Core2-based system. However, I experimented on the AMD family10h system using the current OProfile (0.9.6) profiling a CPU-bound (matrix multiplication benchmark) target workload. And I can see that the amount of samples corresponded to the wall clock time as long as the two conditions I mentioned above are met.

     
  • pjnee,
    Suravee has posted a patch for the bug where opreport shows an incorrect CPU speed. The patch is available on the oprofile-list. If you're not subscribed to the list, you can access her posting at:
    http://marc.info/?l=oprofile-list&m=130678359708896&w=2

    Please try this patch at your earliest convenience and let us know if that resolves your problem.

     
    • status: open --> closed-fixed
     
  • [Comment from Maynard and Andi]
    Please not that the header line of an opreport (where the processor speed is printed) includes the caveat "(estimated)", so no one should *EVER* expect precise correlation between number of cycles-based samples and wall clock using this estimated processor speed. This is due to the processor P-State which affect the running frequency at a certain point.