Menu

#230 opreport callgraph sometimes splits an arc

closed-wont-fix
nobody
None
5
2012-12-19
2012-12-12
No

This problem was reported to the oprofile-list by David Flater <dflater@nist.gov>. See his original posting from Dec 10, 2012, with subject heading "Fwd: lazy frame push causes wrong call graph".

The callgraph output of opreport will sometimes display two separate arcs that should have been combined. For example, the attached testcase, when compiled with gcc 4.7 and optimization level O1 resulted in the following callgraph output for David:

Snippet #1
-----------------------------------------------
2496 33.2711 val4 fn2
2502 33.3511 libc-2.15.so __libc_start_main
2504 33.3778 val4 main
2496 49.8403 val4 fn2
-----------------------------------------------

And later in the output . . .

Snippet #2
-------------------------------------------------------------------------------
0 0 libc-2.15.so __libc_start_main
2504 50.0200 val4 main
2502 49.9800 val4 fn2
0 0 libc-2.15.so __libc_start_main [self]
-------------------------------------------------------------------------------
2504 100.000 libc-2.15.so __libc_start_main
0 0 val4 main
2504 100.000 val4 fn2
0 0 val4 main [self]
-------------------------------------------------------------------------------

With the attached testcase, the __libc_start_main function calls main which in turn calls fn2. One would expect that the number of times __libc_start_main shows in the stack for fn2 would be equal to that for main. Note that in snippet #1, __libc_start_main is in the stack 2502 times, whereas the number for main is 2504. Snippet #2 fills out the picture, showing that __libc_start_main was in the stack 2504 times when main was running. It seems that the version of gcc is a contributing factor to this split. Further testing on an Intel Core 2 Duo running RHEL 6.3 (with gcc 4.6.6), the callgraph output has zero entries for __libc_start_main where fn2 is the focus; instead, I see the following:

-------------------------------------------------------------------------------
28757 32.9858 val4_O1 fn2
58423 67.0142 val4_O1 main
28757 48.8044 val4_O1 fn2
. . . . blah
. . . . blah

-------------------------------------------------------------------------------
0 0 libc-2.12.so __libc_start_main
58423 100.000 val4_O1 main
0 0 libc-2.12.so __libc_start_main [self]
-------------------------------------------------------------------------------
58423 100.000 libc-2.12.so __libc_start_main
0 0 val4_O1 main
58423 100.000 val4_O1 fn2
0 0 val4_O1 main [self]
-------------------------------------------------------------------------------

We should see if oprofile can detect and handle such splits and combine stacks appropriately in the output.

Discussion

  • Maynard Johnson

    Maynard Johnson - 2012-12-12
     
  • Maynard Johnson

    Maynard Johnson - 2012-12-19
    • status: open --> closed-wont-fix
     
  • Maynard Johnson

    Maynard Johnson - 2012-12-19

    I did a deep dive into oprofile callgraph the last day or so and found that it's supposed to be interpreted just as gprof callgraph data, where every function above the non-indented line is *supposed to be* a caller of the non-indented function. I used a different testcase that exhibited similar behavior to what was reported in this bug, where callgraph data showed __libc_start_main apparently calling some function that was, in fact, called by main. My deep dive proved the opreport callgraph output was only reflecting the information provided by the kernel, which, obviously was wrong. I had compiled my testcase with gcc 4.4.6, and so the number of times this incorrect callchain info was received from the kernel was quite low. Apparently (as reported in this bug), the situation worsens with newer gcc.

    So the results of my digging led me back to looking at http://oprofile.sourceforge.net/doc/interpreting-callgraph.html, where the following statement is made:

    ----------------------
    Often you may see a caller of a function that does not actually directly call the function you're looking at (e.g. if a() calls b(), which in turn calls c(), you may see an entry for a()->c()). What's actually occurring is that we are taking samples at the very start (or the very end) of c(); at these few instructions, we haven't yet created the new function's frame, so it appears as if a() is calling directly into c(). Be careful not to be misled by these entries.
    ----------------------

    The upshot is that implementing a mechanism for oprofile to use libunwind should resolve this situation, assuming we're running on a new enough kernel that provides the extra info to use with libunwind. There's no good way that I can think of to fix this situation short of using libunwind, so I'm closing this bug as "Wont Fix" and will open a new feature bug to add the libunwind support.

     

Log in to post a comment.