On 06/17/2012 02:51 PM, Lisong Xu wrote:

I have some difficulty in understanding the call graph output. I have read the manual and some other online tutorials, but still could not understand it. Could somebody help me on it? Thanks!

Below is part of the call graph example http://oprofile.sourceforge.net/examples/opreport-l-lyx.bz2

  88       100.000  lyx-qt                   boost::scoped_ptr<Buffer::Impl>::~scoped_ptr()
0              0  lyx-qt                   void boost::checked_delete<Buffer::Impl>(Buffer::Impl*)
  99       100.000  lyx-qt                   Buffer::Impl::~Impl()
  0              0  lyx-qt                   void boost::checked_delete<Buffer::Impl>(Buffer::Impl*) [self]

I have the following questons:

1. why the sample value of checked_delete is 0? No sample?
Yes, it means no samples were taken while boost::checked_delete<Buffer::Impl> was running.

Every time a sample is taken when in callgraph profiling mode, a snapshot of the callstack is taken.  As you can see, there *were* samples taken for Buffer::Impl::~Impl(), and its callstack would have included checked_delete<Buffer::Impl> as a caller.  When opreport generates a callgraph report, all functions that are captured in callstacks are identified and an entry is generated for each where the function is the one being focused on (i.e., non-indented).  So even functions that were not hot enough to get any samples will have a non-indented entry in the report.  The oprofile manual says "This line is the same as you'd get from a normal opreport output", but there is one difference -- a normal (non-callgraph) report would not show any line at all for functions that had no samples.  The callgraph report shows such functions for completeness, as it's needed for GUI tools to render the complete callchain.

If you look further in the example callgraph report for the lyx app (on the oprofile web site), you'll see entries where checked_delete<Buffer::Impl> show up as a caller or callee, and in these cases, the number of samples is non-zero.  Why is that?  Again, the manual explains: 

"The samples and percentage values here refer to the number of times we took a sample where this call was found in the stack; the percentage is relative to all other callers of the function we're focussing on. Note that these values are not call counts; they only reflect the call stack every time a sample is taken; that is, if a call is found in the stack at the time of a sample, it is recorded in this count."

2. my understanding is that the total number of samples above the non-indented line (functions called it) should be equal to the total number of samples of below it (functions called by it). However, in this example,  the number of scoped_ptr samples is 88, but the number of Impl and checked_delete samples is 99. Why there is a difference between 88 and 99?
There is no such equation.  Take the example of a()->b()->printf(), but b() only calls printf() when a() passes a particular value.  The callgraph entry for b() would show a higher count for a() than for printf().

Another related question is about the following call graph output

0              0  test                    del_timer
  3        75.0000  test                    monotonic_time
  1        25.0000  test                    thr_poll
  0              0  test                    del_timer [self]

Why there is no line above the non-indent line? That is, why there is no information about the functions who called del_timer?
This is probably due to either lost samples, a bug, or a limitation in the architecture (which I'd consider a bug, as well).

Lost samples: If you aren't already using oprofile 0.9.6 or higher, I recommend upgrading.  Since the 0.9.6 release, opreport issues a warning message if it detects overflows in the sample buffers (implying lost samples).  If the oprofile log shows a high number of lost samples, then decreasing the sampling rate (by using a higher 'count' value in the event specification) will either eliminate or reduce the lost samples substantially).  Just remember that the higher proportion of lost samples to samples received, the less statistically valid your profile is.  You don't need to get the number of lost samples down to zero (often, not even possible), but the lost-to-received proportion should be very low -- say, 1%.

Bugs:  I'm not aware of any specific fix for the problem you show above, but in general, I recommend you upgrade to the highest level of kernel and oprofile that you can to pick up any fixes that may be related to callgraph profiling.

Architecture limitations: There seem to be architecture specific nuances to how the kernel can capture callstack information, and in certain situations, the information may not be complete or accurate.

In general, I would recommend not spending too much time trying to figure out callgraph entries that are not "hot".  If only a very few samples were taken for a particular callchain, not only does it not matter as far as the performance analysis of the application, but it may not even be statistically valid.

Hope this helped.


Thank you very much! I have been searching for the explanation for a long time, but could not find it.


Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/

oprofile-list mailing list