From: Maynard J. <may...@us...> - 2012-12-12 23:34:24
|
On 12/12/2012 12:45 PM, David Flater wrote: > On 12/12/2012 9:45 AM, David Flater wrote: >> On 12/11/2012 7:39 PM, Maynard Johnson wrote: >>> Nevertheless, I am not seeing this callgraph problem with either perf or operf on my RHEL 6.3/Sandybridge laptop. Below, you mention you're running kernel 3.6.7. Maybe this is a regression. Can you try to reproduce on an older distro (like RHEL 6.x or SLES 11 SPx? >> >> Taking that as agreement that I'm seeing a real problem, the first >> thing I will try is to see if an older version of gcc has a different >> behavior and yields different results through the profiling tools. > > Attached is a minor revision of the test program and results from different > versions of gcc. The "missing main" problem correlates 100% with whether or > not the code generated by gcc pushes rbp at the top of each function. > > All of the following were built and tested under x86_64 with mainline linux > kernel 3.6.7. The gcc command line in each case was > $(GCCBIN) -std=gnu99 -Wall -Wextra -pedantic $(OPTLVL) -g > -fno-omit-frame-pointer -o val4 val4.c > with GCCBIN and OPTLVL varying as shown below. > > gcc 4.7.2 -O1 -m64: bad rbp is pushed late in fn2 > gcc 4.7.2 -O0 -m64: OK > gcc 4.7.2 -O1 -m32: OK > gcc 4.7.0 -O1 -m64: bad rbp is pushed late in fn2 > gcc 4.6.3 -O1 -m64: OK > gcc 4.5.4 -O1 -m64: worse rbp is never pushed in fn2 or fn1 > gcc 4.4.7 -O1 -m64: worse rbp is never pushed in fn2 or fn1 > gcc 4.3.6 -O1 -m64: OK > > I'll get to work on a bug report for gcc. Hi, David, I don't doubt that different versions of gcc can result in a different looking oprofile callgraph, but I don't think this is a gcc bug. I think gprof2dot is misinterpreting the meaning of oprofile callgraph data, which is different from gprof callgraph output. A gprof callgraph is generated by way of instrumenting the target binary, which allows for recording the actual calls, so you get output similar to the following: ----------------------------------------------- index % time self children called name 0.95 0.97 1/1 main [2] [1] 100.0 0.95 0.97 1 fn2 [1] 0.97 0.00 1/1 fn1 [3] ----------------------------------------------- <spontaneous> [2] 100.0 0.00 1.92 main [2] 0.95 0.97 1/1 fn2 [1] ----------------------------------------------- 0.97 0.00 1/1 fn2 [1] [3] 50.5 0.97 0.00 1 fn1 [3] ----------------------------------------------- In gprof callgraph output, when you see function foo above the non-indented line for function bar, it *always* means that foo called bar. The oprofile callgraph output looks similar to gprof, but the interpretation is different. Here are a couple of snippets of callgraph output from your "gcc 4.7.2 -m64" run: Snippet #1 ----------------------------------------------- 2496 33.2711 val4 fn2 2502 33.3511 libc-2.15.so __libc_start_main 2504 33.3778 val4 main 2496 49.8403 val4 fn2 ----------------------------------------------- And later in the output . . . Snippet #2 ------------------------------------------------------------------------------- 0 0 libc-2.15.so __libc_start_main 2504 50.0200 val4 main 2502 49.9800 val4 fn2 0 0 libc-2.15.so __libc_start_main [self] ------------------------------------------------------------------------------- 2504 100.000 libc-2.15.so __libc_start_main 0 0 val4 main 2504 100.000 val4 fn2 0 0 val4 main [self] ------------------------------------------------------------------------------- In Snippet #1, the fact that __libc_start_main is above the non-indented line for fn2 does *not* mean that __libc_start_main called fn2. It means that __libc_start_main was somewhere in the stack when fn2 was running. The "samples" column on the left-hand side indicates the number of times __libc_start_main was in the stack when fn2 was running. (See http://oprofile.sourceforge.net/doc/opreport.html#opreport-callgraph.) Note that in snippet #1, that number is 2502, whereas the number for main is 2504. Snippet #2 fills out the picture, showing that __libc_start_main was in the stack 2504 times when main was running. I don't know off-hand why oprofile output is split like that for __libc_start_main. The actual binary code generated by gcc apparently is a contributing factor. Maybe there's something oprofile can do to merge these two entries. I've opened bug https://sourceforge.net/tracker/?func=detail&aid=3595379&group_id=16191&atid=116191 to investigate. On the Intel Core 2 Duo running RHEL 6.3 (with gcc 4.6.6) that I have access to today, the callgraph output has zero entries for __libc_start_main where fn2 is the focus; instead, I see the following: ------------------------------------------------------------------------------- 28757 32.9858 val4_O1 fn2 58423 67.0142 val4_O1 main 28757 48.8044 val4_O1 fn2 . . . . blah . . . . blah ------------------------------------------------------------------------------- 0 0 libc-2.12.so __libc_start_main 58423 100.000 val4_O1 main 0 0 libc-2.12.so __libc_start_main [self] ------------------------------------------------------------------------------- 58423 100.000 libc-2.12.so __libc_start_main 0 0 val4_O1 main 58423 100.000 val4_O1 fn2 0 0 val4_O1 main [self] ------------------------------------------------------------------------------- So in this case, the split is complete! Over the years, we've had several bugs involving how opreport callgraph treats functions that have zero samples, as is the case here with both main and __libc_start_main. I don't know if this is yet another one of those issues -- TBD. Nevertheless, taken in its entirety and interpreting it correctly, your callgraph data (snippets 1 & 2) shows a callchain of: __libc_start_main -> main -> fn2 It's clear that gprof2dot is misinterpreting the data in showing an arc from __libc_start_main to fn2. -Maynard > > > > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > > > > _______________________________________________ > oprofile-list mailing list > opr...@li... > https://lists.sourceforge.net/lists/listinfo/oprofile-list |