From: Josef W. <Jos...@gm...> - 2006-09-21 19:07:39
On Thursday 21 September 2006 16:49, Harris, Jeff wrote:
> #include <iostream>
> #include <string>
> using namespace std;
> int foo()
> {
> return 35;
> }
> int main()
> {
> cout << foo() << endl;
> return 0;
> }
>
> The call to foo appears the same in both attached outputs of running
> valgrind with callgrind. The differences show when the ppc program
> calls the libstdc++ methods for operator<<(). If you look at the output
> for x86, kcachegrind shows main calling foo, two libstdc++ methods, and
> _dl_runtime_resolve (presumably to find and relocate the libstdc++
> methods).
Yes.
As these "operator<<()" are two different functions,
_dl_runtime_resolve is called twice.
On x86, callgrind generates quite pretty call graphs for calls into
shared libraries, as it (1) ignores the call into the PLT section by default,
and (2) interprets the jump at the end of _dl_runtime_resolve to the
resolved function as "return from _dl_runtime_resolve and call into
resolved function".
> On ppc, the same call to main shows a call to foo and exit
The call to exit() already seems to be wrong.
> , but shows
> one call to 0x10000E2C. Address 0x10000E2C is the location of the
> dynamic relocation record for the operator<< method.
I wonder why this address is not found inside a PLT section;
if that were the case, it would have been ignored as in the x86 case.
> Stepping into
> 0x10000E2C, I see the call to the operator<< method as in the x86.
This actually looks sane. There is also a call to
_dl_runtime_resolve from 0x10000E2C (you can ignore the "'2").
> But,
> I have to step further into the operator<< call to see another call to a
> dynamic relocation record in order to see the second libstdc++ method as
> in x86. In the disassembly of main, both libstdc++ calls occur in main,
> there is no recursion.
That is an example of what it looks like when reality and callgrind's
shadow stack are out of sync. Obviously, a PPC jump which should
have been interpreted as a return was interpreted as a call, and
therefore the second call to operator<< is 2 levels too deep.
To analyse such problems, it is best to look at the order of
function enter/exit events as callgrind observes them.
You can print out the order of function enter events (and exit events
implicitly via indentation) with
valgrind --tool=callgrind --ct-verbose1=main ./testprog
Meaning of "--ct-verbose1=main" here: "Switch to verbose mode 1 when
entering function <main>, and restore verbose mode (actually, to 0 again)
when leaving <main>", and verbosity 1 prints out the dynamic call tree.
> I'm guessing that valgrind/callgrind is not seeing a "return" from the
> dynamic relocation record, causing it to think the function never exits.
> Does valgrind/callgrind perhaps not recognize that 0x10000E2C is a
> relocation entry which may act differently than a local function call?
It is not that easy, as there are a lot of calls in the call tree, even
with this small example. I compiled it and ran it with callgrind on
our PPC32 machine, printing out the events as shown above
(see attached file). You see that the call level slowly drifts
to the right (deeper and deeper), and there are 3 places where
it jumps back up by around 10 levels in one step, when entering
* 0x10010E64 (in line 225)
* exit (in line 390), and
* __libc_csu_fini (in line 485)
These points actually are resynchronisation points, using the stack
pointer. This is needed to make the tool robust, and to
handle e.g. longjumps - also on x86 - correctly. And therefore, you see
a call to exit() from main()...
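
To illustrate the kind of stack-pointer resynchronisation described above,
here is a minimal sketch. It is not callgrind's actual code: the names
ShadowFrame and ShadowStack, the exact comparison, and the stack pointer
values in the usage note are all assumptions made up for illustration. It
only assumes a downward-growing stack, so a frame entered at a lower stack
pointer than the current one must already have been left.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of one shadow stack entry (not callgrind's real layout).
struct ShadowFrame {
    std::string function;  // name of the function that was entered
    std::uint64_t sp;      // stack pointer value observed at entry
};

// Hypothetical shadow call stack with stack-pointer resynchronisation.
class ShadowStack {
public:
    // Record an observed entry into `function` at stack pointer `sp`.
    // Before pushing, drop every frame whose entry SP is below `sp`: on a
    // downward-growing stack those frames have been left, even if no
    // return was ever recognised for them. Frames with an equal SP are
    // kept, which is why missed returns that do not move the SP pile up.
    // Returns the number of frames dropped.
    std::size_t enter(const std::string& function, std::uint64_t sp) {
        std::size_t dropped = 0;
        while (!frames_.empty() && frames_.back().sp < sp) {
            frames_.pop_back();
            ++dropped;
        }
        frames_.push_back({function, sp});
        return dropped;
    }

    // Current call depth as seen by the shadow stack.
    std::size_t depth() const { return frames_.size(); }

    // Name of the innermost function; the stack must not be empty.
    const std::string& top() const {
        assert(!frames_.empty());
        return frames_.back().function;
    }

private:
    std::vector<ShadowFrame> frames_;
};
```

With made-up values: entering main at SP 1000, then a stub at 968, then
operator<< at 968 (a jump misread as a call), then a helper at 936 gives
depth 4. Entering exit at SP 1000 drops the three inner frames in one step
and leaves exit looking like a direct callee of main.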
One has to look at the PPC assembler to detect where these wrong
interpretations happen, and think about good heuristics for recognizing
them correctly.
The thing is, I never got around to doing this very carefully,
partly because I did not know PPC assembler before the last time
I looked at this stuff.
x86 with its explicit call/ret instructions is way easier to get right;
on x86, the stack pointer always changes on call/ret. On PPC, this does
not need to happen as the return address is stored in the link register.
So: ideas for good heuristics welcome.
Josef