Re: [Valgrind-users] Callgrind results on ppc


Julian Seward wrote:
>>What I do know is that there is a nasty hack in m_stacktrace.c, the part
>>for unwinding the stack -- see VG_(get_StackTrace2) and specifically the
>>stuff for setting/using lr_is_first_RA.  This was from one of the IBM
>>linux guys, unfortunately moved on elsewhere now.
> 
> 
> I later documented the trick as shown below, so at least we can see what 
> it is doing.
> 
> J
> 
>    /* We have to determine whether or not LR currently holds this fn
>       (call it F)'s return address.  It might not if F has previously
>       called some other function, hence overwriting LR with a pointer
>       to some part of F.  Hence if LR and IP point to the same
>       function then we conclude LR does not hold this function's
>       return address; instead the LR at entry must have been saved in
>       the stack by F's prologue and so we must get it from there
>       instead.  Note all this guff only applies to the innermost
>       frame. */

By itself, the reasoning of that paragraph is not always correct.
The prolog of a recursive function that calls itself directly (so that
immediately after the recursive 'bl', IP and LR point to the
same function) might save the return address into the stack only when
forced to by preparation for a yet-deeper call.  The deepest call
on any call chain can avoid saving the return address into
the stack, as long as returns from a deepest call also know this.
Implementations of Ackermann's function often behave in this manner.
[Indeed, Ackermann's function is a useful testcase for Callgrind.]
Some hand-written subroutine nests have a priori bounds on the
nesting level (frequently 1, 2, or 3), and dedicate a general register
(instead of the stack) to hold the return address for each level.
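A minimal C version of Ackermann's function for exercising this case might look like the sketch below.  The name `ack` is only illustrative.  Whether the compiler really skips saving LR on the `m == 0` path depends on the compiler, its version and its flags (GCC's shrink-wrapping, `-fshrink-wrap`, on by default at -O and above, is one mechanism that can do it), so check the generated code with `objdump -d` before trusting it as a testcase.

```c
#include <stdint.h>

/* Ackermann's function, written so that the m == 0 case returns
   before any call is made.  On that path the function is a leaf,
   so a shrink-wrapping compiler may skip saving LR to the stack
   and may not build a stack frame at all. */
uint64_t ack(uint64_t m, uint64_t n)
{
    if (m == 0)
        return n + 1;                  /* leaf path: no further call */
    if (n == 0)
        return ack(m - 1, 1);          /* may become a tail call or a loop */
    return ack(m - 1, ack(m, n - 1));  /* inner call is a true recursive call */
}
```

Running a program that calls, say, `ack(3, 3)` under `valgrind --tool=callgrind` produces many leaf entries at varying depths, each reached directly from another activation of `ack`.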

If the first call after the outermost entry into function F is a
[recursive] call to F, and if the prolog determines that the
[recursive] entry is a leaf entry, and if therefore F decides not to
save the return address into the stack (and perhaps avoids constructing
a stack frame at all), then LR and IP will point to the same function,
and LR will be the current return address, but the logic of the
paragraph quoted above will say that the stack holds the current return
address.  This will be an error, either because the stack slot for this
level is logically undefined (it was never written), or because leaf entry
uses no frame at all (and thus the return address that is in "the" stack
frame actually designates the _grandparent_ of the current activation).
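The failure can be reproduced in a toy model of the innermost-frame rule.  Everything here is hypothetical: the function table, the addresses, and the names `FnRange`, `fn_containing` and `innermost_return_address` are made up for illustration, and "stack slot" abstracts whichever save area the unwinder would read; it is not Valgrind's code or the exact PowerPC ABI layout.  Assume main calls F with a 'bl' at 0x2000 (return address 0x2004, saved by the outer F's prolog), the outer F's first call is a 'bl F' at 0x1010 (so LR = 0x1014), and the inner F takes its leaf path without saving anything, with IP = 0x1008.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy symbol table entry: a function occupying [start, end). */
typedef struct {
    const char *name;
    uint32_t start, end;
} FnRange;

/* Hypothetical layout for the scenario described in the text. */
static const FnRange demo_fns[] = {
    { "F",    0x1000, 0x1100 },
    { "main", 0x2000, 0x2100 },
};
static const size_t demo_nfns = sizeof demo_fns / sizeof demo_fns[0];

/* Return the function whose range contains addr, or NULL. */
static const FnRange *fn_containing(const FnRange *fns, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= fns[i].start && addr < fns[i].end)
            return &fns[i];
    return NULL;
}

/* The rule from the quoted comment: if IP and LR lie in the same
   function, assume LR was overwritten by an earlier call and take
   the return address from the stack slot instead. */
static uint32_t innermost_return_address(const FnRange *fns, size_t n,
                                         uint32_t ip, uint32_t lr,
                                         uint32_t stack_slot)
{
    const FnRange *fip = fn_containing(fns, n, ip);
    const FnRange *flr = fn_containing(fns, n, lr);
    bool lr_is_first_ra = !(fip != NULL && fip == flr);
    return lr_is_first_ra ? lr : stack_slot;
}
```

With IP = 0x1008, LR = 0x1014 and a stack slot holding 0x2004, the rule returns 0x2004, so the inner F appears to have been called from main (its grandparent) rather than from the outer F at 0x1014.  When LR points outside F (for example LR = 0x2004), the rule correctly returns LR.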

If the compiler is "nice", then an instruction is a CALL if and only if
it is a branch instruction with the LK bit set (the least significant bit).
Any indirect jump through the Link or Count register, when the LK bit of
the instruction is 0, must be a RETURN, a tail-recursive continuation CALL
(which must use the Count register, because the return address must always
be in the Link register [unless the tail-recursive CALL is known to be a
leaf call, or otherwise skips part of the prolog]), or a 'switch' case.
Some compilers are "naughty": they set the LK bit willy-nilly.  After all,
the value in LR immediately after a RETURN is a "do not care."
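Under the "nice compiler" assumption, the rule can be sketched as a decoder over 32-bit instruction words.  This is a hypothetical helper (`classify_ppc_branch` and the `BranchKind` names are invented here), not Valgrind's code.  It covers only `b` (primary opcode 18), `bc` (16), and `bclr`/`bcctr` (opcode 19, extended opcodes 16 and 528); it ignores newer branch forms such as `bctar`, labels every LK=0 `bclr` a RETURN even though a leaf tail call through LR is possible, and on a "naughty" compiler it will report spurious CALLs.

```c
#include <stdint.h>

typedef enum {
    INSN_NOT_BRANCH,
    INSN_CALL,      /* any branch with LK = 1 */
    INSN_JUMP,      /* direct b / bc with LK = 0 */
    INSN_RETURN,    /* bclr (through LR) with LK = 0 */
    INSN_CTR_JUMP   /* bcctr (through CTR) with LK = 0: tail call or switch case */
} BranchKind;

static BranchKind classify_ppc_branch(uint32_t insn)
{
    uint32_t opcd = insn >> 26;             /* primary opcode, IBM bits 0-5 */
    uint32_t lk   = insn & 1u;              /* LK, IBM bit 31 (least significant) */
    uint32_t xo   = (insn >> 1) & 0x3FFu;   /* XL-form extended opcode, bits 21-30 */

    switch (opcd) {
    case 16:  /* bc */
    case 18:  /* b  */
        return lk ? INSN_CALL : INSN_JUMP;
    case 19:
        if (xo == 16)   /* bclr */
            return lk ? INSN_CALL : INSN_RETURN;
        if (xo == 528)  /* bcctr */
            return lk ? INSN_CALL : INSN_CTR_JUMP;
        return INSN_NOT_BRANCH;  /* e.g. mcrf, crand, ... */
    default:
        return INSN_NOT_BRANCH;
    }
}
```

For example, `blr` (0x4E800020) decodes as a RETURN, `bctrl` (0x4E800421) as a CALL, and `bctr` (0x4E800420) as the ambiguous tail-call-or-switch case, which is exactly where an unwinder or call-graph profiler needs more context than the instruction word alone.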

The logic in the quoted paragraph also does not handle true co-routines:
two or more functions which resume each other by turns at the point
of previous "exit."  Runtime-generated code for formatted I/O often uses
co-routines, and so do various simulation engines.  Of course, co-routines
blur the meaning of CALL and RETURN, but Callgrind must cope somehow.

--