|
From: Mohsen P. <mo...@pa...> - 2011-07-07 06:20:28
|
Dear all,

I have a binary that I compiled with -g. I need to perform a set of actions
on my computer and observe my program's behavior; that is, I need to see
which functions run while I perform those actions. So I need to tell
valgrind: please print which piece of the program is running. (My program is
big, so I can't step through it in a debugger; I just want to see the names
of the functions that run, and then put a hook in them.) How do I do that?

Yours,
Mohsen
|
|
From: David C. <dcc...@ac...> - 2011-07-07 07:58:08
|
On 7/6/2011 10:53 PM, Mohsen Pahlevanzadeh wrote:
> Dear all,
>
> I have a binary file what i compile it with -g.So i need to perform a
> set of action in my computer and see behavior of my file, This mean , i
> need to see functions of run when i perform those set of actions, So i
> need to tell to valgrind : Please print source of which peace of program
> that
> running.(my program is big,for this reason i can't debug and so just see
> name of those function which running,and then i put a hook in those.)
> How i do it?
>
> Yours,
> Mohsen
>
By default valgrind analyzes all code that executes when the program is
run. There is no need to tell it which pieces of code to test; it looks
at all of them. This of course makes the program run slower.
--
David Chapman dcc...@ac...
Chapman Consulting -- San Jose, CA
|
|
From: John R. <jr...@bi...> - 2011-07-07 13:50:37
|
> I have a binary file what i compile it with -g.So i need to perform a
> set of action in my computer and see behavior of my file, This mean , i
> need to see functions of run when i perform those set of actions,

This sounds like some kind of profiling. Re-compile and re-link with
"gcc -p" or "gcc -pg". If you cannot re-compile, then perhaps try
"strace -i" or "ltrace -i", which will give you partial information.
You'll need to do some work to process the instruction addresses,
and you might have to perform a short backtrace (either dynamic or
static) to get interesting information.

More generally, write a utility program which reads your original binary
program, then writes a new binary program having each static call:
    call subr1
replaced with an indirection:
    call *indir1
    .section indir_section
    indir1: .addr subr1
Then change the contents of location indir1 dynamically at run time:
point it to a logging subroutine for a while, etc.
[This won't track existing indirect calls, but perhaps those can
be ignored for a while.]
--
|
|
From: Josef W. <Jos...@gm...> - 2011-07-07 14:52:11
|
On Thursday 07 July 2011, John Reiser wrote:
> > I have a binary file what i compile it with -g.So i need to perform a
> > set of action in my computer and see behavior of my file, This mean , i
> > need to see functions of run when i perform those set of actions,
>
> This sounds like some kind of profiling. Re-compile and re-link with
> "gcc -p" or "gcc -pg". If you cannot re-compile, then perhaps try
> "strace -i" or "ltrace -i", which will give you partial information.
> You'll need to do some work to process the instruction addresses,
> and you might have to perform a short backtrace (either dynamic or
> static) to get interesting information.
>
> More generally, write a utility program which reads your original binary
> program, then writes a new binary program having each static call:
>     call subr1
> replaced with an indirection:
>     call *indir1
>     .section indir_section
>     indir1: .addr subr1
> Then change the contents of location indir1 dynamically at run time:
> point it to a logging subroutine for a while, etc.
> [This won't track existing indirect calls, but perhaps those can
> be ignored for a while.]

valgrind --tool=callgrind --ct-verbose=1 ...

Hmm?
|
|
From: John R. <jr...@bi...> - 2011-07-07 16:16:23
|
> valgrind --tool=callgrind --ct-verbose=1 ...
When I run callgrind and callgrind_annotate, I don't understand
the output. For instance, the connection between "73" and the number
of actual dynamic calls to exit() is mysterious to me:
=====
$ valgrind --tool=callgrind /bin/date
==3790== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info
$ callgrind_annotate callgrind.out*
Ir file:function
73 /usr/src/debug/glibc-2.13/stdlib/exit.c:exit [/lib64/libc-2.13.so]
-----/usr/src/debug/glibc-2.13/stdlib/exit.c
void
exit (int status)
{
__run_exit_handlers (status, &__exit_funcs, true);
}
-----
$ gdb /bin/date
(gdb) b exit
(gdb) run
Thu Jul 7 08:52:35 PDT 2011
Breakpoint 1, exit (status=0x0) at exit.c:99
(gdb) list ## we are at the right place
97 void
98 exit (int status)
99 {
100 __run_exit_handlers (status, &__exit_funcs, true);
101 }
102 libc_hidden_def (exit)
(gdb) continue
$ ## was called only once, not 73 times.
=====
For another instance, using the exact suggestion:
$ valgrind --tool=callgrind --ct-verbose=1 /bin/date
gives output such as:
-----
. . . > check_match.10789(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a08a20]
. . . .> strcmp(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a16ac0]
. . . . > strcmp(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a16ac0]
-----
where the indicated parameters are nonsense, and a function whose body
is a loop displays as a recursion. It is hard for me to trust such output.
--
|
|
From: Josef W. <Jos...@gm...> - 2011-07-08 12:46:06
|
On Thursday 07 July 2011, John Reiser wrote:
> > valgrind --tool=callgrind --ct-verbose=1 ...
>
> When I run callgrind and callgrind_annotate, then I don't understand
> the output. For instance, the connection between "73" and the number
> of actual dynamic calls to exit() is mysterious to me:
> =====
> $ valgrind --tool=callgrind /bin/date
> ==3790== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info
> $ callgrind_annotate callgrind.out*
> Ir file:function
> 73 /usr/src/debug/glibc-2.13/stdlib/exit.c:exit [/lib64/libc-2.13.so]
The default for callgrind_annotate is to show the self cost of each function
for the event types collected.
Thus, the "73" in the "Ir" column is the number of client instructions
executed inside exit. It has nothing to do with the number of calls to exit.
Actually, callgrind_annotate only prints call counts for call arcs,
which are displayed e.g. with "--tree=caller":
> callgrind_annotate --tree=caller callgrind.out*
...
7,060 < ???:0x0000000000401d60 (1x) [/bin/date]
73 * /build/buildd/eglibc-2.13/stdlib/exit.c:exit [/lib/x86_64-linux-gnu/libc-2.13.so]
...
This excerpt means that a function at "0x0000000000401d60" called
"exit" one time. It probably would be good to add a total call count number
for every function.
> For another instance, using the exact suggestion:
> $ valgrind --tool=callgrind --ct-verbose=1 /bin/date
> gives output such as:
> -----
> . . . > check_match.10789(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a08a20]
> . . . .> strcmp(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a16ac0]
> . . . . > strcmp(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a16ac0]
> -----
> where the indicated parameters are nonsense,
Sure, it would be a nice feature!
But "--ct-verbose=..." is actually only for internal debugging, and
the output is not documented. The values show the stack content
(interpreted as unsigned ints, as far as I remember) when entering the
function - which was enough at the time I needed the debug output.
> and a function whose body
> is a loop displays as a recursion.
Please tell me how to distinguish loops from tail recursion optimization
at the machine code level, when the loop jumps back to the first instruction
of a function.
Often quite some meta information is thrown away, and there
is no way to exactly reconstruct what has happened at the source level.
So callgrind must use heuristics which sometimes go wrong. For the
highly optimized code of strcmp above, the heuristic obviously goes wrong.
But without source code, your claim that there is a loop in strcmp
could actually be wrong. It could have been coded as a recursive function,
resulting in the same machine code.
> It is hard for me to trust such output.
Please tell me why you started from the assumption that the
output you saw must be buggy.
Actually, to improve the trustworthiness, I added machine code annotation
to callgrind.
Josef
|
|
From: Josef W. <Jos...@gm...> - 2011-07-08 18:00:06
|
On Thursday 07 July 2011, Josef Weidendorfer wrote:
> valgrind --tool=callgrind --ct-verbose=1 ...

By the way: please only look at the sequence of function calls printed out.
The rest probably is completely misleading (especially the stuff that looks
like parameters of the called functions). The option "--ct-verbose" is for
internal debugging only, is not further documented, and the output can
change at every release without notice.

Josef
|
|
From: John R. <jr...@bi...> - 2011-07-08 15:18:44
|
On 07/08/2011 05:45 AM, Josef Weidendorfer wrote:
> On Thursday 07 July 2011, John Reiser wrote:
>>> valgrind --tool=callgrind --ct-verbose=1 ...
>>
>> When I run callgrind and callgrind_annotate, then I don't understand
>> the output. For instance, the connection between "73" and the number
>> of actual dynamic calls to exit() is mysterious to me:
>> =====
>> $ valgrind --tool=callgrind /bin/date
>> ==3790== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info
>> $ callgrind_annotate callgrind.out*
>> Ir file:function
>> 73 /usr/src/debug/glibc-2.13/stdlib/exit.c:exit [/lib64/libc-2.13.so]
>
> The default for callgrind_annotate is to show self cost of the functions
> for the given event types collected.
> Thus, the "73" in the "Ir" column is the number of executed client
> instructions inside of exit. It has nothing to do with the number of
> calls to exit.

What I see is an instance of EPIC FAIL for Usability. The legend paragraph
at the beginning of the output does not define "Ir", the output displayed by
the command-line parameter "--help" does not define "Ir", and the default
output of a tool whose name begins with "call" has nothing to do with the
number of calls to each subroutine.

The legend paragraph contains these instances of "Ir":
    Events recorded: Ir
    Events shown: Ir
    Event sort order: Ir
The output of "callgrind_annotate --help" does not contain "Ir".
The MINIMUM I expect is for BOTH places to contain an additional annotation
such as "Possible events: Ir (Instruction Read)".

The only explanation of "Ir" is in the callgrind manual
    share/doc/valgrind/html/cl-manual.html
and the cachegrind manual
    share/doc/valgrind/html/cg-manual.html
Neither manual explains why "Ir" (Instruction Read) is used instead of
something like "Ix" (Instruction eXecuted). Until some years ago most
x86 chips read [and decode] many more instructions than they execute:
in particular, some [effectively] non-executed instructions which reside
after every taken branch. Some newer x86 chips read-decode-translate
instruction bytes into microcode, often blurring the boundaries between
architectural instructions, and execute only the translated microcode.
The translations are cached; a loop of up to several dozen instructions
can be executed many times after the first translation. So the relationship
between Instruction Read and Instruction eXecute is murky.

> Actually, callgrind_annotate only prints the number of calls for call arcs,
> which are displayed e.g. with "--tree=caller":
>
>> callgrind_annotate --tree=caller callgrind.out*
> ...
> 7,060 < ???:0x0000000000401d60 (1x) [/bin/date]
> 73 * /build/buildd/eglibc-2.13/stdlib/exit.c:exit [/lib/x86_64-linux-gnu/libc-2.13.so]
> ...
>
> This excerpt means that a function at "0x0000000000401d60" called
> "exit" one time. It probably would be good to add a total call count number
> for every function.

Yes. Total incoming calls by function is the zero-level statistic which is
useful and expected by both novice and experienced users.

>> For another instance, using the exact suggestion:
>> $ valgrind --tool=callgrind --ct-verbose=1 /bin/date
>> gives output such as:
>> -----
>> . . . > check_match.10789(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a08a20]
>> . . . .> strcmp(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a16ac0]
>> . . . . > strcmp(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a16ac0]
>> -----
>> where the indicated parameters are nonsense,
>
> Sure, it would be a nice feature!
> But "--ct-verbose=..." is actually only for internal debugging, and
> the output is not documented.

If something such as "strcmp(0x3d, 0x5b, ...)" is _not_ an indication
of parameters, then this is another poor choice, especially as a
recommendation to a new user.

> The values show the stack content
> (interpreted as unsigned ints, as far as I remember) when entering the
> function - which was enough at the time I needed the debug output.

On x86_64, the first 6 [integer and pointer] parameters are passed in
registers, and not on the stack.

>> and a function whose body
>> is a loop displays as a recursion.
>
> Please tell me a way how to distinguish loops from tail recursion
> optimization at machine code level, when the loop jumps back to the
> first instruction of a function.

After entry to a function and before the corresponding return, if there has
been no write to the register or memory location which is designated to hold
the return address, and if the target is within the same function (as
determined by the available symbol table and perhaps by implementation
details such as choice of opcode and instruction encoding), then it's a loop
and not a recursion.

Sure, at the _lowest_ level a branch back to the beginning [or even after
the prologue] cannot be distinguished between a loop and a recursion.
However, programmers who use explicit tail recursion to the same function
_know_ that it is equivalent to a loop, and won't be surprised by such a
display; whereas the coder of a loop might be confused by seeing a recursion.

Also, callgrind should take advantage of having an event horizon that is
larger than one instruction. Adopting the "subroutine outlook" also favors
a loop over recursion. The "subroutine outlook" knows only what is
[scheduled] to be done in the future; what has happened in the past is not
relevant and cannot be known. Thus the current recursion level always equals
the number of pending return addresses. The disadvantage is that traceback
"through" a tail recursion elides some names, in much the same way that
visible state at a debugger breakpoint shows only the _next_ instruction to
be executed, and not the previous instruction pointer.

> Often quite some meta information is thrown away, and there
> is no way to exactly reconstruct what has happened at the source level.
> So callgrind must use heuristics which sometimes can go wrong. For the
> high-optimized code of strcmp above, this heuristic obviously goes wrong.
> But without source code, your claim of existance of a loop in strcmp
> could actually be wrong. It could have been coded as recursive function,
> resulting in the same machine code.
>
>> It is hard for me to trust such output.
>
> Please tell me the reason why you started with the assumption that the
> discovered output must be buggy.

I started with the assumption that a tool with a name such as "callgrind"
would tell me how many times each subroutine was called. The output was not
so. The output legend had no explanation. The --help output had no
explanation. The output used notation "strcmp(0x3d, 0x5b, ...)" for
something other than displaying [correctly] a function call with parameters.
After all that experience, I decided that I should be wary of callgrind.
--
|
|
From: Josef W. <Jos...@gm...> - 2011-07-08 17:54:17
|
On Friday 08 July 2011, John Reiser wrote:
> What I see is an instance of EPIC FAIL for Usability.
Thanks for the detailed reply. I really appreciate it.
The first thing I must note here is that "callgrind_annotate" is meant
as fallback solution for people who can/do not want to use a GUI.
KCachegrind (the GUI for callgrind output)
- shows "Instruction Fetch" as explanation for "Ir",
- shows a call count column in the function list.
> The legend paragraph
> at the beginning of the output does not define "Ir", the output displayed
> by the command-line parameter "--help" does not define "Ir",
Ok. The same is true for cachegrind/cg_annotate. We should do something
about that.
> and the default
> output of a tool whose name begins with "call" has nothing to do with
> the number of calls to each subroutine.
Point taken.
> The legend paragraph contains these instances of "Ir":
> Events recorded: Ir
> Events shown: Ir
> Event sort order: Ir
> The output of "callgrind_annotate --help" does not contain "Ir".
> The MINIMUM I expect is for BOTH places to contain an additional annotation
> such as "Possible events: Ir (Instruction Read)".
>
> The only explanation of "Ir" is in the callgrind manual
> share/doc/valgrind/html/cl-manual.html
> and the cachegrind manual
> share/doc/valgrind/html/cg-manual.html
> Neither manual explains why "Ir" (Instruction Read) is used instead of
> something like "Ix" (Instruction eXecuted). Until some years ago most
> x86 chips read [and decode] many more instructions than they execute:
> in particular, some [effectively] non-executed instructions which reside
> after every taken branch. Some newer x86 chips read-decode-translate
> instruction bytes into microcode, often blurring the boundaries between
> architectural instructions, and execute only the translated microcode.
> The translations are cached; a loop of up to several dozen instructions
> can be executed many times after the first translation. So the
> relationship between Instruction Read and Instruction eXecute is murky.
Indeed. The "Ir" naming is part of cachegrind history.
We could equally use "Instructions Executed", and yes, this should be
explained in the manual. It would be even better to talk about
"Instructions retired" (which works as an abbreviation for Ir).
> > Actually, callgrind_annotate only prints the number of calls for call arcs,
> > which are displayed e.g. with "--tree=caller":
> >
> >> callgrind_annotate --tree=caller callgrind.out*
> > ...
> > 7,060 < ???:0x0000000000401d60 (1x) [/bin/date]
> > 73 * /build/buildd/eglibc-2.13/stdlib/exit.c:exit [/lib/x86_64-linux-gnu/libc-2.13.so]
> > ...
> >
> > This excerpt means that a function at "0x0000000000401d60" called
> > "exit" one time. It probably would be good to add a total call count number
> > for every function.
>
> Yes. Total incoming calls by function is the zero-level statistic which is
> useful and expected by both novice and experienced users.
Good.
> >> For another instance, using the exact suggestion:
> >> $ valgrind --tool=callgrind --ct-verbose=1 /bin/date
> >> gives output such as:
> >> -----
> >> . . . > check_match.10789(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a08a20]
> >> . . . .> strcmp(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a16ac0]
> >> . . . . > strcmp(0x3d, 0x5b, ...) [ld-2.13.so / 0x3d25a16ac0]
> >> -----
> >> where the indicated parameters are nonsense,
> >
> > Sure, it would be a nice feature!
> > But "--ct-verbose=..." is actually only for internal debugging, and
> > the output is not documented.
>
> If something such as "strcmp(0x3d, 0x5b, ...)" is _not_ an indication
> of parameters, then this is another poor choice,
This is undocumented debug output. We can change it any time. I do not see
a problem here.
> especially as a
> recommendation to a new user.
The original poster asked for a trace of called functions. You suggested
a rather elaborate wrapper method, which did not seem easy to implement.
I just commented that callgrind can print out exactly that list of
called functions using a hidden "feature", to show him a shortcut.
The only thing I perhaps should have added: "Warning: includes a lot of
totally misleading crap".
> >> and a function whose body
> >> is a loop displays as a recursion.
> >
> > Please tell me a way how to distinguish loops from tail recursion optimization
> > at machine code level, when the loop jumps back to the first instruction
> > of a function.
>
> After entry to a function and before the corresponding return, if there has
> been no write to the register or memory location which is designated to hold
> the return address, and if the target is within the same function (as determined
> by the available symbol table and perhaps by implementation details such as
> choice of opcode and instruction encoding), then it's a loop and not
> a recursion.
That is exactly what callgrind is doing, unless ...
> Sure, at the _lowest_ level a branch back to the beginning [or even after the
> prologue] cannot be distinguished between a loop and a recursion.
Right. Unfortunately, callgrind can only see the lowest level. Do you
know of any debug information that can help here?
> However,
> programmers who use explicit tail recursion to the same function _know_
> that it is equivalent to a loop, and won't be surprised by such a display;
> whereas the coder of a loop might be confused by seeing a recursion.
That is arguable.
Callgrind only translates a jump into recursion if it is back to the first
instruction. This mostly works fine, as loop bodies do not include function
prologues. The "wrongly detected" strcmp really is a rare exception, as
with the x86_64 calling convention, this function actually has no prologue:
0x7ffff7df36f0: mov (%rdi),%al
0x7ffff7df36f2: cmp (%rsi),%al
0x7ffff7df36f4: jne 0x7ffff7df3703
0x7ffff7df36f6: inc %rdi
0x7ffff7df36f9: inc %rsi
0x7ffff7df36fc: test %al,%al
0x7ffff7df36fe: jne 0x7ffff7df36f0
0x7ffff7df3700: xor %eax,%eax
0x7ffff7df3702: retq
0x7ffff7df3703: mov $0x1,%eax
0x7ffff7df3708: mov $0xffffffff,%ecx
0x7ffff7df370d: cmovb %ecx,%eax
0x7ffff7df3710: retq
And you can try yourself:
int mysc1(char* a, char* b)
{
    if (*a == *b) {
        if (*a != 0) return mysc1(a+1, b+1);
        return 0;
    }
    return (*a < *b) ? -1 : 1;
}

int mysc2(char* a, char* b)
{
    while (1) {
        if (*a != *b) break;
        if (*a == 0) return 0;
        a++, b++;
    }
    return (*a < *b) ? -1 : 1;
}
$> gcc -v
...
gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4)
$> gcc -O3 -Os -c mysc{1,2}.c; objdump -S mysc{1,2}.o
0000000000000000 <mysc>:
0: 8a 07 mov (%rdi),%al
2: 3a 06 cmp (%rsi),%al
4: 75 0c jne 12 <mysc+0x12>
6: 84 c0 test %al,%al
8: 74 13 je 1d <mysc+0x1d>
a: 48 ff c6 inc %rsi
d: 48 ff c7 inc %rdi
10: eb ee jmp 0 <mysc>
12: 0f 9d c0 setge %al
15: 0f b6 c0 movzbl %al,%eax
18: 8d 44 00 ff lea -0x1(%rax,%rax,1),%eax
1c: c3 retq
1d: 31 c0 xor %eax,%eax
1f: c3 retq
Both give exactly the same machine code as result with GCC 4.5.2!
> Also,
> callgrind should take advantage of having an event horizon that is larger
> than one instruction.
Please explain.
> Adopting the "subroutine outlook" also favors a loop over recursion.
> The "subroutine outlook" knows only what is [scheduled] to be done
> in the future; what has happened in the past is not relevant and
> cannot be known. Thus the current recursion level always equals
> the number of pending return addresses. The disadvantage is that
> traceback "through" a tail recursion elides some names, in much the same
> way that visible state at a debugger breakpoint shows only the _next_
> instruction to be executed, and not the previous instruction pointer.
Sorry, I don't get your point here.
Do you suggest that callgrind should always play dumb, and not even
try to reconstruct a potential tail recursion into a real recursion?
There is a similar scenario: machine code can have jumps between
functions (e.g. as result of a tail recursion optimization, or hand crafted
assembler). Callgrind output is about call relationship. There is no way
to reason about jumps between functions. Thus, callgrind has to map a
jump between functions either to a "call" (with an additional return when
returning from the function jumped to) or a "return/call" pair.
> > Please tell me the reason why you started with the assumption that the
> > discovered output must be buggy.
>
> I started with the assumption that a tool with a name such as "callgrind"
> would tell me how many times each subroutine was called. The output was
> not so. The output legend had no explanation.
I am really sorry about that experience. I obviously need to take better
care of CLI users.
I will open a bug report for that.
Josef
> The --help output had
> no explanation. The output used notation "strcmp(0x3d, 0x5b, ...)"
> for something other than displaying [correctly] a function call with parameters.
> After all that experience, then I decided that I should be wary of callgrind.
|
|
From: John R. <jr...@bi...> - 2011-07-11 05:36:17
|
On 07/08/2011 10:54 AM, Josef Weidendorfer wrote:
> On Friday 08 July 2011, John Reiser wrote:
>> <<snip>>
> The first thing I must note here is that "callgrind_annotate" is meant
> as fallback solution for people who can/do not want to use a GUI.
> KCachegrind (the GUI for callgrind output)
> - shows "Instruction Fetch" as explanation for "Ir",
> - shows a call count column in the function list.
A newbie can discover KCachegrind only from a detailed reading of the
manual. Please mention KCachegrind in the header output from callgrind,
and in the header output from callgrind_annotate, and on the manual pages.
Also mention that KCachegrind is part of the kdesdk package;
this is _very_ non-obvious! For example:
For interactive control, run 'callgrind_control -h'.
For GUI report and visualization, see KCachegrind, which is
part of the kdesdk package.
Installing kdesdk on my system (Fedora 14 with Gnome and no previous KDE)
requires 156MB of new file space, which seems like a lot: more than 1/2
of valgrind itself plus all tools.
<<snip>>
>> Sure, at the _lowest_ level a branch back to the beginning [or even after the
>> prologue] cannot be distinguished between a loop and a recursion.
>
> Right. Unfortunately, callgrind only can see the lowest level.
Really? It seems to me that any valgrind tool can look at the whole
program before execution starts, and also can see every dynamically-
loaded library during dlopen() before the execution of any instruction
from that library. Thus callgrind could detect in advance all leaf
subroutines, and all [over-]writes of any return address.
> Do you know any debug information that can help here?
The DWARF-3 info certainly pinpoints every write to the return-address
pseudo-register, and every adjustment to the stack pointer. If both
are empty, then it is safe to consider every jump to be a loop
rather than a tail recursion to the same routine. [More cases apply,
too.]
<<snip>>
>> callgrind should take advantage of having an event horizon that is larger
>> than one instruction.
>
> Please explain.
A tool can detect first entry to a subroutine. Analyze the static properties
of the whole subroutine, then cache the results; invalidate for dlclose()
and self-modifying code. .text can be analyzed at several megabytes
per second, so the "extra work" is small.
>
>> Adopting the "subroutine outlook" also favors a loop over recursion.
>> The "subroutine outlook" knows only what is [scheduled] to be done
>> in the future; what has happened in the past is not relevant and
>> cannot be known. Thus the current recursion level always equals
>> the number of pending return addresses. The disadvantage is that
>> traceback "through" a tail recursion elides some names, in much the same
>> way that visible state at a debugger breakpoint shows only the _next_
>> instruction to be executed, and not the previous instruction pointer.
>
> Sorry, I don't get your point here.
The point is: if the number of pending return addresses does not change
then the recursion depth does not change, so don't change the indentation
when printing the nesting. In particular, if consecutive lines of the
nesting display would designate the same subroutine with the same number
of return addresses pending on the stack, then it's a loop and not
a [full] recursion.
> Do you suggest for callgrind to better always play dumb, and do not even
> try to reconstruct a potential tail recursion into a real recursion?
Forget the details of any call which has returned. Report only what
can be deduced from the current $pc and the stack.
Consider the source-code call chain A ==> B ==> C where B ==> C is
a tail recursion which the compiler recognizes and implements. During
execution of C, then the actual dynamic _return_ chain is C ==> A.
B has been erased by the tail recursion. The stack contains no record
of B ever having been invoked. So the effective dynamic call chain
[backtrace] at this point is A ==> C. This may surprise the programmer
who never wrote any call from A to C, but this is much the same as
stopping at a breakpoint immediately after a taken branch. You know
where you are [$pc], and where you're going (the backtrace of pending
_return_s), but not every detail of how you got here (taken branches
or tail recursions are not known.)
>
> There is a similar scenario: machine code can have jumps between
> functions (e.g. as result of a tail recursion optimization, or hand crafted
> assembler). Callgrind output is about call relationship. There is no way
> to reason about jumps between functions. Thus, callgrind has to map a
> jump between functions either to a "call" (with an additional return when
> returning from the function jumped to) or a "return/call" pair.
It is better to think of this as a _continuation_, for which
some programmers might use the idea of "return+call" as a crutch
until they become familiar with continuations. In many cases
it is important to realize that a principal side effect during
execution is the dynamic writing of the program which constitutes
the continuation from the current subroutine. Of course, this
"program of continuations" is the stack.
--
|
|
From: Josef W. <Jos...@gm...> - 2011-07-11 14:36:15
|
On Monday 11 July 2011, John Reiser wrote:
> ...
> >> Adopting the "subroutine outlook" also favors a loop over recursion.
> >> The "subroutine outlook" knows only what is [scheduled] to be done
> >> in the future; what has happened in the past is not relevant and
> >> cannot be known. Thus the current recursion level always equals
> >> the number of pending return addresses. The disadvantage is that
> >> traceback "through" a tail recursion elides some names, in much the same
> >> way that visible state at a debugger breakpoint shows only the _next_
> >> instruction to be executed, and not the previous instruction pointer.
> >
> > Sorry, I don't get your point here.
>
> The point is: if the number of pending return addresses does not change
> then the recursion depth does not change, so don't change the indentation
> when printing the nesting. In particular, if consecutive lines of the
> nesting display would designate the same subroutine with the same number
> of return addresses pending on the stack, then it's a loop and not
> a [full] recursion.

From my example in my last mail, you can see that a recursive implementation
can produce the same code as a loop-based one. So you can never say with
certainty that some machine code was programmed as a loop.

Up to now, Callgrind tries to reconstruct what the programmer has coded.
That means it tries to reconstruct recursive behavior even when the compiler
did tail recursion optimization (and the number of pending returns does not
change).

I see that your main criterion is: "if the number of pending returns doesn't
change, it is a loop". The problem is highlighted exactly by your example:

> Consider the source-code call chain A ==> B ==> C where B ==> C is
> a tail recursion which the compiler recognizes and implements. During
> execution of C, then the actual dynamic _return_ chain is C ==> A.

Typo: "A ==> C".

> B has been erased by the tail recursion. The stack contains no record
> of B ever having been invoked. So the effective dynamic call chain
> [backtrace] at this point is A ==> C.

This is impossible to represent in one single call graph for the execution
of the program. C is either called from B or from A. The latter would mean
an abstract return from B to A before A calls C. This of course is possible,
but as you say yourself:

> This may surprise the programmer
> who never wrote any call from A to C, but this is much the same as
> stopping at a breakpoint immediately after a taken branch.
> You know where you are [$pc], and where you're going (the backtrace of
> pending _return_s), but not every detail of how you got here (taken
> branches or tail recursions are not known.)

But you simply cannot compare it with debugging, as Callgrind wants to come
up with that single call graph. For this call graph, I much prefer Callgrind
to use "A ==> B ==> C". The whole point of Callgrind/KCachegrind is to
present performance metrics attributed to call graphs in a top-down way.
Your suggestion seems incompatible with that goal.

> > There is a similar scenario: machine code can have jumps between
> > functions (e.g. as result of a tail recursion optimization, or hand
> > crafted assembler). Callgrind output is about call relationship. There
> > is no way to reason about jumps between functions. Thus, callgrind has
> > to map a jump between functions either to a "call" (with an additional
> > return when returning from the function jumped to) or a "return/call"
> > pair.
>
> It is better to think of this as a _continuation_, for which
> some programmers might use the idea of "return+call" as a crutch
> until they become familiar with continuations. In many cases
> it is important to realize that a principal side effect during
> execution is the dynamic writing of the program which constitutes
> the continuation from the current subroutine. Of course, this
> "program of continuations" is the stack.

And how to map this to a call graph? Do you suggest introducing nodes which
are continuations of multiple functions? I think this would be even more
confusing, as regular programming languages do not allow coding
continuations, so programmers usually don't know this concept.

Josef
|
|
From: John R. <jr...@bi...> - 2011-07-11 15:56:57
|
On 07/11/2011 07:36 AM, Josef Weidendorfer wrote:
> On Monday 11 July 2011, John Reiser wrote:
>> [snip]
> From my example in my last mail, you can see that a recursive implementation
> can produce the same code as a loop-based one. So you can never say with certainty
> that some machine code was programmed as a loop.
But the recursion depth (number of actual pending return addresses) is certain.
If the depth does not change, and if the target is in the same routine, then
it's a loop. If the depth does not change and if the target is in some other
routine, then it's a tail-recursive continuation. If the depth increases [by 1],
then it's a CALL. If the depth decreases by 1, then it's a RETurn;
if the depth decreases by more than 1 then it's an "unwind".
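[These depth rules are concrete enough to express as code. The following classifier is purely my own illustration — the function name, event model, and labels are invented, and nothing like it exists in valgrind — but it captures the rules above exactly:]

```python
# Classify a control transfer from the change in the number of pending
# return addresses (the "depth") and whether the target lies in the
# routine that is currently executing.  Hypothetical sketch, not valgrind code.

def classify(depth_before, depth_after, same_routine):
    delta = depth_after - depth_before
    if delta == 0:
        # Depth unchanged: a loop if we stay in the routine, otherwise a
        # tail-recursive continuation into some other routine.
        return "loop" if same_routine else "tail-recursive continuation"
    if delta == 1:
        return "call"            # one new pending return address
    if delta == -1:
        return "return"          # one pending return address consumed
    if delta < -1:
        return "unwind"          # e.g. longjmp or exception propagation
    raise ValueError("depth can grow by at most one per transfer")

# A tail call from B to C keeps the depth unchanged but leaves the routine:
print(classify(2, 2, same_routine=False))  # tail-recursive continuation
```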
>
> Up to now, Callgrind tries to reconstruct what the programmer has coded. This
> means that it tries to reconstruct recursive behavior even when the compiler
> did tail recursion optimization (and the number of pending returns does not change).
>
> I see that your main criterion is: "if the number of pending returns doesn't change,
> it is a loop".
>
> The problem is highlighted exactly with your example:
>
>> Consider the source-code call chain A ==> B ==> C where B ==> C is
>> a tail recursion which the compiler recognizes and implements. During
>> execution of C, then the actual dynamic _return_ chain is C ==> A.
>
> Typo: "A ==> C".
C will return to A, and a return happens in the future, so I write the arrow
for a _return_ chain in the direction of future motion. A _call_ chain records
the direction of past motion: "A called B" as a call chain is "A ==> B"; the
resulting _return_ chain is "B ==> A". Perhaps I should write "A <== B" ?
(Most often a return does not go to the beginning, but this does not matter
here. There is a style which explicitly writes [constructs, schedules] the
continuation stack; this style often _does_ "return" [continue] to the
beginning of many routines.)
>
>> B has been erased by the tail recursion. The stack contains no record
>> of B ever having been invoked. So the effective dynamic call chain
>> [backtrace] at this point is A ==> C.
>
> This is impossible to represent in one single call graph for the execution
> of the program. C is either called from B or from A. The latter would mean
> an abstract return from B to A before A calls C.
The CPU executes only continuations ("what is the [address of the] next
instruction?") and is indifferent to the terminology "call" and "return".
Often there is hardware (such as an on-chip stack of subroutine return
addresses) to speed the execution of common idioms such as CALL with RET,
but to the instruction fetch+decode unit and to the execution unit
this is a mere optimization, not a fundamental concept.
>
> This of course is possible, but as you say yourself:
>
>> This may surprise the programmer
>> who never wrote any call from A to C, but this is much the same as
>> stopping at a breakpoint immediately after a taken branch.
>> You know where you are [$pc], and where you're going (the backtrace of pending
>> _return_s), but not every detail of how you got here (taken branches
>> or tail recursions are not known.)
>
> But you simply cannot compare it with debugging, as Callgrind wants
> to come up with that single call graph.
> For this call graph, I much prefer Callgrind to use "A ==> B ==> C".
>
> The whole point of Callgrind/KCachegrind is to present performance
> metrics attributed to call graphs in a top-down way.
> Your suggestion seems incompatible with that goal.
Going from "top-down" to insisting on "call and return" is problematic.
(For one thing, such a view fumbles true co-routines, which never return.)
The instant that B executes a tail-recursive continuation to C, then
the call path A ==> B ==> C is transformed into A ==> C, and I expect
the call graph to have such an arc, because that's what the true state
has become. B has vanished, A and C still are there, and [for the
moment] C is going to return to A. This still is "top-down",
except based on continuations.
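[The disagreement can be made concrete by replaying the same event trace under both policies. This toy model is entirely my own — callgrind keeps no such structures, and the event format is invented for illustration. It shows that a continuation-based profiler records the arc A ==> C that John expects, while reconstruction keeps B ==> C alive:]

```python
# Replay a trace of control-transfer events and collect call-graph arcs.
# Events: ("call", frm, to) pushes a frame, ("tailcall", frm, to) replaces
# the current frame without pushing a return address, ("return",) pops.

def arcs(events, view):
    stack = [("_start", None)]        # (current function, who we return to)
    recorded = set()
    for ev in events:
        if ev[0] == "call":
            _, frm, to = ev
            stack.append((to, frm))
            recorded.add((frm, to))
        elif ev[0] == "tailcall":
            _, frm, to = ev
            if view == "reconstruct":
                recorded.add((frm, to))   # keep the source-level arc B ==> C
                stack.append((to, frm))   # pretend the tail call was a call
            else:                         # continuation view
                ret_to = stack[-1][1]     # the actual pending return target
                stack[-1] = (to, ret_to)  # B's frame is overwritten by C's
                recorded.add((ret_to, to))  # charge the arc A ==> C
        elif ev[0] == "return":
            stack.pop()
    return recorded

# A calls B, B tail-calls C, C returns (to A):
trace = [("call", "A", "B"), ("tailcall", "B", "C"), ("return",)]
print(sorted(arcs(trace, "reconstruct")))   # [('A', 'B'), ('B', 'C')]
print(sorted(arcs(trace, "continuation")))  # [('A', 'B'), ('A', 'C')]
```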
>>> There is a similar scenario: machine code can have jumps between
>>> functions (e.g. as result of a tail recursion optimization, or hand crafted
>>> assembler). Callgrind output is about call relationship. There is no way
>>> to reason about jumps between functions. Thus, callgrind has to map a
>>> jump between functions either to a "call" (with an additional return when
>>> returning from the function jumped to) or a "return/call" pair.
>>
>> It is better to think of this as a _continuation_, for which
>> some programmers might use the idea of "return+call" as a crutch
>> until they become familiar with continuations. In many cases
>> it is important to realize that a principal side effect during
>> execution is the dynamic writing of the program which constitutes
>> the continuation from the current subroutine. Of course, this
>> "program of continuations" is the stack.
>
> And how to map this to a call graph?
> Do you suggest introducing nodes which are continuations of multiple functions?
If $pc is in C, and the next return address is in A, then there is
an arc A ==> C which should be charged (billed) accordingly.
> I think this would be even more confusing, as regular programming languages
> do not allow coding continuations, so programmers usually don't know this concept.
Tail recursion necessarily introduces the concept of continuation.
Trying to hide continuations causes problems, as you have noticed.
--
|
|
From: Josef W. <Jos...@gm...> - 2011-07-11 19:05:59
|
On Monday 11 July 2011, John Reiser wrote:
> >> It is better to think of this as a _continuation_, for which
> >> some programmers might use the idea of "return+call" as a crutch
> >> until they become familiar with continuations. In many cases
> >> it is important to realize that a principal side effect during
> >> execution is the dynamic writing of the program which constitutes
> >> the continuation from the current subroutine. Of course, this
> >> "program of continuations" is the stack.
> >
> > And how to map this to a call graph?
> > Do you suggest introducing nodes which are continuations of multiple functions?
>
> If $pc is in C, and the next return address is in A, then there is
> an arc A ==> C which should be charged (billed) accordingly.

This is key here: callgrind tries to come up with a sensible dynamic call
graph for the execution of a program. A call graph does not have arcs for
returns. Call graphs assume perfect nesting of function calls. In contrast,
return arcs appear in control flow graphs, which are out of scope for
callgrind.

Obviously, call graphs can only represent limited control-flow behavior:
e.g. you cannot represent tail recursion, longjmps, exception handling, or
co-routines. With tail recursion, longjmps, and exception handling,
callgrind tries to come up with sensible approximations. With co-routines,
callgrind will blow up ;-)

Call graphs are interesting for performance analysis because you can reason
about the "inclusive" cost of a function call, which allows a nice top-down
view of where performance cost was spent. Regarding the example above, the
call graph "A ==> B ==> C" is perfectly fine: the cost of C will be
attributed to the inclusive cost of B, because the execution of C was a
result of calling B. It does not matter whether the compiler was able to do
tail recursion optimization or not. Call graphs have been used for ages for
that, such as in gprof. The IMHO most useful visualizations in KCachegrind
are based on call graphs.
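[This inclusive-cost argument is easy to verify with a small model of my own (not callgrind code; the trees and costs are invented). Whether C's work is billed through a call B ==> C or folded into B as a loop, B's inclusive cost — and therefore the top-down attribution seen from A — comes out the same:]

```python
# Inclusive cost of a node = its self cost plus the inclusive cost of
# everything it calls, computed over a call tree.  Toy sketch.

def inclusive(tree, self_cost, node):
    return self_cost[node] + sum(inclusive(tree, self_cost, child)
                                 for child in tree.get(node, []))

# Reconstructed view: A calls B, B (tail-)calls C.
tree_call = {"A": ["B"], "B": ["C"]}
cost_call = {"A": 10, "B": 5, "C": 20}

# Loop view: the compiler folded C's work into a loop inside B.
tree_loop = {"A": ["B"]}
cost_loop = {"A": 10, "B": 25}   # 5 (B itself) + 20 (the folded work)

# B's inclusive cost is 25 either way, and A's is 35 either way:
print(inclusive(tree_call, cost_call, "A"))  # 35
print(inclusive(tree_loop, cost_loop, "A"))  # 35
```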
Also, for performance analysis using call graphs, it does not really matter
whether callgrind reconstructed direct recursion or just found a loop inside
a function: the inclusive cost will be the same.

So if we come back to where we started in this thread: "--ct-verbose=1"
actually just prints out debugging information showing the approximations
callgrind comes up with to build the call graph. And that probably was the
main misunderstanding.

> > I think this would be even more confusing, as regular programming languages
> > do not allow coding continuations, so programmers usually don't know this concept.
>
> Tail recursion necessarily introduces the concept of continuation.

On the level of machine code, yes. But in the context of performance
analysis, it does not matter. The incorrectly visualized additional returns
have zero cost.

> Trying to hide continuations causes problems, as you have noticed.

It's a trade-off. Whether one can neglect continuations depends on the
use-case. For performance analysis with languages such as C/C++, I think
that is the case.

Josef
|