From: Josef W. <Jos...@gm...> - 2003-04-11 23:21:47
On Friday 11 April 2003 22:15, Charlls Quarra wrote:
> Hi,
>
> Im wondering if its possible to cachegrind to
> periodically dump a partial cachegrind.out file, so
> one can check the profile statistics without finishing
> the application

Hi Charlls,

take a look at my call-tree skin (I think I should rename it to callgraph, shouldn't I?) at kcachegrind.sf.net. The newest version 0.2.93 allows for

1) ad hoc dumping ("touch cachegrind.cmd" while the program is running). This method is "supported" in KCachegrind with auto-reloading after the dump by clicking on a toolbar button.

2) periodic dumping every <count> basic blocks (--dumps=<count>).

3) dumping on entering/leaving all functions whose names start with <funcprefix> (--dump-before=<funcprefix> / --dump-after=<funcprefix>). I chose the prefix method so that with C++ you don't have to specify full signatures. You can specify this option multiple times.

4) program-controlled dumping: "#include <valgrind/calltree.h>" in your source and add "CALLTREE_DUMP_STATS;" wherever you want a dump to happen.

If you are running a multi-threaded application and specify "--dump-threads=yes", every thread is profiled on its own and creates its own profile dump. Thus, (3) and (4) will only generate one dump of the currently running thread. With (1) and (2), you will get multiple dumps (one for each thread) on a dump request.

The generated dump files get names

  cachegrind.out.<pid>[.<part>][-<threadID>]

where <pid> is the PID of the running program, <part> is a number incremented on each dump (".<part>" is skipped for the dump at program termination), and <threadID> is a thread identification.

If your program changes the current working directory while running, the dump files will get spread over different directories. This can be avoided by specifying a dump file base name with an absolute path, e.g. "--base=/tmp/cachegrind.out".

You can control for which part of your program you want to collect event costs by using --toggle-collect=<funcprefix>.
This will toggle the collection state on entering and leaving a function. When this option is specified, the default collection state at program start is "off". Thus, only events happening while running inside <funcprefix> will be collected. Recursive calls of <funcprefix> don't influence the collection state at all.

Note that cache simulation still has to be done all the time to stay useful, so you don't get any speedup while the collection state is off. But you can switch off cache simulation with "--simulate-cache=no". This only gives instruction read accesses and the call-graph tracing, but it significantly speeds up profiling, typically by more than a factor of 2.

There is an option to ignore calls to a function: "--fn-skip=<funcprefix>". E.g. you usually don't want to see the trampoline functions in the PLT sections used for calls to functions in shared libs; you can see the difference if you profile with "--skip-plt=no". If a call is ignored, its cost events will be attached to the enclosing function.

The remaining options are used to avoid cycles in profile data:

* If you have a recursive function, you can distinguish the first 10 recursion levels by specifying "--fn-recursion10=<funcprefix>", or for all functions with "--fn-recursion=10", but this will give you *much* bigger profile dumps. In the profile data, you will see the recursion levels of "func" as different functions named "func", "func'2", "func'3" and so on.

* If you have the call chains "A > B > C" and "A > C > B" in your program, you usually get a "false" cycle "B <> C". Use "--fn-caller2=B --fn-caller2=C", and functions "B" and "C" will be treated as different functions depending on the direct caller. Using the apostrophe to append this "context" to the function name, you get "A > B'A > C'B" and "A > C'A > B'C", and there is no cycle. Use "--fn-callers=3" to get a 2-caller dependency for *all* functions. Again, this will multiply the profile data size.

Cheers,
Josef