From: Josef W. <Jos...@gm...> - 2003-04-11 23:21:47
On Friday 11 April 2003 22:15, Charlls Quarra wrote:
> Hi,
>
> Im wondering if its possible to cachegrind to
> periodically dump a partial cachegrind.out file, so
> one can check the profile statistics without finishing
> the application

Hi Charlls,

take a look at my call-tree skin (I think I should rename it to callgraph, shouldn't I?) at kcachegrind.sf.net. The newest version 0.2.93 allows for

1) ad hoc dumping ("touch cachegrind.cmd" while the program is running). This method is "supported" in KCachegrind with auto-reloading after the dump by clicking on a toolbar button.

2) periodic dumping every <count> basic blocks (--dumps=<count>).

3) dumping on entering/leaving all functions whose names start with <funcprefix> (--dump-before=<funcprefix> / --dump-after=<funcprefix>). I chose the prefix method so that with C++ you don't have to specify full signatures. You can specify this option multiple times.

4) program-controlled dumping: "#include <valgrind/calltree.h>" in your source and add "CALLTREE_DUMP_STATS;" wherever you want a dump to happen.

If you are running a multi-threaded application and specify "--dump-threads=yes", every thread is profiled on its own and creates its own profile dump. Thus, (3) and (4) will only generate one dump of the currently running thread. With (1) and (2), you will get multiple dumps (one for each thread) on a dump request.

The generated dump files get names

  cachegrind.out.<pid>[.<part>][-<threadID>]

where <pid> is the PID of the running program, <part> is a number incremented on each dump (".<part>" is skipped for the dump at program termination), and <threadID> is a thread identification.

If your program changes the current working directory while running, the dump files will get spread over different directories. This can be avoided by specifying a dump file base name with an absolute path, e.g. "--base=/tmp/cachegrind.out".

You can control for which part of your program you want to collect event costs by using --toggle-collect=<funcprefix>.
This will toggle the collection state on entering and leaving a function. When this option is specified, the default collection state at program start is "off". Thus, only events happening while running inside <funcprefix> will be collected. Recursive calls of <funcprefix> don't influence the collection state at all.

Note that cache simulation still has to be done all the time to stay useful, so you don't get any speedup while the collection state is off. But you can switch off cache simulation with "--simulate-cache=no". This only gives instruction read accesses and the call-graph tracing, but it significantly speeds up profiling, typically by more than a factor of 2.

There is an option to ignore calls to a function: "--fn-skip=<funcprefix>". E.g. you usually don't want to see the trampoline functions in the PLT sections used for calls to functions in shared libs; you can see the difference if you profile with "--skip-plt=no". If a call is ignored, its cost events will be attached to the enclosing function.

The remaining options are used to avoid cycles in profile data:

* If you have a recursive function, you can distinguish the first 10 recursion levels by specifying "--fn-recursion10=<funcprefix>", or for all functions with "--fn-recursion=10", but this will give you *much* bigger profile dumps. In the profile data, you will see the recursion levels of "func" as different functions named "func", "func'2", "func'3" and so on.

* If you have the call chains "A > B > C" and "A > C > B" in your program, you usually get a "false" cycle "B <> C". Use "--fn-caller2=B --fn-caller2=C", and functions "B" and "C" will be treated as different functions depending on the direct caller. Using the apostrophe to append this "context" to the function name, you get "A > B'A > C'B" and "A > C'A > B'C", and there is no cycle. Use "--fn-callers=3" to get a 2-caller dependency for *all* functions. Again, this will multiply the profile data size.

Cheers,
Josef