|
From: Brian J. M. <mil...@ll...> - 2008-08-07 22:07:36
|
I'm using the callgrind tool of valgrind-3.4.0.SVN-8016-1833 on an AMD64
system running
a variant of Linux version 2.6.18-38. When examining the output of
callgrind_annotate --auto=yes
we see what appears to be the same function referenced several times.
After cutting out much of the
template parameter stuff and long path names we see:
<snip>
--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
150,801,218,914 PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
23,838,486,859 myFoo.cc:myFoo::computeDist(...)
19,665,876,720 myFoo.hh:myFoo::computeDist(...)
3,824,672,223 stl_iterator.h:myFoo::computeDist(...)
3,069,195,547 Vector3d.hh:myFoo::computeDist(...)
<snip>
My guess is that the second through fourth entries are inlined functions
called from within myFoo::computeDist(...) while the first one is the
exclusive instruction count of myFoo::computeDist(...)
Is my understanding correct?
Thanks,
Brian
--
________________________________________________________________________
Brian Miller, L-556
Center for Applied Scientific Computing (CASC)
Lawrence Livermore National Lab (LLNL) Email: mil...@ll...
P.O. Box 808, 7000 East Ave. Phone: (925)424-4773
Livermore, CA 94551 Fax: (925)422-9429
________________________________________________________________________
|
|
From: Anish A. <Ani...@ap...> - 2008-08-07 23:38:23
|
Hello Valgrind/Kcachegrind support - I have been making use of Valgrind(Callgrind) and Kcachegrind for profiling on my application. However, i have been finding that my App. which for a normal single run takes total 10 secs(cpu time) and one of its functions FUN() takes 0.98 secs(cpu time)- thats 10% of total cpu time , however when run with Valgrind like this valgrind --tool=callgrind ./Release/MYAPP, takes about total 600 secs and the function FUN() takes about 200 secs - thats 33% of the total cputime. Kcachegrind does show me that percentage of time spend at FUN() as 33%, which is true when app is run using Valgrind, however just 10% when run without using Valgrind/Kcachegrind. Well, its understandable that running app using Valgrind has an additional overhead - hence 600 secs CPU time with valgrind is compared with 10 secs CPU time without valgrind. However, my assumption was that the realtive percentage usage within the functions should have remained the same , -eg 10% usage at function FUN(), should have remained exactly 10% even on running valgrind, however why does Valgrind/Kcachegrind show me as 33% ? Thanks Anish |
|
From: Danny R. <dr...@cs...> - 2008-08-07 23:55:17
|
Anish Anto wrote: > Well, its understandable that running app using Valgrind has an > additional overhead - hence 600 secs CPU time with valgrind is compared > with 10 secs CPU time without valgrind. > However, my assumption was that the realtive percentage usage within the > functions should have remained the same , -eg 10% usage at function > FUN(), should have remained exactly 10% even on running valgrind, > however why does Valgrind/Kcachegrind show me as 33% ? There are no guarantees that the Valgrind generated code (even in the nullgrind case) have timings that match native, even relatively. In the case of cachegrind, the state of the cache also influences the amount of work the instrumentation does, and hence the amount of time taken. Eg, code which hits on the MRU line frequently does not need to check for hits against L2 or install lines. - Danny |
|
From: Josef W. <Jos...@gm...> - 2008-08-08 11:01:28
|
On Friday 08 August 2008, Anish Anto wrote: > However, i have been finding that my App. which for a normal single run > takes total 10 secs(cpu time) and one of its functions FUN() takes 0.98 > secs(cpu time)- thats 10% of total cpu time , however when run with > Valgrind like this valgrind --tool=callgrind ./Release/MYAPP, takes about > total 600 secs and the function FUN() takes about 200 secs - thats 33% of > the total cputime. As already said, runtime instrumentation as done by Valgrind totally perturbates time behavior of the application, so any wallclock measurement done within a Valgrind run is bogus. However, Cachegrind/Callgrind does not do any such wallclock measurement but only count synthetic cache events like hits/misses and instruction executions, so ... > Kcachegrind does show me that percentage of time spend > at FUN() as 33%, which is true when app is run using Valgrind, however > just 10% when run without using Valgrind/Kcachegrind. ... I do not understand what "time" you are talking about here. KCachegrind does have a derived event type "estimated cycle time", which is calculated very roughly from the events counted, but you can not really expect this to have much relation to reality. As this only takes cache hits/misses into account, it is only kind of valid if (1) the runtime of your program is heavily influenced by bad cache behavior (it can not say anything about cycles wasted in branch misprediction, pipeline conflicts, bus congestions) (2) the coefficients of the "estimated cycle time" formula for average stall times of L1/L2 misses are adjusted to your system (the default is 100 cycles for L2 miss, and 10 cycles for L1 miss) You always also should use real time measurements to check any improvements. Callgrind/Cachegrind can just gave hints about potential time wasters in your code. Josef > Well, its understandable that running app using Valgrind has an additional > overhead - hence 600 secs CPU time with valgrind is compared with 10 secs > CPU time without valgrind. > However, my assumption was that the realtive percentage usage within the > functions should have remained the same , -eg 10% usage at function FUN(), > should have remained exactly 10% even on running valgrind, however why > does Valgrind/Kcachegrind show me as 33% ? > > Thanks > Anish > > > |
|
From: Josef W. <Jos...@gm...> - 2008-08-08 10:48:59
|
On Friday 08 August 2008, Brian J. Miller wrote: > Ir file:function > -------------------------------------------------------------------------------- > > 23,838,486,859 myFoo.cc:myFoo::computeDist(...) > 19,665,876,720 myFoo.hh:myFoo::computeDist(...) > 3,824,672,223 stl_iterator.h:myFoo::computeDist(...) > 3,069,195,547 Vector3d.hh:myFoo::computeDist(...) > <snip> > > My guess is that the second through fourth entries are inlined functions > called from within myFoo::computeDist(...) while the first one is the > exclusive instruction count of myFoo::computeDist(...) > > Is my understanding correct? At the moment I do not remember exactly how callgrind_annotate comes to its figures indexed by "file:function"... Can you check the output you get with KCachegrind? For KCachegrind, multiple functions shown with the same name really *are* different function instances (e.g. other signature, other template type etc.), and the source information for a given function just tells you the debug information associated by the compiler for the first instruction of the given function. This heuristic is fine for most cases, but probably not very useful when a lot of inlining is happening (as with template-rich C++). Functions which are inlined into other functions do not show up in the function list in KCachegrind. But you can see inlining in the source annotation of a given function, which will consist of multiple chunks, including inlined function lines. Josef > > Thanks, > > Brian > |