While inspecting a profile, I noticed an area in the call map that didn't appear to be correct. I tracked this down to the way that callgrind appears to handle unique call stacks.
Following up with the reproduction case.
Logged In: YES
File Added: callgrind.out.17563
Callgrind out from the test program
Logged In: YES
I see no obvious bug in the attached result file.
What did you expect to be different?
Remember that in a call graph every function appears only
once, and you get a superposition of all call paths. Callgrind
collects the exact cost of functions and call arcs between
functions. This is different from path profiling, where cost is
attributed to call chains. However, callgrind can do this by
e.g. running it with "--separate-callers=2". Then, a node in the
graph actually is a call chain.
Logged In: NO
I was more or less expecting --separate-callers=infinity to be the default, with the samples then combined in the per-function view.
The area-based profile is completely incorrect if you have complex call trees without supplying --separate-callers. It is incorrect enough that it might even be better to disable the area-based view entirely when separate callers were not supplied when creating the profile.
The call graph view could probably go either way and make sense (which is why I was suggesting a toggle for it). The same could be said for the per-function statistics (callers / callees / top cost / etc.).
> I was more or less expecting --separate-callers=infinity to be the
> default, but then combine the samples in the per-function view.
The exponentially growing number of different call paths forbids
such behavior in general. Callgrind observes every path executed,
not just some samples. E.g. just starting firefox already gives
a few million call paths with --separate-callers=20. And as every
event counter for every instruction is counted separately, you can
imagine that this quickly runs out of memory on 32-bit
architectures.
So there is some need for adaptive data collection. The problem
is that space for counters has to be allocated before one knows
whether a given counter will be important for later analysis.
My idea is to refine the context when it can be seen that some
function/piece of a call path is executed often in the run of
the program.
> The area based profile is completely incorrect if you have
> complex call trees without supplying --separate-callers. It's
> incorrect enough that it might even be better to completely
> disable the area based view when separate callers is not supplied
> in the profile.
It just very much depends on the problem at hand. IMHO the view
should not be disabled in general: it is often quite useful for a
quick overview, even if the data comes from heuristics.
The same is true of cycle detection, which is why there is a
button to switch it off: a lot of code can show spurious false
cycles, making the call graph visualization totally useless.
GProf has a similar problem: on the one hand, it only shows
a butterfly statistic, and cycle detection cannot be switched
off. From that you would assume that it is quite strict about
always showing correct data.
But on the other hand, its data are based on time sampling, and
inclusive costs are heuristically derived from call counts,
which can produce totally wrong data. In addition, if a shared
library is not instrumented, you do not even get a warning about
the bogus measurement.
So, in an ideal world, the user of a performance analysis tool /
visualization is aware of the good and bad sides of a given
measurement method, and knows which values are exact measurements and
which are derived from heuristics. IMHO a tool should never prohibit
ways of usage through "intelligent" assumptions.
> The call graph view could probably go either way and make sense
> (which is why I was suggesting a toggle for it).
> The same could be said for the per-function statistics (callers /
> callees / top cost / etc...).
The data about direct callers/callees (and inclusive cost) is always
correct, as Callgrind measures it directly.
Perhaps it would make sense to show hints about which values are
direct measurements, and which are based on heuristics...