On Saturday 29 November 2008, John Reiser wrote:
> > Is there any profiling tool which could count the execution time of
> > basic blocks ( like gprof and oprofile) in the inclusive style?
>
> No.
>
> Assume that "in the inclusive style" means "program counter is in this
> basic block or in the dynamic nest of subroutines called (directly or
> indirectly) by this basic block." There are two problems. Basic blocks
> tend to be short in control-dominated code [as opposed to data-dominated,
> such as traditional Fortran floating-point computation or software crypto],
Why is this a problem?
> and there is no published technique that accounts correctly for recursion.
For calls among recursive functions, there just is no sensible definition
for inclusive cost. This of course does not mean that there can not be
a technique in a measurement tool to avoid this problem for visualization.
First, the correct way is to modify the graph and get rid of
recursions: check for strongly connected components, and
reconstruct from this a DAG without cyles/recursion, using artifical nodes
which represent strongly connected components. This is the way gprof
handles it, and I do the same for visualization of the data I get from
callgrind (in KCachegrind).
The real problem with this strategy is that a lot of potentially useful
information is thrown away, and "hidden" inside of the cycle components.
To visualize such data, one has to avoid recursion in the first place
in the dynamically measured call graph.
At the time I thought about it, the best solution I came up with was
to arrange for cycles/recursion avoidance by identifying nodes in the
call graph not only by function name (or instruction address), but add
some dynamic context to make them unique, e.g. via the dynamic call path
to this function or a recursion depth of the function.
As this can lead to explosion of contexts, callgrind only does this via
command line option on demand. But one could do this on the fly via
adaptive refining, depending eg. on the call count of a function.
> Short basic blocks impose high overhead for direct measurement of time,
> and decrease the precision of a given duration of sampling.
Yes. For callgrind, this is no the problem, as it simulates a machine model,
and accounts for events taking place in the model.
> Despite 26 years
> since the problem of accounting for recursion was identified in the
> original paper (1982), gprof has not improved its method, which is subject
> to giving results that are incorrect or seriously misleading without
> specific warning.
Hmm... IMHO the only source for incorrect/misleading results of gprof has
nothing to do with the cycle detection (and thus, recursion handling), but
with the heuristic which is used to derive inclusive cost: gprof is doing
statistical sampling to get self cost of functions, and gets exact call
counts of functions via instrumentation. With the modified call graph
(with cycle objects), the inclusive cost is heuristically derived via
upwards propagation of self costs in the topologically sorted call graph.
Thus, the sources for incorrect/misleading results in gprof are
* self cost measurement done statistically on an instrumented version
of the program
* heuristic propagation of costs for inclusive figures. For this, gprof
propagates proportions of inclusive cost to callers according to the
call count, which can be very wrong.
Callgrind has no problem with these, as it uses exact event counts, has no
overhead because of instrumentation, and does not need a heuristic
to derive inclusive costs, as inclusive costs are measured directly. It uses
the cycle detection only afterwards in the visualization to hide nonsense
data from the user.
> In nearly any actual executing application, just a few basic blocks are
> all that matter. Candidates can be identified by profiling at the level
> of subroutines, refined by re-writing basic blocks as subroutines, then
> measured by re-profiling the modified code.
>
> My 'tsprof' proprietary application from BitWagon Software LLC solved
> the problem of accounting for recursion. The product has been withdrawn
> due to insufficient market demand.
I am very interested in your solution. It would be really nice if you
could disclose it in some way.
Josef
PS: Last time I checked, OProfile in call graph mode even does not do any
cycle detection at all; it also throughs away backtrace information
of sample points, which could be used for cycle avoidance :-(
|