[Valgrind-developers] Re: kcachegrind patch for valgrind-HEAD?

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Fri, 2002-10-04 at 03:10, Josef Weidendorfer wrote:
    Regarding the valgrind skin architecture: Shouldn't it be possible to "stack" 
    skins? 

I've been thinking about that too.  If the skins have non-overlapping
functionality and instrument the code in non-overlapping ways, then
maybe.  It seems to me that you'd want to avoid having multiple skins
instrumenting each other's instrumentation.  I guess you could add a
couple of new core UInstrs which limit CALLM_[SE], except for delimiting
a skin's instrumentation so other skins can avoid each other, but
there's then the problem of making sure the generated code is actually
correct...

But I still like the idea because my gprof skin is very non-intrusive
(it inserts just one call per basic block and a memory store per jump),
and could easily be composed with memcheck.

    Suppose a call chain starting from A: A calls B and C; C calls A again. The 
    recursively called A only calls B. Say the cost of B is always 100, the self 
    cost of each A and C are 10. So the cumulative cost of C will be 120 
    (C=>A=>B), and the one of the first call to A will be 230. I log (cumulative) 
    costs for the call A=>B only once, so this call gets cost 200.
    The problem: The callee list of A shows a call to B with cost 200 and a call 
    to C with cost 120, but A itself only has cumulative cost 230 !?! This is 
    confusing to the user, and really makes problems drawing the TreeMap...
    The real problem is that KCachegrind can't see that the cost of A=>B is 
    included in the call cost A=>C and thus shown twice in the callee list of A.
    And in the Treemap, I simple stop drawing recursive calls: This leaves empty
    space where there should be none and it looks like a performance problem for
    the user where there is none !!

    The real solution (without the ad-hoc one) would be:
    The callee list of A shows a call to B with cost 200. This is correct: B is 
    called twice from A, leading to cost 200 for calls to B. But the call A=>C 
    should be 20 only, skipping costs from any recursive A inside (perhaps 
    stating that cost 100 is already included in other calls). And this would 
    make the Treemap drawing fine again.

    So the only question I have:
    HOW to calculate this value (20) in the general case ?!?
    I suppose I can't calulate it at post-processing time, but have to log it in 
    Cachegrind somehow (that is, the skipped cost of 100 in the example above).

I don't think you should calculate anything at runtime; just record
enough so you can do it all afterwards.  There shouldn't be a huge
problem in working all this out: gprof does it, after all.

Does this do what your example describes?

#define COST(N)	{ int i; for(i = 0; i < N*1000000; i++); }

void A(int);
void B(void);
void C(void);

int main()
{
	A(0);

	return 0;
}

void A(int a)
{
	COST(10);
	B();
	if (!a)
		C();
}

void B(void)
{
	COST(100);
}

void C(void)
{
	COST(10);
	A(1);
}

gprof generates this flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 88.46      0.23     0.23        2 115000.00 115000.00  B
  7.69      0.25     0.02        2 10000.00 125000.00  A
  3.85      0.26     0.01        1 10000.00 10000.00  C

and this call graph profile:

index % time    self  children    called     name
[1]    100.0    0.03    0.23       1+2       <cycle 1 as a whole> [1]
                0.02    0.23       2             A <cycle 1> [3]
-----------------------------------------------
                                                 <spontaneous>
[2]    100.0    0.00    0.26                 main [2]
                0.03    0.23       1/1           A <cycle 1> [3]
-----------------------------------------------
                                   1             C <cycle 1> [5]
                0.03    0.23       1/1           main [2]
[3]     96.2    0.02    0.23       2         A <cycle 1> [3]
                0.23    0.00       2/2           B [4]
                                   1             C <cycle 1> [5]
-----------------------------------------------
                0.23    0.00       2/2           A <cycle 1> [3]
[4]     88.5    0.23    0.00       2         B [4]
-----------------------------------------------
                                   1             A <cycle 1> [3]
[5]      3.8    0.01    0.00       1         C <cycle 1> [5]
                                   1             A <cycle 1> [3]
-----------------------------------------------

So basically this splits call graph into 3 pieces (indices 3, 4, 5):
	3 is A seen as part of the cycle through A->C
	4 is A calling B
	5 is C seen as part of the cycle through C->A

[Handwave mode:] For the purposes of displaying a profile as a TreeMap,
it seems to me that you need to break the graph up into strongly
connected subgraphs, and display each of those as a node in your tree. 
When you want to descend into that node (ie display its sub-parts), you
remove all the backwards edges (recursive calls) and treat it as a plain
tree.  When you remove the backwards edge, you can recompute the cost of
the subgraph without the recursion; the difference between the original
cost and the new cost is how much you should attribute to the function
which is making the recursive call.

Does this help at all?

	J