From: David B. <dav...@gm...> - 2012-03-28 19:09:06
Hi Everyone,
I've been using massif quite heavily lately, and have been helping several
co-workers use it.
I've seen two issues that bother me (and my co-workers) when
trying to get useful information from ms_print and act upon it.
1) The back-trace style detailed report often "hides" the real consumption
of a function, as its usage is distributed among several trees.
2) The sorting by memory usage - this type of sorting makes it very hard to
compare detailed snapshots of the same run, as call trees are shifted
between snapshots, depending on their usage.
I've written a small utility which I call tnrip_sm (ms_print in reverse).
It works on the output of ms_print, and prints the reverse tree, and it can
also sort the tree by the instruction pointer, not by size.
For example, the detailed snapshot from the documentation of Valgrind:
99.48% (20,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->49.74% (10,000B) 0x804841A: main (example.c:20)
|
->39.79% (8,000B) 0x80483C2: g (example.c:5)
| ->19.90% (4,000B) 0x80483E2: f (example.c:11)
| | ->19.90% (4,000B) 0x8048431: main (example.c:23)
| |
| ->19.90% (4,000B) 0x8048436: main (example.c:25)
|
->09.95% (2,000B) 0x80483DA: f (example.c:10)
  ->09.95% (2,000B) 0x8048431: main (example.c:23)
Is changed to:
100.00% (20,000B) 0xROOTROOT: root
->50.00% (10,000B) 0x804841A: main (example.c:20)
| ->50.00% (10,000B) 0x00000000: malloc/new/new[]
|
->30.00% (6,000B) 0x8048431: main (example.c:23)
| ->20.00% (4,000B) 0x80483E2: f (example.c:11)
| | ->20.00% (4,000B) 0x80483C2: g (example.c:5)
| |   ->20.00% (4,000B) 0x00000000: malloc/new/new[]
| |
| ->10.00% (2,000B) 0x80483DA: f (example.c:10)
|   ->10.00% (2,000B) 0x00000000: malloc/new/new[]
|
->20.00% (4,000B) 0x8048436: main (example.c:25)
  ->20.00% (4,000B) 0x80483C2: g (example.c:5)
    ->20.00% (4,000B) 0x00000000: malloc/new/new[]
As you can see, it's now clear that example.c:23 is responsible for 6,000
bytes. In the original tree, you needed to search for it in the allocations
of g and f, and sum up the numbers. This seems like a trivial problem, and
it is in an example application with 3 functions and 30 lines of code.
With thousands of functions, this becomes really annoying.
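
The reversal itself can be sketched in a few lines of Python. This is only
an illustration of the idea, not the actual tnrip_sm code: the paths are
typed in by hand from the documentation example (a real tool would parse
them out of the ms_print output), and the names PATHS, build_reverse_tree
and render are made up for this sketch.

```python
from collections import defaultdict

# Call paths from the documentation example, written by hand.
# Each entry: (bytes, frames from the allocation site up to main).
PATHS = [
    (10000, ["main (example.c:20)"]),
    (4000, ["g (example.c:5)", "f (example.c:11)", "main (example.c:23)"]),
    (4000, ["g (example.c:5)", "main (example.c:25)"]),
    (2000, ["f (example.c:10)", "main (example.c:23)"]),
]


def new_node():
    # A node holds its accumulated size and its children keyed by frame.
    return {"bytes": 0, "children": defaultdict(new_node)}


def build_reverse_tree(paths):
    # Walk each path from main down to the allocation site, adding the
    # path's size to every node on the way.
    root = new_node()
    for size, frames in paths:
        root["bytes"] += size
        node = root
        for frame in reversed(frames):
            node = node["children"][frame]
            node["bytes"] += size
    return root


def render(node, total, depth=0, lines=None):
    # Print children largest first, indenting two spaces per level.
    lines = [] if lines is None else lines
    kids = sorted(node["children"].items(), key=lambda kv: -kv[1]["bytes"])
    for name, child in kids:
        pct = 100.0 * child["bytes"] / total
        lines.append(f"{'  ' * depth}->{pct:05.2f}% ({child['bytes']:,}B) {name}")
        render(child, total, depth + 1, lines)
    return lines


tree = build_reverse_tree(PATHS)
report = render(tree, tree["bytes"])
print("\n".join(report))
```

Because every path is accumulated from main downwards, the two flows that
pass through example.c:23 end up under one node, which is where the 6,000
bytes come from.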
The reverse tree is similar to the output of kcachegrind. We begin at
the beginning of the program, and drill down from there.
The sorting by the instruction pointer proved to be very useful when using
massif as a tool to see how the memory usage increases over time. When
using this option the tree always looks the same, even as sizes change.
It's then possible to take two sorted reverse trees, put them in a diff tool
side-by-side, and easily see how much RAM each flow consumed, and how that
consumption changed between the snapshots.
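
To illustrate why the ordering matters for diffing, here is a small Python
sketch. The two snapshots are hypothetical (the sizes in SNAP_B are invented
so that one flow grows), and the helper names are assumptions of this
example rather than anything in ms_print or tnrip_sm.

```python
import difflib

# Hypothetical top-level entries of two snapshots of the same run:
# (instruction pointer, label, bytes). In SNAP_B one flow has grown.
SNAP_A = [
    (0x804841A, "main (example.c:20)", 10000),
    (0x8048431, "main (example.c:23)", 6000),
    (0x8048436, "main (example.c:25)", 4000),
]
SNAP_B = [
    (0x804841A, "main (example.c:20)", 10000),
    (0x8048431, "main (example.c:23)", 26000),
    (0x8048436, "main (example.c:25)", 4000),
]


def order_by_size(entries):
    # The usual ordering: largest consumer first.
    return [e[0] for e in sorted(entries, key=lambda e: e[2], reverse=True)]


def order_by_address(entries):
    # Ordering by instruction pointer does not depend on the sizes.
    return [e[0] for e in sorted(entries, key=lambda e: e[0])]


def lines_by_address(entries):
    ordered = sorted(entries, key=lambda e: e[0])
    return [f"0x{addr:X}: {label} {size:,}B" for addr, label, size in ordered]


def changed_lines(a, b):
    # Keep only the added/removed lines of a unified diff, without headers.
    diff = difflib.unified_diff(a, b, lineterm="", n=0)
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


changes = changed_lines(lines_by_address(SNAP_A), lines_by_address(SNAP_B))
print("\n".join(changes))
```

With size ordering the rows trade places as soon as example.c:23 overtakes
example.c:20, while with address ordering every row stays where it was and
the diff shows only the entry whose size changed.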
The drawback is that, to get useful results, "--depth" must be large
enough to reach main() (or earlier, if there are allocations due to static
constructors etc.). Also, "--threshold" should be set to a low value
(say 0.1), since consumption that falls below the threshold in any single
flow might actually be significant once those flows are combined in
the reverse tree.
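
The threshold point is just arithmetic, shown here as a tiny Python sketch
with made-up numbers: one call site reached through eight separate flows,
each using 0.5% of the total, and an illustrative cut-off of 1.0%.

```python
# Illustrative cut-off, in percent of total memory (an assumption for this
# example, not a recommendation).
THRESHOLD_PCT = 1.0

# Hypothetical: the same call site reached through 8 flows of 0.5% each.
flow_pcts = [0.5] * 8

# Flows that would still be reported individually at this threshold.
visible = [p for p in flow_pcts if p >= THRESHOLD_PCT]

# What the call site accounts for once the flows are merged.
combined = sum(flow_pcts)

print(len(visible), combined)
```

None of the individual flows passes the cut-off, yet together they account
for 4% of the total, so the reverse tree can only show this if the flows
were not filtered out beforehand.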
I'm not very proud of the utility. It's a hack, and not very well written,
and a bit buggy - so I'm a bit ashamed to post it on the list.
I'm more interested in hearing what developers here think about
the functionality I'm proposing.
If many people find it interesting, I'll be happy to invest more time in
adding the functionality of tnrip_sm into ms_print.
Any thoughts and comments are welcome.
David Bar