|
From: Raphael Z. <zul...@in...> - 2006-07-31 08:09:23
|
Hi
This email should:
1. give some feedback from a (almost happy) massif user
2. asks if there are other ways to aquire the data I needed
3. If 2. is not true, this email should also act as a "feature request"
My problem/situation: I've an application that definitely uses far to
much memory. So I've decided using massif to identify the evil code.
Massif did really help me find some of the worst code lines in my
program! Massif does also show memory peaks in the application lifetime
cycle. But it would be also be extremely helpful to have a list that
includes (for every source line):
- Bytes allocated
- Execution count
- Full callstack
This statistic should also not "optimize" away allocations < 0.5% or
whatever...
Why is this useful? First you have to know what my problem is:
I have an application that creates instances of plugins (PIC .so). When
my application creates several such instances (e.g. 50), I saw that my
process needs about 50 Mb RAM (and my application is meant to be able to
create about 200 and more such instances - even on system with not so
much RAM). Now the problem arises, that the amount of memory shown with
top does not give me the information like "the app uses 12000 Bytes
allocated on the heap, 4530 Bytes are used by initalized data", and so
on. It also does not tell me how much byte (exactly) one plugin instance
uses. In other words: top leaves me almost completely in the dark...
In my opinion, in such a situation, it is crucial to "understand" what
parts of the application uses how much memory (especially heap). Its
useful to see which source line does allocate how many times how much
memory. Those information helps to understand how the program works, it
helps to find code/mechanisms that should probably be reworked. After I
wasn't able to get these infomration out of valigrind, I've foudn
TotalView (comercial). This app does exactly! what I needed. It lists
exactly the above information (source line, execution count, byte
count). By the help of this application, I identiefied many things in my
code that can be improved - of course massif marked the same lines of
code as "using a lot of bytetime" - but it did not inform me about how
many bytes I would be able to safe if I'm going to optimize there...
Also, I've found some code lines that were "fallen out" of massifs <0.5%
that were still worth optimizing it (because some of them were just
programming mistakes). My problem was alos, that my boss (his a
programmer - not a manager) wanted to know exactly why we need that much
memory. He wanted that to know because he has to decide whether we have
to throw aways this project or whether there is room for optimizations.
In fact I had to create a list of actions (code reworks) we can take and
how much we safe by each action - with massif, I wasn't able to do this...
What about a massif command line argument that enables such a statistic
(instead of the "bytetime" values)? As far as I've understood how massif
works, such a feature shouldn't take to much time to implement, doesn't
it? Or does massif already do this and was just to silly to find out how?
Thanks!
Raphael
|
|
From: Nicholas N. <nj...@cs...> - 2007-03-29 22:16:47
|
On Mon, 31 Jul 2006, Raphael Zulliger wrote: > This email should: > 1. give some feedback from a (almost happy) massif user > 2. asks if there are other ways to aquire the data I needed > 3. If 2. is not true, this email should also act as a "feature request" > > My problem/situation: I've an application that definitely uses far to > much memory. So I've decided using massif to identify the evil code. > Massif did really help me find some of the worst code lines in my > program! Massif does also show memory peaks in the application lifetime > cycle. But it would be also be extremely helpful to have a list that > includes (for every source line): > - Bytes allocated > - Execution count > - Full callstack > This statistic should also not "optimize" away allocations < 0.5% or > whatever... Raphael (and other Massif users), I've been spending some time reworking Massif. You're not the first user to say that space-time (bytes x milliseconds) isn't very useful. The new version records detailed snapshots periodically. It then prints detailed call trace information (like the .txt/.html output of the current Massif) for each of those snapshots. The important difference is that the measurements are in size (bytes), not space-time (bytes x milliseconds). So it tells you how the heap looked at a particular point in time, rather than some kind of measurement spread out over the whole program. The new version also has a --threshold option which lets you control how much memory is considered "significant". I also plan to automatically take a snapshot when the peak allocation point is reached. A longer-term plan is to dump the raw data to a file, and allow post-processing tools to display it how they like. I would write a simple post-processing tool that produces text, but others could do graphical tools if they want. (That would make it like Cachegrind, which has two tools, one graphical and one textual, for presenting the results.) I've been doing this on the "MASSIF2" branch. You can check it out with: svn co svn://svn.valgrind.org/valgrind/branches/MASSIF2 MASSIF2 and then follow the instructions in the README to build it. Sample output is below, which gives a rough idea of what the final results of the textual presentation tool will be like. If you do try it, please note that it's still very experimental and likely to have bugs. Any feedback is welcome. Thanks. Nick ==14438== Massif, a space profiler. ==14438== Copyright (C) 2003, Nicholas Nethercote ==14438== Using LibVEX rev 1744, a library for dynamic binary translation. ==14438== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP. ==14438== Using valgrind-3.3.0.SVN, a dynamic binary instrumentation framework. ==14438== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al. ==14438== For more details, rerun with: -v ==14438== ==14438== -- start graph header -- cmd: concord configure -- end graph header -- -- start graph -- 699k| .: . | .:|: : | ::::|: : | .::::::|: : | :::::::::|: : | .:::::::::::|: : | ..:::::::::::::|: : | ..:::::::::::::::|: : | .:|::::::::::::::::|: : | ..:::|::::::::::::::::|: : | .::::::|::::::::::::::::|: : | ..|:::::::|::::::::::::::::|: : | ..::|:::::::|::::::::::::::::|: : | .:::::|:::::::|::::::::::::::::|: : | ::::::::|:::::::|::::::::::::::::|: : | .::::::::::|:::::::|::::::::::::::::|: : | .:|::::::::::|:::::::|::::::::::::::::|: : | .:::|::::::::::|:::::::|::::::::::::::::|: : | .:::::|::::::::::|:::::::|::::::::::::::::|: : | .:::::::|::::::::::|:::::::|::::::::::::::::|: : 0 +------------------------|-------|----------|-------|---------|------|--- 4 15 31 42 55 66 -- end graph -- -- start graph legend -- [these snapshots correspond to the points marked '|' on the x-axis of the graph] snapshot 4: t = 700 ms, size = 28,824 bytes snapshot 15: t = 920 ms, size = 158,059 bytes snapshot 31: t = 1,245 ms, size = 331,430 bytes snapshot 42: t = 1,465 ms, size = 443,063 bytes snapshot 55: t = 1,735 ms, size = 581,261 bytes snapshot 66: t = 1,955 ms, size = 689,443 bytes -- end graph legend -- ================================= == snapshot 4 ================================= Total memory usage: 28,824 bytes Useful heap usage : 12,816 bytes (44.46%) Admin heap usage : 9,664 bytes (33.52%) Stacks usage : 6,344 bytes (22.00%) Heap tree: 44.46% (12,816B): (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->43.24% (12,464B): 0x80487C6: create (concord.c:112) | ->23.10% (6,660B): 0x8048B01: new_word_node (concord.c:244) | | ->12.07% (3,480B): 0x8048AA6: insert (concord.c:226) | | | ->12.07% (3,480B): 0x80488D2: init_hash_table (concord.c:158) | | | ->12.07% (3,480B): 0x80487AA: main (concord.c:99) | | | | | ->11.03% (3,180B): 0x8048A38: insert (concord.c:210) | | ->11.03% (3,180B): 0x80488D2: init_hash_table (concord.c:158) | | ->11.03% (3,180B): 0x80487AA: main (concord.c:99) | | | ->09.21% (2,656B): 0x8048B6C: new_word_node (concord.c:251) | | ->04.82% (1,392B): 0x8048AA6: insert (concord.c:226) | | | ->04.82% (1,392B): 0x80488D2: init_hash_table (concord.c:158) | | | ->04.82% (1,392B): 0x80487AA: main (concord.c:99) | | | | | ->04.38% (1,264B): 0x8048A38: insert (concord.c:210) | | ->04.38% (1,264B): 0x80488D2: init_hash_table (concord.c:158) | | ->04.38% (1,264B): 0x80487AA: main (concord.c:99) | | | ->05.80% (1,672B): 0x8048BBD: add_existing (concord.c:270) | | ->05.80% (1,672B): 0x8048AE0: insert (concord.c:234) | | ->05.80% (1,672B): 0x80488D2: init_hash_table (concord.c:158) | | ->05.80% (1,672B): 0x80487AA: main (concord.c:99) | | | ->04.75% (1,372B): 0x8048B29: new_word_node (concord.c:245) | | ->03.66% (1,056B): 0x8048AA6: insert (concord.c:226) | | | ->03.66% (1,056B): 0x80488D2: init_hash_table (concord.c:158) | | | ->03.66% (1,056B): 0x80487AA: main (concord.c:99) | | | | | ->01.09% (316B): 0x8048A38: insert (concord.c:210) | | ->01.09% (316B): 0x80488D2: init_hash_table (concord.c:158) | | ->01.09% (316B): 0x80487AA: main (concord.c:99) | | | ->00.36% (104B): in 1 other insignificant place | ->01.22% (352B): 0x2AF47C: __fopen_internal (in /lib/libc-2.3.5.so) ->01.22% (352B): 0x2AF53D: fopen@@GLIBC_2.1 (in /lib/libc-2.3.5.so) ->01.22% (352B): 0x804888D: init_hash_table (concord.c:149) ->01.22% (352B): 0x80487AA: main (concord.c:99) ================================= == snapshot 15 ================================= Total memory usage: 158,059 bytes Useful heap usage : 82,683 bytes (52.31%) Admin heap usage : 69,048 bytes (43.68%) Stacks usage : 6,328 bytes (04.00%) Heap tree: 52.31% (82,683B): (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->52.08% (82,331B): 0x80487C6: create (concord.c:112) | ->20.83% (32,936B): 0x8048BBD: add_existing (concord.c:270) | | ->20.83% (32,936B): 0x8048AE0: insert (concord.c:234) | | ->20.83% (32,936B): 0x80488D2: init_hash_table (concord.c:158) | | ->20.83% (32,936B): 0x80487AA: main (concord.c:99) | | | ->19.03% (30,080B): 0x8048B01: new_word_node (concord.c:244) | | ->11.53% (18,240B): 0x8048AA6: insert (concord.c:226) | | | ->11.53% (18,240B): 0x80488D2: init_hash_table (concord.c:158) | | | ->11.53% (18,240B): 0x80487AA: main (concord.c:99) | | | | | ->07.49% (11,840B): 0x8048A38: insert (concord.c:210) | | ->07.49% (11,840B): 0x80488D2: init_hash_table (concord.c:158) | | ->07.49% (11,840B): 0x80487AA: main (concord.c:99) | | | ->07.61% (12,032B): 0x8048B6C: new_word_node (concord.c:251) | | ->04.61% (7,296B): 0x8048AA6: insert (concord.c:226) | | | ->04.61% (7,296B): 0x80488D2: init_hash_table (concord.c:158) | | | ->04.61% (7,296B): 0x80487AA: main (concord.c:99) | | | | | ->02.99% (4,736B): 0x8048A38: insert (concord.c:210) | | ->02.99% (4,736B): 0x80488D2: init_hash_table (concord.c:158) | | ->02.99% (4,736B): 0x80487AA: main (concord.c:99) | | | ->04.54% (7,179B): 0x8048B29: new_word_node (concord.c:245) | | ->03.79% (5,995B): 0x8048AA6: insert (concord.c:226) | | | ->03.79% (5,995B): 0x80488D2: init_hash_table (concord.c:158) | | | ->03.79% (5,995B): 0x80487AA: main (concord.c:99) | | | | | ->00.74% (1,184B): in 1 other insignificant place | | | ->00.06% (104B): in 1 other insignificant place | ->00.22% (352B): in 1 other insignificant place ================================= == snapshot 31 ================================= Total memory usage: 331,430 bytes Useful heap usage : 172,094 bytes (51.92%) Admin heap usage : 153,008 bytes (46.16%) Stacks usage : 6,328 bytes (01.90%) Heap tree: 51.92% (172,094B): (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->51.81% (171,742B): 0x80487C6: create (concord.c:112) | ->31.24% (103,552B): 0x8048BBD: add_existing (concord.c:270) | | ->31.24% (103,552B): 0x8048AE0: insert (concord.c:234) | | ->31.24% (103,552B): 0x80488D2: init_hash_table (concord.c:158) | | ->31.24% (103,552B): 0x80487AA: main (concord.c:99) | | | ->12.43% (41,200B): 0x8048B01: new_word_node (concord.c:244) | | ->08.03% (26,620B): 0x8048AA6: insert (concord.c:226) | | | ->08.03% (26,620B): 0x80488D2: init_hash_table (concord.c:158) | | | ->08.03% (26,620B): 0x80487AA: main (concord.c:99) | | | | | ->04.39% (14,580B): 0x8048A38: insert (concord.c:210) | | ->04.39% (14,580B): 0x80488D2: init_hash_table (concord.c:158) | | ->04.39% (14,580B): 0x80487AA: main (concord.c:99) | | | ->04.97% (16,480B): 0x8048B6C: new_word_node (concord.c:251) | | ->03.21% (10,648B): 0x8048AA6: insert (concord.c:226) | | | ->03.21% (10,648B): 0x80488D2: init_hash_table (concord.c:158) | | | ->03.21% (10,648B): 0x80487AA: main (concord.c:99) | | | | | ->01.75% (5,832B): 0x8048A38: insert (concord.c:210) | | ->01.75% (5,832B): 0x80488D2: init_hash_table (concord.c:158) | | ->01.75% (5,832B): 0x80487AA: main (concord.c:99) | | | ->03.13% (10,406B): 0x8048B29: new_word_node (concord.c:245) | | ->02.69% (8,948B): 0x8048AA6: insert (concord.c:226) | | | ->02.69% (8,948B): 0x80488D2: init_hash_table (concord.c:158) | | | ->02.69% (8,948B): 0x80487AA: main (concord.c:99) | | | | | ->00.43% (1,458B): in 1 other insignificant place | | | ->00.03% (104B): in 1 other insignificant place | ->00.10% (352B): in 1 other insignificant place ================================= == snapshot 42 ================================= Total memory usage: 443,063 bytes Useful heap usage : 228,655 bytes (51.60%) Admin heap usage : 208,080 bytes (46.96%) Stacks usage : 6,328 bytes (01.42%) Heap tree: 51.60% (228,655B): (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->51.52% (228,303B): 0x80487C6: create (concord.c:112) | ->35.01% (155,144B): 0x8048BBD: add_existing (concord.c:270) | | ->35.01% (155,144B): 0x8048AE0: insert (concord.c:234) | | ->35.01% (155,144B): 0x80488D2: init_hash_table (concord.c:158) | | ->35.01% (155,144B): 0x80487AA: main (concord.c:99) | | | ->09.95% (44,100B): 0x8048B01: new_word_node (concord.c:244) | | ->06.51% (28,860B): 0x8048AA6: insert (concord.c:226) | | | ->06.51% (28,860B): 0x80488D2: init_hash_table (concord.c:158) | | | ->06.51% (28,860B): 0x80487AA: main (concord.c:99) | | | | | ->03.43% (15,240B): 0x8048A38: insert (concord.c:210) | | ->03.43% (15,240B): 0x80488D2: init_hash_table (concord.c:158) | | ->03.43% (15,240B): 0x80487AA: main (concord.c:99) | | | ->03.98% (17,640B): 0x8048B6C: new_word_node (concord.c:251) | | ->02.60% (11,544B): 0x8048AA6: insert (concord.c:226) | | | ->02.60% (11,544B): 0x80488D2: init_hash_table (concord.c:158) | | | ->02.60% (11,544B): 0x80487AA: main (concord.c:99) | | | | | ->01.37% (6,096B): 0x8048A38: insert (concord.c:210) | | ->01.37% (6,096B): 0x80488D2: init_hash_table (concord.c:158) | | ->01.37% (6,096B): 0x80487AA: main (concord.c:99) | | | ->02.55% (11,315B): 0x8048B29: new_word_node (concord.c:245) | | ->02.20% (9,791B): 0x8048AA6: insert (concord.c:226) | | | ->02.20% (9,791B): 0x80488D2: init_hash_table (concord.c:158) | | | ->02.20% (9,791B): 0x80487AA: main (concord.c:99) | | | | | ->00.34% (1,524B): in 1 other insignificant place | | | ->00.02% (104B): in 1 other insignificant place | ->00.07% (352B): in 1 other insignificant place ================================= == snapshot 55 ================================= Total memory usage: 581,261 bytes Useful heap usage : 298,965 bytes (51.43%) Admin heap usage : 275,952 bytes (47.47%) Stacks usage : 6,344 bytes (01.09%) Heap tree: 51.43% (298,965B): (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->51.37% (298,613B): 0x80487C6: create (concord.c:112) | ->37.40% (217,448B): 0x8048BBD: add_existing (concord.c:270) | | ->37.40% (217,448B): 0x8048AE0: insert (concord.c:234) | | ->37.40% (217,448B): 0x80488D2: init_hash_table (concord.c:158) | | ->37.40% (217,448B): 0x80487AA: main (concord.c:99) | | | ->08.38% (48,740B): 0x8048B01: new_word_node (concord.c:244) | | ->05.61% (32,660B): 0x8048AA6: insert (concord.c:226) | | | ->05.61% (32,660B): 0x80488D2: init_hash_table (concord.c:158) | | | ->05.61% (32,660B): 0x80487AA: main (concord.c:99) | | | | | ->02.76% (16,080B): 0x8048A38: insert (concord.c:210) | | ->02.76% (16,080B): 0x80488D2: init_hash_table (concord.c:158) | | ->02.76% (16,080B): 0x80487AA: main (concord.c:99) | | | ->03.35% (19,496B): 0x8048B6C: new_word_node (concord.c:251) | | ->02.24% (13,064B): 0x8048AA6: insert (concord.c:226) | | | ->02.24% (13,064B): 0x80488D2: init_hash_table (concord.c:158) | | | ->02.24% (13,064B): 0x80487AA: main (concord.c:99) | | | | | ->01.10% (6,432B): 0x8048A38: insert (concord.c:210) | | ->01.10% (6,432B): 0x80488D2: init_hash_table (concord.c:158) | | ->01.10% (6,432B): 0x80487AA: main (concord.c:99) | | | ->02.20% (12,825B): 0x8048B29: new_word_node (concord.c:245) | | ->01.92% (11,217B): 0x8048AA6: insert (concord.c:226) | | | ->01.92% (11,217B): 0x80488D2: init_hash_table (concord.c:158) | | | ->01.92% (11,217B): 0x80487AA: main (concord.c:99) | | | | | ->00.27% (1,608B): in 1 other insignificant place | | | ->00.01% (104B): in 1 other insignificant place | ->00.06% (352B): in 1 other insignificant place ================================= == snapshot 66 ================================= Total memory usage: 689,443 bytes Useful heap usage : 354,067 bytes (51.35%) Admin heap usage : 329,048 bytes (47.72%) Stacks usage : 6,328 bytes (00.91%) Heap tree: 51.35% (354,067B): (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->51.30% (353,715B): 0x80487C6: create (concord.c:112) | ->38.58% (266,008B): 0x8048BBD: add_existing (concord.c:270) | | ->38.58% (266,008B): 0x8048AE0: insert (concord.c:234) | | ->38.58% (266,008B): 0x80488D2: init_hash_table (concord.c:158) | | ->38.58% (266,008B): 0x80487AA: main (concord.c:99) | | | ->07.61% (52,520B): 0x8048B01: new_word_node (concord.c:244) | | ->05.20% (35,900B): 0x8048AA6: insert (concord.c:226) | | | ->05.20% (35,900B): 0x80488D2: init_hash_table (concord.c:158) | | | ->05.20% (35,900B): 0x80487AA: main (concord.c:99) | | | | | ->02.41% (16,620B): 0x8048A38: insert (concord.c:210) | | ->02.41% (16,620B): 0x80488D2: init_hash_table (concord.c:158) | | ->02.41% (16,620B): 0x80487AA: main (concord.c:99) | | | ->03.04% (21,008B): 0x8048B6C: new_word_node (concord.c:251) | | ->02.08% (14,360B): 0x8048AA6: insert (concord.c:226) | | | ->02.08% (14,360B): 0x80488D2: init_hash_table (concord.c:158) | | | ->02.08% (14,360B): 0x80487AA: main (concord.c:99) | | | | | ->00.96% (6,648B): in 1 other insignificant place | | | ->02.04% (14,075B): 0x8048B29: new_word_node (concord.c:245) | | ->01.80% (12,413B): 0x8048AA6: insert (concord.c:226) | | | ->01.80% (12,413B): 0x80488D2: init_hash_table (concord.c:158) | | | ->01.80% (12,413B): 0x80487AA: main (concord.c:99) | | | | | ->00.24% (1,662B): in 1 other insignificant place | | | ->00.01% (104B): in 1 other insignificant place | ->00.05% (352B): in 1 other insignificant place |