From: Dave Nomura <dcnltc@us...> - 2007-01-19 21:35:02
I am looking into a problem where opreport -lXgd runs out of memory. I
debugged this and found that it is exhausting memory while doing the
populate_for_image before it gets to any of the XML generation. The
profile was taken with --separate=all on a PPC box with 2G of memory and
only 1G of swap, covering about 2 minutes of 'make modules' during a kernel build.
I assume that nothing extraordinary was going on while the profile was
being run since they were able to reproduce this behavior several
times. It eventually consumes all of memory and swap before the kernel
kills it. Note: this only happens if --details is used.
I am trying to determine whether it is reasonable to expect that
opreport would need more than 2G of heap to do the populate.
The use of -X allows opreport to handle profiles with large numbers of
tids, tgids, cpus, etc. Without -X opreport says that the user must use
tgid:<tgid> ... to restrict the size of the profile.
I am finding it pretty difficult to figure out how much space is being
allocated because C++ is doing a lot of implicit allocation.
I think that the --details allocations are all happening in
profile_container::add_samples in the call to samples->insert(). I am
pretty much a C++ novice, but it appears to me that for each vma address
there is a growable vector allocated for the sample data that is indexed
by the profile classes. So, for example, if at vma 0x1234 there was
some sample data for profile class 200, an array of at least 200
count_type (8-byte) elements would be allocated.
Is this correct? Is there some other allocation overhead associated
with --details that I am missing?
I discovered that there are 485 profile classes in this profile that are
comprised of various combinations of tgids (250), tids (260), and cpus (8),
so I ran
opreport -ld tgid:$tgid tid:$tid cpu:$cpu for each combination to
get a count of the total number of vmas(22K).
For each profile class I multiplied the profile class number by the
number of vmas for that class and totaled the results. This accounted for
5.3M sample-array entries, or about 42MB.
Clearly, my analysis must be flawed or I am missing some other source of
allocation. Do you see the flaws in my analysis?
Do you have a better way of calculating the number of vmas in a profile
and figuring out how much heap space should be required to represent it?
LTC Linux Power Toolchain