From: Philippe W. <phi...@sk...> - 2012-01-12 23:25:03
At work, running the regression tests of a (big) application under Valgrind leads to an out of memory situation. I am investigating what could be done to reduce Valgrind's memory consumption and/or to better understand why Valgrind needs that amount of memory to run this application.

--profile-heap=yes already gives some hints:
  A. there are millions of stack traces/execontexts
  B. there are millions of partially defined bytes
  C. there are millions of blocks malloc-ed by the application

Some work is already going on to decrease the memory overhead needed by Valgrind for B and C, see https://bugs.kde.org/show_bug.cgi?id=282230. Understanding of the still reachable blocks is improved by https://bugs.kde.org/show_bug.cgi?id=289939.

I will discuss here some other things being looked at. Feedback welcome...

1. Give more info when Valgrind crashes out of memory: when -v or -d or --profile-heap=yes is given, output the following information when out of memory:
     * stack traces of the client threads
     * the stack trace of Valgrind itself
     * heap statistics
   (and then the current 'out of memory' message). I think this is a small thing to do, but it gives the exact state of memory at the time of the crash. Maybe this info could be given whenever we run out of memory, even without -v or -d or --profile-heap=yes?

2. For the gdbserver leak check command: when asking for reachable blocks (cf. C above), it produces a huge list of loss records, so I have implemented a gdbserver command to limit the number of loss records:

   +
   + <listitem>
   + <para><varname>set max_loss_records_output nr </varname>
   + sets <varname>nr</varname> as the maximum nr of loss records leak_check will
   + output. If the maximum is reached, the leak records printed are
   + the records with the biggest number of bytes.
   + </para>

   Currently this is only available as part of gdbserver. It would not be difficult to also have it in the memcheck.h leak search client requests, but is that useful? Limiting interactive output is for sure interesting, but client requests are more "batch". Opinions?

3. Partially defined bytes (PDB): I am thinking of implementing something (probably a gdbserver command) that will output a "classification" of the PDB. The idea would be to group the PDB as follows:
     a) PDB in malloc-ed blocks (per stack trace)
     b) PDB in global variables
     c) PDB in ... (root segments and similar; whatever describe_addr can output)
   I guess that most of these PDB are created inside malloc-ed blocks. Seeing where these PDB come from would make it possible to understand how to modify the application to avoid these millions of partially defined bytes.

4. Stack traces/execontexts: I guess that the millions of stack traces/execontexts are created by recursive algorithms. An idea is to have a new command line option:
     --stack-recursion-depth-collapsing=<nr>
   Then, when producing a stack trace, if a repetition of a block (one or more IP addresses) is detected within the previous <nr> IPs of the stack trace, the "recursive" part is "collapsed". 0 means collapsing is disabled. 1 will collapse very simple recursion (e.g. "factorial"). More than 1 will collapse more complex recursions; e.g. 2 will collapse fnA(line 53) calling fnB(line 78) calling fnA(line 53) calling fnB(line 78) to "fnA(line 53) calling fnB(line 78)". Note: the collapsing must be fast, so it must be based on IP equality, not on "symbolic line nr" equality. This can be implemented in m_execontext.c: after computing the stack trace, do a loop to discover and collapse the recursive parts. Or it can be a parameter to VG_(get_stacktrace). This last solution is better, as it means the collapsing happens during stack production, so more relevant addresses will end up in the final stack trace. Doing it in m_execontext.c is simpler/more localised.

5. It would be nice to implement a "garbage collection" of execontexts that are no longer useful, but it is unclear if/how that can be done. In particular, it might be that various tools keep references to execontexts, and I guess reference counting is out of the question.

Any feedback/suggestions/encouragement/... welcome

Philippe
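To make the collapsing idea in point 4 concrete, here is a minimal standalone C sketch of the block-repetition detection. This is not the actual m_execontext.c code: the `Addr` typedef, the function name `collapse_recursion`, and the in-place array interface are all illustrative assumptions. It keeps IPs in output order and, for each new IP, checks whether a block of 1..max_depth IPs starting there exactly repeats the most recently emitted block of the same length; if so, the repeat is dropped.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned long Addr; /* stand-in for Valgrind's Addr type */

/* Collapse immediately-repeated blocks of up to max_depth IPs in ips[0..n).
   Works in place; returns the collapsed length. max_depth == 0 disables
   collapsing, matching the proposed --stack-recursion-depth-collapsing=0. */
size_t collapse_recursion(Addr *ips, size_t n, size_t max_depth)
{
    if (max_depth == 0 || n == 0)
        return n;

    size_t out = 1; /* ips[0] is always kept */
    for (size_t i = 1; i < n; i++) {
        int repeated = 0;
        /* Does ips[i..i+len) duplicate the last len IPs already emitted? */
        for (size_t len = 1; len <= max_depth && len <= out && i + len <= n; len++) {
            if (memcmp(&ips[i], &ips[out - len], len * sizeof(Addr)) == 0) {
                i += len - 1; /* skip the whole repeated block */
                repeated = 1;
                break;
            }
        }
        if (!repeated)
            ips[out++] = ips[i];
    }
    return out;
}
```

With max_depth = 1 this collapses "factorial"-style self-recursion; with max_depth = 2 the fnA/fnB/fnA/fnB example from point 4 collapses to a single fnA/fnB pair. Since it only uses raw IP comparison (memcmp), it stays fast, as required.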
From: Julian S. <js...@ac...> - 2012-01-18 22:00:29
On Friday, January 13, 2012, Philippe Waroquiers wrote:
> At work, running the regression tests of a (big) application under
> Valgrind leads to an out of memory situation.

Can you clarify what you mean by "out of memory"?

* the machine runs out of physical memory and swaps?
* V hits its internal 32GB total limit and gives up?
* something else?

On 64 bit platforms, V is limited to 32GB total memory use, but this is completely artificial and so can easily be increased to 64GB or 128GB. IOW, providing you can run your app as a 64-bit process, you should be able to "solve" the problem by throwing physical memory at it.

J
From: Philippe W. <phi...@sk...> - 2012-01-19 19:22:43
> * something else?

Valgrind x86 reaches the 4Gb limit (sum of client memory + Valgrind memory).

> On 64 bit platforms, V is limited to 32GB total memory use, but this is
> completely artificial and so can easily be increased to 64GB or 128GB.
> IOW, providing you can run your app as a 64-bit process, you should
> be able to "solve" the problem by throwing physical memory at it.

We would need to switch our application(s) to 64 bits, and that is not a short term objective. So, in the short term, the solution is to either reduce the memory used by Valgrind (cf. the 282230 pool alloc work recently committed) or analyse the client memory in more depth (289939, still to be reviewed). With 282230 and some results obtained with 289939, we are now back below the 4 Gb.

Philippe