|
From: John C. <joh...@ta...> - 2018-11-05 01:19:27
|
We run our suite of unit tests under valgrind, which is immensely helpful. However, as the suite grows and grows it is starting to take significant time, even when spread across multiple cores.

What techniques can we employ to speed things up (whilst retaining most of the value)? For example, what optimization flags (for compiling the unit tests) have been found helpful? (We want the sum of compilation, link, and unit test run time to be minimized.) Are there any valgrind flags that would speed things up?

Thanks!

--
John Carter
Phone : (64)(3) 358 6639
Tait Electronics
PO Box 1645 Christchurch
New Zealand

-- This Communication is Confidential. We only send and receive email on the basis of the terms set out at www.taitradio.com/email_disclaimer
|
From: John R. <jr...@bi...> - 2018-11-05 04:37:02
|
> What techniques can we employ to speed things up (whilst retaining most of the value)?

In general, memcheck runs faster when it emulates fewer instructions, so compile with -O2 to get smaller code size. The flag -Os means "prefer small code size" but is little used, so there just might be more compiler bugs.

Secondly, do whatever it takes to *avoid* inlining string functions such as strlen etc. You want memcheck to replace the entire logical function call with a memcheck-internal equivalent. Unfortunately the gcc flags -fno-builtin* have documentation that is hard to understand.

The memcheck option --expensive-definedness-checks= already defaults to 'no'.

Specifying --redzone-size=8 might save 16 bytes of memory for each allocation, which helps if there are many small allocations. But the default --alignment=16 is required for common SSE2 instructions [used by other glibc routines on the blocks, etc.], and the redzone is not the only per-block overhead (see --keep-stacktraces= and --num-callers=), so experimentation may be required.

If all you care about is memory leaks, then experiment with --undef-value-errors=no and even non-valgrind tools [such as mtrace (malloc trace)] that are specialized for detecting leaks.
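A sketch of how these suggestions might combine on the command line. The file names (test_suite.c, test_suite) and the exact set of -fno-builtin-* functions are placeholders, not from the thread; the script only prints the commands so it can be inspected without gcc or valgrind installed:

```shell
# Hypothetical build/run recipe combining the advice above.
# test_suite.c / test_suite are placeholder names.
CFLAGS="-O2 -g -fno-builtin-strlen -fno-builtin-strcpy -fno-builtin-memcpy"
VGFLAGS="--redzone-size=8 --num-callers=8 --undef-value-errors=no"

# Printed rather than executed, so the sketch stands alone.
echo "gcc $CFLAGS -o test_suite test_suite.c"
echo "valgrind --tool=memcheck $VGFLAGS ./test_suite"
```

Whether --redzone-size=8 and --undef-value-errors=no are acceptable depends on what errors you still want to catch, so treat the flag sets as a starting point for experimentation.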
|
From: Philippe W. <phi...@sk...> - 2018-11-05 20:41:01
|
On Sun, 2018-11-04 at 20:36 -0800, John Reiser wrote:
> > What techniques can we employ to speed things up (whilst retaining most of the value)?
> The memcheck option --expensive-definedness-checks= already defaults to 'no'.
Note that it defaults to 'auto' in 3.14.
> Specifying --redzone-size=8 might save 16 bytes of memory for each allocation,
> which helps if there are many small allocations. But default --alignment=16
> is required for common SSE2 instructions [used by other glibc routines on the blocks,
> etc.], and the redzone is not the only per-block overhead (see --keep-stacktraces=
> and --num-callers=), so experimentation may be required.
>
> If all you care about is memory leaks, then experiment with --undef-value-errors=no
> and even non-valgrind tools [such as mtrace (malloc trace)] that are specialized
> for detecting leaks.
You might look at the FOSDEM presentation
'Tuning Valgrind for your Workload
Hints, tricks and tips to effectively use Valgrind on small or big
applications'
https://archive.fosdem.org/2015/schedule/event/valgrind_tuning/
for other suggestions.
Philippe
|
|
From: John C. <joh...@ta...> - 2018-11-05 23:18:04
|
Thanks for the replies...

Tweaking the optimization settings did very little for the running time. However, a huge difference (a factor of 2) comes from taking out --show-reachable=yes --track-origins=yes, except they are very, very useful... so I sort of don't want to.

--
John Carter
Phone : (64)(3) 358 6639
Tait Electronics
PO Box 1645 Christchurch
New Zealand
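One way to keep both the speed and the value is to split the suite into a fast everyday pass and a thorough (slower) pass, e.g. for nightly builds. A minimal sketch, assuming the suite is a single binary (./test_suite is a placeholder name; the flag sets are drawn from this thread, and the commands are printed rather than executed):

```shell
# Hypothetical split: a fast everyday pass without the expensive flags,
# and a full pass that keeps --track-origins / --show-reachable.
# ./test_suite is a placeholder for the real test binary.
FAST_FLAGS="--undef-value-errors=no --leak-check=summary"
FULL_FLAGS="--track-origins=yes --show-reachable=yes --leak-check=full"

# Printed rather than executed, so the sketch stands alone.
echo "fast: valgrind $FAST_FLAGS ./test_suite"
echo "full: valgrind $FULL_FLAGS ./test_suite"
```

The fast pass still catches invalid reads/writes and definite leaks; origin tracking and reachable-block reporting only cost you on the runs where you actually need them.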
|
From: Julian S. <js...@ac...> - 2018-11-06 08:13:27
|
On 05/11/18 01:49, John Carter wrote:
> For example, what optimization flags (for compiling the unit tests) have been found helpful?

What flags are you using at the moment for your unit tests? I have tended to use -Og -g as a reasonable tradeoff between debuggability and performance (which is its intended aim anyway). With gcc this works quite well; I've had more mixed performance results with clang using -g -Og.

Probably the most important thing is to avoid building your test cases with -O0 (that is, no optimisation at all). That causes gcc, at least, to produce very poor code, involving many unnecessary memory references, which makes Memcheck run very slowly. Even -Og, which is the lowest level of optimisation one can ask for above "none", drastically reduces memory traffic and thereby makes Memcheck run significantly faster.

J
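The tradeoff above is easy to measure directly. A sketch of timing one test binary at each optimisation level (test_suite.c / test_suite are placeholder names; the commands are printed rather than executed so the sketch stands alone):

```shell
# Hypothetical comparison of -O0 / -Og / -O2 builds under memcheck.
# Time each combination and pick the one that minimises
# compile + link + valgrind run time for your suite.
for OPT in -O0 -Og -O2; do
  echo "gcc $OPT -g -o test_suite test_suite.c"
  echo "time valgrind --tool=memcheck ./test_suite"
done
```

Measuring on your own suite matters because, as noted earlier in the thread, results differ between gcc and clang and between code bases.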