From: Karthik S. <ks...@dr...> - 2016-01-20 19:13:58
Hi all,

I've run a few OpenMP-based multi-threaded applications under Valgrind/Callgrind on an ARMv8-based server. According to the stats reported by Callgrind, increasing the number of threads increased the per-thread IR, memory operations, and integer operations, while floating-point operations scaled correctly.

Natively, the server behaves the opposite way: increasing the number of threads reduces the memory operations and instructions retired per thread. I've confirmed this with the host machine's hardware counters.

In the Helgrind documentation, I read that Linux futexes may cause some quirky runtime behavior under Valgrind, so I recompiled gcc with Linux futexes disabled. The per-thread IR and the other counts did indeed go down, but a large number of mutex locks were then used for every barrier.

Does anyone know if this is the normal behavior? Is there a solution that allows us to use the native run-time support library of GNU OpenMP with Valgrind-based tools?

For reference, I tested this with gcc 4.9.2 and Valgrind 3.10.1.

Thanks,
Karthik
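For anyone who wants to reproduce this, a minimal barrier-heavy kernel along the following lines should be enough to compare per-thread counts at different thread counts. This is only a sketch, not one of the applications measured above; the function name `barrier_workload` and its parameters `n` and `rounds` are made up for illustration.

```c
#include <assert.h>

/*
 * Hypothetical reproducer (not the original applications).
 * Sums 0..n-1 across the OpenMP team, `rounds` times, and forces one
 * explicit barrier per round so that barrier cost shows up in a profile.
 * Without -fopenmp the pragmas are ignored and the code runs serially.
 */
double barrier_workload(int n, int rounds)
{
    assert(n >= 0 && rounds >= 0);

    double total = 0.0;

    #pragma omp parallel reduction(+:total)
    {
        for (int r = 0; r < rounds; r++) {
            /* Work-shared loop; nowait drops the implicit barrier ... */
            #pragma omp for nowait
            for (int i = 0; i < n; i++) {
                total += (double)i;
            }
            /* ... so that this explicit barrier is the only one per round. */
            #pragma omp barrier
        }
    }

    /* Expected value: rounds * n * (n - 1) / 2 */
    return total;
}
```

Calling this from a small `main`, building with `gcc -fopenmp`, and running it under `valgrind --tool=callgrind --separate-threads=yes` with different `OMP_NUM_THREADS` values should give per-thread profiles that can be compared between a futex-enabled and a futex-disabled libgomp.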