Thread: [Valgrind-users] Valgrind apparently looping after long test run


I'm using Valgrind 3.8.1 to run fuzzing tests on control software for an
embedded system. The tests are run on

Linux pod 3.11.4-201.fc19.x86_64 #1 SMP Thu Oct 10 14:11:18 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Valgrind itself is the standard Fedora 19 build, invoked with the following
arguments:

valgrind	-v --leak-check=full
		--leak-resolution=high
		--show-possibly-lost=yes
		--track-fds=yes
		--num-callers=24
		--vgdb=no

Normally all this works fine, but last night I did an overnight run on a
new dataset and after about eight hours valgrind started looping. I
wasn't sure what to do about that... there were no diagnostics either
from my software or from valgrind. When I attached to the program with
strace I got this endless series:

clock_gettime(CLOCK_MONOTONIC, {124051, 458677331}) = 0
rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
rt_sigtimedwait([HUP INT ILL BUS FPE KILL SEGV TERM STOP WINCH SYS RTMIN RT_1], 0x4028a5e30, {0, 0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 30464
gettid()                                = 30464
write(1029, "A", 1)                     = 1
gettid()                                = 30464
read(1028, "A", 1)                      = 1
gettid()                                = 30464
clock_gettime(CLOCK_MONOTONIC, {124051, 469072047}) = 0
rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
rt_sigtimedwait([HUP INT ILL BUS FPE KILL SEGV TERM STOP WINCH SYS RTMIN RT_1], 0x4028a5e30, {0, 0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 30464
gettid()                                = 30464
write(1029, "B", 1)                     = 1
gettid()                                = 30464
read(1028, "B", 1)                      = 1
gettid()

It loops through the English alphabet, one letter at a time, uppercase A
to uppercase Z, and then repeats. TID 30464 is the PID of the valgrind
process. File descriptors 1028 and 1029 appear to be the opposite ends
of a pipe, but I have no idea why the alphabet would be cycling through
it. The sigtimedwait() call is polling for one of those signals with a
zero timeout and not getting it. We make no such calls explicitly in our
code, and use no real-time (RT) signal calls at all, which leads me to
think this is Valgrind's internal code, but of course it could still be
measuring something within the main program.
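
To make the pattern concrete, here is a minimal sketch that reproduces the
write/read sequence strace shows: a single-byte token written to one end of
a pipe and read back from the other, advancing from A to Z and wrapping.
This only mimics the observed syscalls. The function name cycle_token is
hypothetical, and I am not claiming this is how Valgrind implements it.

```python
import os
import string


def cycle_token(rounds):
    """Pass a one-byte token through a pipe, advancing A..Z and wrapping."""
    read_fd, write_fd = os.pipe()
    seen = []
    try:
        for i in range(rounds):
            # Pick the next uppercase letter, wrapping after Z.
            token = string.ascii_uppercase[i % 26].encode("ascii")
            os.write(write_fd, token)       # compare: write(1029, "A", 1)
            data = os.read(read_fd, 1)      # compare: read(1028, "A", 1)
            seen.append(data.decode("ascii"))
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return "".join(seen)


# 28 rounds covers one full alphabet plus the first two letters again.
print(cycle_token(28))
```

Running this under strace gives the same alternating write/read pairs on the
two pipe descriptors, which is at least consistent with the token being a
hand-off marker rather than real data.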

I ran gstack on it but got only:

gstack 30464
#0  0x0000000038033cb0 in vgMemCheck_helperc_LOADV64le ()
#1  0x0000000403290318 in ?? ()
#2  0x00000004028a5f00 in ?? ()
#3  0x0000000000000000 in ?? ()

which doesn't tell me a whole lot, and I don't find vgMemCheck_helperc
anywhere in the code, so I guess it's something synthesized at compile time.

My test is scripted so I can re-run it easily (given time), but could
use some pointers on how to approach this and get more information. I
imagine that rebuilding valgrind with -g might help with gstack, but is
there a way I can signal or stop Valgrind and have it display a stack
trace? It looks like SIGINT is blocked, and sending TERM merely stops
the process and I get nothing.
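
One way to check which signals are actually blocked, rather than guessing,
is to decode the SigBlk mask that Linux exposes in /proc/<pid>/status. The
sketch below assumes the usual layout, a hexadecimal bitmask in which bit
n-1 stands for signal n. The helper names decode_sigmask and
blocked_signals are hypothetical, and the mask shown is for the thread
whose status file is read, so other threads may differ.

```python
def decode_sigmask(hex_mask):
    """Return the signal numbers set in a /proc-style hex signal mask."""
    mask = int(hex_mask, 16)
    # Bit (n - 1) of the mask corresponds to signal number n.
    return [n for n in range(1, 65) if mask & (1 << (n - 1))]


def blocked_signals(pid):
    """Read SigBlk from /proc/<pid>/status (Linux only) and decode it."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("SigBlk:"):
                return decode_sigmask(line.split()[1])
    return []


# Example with a made-up mask: bits for SIGINT (2) and SIGTERM (15) set.
print(decode_sigmask("0000000000004002"))
```

For the hung process this would be called as blocked_signals(30464). If 2
(SIGINT) appears in the result, that confirms the signal is blocked rather
than ignored or handled.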

Thanks for any pointers!

Tim
