From: Christopher G. <kr...@vi...> - 2005-03-14 17:16:21
This message is about a problem I encountered with valgrind, how I
investigated it, and how I "fixed" it. I'm not sure the "fix" is
correct.
I recently upgraded valgrind on my Debian box, and I could no longer
run a heavily multi-threaded project (100+ threads) under it.
It would deadlock. strace told me that the system was regularly
polling on a futex(FUTEX_WAIT) call.
What would happen is that valgrind would run for a few seconds (long
enough for most of the threads to be spawned), then two messages like
this would appear:
==13761== Syscall param futex(futex2) points to unaddressable byte(s)
==13761== at 0x1B95C64D: pthread_cond_broadcast@@GLIBC_2.3.2 (in /lib/tls/libpthread-0.60.so)
==13761== by 0x1B9415B2: PR_Unlock (in /usr/lib/libnspr4.so)
==13761== by 0x1B9CB130: skpInternalPipe::CloseW() (internalpipe.cpp:87)
==13761== by 0x1B9C2256: skpInternalFilter::CloseOutgoingInterfaces() (ifilter.cpp:347)
(...more lines of my stack...)
==13761== Address 0x7FFFFFFF is not stack'd, malloc'd or (recently) free'd
1) Looking into /lib/tls/libpthread-0.60.so, I found:
00007600 <pthread_cond_broadcast@@GLIBC_2.3.2>:
...
7637: b9 03 00 00 00 mov $0x3,%ecx
763c: b8 f0 00 00 00 mov $0xf0,%eax
7641: be ff ff ff 7f mov $0x7fffffff,%esi
7646: ba 01 00 00 00 mov $0x1,%edx
764b: cd 80 int $0x80
The man page for futex gives the prototype:
* int futex (int *uaddr, int op, int val, const struct timespec *timeout, int *uaddr2, int val3);
* Note that %ecx=3 corresponds to FUTEX_REQUEUE.
* %esi holds the 4th parameter, which means timeout=$0x7fffffff.
Sure enough, this is not a valid pointer. Why does libc put such a value there?
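The register-to-argument mapping used above can be sketched in C. This is only an illustration, assuming the 32-bit x86 "int $0x80" convention (syscall number in %eax, arguments in %ebx, %ecx, %edx, %esi, %edi, %ebp). The struct and function names are hypothetical and do not come from glibc, the kernel, or valgrind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register state just before `int $0x80` on 32-bit x86 Linux. */
struct i386_syscall_regs {
    uint32_t eax; /* syscall number */
    uint32_t ebx; /* 1st argument */
    uint32_t ecx; /* 2nd argument */
    uint32_t edx; /* 3rd argument */
    uint32_t esi; /* 4th argument */
    uint32_t edi; /* 5th argument */
    uint32_t ebp; /* 6th argument */
};

/* futex() arguments, named as in the man page prototype. */
struct futex_args {
    uint32_t uaddr, op, val, timeout, uaddr2, val3;
};

/* Values taken from the disassembly above (hypothetical constant names). */
enum { SYS_FUTEX_I386 = 0xf0, FUTEX_REQUEUE_OP = 3 };

/* Returns 1 and fills *out if the registers describe a futex call, else 0. */
int decode_futex_call(const struct i386_syscall_regs *r, struct futex_args *out)
{
    assert(r != NULL && out != NULL);
    if (r->eax != SYS_FUTEX_I386)
        return 0;
    out->uaddr   = r->ebx;
    out->op      = r->ecx;
    out->val     = r->edx;
    out->timeout = r->esi; /* 4th argument: the slot holding 0x7fffffff */
    out->uaddr2  = r->edi;
    out->val3    = r->ebp;
    return 1;
}
```

Feeding it the values visible in the excerpt (%eax=0xf0, %ecx=3, %edx=1, %esi=0x7fffffff) gives op=3 and a 4th argument of 0x7fffffff. %ebx and %edi are set outside the lines shown, so their values are not known from the excerpt.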
2) The next step was to look into the kernel itself (I tried 2.6.9).
Reading kernel/futex.c, and more precisely do_futex(), I found:
case FUTEX_REQUEUE:
ret = futex_requeue(uaddr, uaddr2, val, val2, NULL);
break;
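The timeout parameter is not used as a timeout here and is never
dereferenced (as far as I can tell, val2 is derived from the 4th argument
as a plain integer), therefore passing $0x7fffffff is harmless.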
(Reading the [Futexes Are Tricky] paper by Ulrich Drepper,
it seems that this value should not be ignored, but that is another story.)
3) Finally, I patched valgrind's pre-syscall check for futex:
diff -ru valgrind-2.2.0+2.4.0rc3-orig/coregrind/linux/syscalls.c valgrind-2.2.0+2.4.0rc3/coregrind/linux/syscalls.c
--- valgrind-2.2.0+2.4.0rc3-orig/coregrind/linux/syscalls.c 2005-03-11 07:29:41.000000000 +0100
+++ valgrind-2.2.0+2.4.0rc3/coregrind/linux/syscalls.c 2005-03-14 16:08:48.000000000 +0100
@@ -419,7 +419,7 @@
SYS_PRE_MEM_READ( "futex(futex)", arg1, sizeof(int) );
if (arg2 == VKI_FUTEX_WAIT && arg4 != 0)
SYS_PRE_MEM_READ( "futex(timeout)", arg4, sizeof(struct vki_timespec) );
- if (arg2 == VKI_FUTEX_REQUEUE)
+ if (arg2 == VKI_FUTEX_REQUEUE && arg4 != 0x7FFFFFFF)
SYS_PRE_MEM_READ( "futex(futex2)", arg4, sizeof(int) );
}
And this fixed my problem. I'm not sure why, but it did :)
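For comparison, here is a minimal sketch of a check keyed on the man page prototype, where the second futex address (uaddr2) is the 5th argument rather than the 4th. It assumes that for FUTEX_REQUEUE the 4th argument carries an integer, not a pointer. That is an assumption based on the futex_requeue(uaddr, uaddr2, val, val2, NULL) call quoted from the kernel, not something confirmed against valgrind's sources. All names below are hypothetical and this is not valgrind's code.

```c
#include <assert.h>
#include <stddef.h>

/* Operation numbers as given in the message (FUTEX_REQUEUE is 3). */
enum { OP_FUTEX_WAIT = 0, OP_FUTEX_REQUEUE = 3 };

/* Bit N-1 set means "argument N is a user pointer that should be validated". */
#define ARG_BIT(n) (1u << ((n) - 1))

/*
 * args[0..5] are the six futex() arguments in prototype order:
 * uaddr, op, val, timeout, uaddr2, val3.
 */
unsigned futex_pointer_args(const unsigned long args[6])
{
    assert(args != NULL);

    unsigned long op = args[1];
    unsigned mask = ARG_BIT(1);            /* uaddr is always a pointer */

    if (op == OP_FUTEX_WAIT && args[3] != 0)
        mask |= ARG_BIT(4);                /* timeout, only when non-NULL */

    if (op == OP_FUTEX_REQUEUE)
        mask |= ARG_BIT(5);                /* uaddr2; the 4th argument is
                                              assumed to be an integer here */
    return mask;
}
```

With this classification, a FUTEX_REQUEUE call whose 4th argument is 0x7FFFFFFF would have only arguments 1 and 5 checked, so no special case for that constant would be needed.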
A summary of my machine:
valgrind-2.2.0+2.4.0rc3
kernel 2.6.9
libc6 2.3.2.ds1-20
--
Christopher Gautier