|
From: Tim M. <ti...@se...> - 2005-04-11 16:36:27
|
Hi all,
I've just come across a curious segfault when using valgrind, which
doesn't happen without it. It appears to be caused by declaring an array
on the stack.
I have the following small test file:
---test.c---
#include <stdio.h>
int main(void)
{
char szVar[4000] = {0};
printf("Hello world!\n");
return 0;
}
---End test.c---
Compiling it with gcc:
[timm@timm uas]$ gcc test.c -o test
[timm@timm uas]$ valgrind --tool=none ./test
==13765== Nulgrind, a binary JIT-compiler for x86-linux.
==13765== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote.
==13765== Using valgrind-2.4.0, a program supervision framework for
x86-linux.
==13765== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==13765== For more details, rerun with: -v
==13765==
==13765== Signal 11 (SIGSEGV) appears to have lost its siginfo; I can't go
on.
==13765== This may be because one of your programs has consumed your
==13765== ration of siginfo structures.
==13765==
Segmentation fault
[timm@timm uas]$
It happens with memcheck, addrcheck and tool=none. It happened with
valgrind 2.2.0, which is why I tried upgrading to 2.4.
If I reduce the size of the stack array to 3000 it seems to work.
I'm slightly suspicious that this is a gcc problem, since the same process
works on another box with a different gcc. My gcc is
[timm@timm .autotest.uas]$ gcc --version
gcc (GCC) 3.3.4 (pre 3.3.5 20040809)
(standard gcc with SuSE 9.2)
Tim
|
|
From: Christian P. <tr...@ge...> - 2005-04-11 23:45:32
|
On Monday 11 April 2005 6:36 pm, Tim Martin wrote:
> Hi all,
>
> I've just come across a curious segfault when using valgrind, which
> doesn't happen without it. It appears to be caused by declaring an array
> on the stack.
[....]
I cannot confirm this. My test environment was x86 chroot environment
within true amd64 on Gentoo Linux testing branch.
hmm.... maybe your valgrind install fsck'd up?
Christian.
--
Netiquette: http://www.ietf.org/rfc/rfc1855.txt
01:27:26 up 19 days, 14:33, 6 users, load average: 0.15, 0.23, 0.27
|
|
From: Jeremy F. <je...@go...> - 2005-04-13 22:14:35
|
Tim Martin wrote:
>==13765== Signal 11 (SIGSEGV) appears to have lost its siginfo; I can't go
>on.
>==13765== This may be because one of your programs has consumed your
>==13765== ration of siginfo structures.
>==13765==
>Segmentation fault
>[timm@timm uas]$
>
>It happens with memcheck, addrcheck and tool=none. It happened with
>valgrind 2.2.0, which is why I tried upgrading to 2.4.
>
>
What happened with 2.2.0? It didn't print that particular message
because it is new with 2.4.
>If I reduce the size of the stack array to 3000 it seems to work.
>
>
The problem seems to be with the SIGSEGV Valgrind needs to grow the
stack. If the stack is small enough, then it doesn't need to be grown.
I think the problem is exactly as the message says:
==13765== This may be because one of your programs has consumed your
==13765== ration of siginfo structures.
The last time this problem was reported, it was with SuSE 9.2; in
particular, something in KDE has a lot of signals sent to it, but it
blocks them so they are never delivered and so remain pending forever.
This eats up your per-user limit of pending signals, so Valgrind's use
of signals is thwarted.
This can be confirmed in several ways:
1. try running your test program under Valgrind as another user, or root
2. try running it just after logging in
3. run the test program "none/tests/faultstatus" natively (not under
Valgrind); if it reports failures, there's a basic problem with
signal delivery
4. run "grep '^...Pnd:' /proc/*/status|grep -v 0000000000000000" and
see what comes up
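Item 3 needs a program from the Valgrind source tree. Without one, the
same kind of check can be approximated with a minimal sketch like the
one below. It is not the faultstatus source, and the name
probe_segv_si_code is made up for illustration. It maps a page with no
access permissions, writes to it, and records the si_code that arrives
with the resulting SIGSEGV. On a healthy Linux system that should be
SEGV_ACCERR (2); a value of 0 would suggest the siginfo was lost.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS on strict compilers */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_return;
static volatile sig_atomic_t seen_si_code;

/* SA_SIGINFO handler: remember si_code, then jump out of the fault. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    seen_si_code = info->si_code;
    siglongjmp(fault_return, 1);
}

/* Hypothetical helper (not part of Valgrind): provokes one SIGSEGV on a
 * PROT_NONE page and returns the si_code delivered with it, or -1 if the
 * setup failed or no fault was seen. */
int probe_segv_si_code(void)
{
    struct sigaction sa, old;
    long page = sysconf(_SC_PAGESIZE);
    char *p;

    if (page <= 0)
        return -1;
    p = mmap(NULL, (size_t)page, PROT_NONE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    seen_si_code = -1;
    if (sigsetjmp(fault_return, 1) == 0)
        *(volatile char *)p = 1; /* faults: the page is not writable */

    /* Reached after the handler jumps back: restore the old state. */
    sigaction(SIGSEGV, &old, NULL);
    munmap(p, (size_t)page);
    return seen_si_code;
}
```

Call it from a small main() that prints the result, and run it natively
(not under Valgrind), once as the affected user and once as root, then
compare the two values.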
Newer versions of 2.6 kernels don't have this problem and make it easier
to diagnose pending signal exhaustion. Unfortunately this problem is
outside of Valgrind's scope, and there isn't any way to make Valgrind
avoid this case (the best we can do is report when it happens, rather
than silently dying or going into an infinite loop as it used to).
J
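The per-user limit on pending signals mentioned above can be read from
a program. The sketch below assumes a Linux kernel that provides
RLIMIT_SIGPENDING (getrlimit(2) documents it as available since 2.6.8);
the function names are made up for illustration. It only reports the
limit, not how much of it is currently in use.

```c
#define _GNU_SOURCE /* RLIMIT_SIGPENDING is Linux-specific */
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: stores the soft per-user limit on queued signals
 * in *limit. Returns 0 on success, -1 if getrlimit() fails. */
int sigpending_soft_limit(unsigned long long *limit)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_SIGPENDING, &rl) != 0)
        return -1;
    *limit = (unsigned long long)rl.rlim_cur;
    return 0;
}

/* Prints the limit, or "unlimited" when no limit is set. */
void print_sigpending_limit(void)
{
    unsigned long long limit;

    if (sigpending_soft_limit(&limit) != 0) {
        perror("getrlimit");
        return;
    }
    if (limit == (unsigned long long)RLIM_INFINITY)
        puts("pending signal limit: unlimited");
    else
        printf("pending signal limit: %llu\n", limit);
}
```

Calling print_sigpending_limit() from main() as the affected user shows
the ceiling that the stuck pending signals are counted against.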
|
|
From: Tim M. <ti...@se...> - 2005-04-15 09:19:12
|
Jeremy Fitzhardinge said:
> Tim Martin wrote:
>>It happens with memcheck, addrcheck and tool=none. It happened with
>>valgrind 2.2.0, which is why I tried upgrading to 2.4.
>>
> What happened with 2.2.0? It didn't print that particular message
> because it is new with 2.4.
It appears that the same thing happens with a different message -
certainly the process dies with a segfault.
[timm@timm uas]$ valgrind --tool=none ./test
==24996== Nulgrind, a binary JIT-compiler for x86-linux.
==24996== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote.
==24996== Using valgrind-2.2.0, a program supervision framework for x86-linux.
==24996== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==24996== For more details, rerun with: -v
==24996==
==24996==
==24996== Process terminating with default action of signal 11 (SIGSEGV)
==24996== at 0x80483F1: main (in /home/timm/build/linux/products/servers/uas/test)
==24996==
Segmentation fault
> I think the problem is exactly as the message says:
>
> ==13765== This may be because one of your programs has consumed your
> ==13765== ration of siginfo structures.
>
> The last time this problem was reported, it was with SuSE 9.2; in
> particular, something in KDE has a lot of signals sent to it, but it
> blocks them so they are never delivered and so remain pending forever.
> This eats up your per-user limit of pending signals, so Valgrind's use
> of signals is thwarted.
The thing that initially led me to discount that message was that it
happened with such a trivial test program, which isn't doing any special
signal handling at all. By the results of the tests below, I guess I was
wrong.
> This can be confirmed in several ways:
>
> 1. try running your test program under Valgrind as another user, or
> root
Running as root, it works fine on 2.2.0 and 2.4.0.
> 2. try running it just after logging in
> 3. run the test program "none/tests/faultstatus" natively (not under
> Valgrind); if it reports failures, there's a basic problem with
> signal delivery
This reports
Test 1: FAIL: expected si_code==1, not 0
Test 2: FAIL: expected si_code==2, not 0
Test 3: FAIL: expected si_code==2, not 0
Test 4: FAIL: expected si_code==1, not 0
Test 5: FAIL: expected si_code==2, not 0
Test 6: FAIL: expected si_code==128, not 0
Test 7: FAIL: expected si_code==128, not 0
Test 8: FAIL: expected si_code==128, not 0
Test 9: FAIL: expected si_code==128, not 0
which I guess counts as a problem. :-)
> 4. run "grep '^...Pnd:' /proc/*/status|grep -v 0000000000000000" and
> see what comes up
This returns
/proc/3084/status:SigPnd: 0000010000000000
/proc/3165/status:SigPnd: 0000010000000000
/proc/8256/status:SigPnd: 0000010000000000
/proc/8258/status:SigPnd: 0000010000000000
> Newer versions of 2.6 kernels don't have this problem and make it easier
> to diagnose pending signal exhaustion. Unfortunately this problem is
> outside of Valgrind's scope, and there isn't any way to make Valgrind
> avoid this case (the best we can do is report when it happens, rather
> than silently dying or going into an infinite loop as it used to).
Is this a genuine bug in the kernel then? If so, I'll either upgrade or
live with it.
Thanks for your help.
Tim
|
|
From: Jeremy F. <je...@go...> - 2005-04-15 21:58:34
|
Tim Martin wrote:
>/proc/3084/status:SigPnd: 0000010000000000
>/proc/3165/status:SigPnd: 0000010000000000
>/proc/8256/status:SigPnd: 0000010000000000
>/proc/8258/status:SigPnd: 0000010000000000
>
>
Yep, that's the problem. I bet those are all KDE processes.
>Is this a genuine bug in the kernel then? If so, I'll either upgrade or
>live with it.
>
It's a genuine KDE bug, and a kernel misfeature. Updating the kernel
will resolve the problem for you.
J
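To read the SigPnd values quoted above: the field is a hexadecimal
bitmask in which bit n-1 stands for signal n. The value
0000010000000000 is 2^40, so the one pending signal is number 41, which
lies in the real-time range on Linux (32 to 64). A minimal sketch of
the decoding, with a made-up function name, might look like this:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: decodes a SigPnd-style hex mask into signal
 * numbers. Writes at most max entries to sigs[] and returns how many
 * were written, or -1 if the string is not a valid hex number. */
int decode_sigmask(const char *hex, int *sigs, int max)
{
    char *end;
    unsigned long long mask;
    int count = 0;
    int bit;

    errno = 0;
    mask = strtoull(hex, &end, 16);
    if (end == hex || errno != 0)
        return -1;

    /* Bit 0 is signal 1, bit 1 is signal 2, and so on. */
    for (bit = 0; bit < 64 && count < max; bit++) {
        if (mask & (1ULL << bit))
            sigs[count++] = bit + 1;
    }
    return count;
}
```

Passing "0000010000000000" fills sigs[0] with 41 and returns 1; an
all-zero mask returns 0.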
|
|
From: Dirk M. <dm...@gm...> - 2005-04-15 22:12:05
|
On Friday 15 April 2005 11:18, Tim Martin wrote:
> This returns
> /proc/3084/status:SigPnd: 0000010000000000
> /proc/3165/status:SigPnd: 0000010000000000
> /proc/8256/status:SigPnd: 0000010000000000
> /proc/8258/status:SigPnd: 0000010000000000
>
what are the process names and command lines of these pids?
Dirk
|