|
From: Josef W. <Jos...@gm...> - 2004-11-17 19:55:55
|
Hi,

I have a strange problem with VG and Suse 9.2. I did not do a bug report,
as the bug seems to be in the kernel (?).

> uname -a
Linux acer 2.6.8-24.3-default #1 Tue Oct 26 14:40:54 UTC 2004 i686 i686 i386 GNU/Linux

As user, almost any VG execution terminates with a SEGFAULT.
The same is working just fine as root.

Example: As root (in an empty directory):

# valgrind --tool=none --trace-signals=yes ls
==10021== Nulgrind, a binary JIT-compiler for x86-linux.
==10021== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote.
==10021== Using valgrind-2.2.0, a program supervision framework for x86-linux.
==10021== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
snaffling handler 0x0 for signal 1
snaffling handler 0x0 for signal 2
[Skipped]
--10021-- setting ksig 63 to: hdlr 0xB003166C, flags 0x1C000004, mask(63..0) 0xFFFFFFFF 0xFFFBFEFF
--10021-- setting ksig 64 to: hdlr 0xB003166C, flags 0x1C000004, mask(63..0) 0xFFFFFFFF 0xFFFBFEFF
==10021== For more details, rerun with: -v
==10021==
--10021-- signal 11 arrived ... si_code=1
--10021-- SIGSEGV: si_code=1 faultaddr=0xAFEFCFB8 tid=1 esp=0xAFEFCFB8 seg=0xAFEFD000-0xAFEFF000 fl=60 shad=0xB0000000-0xB0000000
==10021==

Same as user (in an empty directory):

> valgrind --tool=none --trace-signals=yes ls
==10081== Nulgrind, a binary JIT-compiler for x86-linux.
==10081== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote.
==10081== Using valgrind-2.2.0, a program supervision framework for x86-linux.
==10081== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
snaffling handler 0x0 for signal 1
snaffling handler 0x0 for signal 2
[Skipped]
--10081-- setting ksig 64 to: hdlr 0xB003166C, flags 0x1C000004, mask(63..0) 0xFFFFFFFF 0xFFFBFEFF
==10081== For more details, rerun with: -v
==10081==
--10081-- signal 11 arrived ... si_code=0
--10081-- SIGSEGV: si_code=0 faultaddr=0x0 tid=1 esp=0xAFEFCFFC seg=0x8048000-0x805A000 fl=318 shad=0xB0000000-0xB0000000
--10081-- delivering signal 11 (SIGSEGV) to thread 1
--10081-- delivering 11 to default handler terminate+core
==10081==
==10081== Process terminating with default action of signal 11 (SIGSEGV)
==10081==    at 0x3A975710: __libc_memalign (in /lib/ld-2.3.3.so)
==10081==    by 0x3A9758A2: calloc (in /lib/ld-2.3.3.so)
==10081==    by 0x3A96D1D3: _dl_new_object (in /lib/ld-2.3.3.so)
==10081==    by 0x3A969B9A: _dl_map_object_from_fd (in /lib/ld-2.3.3.so)
==10081==
Speicherzugriffsfehler   [shell message in a German locale: "Segmentation fault"]

I checked it by attaching GDB to stage2, and the segfault happens when
accessing 0xAFEFCFB8, like in the root case. I.e. the stack simply has to be
expanded by VG's segfault handler. The strange thing is that for VG, the
faultaddr=0x0, and therefore the segfault is forwarded to the client.

Any idea how to track this down further or even to solve this?

Josef
|
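The decision described in the message above (grow the stack when the fault lies just below the current stack segment, otherwise pass the signal to the client) can be sketched as a small pure function. This is only an illustration under stated assumptions, not Valgrind's code: `classify_fault`, `fault_action` and the `max_growth` limit are hypothetical names, and the real handler also consults the stack pointer and other state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of inspecting a SIGSEGV (illustrative names only). */
enum fault_action { GROW_STACK, FORWARD_TO_CLIENT };

/*
 * Decide what to do with a fault at `addr`.
 *   seg_start  : lowest address of the current stack segment
 *   max_growth : how far below seg_start the stack may still grow
 *                (an assumed limit, chosen by the caller)
 */
enum fault_action classify_fault(uintptr_t addr, uintptr_t seg_start,
                                 uintptr_t max_growth)
{
    assert(max_growth > 0);   /* a zero limit would forbid any growth */

    if (addr < seg_start && seg_start - addr <= max_growth)
        return GROW_STACK;        /* plausible stack access: extend the stack */
    return FORWARD_TO_CLIENT;     /* anything else is treated as a client fault */
}
```

With the numbers from the two traces, a fault at 0xAFEFCFB8 sits 0x48 bytes below the segment start 0xAFEFD000 and would be classified as stack growth, while a reported fault address of 0x0 is far outside any reasonable limit and ends up forwarded, which matches the behaviour seen as a normal user.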
|
From: Julian S. <js...@ac...> - 2004-11-17 23:29:04
|
Hi Josef,

> I have a strange problem with VG and Suse 9.2. I did not do a bug report,
> as the bug seems to be in the kernel (?).

That's with 2.2.0. Does the current CVS head also behave like that?
I believe Tom / Nick have fixed some bugs with segfault potential
recently.

J
|
|
From: Josef W. <Jos...@gm...> - 2004-11-18 00:33:04
|
On Thursday 18 November 2004 00:28, Julian Seward wrote:
> Hi Josef,
>
> > I have a strange problem with VG and Suse 9.2. I did not do a bug report,
> > as the bug seems to be in the kernel (?).
>
> That's with 2.2.0. Does the current CVS head also behave like that?
> I believe Tom / Nick have fixed some bugs with segfault potential
> recently.

Ah, yes. With CVS, it's working.
Thanks and sorry for the noise,
Josef
|
|
From: Nicholas N. <nj...@ca...> - 2004-11-18 09:51:59
|
On Thu, 18 Nov 2004, Josef Weidendorfer wrote:

>>> I have a strange problem with VG and Suse 9.2. I did not do a bug report,
>>> as the bug seems to be in the kernel (?).
>>
>> That's with 2.2.0. Does the current CVS head also behave like that?
>> I believe Tom / Nick have fixed some bugs with segfault potential
>> recently.
>
> Ah, yes. With CVS, it's working.
> Thanks and sorry for the noise,

We have received some bug reports about Valgrind crashing immediately on
SuSE 9.2, so it's good to know that it's been fixed in CVS. I assume this is
with the CVS HEAD? Could you try the CVS VALGRIND_2_2_0_BRANCH and let us know
if there is a problem with that? Thanks.

N
|
|
From: Nicholas N. <nj...@ca...> - 2004-11-18 10:25:44
|
On Thu, 18 Nov 2004, Nicholas Nethercote wrote:

> We have received some bug reports about Valgrind crashing immediately on SuSE
> 9.2, so it's good to know that it's been fixed in CVS. I assume this is with
> the CVS HEAD? Could you try the CVS VALGRIND_2_2_0_BRANCH and let us know if
> there is a problem with that? Thanks.

Hmm, it seems that those reports also had the problem with the current CVS
HEAD. Hmm.

N
|
|
From: Tom H. <th...@cy...> - 2004-11-18 10:34:01
|
In message <Pin...@he...>
Nicholas Nethercote <nj...@ca...> wrote:
> On Thu, 18 Nov 2004, Nicholas Nethercote wrote:
>
>> We have received some bug reports about Valgrind crashing
>> immediately on SuSE 9.2, so it's good to know that it's been fixed
>> in CVS. I assume this is with the CVS HEAD? Could you try the CVS
>> VALGRIND_2_2_0_BRANCH and let us know if there is a problem with
>> that? Thanks.
>
> Hmm, it seems that those reports also had the problem with the current
> CVS HEAD. Hmm.
Does SuSE 9.2 use gcc 3.4 by any chance?
I was chasing a problem on my newly upgraded FC3 box last night which
uses gcc 3.4 rather than the 3.3 that it was using before and I'm
pretty sure that it is down to a compiler bug related to __builtin_setjmp
that is causing VG_(is_addressable) not to reinstall the correct SEGV
handler after it has run.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
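For readers unfamiliar with the pattern Tom refers to, the following is a minimal sketch of an addressability probe that installs a temporary SIGSEGV handler, touches the address, and then puts the previous handler back. It is not Valgrind's implementation: Valgrind uses `__builtin_setjmp` and its own signal layer, whereas this sketch assumes plain POSIX `sigsetjmp`/`siglongjmp`, and `is_addressable_sketch` and `probe_handler` are hypothetical names. The final `sigaction` call is the step that, according to the message above, goes wrong in the miscompiled case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf probe_env;

/* Temporary handler: abandon the faulting access and jump back to the probe. */
static void probe_handler(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Return 1 if one byte at `p` can be read, 0 if reading it raises SIGSEGV. */
int is_addressable_sketch(const volatile char *p)
{
    struct sigaction probe, saved;
    volatile int ok = 0;   /* volatile: modified between sigsetjmp and siglongjmp */
    int rc;

    probe.sa_handler = probe_handler;
    probe.sa_flags = 0;
    sigemptyset(&probe.sa_mask);

    /* Install the probe handler and remember the previous disposition. */
    rc = sigaction(SIGSEGV, &probe, &saved);
    assert(rc == 0);

    /* Saving the mask (second argument 1) lets siglongjmp() unblock SIGSEGV,
     * which is blocked while probe_handler runs. */
    if (sigsetjmp(probe_env, 1) == 0) {
        char c = *p;       /* may fault */
        (void)c;
        ok = 1;
    }

    /* Reinstall the previous handler; skipping this step (or restoring the
     * wrong one) leaves later faults with the wrong disposition. */
    rc = sigaction(SIGSEGV, &saved, NULL);
    assert(rc == 0);
    (void)rc;
    return ok;
}
```

Comparing the SIGSEGV disposition before and after a call is a simple way to check that the restore step really happened.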
|
|
From: Nicholas N. <nj...@ca...> - 2004-11-18 15:32:17
|
On Thu, 18 Nov 2004, Tom Hughes wrote:

> I was chasing a problem on my newly upgraded FC3 box last night which
> uses gcc 3.4 rather than the 3.3 that it was using before and I'm
> pretty sure that it is down to a compiler bug related to __builtin_setjmp
> that is causing VG_(is_addressable) not to reinstall the correct SEGV
> handler after it has run.

Erk, that sucks...

N
|
|
From: Josef W. <Jos...@gm...> - 2004-11-18 16:30:25
|
> >> in CVS. I assume this is with the CVS HEAD? Could you try the CVS
> >> VALGRIND_2_2_0_BRANCH and let us know if there is a problem with
> >> that? Thanks.

Currently, it always segfaults, either with HEAD or VALGRIND_2_2_0_BRANCH,
so my previous reply was wrong ;-(

> > Hmm, it seems that those reports also had the problem with the current
> > CVS HEAD. Hmm.
>
> Does SuSE 9.2 use gcc 3.4 by any chance?

No. It's "gcc (GCC) 3.3.4 (pre 3.3.5 20040809)"
It still segfaults. I don't know why, this morning it was working...

> I was chasing a problem on my newly upgraded FC3 box last night which
> uses gcc 3.4 rather than the 3.3 that it was using before and I'm
> pretty sure that it is down to a compiler bug related to __builtin_setjmp
> that is causing VG_(is_addressable) not to reinstall the correct SEGV
> handler after it has run.

But why is this also a problem with gcc 3.3.4?
Where should I look to be sure that this is a compiler bug ?!?

Josef
|
|
From: Tom H. <th...@cy...> - 2004-11-18 17:02:22
|
In message <200...@gm...>
Josef Weidendorfer <Jos...@gm...> wrote:
>> > Hmm, it seems that those reports also had the problem with the current
>> > CVS HEAD. Hmm.
>>
>> Does SuSE 9.2 use gcc 3.4 by any chance?
>
> No. It's "gcc (GCC) 3.3.4 (pre 3.3.5 20040809)"
> It still segfaults. I don't know why, this morning it was working...
Hmm. FC2 was 3.3.3 so I guess that's possible.
>> I was chasing a problem on my newly upgraded FC3 box last night which
>> uses gcc 3.4 rather than the 3.3 that it was using before and I'm
>> pretty sure that it is down to a compiler bug related to __builtin_setjmp
>> that is causing VG_(is_addressable) not to reinstall the correct SEGV
>> handler after it has run.
>
> But why is this also a problem with gcc 3.3.4?
> Where should I look to be sure that this is a compiler bug ?!?
Try straceing the valgrind process and see what the last half dozen
or so lines look like. The problem case looks like this:
rt_sigaction(SIGSEGV, {0xf003950f, ~[], 0}, {0xf0041e92, ~[KILL STOP], SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0xf0086d52}, 8) = 0
rt_sigprocmask(SIG_SETMASK, NULL, ~[ILL BUS FPE KILL SEGV STOP], 8) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
rt_sigaction(SIGSEGV, {SIG_DFL}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Note the second sigaction which sets SEGV to SIG_DFL which is complete
nonsense.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
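Besides reading strace output, the disposition that Tom points at (SIGSEGV ending up as SIG_DFL) can also be queried from inside a process. The sketch below assumes only POSIX `sigaction`; `signal_disposition` and the `disposition` enum are hypothetical helper names introduced for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

enum disposition { DISP_DEFAULT, DISP_IGNORED, DISP_HANDLER };

/* Report how `signo` is currently handled without changing anything. */
enum disposition signal_disposition(int signo)
{
    struct sigaction cur;
    int rc = sigaction(signo, NULL, &cur);   /* NULL new action: query only */
    assert(rc == 0);
    (void)rc;

    /* On Linux/glibc sa_handler and sa_sigaction share storage, so these
     * comparisons also cover actions installed with SA_SIGINFO. */
    if (cur.sa_handler == SIG_DFL)
        return DISP_DEFAULT;
    if (cur.sa_handler == SIG_IGN)
        return DISP_IGNORED;
    return DISP_HANDLER;
}
```

Calling `signal_disposition(SIGSEGV)` right after a probe has finished would show `DISP_DEFAULT` in the broken case and `DISP_HANDLER` when the original handler was reinstalled correctly.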
|
|
From: Josef W. <Jos...@gm...> - 2004-11-18 17:39:24
|
On Thursday 18 November 2004 18:02, Tom Hughes wrote:
> In message <200...@gm...>
>
> Josef Weidendorfer <Jos...@gm...> wrote:
> >> > Hmm, it seems that those reports also had the problem with the current
> >> > CVS HEAD. Hmm.
> >>
> >> Does SuSE 9.2 use gcc 3.4 by any chance?
> >
> > No. It's "gcc (GCC) 3.3.4 (pre 3.3.5 20040809)"
> > It still segfaults. I don't know why, this morning it was working...
Another thing:
Valgrind as root is working. Perhaps I tried it as root.
> Try straceing the valgrind process and see what the last half dozen
> or so lines look like. The problem case looks like this:
>
> rt_sigaction(SIGSEGV, {0xf003950f, ~[], 0}, {0xf0041e92, ~[KILL STOP],
> SA_RESTORER|SA_STACK|SA_RESTART|SA_SIGINFO, 0xf0086d52}, 8) = 0
> rt_sigprocmask(SIG_SETMASK, NULL, ~[ILL BUS FPE KILL SEGV STOP], 8) = 0
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> rt_sigaction(SIGSEGV, {SIG_DFL}, NULL, 8) = 0
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV +++
I get (as normal user):
...
munmap(0xb031a000, 4096) = 0
rt_sigaction(SIGSEGV, {SIG_DFL}, {0xb0037f77, ~[KILL STOP], SA_RESTORER|
SA_STACK|SA_RESTART|SA_SIGINFO, 0xb0074c1e}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[SEGV], ~[ILL BUS FPE KILL SEGV STOP], 8) = 0
getpid() = 15972
tgkill(15972, 15972, SIGSEGV) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
There is no second sigaction.
Look at this small prog:
==============================
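#include <signal.h>
#include <stdio.h>    /* sprintf */
#include <stdlib.h>   /* exit */
#include <string.h>   /* strlen */
#include <unistd.h>   /* write */

char buf[1024];

/* Print the signal number and the fault address reported by the kernel. */
void handler(int s, siginfo_t *i, void *p)
{
    (void)p;
    sprintf(buf, "Signal %d, Addr %p\n", s, i->si_addr);
    write(1, buf, strlen(buf));
    exit(1);
}

int main(void)
{
    struct sigaction sa;
    volatile int *addr = (int *)0xd000000;   /* assumed to be unmapped */

    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);                /* do not leave the mask uninitialized */
    sa.sa_sigaction = handler;
    sigaction(SIGSEGV, &sa, 0);              /* SIGSEGV is signal 11 here */

    *addr = 5;                               /* deliberate fault */
    return 0;
}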
====================================
As root, this gives the correct output:
Signal 11, Addr 0xd000000
As normal user, I get
Signal 11, Addr (nil)
I already supposed this to be a feature of SELinux. But I can't see how, and
SELinux seems to be installed, but disabled. Looking at Suse's kernel source,
I can't find anything (in arch/i386/mm/fault.c and kernel/signal.c).
Any idea?
Josef
>
> Note the second sigaction which sets SEGV to SIG_DFL which is complete
> nonsense.
>
> Tom
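The test program above prints only `si_addr`. A handler can also look at `si_code` to judge whether `si_addr` is expected to be meaningful at all. The following is a minimal sketch, assuming POSIX `siginfo_t` semantics; `fault_addr_is_meaningful` is a hypothetical helper, and accepting only `SEGV_MAPERR` and `SEGV_ACCERR` is a simplifying assumption of the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/*
 * Return 1 if si_addr can be expected to hold the faulting address,
 * 0 otherwise (for example when the signal was sent with kill()).
 */
int fault_addr_is_meaningful(const siginfo_t *si)
{
    assert(si != NULL);

    if (si->si_signo != SIGSEGV)
        return 0;
    return si->si_code == SEGV_MAPERR      /* address not mapped */
        || si->si_code == SEGV_ACCERR;     /* mapped, but access not permitted */
}
```

In the Valgrind traces at the start of the thread, the working run as root reports si_code=1, which is `SEGV_MAPERR` on Linux, while the failing run as a normal user reports si_code=0, which is the value of `SI_USER` on Linux, together with a fault address of 0.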
|
|
From: Tom H. <th...@cy...> - 2004-11-18 18:34:28
|
In message <200...@gm...>
Josef Weidendorfer <Jos...@gm...> wrote:
> I get (as normal user):
> ...
> munmap(0xb031a000, 4096) = 0
> rt_sigaction(SIGSEGV, {SIG_DFL}, {0xb0037f77, ~[KILL STOP], SA_RESTORER|
> SA_STACK|SA_RESTART|SA_SIGINFO, 0xb0074c1e}, 8) = 0
> rt_sigprocmask(SIG_SETMASK, ~[SEGV], ~[ILL BUS FPE KILL SEGV STOP], 8) = 0
> getpid() = 15972
> tgkill(15972, 15972, SIGSEGV) = 0
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV +++
That's completely different. That is the trace that you get when
valgrind decides it can't handle the signal and decides to kill
itself instead.
> Look at this small prog:
> ==============================
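> #include <signal.h>
> #include <stdio.h>    /* sprintf */
> #include <stdlib.h>   /* exit */
> #include <string.h>   /* strlen */
> #include <unistd.h>   /* write */
>
> char buf[1024];
>
> /* Print the signal number and the fault address reported by the kernel. */
> void handler(int s, siginfo_t *i, void *p)
> {
>     (void)p;
>     sprintf(buf, "Signal %d, Addr %p\n", s, i->si_addr);
>     write(1, buf, strlen(buf));
>     exit(1);
> }
>
> int main(void)
> {
>     struct sigaction sa;
>     volatile int *addr = (int *)0xd000000;   /* assumed to be unmapped */
>
>     sa.sa_flags = SA_SIGINFO;
>     sigemptyset(&sa.sa_mask);                /* do not leave the mask uninitialized */
>     sa.sa_sigaction = handler;
>     sigaction(SIGSEGV, &sa, 0);              /* SIGSEGV is signal 11 here */
>
>     *addr = 5;                               /* deliberate fault */
>     return 0;
> }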
> ====================================
> As root, this gives the correct output:
> Signal 11, Addr 0xd000000
>
> As normal user, I get
> Signal 11, Addr (nil)
Well that's completely bogus. If the kernel doesn't give us the
right fault address then nothing is going to work.
> I already supposed this to be a feature of SELinux. But I can't see how, and
> SELinux seems to be installed, but disabled. Looking at Suse's kernel source,
> I can't find anything (in arch/i386/mm/fault.c and kernel/signal.c).
I'm running FC3 which has SELinux turned on and I don't have this
problem so it looks like something with the SuSE kernel.
Are you looking at the SuSE kernel source? or the generic source
from kernel.org?
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Josef W. <Jos...@gm...> - 2004-11-18 19:54:55
|
On Thursday 18 November 2004 19:34, Tom Hughes wrote:
> In message <200...@gm...>
...
> > rt_sigprocmask(SIG_SETMASK, ~[SEGV], ~[ILL BUS FPE KILL SEGV STOP], 8) = 0
> > getpid() = 15972
> > tgkill(15972, 15972, SIGSEGV) = 0
> > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > +++ killed by SIGSEGV +++
>
> That's completely different. That is the trace that you get when
> valgrind decides it can't handle the signal and decides to kill
> itself instead.

Yes. If valgrind sees the faulting address of 0, it passes the segfault on to
the client.

> > As root, this gives the correct output:
> > Signal 11, Addr 0xd000000
> >
> > As normal user, I get
> > Signal 11, Addr (nil)
>
> Well that's completely bogus. If the kernel doesn't give us the
> right fault address then nothing is going to work.
>
> > I already supposed this to be a feature of SELinux. But I can't see how,
> > and SELinux seems to be installed, but disabled. Looking at Suse's kernel
> > source, I can't find anything (in arch/i386/mm/fault.c and
> > kernel/signal.c).
>
> I'm running FC3 which has SELinux turned on and I don't have this
> problem so it looks like something with the SuSE kernel.

OK. One possibility less.

> Are you looking at the SuSE kernel source? or the generic source
> from kernel.org?

I look into /usr/src/linux-2.6.8-24.3.
And I did a grep on the full source for si_addr.
Hmmm.... I found a candidate:
I did a diff of vanilla 2.6.9 and the Suse9.2 2.6.8-24, and found this:
=============================================
--- linux-2.6.9/arch/i386/mm/fault.c 2004-10-18 23:53:06.000000000 +0200
+++ linux-2.6.8-24.3/arch/i386/mm/fault.c 2004-10-26 18:50:30.000000000 +0200
@@ -227,7 +227,7 @@ asmlinkage void do_page_fault(struct pt_
 	__asm__("movl %%cr2,%0":"=r" (address));

 	if (notify_die(DIE_PAGE_FAULT, "page fault", regs, error_code, 14,
-					SIGSEGV) == NOTIFY_STOP)
+					SIGSEGV) == NOTIFY_OK)
===============================================
Googling, this change is part of a "kprobes-exceptions-nofitier-fix", included
in 2.6.9rc2, which isn't in the Suse kernel.

But I am not sure if that's really the problem. I have no idea why this
changes segfault behaviour between root and normal user.

I think I will run plain 2.6.9 instead, and wait for the next kernel in a
Suse online update... ;-(

BTW: This is a kernel from the last Suse online update. Perhaps the original
one from the DVD does not show the problem.

Josef

>
> Tom
|
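To make the one-token difference in the diff above concrete, here is a toy model of the two comparisons. It is a sketch under stated assumptions and says nothing about whether this is the real cause: the constants are reproduced from memory of the kernel's include/linux/notifier.h (whether the SuSE 2.6.8 tree defines them identically is an assumption), `takes_guarded_branch` and the `variant` enum are hypothetical names, and what the guarded statement does, as well as what the registered notifiers actually return, is not shown in the diff.

```c
#include <assert.h>

/* Notifier return values, as recalled from include/linux/notifier.h
 * (assumption: unchanged between the two trees being compared). */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

enum variant { VANILLA_2_6_9, SUSE_2_6_8_24 };

/* Return 1 if the `if (notify_die(...) == X)` test in do_page_fault()
 * would be true for the given notifier result, 0 otherwise. */
int takes_guarded_branch(enum variant v, int notifier_result)
{
    assert(v == VANILLA_2_6_9 || v == SUSE_2_6_8_24);

    if (v == VANILLA_2_6_9)
        return notifier_result == NOTIFY_STOP;   /* the '-' line of the diff */
    return notifier_result == NOTIFY_OK;         /* the '+' line of the diff */
}
```

The point of the model is only that a notifier result of `NOTIFY_OK` takes the guarded branch in the SuSE variant but not in the vanilla one, so the two kernels can diverge on exactly the same input.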