|
From: Eyal L. <ey...@ey...> - 2005-01-15 08:17:11
|
--2005-01-15 18:51:19.459 15031-- disInstr: unhandled instruction bytes: 0xF4 0xEB 0xDE 0xC7
--2005-01-15 18:51:19.459 15031-- at 0x1BEB1E23: abort (in /lib/tls/libc-2.3.2.so)

vg built off latest cvs, on Debian testing.

$ gcc --version
gcc (GCC) 3.3.5 (Debian 1:3.3.5-5)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Any idea?

--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
If attaching .zip rename to .dat
|
|
From: Naveen K. <g_n...@ya...> - 2005-07-20 21:48:36
|
Hi all,
On solaris I'm getting the following error when
running an app using valgrind.
vex x86->IR: unhandled instruction bytes: 0xF8 0x2A
0x7 0x8B
The code that triggered this was
f8 clc
2a 07 subb (%edi),%al
8b fa movl %edx,%edi
I looked at the vex code priv/guest-x86/toIR.c and the
case for 0xF8 (CLC) seems to be commented out. Is
there a reason for this ? Is this a simple fix where I
can uncomment the lines out and tweak the statements
or is there something else ?
Thanks
Naveen
|
|
From: Julian S. <js...@ac...> - 2005-07-20 22:03:31
|
> On solaris I'm getting the following error when
> running an app using valgrind.

Uh, did I miss something? You're running the 3.X line on
Solaris x86 ?

> vex x86->IR: unhandled instruction bytes: 0xF8 0x2A
> 0x7 0x8B
>
> The code that triggered this was
> f8 clc
> 2a 07 subb (%edi),%al
> 8b fa movl %edx,%edi
>
> I looked at the vex code priv/guest-x86/toIR.c and the
> case for 0xF8 (CLC) seems to be commented out. Is
> there a reason for this ?

Yes -- that code is from the old UCode JIT. Flag handling in
x86 vex is completely different. I'll look into it.

J
|
|
From: Nicholas N. <nj...@cs...> - 2005-07-20 22:12:19
|
On Wed, 20 Jul 2005, Julian Seward wrote:

>> On solaris I'm getting the following error when
>> running an app using valgrind.
>
> Uh, did I miss something? You're running the 3.X line on
> Solaris x86 ?

(: Naveen's the one who's been working on a Solaris port. Sounds like
he's making some progress...

N
|
|
From: Julian S. <js...@ac...> - 2005-07-21 15:50:45
|
> vex x86->IR: unhandled instruction bytes: 0xF8 0x2A
> 0x7 0x8B
Fixed in vex r1284. The attached test program should now
work properly.
What is the current state of valgrind-3 on x86-solaris?
J
#include <stdio.h>
int x0, x1, x2, x3, x4;
extern void foo ( void );
asm("\n"
".global foo\n"
"foo:\n"
"\txorl %eax,%eax\n"
"\tpushfl\n"
"\tpopl x0\n"
"\tstc\n"
"\tpushfl\n"
"\tpopl x1\n"
"\tclc\n"
"\tpushfl\n"
"\tpopl x2\n"
"\tcmc\n"
"\tpushfl\n"
"\tpopl x3\n"
"\tcmc\n"
"\tpushfl\n"
"\tpopl x4\n"
"\tret\n"
);
int main ( void )
{
const int M = 0xFFFF; /* don't want to see the ID flag, bit 21 */
foo();
printf("0x%x 0x%x 0x%x 0x%x 0x%x\n", x0&M, x1&M, x2&M, x3&M, x4&M);
return 0;
}
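
As a rough guide to what the test program should print, here is a minimal sketch that models only the carry flag (bit 0 of EFLAGS) across the same instruction sequence. The names `apply_cf` and `expected_cf` are made up for illustration and are not part of vex. The real output also contains other flag bits (for example those set by the `xorl`), so only bit 0 of each printed value is predicted here.

```c
/* Bit 0 of EFLAGS is the carry flag (CF). */
#define CF_MASK 0x1u

/* The four instructions in the test that affect CF (hypothetical enum). */
enum cf_op { OP_XOR_SELF, OP_STC, OP_CLC, OP_CMC };

/* Return the flags value after one instruction, modelling CF only. */
unsigned apply_cf(unsigned eflags, enum cf_op op)
{
    switch (op) {
    case OP_XOR_SELF:           /* xorl %eax,%eax always clears CF */
    case OP_CLC:                /* clc clears CF */
        return eflags & ~CF_MASK;
    case OP_STC:                /* stc sets CF */
        return eflags | CF_MASK;
    case OP_CMC:                /* cmc complements CF */
        return eflags ^ CF_MASK;
    }
    return eflags;
}

/* Fill out[0..4] with the CF bit expected in x0..x4 of the test program. */
void expected_cf(unsigned out[5])
{
    static const enum cf_op seq[5] = {
        OP_XOR_SELF, OP_STC, OP_CLC, OP_CMC, OP_CMC
    };
    unsigned eflags = 0;
    int i;

    for (i = 0; i < 5; i++) {
        eflags = apply_cf(eflags, seq[i]);
        out[i] = eflags & CF_MASK;
    }
}
```

Under this model, bit 0 of x0..x4 should read 0, 1, 0, 1, 0.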
|
|
From: Jeremy F. <je...@go...> - 2005-01-15 09:16:35
|
On Sat, 2005-01-15 at 19:17 +1100, Eyal Lebedinsky wrote:
> --2005-01-15 18:51:19.459 15031-- disInstr: unhandled instruction bytes: 0xF4 0xEB 0xDE 0xC7
> --2005-01-15 18:51:19.459 15031-- at 0x1BEB1E23: abort (in /lib/tls/libc-2.3.2.so)
>
> vg built off latest cvs, on Debian testing.
>
> $ gcc --version
> gcc (GCC) 3.3.5 (Debian 1:3.3.5-5)
> Copyright (C) 2003 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

This looks like
hlt
jmp -34

Hm, and since it's in abort(), perhaps it's deliberately doing it to kill
itself.

Hm, but it does suggest it failed to kill itself with SIGABRT first.

What does running with --trace-syscalls=yes or --trace-signals=yes say?

J
|
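
The reading of the bytes 0xF4 0xEB 0xDE as "hlt; jmp -34" can be checked by hand: 0xF4 is the one-byte `hlt` opcode, 0xEB is a short jump, and its displacement byte 0xDE is a signed 8-bit value. A minimal sketch, with hypothetical helper names that are not taken from Valgrind:

```c
#include <stdint.h>

/* Interpret the displacement byte of a short jump (opcode 0xEB)
   as a signed 8-bit value, without relying on implementation-defined
   narrowing conversions. */
int rel8_displacement(uint8_t byte)
{
    return byte >= 0x80 ? (int)byte - 256 : (int)byte;
}

/* Name the two opcode bytes seen in the log; anything else is out of
   scope for this sketch. */
const char *mnemonic(uint8_t opcode)
{
    switch (opcode) {
    case 0xF4: return "hlt";
    case 0xEB: return "jmp rel8";
    default:   return "(not covered by this sketch)";
    }
}
```

Here `rel8_displacement(0xDE)` yields -34, since 0xDE is 222 and 222 - 256 = -34.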
|
From: Eyal L. <ey...@ey...> - 2005-01-15 09:59:30
Attachments:
ssan3sv.log
|
Jeremy Fitzhardinge wrote:
> On Sat, 2005-01-15 at 19:17 +1100, Eyal Lebedinsky wrote:
>
>>--2005-01-15 18:51:19.459 15031-- disInstr: unhandled instruction bytes: 0xF4 0xEB 0xDE 0xC7
>>--2005-01-15 18:51:19.459 15031-- at 0x1BEB1E23: abort (in /lib/tls/libc-2.3.2.so)
>>
>>vg built off latest cvs, on Debian testing.
>>
>>$ gcc --version
>>gcc (GCC) 3.3.5 (Debian 1:3.3.5-5)
>>Copyright (C) 2003 Free Software Foundation, Inc.
>>This is free software; see the source for copying conditions. There is NO
>>warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
>
> This looks like
> hlt
> jmp -34
>
> Hm, and since it's in abort(), perhaps it's deliberately doing it to kill
> itself.
>
> Hm, but it does suggest it failed to kill itself with SIGABRT first.
>
> What does running with --trace-syscalls=yes or --trace-signals=yes say?
Running as follows, log attached.
valgrind --tool=memcheck \
--leak-check=yes \
--show-reachable=no \
--num-callers=32 \
--error-limit=no \
--run-libc-freeres=no \
--trace-syscalls=yes \
The executable is a server. When launched it immediately forks itself,
and the parent only takes care of restarting the child if it dies. It
seems as if the child died just after launch and the parent exited
cleanly.
Naturally, there are a number of threads running even when the server
is idle. For example, my main() is actually a thread, the real main
that crt calls is practically empty, doing just a thread_create and
_join.
I should say that I avoided using cvs for a few months now since I
had problems spawning processes (I follow branch 2.2.0). I decided
to give head a try after reading of the progress done in the
threading area.
--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
If attaching .zip rename to .dat
|
|
From: Jeremy F. <je...@go...> - 2005-01-15 11:17:25
|
On Sat, 2005-01-15 at 20:59 +1100, Eyal Lebedinsky wrote:
> SYSCALL[29282,2](120) special:sys_clone ( 1200011, 0x0, 0x0, 0x0, 0x1D190BF8 )sys_fork ( ) fork: process 29282 created child 29303
> --> 0 (0x0)
> --> 29303 (0x7277)
> SYSCALL[29282,2]( 7) mayBlock:sys_waitpid ( 29303, 0x1D1909B4, 0 ) --> ...
> SYSCALL[29303,2]( 4) mayBlock:sys_write ( 2, 0x1BFC3378, 635 ) --> ...
> SYSCALL[29303,2]( 4) --> 635 (0x27B)
> SYSCALL[29303,2](175) special:sys_rt_sigprocmask ( 1, 0x1D1906A4, 0x0, 8 ) --> 0 (0x0)
> SYSCALL[29303,2](270):sys_tgkill ( 29282, 29300, 6 )==2005-01-15 20:56:02.227 29282==

This looks like an assertion failure in the child process very shortly
after a fork. It kills threadgroup 29282.

> ==2005-01-15 20:56:02.227 29282== Process terminating with default action of signal 6 (SIGABRT)
> ==2005-01-15 20:56:02.228 29282== at 0x1BE59511: (within /lib/tls/libpthread-0.60.so)
> ==2005-01-15 20:56:02.228 29282==
> ==2005-01-15 20:56:02.228 29282== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 25 from 1)
> ==2005-01-15 20:56:02.228 29282== malloc/free: in use at exit: 10008 bytes in 26 blocks.
> ==2005-01-15 20:56:02.228 29282== malloc/free: 34 allocs, 8 frees, 14241 bytes allocated.
> ==2005-01-15 20:56:02.228 29282== For counts of detected errors, rerun with: -v
> ==2005-01-15 20:56:02.228 29282== searching for pointers to 26 not-freed blocks.
> ==2005-01-15 20:56:02.489 29282== checked 24775060 bytes.
> ==2005-01-15 20:56:02.490 29282==
> ==2005-01-15 20:56:02.490 29282== 68 bytes in 1 blocks are possibly lost in loss record 4 of 16
> ==2005-01-15 20:56:02.490 29282== at 0x1B906FE5: calloc (vg_replace_malloc.c:175)
> ==2005-01-15 20:56:02.490 29282== by 0x1B8F25A8: (within /lib/ld-2.3.2.so)
> ==2005-01-15 20:56:02.490 29282== by 0x1B8F287B: _dl_allocate_tls (in /lib/ld-2.3.2.so)
> ==2005-01-15 20:56:02.490 29282== by 0x1BE5424A: allocate_stack (in /lib/tls/libpthread-0.60.so)
> ==2005-01-15 20:56:02.490 29282== by 0x1BE53C54: pthread_create@@GLIBC_2.1 (in /lib/tls/libpthread-0.60.so)
> ==2005-01-15 20:56:02.490 29282== by 0x1BD1CC1F: ??? (thread.c:793)
> ==2005-01-15 20:56:02.490 29282== by 0x1BD1DD18: skthcr (thread.c:1401)
> ==2005-01-15 20:56:02.490 29282== by 0x1BCFA06B: skmain (main.c:1237)
> ==2005-01-15 20:56:02.490 29282== by 0x804A4B9: main (ssan3sv.c:319)
> ==2005-01-15 20:56:02.490 29282==
> ==2005-01-15 20:56:02.490 29282==
> ==2005-01-15 20:56:02.490 29282== 268 bytes in 1 blocks are definitely lost in loss record 7 of 16
> ==2005-01-15 20:56:02.490 29282== at 0x1B9065C1: malloc (vg_replace_malloc.c:130)
> ==2005-01-15 20:56:02.490 29282== by 0x1BD37D95: suwmal (malloc.c:46)
> ==2005-01-15 20:56:02.490 29282== by 0x1BCF7579: ??? (log.c:298)
> ==2005-01-15 20:56:02.490 29282== by 0x1BCF78D0: sksetp (log.c:520)
> ==2005-01-15 20:56:02.490 29282== by 0x1BD033A2: skredi (redir.c:217)
> ==2005-01-15 20:56:02.490 29282== by 0x1BCF9E20: skmain (main.c:1124)
> ==2005-01-15 20:56:02.490 29282== by 0x804A4B9: main (ssan3sv.c:319)
> ==2005-01-15 20:56:02.490 29282==
> ==2005-01-15 20:56:02.490 29282== LEAK SUMMARY:
> ==2005-01-15 20:56:02.490 29282== definitely lost: 268 bytes in 1 blocks.
> ==2005-01-15 20:56:02.490 29282== possibly lost: 68 bytes in 1 blocks.
> ==2005-01-15 20:56:02.490 29282== still reachable: 9672 bytes in 24 blocks.
> ==2005-01-15 20:56:02.490 29282== suppressed: 0 bytes in 0 blocks.
> ==2005-01-15 20:56:02.490 29282== Reachable blocks (those to which a pointer was found) are not shown.
> ==2005-01-15 20:56:02.490 29282== To see them, rerun with: --show-reachable=yes
> --> 0 (0x0)
> SYSCALL[29303,2](174) special:sys_rt_sigaction ( 6, 0x1D1905DC, 0x0, 8 ) --> 0 (0x0)
> SYSCALL[29303,2]( 91):sys_munmap ( 0x1BE3B000, 4096 ) --> 0 (0x0)
> SYSCALL[29303,2](270):sys_tgkill ( 29282, 29300, 6 ) --> 0 (0x0)

And this looks like a bug, but I'm not sure whose. 29303, the process
created by the fork(), is trying to send itself SIGABRT with tgkill,
after (presumably) using sigaction to set SIGABRT to SIG_DFL. But it's
using the threadgroup id of its parent along with its own pid, which
should cause it to fail with ESRCH. It fails to do the kill, because
there's no pending SIGABRT, but the syscall doesn't return an error
code.

So, it could be a libc bug (for using a bad tgkill after fork()), a
Valgrind bug (for confusing the client into using a bad tgkill), and/or
a kernel bug (for not making tgkill return an error code when it
should).

Because the SIGABRT doesn't get delivered to 29303, it carries on to the
following instruction, which is hlt, and so this crash.

So the real question is why there's an assertion failure?

Are you positive that your processes are not crashing like this outside
of Valgrind?

> --2005-01-15 20:56:02.496 29303-- disInstr: unhandled instruction bytes: 0xF4 0xEB 0xDE 0xC7
> --2005-01-15 20:56:02.496 29303-- at 0x1BEB1E23: abort (in /lib/tls/libc-2.3.2.so)
> ==2005-01-15 20:56:02.496 29303==
> ==2005-01-15 20:56:02.496 29303== Process terminating with default action of signal 4 (SIGILL)
> ==2005-01-15 20:56:02.496 29303== Illegal operand at address 0xB00896F0
> ==2005-01-15 20:56:02.496 29303== at 0x1BEB1E23: abort (in /lib/tls/libc-2.3.2.so)
> ==2005-01-15 20:56:02.496 29303==
> ==2005-01-15 20:56:02.496 29303== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 25 from 1)
> ==2005-01-15 20:56:02.496 29303== malloc/free: in use at exit: 10008 bytes in 26 blocks.
> ==2005-01-15 20:56:02.496 29303== malloc/free: 38 allocs, 12 frees, 15977 bytes allocated.
> ==2005-01-15 20:56:02.496 29303== For counts of detected errors, rerun with: -v
> ==2005-01-15 20:56:02.496 29303== searching for pointers to 26 not-freed blocks.
> [29267] end: Sat Jan 15 20:56:02 EST 2005
> ==2005-01-15 20:56:02.665 29303== checked 24771384 bytes.
> ==2005-01-15 20:56:02.665 29303==
> ==2005-01-15 20:56:02.665 29303== 68 bytes in 1 blocks are possibly lost in loss record 4 of 16
> ==2005-01-15 20:56:02.665 29303== at 0x1B906FE5: calloc (vg_replace_malloc.c:175)
> ==2005-01-15 20:56:02.665 29303== by 0x1B8F25A8: (within /lib/ld-2.3.2.so)
> ==2005-01-15 20:56:02.665 29303== by 0x1B8F287B: _dl_allocate_tls (in /lib/ld-2.3.2.so)
> ==2005-01-15 20:56:02.665 29303== by 0x1BE5424A: allocate_stack (in /lib/tls/libpthread-0.60.so)
> ==2005-01-15 20:56:02.665 29303== by 0x1BE53C54: pthread_create@@GLIBC_2.1 (in /lib/tls/libpthread-0.60.so)
> ==2005-01-15 20:56:02.665 29303== by 0x1BD1CC1F: ??? (thread.c:793)
> ==2005-01-15 20:56:02.665 29303== by 0x1BD1DD18: skthcr (thread.c:1401)
> ==2005-01-15 20:56:02.665 29303== by 0x1BCFA06B: skmain (main.c:1237)
> ==2005-01-15 20:56:02.665 29303== by 0x804A4B9: main (ssan3sv.c:319)
> ==2005-01-15 20:56:02.665 29303==
> ==2005-01-15 20:56:02.665 29303==
> ==2005-01-15 20:56:02.665 29303== 268 bytes in 1 blocks are definitely lost in loss record 7 of 16
> ==2005-01-15 20:56:02.665 29303== at 0x1B9065C1: malloc (vg_replace_malloc.c:130)
> ==2005-01-15 20:56:02.665 29303== by 0x1BD37D95: suwmal (malloc.c:46)
> ==2005-01-15 20:56:02.665 29303== by 0x1BCF7579: ??? (log.c:298)
> ==2005-01-15 20:56:02.665 29303== by 0x1BCF78D0: sksetp (log.c:520)
> ==2005-01-15 20:56:02.665 29303== by 0x1BD033A2: skredi (redir.c:217)
> ==2005-01-15 20:56:02.665 29303== by 0x1BCF9E20: skmain (main.c:1124)
> ==2005-01-15 20:56:02.665 29303== by 0x804A4B9: main (ssan3sv.c:319)
> ==2005-01-15 20:56:02.665 29303==
> ==2005-01-15 20:56:02.665 29303== LEAK SUMMARY:
> ==2005-01-15 20:56:02.665 29303== definitely lost: 268 bytes in 1 blocks.
> ==2005-01-15 20:56:02.665 29303== possibly lost: 68 bytes in 1 blocks.
> ==2005-01-15 20:56:02.665 29303== still reachable: 9672 bytes in 24 blocks.
> ==2005-01-15 20:56:02.665 29303== suppressed: 0 bytes in 0 blocks.
> ==2005-01-15 20:56:02.665 29303== Reachable blocks (those to which a pointer was found) are not shown.
> ==2005-01-15 20:56:02.665 29303== To see them, rerun with: --show-reachable=yes

J
|
|
From: Eyal L. <ey...@ey...> - 2005-01-15 11:56:20
Attachments:
zz33.sh
|
Jeremy Fitzhardinge wrote:
[trim]
> So the real question is why there's an assertion failure?
>
> Are you positive that your processes are not crashing like this outside
> of Valgrind?

Yes, it works just fine. The crash happens when the server is starting, and the
plain run has no problems running about 300 regression tests (some rather elaborate).

To make it simple, I am attaching a small program that demonstrates the problem.
I posted it to the list a while ago, but it never hurts to do it again.

This should make it easier to diagnose and (hopefully) fix.

--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
If attaching .zip rename to .dat
|
|
From: Jeremy F. <je...@go...> - 2005-01-16 01:13:41
|
On Sat, 2005-01-15 at 22:56 +1100, Eyal Lebedinsky wrote:
> Jeremy Fitzhardinge wrote:
> [trim]
> > So the real question is why there's an assertion failure?
> >
> > Are you positive that your processes are not crashing like this outside
> > of Valgrind?
>
> Yes, it works just fine. The crash happens when the server is starting, and the
> plain run has no problems running about 300 regression tests (some rather elaborate).
>
> To make it simple, I am attaching a small program that demonstrates the problem.
> I posted it to the list a while ago, but it never hurts to do it again.
>
> This should make it easier to diagnose and (hopefully) fix.
Unfortunately this works fine for me. Could you send me:
* the full output of a failing run
* uname -a and glibc version
* distro?
* strace -f output too
Thanks.
Oh, is there a bug filed on this?
J
|
|
From: Eyal L. <ey...@ey...> - 2005-01-16 21:42:55
Attachments:
vglogs.tar.bz2
|
[original was too big - compress it]
[now the zip file was rejected - try tar]
========================
Jeremy Fitzhardinge wrote:
> On Sat, 2005-01-15 at 22:56 +1100, Eyal Lebedinsky wrote:
>
>>Jeremy Fitzhardinge wrote:
>>[trim]
>>
>>>So the real question is why there's an assertion failure?
>>>
>>>Are you positive that your processes are not crashing like this outside
>>>of Valgrind?
>>
>>Yes, it works just fine. The crash happens when the server is starting, and the
>>plain run has no problems running about 300 regression tests (some rather elaborate).
>>
>>To make it simple, I am attaching a small program that demonstrates the problem.
>>I posted it to the list a while ago, but it never hurts to do it again.
>>
>>This should make it easier to diagnose and (hopefully) fix.
>
> Unfortunately this works fine for me. Could you send me:
> * the full output of a failing run
zz33.log
> * uname -a and glibc version
$ uname -a
Linux e7 2.6.10-ac9 #1 SMP Fri Jan 14 08:56:38 EST 2005 i686 GNU/Linux
This was tested with other kernel versions, same results.
$ gcc --version
gcc (GCC) 3.3.5 (Debian 1:3.3.5-5)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ldd zz33
libpthread.so.0 => /lib/tls/libpthread.so.0 (0xb7fc7000)
libc.so.6 => /lib/tls/libc.so.6 (0xb7e92000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xb7fea000)
$ ls -l /lib/tls/libc-2.3.2.so /lib/tls/libpthread.so.0
-rw-r--r-- 1 root root 1253924 Dec 27 13:41 /lib/tls/libc-2.3.2.so
lrwxrwxrwx 1 root root 18 Jan 5 10:54 /lib/tls/libpthread.so.0 -> libpthread-0.60.so
> * distro?
Debian testing
> * strace -f output too
The vanilla run is zz33-strace-raw.log
The valgrind run is zz33-strace-vg.log
> Thanks.
>
> Oh, is there a bug filed on this?
No
--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
If attaching .zip rename to .dat
|
|
From: Jeremy F. <je...@go...> - 2005-01-17 01:20:55
|
On Sun, 2005-01-16 at 22:28 +1100, Eyal Lebedinsky wrote:
> zz33: ../nptl/sysdeps/unix/sysv/linux/fork.c:132: __libc_fork:
> Assertion `({ __typeof (self->tid) __value; if (sizeof (__value) == 1)
> asm volatile ("movb %%gs:%P2,%b0" : "=q" (__value) : "0" (0),
> "i" (((size_t) &((struct pthread *)0)->tid))); else if (sizeof
> (__value) == 4) asm volatile ("movl %%gs:%P1,%0" : "=r" (__value) :
> "i" (((size_t) &((struct pthread *)0)->tid))); else { if (sizeof
> (__value) != 8) abort (); asm volatile ("movl %%gs:%P1,%%eax\n\t"
> "movl %%gs:%P2,%%edx" : "=A" (__value) : "i" (((size_t) &((struct
> pthread *)0)->tid)), "i" (((size_t) &((struct pthread *)0)->tid) +
> 4)); } __value; }) != ppid' failed.
OK, this looks like the real problem.
Currently, when the sys_clone wrapper sees a clone() which is actually a
fork, it ends up using the fork() syscall instead. However, this
doesn't do the extra things that clone() can do, like writing the parent
and/or child pid into memory, which is what this assert checks for.
However, my glibc does this too, so I don't understand why I (and
everyone else with NPTL glibc) don't see this too. Anyway, I'll look
at this in a bit more detail now.
In the meantime as a workaround, try using LD_ASSUME_KERNEL=2.4.0, which
will use LinuxThreads instead, and should be OK with the current code.
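
To make the clone-versus-fork point concrete, here is a minimal sketch that decodes the flags word from the earlier syscall trace, `sys_clone ( 1200011, ... )`, assuming that value is hexadecimal. The constants mirror the Linux x86 header values but carry an `SK_` prefix to mark them as local to this example, and the helper names are hypothetical; this is not the code in Valgrind's sys_clone wrapper.

```c
/* Local copies of the Linux x86 values (prefixed to avoid clashing
   with the real <sched.h> / <signal.h> definitions). */
#define SK_CSIGNAL              0x000000ffUL  /* exit-signal mask */
#define SK_CLONE_VM             0x00000100UL  /* share address space */
#define SK_CLONE_CHILD_CLEARTID 0x00200000UL  /* clear tid in child on exit */
#define SK_CLONE_CHILD_SETTID   0x01000000UL  /* write tid into child memory */
#define SK_SIGCHLD              17

/* Signal sent to the parent when the child exits. */
int exit_signal(unsigned long flags)
{
    return (int)(flags & SK_CSIGNAL);
}

/* A clone() that does not share memory and reports SIGCHLD behaves
   like fork() as far as process creation goes. */
int is_fork_like(unsigned long flags)
{
    return !(flags & SK_CLONE_VM) && exit_signal(flags) == SK_SIGCHLD;
}

/* Non-zero if the caller also expects the kernel to maintain a tid
   word in the child's memory, which a plain fork() does not do. */
int wants_child_tid(unsigned long flags)
{
    return (flags & (SK_CLONE_CHILD_SETTID | SK_CLONE_CHILD_CLEARTID)) != 0;
}
```

For 0x1200011 both `is_fork_like` and `wants_child_tid` are true, which is the combination described above: a fork-like clone() that still relies on the tid being written into memory.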
J
|
|
From: Jeremy F. <je...@go...> - 2005-01-17 03:18:44
|
On Sun, 2005-01-16 at 17:20 -0800, Jeremy Fitzhardinge wrote:
> Currently, when the sys_clone wrapper sees a clone() which is actually a
> fork, it ends up using the fork() syscall instead. However, this
> doesn't do the extra things that clone() can do, like writing the parent
> and/or child pid into memory, which is what this assert checks for.

OK, I just checked in a fix for this, so give it a go.

J
|
|
From: Eyal L. <ey...@ey...> - 2005-01-17 08:45:11
|
Jeremy Fitzhardinge wrote:
> On Sun, 2005-01-16 at 17:20 -0800, Jeremy Fitzhardinge wrote:
>
>>Currently, when the sys_clone wrapper sees a clone() which is actually a
>>fork, it ends up using the fork() syscall instead. However, this
>>doesn't do the extra things that clone() can do, like writing the parent
>>and/or child pid into memory, which is what this assert checks for.
>
> OK, I just checked in a fix for this, so give it a go.

It works, thanks!

The tests last many hours, but if I still see any funnies I will report back.

--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
If attaching .zip rename to .dat
|
|
From: Eyal L. <ey...@ey...> - 2005-01-17 13:39:32
Attachments:
zz34.sh
|
Jeremy Fitzhardinge wrote:
> On Sun, 2005-01-16 at 17:20 -0800, Jeremy Fitzhardinge wrote:
>
>>Currently, when the sys_clone wrapper sees a clone() which is actually a
>>fork, it ends up using the fork() syscall instead. However, this
>>doesn't do the extra things that clone() can do, like writing the parent
>>and/or child pid into memory, which is what this assert checks for.
>
>
> OK, I just checked in a fix for this, so give it a go.
>
> J
I still see unusual reports every time I spawn a process. The references to
'status' are flagged as:
==2005-01-18 00:03:38.848 23963== Thread 2:
==2005-01-18 00:03:38.848 23963== Conditional jump or move depends on uninitialised value(s)
==2005-01-18 00:03:38.848 23963== at 0x1BEE2FCF: skspwn (spawn.c:477)
==2005-01-18 00:03:38.848 23963==
==2005-01-18 00:03:38.848 23963== Thread 2:
==2005-01-18 00:03:38.848 23963== Conditional jump or move depends on uninitialised value(s)
==2005-01-18 00:03:38.848 23963== at 0x1BEE2FF3: skspwn (spawn.c:479)
==2005-01-18 00:03:38.848 23963==
==2005-01-18 00:03:38.848 23963== Thread 2:
==2005-01-18 00:03:38.848 23963== Conditional jump or move depends on uninitialised value(s)
==2005-01-18 00:03:38.848 23963== at 0x1BEE3034: skspwn (spawn.c:484)
pid = fork();
if (0 == pid) { /* child */
handle child (exec)
}
if (pid < 0) {
handle failure
}
rc = waitpid (pid, &status, 0);
if (-1 == rc)
...
477 if (WIFEXITED (status)) {
rc = WEXITSTATUS (status); /* child ok */
479 if (rc) {
...
484 another use of 'rc'
The "uninitialised value" then progresses to the caller etc.
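
For anyone wanting to try the pattern, a self-contained version of the outline above might look like the following sketch. It is not the real skspwn() code: `spawn_and_wait` is a made-up name, `_exit()` stands in for the exec in the child, and the retry on EINTR is an addition that the outline does not show.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; return the child's exit status,
   or -1 if the fork or wait failed or the child was killed by a signal. */
int spawn_and_wait(int code)
{
    int status = 0;
    pid_t rc;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: stands in for the exec */

    /* Retry if a signal such as SIGCHLD interrupts the wait. */
    do {
        rc = waitpid(pid, &status, 0);
    } while (rc == -1 && errno == EINTR);
    if (rc == -1)
        return -1;

    if (WIFEXITED(status))
        return WEXITSTATUS(status);     /* child exited normally */
    return -1;                          /* child was killed by a signal */
}
```

The reads of `status` in WIFEXITED and WEXITSTATUS correspond to lines 477 and 479 of the outline, where the warnings are reported.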
But this is not my real problem. Attached is a program that
aborts when I use waitpid() in place of wait(). This is my
current showstopper.
--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
If attaching .zip rename to .dat
|
|
From: Jeremy F. <je...@go...> - 2005-01-17 17:23:45
|
On Tue, 2005-01-18 at 00:39 +1100, Eyal Lebedinsky wrote:
> But this is not my real problem. Attached is a program that
> aborts when I use waitpid() in place of wait(). This is my
> current showstopper.

Can you send the output of the program aborting? zz34 works for me.

J
|
|
From: Eyal L. <ey...@ey...> - 2005-01-17 22:29:31
|
Jeremy Fitzhardinge wrote:
> On Tue, 2005-01-18 at 00:39 +1100, Eyal Lebedinsky wrote:
>
>>But this is not my real problem. Attached is a program that
>>aborts when I use waitpid() in place of wait(). This is my
>>current showstopper.
>
>
> Can you send the output of the program aborting? zz34 works for me.

I just pulled the latest cvs, it made no difference (it included only
some suppressions). However this made me see that I made a minor change
in my tree a long time ago, which is required for my tests.

# cvs diff include/tool.h.base
Index: include/tool.h.base
===================================================================
RCS file: /home/kde/valgrind/include/tool.h.base,v
retrieving revision 1.19
diff -r1.19 tool.h.base
53c53
< #define VG_N_THREADS 100
---
> #define VG_N_THREADS 500

--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
If attaching .zip rename to .dat
|
|
From: Eyal L. <ey...@ey...> - 2005-01-17 22:12:38
Attachments:
logs.tar.bz2
zz34.sh
|
Jeremy Fitzhardinge wrote:
> On Tue, 2005-01-18 at 00:39 +1100, Eyal Lebedinsky wrote:
>
>>But this is not my real problem. Attached is a program that
>>aborts when I use waitpid() in place of wait(). This is my
>>current showstopper.
>
>
> Can you send the output of the program aborting? zz34 works for me.

Naturally. Strangely enough, this morning it would not behave in the same way.

abort.log is a copy+paste off my xterm of the abort I had last night.
uninited.log is the log I get this morning with the error (but no crash).

I will try and reproduce the crash and see what it depends on.

I should say that my original test did run OK (with the uninited errors) for
a while (26 tests were fine) before programs started crashing (every single one).
So there is something that triggers this problem, maybe the uninited error
is an indication that something may behave unpredictably.

--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
If attaching .zip rename to .dat
|
|
From: Jeremy F. <je...@go...> - 2005-01-17 22:50:25
|
On Tue, 2005-01-18 at 09:12 +1100, Eyal Lebedinsky wrote:
> abort.log is a copy+paste off my xterm of the abort I had last night.

It just quietly died with SIGSEGV? And then kept doing that once it
started? Very odd.

J
|
|
From: Jeremy F. <je...@go...> - 2005-01-19 03:05:00
|
On Tue, 2005-01-18 at 09:12 +1100, Eyal Lebedinsky wrote:
> uninited.log is the log I get this morning with the error (but no crash).

I just checked in a fix for this. If wait*() was interrupted by
SIGCHLD, Valgrind would fail to correctly note that *status had been
written to. It shouldn't have caused any functional problems though.

J
|
|
From: Josef W. <Jos...@gm...> - 2005-01-19 10:13:29
|
On Monday 17 January 2005 23:47, Jeremy Fitzhardinge wrote:
> On Tue, 2005-01-18 at 09:12 +1100, Eyal Lebedinsky wrote:
> > abort.log is a copy+paste off my xterm of the abort I had last night.
>
> It just quietly died with SIGSEGV? And then kept doing that once it
> started? Very odd.

Is this on Suse 9.2 ?
Sometimes on Suse 9.2 (every kernel until now, currently 2.6.8-24.10) the
kernel starts to give back the wrong faulting address to the Segfault handler
(always address 0 instead of the real one). This of course kills valgrind.
This behaviour is on user basis. Strangely, if you log out and in again it
works again...

Josef

>
> J
|
|
From: Jeremy F. <je...@go...> - 2005-01-19 10:33:40
|
On Tue, 2005-01-18 at 22:49 +0100, Josef Weidendorfer wrote:
> On Monday 17 January 2005 23:47, Jeremy Fitzhardinge wrote:
> > On Tue, 2005-01-18 at 09:12 +1100, Eyal Lebedinsky wrote:
> > > abort.log is a copy+paste off my xterm of the abort I had last night.
> >
> > It just quietly died with SIGSEGV? And then kept doing that once it
> > started? Very odd.
>
> Is this on Suse 9.2 ?
> Sometimes on Suse 9.2 (every kernel until now, currently 2.6.8-24.10) the
> kernel starts to give back the wrong faulting address to the Segfault handler
> (always address 0 instead of the real one). This of course kills valgrind.
> This behaviour is on user basis. Strangely, if you log out and in again it
> works again...

I've noticed that with the stock 2.6.10 FC3 kernel as well. It fixed
itself after a short period of time, which doesn't make me feel any
better... The kernel.org kernels seem fine.

J
|