|
From: Konstantin S. <kon...@gm...> - 2009-01-20 13:29:21
|
Hello,

I am experiencing a problem with valgrind: the memcheck process
becomes zombie at the very end of the program execution.
This happens with the fresh valgrind trunk.

I run
valgrind -v --trace-syscalls=yes my_program
and it hangs one out of 5-10 runs.

'top' shows:
7806 me 18 0 0 0 0 Z 0 0.0 0:16.27 memcheck <defunct>

The last lines printed by valgrind before it hangs are:
SYSCALL[7806,4]( 9) sys_mmap ( 0x0, 69632, 7, 98, -1, 0 ) --> [pre-success] Success(0xbb6b000)
SYSCALL[7806,4]( 10) sys_mprotect ( 0xbb6b000, 4096, 0 )[sync] --> Success(0x0)
SYSCALL[7806,4]( 56) sys_clone ( 3d0f00, 0xbb7a1d0, 0xbb7b9f0, 0xbb7b9f0, 0xbb7b960 ) --> [pre-success] Success(0x1f93)
SYSCALL[7806,4]( 96) sys_gettimeofday ( 0xb75bd70, 0x0 )[sync] --> Success(0x0)
SYSCALL[7806,4]( 96) sys_gettimeofday ( 0xb75bac0, 0x0 )[sync] --> Success(0x0)
SYSCALL[7806,4]( 96) sys_gettimeofday ( 0xb75bd70, 0x0 )[sync] --> Success(0x0)
SYSCALL[7806,4]( 96) sys_gettimeofday ( 0xb75bac0, 0x0 )[sync] --> Success(0x0)
SYSCALL[7806,4](202) sys_futex ( 0x8427820, 0, 0, 0xb75bb20, 0x8427820 ) --> [async] ...
SYSCALL[7806,4](202) ... [async] --> Failure(0x6e)
SYSCALL[7806,4]( 96) sys_gettimeofday ( 0xb75bac0, 0x0 )[sync] --> Success(0x0)
SYSCALL[7806,4]( 96) sys_gettimeofday ( 0xb75bd70, 0x0 )[sync] --> Success(0x0)
SYSCALL[7806,4]( 96) sys_gettimeofday ( 0xb75bac0, 0x0 )[sync] --> Success(0x0)
SYSCALL[7806,4](202) sys_futex ( 0x8427820, 0, 0, 0xb75bb20, 0x8427820 ) --> [async] ...
SYSCALL[7806,1]( 11) sys_munmap ( 0x8437000, 8192 )[sync] --> Success(0x0)
SYSCALL[7806,1](231) exit_group( 0 ) --> [pre-success] Success(0x0)
SYSCALL[7806,5](186) sys_gettid ()[sync] --> Success(0x1f93)
SYSCALL[7806,5](202) sys_futex ( 0x8427f20, 0, 0, 0x0, 0x8427f20 ) --> [async] ...
SYSCALL[7806,4](202) ... [async] --> Failure(0x6e)

If valgrind passes, it does not print lines containing "Failure(0x6e)"

Any idea how to attack this problem further?

Thanks,

--kcc |
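Since the hang shows up only once in 5-10 runs, one way to count how often it occurs is to run the command repeatedly under a timeout. The following is a minimal sketch, not part of the original report: the helper name `count_hangs`, the run count, the timeout values, and the `./my_program` path are all assumptions chosen for illustration.

```python
import subprocess
import sys


def count_hangs(cmd, runs=10, timeout_s=60.0):
    """Run cmd `runs` times and return how many runs exceeded timeout_s."""
    hangs = 0
    for _ in range(runs):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if the command has not finished within timeout_s seconds.
            subprocess.run(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs


if __name__ == "__main__":
    # Hypothetical default command; pass the real one on the command line.
    cmd = sys.argv[1:] or ["valgrind", "-v", "--trace-syscalls=yes", "./my_program"]
    print(count_hangs(cmd, runs=20, timeout_s=300.0), "of 20 runs timed out")
```

The timeout has to be comfortably longer than a normal run under valgrind, otherwise slow but successful runs are counted as hangs.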
|
From: Konstantin S. <kon...@gm...> - 2009-01-30 16:02:00
|
Any idea?

--kcc |
|
From: Julian S. <js...@ac...> - 2009-01-30 16:40:27
|
> > SYSCALL[7806,5](202) sys_futex ( 0x8427f20, 0, 0, 0x0, 0x8427f20 ) -->
> > [async] ...
> > SYSCALL[7806,4](202) ... [async] --> Failure(0x6e)
> >
> > If valgrind passes, it does not print lines containing "Failure(0x6e)"
> >
> > Any idea how to attack this problem further?

This is with a completely vanilla, unmodified Valgrind, yes?

Figure out what 0x6E means, so we can see why sys_futex is failing.
It's an Exxxx value (eg, ENOSYS, etc) logically from /usr/include/sys/errno.h.
Some of the values are defined in ./include/vki/vki*.h, but I can't find 0x6E
(110) there.

J |
|
From: Konstantin S. <kon...@gm...> - 2009-01-30 16:47:45
|
On Fri, Jan 30, 2009 at 7:39 PM, Julian Seward <js...@ac...> wrote:
>
>> > SYSCALL[7806,5](202) sys_futex ( 0x8427f20, 0, 0, 0x0, 0x8427f20 ) -->
>> > [async] ...
>> > SYSCALL[7806,4](202) ... [async] --> Failure(0x6e)
>> >
>> > If valgrind passes, it does not print lines containing "Failure(0x6e)"
>> >
>> > Any idea how to attack this problem further?
>
> This is with a completely vanilla, unmodified Valgrind, yes?

yes, I verified it with today's trunk.

>
> Figure out what 0x6E means, so we can see why sys_futex is failing.
> It's an Exxxx value (eg, ENOSYS, etc) logically from /usr/include/sys/errno.h.
> Some of the values are defined in ./include/vki/vki*.h, but I can't find 0x6E
> (110) there.

/usr/grte/v1//include/linux/errno.h:#define ETIMEDOUT 110 /* Connection timed out */

>
> J
> |
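The header lookup above can also be done without grepping, using Python's standard `errno` and `os` modules. This is a minimal sketch; the helper name `describe_errno` is made up for illustration, and errno numbering is platform-specific, so the result only matches the header quoted above when run on a comparable Linux system.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)


if __name__ == "__main__":
    code = 0x6E  # the value printed as Failure(0x6e) in the trace, i.e. 110
    name, message = describe_errno(code)
    print(code, name, message)
```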
|
From: Julian S. <js...@ac...> - 2009-01-30 16:52:21
|
> /usr/grte/v1//include/linux/errno.h:#define ETIMEDOUT 110
> /* Connection timed out */

Then it's a timing-related problem? What happens if you run with
--tool=none, can you still reproduce it?

J |
|
From: Konstantin S. <kon...@gm...> - 2009-01-30 17:59:49
|
On Fri, Jan 30, 2009 at 7:51 PM, Julian Seward <js...@ac...> wrote:
>
>> /usr/grte/v1//include/linux/errno.h:#define ETIMEDOUT 110
>> /* Connection timed out */
>
> Then it's a timing-related problem? What happens if you run with
> --tool=none, can you still reproduce it?

Same thing...

kcc 29787 0.1 0.0 0 0 pts/11 Zl+ 19:57 0:06 [none] <defunct> |