|
From: Jeremy F. <je...@go...> - 2003-10-14 17:30:11
|
Hi all,
I've checked in all my system calls rework of the last few months onto
CVS HEAD. This patch revamps the way Valgrind handles signals and
syscalls by eliminating the need to turn all blocking syscalls into
non-blocking. It does this by creating a worker task for each thread
(ProxyLWP is the term I use) which executes blocking syscalls on the
behalf of that thread. It also handles signals, since it needs to be
the same thread to get signals interrupting syscalls properly.
My intention with this change was to make Valgrind support the full
range of Posix signal behaviour. It should now handle all the strange
corners of signal handling, including the SA_SIGINFO flag, blocking
signals around a handler, and so on.
There are a couple of minor improvements which make day-to-day use
better as well:
* It now reports fatal signals rather than just quietly dying; if
the signal is a core-dumper, it also shows the stack backtrace.
* When reporting the use of a bad FD, -v also makes it show the
backtrace to the site of the bad usage (making the message
actually useful)
* Lots of other little things
My target for this patch was Linux 2.6. I've tried as hard as possible
to make Linux 2.4 work well, but there are a few deficient corners - I
suspect no real programs will encounter them (especially if they ever
worked with the previous signals/syscalls implementation). RedHat's
version of 2.4 includes a number of 2.6 changes, and so should work more
like 2.6 than 2.4.
This is a large, complex change. I've been testing it pretty
extensively, but I expect there's still some bugs in there (I found one
just before checking in). Please sync to CVS HEAD and try it out on
your favorite programs.
New command-line options:
* --signal-polltime=<time> - on 2.4, Valgrind has to poll for
pending signals to deliver them to the appropriate threads. By
default this is every 50ms.
* --lowlat-signals=no|yes - normally the thread scheduler applies
strict a round-robin policy, but if you have a thread blocking
on signals competing against lots of CPU-bound threads, it could
take a while before it gets run. This option makes the
scheduler start running the signalled thread immediately.
* --lowlat-syscalls=no|yes - this is the same as lowlat-signals,
except that it wakes the thread quickly when a syscall unblocks.
J
|
|
From: Tom H. <th...@cy...> - 2003-10-19 15:03:54
|
In message <106...@ix...>
Jeremy Fitzhardinge <je...@go...> wrote:
> This is a large, complex change. I've been testing it pretty
> extensively, but I expect there's still some bugs in there (I found one
> just before checking in). Please sync to CVS HEAD and try it out on
> your favorite programs.
Your commit has finally shown up in the anonymous CVS tree and it
all seems to be fine so far, except that valgrind fails unless you
have a large soft limit on file descriptors, as follows:
audi [~] % valgrind ls
==2035== valgrind: failed to move logfile fd into safe range
==2035== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
==2035== Copyright (C) 2002-2003, and GNU GPL'd, by Julian Seward.
==2035== Using valgrind-20030725, a program supervision framework for x86-linux.
==2035== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
valgrind: vg_mylibc.c:1229 (vgPlain_safe_fd): Assertion `newfd > (1024 - (100*2 + 4))' failed.
==2035== at 0x4016E66B: ???
==2035== by 0x4016E66A: ???
==2035== by 0x4016E6C5: ???
==2035== by 0x4016E7B8: ???
Looking at the code this is because the code that sets up the proxy
threads tries to dup descriptors up to a high number but my descriptor
limit was set to 256 by default.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Jeremy F. <je...@go...> - 2003-10-19 16:43:59
|
On Sun, 2003-10-19 at 07:38, Tom Hughes wrote: > In message <106...@ix...> > Jeremy Fitzhardinge <je...@go...> wrote: > > > This is a large, complex change. I've been testing it pretty > > extensively, but I expect there's still some bugs in there (I found one > > just before checking in). Please sync to CVS HEAD and try it out on > > your favorite programs. > > Your commit has finally shown up in the anonymous CVS tree and it > all seems to be fine so far, except that valgrind fails unless you > have a large soft limit on file descriptors, as follows: Thanks, I was wondering when this would turn up as a problem. I guess I'll have to try and raise the soft limit to at least 1024. J |
|
From: Tom H. <th...@cy...> - 2003-10-19 23:29:19
|
In message <106...@ix...>
Jeremy Fitzhardinge <je...@go...> wrote:
> On Sun, 2003-10-19 at 07:38, Tom Hughes wrote:
>
> > Your commit has finally shown up in the anonymous CVS tree and it
> > all seems to be fine so far, except that valgrind fails unless you
> > have a large soft limit on file descriptors, as follows:
>
> Thanks, I was wondering when this would turn up as a problem. I guess
> I'll have to try and raise the soft limit to at least 1024.
The only problem with that is, of course, that it makes it impossible
to use valgrind on any sort of problem where the number of available
file descriptors affects the behaviour of the program.
I think that probably what needs to be done is something along the
following lines:
- Add a switch (--descriptor-limit?) to control the descriptor
limit that valgrind will try and select
- Make valgrind raise the soft limit to the descriptor limit
chosen by the user using the switch, or 1024 if no switch
- Make the safe_fd routine start allocating FDs from some offset
below the chosen soft limit rather than a fixed point
That should make everything work reasonably well by default whilst
still allowing the user to reduce the limit on the rare occasions
when it is necessary to provoke a bug.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Nicholas N. <nj...@ca...> - 2003-10-20 10:07:17
|
On Tue, 14 Oct 2003, Jeremy Fitzhardinge wrote: > This is a large, complex change. I've been testing it pretty > extensively, but I expect there's still some bugs in there (I found one > just before checking in). Please sync to CVS HEAD and try it out on > your favorite programs. --trace-malloc=yes seems to be broken. Running on 'date', I get hundreds of these messages: ==12538== Warning: bad use of file descriptor 821 in syscall write() ==12538== Use --logfile-fd=<number> to select an alternative logfile fd. but no --trace-malloc output. Kernel 2.4.19 again, glibc-2.3.2. N |
|
From: Jeremy F. <je...@go...> - 2003-10-20 18:20:19
|
On Mon, 2003-10-20 at 01:27, Nicholas Nethercote wrote: > On Tue, 14 Oct 2003, Jeremy Fitzhardinge wrote: > > > This is a large, complex change. I've been testing it pretty > > extensively, but I expect there's still some bugs in there (I found one > > just before checking in). Please sync to CVS HEAD and try it out on > > your favorite programs. > > --trace-malloc=yes seems to be broken. Running on 'date', I get hundreds > of these messages: > > ==12538== Warning: bad use of file descriptor 821 in syscall write() > ==12538== Use --logfile-fd=<number> to select an alternative logfile fd. > > but no --trace-malloc output. Kernel 2.4.19 again, glibc-2.3.2. Looks like a bug in --trace-malloc - its printfs seem to be being run in the virtual machine rather than the real one... I think Robert's VALGRIND_PRINTF patch might be in order. J |
|
From: Tom H. <th...@cy...> - 2003-10-21 15:31:41
|
In message <yek...@au...>
Tom Hughes <th...@cy...> wrote:
> My real problem seems to be that the waitpid is sometimes returning a
> nonsense error of -512 at the system call level. I've written a little
> test program, which is attached, and when it is run under valgrind
> this is what I see:
So the -512 is actually ERESTARTSYS but that shouldn't be making it
back to my program - either the system call should be restarted for me
or I should get EINTR back.
Bizarrely if I install a signal handler for SIGCHLD then although
valgrind still shows ERESTARTSYS as the result of the wait4 system
call the waitpid library routine gives me a zero result, without it
having restarted the system call.
Without the signal handler I get ERESTARTSYS back from the waitpid
library routine.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Tom H. <th...@cy...> - 2003-10-21 18:16:54
|
In message <106...@ix...>
Jeremy Fitzhardinge <je...@go...> wrote:
> This is a large, complex change. I've been testing it pretty
> extensively, but I expect there's still some bugs in there (I found one
> just before checking in). Please sync to CVS HEAD and try it out on
> your favorite programs.
I've another problem now - all attempts to send SIGINT or SIGKILL seem
to be suppressed for some reason. We have some code which does this:
kill( child, SIGKILL );
waitpid( child, &status, 0 );
Unfortunately this is now hanging because the pre-syscall action for
kill looks like this:
PRE(kill)
{
/* int kill(pid_t pid, int sig); */
MAYBE_PRINTF("kill ( %d, %d )\n", arg1,arg2);
if (arg2 == VKI_SIGVGINT || arg2 == VKI_SIGVGKILL)
res = -VKI_EINVAL;
}
Because this suppresses SIGKILL the waitpid then hangs...
Actually there is something even more bizarre than that happening
as we only actually do the waitpid if the kill succeeds, but we still
seem to be hanging in waitpid, as shown:
SYSCALL[14042,1]( 37):kill ( 14080, 9 )
SYSCALL[14042,1](114) blocking:wait4 ( 14080, 0xBFFF69F0, 0, 0x0 )
Anyway, the point is that it's highly antisocial to mysteriously
break kill for certain signals, and I can't see an obvious reason
for it.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Jeremy F. <je...@go...> - 2003-10-22 00:26:26
|
On Tue, 2003-10-21 at 04:04, Tom Hughes wrote:
> Unfortunately this is now hanging because the pre-syscall action for
> kill looks like this:
>
> PRE(kill)
> {
> /* int kill(pid_t pid, int sig); */
> MAYBE_PRINTF("kill ( %d, %d )\n", arg1,arg2);
> if (arg2 == VKI_SIGVGINT || arg2 == VKI_SIGVGKILL)
> res = -VKI_EINVAL;
> }
>
> Because this suppresses SIGKILL the waitpid then hangs...
No, it suppresses SIG*VG*INT and SIG*VG*KILL, which is are magic
internal signals I'm using (32 and 33).
I'm still trying to reproduce your ERESTARTSYS problem.
J
|
|
From: Tom H. <th...@cy...> - 2003-10-21 23:07:00
|
In message <1066768769.4354.2.camel@localhost.localdomain>
Jeremy Fitzhardinge <je...@go...> wrote:
> On Tue, 2003-10-21 at 04:04, Tom Hughes wrote:
> > Unfortunately this is now hanging because the pre-syscall action for
> > kill looks like this:
> >
> > PRE(kill)
> > {
> > /* int kill(pid_t pid, int sig); */
> > MAYBE_PRINTF("kill ( %d, %d )\n", arg1,arg2);
> > if (arg2 == VKI_SIGVGINT || arg2 == VKI_SIGVGKILL)
> > res = -VKI_EINVAL;
> > }
> >
> > Because this suppresses SIGKILL the waitpid then hangs...
>
> No, it suppresses SIG*VG*INT and SIG*VG*KILL, which is are magic
> internal signals I'm using (32 and 33).
Yes I realised that shortly after I sent that which is what led to
the later posts pinpointing waitpid as the problem.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Tom H. <th...@cy...> - 2003-10-21 23:10:05
Attachments:
sigkill.c
|
In message <1066774658.4409.2.camel@localhost.localdomain>
Jeremy Fitzhardinge <je...@go...> wrote:
> OK, there seems to be some slight timing difference between 2.4-RH and
> 2.6, or something. Anyway, I think this is the right fix - can you test
> it out?
I'm running on a dual processor box which may change things given
that there are at least three threads of control running here.
That patch certainly improves things - it fixes the case where there is
no handler installed for the signal. The modified test program that is
attached to this message, and which installs a SIGCHLD handler, exhibits
a different problem however.
Specifically the wait4 system call exists with ERESTARTSYS but is isn't
restarted and the waitpid library call returns 114 which is the number of
the wait4 system call, as shown here:
==9444== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
==9444== Copyright (C) 2002-2003, and GNU GPL'd, by Julian Seward.
==9444== Using valgrind-20030725, a program supervision framework for x86-linux.
==9444== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
--9444-- sys_wait_results: got PX_SetSigmask for TID 1
--9444-- sys_wait_results: got PX_SetSigmask for TID 1
==9444== Estimated CPU clock rate is 2005 MHz
==9444== For more details, rerun with: -v
==9444==
SYSCALL[9444,1](174) special:sigaction ( 17, 0xBFFFD420, 0x0 )
SYSCALL[9444,1]( 2):fork ()
fork: process 9444 created child 9451
SYSCALL[9444,1]( 37):kill ( 9451, 9 )
SYSCALL[9444,1](114) blocking:wait4 ( 9451, 0xBFFFD504, 0, 0x0 )
--9444-- sys_wait_results: got PX_Signal for TID 1, signal 17
--9444-- sys_wait_results: got PX_RunSyscall for TID 1: syscall 114 result -512
--9444-- sys_wait_results: got PX_SetSigmask for TID 1
SYSCALL[9444,1](197):fstat64 ( 1, 0xBFFFCD20 )
SYSCALL[9444,1](192):mmap2 ( 0x0, 4096, 3, 34, -1, 0 )
==9444== Use of uninitialised value of size 4
==9444== at 0x4027C94F: _IO_vfprintf_internal (in /lib/i686/libc-2.3.2.so)
==9444== by 0x40285311: _IO_printf (in /lib/i686/libc-2.3.2.so)
==9444== by 0x8048668: main (in /home/thh/vgtest/sigkill)
==9444== by 0x40248A06: __libc_start_main (in /lib/i686/libc-2.3.2.so)
==9444==
==9444== Conditional jump or move depends on uninitialised value(s)
==9444== at 0x4027C957: _IO_vfprintf_internal (in /lib/i686/libc-2.3.2.so)
==9444== by 0x40285311: _IO_printf (in /lib/i686/libc-2.3.2.so)
==9444== by 0x8048668: main (in /home/thh/vgtest/sigkill)
==9444== by 0x40248A06: __libc_start_main (in /lib/i686/libc-2.3.2.so)
==9444==
==9444== Conditional jump or move depends on uninitialised value(s)
==9444== at 0x4027C020: _IO_vfprintf_internal (in /lib/i686/libc-2.3.2.so)
==9444== by 0x40285311: _IO_printf (in /lib/i686/libc-2.3.2.so)
==9444== by 0x8048668: main (in /home/thh/vgtest/sigkill)
==9444== by 0x40248A06: __libc_start_main (in /lib/i686/libc-2.3.2.so)
==9444==
==9444== Conditional jump or move depends on uninitialised value(s)
==9444== at 0x4027C07E: _IO_vfprintf_internal (in /lib/i686/libc-2.3.2.so)
==9444== by 0x40285311: _IO_printf (in /lib/i686/libc-2.3.2.so)
==9444== by 0x8048668: main (in /home/thh/vgtest/sigkill)
==9444== by 0x40248A06: __libc_start_main (in /lib/i686/libc-2.3.2.so)
==9444==
==9444== Conditional jump or move depends on uninitialised value(s)
==9444== at 0x4027C0AF: _IO_vfprintf_internal (in /lib/i686/libc-2.3.2.so)
==9444== by 0x40285311: _IO_printf (in /lib/i686/libc-2.3.2.so)
==9444== by 0x8048668: main (in /home/thh/vgtest/sigkill)
==9444== by 0x40248A06: __libc_start_main (in /lib/i686/libc-2.3.2.so)
==9444==
==9444== Conditional jump or move depends on uninitialised value(s)
==9444== at 0x4027C0E0: _IO_vfprintf_internal (in /lib/i686/libc-2.3.2.so)
==9444== by 0x40285311: _IO_printf (in /lib/i686/libc-2.3.2.so)
==9444== by 0x8048668: main (in /home/thh/vgtest/sigkill)
==9444== by 0x40248A06: __libc_start_main (in /lib/i686/libc-2.3.2.so)
==9444==
==9444== Conditional jump or move depends on uninitialised value(s)
==9444== at 0x4027C10C: _IO_vfprintf_internal (in /lib/i686/libc-2.3.2.so)
==9444== by 0x40285311: _IO_printf (in /lib/i686/libc-2.3.2.so)
==9444== by 0x8048668: main (in /home/thh/vgtest/sigkill)
==9444== by 0x40248A06: __libc_start_main (in /lib/i686/libc-2.3.2.so)
SYSCALL[9444,1]( 4) blocking:write ( 1, 0x40230000, 40 )
Child 114 exited with status 1073831928
--9444-- sys_wait_results: got PX_RunSyscall for TID 1: syscall 4 result 40
SYSCALL[9444,1]( 91):munmap ( 0x40230000, 4096 )
--9444-- Caught __NR_exit; running __libc_freeres()
SYSCALL[9444,1]( 91):munmap ( 0x0, 0 )
--9444-- __libc_freeres() done; really quitting!
==9444==
==9444== ERROR SUMMARY: 25 errors from 7 contexts (suppressed: 0 from 0)
==9444== malloc/free: in use at exit: 0 bytes in 0 blocks.
==9444== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==9444== For a detailed leak analysis, rerun with: --leak-check=yes
==9444== For counts of detected errors, rerun with: -v
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Jeremy F. <je...@go...> - 2003-10-22 01:49:35
|
On Tue, 2003-10-21 at 16:08, Tom Hughes wrote:
> In message <1066774658.4409.2.camel@localhost.localdomain>
> Jeremy Fitzhardinge <je...@go...> wrote:
>
> > OK, there seems to be some slight timing difference between 2.4-RH and
> > 2.6, or something. Anyway, I think this is the right fix - can you test
> > it out?
>
> I'm running on a dual processor box which may change things given
> that there are at least three threads of control running here.
No, its all nicely reproducible on my UP laptop, so I think it's just
something to do with the way the kernels work. I'm not really sure why
I didn't see this under 2.6.
> That patch certainly improves things - it fixes the case where there is
> no handler installed for the signal. The modified test program that is
> attached to this message, and which installs a SIGCHLD handler, exhibits
> a different problem however.
>
> Specifically the wait4 system call exists with ERESTARTSYS but is isn't
> restarted and the waitpid library call returns 114 which is the number of
> the wait4 system call, as shown here:
Gak. I tested all this, honest. Try this (manually hacked up, so the
lines will be all off) patch.
Index: vg_proxylwp.c
===================================================================
RCS file: /cvsroot/valgrind/valgrind/coregrind/vg_proxylwp.c,v
retrieving revision 1.1
diff -c -r1.1 vg_proxylwp.c
*** vg_proxylwp.c 13 Oct 2003 22:26:54 -0000 1.1
--- vg_proxylwp.c 22 Oct 2003 00:34:49 -0000
***************
*** 714,720 ****
case PX_RunSyscall:
/* Run a syscall for our thread; results will be poked
back into tst */
! reply.syscallno = tst->m_eax;
vg_assert(px->state == PXS_WaitReq ||
px->state == PXS_SigACK);
--- 716,722 ----
case PX_RunSyscall:
/* Run a syscall for our thread; results will be poked
back into tst */
! reply.syscallno = tst->syscallno;
vg_assert(px->state == PXS_WaitReq ||
px->state == PXS_SigACK);
***************
*** 728,734 ****
reply.syscallno);
tst->m_eax = -VKI_ERESTARTSYS;
} else {
! Int syscallno = tst->m_eax;
px->state = PXS_RunSyscall;
/* If we're interrupted before we get to the syscall
--- 730,736 ----
reply.syscallno);
tst->m_eax = -VKI_ERESTARTSYS;
} else {
! Int syscallno = tst->syscallno;
px->state = PXS_RunSyscall;
/* If we're interrupted before we get to the syscall
***************
*** 1264,1269 ****
--- 1271,1279 ----
req.request = PX_RunSyscall;
+ tst->syscallno = tst->m_eax;
+ tst->m_eax = -VKI_ERESTARTSYS;
+
/* clear the results pipe before we try to write to a proxy to
prevent a deadlock */
VG_(proxy_results)();
|
|
From: Tom H. <th...@cy...> - 2003-10-22 13:12:46
|
In message <1066783120.4409.10.camel@localhost.localdomain>
Jeremy Fitzhardinge <je...@go...> wrote:
> On Tue, 2003-10-21 at 16:08, Tom Hughes wrote:
>
>> Specifically the wait4 system call exists with ERESTARTSYS but is isn't
>> restarted and the waitpid library call returns 114 which is the number of
>> the wait4 system call, as shown here:
>
> Gak. I tested all this, honest. Try this (manually hacked up, so the
> lines will be all off) patch.
That seems to fix my test case, yes.
Naturally enough that meant that it didn't fix my original problem in
our software though ;-)
Several hours chasing around and poring over linux kernel source have
led me to a sort of understanding about my actual problem and a way to
work around it in valgrind.
My original thesis that the SIGKILL wasn't making it to the child is
in fact correct, as can be seen by the fact that when the parent hangs
waiting for the child the child is still running quite happily.
What seems to happen is that there is some sort of race condition such
that if the SIGKILL is sent by the parent after the child process has
called exec() but before the new executable actually starts running
then the signal can disappear. This only seems to happen if the
process that calls exec() has more than one thread, as is the case now
that valgrind has a proxy thread running.
My money is on the culprit being de_thread() in fs/exec.c in the
kernel, which does all sorts of mucking about to kill the extra
threads and unshare the signal handling apparatus. Either the signal
is pending and is discarded when a new clean signal handling structure
is created for the thread doing the exec() or the signal is being
delivered but is only killing the other threads and not the one doing
the exec().
Either way, patching valgrind to explicitly kill the proxy thread in
the pre handler for execve seems to work around the problem... Whether
this is a safe thing to do isn't totally clear to me though.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Jeremy F. <je...@go...> - 2003-10-22 16:12:39
|
On Wed, 2003-10-22 at 04:25, Tom Hughes wrote: > My money is on the culprit being de_thread() in fs/exec.c in the > kernel, which does all sorts of mucking about to kill the extra > threads and unshare the signal handling apparatus. Either the signal > is pending and is discarded when a new clean signal handling structure > is created for the thread doing the exec() or the signal is being > delivered but is only killing the other threads and not the one doing > the exec(). Yes, that code is all very tricky. Valgrind has triggered a number of bugs in the 2.6 implementation. > Either way, patching valgrind to explicitly kill the proxy thread in > the pre handler for execve seems to work around the problem... Whether > this is a safe thing to do isn't totally clear to me though. It is, and I've made that change to fix problems with SuSE's kernel - it doesn't kill the other threads on exec, so you end up with proxy LWPs hanging around marooned in old address spaces. The patch is to replace the "tid" with "VG_INVALID_THREADID" in the VG_(nuke_all_threads_except)() line. J |
|
From: Jeremy F. <je...@go...> - 2003-10-22 21:45:43
Attachments:
execve-fix.diff
|
On Wed, 2003-10-22 at 09:01, Jeremy Fitzhardinge wrote: > It is, and I've made that change to fix problems with SuSE's kernel - it > doesn't kill the other threads on exec, so you end up with proxy LWPs > hanging around marooned in old address spaces. The patch is to replace > the "tid" with "VG_INVALID_THREADID" in the > VG_(nuke_all_threads_except)() line. Hm, this isn't actually safe, but neither was the old code. It makes a real mess of the process/valgrind state, which is OK if the execve works. But if it fails, we basically cannot continue, since we've effectively killed all the threads anyway. As a workaround, I've changed PRE(execve) to do some preflight tests to see if the execve can possibly work, and fail out early if not. The execve may still fail, but the only thing we can do is panic. J |