From: John R. <jreiser@BitWagon.com> - 2007-12-09 04:24:55
|
In source file arch/um/os-Linux/process.c there is a warning: ----- /* Don't use the glibc version, which caches the result in TLS. It misses some * syscalls, and also breaks with clone(), which does not unshare the TLS. */ int os_getpid(void) ----- I see no os_clone(), yet the glibc clone() does the same caching of pid in ThreadLocalStorage [TLS], and the TLS still may be shared. If nobody reads glibc's shared TLS slot for PID then an actual bug will be avoided. However, it is unsafe to leave such a tempting pitfall. Also, if you are ptrace()ing through a glibc clone(), then in many cases you will see syscall(__NR_getpid) *from glibc* immediately following! There is an "extra" getpid() that the tracking logic might not expect. So it seems to me that there should be an os_clone() that refrains from fiddling with getpid. [Unfortunately os_clone() is not so simple as os_getpid().] The clone() we're talking about here is _not_ the bare syscall: ----- _syscall5(int, clone, int, flags, void *, child_stack, int *, parent_tidptr, struct user_desc *, newtls, int *, child_tidptr) ----- but rather the C-language interface: ----- int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, ... /* pid_t *pid, struct user_desc *tls, pid_t *ctid */ ); ----- which is declared in arch/um/include/kern.h and referenced in arch/um/drivers/ubd_user.c arch/um/kernel/tt/tracer.c arch/um/os/tt.c arch/um/os/start_up.c arch/um/os/skas/process.c This clone() is implemented by glibc, and at runtime lives in the shared library /lib/libc.so.6. Not only that, but some versions of glibc for x86 use "int $0x80" directly only for the __NR_clone call. They use "ENTER_KERNEL" for the getpid(), which in some cases (such as Fedora 7 and 8, but not Ubuntu 7.04) expands to "call *%gs:nnnnn" which points at "sysenter; ret". -- John Reiser, jreiser@BitWagon.com |
From: Jeff D. <jd...@ad...> - 2007-12-09 15:11:05
|
On Sat, Dec 08, 2007 at 08:24:57PM -0800, John Reiser wrote: >In source file arch/um/os-Linux/process.c there is a warning: >----- >/* Don't use the glibc version, which caches the result in TLS. It misses some > * syscalls, and also breaks with clone(), which does not unshare the TLS. > */ > >int os_getpid(void) AFAIK, there are two reasons that we make sure we call the system call getpid - when we're testing ptrace, we want getpid to result in exactly one system call there was a brief bug in libc where getpid returned the wrong answer due to caching across a clone This bug did get into the wild, and people did try to run UML on such systems, so that's why the code is careful about making the system call. > I see no os_clone(), yet the glibc clone() does the same caching of pid in > ThreadLocalStorage [TLS], and the TLS still may be shared. If nobody reads > glibc's shared TLS slot for PID then an actual bug will be avoided. However, > it is unsafe to leave such a tempting pitfall. What's the actual bug, exactly? As long as libc's getpid gives us the right answer, we're happy. > Also, if you are ptrace()ing > through a glibc clone(), then in many cases you will see syscall(__NR_getpid) > *from glibc* immediately following! There is an "extra" getpid() > that the tracking logic might not expect. Where do we care about how clone translates into a system call? > arch/um/drivers/ubd_user.c > arch/um/kernel/tt/tracer.c > arch/um/os/tt.c > arch/um/os/start_up.c > arch/um/os/skas/process.c These guys all just want a new process - they don't care how it happens. Is something in here causing valgrind some trouble? Jeff -- Work email - jdike at linux dot intel dot com |
From: John R. <jreiser@BitWagon.com> - 2007-12-09 20:58:31
|
Jeff Dike wrote: > On Sat, Dec 08, 2007 at 08:24:57PM -0800, John Reiser wrote: >>I see no os_clone(), yet the glibc clone() does the same caching of pid in >>ThreadLocalStorage [TLS], and the TLS still may be shared. If nobody reads >>glibc's shared TLS slot for PID then an actual bug will be avoided. However, >>it is unsafe to leave such a tempting pitfall. > > > What's the actual bug, exactly? As long as libc's getpid gives us the > right answer, we're happy. The actual bug is unnecessary complexity, which slows down development. [And glibc's getpid still may give the wrong answer. glibc caching getpid() requires that glibc is the only implementor of fork or clone. (Obviously uml and/or valgrind *could* violate this assumption.) glibc getpid() may give the wrong answer when non-glibc code does a fork() or clone(). glibc-2.6+ getpid() also gives the wrong answer when called from a signal handler, if the signal is delivered immediately after the __NR_clone but before glibc updates its cache %gs:PID. (glibc should poison its cache before doing the __NR_clone syscall. Yeah, it's a bug in glibc. Section 2.4.3 of the Single UNIX Specification requires that getpid() be async-signal-safe, which means that getpid() may be called from a signal handler.) A virtualizer, such as valgrind, is *likely* to trigger this race.] The "normal" code within UML is not the only player who wants the right answer from getpid(). Temporary debugging code also wants the right answer. That's still "internal" to UML, so the "BEWARE!" might excuse the fact that "getpid()" gives the wrong answer. But there is also valgrind in the same new process, and "getpid()" giving the wrong answer is less excusable. It "shouldn't happen", but the number of different getpid() is growing, and remembering which one(s) are unreliable (and why), and ensuring that you aren't using one of them, becomes difficult. > > >>Also, if you are ptrace()ing >>through a glibc clone(), then in many cases you will see syscall(__NR_getpid) >>*from glibc* immediately following! There is an "extra" getpid() >>that the tracking logic might not expect. > > > Where do we care about how clone translates into a system call? check_sysemu() in arch/um/os/start_up.c cares that the actual sequence of system calls is: __NR_clone __NR_getpid presumably from ptrace_child() calling os_getpid() However, when using the clone() from glibc then the actual sequence is: __NR_clone [random system calls from glibc] __NR_getpid from ptrace_child calling os_getpid() Now it "accidentally" happens that "random system calls from glibc" is at most one syscall, and if present it is '__NR_getpid', which is the same syscall as the presumed os_getpid() from ptrace_child(). That's a "lucky" break. > > >> arch/um/drivers/ubd_user.c >> arch/um/kernel/tt/tracer.c >> arch/um/os/tt.c >> arch/um/os/start_up.c >> arch/um/os/skas/process.c > > > These guys all just want a new process - they don't care how it > happens. Not so. userspace() in arch/um/os/skas/process.c relies on stack location and signal delivery that is mediated by the combination of clone() and ptrace(). userspace() also depends on the carry-over of signal handlers, particularly the SIGSEGV ==> SIGUSR1 trampoline. "Just a new process" isn't good enough. > Is something in here causing valgrind some trouble? Yes. It is not simple for the current valgrind to "let go" of a new child. [The current valgrind knows how to "let go" only at execve().] The internal logic of valgrind requires that the creation of the new process [implemented by clone(,, ~CLONE_VM & ( ),,)] must be done with all signals blocked. Therefore the child side must do a sigprocmask() somewhere to unblock, and the uml ptrace()ing is not expecting this. I'm working on it, and moving ahead, but progress is slow. -- John Reiser, jreiser@BitWagon.com |
From: Jeff D. <jd...@ad...> - 2007-12-10 17:09:39
|
On Sun, Dec 09, 2007 at 12:58:33PM -0800, John Reiser wrote: > The actual bug is unnecessary complexity, which slows down development. We live in an imperfect world. I'd like to be able to call getpid and know that I'm always going to get the right answer. However, I don't want to have to debug the cases where it doesn't, so the current code seems like the best arrangement. > > Where do we care about how clone translates into a system call? > > check_sysemu() in arch/um/os/start_up.c cares that the actual sequence of > system calls is: > __NR_clone > __NR_getpid presumably from ptrace_child() calling os_getpid() > However, when using the clone() from glibc then the actual sequence is: > __NR_clone > [random system calls from glibc] > __NR_getpid from ptrace_child calling os_getpid() No, it doesn't. The only thing is cares about is that the system call following the kill(pid, SIGSTOP) in ptrace_child is a getpid. It cares not at all about the details of the clone. > > These guys all just want a new process - they don't care how it > > happens. > > Not so. userspace() in arch/um/os/skas/process.c relies on > stack location and signal delivery that is mediated by the combination > of clone() and ptrace(). userspace() also depends on the carry-over > of signal handlers, particularly the SIGSEGV ==> SIGUSR1 trampoline. No, it doesn't. userspace_tramp sets up its own handlers after the clone - it doesn't rely on inheriting them, and in fact it can't, since the stub's SEGV handler makes no sense in the UML kernel. It may inherit a signal mask, but as far as I'm concerned, that's a bug. As far as a stack location goes, that's part of the interface, so that's just part of creating a new process. > > Is something in here causing valgrind some trouble? > > Yes. It is not simple for the current valgrind to "let go" of > a new child. [The current valgrind knows how to "let go" only at > execve().] The internal logic of valgrind requires that the creation > of the new process [implemented by clone(,, ~CLONE_VM & ( ),,)] must be done > with all signals blocked. Therefore the child side must do a > sigprocmask() somewhere to unblock, and the uml ptrace()ing is not > expecting this. I don't see how this can be causing problems. If the child side of valgrind makes some extra system calls before allowing the child to run, I don't see UML caring about that at all. Any system call tracing is done only after seeing the child hit itself with a signal. Any extra system calls in libc's clone or the child side valgrind are long over by this point. Jeff -- Work email - jdike at linux dot intel dot com |