Thread: [uml-devel] should there be os_clone() analogous to os_getpid() ?

Brought to you by: blaisorblade, derrichard, jdike, rusty

user-mode-linux-devel

[uml-devel] should there be os_clone() analogous to os_getpid() ?

From: John R. <jreiser@BitWagon.com> - 2007-12-09 04:24:55

In source file arch/um/os-Linux/process.c there is a warning:
-----
/* Don't use the glibc version, which caches the result in TLS. It misses some
 * syscalls, and also breaks with clone(), which does not unshare the TLS.
 */

int os_getpid(void)
-----

I see no os_clone(), yet the glibc clone() does the same caching of pid in
ThreadLocalStorage [TLS], and the TLS still may be shared.  If nobody reads
glibc's shared TLS slot for PID then an actual bug will be avoided.  However,
it is unsafe to leave such a tempting pitfall.  Also, if you are ptrace()ing
through a glibc clone(), then in many cases you will see syscall(__NR_getpid)
*from glibc* immediately following!  There is an "extra" getpid()
that the tracking logic might not expect.  So it seems to me that
there should be an os_clone() that refrains from fiddling with getpid.
[Unfortunately os_clone() is not so simple as os_getpid().]


The clone() we're talking about here is _not_ the bare syscall:
-----
	_syscall5(int, clone, int, flags, void *, child_stack,
            int *, parent_tidptr, struct user_desc *, newtls,
            int *, child_tidptr)
-----
but rather the C-language interface:
-----
       int clone(int (*fn)(void *), void *child_stack,
                 int flags, void *arg, ...
                 /* pid_t *pid, struct user_desc *tls, pid_t *ctid */ );
-----
which is declared in arch/um/include/kern.h and referenced in
   arch/um/drivers/ubd_user.c
   arch/um/kernel/tt/tracer.c
   arch/um/os/tt.c
   arch/um/os/start_up.c
   arch/um/os/skas/process.c
This clone() is implemented by glibc, and at runtime lives in the shared
library /lib/libc.so.6.

Not only that, but some versions of glibc for x86 use "int $0x80" directly
only for the __NR_clone call.  They use "ENTER_KERNEL" for the getpid(),
which in some cases (such as Fedora 7 and 8, but not Ubuntu 7.04) expands to
"call *%gs:nnnnn" which points at "sysenter; ret".

-- 
John Reiser, jreiser@BitWagon.com

Re: [uml-devel] should there be os_clone() analogous to os_getpid() ?

From: Jeff D. <jd...@ad...> - 2007-12-09 15:11:05

On Sat, Dec 08, 2007 at 08:24:57PM -0800, John Reiser wrote:
>In source file arch/um/os-Linux/process.c there is a warning:
>-----
>/* Don't use the glibc version, which caches the result in TLS. It misses some
> * syscalls, and also breaks with clone(), which does not unshare the TLS.
> */
> 
>int os_getpid(void)

AFAIK, there are two reasons that we make sure we call the system call
getpid -
	when we're testing ptrace, we want getpid to result in exactly
one system call
	there was a brief bug in libc where getpid returned the wrong
answer due to caching across a clone

This bug did get into the wild, and people did try to run UML on such
systems, so that's why the code is careful about making the system call.

> I see no os_clone(), yet the glibc clone() does the same caching of pid in
> ThreadLocalStorage [TLS], and the TLS still may be shared.  If nobody reads
> glibc's shared TLS slot for PID then an actual bug will be avoided.  However,
> it is unsafe to leave such a tempting pitfall.

What's the actual bug, exactly?  As long as libc's getpid gives us the
right answer, we're happy.

> Also, if you are ptrace()ing
> through a glibc clone(), then in many cases you will see syscall(__NR_getpid)
> *from glibc* immediately following!  There is an "extra" getpid()
> that the tracking logic might not expect.

Where do we care about how clone translates into a system call?

>    arch/um/drivers/ubd_user.c
>    arch/um/kernel/tt/tracer.c
>    arch/um/os/tt.c
>    arch/um/os/start_up.c
>    arch/um/os/skas/process.c

These guys all just want a new process - they don't care how it
happens.

Is something in here causing valgrind some trouble?

				Jeff

-- 
Work email - jdike at linux dot intel dot com

Re: [uml-devel] should there be os_clone() analogous to os_getpid() ?

From: John R. <jreiser@BitWagon.com> - 2007-12-09 20:58:31

Jeff Dike wrote:
> On Sat, Dec 08, 2007 at 08:24:57PM -0800, John Reiser wrote:

>>I see no os_clone(), yet the glibc clone() does the same caching of pid in
>>ThreadLocalStorage [TLS], and the TLS still may be shared.  If nobody reads
>>glibc's shared TLS slot for PID then an actual bug will be avoided.  However,
>>it is unsafe to leave such a tempting pitfall.
> 
> 
> What's the actual bug, exactly?  As long as libc's getpid gives us the
> right answer, we're happy.

The actual bug is unnecessary complexity, which slows down development.
[And glibc's getpid still may give the wrong answer.  glibc caching getpid()
requires that glibc is the only implementor of fork or clone.
(Obviously uml and/or valgrind *could* violate this assumption.)
glibc getpid() may give the wrong answer when non-glibc code does a fork() or
clone().  glibc-2.6+ getpid() also gives the wrong answer when called
from a signal handler, if the signal is delivered immediately after the
__NR_clone but before glibc updates its cache %gs:PID.  (glibc should poison
its cache before doing the __NR_clone syscall.  Yeah, it's a bug in glibc.
Section 2.4.3 of the Single UNIX Specification requires that getpid() be
async-signal-safe, which means that getpid() may be called from a signal
handler.)  A virtualizer, such as valgrind, is *likely* to trigger this race.]

The "normal" code within UML is not the only player who wants the right answer
from getpid().  Temporary debugging code also wants the right answer.  That's
still "internal" to UML, so the "BEWARE!" might excuse the fact that "getpid()"
gives the wrong answer.  But there is also valgrind in the same new process,
and "getpid()" giving the wrong answer is less excusable.  It "shouldn't happen",
but the number of different getpid() is growing, and remembering which one(s)
are unreliable (and why), and ensuring that you aren't using one of them,
becomes difficult.

> 
> 
>>Also, if you are ptrace()ing
>>through a glibc clone(), then in many cases you will see syscall(__NR_getpid)
>>*from glibc* immediately following!  There is an "extra" getpid()
>>that the tracking logic might not expect.
> 
> 
> Where do we care about how clone translates into a system call?

check_sysemu() in arch/um/os/start_up.c cares that the actual sequence of
system calls is:
	__NR_clone
	__NR_getpid  presumably from ptrace_child() calling os_getpid()
However, when using the clone() from glibc then the actual sequence is:
	__NR_clone
	[random system calls from glibc]
	__NR_getpid  from ptrace_child calling os_getpid()
Now it "accidentally" happens that "random system calls from glibc"
is at most one syscall, and if present it is '__NR_getpid', which is the
same syscall as the presumed os_getpid() from ptrace_child().  That's
a "lucky" break.

> 
> 
>>   arch/um/drivers/ubd_user.c
>>   arch/um/kernel/tt/tracer.c
>>   arch/um/os/tt.c
>>   arch/um/os/start_up.c
>>   arch/um/os/skas/process.c
> 
> 
> These guys all just want a new process - they don't care how it
> happens.

Not so.  userspace() in arch/um/os/skas/process.c relies on
stack location and signal delivery that is mediated by the combination
of clone() and ptrace().  userspace() also depends on the carry-over
of signal handlers, particularly the SIGSEGV ==> SIGUSR1 trampoline.
"Just a new process" isn't good enough.

> Is something in here causing valgrind some trouble?

Yes.  It is not simple for the current valgrind to "let go" of
a new child.  [The current valgrind knows how to "let go" only at
execve().]  The internal logic of valgrind requires that the creation
of the new process [implemented by clone(,, ~CLONE_VM & ( ),,)] must be done
with all signals blocked.  Therefore the child side must do a
sigprocmask() somewhere to unblock, and the uml ptrace()ing is not
expecting this.  I'm working on it, and moving ahead, but
progress is slow.

-- 
John Reiser, jreiser@BitWagon.com

Re: [uml-devel] should there be os_clone() analogous to os_getpid() ?

From: Jeff D. <jd...@ad...> - 2007-12-10 17:09:39

On Sun, Dec 09, 2007 at 12:58:33PM -0800, John Reiser wrote:
> The actual bug is unnecessary complexity, which slows down development.

We live in an imperfect world.  I'd like to be able to call getpid and
know that I'm always going to get the right answer.  However, I don't
want to have to debug the cases where it doesn't, so the current code
seems like the best arrangement.

> > Where do we care about how clone translates into a system call?
> 
> check_sysemu() in arch/um/os/start_up.c cares that the actual sequence of
> system calls is:
> 	__NR_clone
> 	__NR_getpid  presumably from ptrace_child() calling os_getpid()
> However, when using the clone() from glibc then the actual sequence is:
> 	__NR_clone
> 	[random system calls from glibc]
> 	__NR_getpid  from ptrace_child calling os_getpid()

No, it doesn't.  The only thing is cares about is that the system call
following the kill(pid, SIGSTOP) in ptrace_child is a getpid.  It
cares not at all about the details of the clone.

> > These guys all just want a new process - they don't care how it
> > happens.
> 
> Not so.  userspace() in arch/um/os/skas/process.c relies on
> stack location and signal delivery that is mediated by the combination
> of clone() and ptrace().  userspace() also depends on the carry-over
> of signal handlers, particularly the SIGSEGV ==> SIGUSR1 trampoline.

No, it doesn't.  userspace_tramp sets up its own handlers after the
clone - it doesn't rely on inheriting them, and in fact it can't,
since the stub's SEGV handler makes no sense in the UML kernel.  It
may inherit a signal mask, but as far as I'm concerned, that's a bug.
As far as a stack location goes, that's part of the interface, so
that's just part of creating a new process.

> > Is something in here causing valgrind some trouble?
> 
> Yes.  It is not simple for the current valgrind to "let go" of
> a new child.  [The current valgrind knows how to "let go" only at
> execve().]  The internal logic of valgrind requires that the creation
> of the new process [implemented by clone(,, ~CLONE_VM & ( ),,)] must be done
> with all signals blocked.  Therefore the child side must do a
> sigprocmask() somewhere to unblock, and the uml ptrace()ing is not
> expecting this.

I don't see how this can be causing problems.  If the child side of
valgrind makes some extra system calls before allowing the child to
run, I don't see UML caring about that at all.

Any system call tracing is done only after seeing the child hit itself
with a signal.  Any extra system calls in libc's clone or the child
side valgrind are long over by this point.

				Jeff

-- 
Work email - jdike at linux dot intel dot com