This release is largely security changes. I'd like to thank Dartmouth
ISTS and Bill Stearns for sponsoring this work. Basically, nearly all
the changes needed to make UML a secure root jail are in now. The exception
is a hole which requires a fix on the host rather than in UML.
Before getting into the security stuff, here are the other changes in this
release:
The md config is now pulled into the UML config.
A segfault when a network interface has no IP addresses was fixed.
James McMechan's latest changes to the ubd driver are in. end_request
is now locked properly. The construction and dispatch of a request is
now much cleaner.
ubd_ioctl now calls blk_ioctl.
A stupid bug in the signal delivery code was fixed.
execve now uses KERNEL_CALL like it always should have.
The gdb init string no longer tells gdb to ignore SIGSEGV, since SIGSEGV is
no longer routed through the debugger.
OK, now for the security stuff:
When a process is in userspace, all kernel memory (with a few exceptions) -
kernel text, static data, the heap, physical memory, and kernel virtual
memory - is write-protected.
The exceptions are:
The thread private page (actually two pages, one of text and one of data).
The text is only run before the kernel boots, and the data contains
A page of static data which contains timer_on and missed_ticks. The
timer interrupt has to be able to run at any time, and it wants to
modify those two variables.
Three pages of kernel stack. These don't matter because they are
completely reinitialized before being used, so any scribbling on
them from userspace will just be overwritten.
The only /proc or /dev files that I know of that allow access to kernel memory
are /dev/mem and /dev/kmem. These have been disabled by removing CAP_SYS_RAWIO
from the bounding capability set (and thanks to the person on #kernelnewbies
who clued me in to that trick!).
UML no longer reads /proc/self/maps, so /proc is no longer required for
running UML in a chroot jail. Some device nodes may still be needed,
depending on how UML is run.
'honeypot' now enables 'jail'. It doesn't seem to make sense to run a
honeypot that isn't also a jail, so this guarantees that it is.
With 'honeypot', a number of system calls need to be treated specially
because STACK_TOP > TASK_SIZE. The process stack then lies above the
address limit that getname checks user pointers against, so getname
returns -EFAULT for any filename on the stack. To get around this, all
system calls that take filenames as arguments are made with KERNEL_DS
set (via set_fs()), which lifts that limit. Since KERNEL_DS also turns
off the kernel's normal checks on user pointers, any of those system
calls which also have output buffers have those buffers checked for
validity before the system call is made.
The one thing that still needs doing is to implement a personality on
the host which segfaults any attempts to make lcall system calls rather
than feeding them into the SysV compatibility code.