kernel virtual machine

Tracker: Bugs

5 EXPLOITABLE failure to validate cr3 after KVM_SET_SREGS - ID: 2687641
Last Update: Comment added ( mtosatti )

This applies to kvm-84 and earlier (and possibly to the in-kernel kvm
version too) on all x86 machines in all guest modes (32-bit, PAE, 64-bit).

Userspace callers of KVM_SET_SREGS can pass a bogus cr3 value to the
kernel. This triggers a NULL pointer dereference in gfn_to_rmap() the next
time userspace calls KVM_RUN on the affected VCPU, when kvm attempts to
activate the new, non-existent page table root.

This happens since kvm only validates that cr3 points to a valid guest
physical memory page when code *inside* the guest sets cr3. However, kvm
currently trusts the userspace caller (e.g. QEMU) on the host machine to
always supply a valid page table root, rather than properly validating it
along with the rest of the reloaded guest state.
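The missing check can be illustrated with a small standalone model. This is only a sketch, not kvm code: struct mem_slot, gfn_is_backed() and cr3_is_valid() are hypothetical names, and the slot layout is a simplified stand-in for however kvm tracks guest physical memory. The idea is that the page frame named by cr3 must fall inside some registered guest memory region, no matter whether the value came from the guest or from userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical, simplified stand-in for one region of guest physical memory. */
struct mem_slot {
    uint64_t base_gfn; /* first guest frame number covered by the slot */
    uint64_t npages;   /* number of pages in the slot */
};

/* Return true if some slot backs the given guest frame number. */
static bool gfn_is_backed(const struct mem_slot *slots, size_t nslots,
                          uint64_t gfn)
{
    assert(slots != NULL || nslots == 0);
    for (size_t i = 0; i < nslots; i++) {
        if (gfn >= slots[i].base_gfn &&
            gfn - slots[i].base_gfn < slots[i].npages)
            return true;
    }
    return false;
}

/* A cr3 value is acceptable only if its page frame is backed guest memory.
 * The low 12 bits of cr3 hold flags or alignment padding, not frame bits. */
static bool cr3_is_valid(const struct mem_slot *slots, size_t nslots,
                         uint64_t cr3)
{
    return gfn_is_backed(slots, nslots, cr3 >> PAGE_SHIFT);
}
```

In this model, a KVM_SET_SREGS-style path would call cr3_is_valid() before accepting the new register state, the same way the guest-initiated path already does.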

The result is extraordinarily disruptive: the kernel correctly catches the
NULL pointer, but kvm is holding numerous kernel-wide locks by that point.
This means the calling process (e.g. QEMU) can't be killed, any other
process that accesses parts of sysfs or the filesystem will hang, and in
some cases sync() itself hangs before attempting to cleanly reboot,
resulting in data loss.

This becomes an EXPLOITABLE bug when non-root users can create or
manipulate KVM VMs (i.e. /dev/kvm is g+rw, which is the default in many
distros now), and QEMU or another tool is modified to pass corrupted cr3
values to kvm.

Two remotely exploitable attack vectors are also possible:

- cr3 can be corrupted in a QEMU snapshot on disk, which will trigger the
bug when the snapshot is restored.

- cr3 can be corrupted during migration or network transfer, resulting in
the migration target host system hanging.

Fortunately, it doesn't appear this bug can be exploited for privilege
escalation, but it's clearly a possible DoS attack vector. It could also be
used to trigger data loss on the host if the attacker ensures a malicious
QEMU process is holding locks on certain files and/or sysfs entries at the
time the attack is attempted (either explicitly or via race conditions).

I've attached a patch that adds cr3 validation to mmu_alloc_roots(), so
vcpu_enter_guest() will simply return -EFAULT to userspace if it tries to
activate a cr3 value pointing to an inaccessible guest physical page. In
addition, the patch prints a syslog warning like "mmu_alloc_roots(vcpu 0):
caller attempted to load a bogus root_gfn xxxxxx".
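The control flow the patch aims for can be sketched as a userspace model. This is not the attached diff: struct toy_vcpu, toy_alloc_roots() and toy_enter_guest() are hypothetical names, guest memory is assumed to be a single range starting at gfn 0, and the warning goes to stderr rather than syslog. It only shows the shape of the fix: refuse the bad root early and hand -EFAULT back to the caller instead of touching a missing page.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical toy VCPU; guest memory is assumed to be gfns [0, guest_pages). */
struct toy_vcpu {
    int id;
    uint64_t cr3;
    uint64_t guest_pages;
    uint64_t root_gfn;
    int root_loaded;
};

/* Validate the root before using it; return -EFAULT for a bogus cr3. */
static int toy_alloc_roots(struct toy_vcpu *vcpu)
{
    assert(vcpu != NULL);
    uint64_t root_gfn = vcpu->cr3 >> 12;

    if (root_gfn >= vcpu->guest_pages) {
        fprintf(stderr,
                "toy_alloc_roots(vcpu %d): caller attempted to load a bogus "
                "root_gfn %" PRIx64 "\n", vcpu->id, root_gfn);
        return -EFAULT;
    }
    vcpu->root_gfn = root_gfn;
    vcpu->root_loaded = 1;
    return 0;
}

/* The run path propagates the error to its caller instead of crashing. */
static int toy_enter_guest(struct toy_vcpu *vcpu)
{
    int r = toy_alloc_roots(vcpu);
    if (r)
        return r;
    /* A real hypervisor would switch to guest mode here. */
    return 0;
}
```

With this shape, a caller that supplied a bad cr3 sees an error from the run call and can recover or exit, and no lock is left held across a fault.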


Matt T. Yourst ( yourst ) - 2009-03-15 12:36

5

Closed

None

Nobody/Anonymous

kernel

None

Public


Comments ( 2 )

Date: 2009-05-18 22:13
Sender: mtosatti (Project Admin)

Fixed, thanks for the report.


Date: 2009-03-16 16:07
Sender: avik (Project Admin)

Good catch. However, instead of adding extra checks, I suggest using
kvm_set_cr3(). This will reduce the number of code paths. I also suggest
setting KVM_REQ_TRIPLE_FAULT to kill the vcpu instead of injecting a #GP;
this is more similar to real hardware.




Attached File ( 1 )

Filename: kvm-84-check-bogus-cr3-after-set-sregs.diff
Description: Patch to add cr3 validation when guest is restarted after KVM_SET_SREGS

Changes ( 4 )

Field           Old Value                                             Date              By
status_id       Open                                                  2009-05-18 22:13  mtosatti
allow_comments  1                                                     2009-05-18 22:13  mtosatti
close_date      -                                                     2009-05-18 22:13  mtosatti
File Added      317901: kvm-84-check-bogus-cr3-after-set-sregs.diff   2009-03-15 12:36  yourst