|
From: Anthony L. <an...@co...> - 2008-05-07 00:30:59
|
Kay, Allen M wrote:
> Kvm kernel changes.
>
> Signed-off-by: Allen M Kay <all...@in...>
>
> ------
> arch/x86/kvm/Makefile | 2
> arch/x86/kvm/vtd.c | 183
> +++++++++++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 7 +
> include/asm-x86/kvm_host.h | 3
> include/asm-x86/kvm_para.h | 1
> include/linux/kvm_host.h | 6 +
> virt/kvm/kvm_main.c | 3
> 7 files changed, 204 insertions(+), 1 deletion(-)
>
> ------
>
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index c97d35c..b1057fb 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -12,7 +12,7 @@ EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
> kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o
> lapic.o \
> i8254.o
> obj-$(CONFIG_KVM) += kvm.o
> -kvm-intel-objs = vmx.o
> +kvm-intel-objs = vmx.o vtd.o
> obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
> kvm-amd-objs = svm.o
> obj-$(CONFIG_KVM_AMD) += kvm-amd.o
> diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
> new file mode 100644
> index 0000000..9a080b5
> --- /dev/null
> +++ b/arch/x86/kvm/vtd.c
> @@ -0,0 +1,183 @@
> +/*
> + * Copyright (c) 2006, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but
> WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
> License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License
> along with
> + * this program; if not, write to the Free Software Foundation, Inc.,
> 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + *
> + * Copyright (C) 2006-2008 Intel Corporation
> + * Author: Allen M. Kay <all...@in...>
> + * Author: Weidong Han <wei...@in...>
> + */
> +
> +#include <linux/list.h>
> +#include <linux/kvm_host.h>
> +#include <linux/pci.h>
> +#include <linux/dmar.h>
> +#include <linux/intel-iommu.h>
> +
> +//#define DEBUG
> +
> +#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
> +
> +struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev
> *dev);
> +struct dmar_domain * iommu_alloc_domain(struct intel_iommu *iommu);
> +void iommu_free_domain(struct dmar_domain *domain);
> +int domain_init(struct dmar_domain *domain, int guest_width);
> +int domain_context_mapping(struct dmar_domain *d,
> + struct pci_dev *pdev);
> +int domain_page_mapping(struct dmar_domain *domain, dma_addr_t iova,
> + u64 hpa, size_t size, int prot);
> +void detach_domain_for_dev(struct dmar_domain *domain, u8 bus, u8
> devfn);
> +struct dmar_domain * find_domain(struct pci_dev *pdev);
>
These definitely need to be moved to a common header.
> +
> +int kvm_iommu_map_pages(struct kvm *kvm,
> + gfn_t base_gfn, unsigned long npages)
> +{
> + unsigned long gpa;
> + struct page *page;
> + hpa_t hpa;
> + int j, write;
> + struct vm_area_struct *vma;
> +
> + if (!kvm->arch.domain)
> + return 1;
>
In the kernel, we should be using -errno to return error codes.
> + gpa = base_gfn << PAGE_SHIFT;
> + page = gfn_to_page(kvm, base_gfn);
> + hpa = page_to_phys(page);
>
Please use gfn_to_pfn(). Keep in mind, by using gfn_to_page/gfn_to_pfn,
you take a reference to a page. You're leaking that reference here.
> + printk(KERN_DEBUG "kvm_iommu_map_page: gpa = %lx\n", gpa);
> + printk(KERN_DEBUG "kvm_iommu_map_page: hpa = %llx\n", hpa);
> + printk(KERN_DEBUG "kvm_iommu_map_page: size = %lx\n",
> + npages*PAGE_SIZE);
> +
> + for (j = 0; j < npages; j++) {
> + gpa += PAGE_SIZE;
> + page = gfn_to_page(kvm, gpa >> PAGE_SHIFT);
> + hpa = page_to_phys(page);
>
Again, gfn_to_pfn() and you're taking a reference that I never see you
releasing.
> + domain_page_mapping(kvm->arch.domain, gpa, hpa,
> PAGE_SIZE,
> + DMA_PTE_READ | DMA_PTE_WRITE);
> + vma = find_vma(current->mm, gpa);
> + if (!vma)
> + return 1;
> + write = (vma->vm_flags & VM_WRITE) != 0;
> + get_user_pages(current, current->mm, gpa,
> + PAGE_SIZE, write, 0, NULL, NULL);
>
I don't quite see what you're doing here. It looks like you're trying
to pre-fault the page in? gfn_to_pfn will do that for you. You're
taking a bunch of references here that are never getting released.
I think the general approach here is a bit faulty. I think what we want
to do is mlock() from userspace to ensure all the memory is present for
the guest. We should combine this with MMU-notifiers such that whenever
the userspace mapping changes, we can reprogram the IOMMU. In the case
where we don't have MMU-notifiers, we simply hold on to the memory
forever and never program the IOMMU. The initial mlock() in userspace
is somewhat of a nop here but it's important with MMU-notifiers because
we will no longer be holding a reference for guest memory.
We have to ensure we don't swap KVM guest memory while using hardware
pass-through, but AFAICT, we do not need to make the memory
non-reclaimable As long as we reprogram the IOMMU with a new, valid,
mapping everything should be fine. mlock() really gives us the right
semantics.
Semantically, a PV API that supports DMA window registration simply
mlock()s the DMA regions on behalf of the guest. No special logic
should be needed.
Regards,
Anthony Liguori
|