From: Hollis B. <ho...@us...> - 2008-04-29 15:08:48
On Monday 28 April 2008 16:23:04 Jerone Young wrote:
> +/* This function is to manipulate a cell with multiple values */
> +void dt_cell_multi(void *fdt, char *node_path, char *property,
> + uint32_t *val_array, int size)
> +{
> +
> + int offset;
> + int ret;

Could you please be more careful with your whitespace?

--
Hollis Blanchard
IBM Linux Technology Center
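For comparison, the quoted declaration with conventional kernel whitespace would look something like this (a sketch; only the lines quoted above are shown, the function body is not in the quote):

    /* This function is to manipulate a cell with multiple values */
    void dt_cell_multi(void *fdt, char *node_path, char *property,
                       uint32_t *val_array, int size)
    {
            int offset;
            int ret;
            /* ... */
    }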
From: Joerg R. <joe...@am...> - 2008-04-29 15:05:24
The current KVM x86 exception code handles double and triple faults only for page fault exceptions. This patch extends this detection to every exception that gets queued for the guest.

Signed-off-by: Joerg Roedel <joe...@am...>
Cc: Jan Kiszka <jan...@si...>
---
 arch/x86/kvm/x86.c |   31 +++++++++++++++++--------------
 1 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 578a0c1..c05aa32 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -144,9 +144,21 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data)
 }
 EXPORT_SYMBOL_GPL(kvm_set_apic_base);

+static void handle_multiple_faults(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.exception.nr != DF_VECTOR) {
+		vcpu->arch.exception.nr = DF_VECTOR;
+		vcpu->arch.exception.error_code = 0;
+	} else
+		set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
+}
+
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
-	WARN_ON(vcpu->arch.exception.pending);
+	if (vcpu->arch.exception.pending) {
+		handle_multiple_faults(vcpu);
+		return;
+	}
 	vcpu->arch.exception.pending = true;
 	vcpu->arch.exception.has_error_code = false;
 	vcpu->arch.exception.nr = nr;
@@ -157,25 +169,16 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 			   u32 error_code)
 {
 	++vcpu->stat.pf_guest;
-	if (vcpu->arch.exception.pending) {
-		if (vcpu->arch.exception.nr == PF_VECTOR) {
-			printk(KERN_DEBUG "kvm: inject_page_fault:"
-			       " double fault 0x%lx\n", addr);
-			vcpu->arch.exception.nr = DF_VECTOR;
-			vcpu->arch.exception.error_code = 0;
-		} else if (vcpu->arch.exception.nr == DF_VECTOR) {
-			/* triple fault -> shutdown */
-			set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
-		}
-		return;
-	}
 	vcpu->arch.cr2 = addr;
 	kvm_queue_exception_e(vcpu, PF_VECTOR, error_code);
 }

 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
 {
-	WARN_ON(vcpu->arch.exception.pending);
+	if (vcpu->arch.exception.pending) {
+		handle_multiple_faults(vcpu);
+		return;
+	}
 	vcpu->arch.exception.pending = true;
 	vcpu->arch.exception.has_error_code = true;
 	vcpu->arch.exception.nr = nr;
--
1.5.3.7
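In short, the rule the patch implements (a summary, not code from the patch):

    /* For any exception queued while another is already pending:
     *
     *   nothing pending       -> queue the new exception
     *   pending != DF_VECTOR  -> promote to DF_VECTOR, error code 0
     *   pending == DF_VECTOR  -> set KVM_REQ_TRIPLE_FAULT (shutdown)
     *
     * Unlike real hardware, no distinction is made between benign and
     * contributory exception classes -- any second fault becomes #DF.
     */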
From: Andrea A. <an...@qu...> - 2008-04-29 14:55:09
On Tue, Apr 29, 2008 at 09:32:09AM -0500, Anthony Liguori wrote:
> +		vma = find_vma(current->mm, addr);
> +		if (vma == NULL) {
> +			get_page(bad_page);
> +			return page_to_pfn(bad_page);
> +		}

Here you must check the vm_start address: find_vma only checks addr < vm_end, but there's no guarantee that addr >= vm_start yet.

> +
> +		BUG_ON(!(vma->vm_flags & VM_IO));

For consistency we should return bad_page and not BUG on it. VM_IO and VM_PFNMAP can, in theory, not be set at the same time; otherwise get_user_pages would be buggy in checking against VM_PFNMAP|VM_IO. I doubt anybody isn't setting VM_IO before calling remap_pfn_range, but anyway...

Secondly, the really correct check is against VM_PFNMAP. This is because PFNMAP is set at the same time as vm_pgoff = pfn; VM_IO is not: in theory a driver that uses ->fault instead of remap_pfn_range shouldn't set VM_IO and should only set VM_RESERVED. VM_IO is about keeping gdb/coredump out, as they could mess with the hardware if they read; PFNMAP is about remap_pfn_range having been called, with pgoff pointing to the first pfn mapped at the vm_start address.

Patch is in the right direction, way to go!
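Put together, the check being described would look roughly like this (a sketch against the quoted context, not the committed fix):

    vma = find_vma(current->mm, addr);
    /* find_vma() only guarantees addr < vma->vm_end; also reject
     * addresses below vm_start, and test VM_PFNMAP rather than VM_IO,
     * since VM_PFNMAP is what guarantees that vm_pgoff is the first
     * pfn mapped at vm_start.
     */
    if (vma == NULL || addr < vma->vm_start ||
        !(vma->vm_flags & VM_PFNMAP)) {
            get_page(bad_page);
            return page_to_pfn(bad_page);
    }
    pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;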
From: Glauber C. <gc...@re...> - 2008-04-29 14:51:53
Amit Shah wrote:
> We introduce three hypercalls:
> 1. When the guest wants to check if a particular device is an assigned device
> (this is done once per device by the guest to enable / disable hypercall-
> based translation of addresses)
>
> 2. map: to convert guest physical addresses to host physical addresses to
> pass on to the device for DMA. We also pin the pages thus requested so that
> they're not swapped out.
>
> 3. unmap: to unpin the pages and free any information we might have stored.
>
> Signed-off-by: Amit Shah <ami...@qu...>
> ---
>  arch/x86/kvm/x86.c         |  211 +++++++++++++++++++++++++++++++++++++++++++-
>  include/asm-x86/kvm_host.h |   15 +++
>  include/asm-x86/kvm_para.h |    8 ++
>  3 files changed, 233 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fb9b329..94ee4db 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -24,8 +24,11 @@
>  #include <linux/interrupt.h>
>  #include <linux/kvm.h>
>  #include <linux/fs.h>
> +#include <linux/list.h>
> +#include <linux/pci.h>
>  #include <linux/vmalloc.h>
>  #include <linux/module.h>
> +#include <linux/highmem.h>
>  #include <linux/mman.h>
>  #include <linux/highmem.h>
>
> @@ -76,6 +79,9 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>  	{ "halt_exits", VCPU_STAT(halt_exits) },
>  	{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
>  	{ "hypercalls", VCPU_STAT(hypercalls) },
> +	{ "hypercall_map", VCPU_STAT(hypercall_map) },
> +	{ "hypercall_unmap", VCPU_STAT(hypercall_unmap) },
> +	{ "hypercall_pv_dev", VCPU_STAT(hypercall_pv_dev) },
>  	{ "request_irq", VCPU_STAT(request_irq_exits) },
>  	{ "irq_exits", VCPU_STAT(irq_exits) },
>  	{ "host_state_reload", VCPU_STAT(host_state_reload) },
> @@ -95,9 +101,164 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>  	{ NULL }
>  };
>
> +static struct kvm_pv_dma_map*
> +find_pci_pv_dmap(struct list_head *head, dma_addr_t dma)
> +{

Might be better to prefix those functions with kvm? Even though they are static, it seems to be the current practice.

> +static int pv_map_hypercall(struct kvm_vcpu *vcpu, int npages, gfn_t page_gfn)
> +{
> +	int i, r = 0;
> +	struct page *host_page;
> +	struct scatterlist *sg;
> +	struct kvm_pv_dma_map *dmap;
> +	unsigned long *shared_addr, *hcall_page;
> +
> +	/* We currently don't support dma mappings which have more than
> +	 * PAGE_SIZE/sizeof(unsigned long *) pages
> +	 */
> +	if (!npages || npages > MAX_PVDMA_PAGES) {
> +		printk(KERN_INFO "%s: Illegal number of pages: %d\n",
> +		       __func__, npages);
> +		goto out;
> +	}
> +
> +	host_page = gfn_to_page(vcpu->kvm, page_gfn);

you need mmap_sem held for read to use gfn_to_page.

> +	if (is_error_page(host_page)) {
> +		printk(KERN_INFO "%s: Bad gfn %p\n", __func__,
> +		       (void *)page_gfn);
> +		goto out;
> +	}
> +	hcall_page = shared_addr = kmap(host_page);
> +
> +	/* scatterlist to map guest dma pages into host physical
> +	 * memory -- if they exceed the DMA map limit
> +	 */
> +	sg = kcalloc(npages, sizeof(struct scatterlist), GFP_KERNEL);
> +	if (sg == NULL) {
> +		printk(KERN_INFO "%s: Couldn't allocate memory (sg)\n",
> +		       __func__);
> +		goto out_unmap;
> +	}
> +
> +	/* List to store all guest pages mapped into host. This will
> +	 * be used later to free pages on the host. Think of this as a
> +	 * translation table from guest dma addresses into host dma
> +	 * addresses
> +	 */
> +	dmap = kzalloc(sizeof(*dmap), GFP_KERNEL);
> +	if (dmap == NULL) {
> +		printk(KERN_INFO "%s: Couldn't allocate memory\n",
> +		       __func__);
> +		goto out_unmap_sg;
> +	}
> +
> +	/* FIXME: consider the length of the last page. Guest should
> +	 * send this info.
> +	 */
> +	for (i = 0; i < npages; i++) {
> +		struct page *page;
> +		gpa_t gpa;
> +
> +		gpa = *shared_addr++;
> +		page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);

care for locking here too.

> +		if (is_error_page(page)) {
> +			int j;
> +			printk(KERN_INFO "kvm %s: gpa %p not valid\n",
> +			       __func__, (void *)gpa);
> +
> +			for (j = 0; j < i; j++)
> +				put_page(sg_page(&sg[j]));
> +			goto out_unmap_sg_dmap;
> +		}
> +		prepare_sg_entry(&sg[i], page);
> +		get_page(sg_page(&sg[i]));
> +	}
> +
> +	/* Put this on the dmap_head list, so that we can find it
> +	 * later for the 'free' operation
> +	 */
> +	dmap->sg = sg;
> +	dmap->nents = npages;
> +	list_add(&dmap->list, &vcpu->kvm->arch.pci_pv_dmap_head);
> +
> +	/* FIXME: guest should send the direction */
> +	r = dma_ops->map_sg(NULL, sg, npages, PCI_DMA_BIDIRECTIONAL);
> +	if (r) {
> +		r = npages;
> +		*hcall_page = sg[0].dma_address | (*hcall_page & ~PAGE_MASK);
> +	}
> +
> + out_unmap:
> +	if (!r)
> +		*hcall_page = bad_dma_address;
> +	kunmap(host_page);
> + out:
> +	++vcpu->stat.hypercall_map;
> +	return r;
> + out_unmap_sg_dmap:
> +	kfree(dmap);
> + out_unmap_sg:
> +	kfree(sg);
> +	goto out_unmap;

those backwards goto are very clumsy. Might be better to give it further attention in order to avoid it.

> +}
> +
> +static int free_dmap(struct kvm_pv_dma_map *dmap, struct list_head *head)
> +{
> +	int i;
> +
> +	if (!dmap)
> +		return 1;

that's ugly. it's better to keep the free function with free-like semantics: just a void function that plainly returns if !dmap, and check in the caller.

> +static int
> +pv_mapped_pci_device_hypercall(struct kvm_vcpu *vcpu, gfn_t page_gfn)
> +{
> +	int r = 0;
> +	unsigned long *shared_addr;
> +	struct page *host_page;
> +	struct kvm_pci_pt_info pci_pt_info;
> +
> +	host_page = gfn_to_page(vcpu->kvm, page_gfn);

locking

> +	if (is_error_page(host_page)) {
> +		printk(KERN_INFO "%s: gfn %p not valid\n",
> +		       __func__, (void *)page_gfn);
> +		r = -1;

r = -1 is not really informative. Better use some meaningful error. We can return here, and avoid this goto if we always increment the hypercall counter in the beginning of the function. But this is nitpicking.

> +		goto out;
> +	}
> +	shared_addr = kmap(host_page);
> +	memcpy(&pci_pt_info, shared_addr, sizeof(pci_pt_info));
> +
> +	if (find_pci_pt_dev(&vcpu->kvm->arch.pci_pt_dev_head,
> +			    &pci_pt_info, 0, KVM_PT_SOURCE_ASSIGN))
> +		r++; /* We have assigned the device */
> +
> +	kunmap(host_page);

better use atomic mappings here.
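The two recurring review comments translate into roughly this (a sketch of the idioms being asked for, against the 2.6.25-era APIs, not code from the series):

    /* gfn_to_page() walks the memslots and the process page tables,
     * so callers of this era were expected to hold mmap_sem for read.
     */
    down_read(&current->mm->mmap_sem);
    host_page = gfn_to_page(vcpu->kvm, page_gfn);
    up_read(&current->mm->mmap_sem);

    /* For a short copy like this, an atomic kmap avoids sleeping and
     * the global kmap pool.
     */
    shared_addr = kmap_atomic(host_page, KM_USER0);
    memcpy(&pci_pt_info, shared_addr, sizeof(pci_pt_info));
    kunmap_atomic(shared_addr, KM_USER0);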
From: Joerg R. <joe...@am...> - 2008-04-29 14:42:07
On Tue, Apr 29, 2008 at 03:07:25PM +0200, Jan Kiszka wrote:
> Hi,
>
> looks like we are getting better and better here in hitting yet
> unsupported corner-case features of KVM :). This time our guest fiddles
> with hardware debugging registers, but quickly gets unhappy as they do
> not yet have the expected effect.

KVM is mostly tested with guests that run with paging. So a 16-bit protected mode guest is not tested very well :)

> Joerg, I found your SVM-related patch series in the archive, which does
> not seem to have raised many responses. Is this general direction OK?
> Does it allow self-debugging of guests? But how are conflicts resolved
> if both guest and host need the physical registers (host debugging the
> guest which is debugging itself)?

I sent a patchset in the past to enable guest debugging for SVM, which means debugging the guest from outside using gdb. But I was not able to test these patches because the userspace side of guest debugging is broken in kvm-qemu.

Debugging in the guest should work without problems. The debug registers are switched between guest and host if the guest uses them, so there should be no problems when the guest and the host are both using the debug registers.

> I would try to dig into the VMX side if the general architecture is
> -mostly- clear. [ Sorry, Joerg, someone put the latter type of HW on my
> desk :->. Hope I can once check our stuff against SVM as well! ]

With some debug output from SVM I can better help to debug your problems ;-)

Joerg

--
          | AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System    | Register Court Dresden: HRA 4896
Research  | General Partner authorized to represent:
Center    | AMD Saxony LLC (Wilmington, Delaware, US)
          | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
From: Anthony L. <ali...@us...> - 2008-04-29 14:34:08
This patch allows VMAs that contain no backing page to be used for guest memory. This is a drop-in replacement for Ben-Ami's first page in his direct mmio series. Here, we continue to allow mmio pages to be represented in the rmap.

Signed-off-by: Anthony Liguori <ali...@us...>

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f095b73..11b26f5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -531,6 +531,7 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 	struct page *page[1];
 	unsigned long addr;
 	int npages;
+	pfn_t pfn;

 	might_sleep();

@@ -543,19 +544,36 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 	npages = get_user_pages(current, current->mm, addr, 1, 1, 1, page,
 				NULL);

-	if (npages != 1) {
-		get_page(bad_page);
-		return page_to_pfn(bad_page);
-	}
+	if (unlikely(npages != 1)) {
+		struct vm_area_struct *vma;
+
+		vma = find_vma(current->mm, addr);
+		if (vma == NULL) {
+			get_page(bad_page);
+			return page_to_pfn(bad_page);
+		}
+
+		BUG_ON(!(vma->vm_flags & VM_IO));

-	return page_to_pfn(page[0]);
+		pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+		BUG_ON(pfn_valid(pfn));
+	} else
+		pfn = page_to_pfn(page[0]);
+
+	return pfn;
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn);

 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
-	return pfn_to_page(gfn_to_pfn(kvm, gfn));
+	pfn_t pfn;
+
+	pfn = gfn_to_pfn(kvm, gfn);
+	if (pfn_valid(pfn))
+		return pfn_to_page(pfn);
+
+	return NULL;
 }
 EXPORT_SYMBOL_GPL(gfn_to_page);

@@ -568,7 +586,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);

 void kvm_release_pfn_clean(pfn_t pfn)
 {
-	put_page(pfn_to_page(pfn));
+	if (pfn_valid(pfn))
+		put_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);

@@ -593,21 +612,25 @@ EXPORT_SYMBOL_GPL(kvm_set_page_dirty);

 void kvm_set_pfn_dirty(pfn_t pfn)
 {
-	struct page *page = pfn_to_page(pfn);
-	if (!PageReserved(page))
-		SetPageDirty(page);
+	if (pfn_valid(pfn)) {
+		struct page *page = pfn_to_page(pfn);
+		if (!PageReserved(page))
+			SetPageDirty(page);
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);

 void kvm_set_pfn_accessed(pfn_t pfn)
 {
-	mark_page_accessed(pfn_to_page(pfn));
+	if (pfn_valid(pfn))
+		mark_page_accessed(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);

 void kvm_get_pfn(pfn_t pfn)
 {
-	get_page(pfn_to_page(pfn));
+	if (pfn_valid(pfn))
+		get_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_get_pfn);
From: Amit S. <ami...@qu...> - 2008-04-29 14:00:27
On Tuesday 29 April 2008 18:44:23 Andi Kleen wrote:
> Amit Shah <ami...@qu...> writes:
> > diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> > index 388b113..678cafb 100644
> > --- a/arch/x86/kernel/pci-dma.c
> > +++ b/arch/x86/kernel/pci-dma.c
> > @@ -443,6 +443,17 @@ dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
> >  		memset(memory, 0, size);
> >  		if (!mmu) {
> >  			*dma_handle = bus;
> > +			if (unlikely(dma_ops->is_pv_device) &&
> > +			    unlikely(dma_ops->is_pv_device(dev, dev->bus_id))) {
>
> First double unlikely in a condition is useless. Just drop them.
>
> And then ->is_xyz() in a generic vops interface is about as ugly
> and non generic as you can get. dma_alloc_coherent is not performance
> critical, so you should rather change the interface so that ->alloc_coherent
> is always called and the other handlers handle the !mmu case correctly.
> In fact they need that already I guess (e.g. on DMAR there is not really
> a nommu case)

This point came up the last time I sent out the patch; we should do this as well as implement stackable dma_ops (the need for that is evident in the next patch).

Thanks for the observation; this should be the next step.

Amit.
From: Amit S. <ami...@qu...> - 2008-04-29 13:58:44
On Tuesday 29 April 2008 19:01:32 Andi Kleen wrote:
> Amit Shah <ami...@qu...> writes:
> > +const struct dma_mapping_ops *orig_dma_ops;
>
> I suspect real dma ops stacking will need some further thought than
> your simple hacks

Yes; that's something we're planning to do.

> Haven't read further, but to be honest the code doesn't seem to be anywhere
> near merging quality.

I'm basically using these patches to test the PCI passthrough functionality (by which we can assign host PCI devices to a guest OS via KVM). While other methods of handling DMA operations are being worked on (1-1 mapping of the guest in the host address space and virtualization-aware IOMMU translations), this patchset provides a quick way to check if things indeed work.

However, if some version of this patch can be useful upstream, I'll be glad to work on that.

That said, as you point out, we need stackable dma_ops as well as getting rid of the is_pv_device() function in dma_ops, and that's something that can be done right away.

Thanks for the review!

Amit
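For context, "stackable dma_ops" here means wrapping the existing ops and delegating to them rather than replacing them wholesale. A minimal sketch (orig_dma_ops and kvm_is_pv_device are taken from this patchset; kvm_pv_map_sg is a hypothetical name for the hypercall path):

    static const struct dma_mapping_ops *orig_dma_ops;

    /* Delegate to the original ops for normal devices; only DMA for
     * host-assigned (passthrough) devices takes the hypercall path.
     */
    static int pv_map_sg(struct device *hwdev, struct scatterlist *sg,
                         int nents, int direction)
    {
            if (!kvm_is_pv_device(hwdev, hwdev->bus_id))
                    return orig_dma_ops->map_sg(hwdev, sg, nents, direction);

            return kvm_pv_map_sg(hwdev, sg, nents, direction); /* hypothetical */
    }

    static struct dma_mapping_ops pv_dma_ops = {
            .map_sg = pv_map_sg,
            /* the other ops would forward to orig_dma_ops similarly */
    };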
From: Andrea A. <an...@qu...> - 2008-04-29 13:35:47
Hi Hugh!!

On Tue, Apr 29, 2008 at 11:49:11AM +0100, Hugh Dickins wrote:
> [I'm scarcely following the mmu notifiers to-and-fro, which seems
> to be in good hands, amongst faster thinkers than me: who actually
> need and can test this stuff. Don't let me slow you down; but I
> can quickly clarify on this history.]

Still I think it'd be great if you could review mmu-notifier-core v14. You and Nick are the core VM maintainers, so it'd be great to hear any feedback about it. I think it's fairly easy to classify the patch as obviously safe as long as mmu notifiers are disarmed. Here is a link for your convenience.

http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14/mmu-notifier-core

> No, the locking was different as you had it, Andrea: there was an extra
> bitspin lock, carried over from the pte_chains days (maybe we changed
> the name, maybe we disagreed over the name, I forget), which mainly
> guarded the page->mapcount. I thought that was one lock more than we
> needed, and eliminated it in favour of atomic page->mapcount in 2.6.9.

Thanks a lot for the explanation!
From: Andi K. <an...@fi...> - 2008-04-29 13:35:35
Amit Shah <ami...@qu...> writes:

> +
> +static struct page *page;
> +static unsigned long page_gfn;

Bad variable names.

> +
> +const struct dma_mapping_ops *orig_dma_ops;

I suspect real dma ops stacking will need some further thought than your simple hacks

> +
> +	match = find_matching_pt_dev(&pt_devs_head, &pv_pci_info);
> +	if (match) {
> +		r = match->is_pv;
> +		goto out;
> +	}
> +
> +	memcpy(page_address(page), &pv_pci_info, sizeof(pv_pci_info));

Note that on 32bit page_address() might be not mapped.

> +
> +	npages = get_order(size) + 1;

Are you sure that's correct? It looks quite bogus. order is a base-2 logarithm, so normally npages = 1 << order; if you want npages from an order, the correct form is 1 << order.

Haven't read further, but to be honest the code doesn't seem to be anywhere near merging quality.

-Andi
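Concretely (standard kernel semantics, not code from the patch):

    /* get_order() returns log2 of the (rounded-up) page count, so the
     * number of pages is 1 << order.  With 4 KB pages and size = 16384:
     */
    int order = get_order(16384);   /* order = 2                 */
    int right = 1 << order;         /* 4 pages                   */
    int wrong = order + 1;          /* 3 -- not the page count   */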
From: Andi K. <an...@fi...> - 2008-04-29 13:16:09
Amit Shah <ami...@qu...> writes:

> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index 388b113..678cafb 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -443,6 +443,17 @@ dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
>  		memset(memory, 0, size);
>  		if (!mmu) {
>  			*dma_handle = bus;
> +			if (unlikely(dma_ops->is_pv_device) &&
> +			    unlikely(dma_ops->is_pv_device(dev, dev->bus_id))) {

First, the double unlikely in a condition is useless. Just drop them.

And then ->is_xyz() in a generic ops interface is about as ugly and non-generic as you can get. dma_alloc_coherent is not performance critical, so you should rather change the interface so that ->alloc_coherent is always called and the other handlers handle the !mmu case correctly. In fact they need that already, I guess (e.g. on DMAR there is not really a nommu case).

-Andi
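A sketch of the restructuring being suggested (illustrative only; the actual interface change would touch every dma_mapping_ops implementation):

    /* Generic path: no is_pv_device() probe and no unlikely() hints.
     * Every dma_mapping_ops implementation provides alloc_coherent and
     * deals with the no-IOMMU (!mmu) case internally.
     */
    if (dma_ops->alloc_coherent)
            return dma_ops->alloc_coherent(dev, size, dma_handle, gfp);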
From: Andi K. <an...@fi...> - 2008-04-29 13:15:59
Amit Shah <ami...@qu...> writes:

> This patchset implements PVDMA for handling DMA requests from
> devices assigned to the guest from the host machine.

You forgot to post a high-level design overview of how this works, what it is good for, what the design trade-offs are, etc. Include that in the first patch.

-Andi
From: Jan K. <jan...@si...> - 2008-04-29 13:12:39
Hi,

looks like we are getting better and better here in hitting yet unsupported corner-case features of KVM :). This time our guest fiddles with hardware debugging registers, but quickly gets unhappy as they do not yet have the expected effect.

Joerg, I found your SVM-related patch series in the archive, which does not seem to have raised many responses. Is this general direction OK? Does it allow self-debugging of guests? But how are conflicts resolved if both guest and host need the physical registers (host debugging the guest which is debugging itself)?

I would try to dig into the VMX side if the general architecture is -mostly- clear. [ Sorry, Joerg, someone put the latter type of HW on my desk :->. Hope I can once check our stuff against SVM as well! ]

Thanks,
Jan

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
From: Guillaume T. <gui...@ex...> - 2008-04-29 13:07:13
Hello,

This patch should solve the problem observed during protected mode transitions that appears for example during the installation of openSuse-10.3. Unfortunately there is an issue that crashes kvm-userspace. I'm not sure if it's a problem introduced by the patch or if the patch is good and raises a new issue. Here is what I'm doing:

1) Remove the SS patching that modifies SS_SELECTOR in enter_pmode() to see the vmentry failure.
2) Add the handler that catches the VM-entry failure. It is called handle_vmentry_failure().
3) While CS.RPL != SS.RPL, emulate the instruction.
4) Add the emulation of "ljmp", "mov r, imm", "mov sreg, r/m16" and "mov r/m16, sreg", which have respectively opcode 0xea, 0xb8, 0x8e and 0x8c.

Normally, it should be sufficient to boot openSuse-10.3 because the instructions that need to be emulated are:

0x0000000000046e53: ljmp $0x18,$0x6e18
0x0000000000046e58: mov $0x20,%ax
0x0000000000046e5c: mov %eax,%ds
0x0000000000046e5e: mov %ss,%eax
0x0000000000046e60: and $0xffff,%esp
0x0000000000046e66: shl $0x4,%eax
0x0000000000046e69: add %eax,%esp
0x0000000000046e6b: mov $0x8,%ax
0x0000000000046e6f: mov %eax,%ss

At this point, cs.rpl is equal to ss.rpl. I added traces in handle_vmentry_failure() and also in writeback() to see which instructions are emulated, and I observe:

[82766.614575] Failed vm entry (exit reason 0x21) invalid guest state
[82766.651046] emulation at (46e53) rip 6e13: ea 18 6e 18
[82766.682611] writeback: dst.byte 0
[82766.706180] writeback: dst.ptr 0x0000000000000000
[82766.734890] writeback: dst.val 0x0
[82766.758591] writeback: src.ptr 0x0000000000000000
[82766.790594] writeback: src.val 0x0
[82766.855058] successfully emulated instruction
[82766.882695] Failed vm entry (exit reason 0x21) invalid guest state
[82766.923061] emulation at (46e58) rip 6e18: 66 b8 20 00
[82766.951079] writeback: dst.byte 2
[82766.975074] writeback: dst.ptr 0xffff810324d07400
[82767.003112] writeback: dst.val 0x20
[82767.027100] writeback: src.ptr 0x0000000000006e1a
[82767.059092] writeback: src.val 0x20
[82767.127094] successfully emulated instruction
[82767.151111] Failed vm entry (exit reason 0x21) invalid guest state
[82767.191099] emulation at (46e5c) rip 6e1c: 8e d8 8c d0
[82767.219156] writeback: dst.byte 4
[82767.243118] writeback: dst.ptr 0xffff810324d07418
[82767.275091] writeback: dst.val 0x800000
[82767.299122] writeback: src.ptr 0x0000000000000000
[82767.331106] writeback: src.val 0x20
[82767.395255] successfully emulated instruction
[82767.423135] Failed vm entry (exit reason 0x21) invalid guest state
[82767.459260] emulation at (46e5e) rip 6e1e: 8c d0 81 e4
[82767.491137] writeback: dst.byte 2
[82767.515117] writeback: dst.ptr 0xffff810324d07400
[82767.543138] writeback: dst.val 0x53e1
[82767.567264] writeback: src.ptr 0xffff810324d07410
[82767.599142] writeback: src.val 0x20
[82767.667146] successfully emulated instruction
[82767.691277] Failed vm entry (exit reason 0x21) invalid guest state
[82767.731152] emulation at (46e60) rip 6e20: 81 e4 ff ff
[82767.763136] writeback: dst.byte 0
[82767.783154] writeback: dst.ptr 0x0000000000000000
[82767.815157] writeback: dst.val 0x2004
[82767.839156] writeback: src.ptr 0x0000000000006e22
[82767.871140] writeback: src.val 0xffff
[82767.939170] successfully emulated instruction
[82767.963307] Failed vm entry (exit reason 0x21) invalid guest state
[82768.003174] emulation at (46e66) rip 6e26: c1 e0 04 01
[82768.035153] writeback: dst.byte 0
[82768.055174] writeback: dst.ptr 0x0000000000000000
[82768.087177] writeback: dst.val 0x53e1
[82768.111178] writeback: src.ptr 0x0000000000006e28
[82768.143157] writeback: src.val 0x4
[82768.211151] successfully emulated instruction
[82768.235189] Failed vm entry (exit reason 0x21) invalid guest state
[82768.271311] emulation at (46e69) rip 6e29: 01 c4 66 b8
[82768.303214] writeback: dst.byte 0
[82768.327213] writeback: dst.ptr 0x0000000000000000
[82768.355238] writeback: dst.val 0x2004
[82768.379316] writeback: src.ptr 0xffff810324d07400
[82768.411227] writeback: src.val 0x53e1
[82768.483168] successfully emulated instruction
[82768.507240] Failed vm entry (exit reason 0x21) invalid guest state
[82768.543329] emulation at (46e6b) rip 6e2b: 66 b8 08 00
[82768.575239] writeback: dst.byte 2
[82768.599233] writeback: dst.ptr 0xffff810324d07400
[82768.627257] writeback: dst.val 0x8
[82768.651246] writeback: src.ptr 0x0000000000006e2d
[82768.683245] writeback: src.val 0x8
[82768.751250] successfully emulated instruction
[82768.775331] Failed vm entry (exit reason 0x21) invalid guest state
[82768.815256] emulation at (46e6f) rip 6e2f: 8e d0 8e c0
[82768.843348] writeback: dst.byte 4
[82768.867268] writeback: dst.ptr 0xffff810324d07410
[82768.899204] writeback: dst.val 0x53e1
[82768.923259] writeback: src.ptr 0x0000000000000000
[82768.951351] writeback: src.val 0x8
[82769.019279] successfully emulated instruction

So everything seems OK, but after the emulation of the "mov %eax,%ss" instruction, it seems that cs.rpl == ss.rpl while the guest is still in a VT-unfriendly state, because I have the following error in kvm-userspace:

[guill@enterprise][~/local/kvm-userspace.git/bin]$ ./qemu-system-x86_64 -hda ~/disk_images/hd_50G.qcow2 -cdrom /images_iso/openSUSE-10.3-GM-x86_64-mini.iso -boot d -s -m 1024
exception 13 (33)
rax 0000000000000673 rbx 0000000000800000 rcx 0000000000000000 rdx 00000000000013ca
rsi 0000000000055e1c rdi 0000000000055e1d rsp 00000000fffa0080 rbp 000000000000200b
r8  0000000000000000 r9  0000000000000000 r10 0000000000000000 r11 0000000000000000
r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
rip 000000000000b071 rflags 00033092
cs 4004 (00040040/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
ds 4004 (00040040/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
es 00ff (00000ff0/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
ss ff11 (000ff110/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
fs 3002 (00030020/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
gs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
tr 0000 (fffbd000/00002088 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0)
gdt 40920/47
idt 0/ffff
cr0 10 cr2 0 cr3 0 cr4 0 cr8 0 efer 0
code: 17 06 29 4b 01 18 eb 18 a8 25 aa 19 28 4c 01 28 4d 01 01 17 --> 0f 17 0f 01 17 0f 17 12 01 17 2c 25 4b 19 21 00 02 17 1a 94 0a 76 67 61 3d 30 78 25 78 20
Aborted

It's strange because handle_vmentry_failure() is not called. I'm trying to see where the problem is; any comments are welcome.

Regards,
Guillaume

 arch/x86/kvm/vmx.c         |   68 +++++++++++++++++++++++++++
 arch/x86/kvm/vmx.h         |    3 +
 arch/x86/kvm/x86.c         |   12 ++--
 arch/x86/kvm/x86_emulate.c |  112 +++++++++++++++++++++++++++++++++++++++++--
 include/asm-x86/kvm_host.h |    4 +
 5 files changed, 190 insertions(+), 9 deletions(-)

---
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 79cdbe8..a0a13b8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1272,7 +1272,9 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
 	fix_pmode_dataseg(VCPU_SREG_GS, &vcpu->arch.rmode.gs);
 	fix_pmode_dataseg(VCPU_SREG_FS, &vcpu->arch.rmode.fs);

+#if 0
 	vmcs_write16(GUEST_SS_SELECTOR, 0);
+#endif
 	vmcs_write32(GUEST_SS_AR_BYTES, 0x93);

 	vmcs_write16(GUEST_CS_SELECTOR,
@@ -2635,6 +2637,66 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	return 1;
 }

+static int invalid_guest_state(struct kvm_vcpu *vcpu,
+			       struct kvm_run *kvm_run, u32 failure_reason)
+{
+	u16 ss, cs;
+	u8 opcodes[4];
+	unsigned long rip = vcpu->arch.rip;
+	unsigned long rip_linear;
+
+	ss = vmcs_read16(GUEST_SS_SELECTOR);
+	cs = vmcs_read16(GUEST_CS_SELECTOR);
+
+	if ((ss & 0x03) != (cs & 0x03)) {
+		int err;
+		rip_linear = rip + vmx_get_segment_base(vcpu, VCPU_SREG_CS);
+		emulator_read_std(rip_linear, (void *)opcodes, 4, vcpu);
+		printk(KERN_INFO "emulation at (%lx) rip %lx: %02x %02x %02x %02x\n",
+		       rip_linear,
+		       rip, opcodes[0], opcodes[1], opcodes[2], opcodes[3]);
+		err = emulate_instruction(vcpu, kvm_run, 0, 0, 0);
+		switch (err) {
+		case EMULATE_DONE:
+			printk(KERN_INFO "successfully emulated instruction\n");
+			return 1;
+		case EMULATE_DO_MMIO:
+			printk(KERN_INFO "mmio?\n");
+			return 0;
+		default:
+			kvm_report_emulation_failure(vcpu, "vmentry failure");
+			break;
+		}
+	}
+
+	kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
+	kvm_run->hw.hardware_exit_reason = failure_reason;
+	return 0;
+}
+
+static int handle_vmentry_failure(struct kvm_vcpu *vcpu,
+				  struct kvm_run *kvm_run,
+				  u32 failure_reason)
+{
+	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+
+	printk(KERN_INFO "Failed vm entry (exit reason 0x%x) ", failure_reason);
+	switch (failure_reason) {
+	case EXIT_REASON_INVALID_GUEST_STATE:
+		printk("invalid guest state\n");
+		return invalid_guest_state(vcpu, kvm_run, failure_reason);
+	case EXIT_REASON_MSR_LOADING:
+		printk("caused by MSR entry %ld loading.\n", exit_qualification);
+		break;
+	case EXIT_REASON_MACHINE_CHECK:
+		printk("caused by machine check.\n");
+		break;
+	default:
+		printk("reason not known yet!\n");
+		break;
+	}
+	return 0;
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume. Otherwise they set the kvm_run parameter to indicate what needs
@@ -2696,6 +2758,12 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 			exit_reason != EXIT_REASON_EPT_VIOLATION))
 		printk(KERN_WARNING "%s: unexpected, valid vectoring info and "
 		       "exit reason is 0x%x\n", __func__, exit_reason);
+
+	if ((exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) {
+		exit_reason &= ~VMX_EXIT_REASONS_FAILED_VMENTRY;
+		return handle_vmentry_failure(vcpu, kvm_run, exit_reason);
+	}
+
 	if (exit_reason < kvm_vmx_max_exit_handlers &&
 	    kvm_vmx_exit_handlers[exit_reason])
 		return kvm_vmx_exit_handlers[exit_reason](vcpu, kvm_run);
diff --git a/arch/x86/kvm/vmx.h b/arch/x86/kvm/vmx.h
index 79d94c6..2cebf48 100644
--- a/arch/x86/kvm/vmx.h
+++ b/arch/x86/kvm/vmx.h
@@ -238,7 +238,10 @@ enum vmcs_field {
 #define EXIT_REASON_IO_INSTRUCTION      30
 #define EXIT_REASON_MSR_READ            31
 #define EXIT_REASON_MSR_WRITE           32
+#define EXIT_REASON_INVALID_GUEST_STATE 33
+#define EXIT_REASON_MSR_LOADING         34
 #define EXIT_REASON_MWAIT_INSTRUCTION   36
+#define EXIT_REASON_MACHINE_CHECK       41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS         44
 #define EXIT_REASON_EPT_VIOLATION       48
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 578a0c1..9e5d687 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3027,8 +3027,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	return 0;
 }

-static void get_segment(struct kvm_vcpu *vcpu,
-			struct kvm_segment *var, int seg)
+void get_segment(struct kvm_vcpu *vcpu,
+		 struct kvm_segment *var, int seg)
 {
 	kvm_x86_ops->get_segment(vcpu, var, seg);
 }
@@ -3111,8 +3111,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 	return 0;
 }

-static void set_segment(struct kvm_vcpu *vcpu,
-			struct kvm_segment *var, int seg)
+void set_segment(struct kvm_vcpu *vcpu,
+		 struct kvm_segment *var, int seg)
 {
 	kvm_x86_ops->set_segment(vcpu, var, seg);
 }
@@ -3270,8 +3270,8 @@ static int load_segment_descriptor_to_kvm_desct(struct kvm_vcpu *vcpu,
 	return 0;
 }

-static int load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
-				   int type_bits, int seg)
+int load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
+			    int type_bits, int seg)
 {
 	struct kvm_segment kvm_seg;

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 2ca0838..f6b9dad 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -138,7 +138,8 @@ static u16 opcode_table[256] = {
 	/* 0x88 - 0x8F */
 	ByteOp | DstMem | SrcReg | ModRM | Mov, DstMem | SrcReg | ModRM | Mov,
 	ByteOp | DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
-	0, ModRM | DstReg, 0, Group | Group1A,
+	DstMem | SrcReg | ModRM | Mov, ModRM | DstReg,
+	DstReg | SrcMem | ModRM | Mov, Group | Group1A,
 	/* 0x90 - 0x9F */
 	0, 0, 0, 0, 0, 0, 0, 0,
 	0, 0, 0, 0, ImplicitOps | Stack, ImplicitOps | Stack, 0, 0,
@@ -152,7 +153,8 @@ static u16 opcode_table[256] = {
 	ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String,
 	ByteOp | ImplicitOps | String, ImplicitOps | String,
 	/* 0xB0 - 0xBF */
-	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+	0, 0, 0, 0, 0, 0, 0, 0,
+	DstReg | SrcImm | Mov, 0, 0, 0, 0, 0, 0, 0,
 	/* 0xC0 - 0xC7 */
 	ByteOp | DstMem | SrcImm | ModRM, DstMem | SrcImmByte | ModRM,
 	0, ImplicitOps | Stack, 0, 0,
@@ -168,7 +170,7 @@ static u16 opcode_table[256] = {
 	/* 0xE0 - 0xE7 */
 	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0xE8 - 0xEF */
-	ImplicitOps | Stack, SrcImm|ImplicitOps, 0, SrcImmByte|ImplicitOps,
+	ImplicitOps | Stack, SrcImm | ImplicitOps, ImplicitOps, SrcImmByte | ImplicitOps,
 	0, 0, 0, 0,
 	/* 0xF0 - 0xF7 */
 	0, 0, 0, 0,
@@ -1511,14 +1513,90 @@ special_insn:
 		break;
 	case 0x88 ... 0x8b:	/* mov */
 		goto mov;
+	case 0x8c: { /* mov r/m, sreg */
+		struct kvm_segment segreg;
+
+		if (c->modrm_mod == 0x3)
+			c->src.val = c->modrm_val;
+
+		switch ( c->modrm_reg ) {
+		case 0:
+			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_ES);
+			break;
+		case 1:
+			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_CS);
+			break;
+		case 2:
+			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_SS);
+			break;
+		case 3:
+			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_DS);
+			break;
+		case 4:
+			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_FS);
+			break;
+		case 5:
+			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_GS);
+			break;
+		default:
+			printk(KERN_INFO "0x8c: Invalid segreg in modrm byte 0x%02x\n",
+			       c->modrm);
+			goto cannot_emulate;
+		}
+		c->dst.val = segreg.selector;
+		c->dst.bytes = 2;
+		c->dst.ptr = (unsigned long *)decode_register(c->modrm_rm, c->regs,
+							      c->d & ByteOp);
+		break;
+	}
 	case 0x8d: /* lea r16/r32, m */
 		c->dst.val = c->modrm_ea;
 		break;
+	case 0x8e: { /* mov seg, r/m16 */
+		uint16_t sel;
+
+		sel = c->src.val;
+		switch ( c->modrm_reg ) {
+		case 0:
+			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_ES) < 0)
+				goto cannot_emulate;
+			break;
+		case 1:
+			if (load_segment_descriptor(ctxt->vcpu, sel, 9, VCPU_SREG_CS) < 0)
+				goto cannot_emulate;
+			break;
+		case 2:
+			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_SS) < 0)
+				goto cannot_emulate;
+			break;
+		case 3:
+			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_DS) < 0)
+				goto cannot_emulate;
+			break;
+		case 4:
+			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_FS) < 0)
+				goto cannot_emulate;
+			break;
+		case 5:
+			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_GS) < 0)
+				goto cannot_emulate;
+			break;
+		default:
+			printk(KERN_INFO "Invalid segreg in modrm byte 0x%02x\n",
+			       c->modrm);
+			goto cannot_emulate;
+		}
+
+		c->dst.type = OP_NONE;  /* Disable writeback. */
+		break;
+	}
 	case 0x8f: /* pop (sole member of Grp1a) */
 		rc = emulate_grp1a(ctxt, ops);
 		if (rc != 0)
 			goto done;
 		break;
+	case 0xb8: /* mov r, imm */
+		goto mov;
 	case 0x9c: /* pushf */
 		c->src.val = (unsigned long) ctxt->eflags;
 		emulate_push(ctxt);
@@ -1657,6 +1735,34 @@ special_insn:
 		break;
 	}
 	case 0xe9: /* jmp rel */
+		jmp_rel(c, c->src.val);
+		c->dst.type = OP_NONE; /* Disable writeback. */
+		break;
+	case 0xea: /* jmp far */ {
+		uint32_t eip;
+		uint16_t sel;
+
+		switch (c->op_bytes) {
+		case 2:
+			eip = insn_fetch(u16, 2, c->eip);
+			eip = eip & 0x0000FFFF; /* clear upper 16 bits */
+			break;
+		case 4:
+			eip = insn_fetch(u32, 4, c->eip);
+			break;
+		default:
+			DPRINTF("jmp far: Invalid op_bytes\n");
+			goto cannot_emulate;
+		}
+		sel = insn_fetch(u16, 2, c->eip);
+		if (load_segment_descriptor(ctxt->vcpu, sel, 9, VCPU_SREG_CS) < 0) {
+			DPRINTF("jmp far: Failed to load CS descriptor\n");
+			goto cannot_emulate;
+		}
+
+		c->eip = eip;
+		break;
+	}
 	case 0xeb: /* jmp rel short */
 		jmp_rel(c, c->src.val);
 		c->dst.type = OP_NONE; /* Disable writeback. */
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 4baa9c9..7a0846a 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -495,6 +495,10 @@ int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr,
 int emulator_set_dr(struct x86_emulate_ctxt *ctxt, int dr,
 		    unsigned long value);

+void set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
+void get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
+int load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
+			    int type_bits, int seg);
 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason);

 void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
From: Hugh D. <hu...@ve...> - 2008-04-29 11:00:14
On Tue, 29 Apr 2008, Andrea Arcangeli wrote:
>
> My point of view is that there was no rcu when I wrote that code, yet
> there was no reference count and yet all locking looks still exactly
> the same as I wrote it. There's even still the page_table_lock to
> serialize threads taking the mmap_sem in read mode against the first
> vma->anon_vma = anon_vma during the page fault.
>
> Frankly I've absolutely no idea why rcu is needed in all rmap code
> when walking the page->mapping. Definitely the PG_locked is taken so
> there's no way page->mapping could possibly go away under the rmap
> code, hence the anon_vma can't go away as it's queued in the vma, and
> the vma has to go away before the page is zapped out of the pte.

[I'm scarcely following the mmu notifiers to-and-fro, which seems to be in good hands, amongst faster thinkers than me: who actually need and can test this stuff. Don't let me slow you down; but I can quickly clarify on this history.]

No, the locking was different as you had it, Andrea: there was an extra bitspin lock, carried over from the pte_chains days (maybe we changed the name, maybe we disagreed over the name, I forget), which mainly guarded the page->mapcount. I thought that was one lock more than we needed, and eliminated it in favour of atomic page->mapcount in 2.6.9. Here are the relevant extracts from ChangeLog-2.6.9:

[PATCH] rmaplock: PageAnon in mapping

First of a batch of five patches to eliminate rmap's page_map_lock, replace its trylocking by spinlocking, and use anon_vma to speed up swapoff.

Patches updated from the originals against 2.6.7-mm7: nothing new so I won't spam the list, but including Manfred's SLAB_DESTROY_BY_RCU fixes, and omitting the unuse_process mmap_sem fix already in 2.6.8-rc3.

This patch: Replace the PG_anon page->flags bit by setting the lower bit of the pointer in page->mapping when it's anon_vma: PAGE_MAPPING_ANON bit. We're about to eliminate the locking which kept the flags and mapping in synch: it's much easier to work on a local copy of page->mapping, than worry about whether flags and mapping are in synch (though I imagine it could be done, at greater cost, with some barriers).

[PATCH] rmaplock: kill page_map_lock

The pte_chains rmap used pte_chain_lock (bit_spin_lock on PG_chainlock) to lock its pte_chains. We kept this (as page_map_lock: bit_spin_lock on PG_maplock) when we moved to objrmap. But the file objrmap locks its vma tree with mapping->i_mmap_lock, and the anon objrmap locks its vma list with anon_vma->lock: so isn't the page_map_lock superfluous?

Pretty much, yes. The mapcount was protected by it, and needs to become an atomic: starting at -1 like page _count, so nr_mapped can be tracked precisely up and down. The last page_remove_rmap can't clear anon page mapping any more, because of races with page_add_rmap; from which some BUG_ONs must go for the same reason, but they've served their purpose.

vmscan decisions are naturally racy, little change there beyond removing page_map_lock/unlock. But to stabilize the file-backed page->mapping against truncation while acquiring i_mmap_lock, page_referenced_file now needs page lock to be held even for refill_inactive_zone. There's a similar issue in acquiring anon_vma->lock, where page lock doesn't help: which this patch pretends to handle, but actually it needs the next.

Roughly 10% cut off lmbench fork numbers on my 2*HT*P4. Must confess my testing failed to show the races even while they were knowingly exposed: would benefit from testing on racier equipment.

[PATCH] rmaplock: SLAB_DESTROY_BY_RCU

With page_map_lock gone, how to stabilize page->mapping's anon_vma while acquiring anon_vma->lock in page_referenced_anon and try_to_unmap_anon?

The page cannot actually be freed (vmscan holds reference), but however much we check page_mapped (which guarantees that anon_vma is in use - or would guarantee that if we added suitable barriers), there's no locking against page becoming unmapped the instant after, then anon_vma freed.

It's okay to take anon_vma->lock after it's freed, so long as it remains a struct anon_vma (its list would become empty, or perhaps reused for an unrelated anon_vma: but no problem since we always check that the page located is the right one); but corruption if that memory gets reused for some other purpose.

This is not unique: it's liable to be problem whenever the kernel tries to approach a structure obliquely. It's generally solved with an atomic reference count; but one advantage of anon_vma over anonmm is that it does not have such a count, and it would be a backward step to add one.

Therefore... implement SLAB_DESTROY_BY_RCU flag, to guarantee that such a kmem_cache_alloc'ed structure cannot get freed to other use while the rcu_read_lock is held i.e. preempt disabled; and use that for anon_vma.

Fix concerns raised by Manfred: this flag is incompatible with poisoning and destructor, and kmem_cache_destroy needs to synchronize_kernel.

I hope SLAB_DESTROY_BY_RCU may be useful elsewhere; but though it's safe for little anon_vma, I'd be reluctant to use it on any caches whose immediate shrinkage under pressure is important to the system.

[PATCH] rmaplock: mm lock ordering

With page_map_lock out of the way, there's no need for page_referenced and try_to_unmap to use trylocks - provided we switch anon_vma->lock and mm->page_table_lock around in anon_vma_prepare. Though I suppose it's possible that we'll find that vmscan makes better progress with trylocks than spinning - we're free to choose trylocks again if so.

Try to update the mm lock ordering documentation in filemap.c. But I still find it confusing, and I've no idea of where to stop. So add an mm lock ordering list I can understand to rmap.c.

[The fifth patch was about using anon_vma in swapoff, not relevant here.]

So, going back to what you wrote: holding the page lock there is not enough to prevent the struct anon_vma going away beneath us.

Hugh
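The lookup idiom that SLAB_DESTROY_BY_RCU enables, roughly (a sketch of the pattern the changelog describes, not the actual rmap code):

    struct anon_vma *anon_vma;
    unsigned long anon_mapping;

    rcu_read_lock();
    anon_mapping = (unsigned long)page->mapping;
    if ((anon_mapping & PAGE_MAPPING_ANON) && page_mapped(page)) {
            anon_vma = (struct anon_vma *)(anon_mapping - PAGE_MAPPING_ANON);
            /* The slab memory cannot be returned to the page allocator
             * while we are inside rcu_read_lock(), so it is safe to
             * take the lock even if the anon_vma was freed and reused
             * for another anon_vma meanwhile; each vma found under the
             * lock is re-checked against the page.
             */
            spin_lock(&anon_vma->lock);
            /* ... walk anon_vma->head ... */
            spin_unlock(&anon_vma->lock);
    }
    rcu_read_unlock();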
From: Mark M. <ma...@re...> - 2008-04-29 10:42:32
The -kernel option generates a new boot sector for the boot drive which jumps directly to the supplied kernel rather than running the standard bootloader. Trivially fix generate_bootsect() to handle the case where we're booting using extboot.

Signed-off-by: Mark McLoughlin <ma...@re...>
---
 qemu/hw/pc.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index 48a36e0..506ef6b 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -405,11 +405,12 @@ static void generate_bootsect(uint32_t gpr[8], uint16_t segs[6], uint16_t ip)
 {
     uint8_t bootsect[512], *p;
     int i;
-    int hda;
+    int hda = extboot_drive;

-    hda = drive_get_index(IF_IDE, 0, 0);
+    if (hda == -1)
+        hda = drive_get_index(IF_IDE, 0, 0);
     if (hda == -1) {
-        fprintf(stderr, "A disk image must be given for 'hda' when booting "
+        fprintf(stderr, "-hda or -drive boot=on must be given when booting "
                 "a Linux kernel\n");
         exit(1);
     }
--
1.5.4.1
From: Amit S. <ami...@qu...> - 2008-04-29 10:37:34
dma_alloc_coherent() doesn't call dma_ops->alloc_coherent in case no IOMMU translations are necessary. However, if the device doing the DMA is a physical device assigned to the guest OS by the host, we need to map all the DMA addresses to the host machine addresses. This is done via hypercalls to the host.

In KVM, with pci passthrough support, we can assign actual devices to the guest OS which need this functionality.

Signed-off-by: Amit Shah <ami...@qu...>
---
 arch/x86/kernel/pci-dma.c     |   11 +++++++++++
 include/asm-x86/dma-mapping.h |    2 ++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 388b113..678cafb 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -443,6 +443,17 @@ dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		memset(memory, 0, size);
 		if (!mmu) {
 			*dma_handle = bus;
+			if (unlikely(dma_ops->is_pv_device) &&
+			    unlikely(dma_ops->is_pv_device(dev, dev->bus_id))) {
+				void *r;
+				r = dma_ops->alloc_coherent(dev, size,
+							    dma_handle, gfp);
+				if (r == NULL) {
+					free_pages((unsigned long)memory,
+						   get_order(size));
+					memory = NULL;
+				}
+			}
 			return memory;
 		}
 	}
diff --git a/include/asm-x86/dma-mapping.h b/include/asm-x86/dma-mapping.h
index a1a4dc7..b9c6a39 100644
--- a/include/asm-x86/dma-mapping.h
+++ b/include/asm-x86/dma-mapping.h
@@ -55,6 +55,8 @@ struct dma_mapping_ops {
 			int direction);
 	int (*dma_supported)(struct device *hwdev, u64 mask);
 	int is_phys;
+	/* Is this a physical device in a paravirtualized guest? */
+	int (*is_pv_device)(struct device *hwdev, const char *name);
 };

 extern const struct dma_mapping_ops *dma_ops;
--
1.5.4.3
From: Amit S. <ami...@qu...> - 2008-04-29 10:37:33
We introduce three hypercalls:

1. When the guest wants to check if a particular device is an assigned device (this is done once per device by the guest to enable / disable hypercall-based translation of addresses)

2. map: to convert guest physical addresses to host physical addresses to pass on to the device for DMA. We also pin the pages thus requested so that they're not swapped out.

3. unmap: to unpin the pages and free any information we might have stored.

Signed-off-by: Amit Shah <ami...@qu...>
---
 arch/x86/kvm/x86.c         |  211 +++++++++++++++++++++++++++++++++++++++++++-
 include/asm-x86/kvm_host.h |   15 +++
 include/asm-x86/kvm_para.h |    8 ++
 3 files changed, 233 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fb9b329..94ee4db 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -24,8 +24,11 @@
 #include <linux/interrupt.h>
 #include <linux/kvm.h>
 #include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/pci.h>
 #include <linux/vmalloc.h>
 #include <linux/module.h>
+#include <linux/highmem.h>
 #include <linux/mman.h>
 #include <linux/highmem.h>

@@ -76,6 +79,9 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ "halt_exits", VCPU_STAT(halt_exits) },
 	{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
 	{ "hypercalls", VCPU_STAT(hypercalls) },
+	{ "hypercall_map", VCPU_STAT(hypercall_map) },
+	{ "hypercall_unmap", VCPU_STAT(hypercall_unmap) },
+	{ "hypercall_pv_dev", VCPU_STAT(hypercall_pv_dev) },
 	{ "request_irq", VCPU_STAT(request_irq_exits) },
 	{ "irq_exits", VCPU_STAT(irq_exits) },
 	{ "host_state_reload", VCPU_STAT(host_state_reload) },
@@ -95,9 +101,164 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ NULL }
 };

+static struct kvm_pv_dma_map*
+find_pci_pv_dmap(struct list_head *head, dma_addr_t dma)
+{
+	struct list_head *ptr;
+	struct kvm_pv_dma_map *match;
+
+	list_for_each(ptr, head) {
+		match = list_entry(ptr, struct kvm_pv_dma_map, list);
+		if (match && match->sg[0].dma_address == dma)
+			return match;
+	}
+	return NULL;
+}
+
+static void prepare_sg_entry(struct scatterlist *sg, struct page *page)
+{
+	unsigned int offset, len;
+
+	offset = page_to_phys(page) & ~PAGE_MASK;
+	len = PAGE_SIZE - offset;
+
+	/* FIXME: Use the sg chaining features */
+	sg_set_page(sg, page, len, offset);
+}
+
+static int pv_map_hypercall(struct kvm_vcpu *vcpu, int npages, gfn_t page_gfn)
+{
+	int i, r = 0;
+	struct page *host_page;
+	struct scatterlist *sg;
+	struct kvm_pv_dma_map *dmap;
+	unsigned long *shared_addr, *hcall_page;
+
+	/* We currently don't support dma mappings which have more than
+	 * PAGE_SIZE/sizeof(unsigned long *) pages
+	 */
+	if (!npages || npages > MAX_PVDMA_PAGES) {
+		printk(KERN_INFO "%s: Illegal number of pages: %d\n",
+		       __func__, npages);
+		goto out;
+	}
+
+	host_page = gfn_to_page(vcpu->kvm, page_gfn);
+	if (is_error_page(host_page)) {
+		printk(KERN_INFO "%s: Bad gfn %p\n", __func__,
+		       (void *)page_gfn);
+		goto out;
+	}
+	hcall_page = shared_addr = kmap(host_page);
+
+	/* scatterlist to map guest dma pages into host physical
+	 * memory -- if they exceed the DMA map limit
+	 */
+	sg = kcalloc(npages, sizeof(struct scatterlist), GFP_KERNEL);
+	if (sg == NULL) {
+		printk(KERN_INFO "%s: Couldn't allocate memory (sg)\n",
+		       __func__);
+		goto out_unmap;
+	}
+
+	/* List to store all guest pages mapped into host. This will
+	 * be used later to free pages on the host. Think of this as a
+	 * translation table from guest dma addresses into host dma
+	 * addresses
+	 */
+	dmap = kzalloc(sizeof(*dmap), GFP_KERNEL);
+	if (dmap == NULL) {
+		printk(KERN_INFO "%s: Couldn't allocate memory\n",
+		       __func__);
+		goto out_unmap_sg;
+	}
+
+	/* FIXME: consider the length of the last page. Guest should
+	 * send this info.
+	 */
+	for (i = 0; i < npages; i++) {
+		struct page *page;
+		gpa_t gpa;
+
+		gpa = *shared_addr++;
+		page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+		if (is_error_page(page)) {
+			int j;
+			printk(KERN_INFO "kvm %s: gpa %p not valid\n",
+			       __func__, (void *)gpa);
+
+			for (j = 0; j < i; j++)
+				put_page(sg_page(&sg[j]));
+			goto out_unmap_sg_dmap;
+		}
+		prepare_sg_entry(&sg[i], page);
+		get_page(sg_page(&sg[i]));
+	}
+
+	/* Put this on the dmap_head list, so that we can find it
+	 * later for the 'free' operation
+	 */
+	dmap->sg = sg;
+	dmap->nents = npages;
+	list_add(&dmap->list, &vcpu->kvm->arch.pci_pv_dmap_head);
+
+	/* FIXME: guest should send the direction */
+	r = dma_ops->map_sg(NULL, sg, npages, PCI_DMA_BIDIRECTIONAL);
+	if (r) {
+		r = npages;
+		*hcall_page = sg[0].dma_address | (*hcall_page & ~PAGE_MASK);
+	}
+
+ out_unmap:
+	if (!r)
+		*hcall_page = bad_dma_address;
+	kunmap(host_page);
+ out:
+	++vcpu->stat.hypercall_map;
+	return r;
+ out_unmap_sg_dmap:
+	kfree(dmap);
+ out_unmap_sg:
+	kfree(sg);
+	goto out_unmap;
+}
+
+static int free_dmap(struct kvm_pv_dma_map *dmap, struct list_head *head)
+{
+	int i;
+
+	if (!dmap)
+		return 1;
+
+	for (i = 0; i < dmap->nents; i++)
+		put_page(sg_page(&dmap->sg[i]));
+
+	if (dma_ops->unmap_sg)
+		dma_ops->unmap_sg(NULL, dmap->sg, dmap->nents,
+				  PCI_DMA_BIDIRECTIONAL);
+	kfree(dmap->sg);
+	list_del(&dmap->list);
+	kfree(dmap);
+
+	return 0;
+}
+
+/* FIXME: the argument passed from guest can be 32-bit. We need 64-bit for
+ * dma_addr_t. Send the dma address in a page (or split in two registers)
+ */
+static int pv_unmap_hypercall(struct kvm_vcpu *vcpu, dma_addr_t dma)
+{
+	struct kvm_pv_dma_map *dmap;
+
+	++vcpu->stat.hypercall_unmap;
+
+	dmap = find_pci_pv_dmap(&vcpu->kvm->arch.pci_pv_dmap_head, dma);
+	return free_dmap(dmap, &vcpu->kvm->arch.pci_pv_dmap_head);
+}
+
 /*
  * Used to find a registered host PCI device (a "passthrough" device)
- * during ioctls, interrupts or EOI
+ * during hypercalls, ioctls or interrupts or EOI
  */
 static struct kvm_pci_pt_dev_list *
 find_pci_pt_dev(struct list_head *head,
@@ -136,6 +297,34 @@ find_pci_pt_dev(struct list_head *head,
 	return NULL;
 }

+static int
+pv_mapped_pci_device_hypercall(struct kvm_vcpu *vcpu, gfn_t page_gfn)
+{
+	int r = 0;
+	unsigned long *shared_addr;
+	struct page *host_page;
+	struct kvm_pci_pt_info pci_pt_info;
+
+	host_page = gfn_to_page(vcpu->kvm, page_gfn);
+	if (is_error_page(host_page)) {
+		printk(KERN_INFO "%s: gfn %p not valid\n",
+		       __func__, (void *)page_gfn);
+		r = -1;
+		goto out;
+	}
+	shared_addr = kmap(host_page);
+	memcpy(&pci_pt_info, shared_addr, sizeof(pci_pt_info));
+
+	if (find_pci_pt_dev(&vcpu->kvm->arch.pci_pt_dev_head,
+			    &pci_pt_info, 0, KVM_PT_SOURCE_ASSIGN))
+		r++; /* We have assigned the device */
+
+	kunmap(host_page);
+ out:
+	++vcpu->stat.hypercall_pv_dev;
+	return r;
+}
+
 static DECLARE_BITMAP(pt_irq_pending, NR_IRQS);
 static DECLARE_BITMAP(pt_irq_handled, NR_IRQS);

@@ -218,6 +407,10 @@ static int kvm_vm_ioctl_pci_pt_dev(struct kvm *kvm,
 		set_bit(pci_pt_dev->host.irq, pt_irq_handled);
 	}
 	list_add(&match->list, &kvm->arch.pci_pt_dev_head);
+
+	printk(KERN_INFO "kvm: Handling hypercalls for device %02x:%02x.%1x\n",
+	       pci_pt_dev->host.busnr, PCI_SLOT(pci_pt_dev->host.devfn),
+	       PCI_FUNC(pci_pt_dev->host.devfn));
 out:
 	return r;
 out_free:
@@ -248,6 +441,7 @@ static void kvm_free_pci_passthrough(struct kvm *kvm)
 {
 	struct list_head *ptr, *ptr2;
 	struct kvm_pci_pt_dev_list *pci_pt_dev;
+	struct kvm_pv_dma_map *dmap;

 	list_for_each_safe(ptr, ptr2, &kvm->arch.pci_pt_dev_head) {
 		pci_pt_dev = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
@@ -257,6 +451,11 @@ static void kvm_free_pci_passthrough(struct kvm *kvm)

 		list_del(&pci_pt_dev->list);
 	}
+
+	list_for_each_safe(ptr, ptr2, &kvm->arch.pci_pv_dmap_head) {
+		dmap = list_entry(ptr, struct kvm_pv_dma_map, list);
+		free_dmap(dmap, &kvm->arch.pci_pv_dmap_head);
+	}
 }

 unsigned long segment_base(u16 selector)
@@ -2672,6 +2871,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	}

 	switch (nr) {
+	case KVM_PV_DMA_MAP:
+		ret = pv_map_hypercall(vcpu, a0, a1);
+		break;
+	case KVM_PV_DMA_UNMAP:
+		ret = pv_unmap_hypercall(vcpu, a0);
+		break;
+	case KVM_PV_PCI_DEVICE:
+		ret = pv_mapped_pci_device_hypercall(vcpu, a0);
+		break;
 	case KVM_HC_VAPIC_POLL_IRQ:
 		ret = 0;
 		break;
@@ -4059,6 +4267,7 @@ struct kvm *kvm_arch_create_vm(void)

 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
 	INIT_LIST_HEAD(&kvm->arch.pci_pt_dev_head);
+	INIT_LIST_HEAD(&kvm->arch.pci_pv_dmap_head);

 	return kvm;
 }
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 3b9cb50..d52e44e 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -298,6 +298,17 @@ struct kvm_mem_alias {
 #define KVM_PT_SOURCE_IRQ_ACK 2
 #define KVM_PT_SOURCE_ASSIGN 3

+/* Paravirt DMA: We pin the host-side pages for the GPAs that we get
+ * for the DMA operation. We do a sg_map on the host pages for a DMA
+ * operation on the guest side. We un-pin the pages on the
+ * unmap_hypercall.
+ */
+struct kvm_pv_dma_map {
+	struct list_head list;
+	int nents;
+	struct scatterlist *sg;
+};
+
 /* This list is to store the guest bus:device:function-irq and host
  * bus:device:function-irq mapping for assigned devices.
  */
@@ -319,6 +330,7 @@ struct kvm_arch{
 	 */
 	struct list_head active_mmu_pages;
 	struct list_head pci_pt_dev_head;
+	struct list_head pci_pv_dmap_head;
 	struct kvm_pic *vpic;
 	struct kvm_ioapic *vioapic;
 	struct kvm_pit *vpit;
@@ -366,6 +378,9 @@ struct kvm_vcpu_stat {
 	u32 insn_emulation;
 	u32 insn_emulation_fail;
 	u32 hypercalls;
+	u32 hypercall_map;
+	u32 hypercall_unmap;
+	u32 hypercall_pv_dev;
 };

 struct descriptor_table {
diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index 5f93b78..e13bf4c 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -74,6 +74,12 @@ extern void kvmclock_init(void);
  */
 #define KVM_HYPERCALL ".byte 0x0f,0x01,0xc1"

+/* Hypercall numbers */
+#define KVM_PV_UNUSED		0
+#define KVM_PV_DMA_MAP		101
+#define KVM_PV_DMA_UNMAP	102
+#define KVM_PV_PCI_DEVICE	103
+
 /* For KVM hypercalls, a three-byte sequence of either the vmrun or the vmmrun
  * instruction. The hypervisor may replace it with something else but only the
  * instructions are guaranteed to be supported.
@@ -155,6 +161,8 @@ static inline unsigned int kvm_arch_para_features(void)
 	return cpuid_eax(KVM_CPUID_FEATURES);
 }

+/* Max. DMA pages we send from guest to host for mapping */
+#define MAX_PVDMA_PAGES (PAGE_SIZE / sizeof(unsigned long *))
 #endif /* KERNEL */

 /* Stores information for identifying host PCI devices assigned to the
--
1.5.4.3
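To see how the pieces fit, the guest side of the map operation would look roughly like this (a sketch inferred from the interface above; shared_page, pages, npages and dma_handle are illustrative names, the real caller lives in the guest driver of this series):

    /* Fill the shared page with the guest-physical addresses of the
     * pages to map, then ask the host to pin and dma-map them.
     * page_gfn is the guest frame number of the shared page itself.
     */
    unsigned long *slot = page_address(shared_page);
    int i, ret;

    for (i = 0; i < npages; i++)
            slot[i] = page_to_phys(pages[i]);

    ret = kvm_hypercall2(KVM_PV_DMA_MAP, npages, page_gfn);
    if (ret == npages)
            *dma_handle = slot[0];  /* host wrote the dma address back */
    else
            *dma_handle = bad_dma_address;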
From: Amit S. <ami...@qu...> - 2008-04-29 10:37:33
|
We make the dma_mapping_ops structure to point to our structure so that every DMA access goes through us. We make a hypercall for every device that does a DMA operation to find out if it is an assigned device -- so that we can make hypercalls on each DMA access. The result of this hypercall is cached, so that this hypercall is made only once for each device This can be compiled as a module, but that's only used for debugging. It can be compiled into the guest kernel directly without any side effects (if you ignore one error message about the hypercall failing for hard disks). Signed-off-by: Amit Shah <ami...@qu...> --- arch/x86/Kconfig | 8 + arch/x86/kernel/Makefile | 1 + arch/x86/kernel/kvm_pv_dma.c | 391 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 400 insertions(+), 0 deletions(-) create mode 100644 arch/x86/kernel/kvm_pv_dma.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index e5790fe..aad16d9 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -392,6 +392,14 @@ config KVM_GUEST This option enables various optimizations for running under the KVM hypervisor. +config KVM_PV_DMA + tristate "KVM paravirtualized DMA access" + ---help--- + Provides support for DMA operations in the guest. A hypercall + is raised to the host to enable devices owned by guest to use + DMA. Select this if compiling a guest kernel and you need + paravirtualized DMA operations. + source "arch/x86/lguest/Kconfig" config PARAVIRT diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index fa19c38..0adb37b 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -82,6 +82,7 @@ obj-$(CONFIG_DEBUG_NX_TEST) += test_nx.o obj-$(CONFIG_VMI) += vmi_32.o vmiclock_32.o obj-$(CONFIG_KVM_GUEST) += kvm.o obj-$(CONFIG_KVM_CLOCK) += kvmclock.o +obj-$(CONFIG_KVM_PV_DMA) += kvm_pv_dma.o obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_$(BITS).o ifdef CONFIG_INPUT_PCSPKR diff --git a/arch/x86/kernel/kvm_pv_dma.c b/arch/x86/kernel/kvm_pv_dma.c new file mode 100644 index 0000000..db83324 --- /dev/null +++ b/arch/x86/kernel/kvm_pv_dma.c @@ -0,0 +1,391 @@ +/* + * KVM guest DMA para-virtualization driver + * + * Copyright (C) 2007, Qumranet, Inc., Amit Shah <ami...@qu...> + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. 
+ */ + +#include <asm/page.h> +#include <linux/io.h> +#include <linux/fs.h> +#include <linux/pci.h> +#include <linux/module.h> +#include <linux/version.h> +#include <linux/miscdevice.h> +#include <linux/kvm_para.h> + +MODULE_AUTHOR("Amit Shah"); +MODULE_DESCRIPTION("Implements guest para-virtualized DMA"); +MODULE_LICENSE("GPL"); +MODULE_VERSION("1"); + +#define KVM_DMA_MINOR MISC_DYNAMIC_MINOR + +static struct page *page; +static unsigned long page_gfn; + +const struct dma_mapping_ops *orig_dma_ops; + +#include <linux/list.h> +struct pv_passthrough_dev_list { + struct list_head list; + struct kvm_pci_pt_info pv_pci_info; + int is_pv; +}; +static LIST_HEAD(pt_devs_head); + +static struct pv_passthrough_dev_list* +find_matching_pt_dev(struct list_head *head, + struct kvm_pci_pt_info *pv_pci_info) +{ + struct list_head *ptr; + struct pv_passthrough_dev_list *match; + + list_for_each(ptr, head) { + match = list_entry(ptr, struct pv_passthrough_dev_list, list); + if (match && + (match->pv_pci_info.busnr == pv_pci_info->busnr) && + (match->pv_pci_info.devfn == pv_pci_info->devfn)) + return match; + } + return NULL; +} + +void empty_pt_dev_list(struct list_head *head) +{ + struct pv_passthrough_dev_list *match; + + while (!list_empty(head)) { + match = list_entry(head->next, \ + struct pv_passthrough_dev_list, list); + list_del(&match->list); + } +} + +static int kvm_is_pv_device(struct device *dev, const char *name) +{ + int r; + struct pci_dev *pci_dev; + struct kvm_pci_pt_info pv_pci_info; + struct pv_passthrough_dev_list *match; + + pci_dev = to_pci_dev(dev); + pv_pci_info.busnr = pci_dev->bus->number; + pv_pci_info.devfn = pci_dev->devfn; + + match = find_matching_pt_dev(&pt_devs_head, &pv_pci_info); + if (match) { + r = match->is_pv; + goto out; + } + + memcpy(page_address(page), &pv_pci_info, sizeof(pv_pci_info)); + r = kvm_hypercall1(KVM_PV_PCI_DEVICE, page_gfn); + if (r < 1) { + printk(KERN_INFO "%s: Error doing hypercall!\n", __func__); + r = 0; + goto out; + } + + match = kmalloc(sizeof(struct pv_passthrough_dev_list), GFP_KERNEL); + if (match == NULL) { + printk(KERN_INFO "%s: Out of memory\n", __func__); + r = 0; + goto out; + } + match->pv_pci_info.busnr = pv_pci_info.busnr; + match->pv_pci_info.devfn = pv_pci_info.devfn; + match->is_pv = r; + list_add(&match->list, &pt_devs_head); + out: + return r; +} + +static void *kvm_dma_map(void *vaddr, size_t size, dma_addr_t *dma_handle) +{ + int npages, i; + unsigned long *dma_addr; + dma_addr_t host_addr = bad_dma_address; + + if (page == NULL) + goto out; + + npages = get_order(size) + 1; + dma_addr = page_address(page); + + /* We have to take into consideration the offsets for the + * virtual address provided by the calling + * functions. Currently both, pci_alloc_consistent and + * pci_map_single call this function. We have to change it so + * that we can also pass to the host the offset of the addr in + * the page it is in. + */ + + if (*dma_handle == bad_dma_address) + goto out; + + /* It's not really OK to use dma_handle here, as the IOMMU or + * swiotlb could have mapped it elsewhere. But what's a better + * solution? + */ + *dma_addr++ = *dma_handle; + if (npages > 1) { + /* All of the pages will be contiguous in guest + * physical memory in both, pci_map_consistent and + * pci_map_single cases (see DMA-API.txt) + */ + /* FIXME: we're currently not crossing over to + * multiple pages to be sent to host, in case + * we have a lot of pages that we can't + * accomodate in one page. 
+ */ + for (i = 1; i < min((unsigned long)npages, MAX_PVDMA_PAGES); i++) + *dma_addr++ = virt_to_phys(vaddr + PAGE_SIZE * i); + } + + /* Maybe we need more arguments (we have first two): + * @npages: number of gpas pages in this hypercall + * @page: page we pass to host with all the gpas in them + * @more: are there any more pages coming? + * @offset: offset of the address in the first page + * @direction: direction for the mapping (only for pci_map_single) + */ + npages = kvm_hypercall2(KVM_PV_DMA_MAP, npages, page_gfn); + if (!npages) + host_addr = bad_dma_address; + else + host_addr = *(unsigned long *)page_address(page); + + out: + *dma_handle = host_addr; + if (host_addr == bad_dma_address) + vaddr = NULL; + return vaddr; +} + +static void kvm_dma_unmap(dma_addr_t dma_handle) +{ + kvm_hypercall1(KVM_PV_DMA_UNMAP, dma_handle); + return; +} + +static void *kvm_dma_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t gfp) +{ + void *vaddr = NULL; + if ((*dma_handle == bad_dma_address) + || !dma_ops->is_pv_device(dev, dev->bus_id)) + goto out; + + vaddr = bus_to_virt(*(unsigned long *)dma_handle); + vaddr = kvm_dma_map(vaddr, size, dma_handle); + out: + return vaddr; +} + +static void kvm_dma_free_coherent(struct device *dev, size_t size, void *vaddr, + dma_addr_t dma_handle) +{ + kvm_dma_unmap(dma_handle); +} + +static dma_addr_t kvm_dma_map_single(struct device *dev, phys_addr_t paddr, + size_t size, int direction) +{ + dma_addr_t r; + + r = orig_dma_ops->map_single(dev, paddr, size, direction); + + if (r != bad_dma_address && kvm_is_pv_device(dev, dev->bus_id)) + kvm_dma_map(phys_to_virt(paddr), size, &r); + return r; +} + +static inline void kvm_dma_unmap_single(struct device *dev, dma_addr_t addr, + size_t size, int direction) +{ + kvm_dma_unmap(addr); +} + +int kvm_pv_dma_mapping_error(dma_addr_t dma_addr) +{ + if (orig_dma_ops->mapping_error) + return orig_dma_ops->mapping_error(dma_addr); + + printk(KERN_ERR "%s: Unhandled PV DMA operation. 
Report this.\n", + __func__); + return dma_addr == bad_dma_address; +} + +/* like map_single, but doesn't check the device mask */ +dma_addr_t kvm_pv_dma_map_simple(struct device *hwdev, phys_addr_t paddr, + size_t size, int direction) +{ + return orig_dma_ops->map_simple(hwdev, paddr, size, direction); +} + +void kvm_pv_dma_sync_single_for_cpu(struct device *hwdev, + dma_addr_t dma_handle, size_t size, + int direction) +{ + if (orig_dma_ops->sync_single_for_cpu) + orig_dma_ops->sync_single_for_cpu(hwdev, dma_handle, + size, direction); +} + +void kvm_pv_dma_sync_single_for_device(struct device *hwdev, + dma_addr_t dma_handle, size_t size, + int direction) +{ + if (orig_dma_ops->sync_single_for_device) + orig_dma_ops->sync_single_for_device(hwdev, dma_handle, + size, direction); +} + +void kvm_pv_dma_sync_single_range_for_cpu(struct device *hwdev, + dma_addr_t dma_handle, + unsigned long offset, + size_t size, int direction) +{ + if (orig_dma_ops->sync_single_range_for_cpu) + orig_dma_ops->sync_single_range_for_cpu(hwdev, dma_handle, + offset, size, + direction); +} + +void kvm_pv_dma_sync_single_range_for_device(struct device *hwdev, + dma_addr_t dma_handle, + unsigned long offset, + size_t size, int direction) +{ + if (orig_dma_ops->sync_single_range_for_device) + orig_dma_ops->sync_single_range_for_device(hwdev, dma_handle, + offset, size, + direction); +} + +void kvm_pv_dma_sync_sg_for_cpu(struct device *hwdev, + struct scatterlist *sg, int nelems, + int direction) +{ + if (orig_dma_ops->sync_sg_for_cpu) + orig_dma_ops->sync_sg_for_cpu(hwdev, sg, nelems, direction); +} + +void kvm_pv_dma_sync_sg_for_device(struct device *hwdev, + struct scatterlist *sg, int nelems, + int direction) +{ + if (orig_dma_ops->sync_sg_for_device) + orig_dma_ops->sync_sg_for_device(hwdev, sg, nelems, direction); +} + +int kvm_pv_dma_map_sg(struct device *hwdev, struct scatterlist *sg, + int nents, int direction) +{ + return orig_dma_ops->map_sg(hwdev, sg, nents, direction); + printk(KERN_ERR "%s: Unhandled PV DMA operation. Report this.\n", + __func__); + return 0; +} + +void kvm_pv_dma_unmap_sg(struct device *hwdev, + struct scatterlist *sg, int nents, + int direction) +{ + if (orig_dma_ops->unmap_sg) + orig_dma_ops->unmap_sg(hwdev, sg, nents, direction); +} + +int kvm_pv_dma_dma_supported(struct device *hwdev, u64 mask) +{ + if (orig_dma_ops->dma_supported) + return orig_dma_ops->dma_supported(hwdev, mask); + printk(KERN_ERR "%s: Unhandled PV DMA operation. 
Report this.\n", + __func__); + return 0; +} + +static const struct dma_mapping_ops kvm_dma_ops = { + .alloc_coherent = kvm_dma_alloc_coherent, + .free_coherent = kvm_dma_free_coherent, + .map_single = kvm_dma_map_single, + .unmap_single = kvm_dma_unmap_single, + .is_pv_device = kvm_is_pv_device, + + .mapping_error = kvm_pv_dma_mapping_error, + .map_simple = kvm_pv_dma_map_simple, + .sync_single_for_cpu = kvm_pv_dma_sync_single_for_cpu, + .sync_single_for_device = kvm_pv_dma_sync_single_for_device, + .sync_single_range_for_cpu = kvm_pv_dma_sync_single_range_for_cpu, + .sync_single_range_for_device = kvm_pv_dma_sync_single_range_for_device, + .sync_sg_for_cpu = kvm_pv_dma_sync_sg_for_cpu, + .sync_sg_for_device = kvm_pv_dma_sync_sg_for_device, + .map_sg = kvm_pv_dma_map_sg, + .unmap_sg = kvm_pv_dma_unmap_sg, +}; + +static struct file_operations dma_chardev_ops; +static struct miscdevice kvm_dma_dev = { + KVM_DMA_MINOR, + "kvm_dma", + &dma_chardev_ops, +}; + +int __init kvm_pv_dma_init(void) +{ + int r; + + dma_chardev_ops.owner = THIS_MODULE; + if (misc_register(&kvm_dma_dev)) { + printk(KERN_ERR "%s: misc device register failed\n", + __func__); + r = -EBUSY; + goto out; + } + if (!kvm_para_available()) { + printk(KERN_ERR "KVM paravirt support not available\n"); + r = -ENODEV; + goto out_dereg; + } + + /* FIXME: check for hypercall support */ + page = alloc_page(GFP_ATOMIC); + if (page == NULL) { + printk(KERN_ERR "%s: Could not allocate page\n", __func__); + r = -ENOMEM; + goto out_dereg; + } + page_gfn = page_to_pfn(page); + + orig_dma_ops = dma_ops; + dma_ops = &kvm_dma_ops; + + printk(KERN_INFO "KVM PV DMA engine registered\n"); + return 0; + goto out; + goto out_free; + + out_free: + __free_page(page); + out_dereg: + misc_deregister(&kvm_dma_dev); + out: + return r; +} + +static void __exit kvm_pv_dma_exit(void) +{ + dma_ops = orig_dma_ops; + + __free_page(page); + + empty_pt_dev_list(&pt_devs_head); + + misc_deregister(&kvm_dma_dev); +} + +module_init(kvm_pv_dma_init); +module_exit(kvm_pv_dma_exit); -- 1.5.4.3 |
From: Amit S. <ami...@qu...> - 2008-04-29 10:37:33
|
This patchset implements PVDMA for handling DMA requests from devices
assigned to the guest by the host machine. The patches are also available
from

git-pull git://git.kernel.org/pub/scm/linux/kernel/git/amit/kvm.git pvdma

These patches are based on my pci-passthrough tree, which is available from

git-pull git://git.kernel.org/pub/scm/linux/kernel/git/amit/kvm.git

and the userspace from

git-pull git://git.kernel.org/pub/scm/linux/kernel/git/amit/kvm-userspace.git

The first and the third patches in this series are needed on the guest
(with some bits from the 2nd as well). The 2nd patch is meant for the
host kernel.

Amit.
|
From: Jan K. <jan...@si...> - 2008-04-29 10:34:40
|
Joerg Roedel wrote: > On Tue, Apr 29, 2008 at 10:38:41AM +0200, Jan Kiszka wrote: >> Joerg Roedel wrote: >>> Hmm, seems we have to check for DF and triple faults in the >>> kvm_queue_exception functions too. Does the attached patch fix the >>> problem (patch is against kvm-66). >> Thanks, it indeed fixes the warnings (*) and makes KVM issue a reset. But >> then is stumbles and falls probably over some inconsistent system state: >> >> exception 13 (43) >> rax 0000000000000000 rbx 0000000000000000 rcx 0000000000000000 rdx 0000000000000633 >> rsi 0000000000000000 rdi 0000000000000000 rsp 0000000000000000 rbp 0000000000000000 >> r8 0000000000000000 r9 0000000000000000 r10 0000000000000000 r11 0000000000000000 >> r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000 >> rip 000000000000fff0 rflags 00033002 >> cs f000 (000f0000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) >> ds 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) >> es 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) >> ss 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) >> fs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) >> gs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) >> tr 0178 (fffbd000/00002088 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0) >> ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0) >> gdt 0/ffff >> idt 0/ffff >> cr0 60000010 cr2 0 cr3 0 cr4 0 cr8 0 efer 0 >> code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> >> Looks like trying to execute the first instruction after reset is >> already unsuccessful. As the tr selector is non-zero here, I already >> tried a kvm_arch_reset_cpu-hack along the line that sets >> KVM_REQ_TRIPLE_FAULT, but without success. Any idea what to check? > > Its weird to me what triggers the taskswitch. What guest operating It is the guest, looking for a soft-restart (after it detected some other error - now our main problem). > system are you running and what is the qemu/kvm command line to start > the guest? Well, the guest is a proprietary OS of our customer, running in 16-bit protected mode with a lot of segment shuffling. Due to this and also some special hardware emulations, the current test case is not portable. So I'm looking for input on where to dig and what to try. Note that I ran the very same test with -no-kvm, and here we do not get those post-reset GPF (provided that some reset-on-triple-fault patch is applied to avoid the abort(), e.g. [1]). > >> Note that this does not happen when I raise a reset via the monitor. >> >> BTW, kvm_show_code() does not seem to provide correct informations, >> even when I add it right before the first kvm_run(). > > When the guest state is messed up the information may be incorrect. I don't expect the guest state to be messed up right before the very first guest code execution (that's where kvm_show_code() also reported zeros)... :-> > >> (*) There is just a bit noise left behind in the syslog: >> >> kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9 > > Reason 0x9 is the taskswitch intercept. > >> kvm: inject_page_fault: double fault > > This is expected from the patch I sent you. For sure. I would just suggest to rethink if a final version should still issue such warnings. 
We basically had the same discussion on qemu-devel around the
reset-on-triple-fault patch (which is unfortunately still not
finalized :-/).

Jan

[1] http://permalink.gmane.org/gmane.comp.emulators.qemu/24475

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
|
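For readers new to the thread: "reset on triple fault" concerns the userspace side of KVM. The kernel reports the triple-fault shutdown to userspace as KVM_EXIT_SHUTDOWN, and the open question is whether userspace aborts or resets the guest. A hedged sketch of the latter, with a hypothetical handler name; qemu_system_reset_request() is assumed here as QEMU's reset hook, and this is not qemu-kvm's actual exit loop.

/*
 * Illustrative only: turning a triple fault (reported by the kernel
 * as KVM_EXIT_SHUTDOWN) into a guest reset instead of an abort().
 */
#include <linux/kvm.h>

extern void qemu_system_reset_request(void);	/* QEMU's reset hook */

static int handle_shutdown(struct kvm_run *run)
{
	if (run->exit_reason == KVM_EXIT_SHUTDOWN) {
		qemu_system_reset_request();
		return 1;	/* handled; resume after the reset */
	}
	return 0;		/* not ours; let other handlers decide */
}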
From: David M. <da...@da...> - 2008-04-29 10:30:08
|
From: Avi Kivity <av...@qu...>
Date: Sun, 27 Apr 2008 12:40:29 +0300

> Avi Kivity wrote:
> > I propose moving the kvm lists to vger.kernel.org, for the following
> > benefits:
> >
> > - better spam control
> > - faster service (I see significant lag with the sourceforge lists)
> > - no ads appended to the end of each email
> >
> > If no objections are raised, and if the vger postmasters agree, I will
> > mass subscribe the current subscribers so that there will be no
> > service interruption.
> >
>
> Since no objections were raised, we'll start to get this rolling.

Should I create the list(s) now? If so, please let me know the names
they should have.

Thanks.
|
From: Joerg R. <joe...@am...> - 2008-04-29 10:05:03
|
On Tue, Apr 29, 2008 at 10:38:41AM +0200, Jan Kiszka wrote: > Joerg Roedel wrote: > > Hmm, seems we have to check for DF and triple faults in the > > kvm_queue_exception functions too. Does the attached patch fix the > > problem (patch is against kvm-66). > > Thanks, it indeed fixes the warnings (*) and makes KVM issue a reset. But > then is stumbles and falls probably over some inconsistent system state: > > exception 13 (43) > rax 0000000000000000 rbx 0000000000000000 rcx 0000000000000000 rdx 0000000000000633 > rsi 0000000000000000 rdi 0000000000000000 rsp 0000000000000000 rbp 0000000000000000 > r8 0000000000000000 r9 0000000000000000 r10 0000000000000000 r11 0000000000000000 > r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000 > rip 000000000000fff0 rflags 00033002 > cs f000 (000f0000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) > ds 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) > es 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) > ss 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) > fs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) > gs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) > tr 0178 (fffbd000/00002088 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0) > ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0) > gdt 0/ffff > idt 0/ffff > cr0 60000010 cr2 0 cr3 0 cr4 0 cr8 0 efer 0 > code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > Looks like trying to execute the first instruction after reset is > already unsuccessful. As the tr selector is non-zero here, I already > tried a kvm_arch_reset_cpu-hack along the line that sets > KVM_REQ_TRIPLE_FAULT, but without success. Any idea what to check? Its weird to me what triggers the taskswitch. What guest operating system are you running and what is the qemu/kvm command line to start the guest? > Note that this does not happen when I raise a reset via the monitor. > > BTW, kvm_show_code() does not seem to provide correct informations, > even when I add it right before the first kvm_run(). When the guest state is messed up the information may be incorrect. > (*) There is just a bit noise left behind in the syslog: > > kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9 Reason 0x9 is the taskswitch intercept. > kvm: inject_page_fault: double fault This is expected from the patch I sent you. Joerg -- | AMD Saxony Limited Liability Company & Co. KG Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany System | Register Court Dresden: HRA 4896 Research | General Partner authorized to represent: Center | AMD Saxony LLC (Wilmington, Delaware, US) | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy |
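For readers decoding the syslog line quoted above: VM-exit reason numbers come from the VMX basic exit reason field. A small illustrative helper follows, using the two values relevant to this thread from kvm's vmx.h (which mirror the Intel SDM); the helper itself is just a sketch, not kvm code.

/* Basic VMX exit reason values, as in arch/x86/kvm/vmx.h */
#define EXIT_REASON_EXCEPTION_NMI	0
#define EXIT_REASON_TASK_SWITCH		9

static const char *vmx_exit_reason_name(unsigned int reason)
{
	switch (reason) {
	case EXIT_REASON_EXCEPTION_NMI:
		return "exception or NMI";
	case EXIT_REASON_TASK_SWITCH:
		return "task switch";	/* the 0x9 seen in the log */
	default:
		return "other";
	}
}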
From: Jan K. <jan...@si...> - 2008-04-29 08:38:35
|
Joerg Roedel wrote: > On Mon, Apr 28, 2008 at 07:35:10PM +0200, Jan Kiszka wrote: >> Hi, >> >> sorry, the test environment is not really reproducible (stock kvm-66, >> yet unpublished NMI support by Sheng Yang and me, special guest), but >> I'm just fishing for some ideas on what may cause the flood of the >> following warning in my kernel log: >> >> ------------[ cut here ]------------ >> WARNING: at /data/kvm-66/kernel/x86.c:180 >> kvm_queue_exception_e+0x30/0x54 [kvm]() >> Modules linked in: ipt_MASQUERADE kvm_intel kvm bridge tun ip6t_LOG >> nf_conntrack_ipv6 xt_pkttype ipt_LOG xt_limit snd_pcm_oss snd_mixer_oss >> snd_seq snd_seq_device nls_utf8 cifs af_packet ip6t_REJECT xt_tcpudp >> ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter >> ip6table_mangle nf_conntrack_ipv4 nf_conntrack ip_tables ip6table_filter >> ip6_tables cpufreq_conservative x_tables cpufreq_userspace >> cpufreq_powersave acpi_cpufreq ipv6 microcode fuse ohci_hcd loop rfcomm >> l2cap wlan_scan_sta ath_rate_sample ath_pci snd_hda_intel wlan pcmcia >> firmware_class hci_usb snd_pcm snd_timer ath_hal(P) sdhci battery >> bluetooth button ohci1394 mmc_core rtc_cmos parport_pc intel_agp >> rtc_core dock ac snd_page_alloc iTCO_wdt ieee1394 sky2 rtc_lib >> yenta_socket parport snd_hwdep snd iTCO_vendor_support i2c_i801 >> rsrc_nonstatic pcmcia_core sg i2c_core soundcore serio_raw joydev >> sha256_generic aes_x86_64 aes_generic cbc dm_crypt crypto_blkcipher >> usbhid hid ff_memless sd_mod ehci_hcd uhci_hcd usbcore dm_snapshot >> dm_mod edd ext3 mbcache jbd fan ata_piix ahci libata scsi_mod thermal >> processor >> Pid: 4718, comm: qemu-system-x86 Tainted: P N >> 2.6.25-rc5-git2-109.8-default #1 >> >> Call Trace: >> [<ffffffff8020d826>] dump_trace+0xc4/0x576 >> [<ffffffff8020dd18>] show_trace+0x40/0x57 >> [<ffffffff8044e341>] _etext+0x72/0x7b >> [<ffffffff80238137>] warn_on_slowpath+0x58/0x80 >> [<ffffffff886e2e05>] :kvm:kvm_queue_exception_e+0x30/0x54 >> [<ffffffff886e3678>] :kvm:kvm_task_switch+0xca/0x20a >> [<ffffffff8870d096>] :kvm_intel:handle_task_switch+0x19/0x1b >> [<ffffffff8870cb1b>] :kvm_intel:kvm_handle_exit+0x7f/0x9c >> [<ffffffff886e51e2>] :kvm:kvm_arch_vcpu_ioctl_run+0x49b/0x686 >> [<ffffffff886e08c9>] :kvm:kvm_vcpu_ioctl+0xf7/0x3ca >> [<ffffffff802ad0ba>] vfs_ioctl+0x2a/0x78 >> [<ffffffff802ad34f>] do_vfs_ioctl+0x247/0x261 >> [<ffffffff802ad3be>] sys_ioctl+0x55/0x77 >> [<ffffffff8020c18a>] system_call_after_swapgs+0x8a/0x8f >> [<00007faed2969267>] >> >> ---[ end trace 5d286714f3c5c50f ]--- > > Hmm, seems we have to check for DF and triple faults in the > kvm_queue_exception functions too. Does the attached patch fix the > problem (patch is against kvm-66). Thanks, it indeed fixes the warnings (*) and makes KVM issue a reset. 
But then it stumbles and falls, probably over some inconsistent system
state:

exception 13 (43)
rax 0000000000000000 rbx 0000000000000000 rcx 0000000000000000 rdx 0000000000000633
rsi 0000000000000000 rdi 0000000000000000 rsp 0000000000000000 rbp 0000000000000000
r8 0000000000000000 r9 0000000000000000 r10 0000000000000000 r11 0000000000000000
r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
rip 000000000000fff0 rflags 00033002
cs f000 (000f0000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
ds 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
es 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
ss 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
fs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
gs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
tr 0178 (fffbd000/00002088 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0)
gdt 0/ffff
idt 0/ffff
cr0 60000010 cr2 0 cr3 0 cr4 0 cr8 0 efer 0
code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Looks like trying to execute the first instruction after reset is
already unsuccessful. As the tr selector is non-zero here, I already
tried a kvm_arch_reset_cpu hack along the line that sets
KVM_REQ_TRIPLE_FAULT, but without success. Any idea what to check?

Note that this does not happen when I raise a reset via the monitor.

BTW, kvm_show_code() does not seem to provide correct information,
even when I add it right before the first kvm_run().

Jan

(*) There is just a bit of noise left behind in the syslog:

kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9
kvm: inject_page_fault: double fault
kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9
handle_exception: unexpected, vectoring info 0x80000b08 intr info 0x80000b0d

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
|
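As a point of reference, the dump above is essentially the architected x86 reset state, which can be checked by hand; the values in the sketch below are copied from the register dump, and the check itself is only illustrative.

/*
 * Sanity check against the architected reset state (Intel SDM vol. 3).
 */
#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint32_t cs_base = 0x000f0000;	/* cs f000 (000f0000/...) */
	uint32_t rip     = 0xfff0;
	uint32_t cr0     = 0x60000010;	/* CD | NW | ET: reset value */

	/* cs.base + rip gives the usual x86 reset vector */
	assert(cs_base + rip == 0xffff0);
	(void)cr0;
	/*
	 * What does NOT match the reset state is tr = 0178 with a
	 * non-zero base, i.e. stale task register contents, which fits
	 * the task-switch intercept noise seen in the syslog.
	 */
	return 0;
}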
From: David M. <dm...@ma...> - 2008-04-29 05:09:06
|
Fabian Deutsch wrote:
> Hey.
>
> I've been trying Microsoft Windows 2003 a couple of times. The wiki
> tells me that "everything" should work okay. It does, when using -smp 1,
> but gets ugly when using -smp 2 or so.
>
> So might it be useful to add the column "smp" to the "Guest Support
> Status" page in the wiki?

It's probably worth explaining what you mean by "gets ugly" and worth
describing the environment. I just installed Windows 2003 Enterprise
Edition with -smp 2 and had no problems during or after the install. I
confirmed Windows detected and uses two CPUs. IOW, works for me.

Intel(R) Core(TM)2 CPU T7600 @ 2.33GHz
SuSE kernel 2.6.22.17-0.1-default x86_64
kvm-67
# qemu-system-x86_64 -smp 2 -m 512 -usb -localtime -hda disk.img -boot c

---
David Mair.
|