From: Gerd H. <kr...@re...> - 2008-04-21 13:01:23

Jeremy Fitzhardinge wrote:
> Gerd Hoffmann wrote:
>> +cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src)
>> +{
>> +	struct pvclock_shadow_time *shadow = &get_cpu_var(shadow_time);
>> +	cycle_t ret;
>> +
>> +	pvclock_get_time_values(shadow, src);
>> +	ret = shadow->system_timestamp + pvclock_get_nsec_offset(shadow);
>>
>
> You need to put this in a loop in case the system clock parameters
> change between the pvclock_get_time_values() and pvclock_get_nsec_offset().

Fixed, new patch attached.

> How does kvm deal with suspend/resume with respect to time?  Is the
> "system" timestamp guaranteed to remain monotonic?  For Xen, I think
> we'll need to maintain an offset between the initial system timestamp
> and whatever it is after resuming.

Haven't looked at it yet.

cheers,
  Gerd

-- 
http://kraxel.fedorapeople.org/xenner/
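[Context for the loop being requested above: it is the usual version-counter
retry pattern -- snapshot the shared time info, compute the TSC-based offset,
and retry if the host changed the parameters in between.  The sketch below
reuses the names from the quoted snippet; the version field, the barrier
placement, and the "version is odd while the host updates" convention are
assumptions for illustration, not the actual patch that was attached.]

cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src)
{
	struct pvclock_shadow_time *shadow = &get_cpu_var(shadow_time);
	cycle_t ret;
	u32 version;

	do {
		/* remember the version before copying the time fields */
		version = src->version;
		rmb();
		pvclock_get_time_values(shadow, src);
		ret = shadow->system_timestamp + pvclock_get_nsec_offset(shadow);
		rmb();
		/* retry if the host was mid-update or updated meanwhile */
	} while ((src->version & 1) || (version != src->version));

	put_cpu_var(shadow_time);
	return ret;
}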
From: Gerd H. <kr...@re...> - 2008-04-21 12:50:29

Jeremy Fitzhardinge wrote:
> Xen could change the parameters in the instant after get_time_values().
> That change could be as a result of suspend-resume, so the parameters
> and the tsc could be wildly different.

Ah, ok, forgot the rdtsc in the picture.  With that in mind I fully agree
that the loop is needed.  I think kvm guests can even hit that one with
the vcpu migrating to a different physical cpu, so we better handle it
correctly ;)

> Sure, but get_time_values() has several other callers.

Not really.  There are only two calls, one in clocksource_read() and one
in the init path.  The latter is superfluous I think because
clocksource_read() is the only user of the shadowed time info.

cheers,
  Gerd

-- 
http://kraxel.fedorapeople.org/xenner/
From: Jamie L. <ja...@sh...> - 2008-04-21 12:11:13

Avi Kivity wrote:
> > At such a tiny difference, I'm wondering why Linux-AIO exists at all,
> > as it complicates the kernel rather a lot.  I can see the theoretical
> > appeal, but if performance is so marginal, I'm surprised it's in
> > there.
>
> Linux aio exists, but that's all that can be said for it.  It works
> mostly for raw disks, doesn't integrate with networking, and doesn't
> advance at the same pace as the rest of the kernel.  I believe only
> databases use it (and a userspace filesystem I wrote some time ago).

And video streaming on some embedded devices with no MMU!  (Due to the
page cache heuristics working poorly with no MMU, sustained reliable
streaming is managed with O_DIRECT and the app managing cache itself
(like a database), and that needs AIO to keep the request queue busy.
At least, that's the theory.)

> > I'm also surprised the Glibc implementation of AIO using ordinary
> > threads is so close to it.
>
> Why are you surprised?

Because I've read that Glibc AIO (which uses a thread pool) is a
relatively poor performer as AIO implementations go, and is only there
for API compatibility, not suggested for performance.  But I read that
quite a while ago; perhaps it's changed.

> Actually the glibc implementation could be improved from what I've
> heard.  My estimates are for a thread pool implementation, but there is
> no reason why glibc couldn't achieve exactly the same performance.

Erm... I thought you said it _does_ achieve nearly the same performance,
not that it _could_.  Do you mean it could achieve exactly the same
performance by using Linux AIO when possible?

> > And then, I'm wondering why use AIO at
> > all: it suggests QEMU would run about as fast doing synchronous I/O in
> > a few dedicated I/O threads.
>
> Posix aio is the unix API for this, why not use it?

Because far more host platforms have threads than have POSIX AIO.
(I suspect both options will end up supported in the end, as dedicated
I/O threads were already suggested for other things.)

> >> Also, I'd presume that those that need 10K IOPS and above will not place
> >> their high throughput images on a filesystem; rather on a separate SAN
> >> LUN.
> >
> > Does the separate LUN make any difference?  I thought O_DIRECT on a
> > filesystem was meant to be pretty close to block device performance.
>
> On a good extent-based filesystem like XFS you will get good performance
> (though more cpu overhead due to needing to go through additional
> mapping layers).  Old clunkers like ext3 will require additional seeks or
> a ton of cache (1 GB per 1 TB).

Hmm.  Thanks.  I may consider switching to XFS now....

-- Jamie
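[Context for the comparison above: glibc's POSIX AIO (aio_read()/aio_suspend(),
linked with -lrt) is the thread-pool implementation being discussed, while
Linux-native AIO is the separate io_submit() interface that is mostly used
with O_DIRECT on raw devices.  A minimal POSIX AIO read, purely to illustrate
the userspace API -- the file path is a placeholder, build with
"cc aio_demo.c -lrt" -- looks like this:]

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	static char buf[4096];
	struct aiocb cb;
	const struct aiocb *const list[1] = { &cb };
	int fd = open("/tmp/testfile", O_RDONLY);	/* placeholder path */

	if (fd < 0) {
		perror("open");
		return 1;
	}

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf    = buf;
	cb.aio_nbytes = sizeof(buf);
	cb.aio_offset = 0;

	if (aio_read(&cb) < 0) {		/* queues the read, returns at once */
		perror("aio_read");
		return 1;
	}

	while (aio_error(&cb) == EINPROGRESS)	/* glibc services this in a helper thread */
		aio_suspend(list, 1, NULL);

	printf("read %zd bytes\n", aio_return(&cb));
	close(fd);
	return 0;
}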
From: Christian B. <bor...@de...> - 2008-04-21 11:49:46

Avi,

kvm_dev_ioctl casts the arg value to void __user *, just to recast it
again to long.  This seems unnecessary.  According to objdump the
binary code on x86 is unchanged by this patch.

Signed-off-by: Christian Borntraeger <bor...@de...>
---
 virt/kvm/kvm_main.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: kvm/virt/kvm/kvm_main.c
===================================================================
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -1188,7 +1188,6 @@ static int kvm_dev_ioctl_create_vm(void)
 static long kvm_dev_ioctl(struct file *filp,
 			  unsigned int ioctl, unsigned long arg)
 {
-	void __user *argp = (void __user *)arg;
 	long r = -EINVAL;
 
 	switch (ioctl) {
@@ -1205,7 +1204,7 @@ static long kvm_dev_ioctl(struct file *f
 		r = kvm_dev_ioctl_create_vm();
 		break;
 	case KVM_CHECK_EXTENSION:
-		r = kvm_dev_ioctl_check_extension((long)argp);
+		r = kvm_dev_ioctl_check_extension(arg);
 		break;
 	case KVM_GET_VCPU_MMAP_SIZE:
 		r = -EINVAL;
From: Jeremy F. <je...@go...> - 2008-04-21 11:46:43

Gerd Hoffmann wrote:
> Hmm, I somehow fail to see a case where it could be non-atomic ...
>
> get_time_values() copies a consistent snapshot, thus
> xen_clocksource_read() doesn't race against xen updating the fields.
> The snapshot is in a per-cpu variable, thus it doesn't race against
> other guest vcpus running get_time_values() at the same time.
>

Xen could change the parameters in the instant after get_time_values().
That change could be as a result of suspend-resume, so the parameters
and the tsc could be wildly different.

It's definitely an edge-case, but it's easy enough to deal with.

>> There could be a loopless
>> __get_time_values() for use in this case, but given that it almost never
>> loops, I don't think its worthwhile.
>>
>
> "in this case" ???  I'm confused.  There is only a single user of
> get_nsec_offset(), which is xen_clocksource_read() ...
>

Sure, but get_time_values() has several other callers.

If xen_clocksource_read() had its own loop to make sure the read_tsc is
atomic with respect to get_time_values, then get_time_values itself
needn't loop.  But, given that it only loops in the very rare case that
it races with Xen updating those parameters, it doesn't seem to make
much difference either way.

    J
From: Avi K. <av...@qu...> - 2008-04-21 10:30:21

This lets us treat the case where mod == 3 in the same manner as other
cases.

Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/x86/kvm/x86_emulate.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index f59ed93..8e1b32f 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -1001,6 +1001,7 @@ done_prefixes:
 		 */
 		if ((c->d & ModRM) && c->modrm_mod == 3) {
 			c->src.type = OP_REG;
+			c->src.val = c->modrm_val;
 			break;
 		}
 		c->src.type = OP_MEM;
@@ -1044,6 +1045,7 @@ done_prefixes:
 	case DstMem:
 		if ((c->d & ModRM) && c->modrm_mod == 3) {
 			c->dst.type = OP_REG;
+			c->dst.val = c->dst.orig_val = c->modrm_val;
 			break;
 		}
 		c->dst.type = OP_MEM;
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:20

From: Joerg Roedel <joe...@am...>

If the CR8 write intercept is disabled the V_TPR field of the VMCB
needs to be synced with the TPR field in the local apic.

Signed-off-by: Joerg Roedel <joe...@am...>
Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/x86/kvm/svm.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f8ce36e..ee2ee83 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1620,6 +1620,16 @@ static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
 {
 }
 
+static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (!(svm->vmcb->control.intercept_cr_write & INTERCEPT_CR8_MASK)) {
+		int cr8 = svm->vmcb->control.int_ctl & V_TPR_MASK;
+		kvm_lapic_set_tpr(vcpu, cr8);
+	}
+}
+
 static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1791,6 +1801,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
 	stgi();
 
+	sync_cr8_to_lapic(vcpu);
+
 	svm->next_rip = 0;
 }
 
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:20

We never hit this, since there is currently no reason to emulate lea.

Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/x86/kvm/x86_emulate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 46ef78f..2ca0838 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -1512,7 +1512,7 @@ special_insn:
 	case 0x88 ... 0x8b:	/* mov */
 		goto mov;
 	case 0x8d: /* lea r16/r32, m */
-		c->dst.val = c->modrm_val;
+		c->dst.val = c->modrm_ea;
 		break;
 	case 0x8f:		/* pop (sole member of Grp1a) */
 		rc = emulate_grp1a(ctxt, ops);
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:20

From: Hollis Blanchard <ho...@us...>

PowerPC 440 KVM needs to know how many TLB entries are used for the
host kernel linear mapping (it does not modify these mappings when
switching between guest and host execution).

Signed-off-by: Hollis Blanchard <ho...@us...>
Acked-by: Josh Boyer <jw...@li...>
Acked-by: Paul Mackerras <pa...@sa...>
Signed-off-by: Avi Kivity <av...@qu...>
---
 include/asm-powerpc/mmu-44x.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/asm-powerpc/mmu-44x.h b/include/asm-powerpc/mmu-44x.h
index 62772ae..b6953e8 100644
--- a/include/asm-powerpc/mmu-44x.h
+++ b/include/asm-powerpc/mmu-44x.h
@@ -53,6 +53,8 @@
 
 #ifndef __ASSEMBLY__
 
+extern unsigned int tlb_44x_hwater;
+
 typedef unsigned long long phys_addr_t;
 
 typedef struct {
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:20

From: Hollis Blanchard <ho...@us...>

Signed-off-by: Hollis Blanchard <ho...@us...>
Acked-by: Paul Mackerras <pa...@sa...>
Signed-off-by: Avi Kivity <av...@qu...>
---
 MAINTAINERS |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a18aac1..6072f2f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2293,6 +2293,13 @@ L:	kvm...@li...
 W:	kvm.sourceforge.net
 S:	Supported
 
+KERNEL VIRTUAL MACHINE (KVM) FOR POWERPC
+P:	Hollis Blanchard
+M:	ho...@us...
+L:	kvm...@li...
+W:	kvm.sourceforge.net
+S:	Supported
+
 KERNEL VIRTUAL MACHINE For Itanium(KVM/IA64)
 P:	Anthony Xu
 M:	ant...@in...
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:19

From: Joerg Roedel <joe...@am...>

This patch adds syncing of the lapic.tpr field to the V_TPR field of the
VMCB.  With this change we can safely remove the CR8 read intercept.

Signed-off-by: Joerg Roedel <joe...@am...>
Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/x86/kvm/svm.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3379e13..f8ce36e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -486,8 +486,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 
 	control->intercept_cr_read =	INTERCEPT_CR0_MASK |
 					INTERCEPT_CR3_MASK |
-					INTERCEPT_CR4_MASK |
-					INTERCEPT_CR8_MASK;
+					INTERCEPT_CR4_MASK;
 
 	control->intercept_cr_write =	INTERCEPT_CR0_MASK |
 					INTERCEPT_CR3_MASK |
@@ -1621,6 +1620,19 @@ static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
 {
 }
 
+static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	u64 cr8;
+
+	if (!irqchip_in_kernel(vcpu->kvm))
+		return;
+
+	cr8 = kvm_get_cr8(vcpu);
+	svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
+	svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
+}
+
 static void svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1630,6 +1642,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
 	pre_svm_run(svm);
 
+	sync_lapic_to_cr8(vcpu);
+
 	save_host_msrs(vcpu);
 	fs_selector = read_fs();
 	gs_selector = read_gs();
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:18

From: Joerg Roedel <joe...@am...>

With the usage of the V_TPR field this comment is now obsolete.

Signed-off-by: Joerg Roedel <joe...@am...>
Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/x86/kvm/svm.c |    7 -------
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 61bb2cb..d643605 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -916,13 +916,6 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
 
 }
 
-/* FIXME:
-
-	svm(vcpu)->vmcb->control.int_ctl &= ~V_TPR_MASK;
-	svm(vcpu)->vmcb->control.int_ctl |= (sregs->cr8 & V_TPR_MASK);
-
-*/
-
 static int svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_debug_guest *dbg)
 {
 	return -EOPNOTSUPP;
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:18

From: Hollis Blanchard <ho...@us...>

Don't allow building as a module (asm-offsets dependencies).  Also,
automatically select KVM_BOOKE_HOST until we better separate the guest
and host layers.

Signed-off-by: Hollis Blanchard <ho...@us...>
Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/powerpc/kvm/Kconfig |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 989ee82..6f73edd 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -15,10 +15,12 @@ menuconfig VIRTUALIZATION
 if VIRTUALIZATION
 
 config KVM
-	tristate "Kernel-based Virtual Machine (KVM) support"
-	depends on EXPERIMENTAL
+	bool "Kernel-based Virtual Machine (KVM) support"
+	depends on 44x && EXPERIMENTAL
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
+	# We can only run on Book E hosts so far
+	select KVM_BOOKE_HOST
 	---help---
 	  Support hosting virtualized guest machines. You will also
 	  need to select one or more of the processor modules below.
@@ -26,13 +28,10 @@ config KVM
 	  This module provides access to the hardware capabilities through
 	  a character device node named /dev/kvm.
 
-	  To compile this as a module, choose M here: the module
-	  will be called kvm.
-
 	  If unsure, say N.
 
 config KVM_BOOKE_HOST
-	tristate "KVM host support for Book E PowerPC processors"
+	bool "KVM host support for Book E PowerPC processors"
 	depends on KVM && 44x
 	---help---
 	  Provides host support for KVM on Book E PowerPC processors. Currently
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:13

From: Joerg Roedel <joe...@am...>

This patch exports the kvm_lapic_set_tpr() function from the lapic code
to modules.  It is required in the kvm-amd module to optimize CR8
intercepts.

Signed-off-by: Joerg Roedel <joe...@am...>
Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/x86/kvm/lapic.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2ccf994..57ac4e4 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -822,6 +822,7 @@ void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8)
 	apic_set_tpr(apic, ((cr8 & 0x0f) << 4)
 		     | (apic_get_reg(apic, APIC_TASKPRI) & 4));
 }
+EXPORT_SYMBOL_GPL(kvm_lapic_set_tpr);
 
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu)
 {
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:13
|
From: Joerg Roedel <joe...@am...> This patch disables the intercept of CR8 writes if the TPR is not masking interrupts. This reduces the total number CR8 intercepts to below 1 percent of what we have without this patch using Windows 64 bit guests. Signed-off-by: Joerg Roedel <joe...@am...> Signed-off-by: Avi Kivity <av...@qu...> --- arch/x86/kvm/svm.c | 31 +++++++++++++++++++++++++++---- 1 files changed, 27 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index ee2ee83..61bb2cb 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1502,6 +1502,27 @@ static void svm_set_irq(struct kvm_vcpu *vcpu, int irq) svm_inject_irq(svm, irq); } +static void update_cr8_intercept(struct kvm_vcpu *vcpu) +{ + struct vcpu_svm *svm = to_svm(vcpu); + struct vmcb *vmcb = svm->vmcb; + int max_irr, tpr; + + if (!irqchip_in_kernel(vcpu->kvm) || vcpu->arch.apic->vapic_addr) + return; + + vmcb->control.intercept_cr_write &= ~INTERCEPT_CR8_MASK; + + max_irr = kvm_lapic_find_highest_irr(vcpu); + if (max_irr == -1) + return; + + tpr = kvm_lapic_get_cr8(vcpu) << 4; + + if (tpr >= (max_irr & 0xf0)) + vmcb->control.intercept_cr_write |= INTERCEPT_CR8_MASK; +} + static void svm_intr_assist(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ -1514,14 +1535,14 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu) SVM_EVTINJ_VEC_MASK; vmcb->control.exit_int_info = 0; svm_inject_irq(svm, intr_vector); - return; + goto out; } if (vmcb->control.int_ctl & V_IRQ_MASK) - return; + goto out; if (!kvm_cpu_has_interrupt(vcpu)) - return; + goto out; if (!(vmcb->save.rflags & X86_EFLAGS_IF) || (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) || @@ -1529,12 +1550,14 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu) /* unable to deliver irq, set pending irq */ vmcb->control.intercept |= (1ULL << INTERCEPT_VINTR); svm_inject_irq(svm, 0x0); - return; + goto out; } /* Okay, we can deliver the interrupt: grab it and update PIC state. */ intr_vector = kvm_cpu_get_interrupt(vcpu); svm_inject_irq(svm, intr_vector); kvm_timer_intr_post(vcpu, intr_vector); +out: + update_cr8_intercept(vcpu); } static void kvm_reput_irq(struct vcpu_svm *svm) -- 1.5.5 |
From: Avi K. <av...@qu...> - 2008-04-21 10:30:13
|
lmsw and smsw were implemented only with a register operand. Extend them to support a memory operand as well. Fixes Windows running some display compatibility test on AMD hosts. Signed-off-by: Avi Kivity <av...@qu...> --- arch/x86/kvm/x86_emulate.c | 29 +++++++++++++++++------------ 1 files changed, 17 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c index 8e1b32f..46ef78f 100644 --- a/arch/x86/kvm/x86_emulate.c +++ b/arch/x86/kvm/x86_emulate.c @@ -275,12 +275,15 @@ static u16 group_table[] = { SrcMem | ModRM, 0, SrcMem | ModRM | Stack, 0, [Group7*8] = 0, 0, ModRM | SrcMem, ModRM | SrcMem, - SrcNone | ModRM | DstMem, 0, SrcMem | ModRM, SrcMem | ModRM | ByteOp, + SrcNone | ModRM | DstMem | Mov, 0, + SrcMem16 | ModRM | Mov, SrcMem | ModRM | ByteOp, }; static u16 group2_table[] = { [Group7*8] = - SrcNone | ModRM, 0, 0, 0, SrcNone | ModRM | DstMem, 0, SrcMem | ModRM, 0, + SrcNone | ModRM, 0, 0, 0, + SrcNone | ModRM | DstMem | Mov, 0, + SrcMem16 | ModRM | Mov, 0, }; /* EFLAGS bit definitions. */ @@ -1722,6 +1725,8 @@ twobyte_insn: goto done; kvm_emulate_hypercall(ctxt->vcpu); + /* Disable writeback. */ + c->dst.type = OP_NONE; break; case 2: /* lgdt */ rc = read_descriptor(ctxt, ops, c->src.ptr, @@ -1729,6 +1734,8 @@ twobyte_insn: if (rc) goto done; realmode_lgdt(ctxt->vcpu, size, address); + /* Disable writeback. */ + c->dst.type = OP_NONE; break; case 3: /* lidt/vmmcall */ if (c->modrm_mod == 3 && c->modrm_rm == 1) { @@ -1744,27 +1751,25 @@ twobyte_insn: goto done; realmode_lidt(ctxt->vcpu, size, address); } + /* Disable writeback. */ + c->dst.type = OP_NONE; break; case 4: /* smsw */ - if (c->modrm_mod != 3) - goto cannot_emulate; - *(u16 *)&c->regs[c->modrm_rm] - = realmode_get_cr(ctxt->vcpu, 0); + c->dst.bytes = 2; + c->dst.val = realmode_get_cr(ctxt->vcpu, 0); break; case 6: /* lmsw */ - if (c->modrm_mod != 3) - goto cannot_emulate; - realmode_lmsw(ctxt->vcpu, (u16)c->modrm_val, - &ctxt->eflags); + realmode_lmsw(ctxt->vcpu, (u16)c->src.val, + &ctxt->eflags); break; case 7: /* invlpg*/ emulate_invlpg(ctxt->vcpu, memop); + /* Disable writeback. */ + c->dst.type = OP_NONE; break; default: goto cannot_emulate; } - /* Disable writeback. */ - c->dst.type = OP_NONE; break; case 0x06: emulate_clts(ctxt->vcpu); -- 1.5.5 |
From: Avi K. <av...@qu...> - 2008-04-21 10:30:13
|
We wish to export it to userspace, so move it into the kvm namespace. Signed-off-by: Avi Kivity <av...@qu...> --- arch/ia64/kvm/kvm-ia64.c | 26 +++++++++++++------------- arch/x86/kvm/i8254.c | 2 +- arch/x86/kvm/lapic.c | 16 ++++++++-------- arch/x86/kvm/x86.c | 18 +++++++++--------- include/asm-ia64/kvm_host.h | 8 ++++---- include/asm-x86/kvm_host.h | 10 +++++----- 6 files changed, 40 insertions(+), 40 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index ca1cfb1..f7589db 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -340,7 +340,7 @@ static int handle_ipi(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) regs->cr_iip = vcpu->kvm->arch.rdv_sal_data.boot_ip; regs->r1 = vcpu->kvm->arch.rdv_sal_data.boot_gp; - target_vcpu->arch.mp_state = VCPU_MP_STATE_RUNNABLE; + target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; if (waitqueue_active(&target_vcpu->wq)) wake_up_interruptible(&target_vcpu->wq); } else { @@ -386,7 +386,7 @@ static int handle_global_purge(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) for (i = 0; i < KVM_MAX_VCPUS; i++) { if (!kvm->vcpus[i] || kvm->vcpus[i]->arch.mp_state == - VCPU_MP_STATE_UNINITIALIZED || + KVM_MP_STATE_UNINITIALIZED || vcpu == kvm->vcpus[i]) continue; @@ -437,12 +437,12 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu) hrtimer_start(p_ht, kt, HRTIMER_MODE_ABS); if (irqchip_in_kernel(vcpu->kvm)) { - vcpu->arch.mp_state = VCPU_MP_STATE_HALTED; + vcpu->arch.mp_state = KVM_MP_STATE_HALTED; kvm_vcpu_block(vcpu); hrtimer_cancel(p_ht); vcpu->arch.ht_active = 0; - if (vcpu->arch.mp_state != VCPU_MP_STATE_RUNNABLE) + if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) return -EINTR; return 1; } else { @@ -668,7 +668,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) vcpu_load(vcpu); - if (unlikely(vcpu->arch.mp_state == VCPU_MP_STATE_UNINITIALIZED)) { + if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) { kvm_vcpu_block(vcpu); vcpu_put(vcpu); return -EAGAIN; @@ -1127,12 +1127,12 @@ static enum hrtimer_restart hlt_timer_fn(struct hrtimer *data) wait_queue_head_t *q; vcpu = container_of(data, struct kvm_vcpu, arch.hlt_timer); - if (vcpu->arch.mp_state != VCPU_MP_STATE_HALTED) + if (vcpu->arch.mp_state != KVM_MP_STATE_HALTED) goto out; q = &vcpu->wq; if (waitqueue_active(q)) { - vcpu->arch.mp_state = VCPU_MP_STATE_RUNNABLE; + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; wake_up_interruptible(q); } out: @@ -1159,7 +1159,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) return PTR_ERR(vmm_vcpu); if (vcpu->vcpu_id == 0) { - vcpu->arch.mp_state = VCPU_MP_STATE_RUNNABLE; + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; /*Set entry address for first run.*/ regs->cr_iip = PALE_RESET_ENTRY; @@ -1172,7 +1172,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) v->arch.last_itc = 0; } } else - vcpu->arch.mp_state = VCPU_MP_STATE_UNINITIALIZED; + vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED; r = -ENOMEM; vcpu->arch.apic = kzalloc(sizeof(struct kvm_lapic), GFP_KERNEL); @@ -1704,10 +1704,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 trig) if (!test_and_set_bit(vec, &vpd->irr[0])) { vcpu->arch.irq_new_pending = 1; - if (vcpu->arch.mp_state == VCPU_MP_STATE_RUNNABLE) + if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE) kvm_vcpu_kick(vcpu); - else if (vcpu->arch.mp_state == VCPU_MP_STATE_HALTED) { - vcpu->arch.mp_state = VCPU_MP_STATE_RUNNABLE; + else if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) { + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; if (waitqueue_active(&vcpu->wq)) 
wake_up_interruptible(&vcpu->wq); } @@ -1790,5 +1790,5 @@ gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn) int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) { - return vcpu->arch.mp_state == VCPU_MP_STATE_RUNNABLE; + return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE; } diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index abb4b16..2852dd1 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -201,7 +201,7 @@ int __pit_timer_fn(struct kvm_kpit_state *ps) atomic_inc(&pt->pending); smp_mb__after_atomic_inc(); if (vcpu0 && waitqueue_active(&vcpu0->wq)) { - vcpu0->arch.mp_state = VCPU_MP_STATE_RUNNABLE; + vcpu0->arch.mp_state = KVM_MP_STATE_RUNNABLE; wake_up_interruptible(&vcpu0->wq); } diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index debf582..2ccf994 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -338,10 +338,10 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, } else apic_clear_vector(vector, apic->regs + APIC_TMR); - if (vcpu->arch.mp_state == VCPU_MP_STATE_RUNNABLE) + if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE) kvm_vcpu_kick(vcpu); - else if (vcpu->arch.mp_state == VCPU_MP_STATE_HALTED) { - vcpu->arch.mp_state = VCPU_MP_STATE_RUNNABLE; + else if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) { + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; if (waitqueue_active(&vcpu->wq)) wake_up_interruptible(&vcpu->wq); } @@ -362,11 +362,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, case APIC_DM_INIT: if (level) { - if (vcpu->arch.mp_state == VCPU_MP_STATE_RUNNABLE) + if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE) printk(KERN_DEBUG "INIT on a runnable vcpu %d\n", vcpu->vcpu_id); - vcpu->arch.mp_state = VCPU_MP_STATE_INIT_RECEIVED; + vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; kvm_vcpu_kick(vcpu); } else { printk(KERN_DEBUG @@ -379,9 +379,9 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, case APIC_DM_STARTUP: printk(KERN_DEBUG "SIPI to vcpu %d vector 0x%02x\n", vcpu->vcpu_id, vector); - if (vcpu->arch.mp_state == VCPU_MP_STATE_INIT_RECEIVED) { + if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { vcpu->arch.sipi_vector = vector; - vcpu->arch.mp_state = VCPU_MP_STATE_SIPI_RECEIVED; + vcpu->arch.mp_state = KVM_MP_STATE_SIPI_RECEIVED; if (waitqueue_active(&vcpu->wq)) wake_up_interruptible(&vcpu->wq); } @@ -940,7 +940,7 @@ static int __apic_timer_fn(struct kvm_lapic *apic) atomic_inc(&apic->timer.pending); if (waitqueue_active(q)) { - apic->vcpu->arch.mp_state = VCPU_MP_STATE_RUNNABLE; + apic->vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; wake_up_interruptible(q); } if (apic_lvtt_period(apic)) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f070f0a..b364d19 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2433,11 +2433,11 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu) ++vcpu->stat.halt_exits; KVMTRACE_0D(HLT, vcpu, handler); if (irqchip_in_kernel(vcpu->kvm)) { - vcpu->arch.mp_state = VCPU_MP_STATE_HALTED; + vcpu->arch.mp_state = KVM_MP_STATE_HALTED; up_read(&vcpu->kvm->slots_lock); kvm_vcpu_block(vcpu); down_read(&vcpu->kvm->slots_lock); - if (vcpu->arch.mp_state != VCPU_MP_STATE_RUNNABLE) + if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) return -EINTR; return 1; } else { @@ -2726,14 +2726,14 @@ static int __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) { int r; - if (unlikely(vcpu->arch.mp_state == VCPU_MP_STATE_SIPI_RECEIVED)) { + if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED)) { pr_debug("vcpu %d received 
sipi with vector # %x\n", vcpu->vcpu_id, vcpu->arch.sipi_vector); kvm_lapic_reset(vcpu); r = kvm_x86_ops->vcpu_reset(vcpu); if (r) return r; - vcpu->arch.mp_state = VCPU_MP_STATE_RUNNABLE; + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; } down_read(&vcpu->kvm->slots_lock); @@ -2891,7 +2891,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) vcpu_load(vcpu); - if (unlikely(vcpu->arch.mp_state == VCPU_MP_STATE_UNINITIALIZED)) { + if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) { kvm_vcpu_block(vcpu); vcpu_put(vcpu); return -EAGAIN; @@ -3794,9 +3794,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) vcpu->arch.mmu.root_hpa = INVALID_PAGE; if (!irqchip_in_kernel(kvm) || vcpu->vcpu_id == 0) - vcpu->arch.mp_state = VCPU_MP_STATE_RUNNABLE; + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; else - vcpu->arch.mp_state = VCPU_MP_STATE_UNINITIALIZED; + vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED; page = alloc_page(GFP_KERNEL | __GFP_ZERO); if (!page) { @@ -3936,8 +3936,8 @@ int kvm_arch_set_memory_region(struct kvm *kvm, int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) { - return vcpu->arch.mp_state == VCPU_MP_STATE_RUNNABLE - || vcpu->arch.mp_state == VCPU_MP_STATE_SIPI_RECEIVED; + return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE + || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED; } static void vcpu_kick_intr(void *info) diff --git a/include/asm-ia64/kvm_host.h b/include/asm-ia64/kvm_host.h index d6d6e15..c082c20 100644 --- a/include/asm-ia64/kvm_host.h +++ b/include/asm-ia64/kvm_host.h @@ -318,10 +318,10 @@ struct kvm_vcpu_arch { int vmm_tr_slot; int vm_tr_slot; -#define VCPU_MP_STATE_RUNNABLE 0 -#define VCPU_MP_STATE_UNINITIALIZED 1 -#define VCPU_MP_STATE_INIT_RECEIVED 2 -#define VCPU_MP_STATE_HALTED 3 +#define KVM_MP_STATE_RUNNABLE 0 +#define KVM_MP_STATE_UNINITIALIZED 1 +#define KVM_MP_STATE_INIT_RECEIVED 2 +#define KVM_MP_STATE_HALTED 3 int mp_state; #define MAX_PTC_G_NUM 3 diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h index 15169cb..f35a6ad 100644 --- a/include/asm-x86/kvm_host.h +++ b/include/asm-x86/kvm_host.h @@ -227,11 +227,11 @@ struct kvm_vcpu_arch { u64 shadow_efer; u64 apic_base; struct kvm_lapic *apic; /* kernel irqchip context */ -#define VCPU_MP_STATE_RUNNABLE 0 -#define VCPU_MP_STATE_UNINITIALIZED 1 -#define VCPU_MP_STATE_INIT_RECEIVED 2 -#define VCPU_MP_STATE_SIPI_RECEIVED 3 -#define VCPU_MP_STATE_HALTED 4 +#define KVM_MP_STATE_RUNNABLE 0 +#define KVM_MP_STATE_UNINITIALIZED 1 +#define KVM_MP_STATE_INIT_RECEIVED 2 +#define KVM_MP_STATE_SIPI_RECEIVED 3 +#define KVM_MP_STATE_HALTED 4 int mp_state; int sipi_vector; u64 ia32_misc_enable_msr; -- 1.5.5 |
From: Avi K. <av...@qu...> - 2008-04-21 10:30:12

Shutdown interception clears the vmcb, leaving the asid at zero (which
is illegal), so force a new asid on vmcb initialization.

Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/x86/kvm/svm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8d04aed..3379e13 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -603,7 +603,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 		save->cr3 = 0;
 		save->cr4 = 0;
 	}
-
+	force_new_asid(&svm->vcpu);
 }
 
 static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:12

From: Joerg Roedel <joe...@am...>

There is no selective cr0 intercept bug.  The code in the comment sets
the CR0.PG bit.  But KVM always sets the CR0.PG bit for SVM to implement
the paged real mode, so the 'mov %eax,%cr0' instruction does not change
the CR0.PG bit.  Selective CR0 intercepts only occur when a bit is
actually changed, so it's the right behavior that there is no intercept
on this instruction.

Signed-off-by: Joerg Roedel <joe...@am...>
Signed-off-by: Avi Kivity <av...@qu...>
---
 arch/x86/kvm/svm.c |   11 -----------
 1 files changed, 0 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d643605..89e0be2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -513,17 +513,6 @@ static void init_vmcb(struct vcpu_svm *svm)
 	control->intercept =	(1ULL << INTERCEPT_INTR) |
 				(1ULL << INTERCEPT_NMI) |
 				(1ULL << INTERCEPT_SMI) |
-		/*
-		 * selective cr0 intercept bug?
-		 *	0:   0f 22 d8	mov %eax,%cr3
-		 *	3:   0f 20 c0	mov %cr0,%eax
-		 *	6:   0d 00 00 00 80	or $0x80000000,%eax
-		 *	b:   0f 22 c0	mov %eax,%cr0
-		 * set cr3 ->interception
-		 * get cr0 ->interception
-		 * set cr0 -> no interception
-		 */
-		/*		(1ULL << INTERCEPT_SELECTIVE_CR0) |	*/
 				(1ULL << INTERCEPT_CPUID) |
 				(1ULL << INTERCEPT_INVD) |
 				(1ULL << INTERCEPT_HLT) |
-- 
1.5.5
From: Avi K. <av...@qu...> - 2008-04-21 10:30:12
|
From: Hollis Blanchard <ho...@us...> This functionality is definitely experimental, but is capable of running unmodified PowerPC 440 Linux kernels as guests on a PowerPC 440 host. (Only tested with 440EP "Bamboo" guests so far, but with appropriate userspace support other SoC/board combinations should work.) See Documentation/powerpc/kvm_440.txt for technical details. Signed-off-by: Hollis Blanchard <ho...@us...> Acked-by: Paul Mackerras <pa...@sa...> Signed-off-by: Avi Kivity <av...@qu...> --- Documentation/powerpc/kvm_440.txt | 41 ++ arch/powerpc/Kconfig | 1 + arch/powerpc/Kconfig.debug | 3 + arch/powerpc/Makefile | 1 + arch/powerpc/kernel/asm-offsets.c | 26 ++ arch/powerpc/kvm/44x_tlb.c | 224 ++++++++++ arch/powerpc/kvm/44x_tlb.h | 91 +++++ arch/powerpc/kvm/Kconfig | 44 ++ arch/powerpc/kvm/Makefile | 15 + arch/powerpc/kvm/booke_guest.c | 615 ++++++++++++++++++++++++++++ arch/powerpc/kvm/booke_host.c | 83 ++++ arch/powerpc/kvm/booke_interrupts.S | 436 ++++++++++++++++++++ arch/powerpc/kvm/emulate.c | 760 +++++++++++++++++++++++++++++++++++ arch/powerpc/kvm/powerpc.c | 436 ++++++++++++++++++++ include/asm-powerpc/kvm.h | 53 +++- include/asm-powerpc/kvm_asm.h | 55 +++ include/asm-powerpc/kvm_host.h | 152 +++++++ include/asm-powerpc/kvm_para.h | 38 ++ include/asm-powerpc/kvm_ppc.h | 88 ++++ 19 files changed, 3160 insertions(+), 2 deletions(-) create mode 100644 Documentation/powerpc/kvm_440.txt create mode 100644 arch/powerpc/kvm/44x_tlb.c create mode 100644 arch/powerpc/kvm/44x_tlb.h create mode 100644 arch/powerpc/kvm/Kconfig create mode 100644 arch/powerpc/kvm/Makefile create mode 100644 arch/powerpc/kvm/booke_guest.c create mode 100644 arch/powerpc/kvm/booke_host.c create mode 100644 arch/powerpc/kvm/booke_interrupts.S create mode 100644 arch/powerpc/kvm/emulate.c create mode 100644 arch/powerpc/kvm/powerpc.c create mode 100644 include/asm-powerpc/kvm_asm.h create mode 100644 include/asm-powerpc/kvm_host.h create mode 100644 include/asm-powerpc/kvm_para.h create mode 100644 include/asm-powerpc/kvm_ppc.h diff --git a/Documentation/powerpc/kvm_440.txt b/Documentation/powerpc/kvm_440.txt new file mode 100644 index 0000000..c02a003 --- /dev/null +++ b/Documentation/powerpc/kvm_440.txt @@ -0,0 +1,41 @@ +Hollis Blanchard <ho...@us...> +15 Apr 2008 + +Various notes on the implementation of KVM for PowerPC 440: + +To enforce isolation, host userspace, guest kernel, and guest userspace all +run at user privilege level. Only the host kernel runs in supervisor mode. +Executing privileged instructions in the guest traps into KVM (in the host +kernel), where we decode and emulate them. Through this technique, unmodified +440 Linux kernels can be run (slowly) as guests. Future performance work will +focus on reducing the overhead and frequency of these traps. + +The usual code flow is started from userspace invoking an "run" ioctl, which +causes KVM to switch into guest context. We use IVPR to hijack the host +interrupt vectors while running the guest, which allows us to direct all +interrupts to kvmppc_handle_interrupt(). At this point, we could either +- handle the interrupt completely (e.g. emulate "mtspr SPRG0"), or +- let the host interrupt handler run (e.g. when the decrementer fires), or +- return to host userspace (e.g. when the guest performs device MMIO) + +Address spaces: We take advantage of the fact that Linux doesn't use the AS=1 +address space (in host or guest), which gives us virtual address space to use +for guest mappings. 
While the guest is running, the host kernel remains mapped +in AS=0, but the guest can only use AS=1 mappings. + +TLB entries: The TLB entries covering the host linear mapping remain +present while running the guest. This reduces the overhead of lightweight +exits, which are handled by KVM running in the host kernel. We keep three +copies of the TLB: + - guest TLB: contents of the TLB as the guest sees it + - shadow TLB: the TLB that is actually in hardware while guest is running + - host TLB: to restore TLB state when context switching guest -> host +When a TLB miss occurs because a mapping was not present in the shadow TLB, +but was present in the guest TLB, KVM handles the fault without invoking the +guest. Large guest pages are backed by multiple 4KB shadow pages through this +mechanism. + +IO: MMIO and DCR accesses are emulated by userspace. We use virtio for network +and block IO, so those drivers must be enabled in the guest. It's possible +that some qemu device emulation (e.g. e1000 or rtl8139) may also work with +little effort. diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 1189d8d..c27ad5e 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -722,3 +722,4 @@ config PPC_CLOCK config PPC_LIB_RHEAP bool +source "arch/powerpc/kvm/Kconfig" diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug index db7cc34..9aec870 100644 --- a/arch/powerpc/Kconfig.debug +++ b/arch/powerpc/Kconfig.debug @@ -151,6 +151,9 @@ config BOOTX_TEXT config PPC_EARLY_DEBUG bool "Early debugging (dangerous)" + # PPC_EARLY_DEBUG on 440 leaves AS=1 mappings above the TLB high water + # mark, which doesn't work with current 440 KVM. + depends on !KVM help Say Y to enable some early debugging facilities that may be available for your processor/board combination. 
Those facilities are hacks diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index ab5cfe8..6df2d6c 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -147,6 +147,7 @@ core-y += arch/powerpc/kernel/ \ arch/powerpc/platforms/ core-$(CONFIG_MATH_EMULATION) += arch/powerpc/math-emu/ core-$(CONFIG_XMON) += arch/powerpc/xmon/ +core-$(CONFIG_KVM) += arch/powerpc/kvm/ drivers-$(CONFIG_OPROFILE) += arch/powerpc/oprofile/ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 4b749c4..f8088d2 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -23,6 +23,7 @@ #include <linux/mm.h> #include <linux/suspend.h> #include <linux/hrtimer.h> +#include <linux/kvm_host.h> #ifdef CONFIG_PPC64 #include <linux/time.h> #include <linux/hardirq.h> @@ -329,5 +330,30 @@ int main(void) DEFINE(PGD_TABLE_SIZE, PGD_TABLE_SIZE); +#ifdef CONFIG_KVM + DEFINE(TLBE_BYTES, sizeof(struct tlbe)); + + DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, arch.host_stack)); + DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid)); + DEFINE(VCPU_HOST_TLB, offsetof(struct kvm_vcpu, arch.host_tlb)); + DEFINE(VCPU_SHADOW_TLB, offsetof(struct kvm_vcpu, arch.shadow_tlb)); + DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr)); + DEFINE(VCPU_LR, offsetof(struct kvm_vcpu, arch.lr)); + DEFINE(VCPU_CR, offsetof(struct kvm_vcpu, arch.cr)); + DEFINE(VCPU_XER, offsetof(struct kvm_vcpu, arch.xer)); + DEFINE(VCPU_CTR, offsetof(struct kvm_vcpu, arch.ctr)); + DEFINE(VCPU_PC, offsetof(struct kvm_vcpu, arch.pc)); + DEFINE(VCPU_MSR, offsetof(struct kvm_vcpu, arch.msr)); + DEFINE(VCPU_SPRG4, offsetof(struct kvm_vcpu, arch.sprg4)); + DEFINE(VCPU_SPRG5, offsetof(struct kvm_vcpu, arch.sprg5)); + DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6)); + DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7)); + DEFINE(VCPU_PID, offsetof(struct kvm_vcpu, arch.pid)); + + DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst)); + DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear)); + DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr)); +#endif + return 0; } diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c new file mode 100644 index 0000000..f5d7a5e --- /dev/null +++ b/arch/powerpc/kvm/44x_tlb.c @@ -0,0 +1,224 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 
2007 + * + * Authors: Hollis Blanchard <ho...@us...> + */ + +#include <linux/types.h> +#include <linux/string.h> +#include <linux/kvm_host.h> +#include <linux/highmem.h> +#include <asm/mmu-44x.h> +#include <asm/kvm_ppc.h> + +#include "44x_tlb.h" + +#define PPC44x_TLB_USER_PERM_MASK (PPC44x_TLB_UX|PPC44x_TLB_UR|PPC44x_TLB_UW) +#define PPC44x_TLB_SUPER_PERM_MASK (PPC44x_TLB_SX|PPC44x_TLB_SR|PPC44x_TLB_SW) + +static unsigned int kvmppc_tlb_44x_pos; + +static u32 kvmppc_44x_tlb_shadow_attrib(u32 attrib, int usermode) +{ + /* Mask off reserved bits. */ + attrib &= PPC44x_TLB_PERM_MASK|PPC44x_TLB_ATTR_MASK; + + if (!usermode) { + /* Guest is in supervisor mode, so we need to translate guest + * supervisor permissions into user permissions. */ + attrib &= ~PPC44x_TLB_USER_PERM_MASK; + attrib |= (attrib & PPC44x_TLB_SUPER_PERM_MASK) << 3; + } + + /* Make sure host can always access this memory. */ + attrib |= PPC44x_TLB_SX|PPC44x_TLB_SR|PPC44x_TLB_SW; + + return attrib; +} + +/* Search the guest TLB for a matching entry. */ +int kvmppc_44x_tlb_index(struct kvm_vcpu *vcpu, gva_t eaddr, unsigned int pid, + unsigned int as) +{ + int i; + + /* XXX Replace loop with fancy data structures. */ + for (i = 0; i < PPC44x_TLB_SIZE; i++) { + struct tlbe *tlbe = &vcpu->arch.guest_tlb[i]; + unsigned int tid; + + if (eaddr < get_tlb_eaddr(tlbe)) + continue; + + if (eaddr > get_tlb_end(tlbe)) + continue; + + tid = get_tlb_tid(tlbe); + if (tid && (tid != pid)) + continue; + + if (!get_tlb_v(tlbe)) + continue; + + if (get_tlb_ts(tlbe) != as) + continue; + + return i; + } + + return -1; +} + +struct tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu, gva_t eaddr) +{ + unsigned int as = !!(vcpu->arch.msr & MSR_IS); + unsigned int index; + + index = kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as); + if (index == -1) + return NULL; + return &vcpu->arch.guest_tlb[index]; +} + +struct tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu, gva_t eaddr) +{ + unsigned int as = !!(vcpu->arch.msr & MSR_DS); + unsigned int index; + + index = kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as); + if (index == -1) + return NULL; + return &vcpu->arch.guest_tlb[index]; +} + +static int kvmppc_44x_tlbe_is_writable(struct tlbe *tlbe) +{ + return tlbe->word2 & (PPC44x_TLB_SW|PPC44x_TLB_UW); +} + +/* Must be called with mmap_sem locked for writing. */ +static void kvmppc_44x_shadow_release(struct kvm_vcpu *vcpu, + unsigned int index) +{ + struct tlbe *stlbe = &vcpu->arch.shadow_tlb[index]; + struct page *page = vcpu->arch.shadow_pages[index]; + + kunmap(vcpu->arch.shadow_pages[index]); + + if (get_tlb_v(stlbe)) { + if (kvmppc_44x_tlbe_is_writable(stlbe)) + kvm_release_page_dirty(page); + else + kvm_release_page_clean(page); + } +} + +/* Caller must ensure that the specified guest TLB entry is safe to insert into + * the shadow TLB. */ +void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn, u64 asid, + u32 flags) +{ + struct page *new_page; + struct tlbe *stlbe; + hpa_t hpaddr; + unsigned int victim; + + /* Future optimization: don't overwrite the TLB entry containing the + * current PC (or stack?). */ + victim = kvmppc_tlb_44x_pos++; + if (kvmppc_tlb_44x_pos > tlb_44x_hwater) + kvmppc_tlb_44x_pos = 0; + stlbe = &vcpu->arch.shadow_tlb[victim]; + + /* Get reference to new page. 
*/ + down_write(¤t->mm->mmap_sem); + new_page = gfn_to_page(vcpu->kvm, gfn); + if (is_error_page(new_page)) { + printk(KERN_ERR "Couldn't get guest page!\n"); + kvm_release_page_clean(new_page); + return; + } + hpaddr = page_to_phys(new_page); + + /* Drop reference to old page. */ + kvmppc_44x_shadow_release(vcpu, victim); + up_write(¤t->mm->mmap_sem); + + vcpu->arch.shadow_pages[victim] = new_page; + + /* XXX Make sure (va, size) doesn't overlap any other + * entries. 440x6 user manual says the result would be + * "undefined." */ + + /* XXX what about AS? */ + + stlbe->tid = asid & 0xff; + + /* Force TS=1 for all guest mappings. */ + /* For now we hardcode 4KB mappings, but it will be important to + * use host large pages in the future. */ + stlbe->word0 = (gvaddr & PAGE_MASK) | PPC44x_TLB_VALID | PPC44x_TLB_TS + | PPC44x_TLB_4K; + + stlbe->word1 = (hpaddr & 0xfffffc00) | ((hpaddr >> 32) & 0xf); + stlbe->word2 = kvmppc_44x_tlb_shadow_attrib(flags, + vcpu->arch.msr & MSR_PR); +} + +void kvmppc_mmu_invalidate(struct kvm_vcpu *vcpu, u64 eaddr, u64 asid) +{ + unsigned int pid = asid & 0xff; + int i; + + /* XXX Replace loop with fancy data structures. */ + down_write(¤t->mm->mmap_sem); + for (i = 0; i <= tlb_44x_hwater; i++) { + struct tlbe *stlbe = &vcpu->arch.shadow_tlb[i]; + unsigned int tid; + + if (!get_tlb_v(stlbe)) + continue; + + if (eaddr < get_tlb_eaddr(stlbe)) + continue; + + if (eaddr > get_tlb_end(stlbe)) + continue; + + tid = get_tlb_tid(stlbe); + if (tid && (tid != pid)) + continue; + + kvmppc_44x_shadow_release(vcpu, i); + stlbe->word0 = 0; + } + up_write(¤t->mm->mmap_sem); +} + +/* Invalidate all mappings, so that when they fault back in they will get the + * proper permission bits. */ +void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode) +{ + int i; + + /* XXX Replace loop with fancy data structures. */ + down_write(¤t->mm->mmap_sem); + for (i = 0; i <= tlb_44x_hwater; i++) { + kvmppc_44x_shadow_release(vcpu, i); + vcpu->arch.shadow_tlb[i].word0 = 0; + } + up_write(¤t->mm->mmap_sem); +} diff --git a/arch/powerpc/kvm/44x_tlb.h b/arch/powerpc/kvm/44x_tlb.h new file mode 100644 index 0000000..2ccd46b --- /dev/null +++ b/arch/powerpc/kvm/44x_tlb.h @@ -0,0 +1,91 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 
2007 + * + * Authors: Hollis Blanchard <ho...@us...> + */ + +#ifndef __KVM_POWERPC_TLB_H__ +#define __KVM_POWERPC_TLB_H__ + +#include <linux/kvm_host.h> +#include <asm/mmu-44x.h> + +extern int kvmppc_44x_tlb_index(struct kvm_vcpu *vcpu, gva_t eaddr, + unsigned int pid, unsigned int as); +extern struct tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu, gva_t eaddr); +extern struct tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu, gva_t eaddr); + +/* TLB helper functions */ +static inline unsigned int get_tlb_size(const struct tlbe *tlbe) +{ + return (tlbe->word0 >> 4) & 0xf; +} + +static inline gva_t get_tlb_eaddr(const struct tlbe *tlbe) +{ + return tlbe->word0 & 0xfffffc00; +} + +static inline gva_t get_tlb_bytes(const struct tlbe *tlbe) +{ + unsigned int pgsize = get_tlb_size(tlbe); + return 1 << 10 << (pgsize << 1); +} + +static inline gva_t get_tlb_end(const struct tlbe *tlbe) +{ + return get_tlb_eaddr(tlbe) + get_tlb_bytes(tlbe) - 1; +} + +static inline u64 get_tlb_raddr(const struct tlbe *tlbe) +{ + u64 word1 = tlbe->word1; + return ((word1 & 0xf) << 32) | (word1 & 0xfffffc00); +} + +static inline unsigned int get_tlb_tid(const struct tlbe *tlbe) +{ + return tlbe->tid & 0xff; +} + +static inline unsigned int get_tlb_ts(const struct tlbe *tlbe) +{ + return (tlbe->word0 >> 8) & 0x1; +} + +static inline unsigned int get_tlb_v(const struct tlbe *tlbe) +{ + return (tlbe->word0 >> 9) & 0x1; +} + +static inline unsigned int get_mmucr_stid(const struct kvm_vcpu *vcpu) +{ + return vcpu->arch.mmucr & 0xff; +} + +static inline unsigned int get_mmucr_sts(const struct kvm_vcpu *vcpu) +{ + return (vcpu->arch.mmucr >> 16) & 0x1; +} + +static inline gpa_t tlb_xlate(struct tlbe *tlbe, gva_t eaddr) +{ + unsigned int pgmask = get_tlb_bytes(tlbe) - 1; + + return get_tlb_raddr(tlbe) | (eaddr & pgmask); +} + +#endif /* __KVM_POWERPC_TLB_H__ */ diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig new file mode 100644 index 0000000..989ee82 --- /dev/null +++ b/arch/powerpc/kvm/Kconfig @@ -0,0 +1,44 @@ +# +# KVM configuration +# + +menuconfig VIRTUALIZATION + bool "Virtualization" + ---help--- + Say Y here to get to see options for using your Linux host to run + other operating systems inside virtual machines (guests). + This option alone does not add any kernel code. + + If you say N, all options in this submenu will be skipped and + disabled. + +if VIRTUALIZATION + +config KVM + tristate "Kernel-based Virtual Machine (KVM) support" + depends on EXPERIMENTAL + select PREEMPT_NOTIFIERS + select ANON_INODES + ---help--- + Support hosting virtualized guest machines. You will also + need to select one or more of the processor modules below. + + This module provides access to the hardware capabilities through + a character device node named /dev/kvm. + + To compile this as a module, choose M here: the module + will be called kvm. + + If unsure, say N. + +config KVM_BOOKE_HOST + tristate "KVM host support for Book E PowerPC processors" + depends on KVM && 44x + ---help--- + Provides host support for KVM on Book E PowerPC processors. Currently + this works on 440 processors only. 
+ +source drivers/virtio/Kconfig + +endif # VIRTUALIZATION + diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile new file mode 100644 index 0000000..d0d358d --- /dev/null +++ b/arch/powerpc/kvm/Makefile @@ -0,0 +1,15 @@ +# +# Makefile for Kernel-based Virtual Machine module +# + +EXTRA_CFLAGS += -Ivirt/kvm -Iarch/powerpc/kvm + +common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o) + +kvm-objs := $(common-objs) powerpc.o emulate.o booke_guest.o +obj-$(CONFIG_KVM) += kvm.o + +AFLAGS_booke_interrupts.o := -I$(obj) + +kvm-booke-host-objs := booke_host.o booke_interrupts.o 44x_tlb.o +obj-$(CONFIG_KVM_BOOKE_HOST) += kvm-booke-host.o diff --git a/arch/powerpc/kvm/booke_guest.c b/arch/powerpc/kvm/booke_guest.c new file mode 100644 index 0000000..6d9884a --- /dev/null +++ b/arch/powerpc/kvm/booke_guest.c @@ -0,0 +1,615 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 2007 + * + * Authors: Hollis Blanchard <ho...@us...> + * Christian Ehrhardt <ehr...@li...> + */ + +#include <linux/errno.h> +#include <linux/err.h> +#include <linux/kvm_host.h> +#include <linux/module.h> +#include <linux/vmalloc.h> +#include <linux/fs.h> +#include <asm/cputable.h> +#include <asm/uaccess.h> +#include <asm/kvm_ppc.h> + +#include "44x_tlb.h" + +#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM +#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU + +struct kvm_stats_debugfs_item debugfs_entries[] = { + { "exits", VCPU_STAT(sum_exits) }, + { "mmio", VCPU_STAT(mmio_exits) }, + { "dcr", VCPU_STAT(dcr_exits) }, + { "sig", VCPU_STAT(signal_exits) }, + { "light", VCPU_STAT(light_exits) }, + { "itlb_r", VCPU_STAT(itlb_real_miss_exits) }, + { "itlb_v", VCPU_STAT(itlb_virt_miss_exits) }, + { "dtlb_r", VCPU_STAT(dtlb_real_miss_exits) }, + { "dtlb_v", VCPU_STAT(dtlb_virt_miss_exits) }, + { "sysc", VCPU_STAT(syscall_exits) }, + { "isi", VCPU_STAT(isi_exits) }, + { "dsi", VCPU_STAT(dsi_exits) }, + { "inst_emu", VCPU_STAT(emulated_inst_exits) }, + { "dec", VCPU_STAT(dec_exits) }, + { "ext_intr", VCPU_STAT(ext_intr_exits) }, + { NULL } +}; + +static const u32 interrupt_msr_mask[16] = { + [BOOKE_INTERRUPT_CRITICAL] = MSR_ME, + [BOOKE_INTERRUPT_MACHINE_CHECK] = 0, + [BOOKE_INTERRUPT_DATA_STORAGE] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_INST_STORAGE] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_EXTERNAL] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_ALIGNMENT] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_PROGRAM] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_FP_UNAVAIL] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_SYSCALL] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_AP_UNAVAIL] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_DECREMENTER] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_FIT] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_WATCHDOG] = MSR_ME, + [BOOKE_INTERRUPT_DTLB_MISS] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_ITLB_MISS] = MSR_CE|MSR_ME|MSR_DE, + 
[BOOKE_INTERRUPT_DEBUG] = MSR_ME, +}; + +const unsigned char exception_priority[] = { + [BOOKE_INTERRUPT_DATA_STORAGE] = 0, + [BOOKE_INTERRUPT_INST_STORAGE] = 1, + [BOOKE_INTERRUPT_ALIGNMENT] = 2, + [BOOKE_INTERRUPT_PROGRAM] = 3, + [BOOKE_INTERRUPT_FP_UNAVAIL] = 4, + [BOOKE_INTERRUPT_SYSCALL] = 5, + [BOOKE_INTERRUPT_AP_UNAVAIL] = 6, + [BOOKE_INTERRUPT_DTLB_MISS] = 7, + [BOOKE_INTERRUPT_ITLB_MISS] = 8, + [BOOKE_INTERRUPT_MACHINE_CHECK] = 9, + [BOOKE_INTERRUPT_DEBUG] = 10, + [BOOKE_INTERRUPT_CRITICAL] = 11, + [BOOKE_INTERRUPT_WATCHDOG] = 12, + [BOOKE_INTERRUPT_EXTERNAL] = 13, + [BOOKE_INTERRUPT_FIT] = 14, + [BOOKE_INTERRUPT_DECREMENTER] = 15, +}; + +const unsigned char priority_exception[] = { + BOOKE_INTERRUPT_DATA_STORAGE, + BOOKE_INTERRUPT_INST_STORAGE, + BOOKE_INTERRUPT_ALIGNMENT, + BOOKE_INTERRUPT_PROGRAM, + BOOKE_INTERRUPT_FP_UNAVAIL, + BOOKE_INTERRUPT_SYSCALL, + BOOKE_INTERRUPT_AP_UNAVAIL, + BOOKE_INTERRUPT_DTLB_MISS, + BOOKE_INTERRUPT_ITLB_MISS, + BOOKE_INTERRUPT_MACHINE_CHECK, + BOOKE_INTERRUPT_DEBUG, + BOOKE_INTERRUPT_CRITICAL, + BOOKE_INTERRUPT_WATCHDOG, + BOOKE_INTERRUPT_EXTERNAL, + BOOKE_INTERRUPT_FIT, + BOOKE_INTERRUPT_DECREMENTER, +}; + + +void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu) +{ + struct tlbe *tlbe; + int i; + + printk("vcpu %d TLB dump:\n", vcpu->vcpu_id); + printk("| %2s | %3s | %8s | %8s | %8s |\n", + "nr", "tid", "word0", "word1", "word2"); + + for (i = 0; i < PPC44x_TLB_SIZE; i++) { + tlbe = &vcpu->arch.guest_tlb[i]; + if (tlbe->word0 & PPC44x_TLB_VALID) + printk(" G%2d | %02X | %08X | %08X | %08X |\n", + i, tlbe->tid, tlbe->word0, tlbe->word1, + tlbe->word2); + } + + for (i = 0; i < PPC44x_TLB_SIZE; i++) { + tlbe = &vcpu->arch.shadow_tlb[i]; + if (tlbe->word0 & PPC44x_TLB_VALID) + printk(" S%2d | %02X | %08X | %08X | %08X |\n", + i, tlbe->tid, tlbe->word0, tlbe->word1, + tlbe->word2); + } +} + +/* TODO: use vcpu_printf() */ +void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu) +{ + int i; + + printk("pc: %08x msr: %08x\n", vcpu->arch.pc, vcpu->arch.msr); + printk("lr: %08x ctr: %08x\n", vcpu->arch.lr, vcpu->arch.ctr); + printk("srr0: %08x srr1: %08x\n", vcpu->arch.srr0, vcpu->arch.srr1); + + printk("exceptions: %08lx\n", vcpu->arch.pending_exceptions); + + for (i = 0; i < 32; i += 4) { + printk("gpr%02d: %08x %08x %08x %08x\n", i, + vcpu->arch.gpr[i], + vcpu->arch.gpr[i+1], + vcpu->arch.gpr[i+2], + vcpu->arch.gpr[i+3]); + } +} + +/* Check if we are ready to deliver the interrupt */ +static int kvmppc_can_deliver_interrupt(struct kvm_vcpu *vcpu, int interrupt) +{ + int r; + + switch (interrupt) { + case BOOKE_INTERRUPT_CRITICAL: + r = vcpu->arch.msr & MSR_CE; + break; + case BOOKE_INTERRUPT_MACHINE_CHECK: + r = vcpu->arch.msr & MSR_ME; + break; + case BOOKE_INTERRUPT_EXTERNAL: + r = vcpu->arch.msr & MSR_EE; + break; + case BOOKE_INTERRUPT_DECREMENTER: + r = vcpu->arch.msr & MSR_EE; + break; + case BOOKE_INTERRUPT_FIT: + r = vcpu->arch.msr & MSR_EE; + break; + case BOOKE_INTERRUPT_WATCHDOG: + r = vcpu->arch.msr & MSR_CE; + break; + case BOOKE_INTERRUPT_DEBUG: + r = vcpu->arch.msr & MSR_DE; + break; + default: + r = 1; + } + + return r; +} + +static void kvmppc_deliver_interrupt(struct kvm_vcpu *vcpu, int interrupt) +{ + switch (interrupt) { + case BOOKE_INTERRUPT_DECREMENTER: + vcpu->arch.tsr |= TSR_DIS; + break; + } + + vcpu->arch.srr0 = vcpu->arch.pc; + vcpu->arch.srr1 = vcpu->arch.msr; + vcpu->arch.pc = vcpu->arch.ivpr | vcpu->arch.ivor[interrupt]; + kvmppc_set_msr(vcpu, vcpu->arch.msr & interrupt_msr_mask[interrupt]); +} + +/* Check pending exceptions and deliver one, 
if possible. */ +void kvmppc_check_and_deliver_interrupts(struct kvm_vcpu *vcpu) +{ + unsigned long *pending = &vcpu->arch.pending_exceptions; + unsigned int exception; + unsigned int priority; + + priority = find_first_bit(pending, BITS_PER_BYTE * sizeof(*pending)); + while (priority <= BOOKE_MAX_INTERRUPT) { + exception = priority_exception[priority]; + if (kvmppc_can_deliver_interrupt(vcpu, exception)) { + kvmppc_clear_exception(vcpu, exception); + kvmppc_deliver_interrupt(vcpu, exception); + break; + } + + priority = find_next_bit(pending, + BITS_PER_BYTE * sizeof(*pending), + priority + 1); + } +} + +static int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu) +{ + enum emulation_result er; + int r; + + er = kvmppc_emulate_instruction(run, vcpu); + switch (er) { + case EMULATE_DONE: + /* Future optimization: only reload non-volatiles if they were + * actually modified. */ + r = RESUME_GUEST_NV; + break; + case EMULATE_DO_MMIO: + run->exit_reason = KVM_EXIT_MMIO; + /* We must reload nonvolatiles because "update" load/store + * instructions modify register state. */ + /* Future optimization: only reload non-volatiles if they were + * actually modified. */ + r = RESUME_HOST_NV; + break; + case EMULATE_FAIL: + /* XXX Deliver Program interrupt to guest. */ + printk(KERN_EMERG "%s: emulation failed (%08x)\n", __func__, + vcpu->arch.last_inst); + r = RESUME_HOST; + break; + default: + BUG(); + } + + return r; +} + +/** + * kvmppc_handle_exit + * + * Return value is in the form (errcode<<2 | RESUME_FLAG_HOST | RESUME_FLAG_NV) + */ +int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, + unsigned int exit_nr) +{ + enum emulation_result er; + int r = RESUME_HOST; + + local_irq_enable(); + + run->exit_reason = KVM_EXIT_UNKNOWN; + run->ready_for_interrupt_injection = 1; + + switch (exit_nr) { + case BOOKE_INTERRUPT_MACHINE_CHECK: + printk("MACHINE CHECK: %lx\n", mfspr(SPRN_MCSR)); + kvmppc_dump_vcpu(vcpu); + r = RESUME_HOST; + break; + + case BOOKE_INTERRUPT_EXTERNAL: + case BOOKE_INTERRUPT_DECREMENTER: + /* Since we switched IVPR back to the host's value, the host + * handled this interrupt the moment we enabled interrupts. + * Now we just offer it a chance to reschedule the guest. */ + + /* XXX At this point the TLB still holds our shadow TLB, so if + * we do reschedule the host will fault over it. Perhaps we + * should politely restore the host's entries to minimize + * misses before ceding control. */ + if (need_resched()) + cond_resched(); + if (exit_nr == BOOKE_INTERRUPT_DECREMENTER) + vcpu->stat.dec_exits++; + else + vcpu->stat.ext_intr_exits++; + r = RESUME_GUEST; + break; + + case BOOKE_INTERRUPT_PROGRAM: + if (vcpu->arch.msr & MSR_PR) { + /* Program traps generated by user-level software must be handled + * by the guest kernel. */ + vcpu->arch.esr = vcpu->arch.fault_esr; + kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_PROGRAM); + r = RESUME_GUEST; + break; + } + + er = kvmppc_emulate_instruction(run, vcpu); + switch (er) { + case EMULATE_DONE: + /* Future optimization: only reload non-volatiles if + * they were actually modified by emulation. */ + vcpu->stat.emulated_inst_exits++; + r = RESUME_GUEST_NV; + break; + case EMULATE_DO_DCR: + run->exit_reason = KVM_EXIT_DCR; + r = RESUME_HOST; + break; + case EMULATE_FAIL: + /* XXX Deliver Program interrupt to guest. */ + printk(KERN_CRIT "%s: emulation at %x failed (%08x)\n", + __func__, vcpu->arch.pc, vcpu->arch.last_inst); + /* For debugging, encode the failing instruction and + * report it to userspace. 
*/ + run->hw.hardware_exit_reason = ~0ULL << 32; + run->hw.hardware_exit_reason |= vcpu->arch.last_inst; + r = RESUME_HOST; + break; + default: + BUG(); + } + break; + + case BOOKE_INTERRUPT_DATA_STORAGE: + vcpu->arch.dear = vcpu->arch.fault_dear; + vcpu->arch.esr = vcpu->arch.fault_esr; + kvmppc_queue_exception(vcpu, exit_nr); + vcpu->stat.dsi_exits++; + r = RESUME_GUEST; + break; + + case BOOKE_INTERRUPT_INST_STORAGE: + vcpu->arch.esr = vcpu->arch.fault_esr; + kvmppc_queue_exception(vcpu, exit_nr); + vcpu->stat.isi_exits++; + r = RESUME_GUEST; + break; + + case BOOKE_INTERRUPT_SYSCALL: + kvmppc_queue_exception(vcpu, exit_nr); + vcpu->stat.syscall_exits++; + r = RESUME_GUEST; + break; + + case BOOKE_INTERRUPT_DTLB_MISS: { + struct tlbe *gtlbe; + unsigned long eaddr = vcpu->arch.fault_dear; + gfn_t gfn; + + /* Check the guest TLB. */ + gtlbe = kvmppc_44x_dtlb_search(vcpu, eaddr); + if (!gtlbe) { + /* The guest didn't have a mapping for it. */ + kvmppc_queue_exception(vcpu, exit_nr); + vcpu->arch.dear = vcpu->arch.fault_dear; + vcpu->arch.esr = vcpu->arch.fault_esr; + vcpu->stat.dtlb_real_miss_exits++; + r = RESUME_GUEST; + break; + } + + vcpu->arch.paddr_accessed = tlb_xlate(gtlbe, eaddr); + gfn = vcpu->arch.paddr_accessed >> PAGE_SHIFT; + + if (kvm_is_visible_gfn(vcpu->kvm, gfn)) { + /* The guest TLB had a mapping, but the shadow TLB + * didn't, and it is RAM. This could be because: + * a) the entry is mapping the host kernel, or + * b) the guest used a large mapping which we're faking + * Either way, we need to satisfy the fault without + * invoking the guest. */ + kvmppc_mmu_map(vcpu, eaddr, gfn, gtlbe->tid, + gtlbe->word2); + vcpu->stat.dtlb_virt_miss_exits++; + r = RESUME_GUEST; + } else { + /* Guest has mapped and accessed a page which is not + * actually RAM. */ + r = kvmppc_emulate_mmio(run, vcpu); + } + + break; + } + + case BOOKE_INTERRUPT_ITLB_MISS: { + struct tlbe *gtlbe; + unsigned long eaddr = vcpu->arch.pc; + gfn_t gfn; + + r = RESUME_GUEST; + + /* Check the guest TLB. */ + gtlbe = kvmppc_44x_itlb_search(vcpu, eaddr); + if (!gtlbe) { + /* The guest didn't have a mapping for it. */ + kvmppc_queue_exception(vcpu, exit_nr); + vcpu->stat.itlb_real_miss_exits++; + break; + } + + vcpu->stat.itlb_virt_miss_exits++; + + gfn = tlb_xlate(gtlbe, eaddr) >> PAGE_SHIFT; + + if (kvm_is_visible_gfn(vcpu->kvm, gfn)) { + /* The guest TLB had a mapping, but the shadow TLB + * didn't. This could be because: + * a) the entry is mapping the host kernel, or + * b) the guest used a large mapping which we're faking + * Either way, we need to satisfy the fault without + * invoking the guest. */ + kvmppc_mmu_map(vcpu, eaddr, gfn, gtlbe->tid, + gtlbe->word2); + } else { + /* Guest mapped and leaped at non-RAM! */ + kvmppc_queue_exception(vcpu, + BOOKE_INTERRUPT_MACHINE_CHECK); + } + + break; + } + + default: + printk(KERN_EMERG "exit_nr %d\n", exit_nr); + BUG(); + } + + local_irq_disable(); + + kvmppc_check_and_deliver_interrupts(vcpu); + + /* Do some exit accounting. */ + vcpu->stat.sum_exits++; + if (!(r & RESUME_HOST)) { + /* To avoid clobbering exit_reason, only check for signals if + * we aren't already exiting to userspace for some other + * reason. 
*/ + if (signal_pending(current)) { + run->exit_reason = KVM_EXIT_INTR; + r = (-EINTR << 2) | RESUME_HOST | (r & RESUME_FLAG_NV); + + vcpu->stat.signal_exits++; + } else { + vcpu->stat.light_exits++; + } + } else { + switch (run->exit_reason) { + case KVM_EXIT_MMIO: + vcpu->stat.mmio_exits++; + break; + case KVM_EXIT_DCR: + vcpu->stat.dcr_exits++; + break; + case KVM_EXIT_INTR: + vcpu->stat.signal_exits++; + break; + } + } + + return r; +} + +/* Initial guest state: 16MB mapping 0 -> 0, PC = 0, MSR = 0, R1 = 16MB */ +int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) +{ + struct tlbe *tlbe = &vcpu->arch.guest_tlb[0]; + + tlbe->tid = 0; + tlbe->word0 = PPC44x_TLB_16M | PPC44x_TLB_VALID; + tlbe->word1 = 0; + tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR; + + tlbe++; + tlbe->tid = 0; + tlbe->word0 = 0xef600000 | PPC44x_TLB_4K | PPC44x_TLB_VALID; + tlbe->word1 = 0xef600000; + tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR + | PPC44x_TLB_I | PPC44x_TLB_G; + + vcpu->arch.pc = 0; + vcpu->arch.msr = 0; + vcpu->arch.gpr[1] = (16<<20) - 8; /* -8 for the callee-save LR slot */ + + /* Eye-catching number so we know if the guest takes an interrupt + * before it's programmed its own IVPR. */ + vcpu->arch.ivpr = 0x55550000; + + /* Since the guest can directly access the timebase, it must know the + * real timebase frequency. Accordingly, it must see the state of + * CCR1[TCS]. */ + vcpu->arch.ccr1 = mfspr(SPRN_CCR1); + + return 0; +} + +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) +{ + int i; + + regs->pc = vcpu->arch.pc; + regs->cr = vcpu->arch.cr; + regs->ctr = vcpu->arch.ctr; + regs->lr = vcpu->arch.lr; + regs->xer = vcpu->arch.xer; + regs->msr = vcpu->arch.msr; + regs->srr0 = vcpu->arch.srr0; + regs->srr1 = vcpu->arch.srr1; + regs->pid = vcpu->arch.pid; + regs->sprg0 = vcpu->arch.sprg0; + regs->sprg1 = vcpu->arch.sprg1; + regs->sprg2 = vcpu->arch.sprg2; + regs->sprg3 = vcpu->arch.sprg3; + regs->sprg5 = vcpu->arch.sprg4; + regs->sprg6 = vcpu->arch.sprg5; + regs->sprg7 = vcpu->arch.sprg6; + + for (i = 0; i < ARRAY_SIZE(regs->gpr); i++) + regs->gpr[i] = vcpu->arch.gpr[i]; + + return 0; +} + +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) +{ + int i; + + vcpu->arch.pc = regs->pc; + vcpu->arch.cr = regs->cr; + vcpu->arch.ctr = regs->ctr; + vcpu->arch.lr = regs->lr; + vcpu->arch.xer = regs->xer; + vcpu->arch.msr = regs->msr; + vcpu->arch.srr0 = regs->srr0; + vcpu->arch.srr1 = regs->srr1; + vcpu->arch.sprg0 = regs->sprg0; + vcpu->arch.sprg1 = regs->sprg1; + vcpu->arch.sprg2 = regs->sprg2; + vcpu->arch.sprg3 = regs->sprg3; + vcpu->arch.sprg5 = regs->sprg4; + vcpu->arch.sprg6 = regs->sprg5; + vcpu->arch.sprg7 = regs->sprg6; + + for (i = 0; i < ARRAY_SIZE(vcpu->arch.gpr); i++) + vcpu->arch.gpr[i] = regs->gpr[i]; + + return 0; +} + +int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, + struct kvm_sregs *sregs) +{ + return -ENOTSUPP; +} + +int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, + struct kvm_sregs *sregs) +{ + return -ENOTSUPP; +} + +int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) +{ + return -ENOTSUPP; +} + +int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) +{ + return -ENOTSUPP; +} + +/* 'linear_address' is actually an encoding of AS|PID|EADDR . 
*/ +int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, + struct kvm_translation *tr) +{ + struct tlbe *gtlbe; + int index; + gva_t eaddr; + u8 pid; + u8 as; + + eaddr = tr->linear_address; + pid = (tr->linear_address >> 32) & 0xff; + as = (tr->linear_address >> 40) & 0x1; + + index = kvmppc_44x_tlb_index(vcpu, eaddr, pid, as); + if (index == -1) { + tr->valid = 0; + return 0; + } + + gtlbe = &vcpu->arch.guest_tlb[index]; + + tr->physical_address = tlb_xlate(gtlbe, eaddr); + /* XXX what does "writeable" and "usermode" even mean? */ + tr->valid = 1; + + return 0; +} diff --git a/arch/powerpc/kvm/booke_host.c b/arch/powerpc/kvm/booke_host.c new file mode 100644 index 0000000..b480341 --- /dev/null +++ b/arch/powerpc/kvm/booke_host.c @@ -0,0 +1,83 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 2008 + * + * Authors: Hollis Blanchard <ho...@us...> + */ + +#include <linux/errno.h> +#include <linux/kvm_host.h> +#include <linux/module.h> +#include <asm/cacheflush.h> +#include <asm/kvm_ppc.h> + +unsigned long kvmppc_booke_handlers; + +static int kvmppc_booke_init(void) +{ + unsigned long ivor[16]; + unsigned long max_ivor = 0; + int i; + + /* We install our own exception handlers by hijacking IVPR. IVPR must + * be 16-bit aligned, so we need a 64KB allocation. */ + kvmppc_booke_handlers = __get_free_pages(GFP_KERNEL | __GFP_ZERO, + VCPU_SIZE_ORDER); + if (!kvmppc_booke_handlers) + return -ENOMEM; + + /* XXX make sure our handlers are smaller than Linux's */ + + /* Copy our interrupt handlers to match host IVORs. That way we don't + * have to swap the IVORs on every guest/host transition. 
*/ + ivor[0] = mfspr(SPRN_IVOR0); + ivor[1] = mfspr(SPRN_IVOR1); + ivor[2] = mfspr(SPRN_IVOR2); + ivor[3] = mfspr(SPRN_IVOR3); + ivor[4] = mfspr(SPRN_IVOR4); + ivor[5] = mfspr(SPRN_IVOR5); + ivor[6] = mfspr(SPRN_IVOR6); + ivor[7] = mfspr(SPRN_IVOR7); + ivor[8] = mfspr(SPRN_IVOR8); + ivor[9] = mfspr(SPRN_IVOR9); + ivor[10] = mfspr(SPRN_IVOR10); + ivor[11] = mfspr(SPRN_IVOR11); + ivor[12] = mfspr(SPRN_IVOR12); + ivor[13] = mfspr(SPRN_IVOR13); + ivor[14] = mfspr(SPRN_IVOR14); + ivor[15] = mfspr(SPRN_IVOR15); + + for (i = 0; i < 16; i++) { + if (ivor[i] > max_ivor) + max_ivor = ivor[i]; + + memcpy((void *)kvmppc_booke_handlers + ivor[i], + kvmppc_handlers_start + i * kvmppc_handler_len, + kvmppc_handler_len); + } + flush_icache_range(kvmppc_booke_handlers, + kvmppc_booke_handlers + max_ivor + kvmppc_handler_len); + + return kvm_init(NULL, sizeof(struct kvm_vcpu), THIS_MODULE); +} + +static void __exit kvmppc_booke_exit(void) +{ + free_pages(kvmppc_booke_handlers, VCPU_SIZE_ORDER); + kvm_exit(); +} + +module_init(kvmppc_booke_init) +module_exit(kvmppc_booke_exit) diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S new file mode 100644 index 0000000..3b653b5 --- /dev/null +++ b/arch/powerpc/kvm/booke_interrupts.S @@ -0,0 +1,436 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 2007 + * + * Authors: Hollis Blanchard <ho...@us...> + */ + +#include <asm/ppc_asm.h> +#include <asm/kvm_asm.h> +#include <asm/reg.h> +#include <asm/mmu-44x.h> +#include <asm/page.h> +#include <asm/asm-offsets.h> + +#define KVMPPC_MSR_MASK (MSR_CE|MSR_EE|MSR_PR|MSR_DE|MSR_ME|MSR_IS|MSR_DS) + +#define VCPU_GPR(n) (VCPU_GPRS + (n * 4)) + +/* The host stack layout: */ +#define HOST_R1 0 /* Implied by stwu. */ +#define HOST_CALLEE_LR 4 +#define HOST_RUN 8 +/* r2 is special: it holds 'current', and it made nonvolatile in the + * kernel with the -ffixed-r2 gcc option. */ +#define HOST_R2 12 +#define HOST_NV_GPRS 16 +#define HOST_NV_GPR(n) (HOST_NV_GPRS + ((n - 14) * 4)) +#define HOST_MIN_STACK_SIZE (HOST_NV_GPR(31) + 4) +#define HOST_STACK_SIZE (((HOST_MIN_STACK_SIZE + 15) / 16) * 16) /* Align. */ +#define HOST_STACK_LR (HOST_STACK_SIZE + 4) /* In caller stack frame. */ + +#define NEED_INST_MASK ((1<<BOOKE_INTERRUPT_PROGRAM) | \ + (1<<BOOKE_INTERRUPT_DTLB_MISS)) + +#define NEED_DEAR_MASK ((1<<BOOKE_INTERRUPT_DATA_STORAGE) | \ + (1<<BOOKE_INTERRUPT_DTLB_MISS)) + +#define NEED_ESR_MASK ((1<<BOOKE_INTERRUPT_DATA_STORAGE) | \ + (1<<BOOKE_INTERRUPT_INST_STORAGE) | \ + (1<<BOOKE_INTERRUPT_PROGRAM) | \ + (1<<BOOKE_INTERRUPT_DTLB_MISS)) + +.macro KVM_HANDLER ivor_nr +_GLOBAL(kvmppc_handler_\ivor_nr) + /* Get pointer to vcpu and record exit number. 
*/ + mtspr SPRN_SPRG0, r4 + mfspr r4, SPRN_SPRG1 + stw r5, VCPU_GPR(r5)(r4) + stw r6, VCPU_GPR(r6)(r4) + mfctr r5 + lis r6, kvmppc_resume_host@h + stw r5, VCPU_CTR(r4) + li r5, \ivor_nr + ori r6, r6, kvmppc_resume_host@l + mtctr r6 + bctr +.endm + +_GLOBAL(kvmppc_handlers_start) +KVM_HANDLER BOOKE_INTERRUPT_CRITICAL +KVM_HANDLER BOOKE_INTERRUPT_MACHINE_CHECK +KVM_HANDLER BOOKE_INTERRUPT_DATA_STORAGE +KVM_HANDLER BOOKE_INTERRUPT_INST_STORAGE +KVM_HANDLER BOOKE_INTERRUPT_EXTERNAL +KVM_HANDLER BOOKE_INTERRUPT_ALIGNMENT +KVM_HANDLER BOOKE_INTERRUPT_PROGRAM +KVM_HANDLER BOOKE_INTERRUPT_FP_UNAVAIL +KVM_HANDLER BOOKE_INTERRUPT_SYSCALL +KVM_HANDLER BOOKE_INTERRUPT_AP_UNAVAIL +KVM_HANDLER BOOKE_INTERRUPT_DECREMENTER +KVM_HANDLER BOOKE_INTERRUPT_FIT +KVM_HANDLER BOOKE_INTERRUPT_WATCHDOG +KVM_HANDLER BOOKE_INTERRUPT_DTLB_MISS +KVM_HANDLER BOOKE_INTERRUPT_ITLB_MISS +KVM_HANDLER BOOKE_INTERRUPT_DEBUG + +_GLOBAL(kvmppc_handler_len) + .long kvmppc_handler_1 - kvmppc_handler_0 + + +/* Registers: + * SPRG0: guest r4 + * r4: vcpu pointer + * r5: KVM exit number + */ +_GLOBAL(kvmppc_resume_host) + stw r3, VCPU_GPR(r3)(r4) + mfcr r3 + stw r3, VCPU_CR(r4) + stw r7, VCPU_GPR(r7)(r4) + stw r8, VCPU_GPR(r8)(r4) + stw r9, VCPU_GPR(r9)(r4) + + li r6, 1 + slw r6, r6, r5 + + /* Save the faulting instruction and all GPRs for emulation. */ + andi. r7, r6, NEED_INST_MASK + beq ..skip_inst_copy + mfspr r9, SPRN_SRR0 + mfmsr r8 + ori r7, r8, MSR_DS + mtmsr r7 + isync + lwz r9, 0(r9) + mtmsr r8 + isync + stw r9, VCPU_LAST_INST(r4) + + stw r15, VCPU_GPR(r15)(r4) + stw r16, VCPU_GPR(r16)(r4) + stw r17, VCPU_GPR(r17)(r4) + stw r18, VCPU_GPR(r18)(r4) + stw r19, VCPU_GPR(r19)(r4) + stw r20, VCPU_GPR(r20)(r4) + stw r21, VCPU_GPR(r21)(r4) + stw r22, VCPU_GPR(r22)(r4) + stw r23, VCPU_GPR(r23)(r4) + stw r24, VCPU_GPR(r24)(r4) + stw r25, VCPU_GPR(r25)(r4) + stw r26, VCPU_GPR(r26)(r4) + stw r27, VCPU_GPR(r27)(r4) + stw r28, VCPU_GPR(r28)(r4) + stw r29, VCPU_GPR(r29)(r4) + stw r30, VCPU_GPR(r30)(r4) + stw r31, VCPU_GPR(r31)(r4) +..skip_inst_copy: + + /* Also grab DEAR and ESR before the host can clobber them. */ + + andi. r7, r6, NEED_DEAR_MASK + beq ..skip_dear + mfspr r9, SPRN_DEAR + stw r9, VCPU_FAULT_DEAR(r4) +..skip_dear: + + andi. r7, r6, NEED_ESR_MASK + beq ..skip_esr + mfspr r9, SPRN_ESR + stw r9, VCPU_FAULT_ESR(r4) +..skip_esr: + + /* Save remaining volatile guest register state to vcpu. */ + stw r0, VCPU_GPR(r0)(r4) + stw r1, VCPU_GPR(r1)(r4) + stw r2, VCPU_GPR(r2)(r4) + stw r10, VCPU_GPR(r10)(r4) + stw r11, VCPU_GPR(r11)(r4) + stw r12, VCPU_GPR(r12)(r4) + stw r13, VCPU_GPR(r13)(r4) + stw r14, VCPU_GPR(r14)(r4) /* We need a NV GPR below. */ + mflr r3 + stw r3, VCPU_LR(r4) + mfxer r3 + stw r3, VCPU_XER(r4) + mfspr r3, SPRN_SPRG0 + stw r3, VCPU_GPR(r4)(r4) + mfspr r3, SPRN_SRR0 + stw r3, VCPU_PC(r4) + + /* Restore host stack pointer and PID before IVPR, since the host + * exception handlers use them. */ + lwz r1, VCPU_HOST_STACK(r4) + lwz r3, VCPU_HOST_PID(r4) + mtspr SPRN_PID, r3 + + /* Restore host IVPR before re-enabling interrupts. We cheat and know + * that Linux IVPR is always 0xc0000000. */ + lis r3, 0xc000 + mtspr SPRN_IVPR, r3 + + /* Switch to kernel stack and jump to handler. */ + LOAD_REG_ADDR(r3, kvmppc_handle_exit) + mtctr r3 + lwz r3, HOST_RUN(r1) + lwz r2, HOST_R2(r1) + mr r14, r4 /* Save vcpu pointer. */ + + bctrl /* kvmppc_handle_exit() */ + + /* Restore vcpu pointer and the nonvolatiles we used. 
*/ + mr r4, r14 + lwz r14, VCPU_GPR(r14)(r4) + + /* Sometimes instruction emulation must restore complete GPR state. */ + andi. r5, r3, RESUME_FLAG_NV + beq ..skip_nv_load + lwz r15, VCPU_GPR(r15)(r4) + lwz r16, VCPU_GPR(r16)(r4) + lwz r17, VCPU_GPR(r17)(r4) + lwz r18, VCPU_GPR(r18)(r4) + lwz r19, VCPU_GPR(r19)(r4) + lwz r20, VCPU_GPR(r20)(r4) + lwz r21, VCPU_GPR(r21)(r4) + lwz r22, VCPU_GPR(r22)(r4) + lwz r23, VCPU_GPR(r23)(r4) + lwz r24, VCPU_GPR(r24)(r4) + lwz r25, VCPU_GPR(r25)(r4) + lwz r26, VCPU_GPR(r26)(r4) + lwz r27, VCPU_GPR(r27)(r4) + lwz r28, VCPU_GPR(r28)(r4) + lwz r29, VCPU_GPR(r29)(r4) + lwz r30, VCPU_GPR(r30)(r4) + lwz r31, VCPU_GPR(r31)(r4) +..skip_nv_load: + + /* Should we return to the guest? */ + andi. r5, r3, RESUME_FLAG_HOST + beq lightweight_exit + + srawi r3, r3, 2 /* Shift -ERR back down. */ + +heavyweight_exit: + /* Not returning to guest. */ + + /* We already saved guest volatile register state; now save the + * non-volatiles. */ + stw r15, VCPU_GPR(r15)(r4) + stw r16, VCPU_GPR(r16)(r4) + stw r17, VCPU_GPR(r17)(r4) + stw r18, VCPU_GPR(r18)(r4) + stw r19, VCPU_GPR(r19)(r4) + stw r20, VCPU_GPR(r20)(r4) + stw r21, VCPU_GPR(r21)(r4) + stw r22, VCPU_GPR(r22)(r4) + stw r23, VCPU_GPR(r23)(r4) + stw r24, VCPU_GPR(r24)(r4) + stw r25, VCPU_GPR(r25)(r4) + stw r26, VCPU_GPR(r26)(r4) + stw r27, VCPU_GPR(r27)(r4) + stw r28, VCPU_GPR(r28)(r4) + stw r29, VCPU_GPR(r29)(r4) + stw r30, VCPU_GPR(r30)(r4) + stw r31, VCPU_GPR(r31)(r4) + + /* Load host non-volatile register state from host stack. */ + lwz r14, HOST_NV_GPR(r14)(r1) + lwz r15, HOST_NV_GPR(r15)(r1) + lwz r16, HOST_NV_GPR(r16)(r1) + lwz r17, HOST_NV_GPR(r17)(r1) + lwz r18, HOST_NV_GPR(r18)(r1) + lwz r19, HOST_NV_GPR(r19)(r1) + lwz r20, HOST_NV_GPR(r20)(r1) + lwz r21, HOST_NV_GPR(r21)(r1) + lwz r22, HOST_NV_GPR(r22)(r1) + lwz r23, HOST_NV_GPR(r23)(r1) + lwz r24, HOST_NV_GPR(r24)(r1) + lwz r25, HOST_NV_GPR(r25)(r1) + lwz r26, HOST_NV_GPR(r26)(r1) + lwz r27, HOST_NV_GPR(r27)(r1) + lwz r28, HOST_NV_GPR(r28)(r1) + lwz r29, HOST_NV_GPR(r29)(r1) + lwz r30, HOST_NV_GPR(r30)(r1) + lwz r31, HOST_NV_GPR(r31)(r1) + + /* Return to kvm_vcpu_run(). */ + lwz r4, HOST_STACK_LR(r1) + addi r1, r1, HOST_STACK_SIZE + mtlr r4 + /* r3 still contains the return code from kvmppc_handle_exit(). */ + blr + + +/* Registers: + * r3: kvm_run pointer + * r4: vcpu pointer + */ +_GLOBAL(__kvmppc_vcpu_run) + stwu r1, -HOST_STACK_SIZE(r1) + stw r1, VCPU_HOST_STACK(r4) /* Save stack pointer to vcpu. */ + + /* Save host state to stack. */ + stw r3, HOST_RUN(r1) + mflr r3 + stw r3, HOST_STACK_LR(r1) + + /* Save host non-volatile register state to stack. */ + stw r14, HOST_NV_GPR(r14)(r1) + stw r15, HOST_NV_GPR(r15)(r1) + stw r16, HOST_NV_GPR(r16)(r1) + stw r17, HOST_NV_GPR(r17)(r1) + stw r18, HOST_NV_GPR(r18)(r1) + stw r19, HOST_NV_GPR(r19)(r1) + stw r20, HOST_NV_GPR(r20)(r1) + stw r21, HOST_NV_GPR(r21)(r1) + stw r22, HOST_NV_GPR(r22)(r1) + stw r23, HOST_NV_GPR(r23)(r1) + stw r24, HOST_NV_GPR(r24)(r1) + stw r25, HOST_NV_GPR(r25)(r1) + stw r26, HOST_NV_GPR(r26)(r1) + stw r27, HOST_NV_GPR(r27)(r1) + stw r28, HOST_NV_GPR(r28)(r1) + stw r29, HOST_NV_GPR(r29)(r1) + stw r30, HOST_NV_GPR(r30)(r1) + stw r31, HOST_NV_GPR(r31)(r1) + + /* Load guest non-volatiles. 
*/ + lwz r14, VCPU_GPR(r14)(r4) + lwz r15, VCPU_GPR(r15)(r4) + lwz r16, VCPU_GPR(r16)(r4) + lwz r17, VCPU_GPR(r17)(r4) + lwz r18, VCPU_GPR(r18)(r4) + lwz r19, VCPU_GPR(r19)(r4) + lwz r20, VCPU_GPR(r20)(r4) + lwz r21, VCPU_GPR(r21)(r4) + lwz r22, VCPU_GPR(r22)(r4) + lwz r23, VCPU_GPR(r23)(r4) + lwz r24, VCPU_GPR(r24)(r4) + lwz r25, VCPU_GPR(r25)(r4) + lwz r26, VCPU_GPR(r26)(r4) + lwz r27, VCPU_GPR(r27)(r4) + lwz r28, VCPU_GPR(r28)(r4) + lwz r29, VCPU_GPR(r29)(r4) + lwz r30, VCPU_GPR(r30)(r4) + lwz r31, VCPU_GPR(r31)(r4) + +lightweight_exit: + stw r2, HOST_R2(r1) + + mfspr r3, SPRN_PID + stw r3, VCPU_HOST_PID(r4) + lwz r3, VCPU_PID(r4) + mtspr SPRN_PID, r3 + + /* Prevent all TLB updates. */ + mfmsr r5 + lis r6, (MSR_EE|MSR_CE|MSR_ME|MSR_DE)@h + ori r6, r6, (MSR_EE|MSR_CE|MSR_ME|MSR_DE)@l + andc r6, r5, r6 + mtmsr r6 + + /* Save the host's non-pinned TLB mappings, and load the guest mappings + * over them. Leave the host's "pinned" kernel mappings in place. */ + /* XXX optimization: use generation count to avoid swapping unmodified + * entries. */ + mfspr r10, SPRN_MMUCR /* Save host MMUCR. */ + lis r8, tlb_44x_hwater@ha + lwz r8, tlb_44x_hwater@l(r8) + addi r3, r4, VCPU_HOST_TLB - 4 + addi r9, r4, VCPU_SHADOW_TLB - 4 + li r6, 0 +1: + /* Save host entry. */ + tlbre r7, r6, PPC44x_TLB_PAGEID + mfspr r5, SPRN_MMUCR + stwu r5, 4(r3) + stwu r7, 4(r3) + tlbre r7, r6, PPC44x_TLB_XLAT + stwu r7, 4(r3) + tlbre r7, r6, PPC44x_TLB_ATTRIB + stwu r7, 4(r3) + /* Load guest entry. */ + lwzu r7, 4(r9) + mtspr SPRN_MMUCR, r7 + lwzu r7, 4(r9) + tlbwe r7, r6, PPC44x_TLB_PAGEID + lwzu r7, 4(r9) + tlbwe r7, r6, PPC44x_TLB_XLAT + lwzu r7, 4(r9) + tlbwe r7, r6, PPC44x_TLB_ATTRIB + /* Increment index. */ + addi r6, r6, 1 + cmpw r6, r8 + blt 1b + mtspr SPRN_MMUCR, r10 /* Restore host MMUCR. */ + + iccci 0, 0 /* XXX hack */ + + /* Load some guest volatiles. */ + lwz r0, VCPU_GPR(r0)(r4) + lwz r2, VCPU_GPR(r2)(r4) + lwz r9, VCPU_GPR(r9)(r4) + lwz r10, VCPU_GPR(r10)(r4) + lwz r11, VCPU_GPR(r11)(r4) + lwz r12, VCPU_GPR(r12)(r4) + lwz r13, VCPU_GPR(r13)(r4) + lwz r3, VCPU_LR(r4) + mtlr r3 + lwz r3, VCPU_XER(r4) + mtxer r3 + + /* Switch the IVPR. XXX If we take a TLB miss after this we're screwed, + * so how do we make sure vcpu won't fault? */ + lis r8, kvmppc_booke_handlers@ha + lwz r8, kvmppc_booke_handlers@l(r8) + mtspr SPRN_IVPR, r8 + + /* Save vcpu pointer for the exception handlers. */ + mtspr SPRN_SPRG1, r4 + + /* Can't switch the stack pointer until after IVPR is switched, + * because host interrupt handlers would get confused. */ + lwz r1, VCPU_GPR(r1)(r4) + + /* XXX handle USPRG0 */ + /* Host interrupt handlers may have clobbered these guest-readable + * SPRGs, so we need to reload them here with the guest's values. */ + lwz r3, VCPU_SPRG4(r4) + mtspr SPRN_SPRG4, r3 + lwz r3, VCPU_SPRG5(r4) + mtspr SPRN_SPRG5, r3 + lwz r3, VCPU_SPRG6(r4) + mtspr SPRN_SPRG6, r3 + lwz r3, VCPU_SPRG7(r4) + mtspr SPRN_SPRG7, r3 + + /* Finish loading guest volatiles and jump to guest. 
*/ + lwz r3, VCPU_CTR(r4) + mtctr r3 + lwz r3, VCPU_CR(r4) + mtcr r3 + lwz r5, VCPU_GPR(r5)(r4) + lwz r6, VCPU_GPR(r6)(r4) + lwz r7, VCPU_GPR(r7)(r4) + lwz r8, VCPU_GPR(r8)(r4) + lwz r3, VCPU_PC(r4) + mtsrr0 r3 + lwz r3, VCPU_MSR(r4) + oris r3, r3, KVMPPC_MSR_MASK@h + ori r3, r3, KVMPPC_MSR_MASK@l + mtsrr1 r3 + lwz r3, VCPU_GPR(r3)(r4) + lwz r4, VCPU_GPR(r4)(r4) + rfi diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c new file mode 100644 index 0000000..a03fe0c --- /dev/null +++ b/arch/powerpc/kvm/emulate.c @@ -0,0 +1,760 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 2007 + * + * Authors: Hollis Blanchard <ho...@us...> + */ + +#include <linux/jiffies.h> +#include <linux/timer.h> +#include <linux/types.h> +#include <linux/string.h> +#include <linux/kvm_host.h> + +#include <asm/dcr.h> +#include <asm/dcr-regs.h> +#include <asm/time.h> +#include <asm/byteorder.h> +#include <asm/kvm_ppc.h> + +#include "44x_tlb.h" + +/* Instruction decoding */ +static inline unsigned int get_op(u32 inst) +{ + return inst >> 26; +} + +static inline unsigned int get_xop(u32 inst) +{ + return (inst >> 1) & 0x3ff; +} + +static inline unsigned int get_sprn(u32 inst) +{ + return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0); +} + +static inline unsigned int get_dcrn(u32 inst) +{ + return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0); +} + +static inline unsigned int get_rt(u32 inst) +{ + return (inst >> 21) & 0x1f; +} + +static inline unsigned int get_rs(u32 inst) +{ + return (inst >> 21) & 0x1f; +} + +static inline unsigned int get_ra(u32 inst) +{ + return (inst >> 16) & 0x1f; +} + +static inline unsigned int get_rb(u32 inst) +{ + return (inst >> 11) & 0x1f; +} + +static inline unsigned int get_rc(u32 inst) +{ + return inst & 0x1; +} + +static inline unsigned int get_ws(u32 inst) +{ + return (inst >> 11) & 0x1f; +} + +static inline unsigned int get_d(u32 inst) +{ + return inst & 0xffff; +} + +static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu, + const struct tlbe *tlbe) +{ + gpa_t gpa; + + if (!get_tlb_v(tlbe)) + return 0; + + /* Does it match current guest AS? */ + /* XXX what about IS != DS? */ + if (get_tlb_ts(tlbe) != !!(vcpu->arch.msr & MSR_IS)) + return 0; + + gpa = get_tlb_raddr(tlbe); + if (!gfn_to_memslot(vcpu->kvm, gpa >> PAGE_SHIFT)) + /* Mapping is not for RAM. */ + return 0; + + return 1; +} + +static int kvmppc_emul_tlbwe(struct kvm_vcpu *vcpu, u32 inst) +{ + u64 eaddr; + u64 raddr; + u64 asid; + u32 flags; + struct tlbe *tlbe; + unsigned int ra; + unsigned int rs; + unsigned int ws; + unsigned int index; + + ra = get_ra(inst); + rs = get_rs(inst); + ws = get_ws(inst); + + index = vcpu->arch.gpr[ra]; + if (index > PPC44x_TLB_SIZE) { + printk("%s: index %d\n", __func__, index); + kvmppc_dump_vcpu(vcpu); + return EMULATE_FAIL; + } + + tlbe = &vcpu->arch.guest_tlb[index]; + + /* Invalidate shadow mappings for the about-to-be-clobbered TLBE. 
*/ + if (tlbe->word0 & PPC44x_TLB_VALID) { + eaddr = get_tlb_eaddr(tlbe); + asid = (tlbe->word0 & PPC44x_TLB_TS) | tlbe->tid; + kvmppc_mmu_invalidate(vcpu, eaddr, asid); + } + + switch (ws) { + case PPC44x_TLB_PAGEID: + tlbe->tid = vcpu->arch.mmucr & 0xff; + tlbe->word0 = vcpu->arch.gpr[rs]; + break; + + case PPC44x_TLB_XLAT: + tlbe->word1 = vcpu->arch.gpr[rs]; + break; + + case PPC44x_TLB_ATTRIB: + tlbe->word2 = vcpu->arch.gpr[rs]; + break; + + default: + return EMULATE_FAIL; + } + + if (tlbe_is_host_safe(vcpu, tlbe)) { + eaddr = get_tlb_eaddr(tlbe); + raddr = get_tlb_raddr(tlbe); + asid = (tlbe->word0 & PPC44x_TLB_TS) | tlbe->tid; + flags = tlbe->word2 & 0xffff; + + /* Create a 4KB mapping on the host. If the guest wanted a + * large page, only the first 4KB is mapped here and the rest + * are mapped on the fly. */ + kvmppc_mmu_map(vcpu, eaddr, raddr >> PAGE_SHIFT, asid, flags); + } + + return EMULATE_DONE; +} + +static void kvmppc_emulate_dec(struct kvm_vcpu *vcpu) +{ + if (vcpu->arch.tcr & TCR_DIE) { + /* The decrementer ticks at the same rate as the timebase, so + * that's how we convert the guest DEC value to the number of + * host ticks. */ + unsigned long nr_jiffies; + + nr_jiffies = vcpu->arch.dec / tb_ticks_per_jiffy; + mod_timer(&vcpu->arch.dec_timer, + get_jiffies_64() + nr_jiffies); + } else { + del_timer(&vcpu->arch.dec_timer); + } +} + +static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu) +{ + vcpu->arch.pc = vcpu->arch.srr0; + kvmppc_set_msr(vcpu, vcpu->arch.srr1); +} + +/* XXX to do: + * lhax + * lhaux + * lswx + * lswi + * stswx + * stswi + * lha + * lhau + * lmw + * stmw + * + * XXX is_bigendian should depend on MMU mapping or MSR[LE] + */ +int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) +{ + u32 inst = vcpu->arch.last_inst; + u32 ea; + int ra; + int rb; + int rc; + int rs; + int rt; + int sprn; + int dcrn; + enum emulation_result emulated = EMULATE_DONE; + int advance = 1; + + switch (get_op(inst)) { + case 3: /* trap */ + printk("trap!\n"); + kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_PROGRAM); + advance = 0; + break; + + case 19: + switch (get_xop(inst)) { + case 50: /* rfi */ + kvmppc_emul_rfi(vcpu); + advance = 0; + break; + + default: + emulated = EMULATE_FAIL; + break; + } + break; + + case 31: + switch (get_xop(inst)) { + + case 83: /* mfmsr */ + rt = get_rt(inst); + vcpu->arch.gpr[rt] = vcpu->arch.msr; + break; + + case 87: /* lbzx */ + rt = get_rt(inst); + emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1); + break; + + case 131: /* wrtee */ + rs = get_rs(inst); + vcpu->arch.msr = (vcpu->arch.msr & ~MSR_EE) + | (vcpu->arch.gpr[rs] & MSR_EE); + break; + + case 146: /* mtmsr */ + rs = get_rs(inst); + kvmppc_set_msr(vcpu, vcpu->arch.gpr[rs]); + break; + + case 163: /* wrteei */ + vcpu->arch.msr = (vcpu->arch.msr & ~MSR_EE) + | (inst & MSR_EE); + break; + + case 215: /* stbx */ + rs = get_rs(inst); + emulated = kvmppc_handle_store(run, vcpu, + vcpu->arch.gpr[rs], + 1, 1); + break; + + case 247: /* stbux */ + rs = get_rs(inst); + ra = get_ra(inst); + rb = get_rb(inst); + + ea = vcpu->arch.gpr[rb]; + if (ra) + ea += vcpu->arch.gpr[ra]; + + emulated = kvmppc_handle_store(run, vcpu, + vcpu->arch.gpr[rs], + 1, 1); + vcpu->arch.gpr[rs] = ea; + break; + + case 279: /* lhzx */ + rt = get_rt(inst); + emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1); + break; + + case 311: /* lhzux */ + rt = get_rt(inst); + ra = get_ra(inst); + rb = get_rb(inst); + + ea = vcpu->arch.gpr[rb]; + if (ra) + ea += vcpu->arch.gpr[ra]; + + emulated = 
kvmppc_handle_load(run, vcpu, rt, 2, 1); + vcpu->arch.gpr[ra] = ea; + break; + + case 323: /* mfdcr */ + dcrn = get_dcrn(inst); + rt = get_rt(inst); + + /* The guest may access CPR0 registers to determine the timebase + * frequency, and it must know the real host frequency because it + * can directly access the timebase registers. + * + * It would be possible to emulate those accesses in userspace, + * but userspace can really only figure out the end frequency. + * We could decompose that into the factors that compute it, but + * that's tricky math, and it's easier to just report the real + * CPR0 values. + */ + switch (dcrn) { + case DCRN_CPR0_CONFIG_ADDR: + vcpu->arch.gpr[rt] = vcpu->arch.cpr0_cfgaddr; + break; + case DCRN_CPR0_CONFIG_DATA: + local_irq_disable(); + mtdcr(DCRN_CPR0_CONFIG_ADDR, + vcpu->arch.cpr0_cfgaddr); + vcpu->arch.gpr[rt] = mfdcr(DCRN_CPR0_CONFIG_DATA); + local_irq_ena... [truncated message content] |
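For reference, kvm_arch_vcpu_ioctl_translate() above decodes 'linear_address' as EADDR in bits 0-31, the PID in bits 32-39 and AS in bit 40. A minimal userspace sketch (in C) of packing that encoding for a KVM_TRANSLATE call follows; the helper names pack_44x_translate_addr() and translate_guest_addr() are illustrative only and not part of the patch.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Pack the AS|PID|EADDR encoding that the translate ioctl above decodes. */
static uint64_t pack_44x_translate_addr(uint32_t eaddr, uint8_t pid, int as)
{
	return (uint64_t)eaddr
	       | ((uint64_t)pid << 32)
	       | ((uint64_t)(as & 1) << 40);
}

static int translate_guest_addr(int vcpu_fd, uint32_t eaddr, uint8_t pid,
				int as, uint64_t *paddr)
{
	struct kvm_translation tr = {
		.linear_address = pack_44x_translate_addr(eaddr, pid, as),
	};

	if (ioctl(vcpu_fd, KVM_TRANSLATE, &tr) < 0 || !tr.valid)
		return -1;
	*paddr = tr.physical_address;
	return 0;
}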
From: Avi K. <av...@qu...> - 2008-04-21 10:30:12
|
From: Hollis Blanchard <ho...@us...> It's a globally exported symbol now. Signed-off-by: Hollis Blanchard <ho...@us...> Signed-off-by: Avi Kivity <av...@qu...> --- include/linux/kvm_host.h | 2 +- virt/kvm/kvm_main.c | 8 ++++---- virt/kvm/kvm_trace.c | 4 ++-- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 81d4c33..4e16682 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -315,7 +315,7 @@ struct kvm_stats_debugfs_item { struct dentry *dentry; }; extern struct kvm_stats_debugfs_item debugfs_entries[]; -extern struct dentry *debugfs_dir; +extern struct dentry *kvm_debugfs_dir; #ifdef CONFIG_KVM_TRACE int kvm_trace_ioctl(unsigned int ioctl, unsigned long arg); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 0998455..d3cb4cc 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -60,7 +60,7 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_cache); static __read_mostly struct preempt_ops kvm_preempt_ops; -struct dentry *debugfs_dir; +struct dentry *kvm_debugfs_dir; static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl, unsigned long arg); @@ -1392,9 +1392,9 @@ static void kvm_init_debug(void) { struct kvm_stats_debugfs_item *p; - debugfs_dir = debugfs_create_dir("kvm", NULL); + kvm_debugfs_dir = debugfs_create_dir("kvm", NULL); for (p = debugfs_entries; p->name; ++p) - p->dentry = debugfs_create_file(p->name, 0444, debugfs_dir, + p->dentry = debugfs_create_file(p->name, 0444, kvm_debugfs_dir, (void *)(long)p->offset, stat_fops[p->kind]); } @@ -1405,7 +1405,7 @@ static void kvm_exit_debug(void) for (p = debugfs_entries; p->name; ++p) debugfs_remove(p->dentry); - debugfs_remove(debugfs_dir); + debugfs_remove(kvm_debugfs_dir); } static int kvm_suspend(struct sys_device *dev, pm_message_t state) diff --git a/virt/kvm/kvm_trace.c b/virt/kvm/kvm_trace.c index 5425440..0e49547 100644 --- a/virt/kvm/kvm_trace.c +++ b/virt/kvm/kvm_trace.c @@ -159,12 +159,12 @@ static int do_kvm_trace_enable(struct kvm_user_trace_setup *kuts) r = -EIO; atomic_set(&kt->lost_records, 0); - kt->lost_file = debugfs_create_file("lost_records", 0444, debugfs_dir, + kt->lost_file = debugfs_create_file("lost_records", 0444, kvm_debugfs_dir, kt, &kvm_trace_lost_ops); if (!kt->lost_file) goto err; - kt->rchan = relay_open("trace", debugfs_dir, kuts->buf_size, + kt->rchan = relay_open("trace", kvm_debugfs_dir, kuts->buf_size, kuts->buf_nr, &kvm_relay_callbacks, kt); if (!kt->rchan) goto err; -- 1.5.5 |
From: Avi K. <av...@qu...> - 2008-04-21 10:30:12
|
From: Hollis Blanchard <ho...@us...> Device Control Registers are essentially another address space found on PowerPC 4xx processors, analogous to PIO on x86. DCRs are always 32 bits, and can be identified by a 32-bit number. We forward most DCR accesses to userspace for emulation (with the exception of CPR0 registers, which can be read directly for simplicity in timebase frequency determination). Signed-off-by: Hollis Blanchard <ho...@us...> Signed-off-by: Avi Kivity <av...@qu...> --- include/linux/kvm.h | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/include/linux/kvm.h b/include/linux/kvm.h index f8e211d..a281afe 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -82,6 +82,7 @@ struct kvm_irqchip { #define KVM_EXIT_TPR_ACCESS 12 #define KVM_EXIT_S390_SIEIC 13 #define KVM_EXIT_S390_RESET 14 +#define KVM_EXIT_DCR 15 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */ struct kvm_run { @@ -161,6 +162,12 @@ struct kvm_run { #define KVM_S390_RESET_CPU_INIT 8 #define KVM_S390_RESET_IPL 16 __u64 s390_reset_flags; + /* KVM_EXIT_DCR */ + struct { + __u32 dcrn; + __u32 data; + __u8 is_write; + } dcr; /* Fix the size of the union. */ char padding[256]; }; -- 1.5.5 |
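A minimal sketch (in C, assuming headers with this patch applied) of how a userspace VMM's run loop might consume the new exit. The emulate_dcr_read()/emulate_dcr_write() hooks are placeholders for the VMM's own device model, and handing the read result back in run->dcr.data before the next KVM_RUN is an assumed convention mirroring KVM_EXIT_MMIO.

#include <stdint.h>
#include <linux/kvm.h>

/* Placeholder device-model hooks -- assumptions, not part of the patch. */
static uint32_t emulate_dcr_read(uint32_t dcrn) { (void)dcrn; return 0; }
static void emulate_dcr_write(uint32_t dcrn, uint32_t val) { (void)dcrn; (void)val; }

/* Dispatch a single exit taken from the vcpu's shared kvm_run area. */
static int handle_exit(struct kvm_run *run)
{
	switch (run->exit_reason) {
	case KVM_EXIT_DCR:
		if (run->dcr.is_write)
			emulate_dcr_write(run->dcr.dcrn, run->dcr.data);
		else
			run->dcr.data = emulate_dcr_read(run->dcr.dcrn);
		return 0;
	default:
		return -1;	/* not handled in this sketch */
	}
}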
From: Avi K. <av...@qu...> - 2008-04-21 10:30:09
|
From: Feng(Eric) Liu <eri...@in...> This interface allows user a space application to read the trace of kvm related events through relayfs. Signed-off-by: Feng (Eric) Liu <eri...@in...> Signed-off-by: Avi Kivity <av...@qu...> --- arch/x86/kvm/Kconfig | 11 ++ arch/x86/kvm/Makefile | 3 + include/linux/kvm_host.h | 14 +++ virt/kvm/kvm_main.c | 8 +- virt/kvm/kvm_trace.c | 276 ++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 311 insertions(+), 1 deletions(-) create mode 100644 virt/kvm/kvm_trace.c diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 76c70ab..8d45fab 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -50,6 +50,17 @@ config KVM_AMD Provides support for KVM on AMD processors equipped with the AMD-V (SVM) extensions. +config KVM_TRACE + bool "KVM trace support" + depends on KVM && MARKERS && SYSFS + select RELAY + select DEBUG_FS + default n + ---help--- + This option allows reading a trace of kvm-related events through + relayfs. Note the ABI is not considered stable and will be + modified in future updates. + # OK, it's a little counter-intuitive to do this, but it puts it neatly under # the virtualization menu. source drivers/lguest/Kconfig diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 4d0c22e..c97d35c 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -3,6 +3,9 @@ # common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o) +ifeq ($(CONFIG_KVM_TRACE),y) +common-objs += $(addprefix ../../../virt/kvm/, kvm_trace.o) +endif EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 578c363..bd0c2d2 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -15,6 +15,7 @@ #include <linux/sched.h> #include <linux/mm.h> #include <linux/preempt.h> +#include <linux/marker.h> #include <asm/signal.h> #include <linux/kvm.h> @@ -309,5 +310,18 @@ struct kvm_stats_debugfs_item { struct dentry *dentry; }; extern struct kvm_stats_debugfs_item debugfs_entries[]; +extern struct dentry *debugfs_dir; + +#ifdef CONFIG_KVM_TRACE +int kvm_trace_ioctl(unsigned int ioctl, unsigned long arg); +void kvm_trace_cleanup(void); +#else +static inline +int kvm_trace_ioctl(unsigned int ioctl, unsigned long arg) +{ + return -EINVAL; +} +#define kvm_trace_cleanup() ((void)0) +#endif #endif diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 6a52c08..d5911d9 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -60,7 +60,7 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_cache); static __read_mostly struct preempt_ops kvm_preempt_ops; -static struct dentry *debugfs_dir; +struct dentry *debugfs_dir; static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl, unsigned long arg); @@ -1191,6 +1191,11 @@ static long kvm_dev_ioctl(struct file *filp, r += PAGE_SIZE; /* pio data page */ #endif break; + case KVM_TRACE_ENABLE: + case KVM_TRACE_PAUSE: + case KVM_TRACE_DISABLE: + r = kvm_trace_ioctl(ioctl, arg); + break; default: return kvm_arch_dev_ioctl(filp, ioctl, arg); } @@ -1519,6 +1524,7 @@ EXPORT_SYMBOL_GPL(kvm_init); void kvm_exit(void) { + kvm_trace_cleanup(); misc_deregister(&kvm_dev); kmem_cache_destroy(kvm_vcpu_cache); sysdev_unregister(&kvm_sysdev); diff --git a/virt/kvm/kvm_trace.c b/virt/kvm/kvm_trace.c new file mode 100644 index 0000000..5425440 --- /dev/null +++ b/virt/kvm/kvm_trace.c @@ -0,0 +1,276 @@ +/* + * kvm trace + * + * It is designed to allow debugging traces of kvm to be generated + * on UP / SMP machines. 
Each trace entry can be timestamped so that + * it's possible to reconstruct a chronological record of trace events. + * The implementation refers to blktrace kernel support. + * + * Copyright (c) 2008 Intel Corporation + * Copyright (C) 2006 Jens Axboe <ax...@ke...> + * + * Authors: Feng(Eric) Liu, eri...@in... + * + * Date: Feb 2008 + */ + +#include <linux/module.h> +#include <linux/relay.h> +#include <linux/debugfs.h> + +#include <linux/kvm_host.h> + +#define KVM_TRACE_STATE_RUNNING (1 << 0) +#define KVM_TRACE_STATE_PAUSE (1 << 1) +#define KVM_TRACE_STATE_CLEARUP (1 << 2) + +struct kvm_trace { + int trace_state; + struct rchan *rchan; + struct dentry *lost_file; + atomic_t lost_records; +}; +static struct kvm_trace *kvm_trace; + +struct kvm_trace_probe { + const char *name; + const char *format; + u32 cycle_in; + marker_probe_func *probe_func; +}; + +static inline int calc_rec_size(int cycle, int extra) +{ + int rec_size = KVM_TRC_HEAD_SIZE; + + rec_size += extra; + return cycle ? rec_size += KVM_TRC_CYCLE_SIZE : rec_size; +} + +static void kvm_add_trace(void *probe_private, void *call_data, + const char *format, va_list *args) +{ + struct kvm_trace_probe *p = probe_private; + struct kvm_trace *kt = kvm_trace; + struct kvm_trace_rec rec; + struct kvm_vcpu *vcpu; + int i, extra, size; + + if (unlikely(kt->trace_state != KVM_TRACE_STATE_RUNNING)) + return; + + rec.event = va_arg(*args, u32); + vcpu = va_arg(*args, struct kvm_vcpu *); + rec.pid = current->tgid; + rec.vcpu_id = vcpu->vcpu_id; + + extra = va_arg(*args, u32); + WARN_ON(!(extra <= KVM_TRC_EXTRA_MAX)); + extra = min_t(u32, extra, KVM_TRC_EXTRA_MAX); + rec.extra_u32 = extra; + + rec.cycle_in = p->cycle_in; + + if (rec.cycle_in) { + u64 cycle = 0; + + cycle = get_cycles(); + rec.u.cycle.cycle_lo = (u32)cycle; + rec.u.cycle.cycle_hi = (u32)(cycle >> 32); + + for (i = 0; i < rec.extra_u32; i++) + rec.u.cycle.extra_u32[i] = va_arg(*args, u32); + } else { + for (i = 0; i < rec.extra_u32; i++) + rec.u.nocycle.extra_u32[i] = va_arg(*args, u32); + } + + size = calc_rec_size(rec.cycle_in, rec.extra_u32 * sizeof(u32)); + relay_write(kt->rchan, &rec, size); +} + +static struct kvm_trace_probe kvm_trace_probes[] = { + { "kvm_trace_entryexit", "%u %p %u %u %u %u %u %u", 1, kvm_add_trace }, + { "kvm_trace_handler", "%u %p %u %u %u %u %u %u", 0, kvm_add_trace }, +}; + +static int lost_records_get(void *data, u64 *val) +{ + struct kvm_trace *kt = data; + + *val = atomic_read(&kt->lost_records); + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(kvm_trace_lost_ops, lost_records_get, NULL, "%llu\n"); + +/* + * The relay channel is used in "no-overwrite" mode, it keeps trace of how + * many times we encountered a full subbuffer, to tell user space app the + * lost records there were. 
+ */ +static int kvm_subbuf_start_callback(struct rchan_buf *buf, void *subbuf, + void *prev_subbuf, size_t prev_padding) +{ + struct kvm_trace *kt; + + if (!relay_buf_full(buf)) + return 1; + + kt = buf->chan->private_data; + atomic_inc(&kt->lost_records); + + return 0; +} + +static struct dentry *kvm_create_buf_file_callack(const char *filename, + struct dentry *parent, + int mode, + struct rchan_buf *buf, + int *is_global) +{ + return debugfs_create_file(filename, mode, parent, buf, + &relay_file_operations); +} + +static int kvm_remove_buf_file_callback(struct dentry *dentry) +{ + debugfs_remove(dentry); + return 0; +} + +static struct rchan_callbacks kvm_relay_callbacks = { + .subbuf_start = kvm_subbuf_start_callback, + .create_buf_file = kvm_create_buf_file_callack, + .remove_buf_file = kvm_remove_buf_file_callback, +}; + +static int do_kvm_trace_enable(struct kvm_user_trace_setup *kuts) +{ + struct kvm_trace *kt; + int i, r = -ENOMEM; + + if (!kuts->buf_size || !kuts->buf_nr) + return -EINVAL; + + kt = kzalloc(sizeof(*kt), GFP_KERNEL); + if (!kt) + goto err; + + r = -EIO; + atomic_set(&kt->lost_records, 0); + kt->lost_file = debugfs_create_file("lost_records", 0444, debugfs_dir, + kt, &kvm_trace_lost_ops); + if (!kt->lost_file) + goto err; + + kt->rchan = relay_open("trace", debugfs_dir, kuts->buf_size, + kuts->buf_nr, &kvm_relay_callbacks, kt); + if (!kt->rchan) + goto err; + + kvm_trace = kt; + + for (i = 0; i < ARRAY_SIZE(kvm_trace_probes); i++) { + struct kvm_trace_probe *p = &kvm_trace_probes[i]; + + r = marker_probe_register(p->name, p->format, p->probe_func, p); + if (r) + printk(KERN_INFO "Unable to register probe %s\n", + p->name); + } + + kvm_trace->trace_state = KVM_TRACE_STATE_RUNNING; + + return 0; +err: + if (kt) { + if (kt->lost_file) + debugfs_remove(kt->lost_file); + if (kt->rchan) + relay_close(kt->rchan); + kfree(kt); + } + return r; +} + +static int kvm_trace_enable(char __user *arg) +{ + struct kvm_user_trace_setup kuts; + int ret; + + ret = copy_from_user(&kuts, arg, sizeof(kuts)); + if (ret) + return -EFAULT; + + ret = do_kvm_trace_enable(&kuts); + if (ret) + return ret; + + return 0; +} + +static int kvm_trace_pause(void) +{ + struct kvm_trace *kt = kvm_trace; + int r = -EINVAL; + + if (kt == NULL) + return r; + + if (kt->trace_state == KVM_TRACE_STATE_RUNNING) { + kt->trace_state = KVM_TRACE_STATE_PAUSE; + relay_flush(kt->rchan); + r = 0; + } + + return r; +} + +void kvm_trace_cleanup(void) +{ + struct kvm_trace *kt = kvm_trace; + int i; + + if (kt == NULL) + return; + + if (kt->trace_state == KVM_TRACE_STATE_RUNNING || + kt->trace_state == KVM_TRACE_STATE_PAUSE) { + + kt->trace_state = KVM_TRACE_STATE_CLEARUP; + + for (i = 0; i < ARRAY_SIZE(kvm_trace_probes); i++) { + struct kvm_trace_probe *p = &kvm_trace_probes[i]; + marker_probe_unregister(p->name, p->probe_func, p); + } + + relay_close(kt->rchan); + debugfs_remove(kt->lost_file); + kfree(kt); + } +} + +int kvm_trace_ioctl(unsigned int ioctl, unsigned long arg) +{ + void __user *argp = (void __user *)arg; + long r = -EINVAL; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + switch (ioctl) { + case KVM_TRACE_ENABLE: + r = kvm_trace_enable(argp); + break; + case KVM_TRACE_PAUSE: + r = kvm_trace_pause(); + break; + case KVM_TRACE_DISABLE: + r = 0; + kvm_trace_cleanup(); + break; + } + + return r; +} -- 1.5.5 |
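A minimal userspace sketch (in C) of driving the new ioctls, assuming this series is applied and the caller has CAP_SYS_ADMIN (kvm_trace_ioctl() requires it). The buffer sizes below are arbitrary; the per-cpu relay files ("trace0", "trace1", ...) and the lost_records file are created under the kvm debugfs directory, typically /sys/kernel/debug/kvm when debugfs is mounted in the usual place.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	struct kvm_user_trace_setup kuts = {
		.buf_size = 4096,	/* bytes per relay sub-buffer */
		.buf_nr   = 64,		/* sub-buffers per cpu */
	};
	int kvm_fd = open("/dev/kvm", O_RDWR);

	if (kvm_fd < 0 || ioctl(kvm_fd, KVM_TRACE_ENABLE, &kuts) < 0) {
		perror("KVM_TRACE_ENABLE");
		return 1;
	}

	/* Trace records now stream into the relay files; read them from
	 * another process, then tear the channel down. */
	ioctl(kvm_fd, KVM_TRACE_DISABLE);
	return 0;
}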
From: Avi K. <av...@qu...> - 2008-04-21 10:30:09
|
From: Joerg Roedel <joe...@am...> When KVM uses NPT there is no reason to intercept task switches. This patch removes the intercept for it in that case. Signed-off-by: Joerg Roedel <joe...@am...> Signed-off-by: Avi Kivity <av...@qu...> --- arch/x86/kvm/svm.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index da3ddef..8d04aed 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -591,6 +591,7 @@ static void init_vmcb(struct vcpu_svm *svm) if (npt_enabled) { /* Setup VMCB for Nested Paging */ control->nested_ctl = 1; + control->intercept &= ~(1ULL << INTERCEPT_TASK_SWITCH); control->intercept_exceptions &= ~(1 << PF_VECTOR); control->intercept_cr_read &= ~(INTERCEPT_CR0_MASK| INTERCEPT_CR3_MASK); -- 1.5.5 |
From: Avi K. <av...@qu...> - 2008-04-21 10:30:09
|
From: Feng (Eric) Liu <eri...@in...> Trace markers allow userspace to trace execution of a virtual machine in order to monitor its performance. Signed-off-by: Feng (Eric) Liu <eri...@in...> Signed-off-by: Avi Kivity <av...@qu...> --- arch/x86/kvm/vmx.c | 35 ++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c | 26 +++++++++++++++++++++++ include/asm-x86/kvm.h | 20 ++++++++++++++++++ include/asm-x86/kvm_host.h | 19 +++++++++++++++++ include/linux/kvm.h | 49 +++++++++++++++++++++++++++++++++++++++++++- 5 files changed, 147 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6249810..8e5d664 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1843,6 +1843,8 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu, int irq) { struct vcpu_vmx *vmx = to_vmx(vcpu); + KVMTRACE_1D(INJ_VIRQ, vcpu, (u32)irq, handler); + if (vcpu->arch.rmode.active) { vmx->rmode.irq.pending = true; vmx->rmode.irq.vector = irq; @@ -1993,6 +1995,8 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) error_code = vmcs_read32(VM_EXIT_INTR_ERROR_CODE); if (is_page_fault(intr_info)) { cr2 = vmcs_readl(EXIT_QUALIFICATION); + KVMTRACE_3D(PAGE_FAULT, vcpu, error_code, (u32)cr2, + (u32)((u64)cr2 >> 32), handler); return kvm_mmu_page_fault(vcpu, cr2, error_code); } @@ -2021,6 +2025,7 @@ static int handle_external_interrupt(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) { ++vcpu->stat.irq_exits; + KVMTRACE_1D(INTR, vcpu, vmcs_read32(VM_EXIT_INTR_INFO), handler); return 1; } @@ -2078,6 +2083,8 @@ static int handle_cr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) reg = (exit_qualification >> 8) & 15; switch ((exit_qualification >> 4) & 3) { case 0: /* mov to cr */ + KVMTRACE_3D(CR_WRITE, vcpu, (u32)cr, (u32)vcpu->arch.regs[reg], + (u32)((u64)vcpu->arch.regs[reg] >> 32), handler); switch (cr) { case 0: vcpu_load_rsp_rip(vcpu); @@ -2110,6 +2117,7 @@ static int handle_cr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) vcpu->arch.cr0 &= ~X86_CR0_TS; vmcs_writel(CR0_READ_SHADOW, vcpu->arch.cr0); vmx_fpu_activate(vcpu); + KVMTRACE_0D(CLTS, vcpu, handler); skip_emulated_instruction(vcpu); return 1; case 1: /*mov from cr*/ @@ -2118,12 +2126,18 @@ static int handle_cr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) vcpu_load_rsp_rip(vcpu); vcpu->arch.regs[reg] = vcpu->arch.cr3; vcpu_put_rsp_rip(vcpu); + KVMTRACE_3D(CR_READ, vcpu, (u32)cr, + (u32)vcpu->arch.regs[reg], + (u32)((u64)vcpu->arch.regs[reg] >> 32), + handler); skip_emulated_instruction(vcpu); return 1; case 8: vcpu_load_rsp_rip(vcpu); vcpu->arch.regs[reg] = kvm_get_cr8(vcpu); vcpu_put_rsp_rip(vcpu); + KVMTRACE_2D(CR_READ, vcpu, (u32)cr, + (u32)vcpu->arch.regs[reg], handler); skip_emulated_instruction(vcpu); return 1; } @@ -2169,6 +2183,7 @@ static int handle_dr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) val = 0; } vcpu->arch.regs[reg] = val; + KVMTRACE_2D(DR_READ, vcpu, (u32)dr, (u32)val, handler); } else { /* mov to dr */ } @@ -2193,6 +2208,9 @@ static int handle_rdmsr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) return 1; } + KVMTRACE_3D(MSR_READ, vcpu, ecx, (u32)data, (u32)(data >> 32), + handler); + /* FIXME: handling of bits 32:63 of rax, rdx */ vcpu->arch.regs[VCPU_REGS_RAX] = data & -1u; vcpu->arch.regs[VCPU_REGS_RDX] = (data >> 32) & -1u; @@ -2206,6 +2224,9 @@ static int handle_wrmsr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) u64 data = (vcpu->arch.regs[VCPU_REGS_RAX] & -1u) | ((u64)(vcpu->arch.regs[VCPU_REGS_RDX] & -1u) << 32); + KVMTRACE_3D(MSR_WRITE, vcpu, ecx, (u32)data, (u32)(data >> 32), 
+ handler); + if (vmx_set_msr(vcpu, ecx, data) != 0) { kvm_inject_gp(vcpu, 0); return 1; @@ -2230,6 +2251,9 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu, cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL); cpu_based_vm_exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING; vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control); + + KVMTRACE_0D(PEND_INTR, vcpu, handler); + /* * If the user space waits to inject interrupts, exit as soon as * possible @@ -2272,6 +2296,8 @@ static int handle_apic_access(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) exit_qualification = vmcs_read64(EXIT_QUALIFICATION); offset = exit_qualification & 0xffful; + KVMTRACE_1D(APIC_ACCESS, vcpu, (u32)offset, handler); + er = emulate_instruction(vcpu, kvm_run, 0, 0, 0); if (er != EMULATE_DONE) { @@ -2335,6 +2361,9 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) struct vcpu_vmx *vmx = to_vmx(vcpu); u32 vectoring_info = vmx->idt_vectoring_info; + KVMTRACE_3D(VMEXIT, vcpu, exit_reason, (u32)vmcs_readl(GUEST_RIP), + (u32)((u64)vmcs_readl(GUEST_RIP) >> 32), entryexit); + if (unlikely(vmx->fail)) { kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY; kvm_run->fail_entry.hardware_entry_failure_reason @@ -2416,6 +2445,8 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu) return; } + KVMTRACE_1D(REDELIVER_EVT, vcpu, idtv_info_field, handler); + vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, idtv_info_field); vmcs_write32(VM_ENTRY_INSTRUCTION_LEN, vmcs_read32(VM_EXIT_INSTRUCTION_LEN)); @@ -2601,8 +2632,10 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) intr_info = vmcs_read32(VM_EXIT_INTR_INFO); /* We need to handle NMIs before interrupts are enabled */ - if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == 0x200) /* nmi */ + if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == 0x200) { /* nmi */ + KVMTRACE_0D(NMI, vcpu, handler); asm("int $2"); + } } static void vmx_free_vmcs(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c7ad235..f070f0a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -303,6 +303,9 @@ EXPORT_SYMBOL_GPL(kvm_set_cr0); void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw) { kvm_set_cr0(vcpu, (vcpu->arch.cr0 & ~0x0ful) | (msw & 0x0f)); + KVMTRACE_1D(LMSW, vcpu, + (u32)((vcpu->arch.cr0 & ~0x0ful) | (msw & 0x0f)), + handler); } EXPORT_SYMBOL_GPL(kvm_lmsw); @@ -2269,6 +2272,13 @@ int kvm_emulate_pio(struct kvm_vcpu *vcpu, struct kvm_run *run, int in, vcpu->arch.pio.guest_page_offset = 0; vcpu->arch.pio.rep = 0; + if (vcpu->run->io.direction == KVM_EXIT_IO_IN) + KVMTRACE_2D(IO_READ, vcpu, vcpu->run->io.port, (u32)size, + handler); + else + KVMTRACE_2D(IO_WRITE, vcpu, vcpu->run->io.port, (u32)size, + handler); + kvm_x86_ops->cache_regs(vcpu); memcpy(vcpu->arch.pio_data, &vcpu->arch.regs[VCPU_REGS_RAX], 4); kvm_x86_ops->decache_regs(vcpu); @@ -2307,6 +2317,13 @@ int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, struct kvm_run *run, int in, vcpu->arch.pio.guest_page_offset = offset_in_page(address); vcpu->arch.pio.rep = rep; + if (vcpu->run->io.direction == KVM_EXIT_IO_IN) + KVMTRACE_2D(IO_READ, vcpu, vcpu->run->io.port, (u32)size, + handler); + else + KVMTRACE_2D(IO_WRITE, vcpu, vcpu->run->io.port, (u32)size, + handler); + if (!count) { kvm_x86_ops->skip_emulated_instruction(vcpu); return 1; @@ -2414,6 +2431,7 @@ void kvm_arch_exit(void) int kvm_emulate_halt(struct kvm_vcpu *vcpu) { ++vcpu->stat.halt_exits; + KVMTRACE_0D(HLT, vcpu, handler); if (irqchip_in_kernel(vcpu->kvm)) { vcpu->arch.mp_state = 
VCPU_MP_STATE_HALTED; up_read(&vcpu->kvm->slots_lock); @@ -2451,6 +2469,8 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) a2 = vcpu->arch.regs[VCPU_REGS_RDX]; a3 = vcpu->arch.regs[VCPU_REGS_RSI]; + KVMTRACE_1D(VMMCALL, vcpu, (u32)nr, handler); + if (!is_long_mode(vcpu)) { nr &= 0xFFFFFFFF; a0 &= 0xFFFFFFFF; @@ -2639,6 +2659,11 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu) } kvm_x86_ops->decache_regs(vcpu); kvm_x86_ops->skip_emulated_instruction(vcpu); + KVMTRACE_5D(CPUID, vcpu, function, + (u32)vcpu->arch.regs[VCPU_REGS_RAX], + (u32)vcpu->arch.regs[VCPU_REGS_RBX], + (u32)vcpu->arch.regs[VCPU_REGS_RCX], + (u32)vcpu->arch.regs[VCPU_REGS_RDX], handler); } EXPORT_SYMBOL_GPL(kvm_emulate_cpuid); @@ -2794,6 +2819,7 @@ again: if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests)) kvm_x86_ops->tlb_flush(vcpu); + KVMTRACE_0D(VMENTRY, vcpu, entryexit); kvm_x86_ops->run(vcpu, kvm_run); vcpu->guest_mode = 0; diff --git a/include/asm-x86/kvm.h b/include/asm-x86/kvm.h index 12b4b25..80eefef 100644 --- a/include/asm-x86/kvm.h +++ b/include/asm-x86/kvm.h @@ -209,4 +209,24 @@ struct kvm_pit_state { struct kvm_pit_channel_state channels[3]; }; +#define KVM_TRC_INJ_VIRQ (KVM_TRC_HANDLER + 0x02) +#define KVM_TRC_REDELIVER_EVT (KVM_TRC_HANDLER + 0x03) +#define KVM_TRC_PEND_INTR (KVM_TRC_HANDLER + 0x04) +#define KVM_TRC_IO_READ (KVM_TRC_HANDLER + 0x05) +#define KVM_TRC_IO_WRITE (KVM_TRC_HANDLER + 0x06) +#define KVM_TRC_CR_READ (KVM_TRC_HANDLER + 0x07) +#define KVM_TRC_CR_WRITE (KVM_TRC_HANDLER + 0x08) +#define KVM_TRC_DR_READ (KVM_TRC_HANDLER + 0x09) +#define KVM_TRC_DR_WRITE (KVM_TRC_HANDLER + 0x0A) +#define KVM_TRC_MSR_READ (KVM_TRC_HANDLER + 0x0B) +#define KVM_TRC_MSR_WRITE (KVM_TRC_HANDLER + 0x0C) +#define KVM_TRC_CPUID (KVM_TRC_HANDLER + 0x0D) +#define KVM_TRC_INTR (KVM_TRC_HANDLER + 0x0E) +#define KVM_TRC_NMI (KVM_TRC_HANDLER + 0x0F) +#define KVM_TRC_VMMCALL (KVM_TRC_HANDLER + 0x10) +#define KVM_TRC_HLT (KVM_TRC_HANDLER + 0x11) +#define KVM_TRC_CLTS (KVM_TRC_HANDLER + 0x12) +#define KVM_TRC_LMSW (KVM_TRC_HANDLER + 0x13) +#define KVM_TRC_APIC_ACCESS (KVM_TRC_HANDLER + 0x14) + #endif diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h index 2861178..15169cb 100644 --- a/include/asm-x86/kvm_host.h +++ b/include/asm-x86/kvm_host.h @@ -667,4 +667,23 @@ enum { TASK_SWITCH_GATE = 3, }; +#define KVMTRACE_5D(evt, vcpu, d1, d2, d3, d4, d5, name) \ + trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ + vcpu, 5, d1, d2, d3, d4, d5) +#define KVMTRACE_4D(evt, vcpu, d1, d2, d3, d4, name) \ + trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ + vcpu, 4, d1, d2, d3, d4, 0) +#define KVMTRACE_3D(evt, vcpu, d1, d2, d3, name) \ + trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ + vcpu, 3, d1, d2, d3, 0, 0) +#define KVMTRACE_2D(evt, vcpu, d1, d2, name) \ + trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ + vcpu, 2, d1, d2, 0, 0, 0) +#define KVMTRACE_1D(evt, vcpu, d1, name) \ + trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ + vcpu, 1, d1, 0, 0, 0, 0) +#define KVMTRACE_0D(evt, vcpu, name) \ + trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ + vcpu, 0, 0, 0, 0, 0, 0) + #endif diff --git a/include/linux/kvm.h b/include/linux/kvm.h index f04bb42..d302d63 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -14,6 +14,12 @@ #define KVM_API_VERSION 12 +/* for KVM_TRACE_ENABLE */ +struct kvm_user_trace_setup { + __u32 buf_size; /* sub_buffer size of each 
per-cpu */ + __u32 buf_nr; /* the number of sub_buffers of each per-cpu */ +}; + /* for KVM_CREATE_MEMORY_REGION */ struct kvm_memory_region { __u32 slot; @@ -242,6 +248,42 @@ struct kvm_s390_interrupt { __u64 parm64; }; +#define KVM_TRC_SHIFT 16 +/* + * kvm trace categories + */ +#define KVM_TRC_ENTRYEXIT (1 << KVM_TRC_SHIFT) +#define KVM_TRC_HANDLER (1 << (KVM_TRC_SHIFT + 1)) /* only 12 bits */ + +/* + * kvm trace action + */ +#define KVM_TRC_VMENTRY (KVM_TRC_ENTRYEXIT + 0x01) +#define KVM_TRC_VMEXIT (KVM_TRC_ENTRYEXIT + 0x02) +#define KVM_TRC_PAGE_FAULT (KVM_TRC_HANDLER + 0x01) + +#define KVM_TRC_HEAD_SIZE 12 +#define KVM_TRC_CYCLE_SIZE 8 +#define KVM_TRC_EXTRA_MAX 7 + +/* This structure represents a single trace buffer record. */ +struct kvm_trace_rec { + __u32 event:28; + __u32 extra_u32:3; + __u32 cycle_in:1; + __u32 pid; + __u32 vcpu_id; + union { + struct { + __u32 cycle_lo, cycle_hi; + __u32 extra_u32[KVM_TRC_EXTRA_MAX]; + } cycle; + struct { + __u32 extra_u32[KVM_TRC_EXTRA_MAX]; + } nocycle; + } u; +}; + #define KVMIO 0xAE /* @@ -262,7 +304,12 @@ struct kvm_s390_interrupt { */ #define KVM_GET_VCPU_MMAP_SIZE _IO(KVMIO, 0x04) /* in bytes */ #define KVM_GET_SUPPORTED_CPUID _IOWR(KVMIO, 0x05, struct kvm_cpuid2) - +/* + * ioctls for kvm trace + */ +#define KVM_TRACE_ENABLE _IOW(KVMIO, 0x06, struct kvm_user_trace_setup) +#define KVM_TRACE_PAUSE _IO(KVMIO, 0x07) +#define KVM_TRACE_DISABLE _IO(KVMIO, 0x08) /* * Extension capability list. */ -- 1.5.5 |
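Because struct kvm_trace_rec is variable length (the 12-byte header is optionally followed by the cycle counter and by up to KVM_TRC_EXTRA_MAX extra words, matching calc_rec_size() in the tracing patch earlier in this series), a consumer has to compute each record's size to walk the relay stream. A sketch (in C) of such a decoder follows; it assumes it runs on the same architecture and endianness as the kernel that produced the trace, and parse_records() is an illustrative helper, not an existing tool.

#include <stdio.h>
#include <string.h>
#include <linux/kvm.h>

/* Walk a buffer of raw bytes read from one relay file and print the fixed
 * header of every complete record it contains. */
static size_t parse_records(const char *buf, size_t len)
{
	size_t off = 0, n = 0;

	while (off + KVM_TRC_HEAD_SIZE <= len) {
		struct kvm_trace_rec rec = { 0 };
		size_t size;

		/* First KVM_TRC_HEAD_SIZE bytes: event/extra_u32/cycle_in,
		 * pid and vcpu_id. */
		memcpy(&rec, buf + off, KVM_TRC_HEAD_SIZE);
		size = KVM_TRC_HEAD_SIZE
		       + (rec.cycle_in ? KVM_TRC_CYCLE_SIZE : 0)
		       + rec.extra_u32 * sizeof(__u32);
		if (off + size > len)
			break;	/* incomplete record at the tail */

		printf("event 0x%x pid %u vcpu %u, %u extra word(s)\n",
		       (unsigned)rec.event, (unsigned)rec.pid,
		       (unsigned)rec.vcpu_id, (unsigned)rec.extra_u32);
		off += size;
		n++;
	}
	return n;
}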