From: Avi K. <av...@qu...> - 2007-08-27 14:50:21
Hollis Blanchard wrote:
> What we could do is have the vcpu contain only the mappings needed to
> cover the host kernel. Then if the host decides to schedule out the
> guest, we restore the rest of the host TLB in the "heavyweight" exit
> path.

Ah, so you also have a lightweight/heavyweight exit distinction. I've been wanting to make that code generic (move most of *_vcpu_run() into kvm_main.c) and it seems that will be beneficial to ppc as well.

--
Any sufficiently difficult bug is indistinguishable from a feature.
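A minimal sketch of the split Avi describes: a generic run loop in kvm_main.c with per-architecture hooks. The hook names here (kvm_arch_*) are illustrative assumptions, not the interface that was actually merged.

    /* Hypothetical generic run loop; arch code supplies the hooks. */
    int kvm_vcpu_run(struct kvm_run *run, struct kvm_vcpu *vcpu)
    {
        int r;

        for (;;) {
            kvm_arch_vcpu_load(vcpu);       /* lightweight entry: load guest state */
            r = kvm_arch_run_guest(run, vcpu);  /* enter guest, return exit reason */
            if (r <= 0)
                break;                      /* I/O or signal: return to userspace */
            if (need_resched()) {
                kvm_arch_vcpu_put(vcpu);    /* heavyweight: restore full host state */
                schedule();
            }
        }
        kvm_arch_vcpu_put(vcpu);
        return r;
    }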
From: Hollis B. <ho...@us...> - 2007-08-27 14:45:50
On Mon, 2007-08-27 at 13:34 +0200, Christian Ehrhardt wrote:
> *resent because I forgot to use reply-all*
>
> Hollis Blanchard wrote:
> > ...
> > +struct kvm_vcpu {
> > +    /* This is an unmodified copy of the guest's TLB. */
> > +    struct tlbe guest_tlb[PPC44x_TLB_SIZE];
> > +    /* This is the TLB that's actually used when the guest is running. */
> > +    struct tlbe shadow_tlb[PPC44x_TLB_SIZE];
> > +    /* This is a copy of the host's TLB. */
> > +    struct tlbe host_tlb[PPC44x_TLB_SIZE];
> > +
> > +    u32 host_stack;
> > ...
>
> I understood from your mail before that you need all the vcpu content
> together with the exception handlers in one 64k mapping.
> I just roughly estimate sizes here - 3xTLB = 3kb + the rest of the u32 vcpu
> values (33+16+32)x4byte = 324b and the rest of the u64 values 32x8byte =
> 256b. So all together one vcpu struct has ~3.5kb.
>
> It looks strange to me that host_tlb is part of the guest vcpu structure,
> since the host_tlb exists only once. Can you think of any
> multiprocessor/multiguest scenario where we really need one per guest vcpu?
> Should the design you posted be
> multiprocessor/multiguest/multiguestcpu aware at all?
>
> Maybe it would be possible to put the host_tlb array outside of the vcpu
> struct as a single instance (still in your 64k mapping) to
> a) have it only as often as it exists in hardware and
> b) save space in the 64k mapping, reducing the per-vcpu size by nearly 1/3.

The trouble is that when an interrupt occurs while running the guest, the KVM exception handler must restore the host TLB before calling out to the host handler.

What we could do is have the vcpu contain only the mappings needed to cover the host kernel. Then if the host decides to schedule out the guest, we restore the rest of the host TLB in the "heavyweight" exit path.

Of course, we're not sure exactly how many entries the host kernel is using: it depends on the amount of physical RAM. Also, we're adding complexity in where to find the host TLB state. Since we've got 64KB to work with for our vcpu, I'm not sure it's really worth the effort.

--
Hollis Blanchard
IBM Linux Technology Center
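In C terms, the two-phase restore would look roughly like this. A sketch only: NR_KERNEL_BOLTED and write_tlb_entry() are invented names for illustration, and as noted above the bolted-entry count is not actually fixed.

    /* Lightweight exit: restore only the entries covering the host
     * kernel, so the exception handler can call into it. */
    static void restore_host_kernel_tlb(struct kvm_vcpu *vcpu)
    {
        int i;

        for (i = 0; i < NR_KERNEL_BOLTED; i++)
            write_tlb_entry(i, &vcpu->host_tlb[i]);
    }

    /* Heavyweight exit: the guest is being scheduled out, so put back
     * everything else. */
    static void restore_host_tlb_full(struct kvm_vcpu *vcpu)
    {
        int i;

        for (i = NR_KERNEL_BOLTED; i < PPC44x_TLB_SIZE; i++)
            write_tlb_entry(i, &vcpu->host_tlb[i]);
    }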
From: Christian E. <ehr...@li...> - 2007-08-27 11:37:21
*resent because I forgot to use reply-all*

Hollis Blanchard wrote:
> ...
> +struct kvm_vcpu {
> +    /* This is an unmodified copy of the guest's TLB. */
> +    struct tlbe guest_tlb[PPC44x_TLB_SIZE];
> +    /* This is the TLB that's actually used when the guest is running. */
> +    struct tlbe shadow_tlb[PPC44x_TLB_SIZE];
> +    /* This is a copy of the host's TLB. */
> +    struct tlbe host_tlb[PPC44x_TLB_SIZE];
> +
> +    u32 host_stack;
> ...

I understood from your mail before that you need all the vcpu content together with the exception handlers in one 64k mapping.

I just roughly estimate sizes here - 3xTLB = 3kb + the rest of the u32 vcpu values (33+16+32)x4byte = 324b and the rest of the u64 values 32x8byte = 256b. So all together one vcpu struct has ~3.5kb.

It looks strange to me that host_tlb is part of the guest vcpu structure, since the host_tlb exists only once. Can you think of any multiprocessor/multiguest scenario where we really need one per guest vcpu? Should the design you posted be multiprocessor/multiguest/multiguestcpu aware at all?

Maybe it would be possible to put the host_tlb array outside of the vcpu struct as a single instance (still in your 64k mapping) to
a) have it only as often as it exists in hardware and
b) save space in the 64k mapping, reducing the per-vcpu size by nearly 1/3.

--
Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
+49 7031/16-3385
Ehr...@li...
Ehr...@de...

IBM Deutschland Entwicklung GmbH
Chairman of the Supervisory Board: Johann Weihen
Management: Herbert Kircher
Registered office: Böblingen
Registration court: Amtsgericht Stuttgart, HRB 243294
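Christian's arithmetic, spelled out (assuming the 440's 64-entry TLB and the 16-byte struct tlbe from the posted patch):

    #include <stdio.h>

    int main(void)
    {
        int tlbs = 3 * 64 * 16;         /* guest_tlb + shadow_tlb + host_tlb = 3072 */
        int u32s = (33 + 16 + 32) * 4;  /* scalar u32 fields + ivor[16] + gpr[32] = 324 */
        int u64s = 32 * 8;              /* fpr[32] = 256 */

        printf("%d bytes per vcpu\n", tlbs + u32s + u64s);  /* 3652, i.e. ~3.5KB */
        return 0;
    }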
From: Jimi X. <ji...@po...> - 2007-08-26 19:25:13
On Aug 24, 2007, at 6:42 PM, Hollis Blanchard wrote:
>
> There are plenty of gaping holes here, but all comments are welcome,
> especially if you think something is working by accident...

Comments inline.

> diff --git a/drivers/kvm/powerpc/emulate.c b/drivers/kvm/powerpc/emulate.c
> new file mode 100644
> --- /dev/null
> +++ b/drivers/kvm/powerpc/emulate.c
> @@ -0,0 +1,294 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> + *
> + * Copyright IBM Corp. 2007
> + *
> + * Authors: Hollis Blanchard <ho...@us...>
> + */
> +
> +#include <linux/types.h>
> +
> +#include "kvm.h"
> +
> +#define NR_OPCODES (1<<6)
> +
> +/* XXX move me? */
> +static inline unsigned int get_tlb_pgsize(u32 tlbword0)
> +{
> +    return (tlbword0 >> 4) & 0xf;
> +}
> +
> +/* XXX move me? */
> +static inline unsigned int get_tlb_epn(u32 tlbword0)
> +{
> +    return tlbword0 >> 10;
> +}
> +
> +/* XXX move me? */
> +static inline unsigned int get_mmucr_stid(u32 mmucr)
> +{
> +    return mmucr & 0xff;
> +}
> +
> +/* XXX move me? */
> +static inline unsigned int get_mmucr_sts(u32 mmucr)
> +{
> +    return (mmucr >> 16) & 0x1;
> +}
> +
> +/* XXX move me? */
> +static inline unsigned int get_tlb_v(u32 tlbword0)
> +{
> +    return (tlbword0 >> 9) & 0x1;
> +}
> +
> +/* XXX move me? */
> +static inline unsigned int get_tlb_ts(u32 tlbword0)
> +{
> +    return (tlbword0 >> 8) & 0x1;
> +}
> +
> +static inline unsigned int get_op(u32 inst)
> +{
> +    return (inst >> 26) & 0x1f;
> +}
> +
> +static inline unsigned int get_xop(u32 inst)
> +{
> +    return (inst >> 1) & 0x3ff;
> +}
> +
> +static inline unsigned int get_sprn(u32 inst)
> +{
> +    return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
> +}
> +
> +static inline unsigned int get_rt(u32 inst)
> +{
> +    return (inst >> 21) & 0x1f;
> +}
> +
> +static inline unsigned int get_rs(u32 inst)
> +{
> +    return (inst >> 21) & 0x1f;
> +}
> +
> +static inline unsigned int get_ra(u32 inst)
> +{
> +    return (inst >> 16) & 0x1f;
> +}
> +
> +static inline unsigned int get_rb(u32 inst)
> +{
> +    return (inst >> 11) & 0x1f;
> +}
> +
> +static inline unsigned int get_rc(u32 inst)
> +{
> +    return inst & 0x1;
> +}
> +
> +int emulate_instruction(struct kvm_vcpu *vcpu)
> +{
> +    u32 inst = vcpu->last_inst;
> +    int ra;
> +    int rb;
> +    int rc;
> +    int rs;
> +    int rt;
> +    int sprn;
> +    int i;
> +    int emulated = 1;
> +
> +    switch (get_op(inst)) {
> +    case 19:
> +        switch (get_xop(inst)) {
> +        case 50:                            /* rfi */
> +            vcpu->pc = vcpu->srr0;
> +            vcpu->msr = vcpu->srr1;
> +            break;
> +        default:
> +            emulated = 0;
> +            break;
> +        }
> +
> +    case 31:
> +        switch (get_xop(inst)) {
> +        case 83:                            /* mfmsr */
> +            rt = get_rt(inst);
> +            vcpu->gpr[rt] = vcpu->msr;
> +            break;
> +
> +        case 467:                           /* mtspr */
> +            sprn = get_sprn(inst);
> +            rs = get_rs(inst);
> +            switch (sprn) {
> +            case SPRN_SRR0:
> +                vcpu->srr0 = vcpu->gpr[rs]; break;
> +            case SPRN_SRR1:
> +                vcpu->srr1 = vcpu->gpr[rs]; break;
> +            case SPRN_MMUCR:
> +                vcpu->mmucr = vcpu->gpr[rs]; break;
> +            case SPRN_PID:
> +                vcpu->pid = vcpu->gpr[rs]; break;
> +            case SPRN_SPRG0:
> +                vcpu->sprg0 = vcpu->gpr[rs]; break;
> +            case SPRN_SPRG1:
> +                vcpu->sprg1 = vcpu->gpr[rs]; break;
> +            case SPRN_SPRG2:
> +                vcpu->sprg2 = vcpu->gpr[rs]; break;
> +            case SPRN_SPRG3:
> +                vcpu->sprg3 = vcpu->gpr[rs]; break;
> +            case SPRN_SPRG4:
> +                vcpu->sprg4 = vcpu->gpr[rs]; break;
> +            case SPRN_SPRG5:
> +                vcpu->sprg5 = vcpu->gpr[rs]; break;
> +            case SPRN_SPRG6:
> +                vcpu->sprg6 = vcpu->gpr[rs]; break;
> +            case SPRN_SPRG7:
> +                vcpu->sprg7 = vcpu->gpr[rs]; break;
> +
> +            default:
> +                printk("unknown spr %d\n", sprn);
> +                emulated = 0;
> +                break;
> +            }
> +            break;
> +
> +        case 339:                           /* mfspr */
> +            sprn = get_sprn(inst);
> +            rt = get_rt(inst);
> +            switch (sprn) {
> +            case SPRN_SRR0:
> +                vcpu->gpr[rt] = vcpu->srr0; break;
> +            case SPRN_SRR1:
> +                vcpu->gpr[rt] = vcpu->srr1; break;
> +            case SPRN_MMUCR:
> +                vcpu->gpr[rt] = vcpu->mmucr; break;
> +            case SPRN_PID:
> +                vcpu->gpr[rt] = vcpu->pid; break;
> +            case SPRN_SPRG0:
> +                vcpu->gpr[rt] = vcpu->sprg0; break;
> +            case SPRN_SPRG1:
> +                vcpu->gpr[rt] = vcpu->sprg1; break;
> +            case SPRN_SPRG2:
> +                vcpu->gpr[rt] = vcpu->sprg2; break;
> +            case SPRN_SPRG3:
> +                vcpu->gpr[rt] = vcpu->sprg3; break;
> +            case SPRN_SPRG4:
> +                vcpu->gpr[rt] = vcpu->sprg4; break;
> +            case SPRN_SPRG5:
> +                vcpu->gpr[rt] = vcpu->sprg5; break;
> +            case SPRN_SPRG6:
> +                vcpu->gpr[rt] = vcpu->sprg6; break;
> +            case SPRN_SPRG7:
> +                vcpu->gpr[rt] = vcpu->sprg7; break;
> +            }
> +            break;
> +
> +        case 946:                           /* tlbre */
> +            break;
> +
> +        case 978:                           /* tlbwe */
> +            printk("tlbwe\n");
> +            break;
> +
> +        case 914: {                         /* tlbsx */
> +            u32 ea;
> +
> +            rt = get_rt(inst);
> +            ra = get_ra(inst);
> +            rb = get_rb(inst);
> +            rc = get_rc(inst);
> +
> +            ea = rb;
> +            if (ra)
> +                ea += vcpu->gpr[ra];
> +
> +            /* XXX Replace loop with fancy data structures. */
> +            for (i = 0; i < PPC44x_TLB_SIZE; i++) {
> +                struct tlbe *tlbe = &vcpu->guest_tlb[i];
> +                unsigned int pgsize;
> +                unsigned int epn;
> +                unsigned int tid;
> +
> +                pgsize = get_tlb_pgsize(tlbe->word0);
> +                epn = get_tlb_epn(tlbe->word0);
> +                if (ea < epn)
> +                    continue;
> +                if (ea > (1 << 10 << (pgsize << 1)))
> +                    continue;
> +                tid = get_mmucr_stid(tlbe->mmucr);
> +                if (tid && (tid != get_mmucr_stid(vcpu->mmucr)))
> +                    continue;
> +                if (!get_tlb_v(tlbe->word0))
> +                    continue;
> +                if (get_tlb_ts(tlbe->word0) != get_mmucr_sts(vcpu->mmucr))
> +                    continue;
> +
> +                printk("match! %d\n", i);
> +                vcpu->gpr[rt] = i;
> +                /* XXX handle Rc */
> +            }
> +        }
> +            break;
> +
> +        case 566:                           /* tlbsync */
> +            break;
> +
> +        default:
> +            printk("unknown: op %d xop %d\n", get_op(inst),
> +                   get_xop(inst));
> +            emulated = 0;
> +            break;
> +        }
> +        break;
> +
> +    default:
> +        printk("unknown op %d\n", get_op(inst));
> +        emulated = 0;
> +        break;
> +    }
> +
> +    if (emulated)
> +        vcpu->pc += 4; /* Advance past emulated instruction. */
> +
> +    return emulated;
> +}
> diff --git a/drivers/kvm/powerpc/exceptions_44x.S b/drivers/kvm/powerpc/exceptions_44x.S
> new file mode 100644
> --- /dev/null
> +++ b/drivers/kvm/powerpc/exceptions_44x.S
> @@ -0,0 +1,524 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> + *
> + * Copyright IBM Corp. 2007
> + *
> + * Authors: Hollis Blanchard <ho...@us...>
> + */
> +
> +#include <asm/ppc_asm.h>
> +#include <asm/reg.h>
> +#include <asm/mmu-44x.h>
> +#include <asm/page.h>
> +
> +#include "kvm.h"
> +#include "kvm-offsets.h"
> +
> +#define KERNEL_ATTRIB \
> +    (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G)
> +#define PPC_PIN_LOG 28
> +
> +#define VCPU_GPR(n) (VCPU_GPRS + (n * 4))
> +
> +/* The host stack layout: */
> +#define HOST_R1         0 /* implicit */
> +#define HOST_LR         4
> +#define HOST_RUN        8
> +#define HOST_NV_GPRS    12
> +#define HOST_NV_GPR(n)  (HOST_NV_GPRS + ((n - 14) * 4))
> +#define HOST_STACK_SIZE (HOST_NV_GPR(31) + 4)
> +
> +.macro KVM_HANDLER ivor_nr
> +_GLOBAL(kvm_trampoline_handler_\ivor_nr)
> +    /* Get pointer to vcpu and record exit number. */
> +    mtspr   SPRN_SPRG0, r4
> +    mfspr   r4, SPRN_SPRG1
> +    stw     r5, VCPU_GPR(r5)(r4)
> +    li      r5, \ivor_nr
> +    /* This branch is fixed up at install time to jump to
> +     * kvm_trampoline_resume_host(). */

How do you find it? Do you assume it is at a fixed offset from the above symbol? Might be easier just to give it its own symbol, like "kvm_trampoline_fixup_\ivor_nr" or something, so you do not have to change the fixup code.

> +    b       .
> +.endm

Unless you are relocating the trampoline as well, I think you will have a distance problem, especially when (if?) kvm_trampoline_resume_host() becomes part of a loadable module -- something you may want to nail sooner rather than later, by saving the CTR off to r4, loading the tramp address into CTR, and then BCTR.

> +
> +_GLOBAL(kvm_trampoline_start)
> +KVM_HANDLER 0
> +KVM_HANDLER 1
> +KVM_HANDLER 2
> +KVM_HANDLER 3
> +KVM_HANDLER 4
> +KVM_HANDLER 5
> +KVM_HANDLER 6
> +KVM_HANDLER 7
> +KVM_HANDLER 8
> +KVM_HANDLER 9
> +KVM_HANDLER 10
> +KVM_HANDLER 11
> +KVM_HANDLER 12
> +KVM_HANDLER 13
> +KVM_HANDLER 14
> +KVM_HANDLER 15

Can you give the above numbers names, for those of us with crappy memory?

> +
> +/* Registers:
> + *  SPRG0: guest r4
> + *  r4: vcpu pointer
> + *  r5: KVM exit number
> + */
> +_GLOBAL(kvm_trampoline_resume_host)
> +    stw     r3, VCPU_GPR(r3)(r4)
> +    stw     r6, VCPU_GPR(r6)(r4)
> +    stw     r7, VCPU_GPR(r7)(r4)
> +    stw     r8, VCPU_GPR(r8)(r4)
> +    stw     r9, VCPU_GPR(r9)(r4)
> +    mfcr    r3
> +    stw     r3, VCPU_CR(r4)
> +
> +    cmpwi   r5, 6

Is "program" interrupt the only one where the instruction is interesting?

> +    bne     1f
> +    /* Program interrupts save off a copy of the faulting instruction. */
> +    mfspr   r9, SPRN_SRR0
> +    mfmsr   r8
> +    ori     r7, r8, MSR_DS

Just in case, I would stick isyncs after the mtmsrs; this will guarantee that the next instructions will use the new MSR context.

> +    mtmsr   r7

isync

> +    lwz     r9, 0(r9)
> +    mtmsr   r8

isync

> +    stw     r9, VCPU_LAST_INST(r4)

You may also want to scribble somewhere that VCPU_LAST_INST is actually relevant to this context, maybe by storing the IAR/PC so you can match them up?
Also, you will need to make sure that every inserted UX=1 TLB entry always has SR=1.

> +1:

Can you give this a meaningful label? Maybe "not_inst:" or something. IMNSHO, this goes for all non-macro jump points.

> +
> +    /* Reload all host TLB mappings. Unfortunely we must skip the

Here is a first "Unfortunately" :)

> +     * trampoline mapping here. */
> +    /* Future optimization: only reload host kernel mappings here, and do
> +     * the rest in heavyweight_exit. */

Hmm, can't you restore the host's bolted entries and IVPR here, and let the faults fly in on demand? If AS has not been compromised then you don't even have to invalidate the guest entries.

> +    lwz     r9, VCPU_TRAMPOLINE_TLBE(r4)
> +    mfspr   r8, SPRN_MMUCR          /* Save MMUCR. */
> +    addi    r3, r4, VCPU_HOST_TLB - 4
> +    li      r6, 0
> +1:
> +    cmpwi   cr0, r6, PPC44x_TLB_SIZE

Since you know you are going to do at least one iteration, you can save a branch by moving the compare/branch to the bottom of the loop. You could also use the CTR for this loop.

> +    cmpl    cr1, r6, r9
> +    beq     cr0, 3f                 /* Is this greater than the size of the TLB? */
> +    beq     cr1, 2f                 /* Is this the trampoline? */
> +    lwzu    r7, 4(r3)
> +    mtspr   SPRN_MMUCR, r7
> +    lwzu    r7, 4(r3)
> +    tlbwe   r7, r6, PPC44x_TLB_PAGEID
> +    lwzu    r7, 4(r3)
> +    tlbwe   r7, r6, PPC44x_TLB_XLAT
> +    lwzu    r7, 4(r3)
> +    tlbwe   r7, r6, PPC44x_TLB_ATTRIB
> +    addi    r6, r6, 1
> +    b       1b
> +2:
> +    addi    r3, r3, 16
> +    addi    r6, r6, 1
> +    b       1b
> +3:
> +
> +    /* We can jump into the host kernel now that it's completely mapped
> +     * (we still have a modified entry 0 though). */
> +    mfctr   r7
> +    LOAD_REG_ADDR(r6, resume_host_continued)
> +    stw     r7, VCPU_CTR(r4)
> +    mtctr   r6
> +    bctr
> +_GLOBAL(kvm_trampoline_resume_host_len)
> +    .long . - kvm_trampoline_resume_host

You could just define _GLOBAL(kvm_trampoline_resume_host_end) here, do the math later, and drop the .long math.

> +
> +_GLOBAL(kvm_trampoline_resume_guest)
> +    /* Load all shadow TLB mappings. For simplicity, this includes a reload
> +     * of the trampoline mapping. */
> +    mfspr   r8, SPRN_MMUCR          /* Save host MMUCR. */
> +    addi    r3, r4, VCPU_SHADOW_TLB - 4
> +    li      r6, 0
> +1:
> +    lwzu    r7, 4(r3)
> +    /* Set TID. (The other MMUCR bits are restored later.) */
> +    mtspr   SPRN_MMUCR, r7
> +    lwzu    r7, 4(r3)
> +    tlbwe   r7, r6, PPC44x_TLB_PAGEID
> +    lwzu    r7, 4(r3)
> +    tlbwe   r7, r6, PPC44x_TLB_XLAT
> +    lwzu    r7, 4(r3)
> +    tlbwe   r7, r6, PPC44x_TLB_ATTRIB
> +    addi    r6, r6, 1
> +    cmpwi   r6, PPC44x_TLB_SIZE
> +    blt     1b
> +    mtspr   SPRN_MMUCR, r9          /* Restore host MMUCR. */
> +
> +    /* Finish loading guest volatiles and jump to guest. */
> +    lwz     r3, VCPU_CTR(r4)
> +    mtctr   r3
> +    lwz     r3, VCPU_CR(r4)
> +    mtcr    r3
> +    lwz     r5, VCPU_GPR(r5)(r4)
> +    lwz     r6, VCPU_GPR(r6)(r4)
> +    lwz     r7, VCPU_GPR(r7)(r4)
> +    lwz     r8, VCPU_GPR(r8)(r4)
> +    lwz     r3, VCPU_PC(r4)
> +    mtsrr0  r3
> +    lwz     r3, VCPU_MSR(r4)
> +    mtsrr1  r3
> +    lwz     r3, VCPU_GPR(r3)(r4)
> +    lwz     r4, VCPU_GPR(r4)(r4)
> +    rfi
> +_GLOBAL(kvm_trampoline_resume_guest_len)
> +    .long . - kvm_trampoline_resume_guest
> +
> +_GLOBAL(kvm_trampoline_handler_len)
> +    .long kvm_trampoline_handler_1 - kvm_trampoline_handler_0

Where do these symbols come from?

> +
> +
> +/* Registers:
> + *  SPRG0: guest r4
> + *  r4: vcpu pointer
> + *  r5: KVM exit number
> + *  r8: MMUCR
> + *  r9: TLB entry # of the trampoline mapping
> + */
> +resume_host_continued:
> +    /* Switch back to the linear vcpu mapping. */
> +    lwz     r4, VCPU_LINEAR(r4)
> +
> +    /* We're done with the trampoline mapping now. */
> +    mulli   r3, r9, TLBE_BYTES
> +    add     r3, r3, r4
> +    lwz     r7, (VCPU_HOST_TLB + 0)(r3)
> +    mtspr   SPRN_MMUCR, r7
> +    lwz     r7, (VCPU_HOST_TLB + 4)(r3)
> +    tlbwe   r7, r6, PPC44x_TLB_PAGEID
> +    lwz     r7, (VCPU_HOST_TLB + 8)(r3)
> +    tlbwe   r7, r6, PPC44x_TLB_XLAT
> +    lwz     r7, (VCPU_HOST_TLB + 12)(r3)
> +    tlbwe   r7, r6, PPC44x_TLB_ATTRIB
> +    mtspr   SPRN_MMUCR, r8          /* Restore MMUCR. */
> +
> +    /* Save remaining volatile guest register state to vcpu. */
> +    stw     r0, VCPU_GPR(r0)(r4)
> +    stw     r1, VCPU_GPR(r1)(r4)
> +    stw     r2, VCPU_GPR(r2)(r4)
> +    stw     r9, VCPU_GPR(r9)(r4)
> +    stw     r10, VCPU_GPR(r10)(r4)
> +    stw     r11, VCPU_GPR(r11)(r4)
> +    stw     r12, VCPU_GPR(r12)(r4)
> +    stw     r13, VCPU_GPR(r13)(r4)
> +    mflr    r3
> +    stw     r3, VCPU_LR(r4)
> +    mfxer   r3
> +    stw     r3, VCPU_XER(r4)
> +    mfspr   r3, SPRN_SPRG0
> +    stw     r3, VCPU_GPR(r4)(r4)
> +    mfspr   r3, SPRN_SRR0
> +    stw     r3, VCPU_PC(r4)
> +    mfspr   r3, SPRN_SRR1
> +    stw     r3, VCPU_MSR(r4)
> +
> +    /* Program interrupts save the complete GPR state for emulation. */
> +    cmpwi   r5, 6
> +    bne     1f
> +    stw     r14, VCPU_GPR(r14)(r4)
> +    stw     r15, VCPU_GPR(r15)(r4)
> +    stw     r16, VCPU_GPR(r16)(r4)
> +    stw     r17, VCPU_GPR(r17)(r4)
> +    stw     r18, VCPU_GPR(r18)(r4)
> +    stw     r19, VCPU_GPR(r19)(r4)
> +    stw     r20, VCPU_GPR(r20)(r4)
> +    stw     r21, VCPU_GPR(r21)(r4)
> +    stw     r22, VCPU_GPR(r22)(r4)
> +    stw     r23, VCPU_GPR(r23)(r4)
> +    stw     r24, VCPU_GPR(r24)(r4)
> +    stw     r25, VCPU_GPR(r25)(r4)
> +    stw     r26, VCPU_GPR(r26)(r4)
> +    stw     r27, VCPU_GPR(r27)(r4)
> +    stw     r28, VCPU_GPR(r28)(r4)
> +    stw     r29, VCPU_GPR(r29)(r4)
> +    stw     r30, VCPU_GPR(r30)(r4)
> +    stw     r31, VCPU_GPR(r31)(r4)
> +1:
> +
> +    /* Restore host IVPR before re-enabling interrupts. We cheat and know
> +     * that Linux IVPR is always 0xc0000000. */
> +    lis     r3, 0xc000
> +    mtspr   SPRN_IVPR, r3
> +
> +    /* Switch to kernel stack and jump to handler. */
> +    LOAD_REG_ADDR(r3, kvm_handle_exit)
> +    lwz     r1, VCPU_HOST_STACK(r4)
> +    mtctr   r3
> +    lwz     r3, HOST_RUN(r1)
> +    stw     r14, VCPU_GPR(r14)(r4)
> +    mr      r14, r4                 /* Save vcpu pointer. */
> +    bctrl                           /* Call kvm_handle_exit(). */
> +    mr      r4, r14
> +    lwz     r14, VCPU_GPR(r14)(r4)
> +
> +    /* Program interrupts restore complete GPR state. */
> +    cmpwi   r3, 2
> +    bne     1f
> +    lwz     r14, VCPU_GPR(r14)(r4)
> +    lwz     r15, VCPU_GPR(r15)(r4)
> +    lwz     r16, VCPU_GPR(r16)(r4)
> +    lwz     r17, VCPU_GPR(r17)(r4)
> +    lwz     r18, VCPU_GPR(r18)(r4)
> +    lwz     r19, VCPU_GPR(r19)(r4)
> +    lwz     r20, VCPU_GPR(r20)(r4)
> +    lwz     r21, VCPU_GPR(r21)(r4)
> +    lwz     r22, VCPU_GPR(r22)(r4)
> +    lwz     r23, VCPU_GPR(r23)(r4)
> +    lwz     r24, VCPU_GPR(r24)(r4)
> +    lwz     r25, VCPU_GPR(r25)(r4)
> +    lwz     r26, VCPU_GPR(r26)(r4)
> +    lwz     r27, VCPU_GPR(r27)(r4)
> +    lwz     r28, VCPU_GPR(r28)(r4)
> +    lwz     r29, VCPU_GPR(r29)(r4)
> +    lwz     r30, VCPU_GPR(r30)(r4)
> +    lwz     r31, VCPU_GPR(r31)(r4)
> +1:
> +
> +    /* Should we return to the guest? */
> +    cmpwi   r3, 0
> +    bgt     lightweight_exit
> +
> +heavyweight_exit:
> +    /* Not returning to guest. */
> +
> +    /* We already saved guest volatile register state; now save the
> +     * non-volatiles. */
> +    stw     r14, VCPU_GPR(r14)(r4)
> +    stw     r15, VCPU_GPR(r15)(r4)
> +    stw     r16, VCPU_GPR(r16)(r4)
> +    stw     r17, VCPU_GPR(r17)(r4)
> +    stw     r18, VCPU_GPR(r18)(r4)
> +    stw     r19, VCPU_GPR(r19)(r4)
> +    stw     r20, VCPU_GPR(r20)(r4)
> +    stw     r21, VCPU_GPR(r21)(r4)
> +    stw     r22, VCPU_GPR(r22)(r4)
> +    stw     r23, VCPU_GPR(r23)(r4)
> +    stw     r24, VCPU_GPR(r24)(r4)
> +    stw     r25, VCPU_GPR(r25)(r4)
> +    stw     r26, VCPU_GPR(r26)(r4)
> +    stw     r27, VCPU_GPR(r27)(r4)
> +    stw     r28, VCPU_GPR(r28)(r4)
> +    stw     r29, VCPU_GPR(r29)(r4)
> +    stw     r30, VCPU_GPR(r30)(r4)
> +    stw     r31, VCPU_GPR(r31)(r4)
> +
> +    /* XXX all SPRs */
> +    /* IVLIM */
> +    /* DVLIM */
> +    /* ignore INV0=82=C4=ECINV3, ITV0-ITV3, DNV0-DNV3, DTV0-DTV3 since these only

Funky character

> +     * select the cache way */
> +
> +    /* Load host non-volatile register state from host stack. */
> +    lwz     r14, HOST_NV_GPR(r14)(r1)
> +    lwz     r15, HOST_NV_GPR(r15)(r1)
> +    lwz     r16, HOST_NV_GPR(r16)(r1)
> +    lwz     r17, HOST_NV_GPR(r17)(r1)
> +    lwz     r18, HOST_NV_GPR(r18)(r1)
> +    lwz     r19, HOST_NV_GPR(r19)(r1)
> +    lwz     r20, HOST_NV_GPR(r20)(r1)
> +    lwz     r21, HOST_NV_GPR(r21)(r1)
> +    lwz     r22, HOST_NV_GPR(r22)(r1)
> +    lwz     r23, HOST_NV_GPR(r23)(r1)
> +    lwz     r24, HOST_NV_GPR(r24)(r1)
> +    lwz     r25, HOST_NV_GPR(r25)(r1)
> +    lwz     r26, HOST_NV_GPR(r26)(r1)
> +    lwz     r27, HOST_NV_GPR(r27)(r1)
> +    lwz     r28, HOST_NV_GPR(r28)(r1)
> +    lwz     r29, HOST_NV_GPR(r29)(r1)
> +    lwz     r30, HOST_NV_GPR(r30)(r1)
> +    lwz     r31, HOST_NV_GPR(r31)(r1)
> +
> +    /* Return to kvm_vcpu_run(). */
> +    lwz     r4, HOST_LR(r1)
> +    addi    r1, r1, HOST_STACK_SIZE
> +    mtlr    r4
> +    /* r3 still contains the return code from kvm_handle_exit(). */
> +    blr
> +
> +
> +/* Registers:
> + *  r3: kvm_run pointer
> + *  r4: vcpu pointer
> + */
> +_GLOBAL(__vcpu_run)
> +    stwu    r1, -HOST_STACK_SIZE(r1)
> +    stw     r1, VCPU_HOST_STACK(r4) /* Save stack pointer to vcpu. */
> +
> +    /* Save host state to stack. */
> +    stw     r3, HOST_RUN(r1)
> +    mflr    r3
> +    stw     r3, HOST_LR(r1)
> +
> +    /* Save host non-volatile register state to stack. */
> +    stw     r14, HOST_NV_GPR(r14)(r1)
> +    stw     r15, HOST_NV_GPR(r15)(r1)
> +    stw     r16, HOST_NV_GPR(r16)(r1)
> +    stw     r17, HOST_NV_GPR(r17)(r1)
> +    stw     r18, HOST_NV_GPR(r18)(r1)
> +    stw     r19, HOST_NV_GPR(r19)(r1)
> +    stw     r20, HOST_NV_GPR(r20)(r1)
> +    stw     r21, HOST_NV_GPR(r21)(r1)
> +    stw     r22, HOST_NV_GPR(r22)(r1)
> +    stw     r23, HOST_NV_GPR(r23)(r1)
> +    stw     r24, HOST_NV_GPR(r24)(r1)
> +    stw     r25, HOST_NV_GPR(r25)(r1)
> +    stw     r26, HOST_NV_GPR(r26)(r1)
> +    stw     r27, HOST_NV_GPR(r27)(r1)
> +    stw     r28, HOST_NV_GPR(r28)(r1)
> +    stw     r29, HOST_NV_GPR(r29)(r1)
> +    stw     r30, HOST_NV_GPR(r30)(r1)
> +    stw     r31, HOST_NV_GPR(r31)(r1)
> +
> +    /* XXX all guest SPRS */
> +
> +    /* Load guest non-volatiles. */
> +    lwz     r14, VCPU_GPR(r14)(r4)
> +    lwz     r15, VCPU_GPR(r15)(r4)
> +    lwz     r16, VCPU_GPR(r16)(r4)
> +    lwz     r17, VCPU_GPR(r17)(r4)
> +    lwz     r18, VCPU_GPR(r18)(r4)
> +    lwz     r19, VCPU_GPR(r19)(r4)
> +    lwz     r20, VCPU_GPR(r20)(r4)
> +    lwz     r21, VCPU_GPR(r21)(r4)
> +    lwz     r22, VCPU_GPR(r22)(r4)
> +    lwz     r23, VCPU_GPR(r23)(r4)
> +    lwz     r24, VCPU_GPR(r24)(r4)
> +    lwz     r25, VCPU_GPR(r25)(r4)
> +    lwz     r26, VCPU_GPR(r26)(r4)
> +    lwz     r27, VCPU_GPR(r27)(r4)
> +    lwz     r28, VCPU_GPR(r28)(r4)
> +    lwz     r29, VCPU_GPR(r29)(r4)
> +    lwz     r30, VCPU_GPR(r30)(r4)
> +    lwz     r31, VCPU_GPR(r31)(r4)
> +
> +lightweight_exit:
> +    /* Load some guest volatiles. */
> +    lwz     r0, VCPU_GPR(r0)(r4)
> +    lwz     r1, VCPU_GPR(r1)(r4)
> +    lwz     r2, VCPU_GPR(r2)(r4)
> +    lwz     r9, VCPU_GPR(r9)(r4)
> +    lwz     r10, VCPU_GPR(r10)(r4)
> +    lwz     r11, VCPU_GPR(r11)(r4)
> +    lwz     r12, VCPU_GPR(r12)(r4)
> +    lwz     r13, VCPU_GPR(r13)(r4)
> +    lwz     r3, VCPU_LR(r4)
> +    mtlr    r3
> +    lwz     r3, VCPU_XER(r4)
> +    mtxer   r3
> +
> +    /* Prevent any TLB updates. */
> +    mfmsr   r5
> +    lis     r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@h
> +    ori     r6,r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@l
> +    andc    r6,r5,r6
> +    mtmsr   r6
> +
> +    /* Save all the host TLB mappings. */
> +    addi    r3, r4, VCPU_HOST_TLB - 4
> +    li      r6, 0
> +1:
> +    tlbre   r7, r6, PPC44x_TLB_PAGEID
> +    mfspr   r5, SPRN_MMUCR
> +    stwu    r5, 4(r3)
> +    stwu    r7, 4(r3)
> +    tlbre   r7, r6, PPC44x_TLB_XLAT
> +    stwu    r7, 4(r3)
> +    tlbre   r7, r6, PPC44x_TLB_ATTRIB
> +    stwu    r7, 4(r3)
> +    addi    r6, r6, 1
> +    cmpwi   r6, PPC44x_TLB_SIZE
> +    blt     1b
> +
> +    /* Create the trampoline mapping. */
> +    lwz     r6, VCPU_TRAMPOLINE_TLBE(r4)
> +    mulli   r3, r6, TLBE_BYTES
> +    add     r3, r3, r4
> +    lwz     r7, (VCPU_SHADOW_TLB + 0)(r3)
> +    mtspr   SPRN_MMUCR, r7
> +    lwz     r7, (VCPU_SHADOW_TLB + 4)(r3)
> +    tlbwe   r7, r6, PPC44x_TLB_PAGEID
> +    lwz     r7, (VCPU_SHADOW_TLB + 8)(r3)
> +    tlbwe   r7, r6, PPC44x_TLB_XLAT
> +    lwz     r7, (VCPU_SHADOW_TLB + 12)(r3)
> +    tlbwe   r7, r6, PPC44x_TLB_ATTRIB
> +
> +    /* Switch the IVPR to the trampoline. */
> +    lwz     r8, VCPU_TRAMPOLINE(r4)
> +    mtspr   SPRN_IVPR, r8
> +
> +    /* Transpose the vcpu pointer into the trampoline mapping. */
> +    rlwimi  r8, r4, 0, 32 - VCPU_SIZE_LOG, 31
> +    mr      r4, r8
> +    /* Save vcpu pointer for the exception handlers. */
> +    mtspr   SPRN_SPRG1, r4
> +
> +    /* Need absolute branch to reach kvm_trampoline_resume_guest() in the
> +     * trampoline. */
> +    lwz     r3, VCPU_RESUME_GUEST(r4)
> +    mtctr   r3
> +    bctr
> +
> +_GLOBAL(dummy_guest)
> +    nop
> +    mr      r31,r3
> +    mr      r30,r4
> +    mr      r29,r5
> +    mr      r28,r6
> +    mr      r27,r7
> +    li      r24,0                   /* CPU number */
> +    mfspr   r3,SPRN_PID             /* Get PID */
> +    mfmsr   r4                      /* Get MSR */
> +    andi.   r4,r4,MSR_IS@l          /* TS=1? */
> +    beq     wmmucr                  /* If not, leave STS=0 */
> +    oris    r3,r3,PPC44x_MMUCR_STS@h /* Set STS=1 */
> +wmmucr: mtspr SPRN_MMUCR,r3         /* Put MMUCR */
> +    sync
> +    bl      invstr                  /* Find our address */
> +invstr: mflr r5                     /* Make it accessible */
> +    tlbsx   r23,0,r5                /* Find entry we are in */
> +    li      r4,0                    /* Start at TLB entry 0 */
> +    li      r3,0                    /* Set PAGEID inval value */
> +1:  cmpw    r23,r4                  /* Is this our entry? */
> +    beq     skpinv                  /* If so, skip the inval */
> +    tlbwe   r3,r4,PPC44x_TLB_PAGEID /* If not, inval the entry */
> +skpinv: addi r4,r4,1                /* Increment */
> +    cmpwi   r4,64                   /* Are we done? */
> +    bne     1b                      /* If not, repeat */
> +    isync                           /* If so, context change */
> +    lis     r3,PAGE_OFFSET@h
> +    ori     r3,r3,PAGE_OFFSET@l
> +    li      r4, 0                   /* Load the kernel physical address */
> +    li      r0,0
> +    mtspr   SPRN_PID,r0
> +    sync
> +    li      r5,0
> +    mtspr   SPRN_MMUCR,r5
> +    sync
> +    clrrwi  r3,r3,10                /* Mask off the effective page number */
> +    ori     r3,r3,PPC44x_TLB_VALID | PPC44x_TLB_256M
> +    clrrwi  r4,r4,10                /* Mask off the real page number */
> +    li      r5,0
> +    ori     r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G)
> +    li      r0,63                   /* TLB slot 63 */
> +    tlbwe   r3,r0,PPC44x_TLB_PAGEID /* Load the pageid fields */
> +    tlbwe   r4,r0,PPC44x_TLB_XLAT   /* Load the translation fields */
> +    tlbwe   r5,r0,PPC44x_TLB_ATTRIB /* Load the attrib/access fields */
> +    mfmsr   r0
> +    mtspr   SPRN_SRR1, r0
> +    lis     r0,3f@h
> +    ori     r0,r0,3f@l
> +    mtspr   SPRN_SRR0,r0
> +    sync
> +    rfi
> +3:
> diff --git a/drivers/kvm/powerpc/hack.c b/drivers/kvm/powerpc/hack.c
> new file mode 100644
> --- /dev/null
> +++ b/drivers/kvm/powerpc/hack.c
> @@ -0,0 +1,228 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> + *
> + * Copyright IBM Corp. 2007
> + *
> + * Authors: Hollis Blanchard <ho...@us...>
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/types.h>
> +#include <linux/debugfs.h>
> +#include <linux/vmalloc.h>
> +#include <asm/uaccess.h>
> +#include <asm/cache.h>
> +#include <asm/cacheflush.h>
> +
> +#include "kvm.h"
> +
> +static struct kvm_vcpu *alloc_vcpu(void)
> +{
> +    unsigned long ivorlist[16];
> +    struct kvm_vcpu *vcpu;
> +    unsigned long base;
> +    void *handlers;
> +    void *resume_host;
> +    void *resume_guest;
> +    struct vm_struct *area;
> +    int i;
> +
> +    /* IVPR must be 16-bit aligned, so we need a 64KB allocation. This
> +     * must be physically contiguous so that a single TLB entry maps the
> +     * whole thing. */
> +    base = __get_free_pages(GFP_KERNEL, VCPU_SIZE_ORDER);
> +    printk("base: %lx\n", base);
> +    if (!base)
> +        return NULL;
> +
> +    /* Our trampoline cannot be mapped by the kernel linear mapping,
> +     * because
> +     * a) for performance and simplicity we create a mapping for it in TLB
> +     *    entry 0, and
> +     * b) when swapping the TLB, it cannot be mapped by two entries
> +     *    simultaneously.
> +     *
> +     * Further, by reserving a virtual address area from all other uses, we
> +     * can avoid frequent icache flushes. We manually handle the mapping
> +     * instead of letting the kernel do it to avoid the simultaneous
> +     * mapping issue.
> +     *
> +     * We must use VM_IOREMAP to ensure we get an area with the required
> +     * alignment.
> +     */
> +    area = get_vm_area(VCPU_SIZE_BYTES, VM_IOREMAP);
> +    printk("trampoline: %p\n", area->addr);
> +    if (!area) {
> +        free_pages(base, VCPU_SIZE_ORDER);
> +        return NULL;
> +    }
> +
> +    handlers = (void *)base;
> +    clear_pages(handlers, VCPU_SIZE_ORDER);
> +
> +    /* XXX make sure our handlers are smaller than Linux's */
> +    /* XXX do we need to check ordering? IVOR15 is always greatest? */
> +
> +    /* Copy our interrupt handlers to match host IVORs. That way we don't
> +     * have to swap the IVORs on every guest/host transition. */
> +    ivorlist[0] = mfspr(SPRN_IVOR0);
> +    ivorlist[1] = mfspr(SPRN_IVOR1);
> +    ivorlist[2] = mfspr(SPRN_IVOR2);
> +    ivorlist[3] = mfspr(SPRN_IVOR3);
> +    ivorlist[4] = mfspr(SPRN_IVOR4);
> +    ivorlist[5] = mfspr(SPRN_IVOR5);
> +    ivorlist[6] = mfspr(SPRN_IVOR6);
> +    ivorlist[7] = mfspr(SPRN_IVOR7);
> +    ivorlist[8] = mfspr(SPRN_IVOR8);
> +    ivorlist[9] = mfspr(SPRN_IVOR9);
> +    ivorlist[10] = mfspr(SPRN_IVOR10);
> +    ivorlist[11] = mfspr(SPRN_IVOR11);
> +    ivorlist[12] = mfspr(SPRN_IVOR12);
> +    ivorlist[13] = mfspr(SPRN_IVOR13);
> +    ivorlist[14] = mfspr(SPRN_IVOR14);
> +    ivorlist[15] = mfspr(SPRN_IVOR15);
> +    for (i = 0; i < 16; i++) {
> +        memcpy(handlers + ivorlist[i],
> +               kvm_trampoline_start + i * kvm_trampoline_handler_len,
> +               kvm_trampoline_handler_len);
> +    }
> +
> +    /* Copy in the trampoline code which is shared by all handlers. */
> +    resume_host = handlers + ivorlist[15] + kvm_trampoline_handler_len;
> +    memcpy(resume_host, kvm_trampoline_resume_host,
> +           kvm_trampoline_resume_host_len);
> +
> +    resume_guest = resume_host + kvm_trampoline_resume_host_len;
> +    memcpy(resume_guest, kvm_trampoline_resume_guest,
> +           kvm_trampoline_resume_guest_len);
> +
> +    /* Manually fix up the handler branches, since we moved the code away
> +     * from its link address. */
> +    for (i = 0; i < 16; i++) {
> +        unsigned long *branch;
> +        branch = handlers + ivorlist[i] + kvm_trampoline_handler_len
> +                 - 4;
> +        *branch |= resume_host - (void *)branch;
> +    }
> +
> +    /* Place vcpu data structure after the trampoline code. */
> +    vcpu = resume_guest + kvm_trampoline_resume_guest_len;
> +    vcpu->linear = vcpu;
> +    vcpu->trampoline = area->addr;
> +    vcpu->resume_guest = resume_guest - (void *)base + vcpu->trampoline;
> +
> +    /* Insert mapping for the trampoline. */
> +    vcpu->trampoline_tlbe = 0;
> +    vcpu->shadow_tlb[vcpu->trampoline_tlbe].mmucr = 0;
> +    vcpu->shadow_tlb[vcpu->trampoline_tlbe].word0 =
> +        (unsigned long)vcpu->trampoline|VCPU_TLB_PGSZ|PPC44x_TLB_VALID;
> +    vcpu->shadow_tlb[vcpu->trampoline_tlbe].word1 = __pa(base);
> +    vcpu->shadow_tlb[vcpu->trampoline_tlbe].word2 =
> +        PPC44x_TLB_SX|PPC44x_TLB_SW|PPC44x_TLB_SR;
> +    printk("tlb[%d]: %x %x %x\n", vcpu->trampoline_tlbe,
> +           vcpu->shadow_tlb[vcpu->trampoline_tlbe].word0,
> +           vcpu->shadow_tlb[vcpu->trampoline_tlbe].word1,
> +           vcpu->shadow_tlb[vcpu->trampoline_tlbe].word2);
> +
> +    /* Flush any stale code from the icache. */
> +    flush_icache_range(base, (unsigned long)vcpu);

NIT: can you add to the comment that this function flushes the d-cache as well?

> +
> +    return vcpu;
> +}
> +
> +int kvm_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
> +                    unsigned int exit_nr)
> +{
> +    static int entries;
> +    int r;
> +
> +    printk("%d @ %x (%x)\n", exit_nr, vcpu->pc,
> +           vcpu->last_inst);
> +
> +    if (entries++ > 100)
> +        while (1) ;
> +
> +    switch (exit_nr) {
> +    case 6:
> +        r = emulate_instruction(vcpu);
> +        if (r) {
> +            /* hack: tell asm to reload NV GPRs */
> +            return 2;
> +        }
> +    case 10:
> +        asm("lis %0,%1@h\n"
> +            "mtspr %2,%0"
> +            :
> +            : "r"(0), "i"(TSR_DIS), "i"(SPRN_TSR));

That '"r"(0)' thing is dangerous; you can just use "mtspr(SPRN_TSR, TSR_DIS);".

> +        break;
> +    }
> +    return 1;
> +}
> +
> +extern char dummy_guest[];
> +
> +static void load_guest(struct kvm_vcpu *vcpu)
> +{
> +    const unsigned long guestaddr = 0xc0000000;
> +    unsigned long *guest;
> +    unsigned int guestlen = 64 * sizeof(u32);
> +
> +    /* Install a dummy guest. */
> +    guest = (unsigned long *)guestaddr;
> +    memcpy(guest, dummy_guest, guestlen);
> +    flush_icache_range(guestaddr, guestaddr + guestlen);
> +
> +    vcpu->pc = 0;
> +    vcpu->msr = MSR_PR|MSR_EE|MSR_IR|MSR_DR;
> +
> +    /* Insert shadow mapping for guest. */
> +    vcpu->shadow_tlb[1].mmucr = 0;
> +    vcpu->shadow_tlb[1].word0 = PPC44x_TLB_TS|PPC44x_TLB_1K
> +                                |PPC44x_TLB_VALID;
> +    vcpu->shadow_tlb[1].word1 = 0;
> +    vcpu->shadow_tlb[1].word2 = PPC44x_TLB_UX|PPC44x_TLB_UW|PPC44x_TLB_UR |
> +                                PPC44x_TLB_SX|PPC44x_TLB_SW|PPC44x_TLB_SR;
> +    printk("tlb[1]: %x %x %x\n", vcpu->shadow_tlb[1].word0,
> +           vcpu->shadow_tlb[1].word1, vcpu->shadow_tlb[1].word2);
> +
> +    vcpu->guest_tlb[1] = vcpu->shadow_tlb[1];
> +}
> +
> +static int vcpu_run(struct kvm_vcpu *vcpu)
> +{
> +    printk("running\n");
> +    __vcpu_run(NULL, vcpu);
> +    local_irq_enable();
> +    printk("back\n");
> +}
> +
> +int hack_init(void)
> +{
> +    struct kvm_vcpu *vcpu;
> +
> +    vcpu = alloc_vcpu();
> +    if (!vcpu)
> +        return -ENOMEM;
> +
> +    load_guest(vcpu);
> +    return vcpu_run(vcpu);
> +}
> +
> +void hack_exit(void)
> +{
> +}
> +
> +module_init(hack_init);
> +module_exit(hack_exit);
> diff --git a/drivers/kvm/powerpc/kvm-offsets.c b/drivers/kvm/powerpc/kvm-offsets.c
> new file mode 100644
> --- /dev/null
> +++ b/drivers/kvm/powerpc/kvm-offsets.c
> @@ -0,0 +1,48 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> + *
> + * Copyright IBM Corp. 2007
> + *
> + * Authors: Hollis Blanchard <ho...@us...>
> + */
> +
> +#include <linux/stddef.h>
> +#include <linux/types.h>
> +#include "kvm.h"
> +
> +#define DEFINE(sym, val) \
> +    asm volatile("\n->" #sym " %0 " #val : : "i" (val))
> +
> +int main(void)
> +{
> +    DEFINE(TLBE_BYTES, sizeof(struct tlbe));
> +
> +    DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, host_stack));
> +    DEFINE(VCPU_HOST_TLB, offsetof(struct kvm_vcpu, host_tlb));
> +    DEFINE(VCPU_SHADOW_TLB, offsetof(struct kvm_vcpu, shadow_tlb));
> +    DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, gpr));
> +    DEFINE(VCPU_LR, offsetof(struct kvm_vcpu, lr));
> +    DEFINE(VCPU_CR, offsetof(struct kvm_vcpu, cr));
> +    DEFINE(VCPU_XER, offsetof(struct kvm_vcpu, xer));
> +    DEFINE(VCPU_CTR, offsetof(struct kvm_vcpu, ctr));
> +    DEFINE(VCPU_PC, offsetof(struct kvm_vcpu, pc));
> +    DEFINE(VCPU_MSR, offsetof(struct kvm_vcpu, msr));
> +    DEFINE(VCPU_TRAMPOLINE, offsetof(struct kvm_vcpu, trampoline));
> +    DEFINE(VCPU_TRAMPOLINE_TLBE, offsetof(struct kvm_vcpu, trampoline_tlbe));
> +    DEFINE(VCPU_LINEAR, offsetof(struct kvm_vcpu, linear));
> +    DEFINE(VCPU_RESUME_GUEST, offsetof(struct kvm_vcpu, resume_guest));
> +    DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, last_inst));
> +    return 0;
> +}
> diff --git a/drivers/kvm/powerpc/kvm.h b/drivers/kvm/powerpc/kvm.h
> new file mode 100644
> --- /dev/null
> +++ b/drivers/kvm/powerpc/kvm.h
> @@ -0,0 +1,124 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> + *
> + * Copyright IBM Corp. 2007
> + *
> + * Authors: Hollis Blanchard <ho...@us...>
> + */
> +
> +#ifndef __KVM_POWERPC_KVM_H__
> +#define __KVM_POWERPC_KVM_H__
> +
> +#include <asm/mmu-44x.h>
> +
> +/* IVPR must be 64KiB-aligned. */
> +#define VCPU_SIZE_ORDER 4
> +#define VCPU_SIZE_LOG   (VCPU_SIZE_ORDER + 12)
> +#define VCPU_TLB_PGSZ   PPC44x_TLB_64K
> +#define VCPU_SIZE_BYTES (1<<VCPU_SIZE_LOG)
> +
> +#ifndef __ASSEMBLY__
> +#include <linux/mutex.h>
> +
> +/* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
> +struct kvm_run {
> +    /* XXX */
> +};
> +
> +struct kvm_stat {
> +    u32 exits;
> +    u32 mmio_exits;
> +    u32 signal_exits;
> +    u32 light_exits;
> +};
> +
> +struct tlbe {
> +    u32 mmucr;
> +    u32 word0;
> +    u32 word1;
> +    u32 word2;
> +};
> +
> +struct kvm_vcpu {
> +    /* This is an unmodified copy of the guest's TLB. */
> +    struct tlbe guest_tlb[PPC44x_TLB_SIZE];
> +    /* This is the TLB that's actually used when the guest is running. */
> +    struct tlbe shadow_tlb[PPC44x_TLB_SIZE];
> +    /* This is a copy of the host's TLB. */
> +    struct tlbe host_tlb[PPC44x_TLB_SIZE];
> +
> +    u32 host_stack;

You will end up with a 4 byte pad here.
 -JX

> +
> +    u64 fpr[32];
> +    u32 gpr[32];
> +
> +    u32 pc;
> +    u32 cr;
> +    u32 ctr;
> +    u32 lr;
> +    u32 xer;
> +
> +    u32 msr;
> +    u32 mmucr;
> +    u32 sprg0;
> +    u32 sprg1;
> +    u32 sprg2;
> +    u32 sprg3;
> +    u32 sprg4;
> +    u32 sprg5;
> +    u32 sprg6;
> +    u32 sprg7;
> +    u32 srr0;
> +    u32 srr1;
> +    u32 csrr0;
> +    u32 csrr1;
> +    u32 dsrr0;
> +    u32 dsrr1;
> +    u32 dear;
> +    u32 esr;
> +    u32 dec;
> +    u32 decar;
> +    u32 tbl;
> +    u32 tbu;
> +    u32 tcr;
> +    u32 tsr;
> +    u32 ivor[16];
> +    u32 ivpr;
> +    u32 pvr;
> +    u32 pid;
> +
> +    struct kvm_stat stat;
> +    struct mutex mutex;
> +    void *linear;       /* Virtual address used by the kernel. */
> +    void *trampoline;   /* Virtual address used for the trampoline. */
> +    void *resume_guest; /* Trampoline address of resume_guest(). */
> +    unsigned int trampoline_tlbe;

NIT: Why "unsigned int" when everywhere else you use "u32"?

> +    u32 last_inst;
> +};
> +
> +extern int __vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
> +
> +extern char kvm_trampoline_start[];
> +extern void kvm_trampoline_resume_host(void);
> +extern unsigned long kvm_trampoline_resume_host_len;
> +extern void kvm_trampoline_resume_guest(void);
> +extern unsigned long kvm_trampoline_resume_guest_len;
> +extern unsigned long kvm_trampoline_handler_len;
> +
> +extern int emulate_instruction(struct kvm_vcpu *vcpu);
> +
> +#endif /* __ASSEMBLY__ */
> +
> +#endif /* __KVM_POWERPC_KVM_H__ */
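For reference, the fix Jimi suggests for the TSR write: the inline asm above uses "r"(0) as an input operand and then overwrites it with lis, which is undefined behavior. The kernel's existing accessor does the right thing:

    /* Clear the pending decrementer interrupt status. */
    mtspr(SPRN_TSR, TSR_DIS);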
From: Hollis B. <ho...@us...> - 2007-08-24 22:42:45
(There may be a couple rejects for this patch in KVM Kconfig/Makefile, but nothing serious.)

I've now got head_44x.S running as a guest up until it enables translation. :)

There are plenty of gaping holes here, but all comments are welcome, especially if you think something is working by accident...

--
Hollis Blanchard
IBM Linux Technology Center
From: Hollis B. <ho...@us...> - 2007-08-24 19:47:26
I have a little code working: basically context switch between guest and host, plus the beginnings of instruction emulation. Unfortunately, since we don't have the qemu/KVM interfaces worked out yet, I don't have any interaction with userspace; I'm just hijacking the boot process to install a tiny built-in guest and run it in a loop.

So far I've found a couple interesting design points. First of all, we need to make sure a single mapping covers the KVM exception handlers, to minimize intrusion on the guest's TLB entries. That means we need to physically co-locate the exception handlers with the vcpu data structure (so they have a place to save state). We can't execute directly out of a module anyways, since that memory isn't physically contiguous. (Note: each copy of the 440 TLB in the vcpu is exactly 1KB, and right now I have three copies.)

Another trick: to avoid swapping all 16 IVORs on every guest<->host transition, we can copy the KVM handlers into memory to match the host's IVORs; then we just need to swap IVPR. This is working well, though it makes debugging difficult and requires some manual relocation of branches at installation time. (A sketch of the idea follows below.)

Combining those two requirements means each "vcpu" is actually part of a 64KB memory allocation which also contains the exception handlers, and we place that address into IVPR.

Additionally, because we cannot have two TLB entries simultaneously mapping the same effective address, and because physically contiguous allocations are part of the kernel linear mapping, we're also allocating an "empty" 64KB vmalloc area, which is where we execute from while in the trampoline (transitioning guest<->host). We manage the TLB mappings for that directly, to ensure that the host kernel doesn't create another mapping for it.

I can send out patches, but if you aren't interested in about 500 lines of some pretty intense assembly, there won't be much to look at, and it's changing dramatically every day anyways. Just let me know...

BTW, I've been focusing on 440 in particular. I'm not quite sure what the best way is to define an alternate "kvm_vcpu" structure for e.g. e500, but I have some ideas and I'd be happy to discuss it with whoever starts that work.

Just a reminder: next week I'll be at the KVM Forum, where I will be presenting our directions and initial designs on Friday.

--
Hollis Blanchard
IBM Linux Technology Center
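The IVOR-copying trick in miniature, condensed from the alloc_vcpu() in the patch posted later the same day (mfspr() requires a compile-time constant SPR number, which is why the real code reads all sixteen IVORs explicitly):

    /* Lay out a private copy of each handler at the same offset the
     * host uses, so only IVPR changes on a guest<->host transition. */
    ivorlist[0] = mfspr(SPRN_IVOR0);
    /* ... through ... */
    ivorlist[15] = mfspr(SPRN_IVOR15);

    for (i = 0; i < 16; i++)
        memcpy(handlers + ivorlist[i],
               kvm_trampoline_start + i * kvm_trampoline_handler_len,
               kvm_trampoline_handler_len);

    /* One SPR swap per transition instead of sixteen: */
    mtspr(SPRN_IVPR, (unsigned long)trampoline_base);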
From: Hollis B. <ho...@us...> - 2007-08-23 22:27:18
On Mon, 2007-08-13 at 08:22 -0400, Jimi Xenidis wrote:
> > However, this is a barrier to running 970 code on other PowerPC, since
> > other processors won't even have the option of changing dcbz's
> > behavior.
> >
> > Can you guys think of any other registers like this we need to worry
> > about, especially on Book E parts?
>
> The only issues I can come up with are the differences in SPRG treatment
> WRT 3E hosting 3S or the reverse, but I don't think you are going
> after that.. are you?

I actually don't think the USPRGs are an issue, since guest access to them would trap and we would emulate them. I'm looking for other things that modify usermode behavior.

In addition to 970 dcbz, there are MSR bits (e.g. LE) that alter the behavior of usermode instructions too. You'd have a similar issue when trying to run a guest that uses LE on a processor that lacks it. (AFAICS only 100% instruction emulation could solve that problem.)

--
Hollis Blanchard
IBM Linux Technology Center
From: Jimi X. <ji...@po...> - 2007-08-13 12:22:40
On Aug 9, 2007, at 6:17 PM, Hollis Blanchard wrote:
> On 970, there's a HID bit that determines whether dcbz acts on a 32-byte
> or 128-byte block of memory. This is difficult for us to virtualize
> without hardware support because dcbz is a user-mode instruction, and so
> executes natively without a trap to the host. Of course, modifying the
> HID register would trap, so then we could context switch that register
> with the rest of the guest state.

The 970 (and all other PPC CPUs that offer a choice) also has a HID bit (HID5[57] on 970) to make dcbz illegal, which will be handy if the cache size is neither 32 nor 64. I don't think the 3Es have this; too bad. I think this is an excellent design point for an architected virtualization extension, as we see implementations choose different cache line sizes.

> However, this is a barrier to running 970 code on other PowerPC, since
> other processors won't even have the option of changing dcbz's behavior.
>
> Can you guys think of any other registers like this we need to worry
> about, especially on Book E parts?

The only issues I can come up with are the differences in SPRG treatment WRT 3E hosting 3S or the reverse, but I don't think you are going after that.. are you?

-JX
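If the HID route works out, the context switch would look roughly like this. A sketch only: the SPR accessor exists in the kernel, but the HID5 bit name and the vcpu fields here are placeholders, not verified definitions.

    /* Swap the dcbz-behavior control along with the rest of guest state. */
    static void switch_dcbz_hid(struct kvm_vcpu *vcpu)
    {
        unsigned long hid = mfspr(SPRN_HID5);   /* 970-specific; placeholder name */

        vcpu->host_hid5 = hid;                  /* assumed new vcpu field */
        if (vcpu->guest_dcbz_32)                /* guest expects 32-byte dcbz */
            hid |= HID5_DCBZ_32;                /* placeholder bit name */
        else
            hid &= ~HID5_DCBZ_32;
        mtspr(SPRN_HID5, hid);
        asm volatile("isync");                  /* context-synchronize */
    }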
From: Hollis B. <ho...@us...> - 2007-08-10 17:52:45
On Fri, 2007-08-10 at 08:28 -0700, Yoder Stuart-B08248 wrote:
> It's not mode related, but in BookE, TBU/TBL are User read / Supervisor
> write. So we won't be able to trap/emulate reads to the timebase on
> real Book E hardware if we think the timebase needs to be virtualized
> for some reason. Don't think this is a big deal.

Agreed. Practically speaking, it actually is pretty important that guests be able to tell how much real time has elapsed. In Linux's case this is where we replay ticks that would have triggered when the VCPU was de-scheduled. Of course, I have no idea how an RTOS would react to missing its decrementer/FIT interrupts. Hopefully we can mitigate that problem by setting appropriate host scheduling parameters in the first place...

If we want to consider live migration in the future, the user-accessible timebase will raise some issues... especially if the two systems' timebase frequencies are not equal.

--
Hollis Blanchard
IBM Linux Technology Center
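For the migration case, the core of the problem is just rescaling a timebase value between hosts. A sketch of the arithmetic only (not anything in KVM), splitting the division so the intermediate products stay within 64 bits:

    /* Convert a timebase reading taken at freq_src (ticks/sec) to the
     * equivalent count at freq_dst. rem < freq_src, so rem * freq_dst
     * cannot overflow u64 for 32-bit frequencies. */
    static u64 tb_rescale(u64 tb, u32 freq_src, u32 freq_dst)
    {
        u64 whole = tb / freq_src;
        u64 rem = tb % freq_src;

        return whole * freq_dst + (rem * freq_dst) / freq_src;
    }

Even with the rescaling right, a guest that reads TBU/TBL directly would observe the frequency change unless it re-reads timebase-frequency from the device tree, which is the part no hypervisor trick can fix on Book E.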
From: Yoder Stuart-B. <stu...@fr...> - 2007-08-10 15:28:47
> -----Original Message-----
> From: kvm...@li...
> [mailto:kvm...@li...] On
> Behalf Of Hollis Blanchard
> Sent: Thursday, August 09, 2007 5:17 PM
> To: kvm...@li...
> Subject: [Kvm-ppc-devel] SPRs which alter usermode behavior
>
> On 970, there's a HID bit that determines whether dcbz acts on a 32-byte
> or 128-byte block of memory. This is difficult for us to virtualize
> without hardware support because dcbz is a user-mode instruction, and so
> executes natively without a trap to the host. Of course, modifying the
> HID register would trap, so then we could context switch that register
> with the rest of the guest state.
>
> However, this is a barrier to running 970 code on other PowerPC, since
> other processors won't even have the option of changing dcbz's behavior.
>
> Can you guys think of any other registers like this we need to worry
> about, especially on Book E parts?

It's not mode related, but in BookE, TBU/TBL are User read / Supervisor write. So we won't be able to trap/emulate reads to the timebase on real Book E hardware if we think the timebase needs to be virtualized for some reason. Don't think this is a big deal.

Stuart
From: Hollis B. <ho...@us...> - 2007-08-09 22:17:35
On 970, there's a HID bit that determines whether dcbz acts on a 32-byte or 128-byte block of memory. This is difficult for us to virtualize without hardware support, because dcbz is a user-mode instruction and so executes natively without a trap to the host. Of course, modifying the HID register would trap, so then we could context switch that register with the rest of the guest state.

However, this is a barrier to running 970 code on other PowerPC, since other processors won't even have the option of changing dcbz's behavior.

Can you guys think of any other registers like this we need to worry about, especially on Book E parts?

--
Hollis Blanchard
IBM Linux Technology Center
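A userspace illustration of why this matters: the dcbz block size is directly observable with no privileged operation, so guest code can tell when it's running on hardware with the "wrong" size. (PowerPC-only; assumes ordinary cacheable memory, where dcbz is legal.)

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        static char buf[512] __attribute__((aligned(256)));
        int n = 0;

        memset(buf, 0xff, sizeof(buf));
        /* Zero the cache block containing buf[256]; the buffer is
         * 256-byte aligned, so the block starts exactly there. */
        asm volatile("dcbz 0, %0" : : "r"(&buf[256]) : "memory");
        while (n < 256 && buf[256 + n] == 0)
            n++;
        printf("dcbz block size: %d bytes\n", n);  /* 32 on most parts, 128 on 970 */
        return 0;
    }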