From: Jerone Y. <jy...@us...> - 2008-04-25 05:57:37
|
This set of patches fixes 100% CPU usage when a guest is idle on PowerPC. This ti me it uses common kvm functions to sleep the guest. Signed-off-by: Jerone Young <jy...@us...> 5 files changed, 91 insertions(+), 4 deletions(-) arch/powerpc/kvm/booke_guest.c | 6 +++ arch/powerpc/kvm/emulate.c | 5 ++ arch/powerpc/kvm/powerpc.c | 15 ++++++- arch/powerpc/platforms/44x/Makefile | 2 - arch/powerpc/platforms/44x/idle.c | 67 +++++++++++++++++++++++++++++++++++ |
From: Jerone Y. <jy...@us...> - 2008-04-25 05:57:36
|
2 files changed, 16 insertions(+) arch/powerpc/kvm/booke_guest.c | 6 ++++++ arch/powerpc/kvm/powerpc.c | 10 ++++++++++ This patch adds vcpu_put & vpu_load in strategic places (as x86 does it), for use of premption. This patch also adds a very critial bit need to wake up guest that end up going to being rescheduled and go to sleep. Signed-off-by: Jerone Young <jy...@us...> diff --git a/arch/powerpc/kvm/booke_guest.c b/arch/powerpc/kvm/booke_guest.c --- a/arch/powerpc/kvm/booke_guest.c +++ b/arch/powerpc/kvm/booke_guest.c @@ -514,6 +514,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct { int i; + vcpu_load(vcpu); + regs->pc = vcpu->arch.pc; regs->cr = vcpu->arch.cr; regs->ctr = vcpu->arch.ctr; @@ -533,6 +535,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct for (i = 0; i < ARRAY_SIZE(regs->gpr); i++) regs->gpr[i] = vcpu->arch.gpr[i]; + + vcpu_put(vcpu); return 0; } @@ -595,6 +599,7 @@ int kvm_arch_vcpu_ioctl_translate(struct u8 pid; u8 as; + vcpu_load(vcpu); eaddr = tr->linear_address; pid = (tr->linear_address >> 32) & 0xff; as = (tr->linear_address >> 40) & 0x1; @@ -610,6 +615,7 @@ int kvm_arch_vcpu_ioctl_translate(struct tr->physical_address = tlb_xlate(gtlbe, eaddr); /* XXX what does "writeable" and "usermode" even mean? */ tr->valid = 1; + vcpu_put(vcpu); return 0; } diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -212,6 +212,9 @@ static void kvmppc_decrementer_func(unsi { struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data; + if (waitqueue_active(&vcpu->wq)) + wake_up_interruptible(&vcpu->wq); + kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_DECREMENTER); } @@ -338,6 +341,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v int r; sigset_t sigsaved; + vcpu_load(vcpu); + if (vcpu->sigset_active) sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved); @@ -362,12 +367,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v if (vcpu->sigset_active) sigprocmask(SIG_SETMASK, &sigsaved, NULL); + vcpu_put(vcpu); + return r; } int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq) { + vcpu_load(vcpu); kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_EXTERNAL); + vcpu_put(vcpu); + return 0; } |
From: Hollis B. <ho...@us...> - 2008-04-25 13:57:20
|
On Friday 25 April 2008 00:56:04 Jerone Young wrote: > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > --- a/arch/powerpc/kvm/powerpc.c > +++ b/arch/powerpc/kvm/powerpc.c > @@ -212,6 +212,9 @@ static void kvmppc_decrementer_func(unsi > { > struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data; > > + if (waitqueue_active(&vcpu->wq)) > + wake_up_interruptible(&vcpu->wq); > + > kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_DECREMENTER); > } Hooray! > int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq) > { > + vcpu_load(vcpu); > kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_EXTERNAL); > + vcpu_put(vcpu); > + > return 0; > } load/put here is definitely unnecessary. That makes me question how necessary it is in other parts of this patch too. I think the (hardware) TLB is the only state we really need to worry about, because there is no other state that our guest can load into the hardware that is not handled by a regular context switch. If that's true, we should only need vcpu_load/put() on paths where we muck with the TLB behind the host's back, and that is only in the run path. -- Hollis Blanchard IBM Linux Technology Center |
From: Jerone Y. <jy...@us...> - 2008-04-25 05:58:27
|
2 files changed, 68 insertions(+), 1 deletion(-) arch/powerpc/platforms/44x/Makefile | 2 - arch/powerpc/platforms/44x/idle.c | 67 +++++++++++++++++++++++++++++++++++ This patch has been accepted upstream and will be in 2.6.26. So it will eventually need to be removed when we move to 2.6.26rc. This patch adds the ability for the CPU to go into wait state while in cpu_idle loop. This helps virtulization solutions know when the guest Linux kernel is in an idle state. There are two ways to do it. Command line options: idle=spin <-- CPU will spin By default will go into wait mode. Signed-off-by: Jerone Young <jy...@us...> diff --git a/arch/powerpc/platforms/44x/Makefile b/arch/powerpc/platforms/44x/Makefile --- a/arch/powerpc/platforms/44x/Makefile +++ b/arch/powerpc/platforms/44x/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_44x) := misc_44x.o -obj-$(CONFIG_44x) := misc_44x.o +obj-$(CONFIG_44x) := misc_44x.o idle.o obj-$(CONFIG_EBONY) += ebony.o obj-$(CONFIG_TAISHAN) += taishan.o obj-$(CONFIG_BAMBOO) += bamboo.o diff --git a/arch/powerpc/platforms/44x/idle.c b/arch/powerpc/platforms/44x/idle.c new file mode 100644 --- /dev/null +++ b/arch/powerpc/platforms/44x/idle.c @@ -0,0 +1,67 @@ +/* + * Copyright 2008 IBM Corp. + * + * Based on arch/powerpc/platforms/pasemi/idle.c: + * Copyright (C) 2006-2007 PA Semi, Inc + * + * Added by: Jerone Young <jy...@us...> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#include <linux/of.h> +#include <linux/kernel.h> +#include <asm/machdep.h> + +static int mode_spin; + +static void ppc44x_idle(void) +{ + unsigned long msr_save; + + msr_save = mfmsr(); + /* set wait state MSR */ + mtmsr(msr_save|MSR_WE|MSR_EE|MSR_CE|MSR_DE); + isync(); + /* return to initial state */ + mtmsr(msr_save); + isync(); +} + +int __init ppc44x_idle_init(void) +{ + if (!mode_spin) { + /* If we are not setting spin mode + then we set to wait mode */ + ppc_md.power_save = &ppc44x_idle; + } + + return 0; +} + +arch_initcall(ppc44x_idle_init); + +static int __init idle_param(char *p) +{ + + if (!strcmp("spin", p)) { + mode_spin = 1; + ppc_md.power_save = NULL; + } + + return 0; +} + +early_param("idle", idle_param); |
From: Jerone Y. <jy...@us...> - 2008-04-25 05:58:27
|
2 files changed, 7 insertions(+), 3 deletions(-) arch/powerpc/kvm/emulate.c | 5 +++++ arch/powerpc/kvm/powerpc.c | 5 ++--- This patch handles a guest that is in a wait state. This ensures that the guest is not allways eating up 100% cpu when it is idle. Signed-off-by: Jerone Young <jy...@us...> diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c @@ -265,6 +265,11 @@ int kvmppc_emulate_instruction(struct kv case 146: /* mtmsr */ rs = get_rs(inst); kvmppc_set_msr(vcpu, vcpu->arch.gpr[rs]); + + /* handle guest vcpu that is in wait state */ + if (vcpu->arch.msr & MSR_WE) { + kvm_vcpu_block(vcpu); + } break; case 163: /* wrteei */ diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -36,13 +36,12 @@ gfn_t unalias_gfn(struct kvm *kvm, gfn_t int kvm_cpu_has_interrupt(struct kvm_vcpu *v) { - /* XXX implement me */ - return 0; + return !!(v->arch.pending_exceptions); } int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) { - return 1; + return !(v->arch.msr & MSR_WE); } |
From: Hollis B. <ho...@us...> - 2008-04-25 13:57:16
|
On Friday 25 April 2008 00:56:03 Jerone Young wrote: > This patch handles a guest that is in a wait state. This ensures that the guest is not allways eating up 100% cpu when it is idle. > > Signed-off-by: Jerone Young <jy...@us...> > > diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c > --- a/arch/powerpc/kvm/emulate.c > +++ b/arch/powerpc/kvm/emulate.c > @@ -265,6 +265,11 @@ int kvmppc_emulate_instruction(struct kv > case 146: /* mtmsr */ > rs = get_rs(inst); > kvmppc_set_msr(vcpu, vcpu->arch.gpr[rs]); > + > + /* handle guest vcpu that is in wait state */ > + if (vcpu->arch.msr & MSR_WE) { > + kvm_vcpu_block(vcpu); > + } > break; > > case 163: /* wrteei */ So if I apply this patch and not #3, the guest will put itself to sleep and never wake up? You need to combine patches 2 and 3. Also, for completeness, you should add the same test to the rfi emulation, which could (theoretically) also set MSR[WE]. -- Hollis Blanchard IBM Linux Technology Center |
From: Hollis B. <ho...@us...> - 2008-04-25 14:01:07
|
On Friday 25 April 2008 00:56:01 Jerone Young wrote: > This set of patches fixes 100% CPU usage when a guest is idle on PowerPC. This time it uses common kvm functions to sleep the guest. Looking much better now, with just a few minor issues to correct. With these patches applied, about how much CPU *does* an idling guest consume? By the way, you don't explicitly *unset* MSR[WE]. I think this works implicitly because of the way we deliver interrupts; could you add a comment explaining that? -- Hollis Blanchard IBM Linux Technology Center |
From: Jerone Y. <jy...@us...> - 2008-04-25 16:37:48
|
On Fri, 2008-04-25 at 09:00 -0500, Hollis Blanchard wrote: > On Friday 25 April 2008 00:56:01 Jerone Young wrote: > > This set of patches fixes 100% CPU usage when a guest is idle on PowerPC. > This time it uses common kvm functions to sleep the guest. > > Looking much better now, with just a few minor issues to correct. With these > patches applied, about how much CPU *does* an idling guest consume? With the current patch *as is* idle guest are eating about 16% CPU. Better then 100%, but more then the other patch. I'll see if by removing the vcpu_loads & vcpu_puts if that goes down. > > By the way, you don't explicitly *unset* MSR[WE]. I think this works > implicitly because of the way we deliver interrupts; could you add a comment > explaining that? Yes it is unset implicity. I can add a comment on this. > |