From: Christian E. <ehr...@li...> - 2008-04-18 06:08:25
Liu, Eric E wrote:
> Hollis Blanchard wrote:
>> On Wednesday 16 April 2008 01:45:34 Liu, Eric E wrote:
> [...]
>> Actually... we could have kvmtrace itself insert the metadata, so
>> there would be no chance of it being overwritten in the kernel
>> buffers. The header could be written in tip_open_output(), and update
>> fs_size accordingly.
>
> Yes, letting kvmtrace insert the metadata is more reasonable.

I wanted to note that the kvmtrace tool should not need to know
everything about the data format. I am thinking of, e.g., kernel
implementations that change endianness, or flags we don't know about
yet but might need in the future.

What about adding another debugfs entry the kernel can use to expose
the "kvmtrace metadata" defined by the kernel implementation? The
kvmtrace tool could then build up the record using one entry for
kernel-defined metadata and another for any metadata defined by the
kvmtrace tool itself.

What about this one:

struct metadata {
	u32 kmagic;	/* kernel-defined metadata read from debugfs */
	u32 umagic;	/* userspace-tool-defined metadata */
	u32 extra;	/* redundant; only used to fill out the record */
};

That should give us the flexibility to keep the format if we get more
metadata requirements in the future.

--
Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
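[As a rough illustration of Christian's split, here is a userspace
sketch of how the kvmtrace tool might read kernel-defined metadata from
debugfs and fill in the rest itself. The debugfs path, the magic value,
and the helper name are assumptions for illustration, not from any
posted patch:]

#include <stdio.h>
#include <inttypes.h>

struct metadata {
	uint32_t kmagic;	/* kernel-defined metadata read from debugfs */
	uint32_t umagic;	/* userspace-tool-defined metadata */
	uint32_t extra;		/* redundant; only fills out the record */
};

/* Hypothetical: read the kernel's metadata word from a debugfs entry
 * and add the tool's own. Path and magic are illustrative only. */
static int build_metadata(struct metadata *md)
{
	FILE *f = fopen("/sys/kernel/debug/kvm/trace_metadata", "r");

	if (!f)
		return -1;
	if (fscanf(f, "%" SCNx32, &md->kmagic) != 1) {
		fclose(f);
		return -1;
	}
	fclose(f);

	md->umagic = 0x6b766d74;	/* "kvmt": assumed userspace magic */
	md->extra = 0;
	return 0;
}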
From: Liu, E. E <eri...@in...> - 2008-04-18 01:41:04
Hollis Blanchard wrote:
> On Wednesday 16 April 2008 01:45:34 Liu, Eric E wrote:
>>> Hmm, while we're on the subject, I'm not sure what the best way to
>>> automatically byteswap will be. It probably isn't worth it to
>>> convert all trace data to a standard ordering (which would add
>>> overhead to tracing), but I suppose there is no metadata in the
>>> trace log? A command line switch might be inconvenient but
>>> inevitable.
>>
>> A tricky approach is that we insert metadata into the trace file
>> before reading the trace log, so that the analysis tool can look at
>> the metadata to check whether we need to convert byte order?
>
> Actually, can't we lose trace records? It would be unfortunate if the
> magic metadata record was overwritten by the trace data. Perhaps a
> 1-byte "flags" variable at the beginning of each record could
> indicate the data's endianness?
>
> Another option would be to have the "kvmtrace" tool transcribe the
> data itself. I think that would be fine, since kvmtrace must run on
> the target anyways, but your kvmtrace implementation doesn't actually
> understand the format of the data.
>
> Actually... we could have kvmtrace itself insert the metadata, so
> there would be no chance of it being overwritten in the kernel
> buffers. The header could be written in tip_open_output(), and update
> fs_size accordingly.

Yes, letting kvmtrace insert the metadata is more reasonable.

> Do you have any suggestions for the format of the metadata? I'm not
> sure how it should fit into the record format expected by
> kvmtrace_format.

I think maybe it can look like this:

struct metadata {
	u32 magic;
	u64 extra;	/* redundant; only used to fill out the record */
};

kvmtrace_format can check magic to judge whether we need to convert
byte order.
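[A minimal sketch of the magic check Eric describes, as the analysis
tool might implement it. The magic constant is an assumed placeholder,
not a value from the patches; bswap_32 is glibc's byteswap.h macro:]

#include <stdint.h>
#include <byteswap.h>

#define KVMTRACE_MAGIC 0x6b766d74u	/* assumed placeholder value */

/* Returns 0 if the trace matches the reader's byte order, 1 if every
 * record field must be byteswapped, and -1 if the header is not a
 * recognizable kvmtrace header at all. */
static int trace_needs_swap(uint32_t magic)
{
	if (magic == KVMTRACE_MAGIC)
		return 0;
	if (magic == bswap_32(KVMTRACE_MAGIC))
		return 1;
	return -1;
}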
From: Jerone Y. <jy...@us...> - 2008-04-17 22:10:17
1 file changed, 1 insertion(+)
 qemu/target-ppc/helper.c | 1 +

A recent change requires target-ppc/helper.c to include "qemu-kvm.h"
to get the definition of kvm_enabled(). This fixes it so things
compile again.

Signed-off-by: Jerone Young <jy...@us...>

diff --git a/qemu/target-ppc/helper.c b/qemu/target-ppc/helper.c
--- a/qemu/target-ppc/helper.c
+++ b/qemu/target-ppc/helper.c
@@ -29,6 +29,7 @@
 #include "exec-all.h"
 #include "helper_regs.h"
 #include "qemu-common.h"
+#include "qemu-kvm.h"
 
 //#define DEBUG_MMU
 //#define DEBUG_BATS
From: Hollis B. <ho...@us...> - 2008-04-17 21:59:45
On Wednesday 16 April 2008 01:45:34 Liu, Eric E wrote:
>> Hmm, while we're on the subject, I'm not sure what the best way to
>> automatically byteswap will be. It probably isn't worth it to
>> convert all trace data to a standard ordering (which would add
>> overhead to tracing), but I suppose there is no metadata in the
>> trace log? A command line switch might be inconvenient but
>> inevitable.
>
> A tricky approach is that we insert metadata into the trace file
> before reading the trace log, so that the analysis tool can look at
> the metadata to check whether we need to convert byte order?

Actually, can't we lose trace records? It would be unfortunate if the
magic metadata record was overwritten by the trace data. Perhaps a
1-byte "flags" variable at the beginning of each record could indicate
the data's endianness?

Another option would be to have the "kvmtrace" tool transcribe the data
itself. I think that would be fine, since kvmtrace must run on the
target anyways, but your kvmtrace implementation doesn't actually
understand the format of the data.

Actually... we could have kvmtrace itself insert the metadata, so there
would be no chance of it being overwritten in the kernel buffers. The
header could be written in tip_open_output(), and update fs_size
accordingly.

Do you have any suggestions for the format of the metadata? I'm not
sure how it should fit into the record format expected by
kvmtrace_format.

--
Hollis Blanchard
IBM Linux Technology Center
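[A sketch of the per-record flag alternative Hollis floats above,
assuming a one-byte flags field at the start of each record; the flag
bit and the header layout are hypothetical:]

#include <stdint.h>

#define TRC_FLAG_BIG_ENDIAN	0x01	/* hypothetical flag bit */

/* Hypothetical record header: the producer sets TRC_FLAG_BIG_ENDIAN
 * according to its own byte order when writing each record. */
struct rec_hdr {
	uint8_t flags;
	uint8_t pad[3];
};

/* Returns nonzero if the record producer's byte order differs from
 * this reader's, so multi-byte fields must be swapped. */
static int rec_needs_swap(const struct rec_hdr *hdr)
{
	const union { uint16_t u; uint8_t b[2]; } probe = { .u = 1 };
	int host_is_big = (probe.b[0] == 0);

	return !!(hdr->flags & TRC_FLAG_BIG_ENDIAN) != host_is_big;
}

[Because the flag travels with every record, it survives even when
earlier records, including any metadata record, are overwritten in the
kernel ring buffer, which is the failure mode raised above.]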
From: Hollis B. <ho...@us...> - 2008-04-17 18:38:13
On Wednesday 16 April 2008 01:45:34 Liu, Eric E wrote:
> Hollis Blanchard wrote:
>> On Tuesday 15 April 2008 22:13:28 Liu, Eric E wrote:
>>> Hollis Blanchard wrote:
>>>> On Wednesday 09 April 2008 05:01:36 Liu, Eric E wrote:
>>>>> +/* This structure represents a single trace buffer record. */
>>>>> +struct kvm_trace_rec {
>>>>> +	__u32 event:28;
>>>>> +	__u32 extra_u32:3;
>>>>> +	__u32 cycle_in:1;
>>>>> +	__u32 pid;
>>>>> +	__u32 vcpu_id;
>>>>> +	union {
>>>>> +		struct {
>>>>> +			__u32 cycle_lo, cycle_hi;
>>>>> +			__u32 extra_u32[KVM_TRC_EXTRA_MAX];
>>>>> +		} cycle;
>>>>> +		struct {
>>>>> +			__u32 extra_u32[KVM_TRC_EXTRA_MAX];
>>>>> +		} nocycle;
>>>>> +	} u;
>>>>> +};
>>>>
>>>> Do we really need bitfields here? They are notoriously
>>>> non-portable.
>>>>
>>>> Practically speaking, this will prevent me from copying a trace
>>>> file from my big-endian target to my little-endian workstation for
>>>> analysis, at least without some ugly hacking in the userland tool.
>>>
>>> Here the main consideration for using bitfields is to save storage
>>> space for each record, but as you said it is non-portable for your
>>> mentioned case, so should we adjust the struct like this?
>>>	__u32 event;
>>>	__u16 extra_u32;
>>>	__u16 cycle_in;
>>
>> If space really is a worry, you could still combine the fields, and
>> just use masks to extract the data later. No matter what,
>> byteswapping is required in the userland tool. I suspect this isn't
>> there already, but it will be easier to add without the bitfields.
>>
>> Hmm, while we're on the subject, I'm not sure what the best way to
>> automatically byteswap will be. It probably isn't worth it to
>> convert all trace data to a standard ordering (which would add
>> overhead to tracing), but I suppose there is no metadata in the
>> trace log? A command line switch might be inconvenient but
>> inevitable.
>
> A tricky approach is that we insert metadata into the trace file
> before reading the trace log, so that the analysis tool can look at
> the metadata to check whether we need to convert byte order?

It is tricky, but I like it, FWIW. :)

--
Hollis Blanchard
IBM Linux Technology Center
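[For reference, a sketch of the masks-instead-of-bitfields suggestion:
the field widths (28-bit event, 3-bit extra count, 1-bit cycle flag)
come from the posted struct, but the macro names and the choice of
which bits hold which field are illustrative:]

#include <stdint.h>

#define TRC_EVENT_MASK		0x0fffffffu	/* bits 0-27: event */
#define TRC_EXTRA_SHIFT		28
#define TRC_EXTRA_MASK		0x7u		/* bits 28-30: extra_u32 count */
#define TRC_CYCLE_SHIFT		31		/* bit 31: cycle_in flag */

static inline uint32_t trc_pack(uint32_t event, uint32_t n_extra,
				uint32_t cycle_in)
{
	return (event & TRC_EVENT_MASK) |
	       ((n_extra & TRC_EXTRA_MASK) << TRC_EXTRA_SHIFT) |
	       ((cycle_in & 1) << TRC_CYCLE_SHIFT);
}

static inline uint32_t trc_event(uint32_t w)
{
	return w & TRC_EVENT_MASK;
}

static inline uint32_t trc_n_extra(uint32_t w)
{
	return (w >> TRC_EXTRA_SHIFT) & TRC_EXTRA_MASK;
}

static inline uint32_t trc_cycle_in(uint32_t w)
{
	return w >> TRC_CYCLE_SHIFT;
}

[Unlike bitfields, this layout is identical on every compiler and ABI,
so the userland tool only needs a single 32-bit byteswap before
extraction when the byte orders differ.]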
From: Avi K. <av...@qu...> - 2008-04-17 14:02:17
[Note: KVM Forum registration is now open at
http://kforum.qumranet.com/KVMForum/about_kvmforum.php]

This is the Call for Presentations for the second annual KVM
Developer's Forum, to be held on June 10-13, 2008, in Napa, California,
USA [1].

We are looking for presentations on KVM development, quality assurance,
management, security, interoperability, architecture support, and
interesting use cases. Presentations are 50 minutes in length; there
are also 25-minute mini-presentation slots available.

KVM Forum presentations are an excellent way to inform the KVM
development community about your work, and to gather valuable feedback
about your approach.

Please send your presentation proposal to the KVM Forum 2008 Content
Committee at kf2...@qu... by April 20th.

KVM Forum 2008 Content Committee:
Dor Laor
Anthony Liguori
Avi Kivity

[1] http://kforum.qumranet.com/KVMForum/about_kvmforum.php

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Christian E. <ehr...@li...> - 2008-04-17 14:00:23
Avi Kivity wrote:
> Avi Kivity wrote:
>> Hollis Blanchard wrote:
>>> +config KVM
>>> +	tristate "Kernel-based Virtual Machine (KVM) support"
>>> +	depends on EXPERIMENTAL
>>> +	select PREEMPT_NOTIFIERS
>>> +	select ANON_INODES
>>> +	---help---
>>> +	  Support hosting virtualized guest machines. You will also
>>> +	  need to select one or more of the processor modules below.
>>> +
>>> +	  This module provides access to the hardware capabilities through
>>> +	  a character device node named /dev/kvm.
>>> +
>>> +	  To compile this as a module, choose M here: the module
>>> +	  will be called kvm.
>>> +
>>> +	  If unsure, say N.
>>
>> In my ignorance, I set KVM=m on a non-44x build, which then failed.
>> This needs either to depend on 44x, or to be fixed to compile.
>
> Setting 44x, I get
>> AS [M] arch/powerpc/kvm/booke_interrupts.o
>> arch/powerpc/kvm/booke_interrupts.S: Assembler messages:
>> arch/powerpc/kvm/booke_interrupts.S:351: Error: unsupported relocation
>> against VCPU_HOST_TLB
>> arch/powerpc/kvm/booke_interrupts.S:352: Error: unsupported relocation
>> against VCPU_SHADOW_TLB

Afaik we just don't support building kvm as a module atm, so a simple
and fast solution would be to change the Kconfig options from tristate
to bool.

Additionally, we still have some cross references between the code
built behind the two symbols (KVM & KVM_BOOKE_HOST), which means that
if KVM is configured for powerpc we must also select exactly one host
implementation. To do so I changed the second option to a choice field,
which always has exactly one suboption selected. To ensure that the
only existing suboption can actually be selected, we need the smallest
common dependency on the KVM config option itself, which for now puts
"depends on 44x" there. We can change that once we support modules
and/or separate selections.

A patch for that is attached, but I would like to wait for Hollis'
comments before you apply it.

> So further Kconfig restrictions are needed, or perhaps a patch.
> .config attached.
>
> [...]

--
Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
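[A sketch of the Kconfig shape Christian describes, built from the
existing KVM and KVM_BOOKE_HOST symbols; the exact wording and
dependencies of his attached patch may differ:]

config KVM
	bool "Kernel-based Virtual Machine (KVM) support"
	depends on EXPERIMENTAL && 44x
	select PREEMPT_NOTIFIERS
	select ANON_INODES

choice
	prompt "KVM host implementation"
	depends on KVM
	default KVM_BOOKE_HOST

config KVM_BOOKE_HOST
	bool "KVM host support for Book E PowerPC processors"

endchoice

[With bool instead of tristate, KVM=m can no longer be selected, and
the choice block guarantees that enabling KVM always selects exactly
one host implementation.]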
From: Avi K. <av...@qu...> - 2008-04-17 11:19:44
Avi Kivity wrote:
> Hollis Blanchard wrote:
>> +config KVM
>> +	tristate "Kernel-based Virtual Machine (KVM) support"
>> +	depends on EXPERIMENTAL
>> +	select PREEMPT_NOTIFIERS
>> +	select ANON_INODES
>> +	---help---
>> +	  Support hosting virtualized guest machines. You will also
>> +	  need to select one or more of the processor modules below.
>> +
>> +	  This module provides access to the hardware capabilities through
>> +	  a character device node named /dev/kvm.
>> +
>> +	  To compile this as a module, choose M here: the module
>> +	  will be called kvm.
>> +
>> +	  If unsure, say N.
>
> In my ignorance, I set KVM=m on a non-44x build, which then failed.
> This needs either to depend on 44x, or to be fixed to compile.

Setting 44x, I get

> AS [M] arch/powerpc/kvm/booke_interrupts.o
> arch/powerpc/kvm/booke_interrupts.S: Assembler messages:
> arch/powerpc/kvm/booke_interrupts.S:351: Error: unsupported relocation
> against VCPU_HOST_TLB
> arch/powerpc/kvm/booke_interrupts.S:352: Error: unsupported relocation
> against VCPU_SHADOW_TLB

So further Kconfig restrictions are needed, or perhaps a patch. .config
attached.

--
error compiling committee.c: too many arguments to function
From: Avi K. <av...@qu...> - 2008-04-17 11:14:43
Hollis Blanchard wrote:
> +config KVM
> +	tristate "Kernel-based Virtual Machine (KVM) support"
> +	depends on EXPERIMENTAL
> +	select PREEMPT_NOTIFIERS
> +	select ANON_INODES
> +	---help---
> +	  Support hosting virtualized guest machines. You will also
> +	  need to select one or more of the processor modules below.
> +
> +	  This module provides access to the hardware capabilities through
> +	  a character device node named /dev/kvm.
> +
> +	  To compile this as a module, choose M here: the module
> +	  will be called kvm.
> +
> +	  If unsure, say N.

In my ignorance, I set KVM=m on a non-44x build, which then failed.
This needs either to depend on 44x, or to be fixed to compile.

> +
> +config KVM_BOOKE_HOST
> +	tristate "KVM host support for Book E PowerPC processors"
> +	depends on KVM && 44x
> +	---help---
> +	  Provides host support for KVM on Book E PowerPC processors.
> +	  Currently this works on 440 processors only.

--
error compiling committee.c: too many arguments to function
From: Avi K. <av...@qu...> - 2008-04-17 10:00:00
Hollis Blanchard wrote:
> [POWERPC KVM] Implement KVM for PowerPC 440
>
> Avi, these patches have now been acked by Paul Mackerras, so I think
> they're ready to be committed.

Applied all, thanks.

--
error compiling committee.c: too many arguments to function
From: Avi K. <av...@qu...> - 2008-04-17 09:44:14
Jerone Young wrote:
> This patch apparently fell through the cracks, or I didn't send the
> revised version to the list. These patches fix cpu initialization for
> PowerPC. Without them a guest cannot be launched.

Applied both, thanks.

--
error compiling committee.c: too many arguments to function
From: Hollis B. <ho...@us...> - 2008-04-17 04:30:00
[POWERPC KVM] Implement KVM for PowerPC 440

Avi, these patches have now been acked by Paul Mackerras, so I think
they're ready to be committed.

Changes from v2:
- Move some code into booke_guest.c
- Actually include booke_host.c this time

Changes from v1:
- Updates for latest KVM upstream (build fixes)
- Split some code out into booke_host.c

Changes from v0:
- Move FPRs into struct kvm_fpu
- Use u64 for register size in kvm_regs
- Create Documentation/powerpc/kvm_440.txt
- Comment describing incompatibility with PPC_EARLY_DEBUG
- Replace hardcoded mask in kvmppc_44x_tlb_shadow_attrib() and add comment
- Comment to explain why we special-case DCRN_CPR0_*
- Don't redefine DCRN_CPR0_*
- Slightly reword description for patch 2
From: Hollis B. <ho...@us...> - 2008-04-17 04:29:06
1 file changed, 7 insertions(+)
 include/linux/kvm.h | 7 +++++++

Device Control Registers are essentially another address space found on
PowerPC 4xx processors, analogous to PIO on x86. DCRs are always 32
bits, and can be identified by a 32-bit number. We forward most DCR
accesses to userspace for emulation (with the exception of CPR0
registers, which can be read directly for simplicity in timebase
frequency determination).

Signed-off-by: Hollis Blanchard <ho...@us...>

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -82,6 +82,7 @@
 #define KVM_EXIT_TPR_ACCESS 12
 #define KVM_EXIT_S390_SIEIC 13
 #define KVM_EXIT_S390_RESET 14
+#define KVM_EXIT_DCR 15
 
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
 struct kvm_run {
@@ -161,6 +162,12 @@
 #define KVM_S390_RESET_CPU_INIT 8
 #define KVM_S390_RESET_IPL 16
 		__u64 s390_reset_flags;
+		/* KVM_EXIT_DCR */
+		struct {
+			__u32 dcrn;
+			__u32 data;
+			__u8  is_write;
+		} dcr;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
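[For context, a sketch of how a userspace VMM might service these exits
after KVM_RUN returns. The dcrn/data/is_write fields come from the
patch above; the emulation helpers are hypothetical stand-ins for the
VMM's DCR device model:]

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helpers standing in for the VMM's DCR device model. */
extern uint32_t emulate_dcr_read(uint32_t dcrn);
extern void emulate_dcr_write(uint32_t dcrn, uint32_t data);

/* One iteration of the vcpu loop; run is the mmap'ed struct kvm_run. */
static void vcpu_run_once(int vcpu_fd, struct kvm_run *run)
{
	ioctl(vcpu_fd, KVM_RUN, 0);

	switch (run->exit_reason) {
	case KVM_EXIT_DCR:
		if (run->dcr.is_write)
			emulate_dcr_write(run->dcr.dcrn, run->dcr.data);
		else
			run->dcr.data = emulate_dcr_read(run->dcr.dcrn);
		break;
	/* ... other exit reasons ... */
	}
}

[On a read, userspace fills in run->dcr.data and the next KVM_RUN
presumably completes the emulated mfdcr, mirroring how MMIO exits
work.]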
From: Hollis B. <ho...@us...> - 2008-04-17 04:28:19
1 file changed, 2 insertions(+)
 include/asm-powerpc/mmu-44x.h | 2 ++

PowerPC 440 KVM needs to know how many TLB entries are used for the
host kernel linear mapping (it does not modify these mappings when
switching between guest and host execution).

Signed-off-by: Hollis Blanchard <ho...@us...>
Acked-by: Josh Boyer <jw...@li...>
Acked-by: Paul Mackerras <pa...@sa...>

diff --git a/include/asm-powerpc/mmu-44x.h b/include/asm-powerpc/mmu-44x.h
--- a/include/asm-powerpc/mmu-44x.h
+++ b/include/asm-powerpc/mmu-44x.h
@@ -53,6 +53,8 @@
 
 #ifndef __ASSEMBLY__
 
+extern unsigned int tlb_44x_hwater;
+
 typedef unsigned long long phys_addr_t;
 
 typedef struct {
From: Hollis B. <ho...@us...> - 2008-04-17 04:28:19
19 files changed, 3160 insertions(+), 2 deletions(-) Documentation/powerpc/kvm_440.txt | 41 + arch/powerpc/Kconfig | 1 arch/powerpc/Kconfig.debug | 3 arch/powerpc/Makefile | 1 arch/powerpc/kernel/asm-offsets.c | 26 + arch/powerpc/kvm/44x_tlb.c | 224 ++++++++++ arch/powerpc/kvm/44x_tlb.h | 91 ++++ arch/powerpc/kvm/Kconfig | 44 ++ arch/powerpc/kvm/Makefile | 15 arch/powerpc/kvm/booke_guest.c | 615 ++++++++++++++++++++++++++++ arch/powerpc/kvm/booke_host.c | 83 +++ arch/powerpc/kvm/booke_interrupts.S | 436 ++++++++++++++++++++ arch/powerpc/kvm/emulate.c | 760 +++++++++++++++++++++++++++++++++++ arch/powerpc/kvm/powerpc.c | 436 ++++++++++++++++++++ include/asm-powerpc/kvm.h | 53 ++ include/asm-powerpc/kvm_asm.h | 55 ++ include/asm-powerpc/kvm_host.h | 152 +++++++ include/asm-powerpc/kvm_para.h | 38 + include/asm-powerpc/kvm_ppc.h | 88 ++++ This functionality is definitely experimental, but is capable of running unmodified PowerPC 440 Linux kernels as guests on a PowerPC 440 host. (Only tested with 440EP "Bamboo" guests so far, but with appropriate userspace support other SoC/board combinations should work.) See Documentation/powerpc/kvm_440.txt for technical details. Signed-off-by: Hollis Blanchard <ho...@us...> Acked-by: Paul Mackerras <pa...@sa...> diff --git a/Documentation/powerpc/kvm_440.txt b/Documentation/powerpc/kvm_440.txt new file mode 100644 --- /dev/null +++ b/Documentation/powerpc/kvm_440.txt @@ -0,0 +1,41 @@ +Hollis Blanchard <ho...@us...> +15 Apr 2008 + +Various notes on the implementation of KVM for PowerPC 440: + +To enforce isolation, host userspace, guest kernel, and guest userspace all +run at user privilege level. Only the host kernel runs in supervisor mode. +Executing privileged instructions in the guest traps into KVM (in the host +kernel), where we decode and emulate them. Through this technique, unmodified +440 Linux kernels can be run (slowly) as guests. Future performance work will +focus on reducing the overhead and frequency of these traps. + +The usual code flow is started from userspace invoking an "run" ioctl, which +causes KVM to switch into guest context. We use IVPR to hijack the host +interrupt vectors while running the guest, which allows us to direct all +interrupts to kvmppc_handle_interrupt(). At this point, we could either +- handle the interrupt completely (e.g. emulate "mtspr SPRG0"), or +- let the host interrupt handler run (e.g. when the decrementer fires), or +- return to host userspace (e.g. when the guest performs device MMIO) + +Address spaces: We take advantage of the fact that Linux doesn't use the AS=1 +address space (in host or guest), which gives us virtual address space to use +for guest mappings. While the guest is running, the host kernel remains mapped +in AS=0, but the guest can only use AS=1 mappings. + +TLB entries: The TLB entries covering the host linear mapping remain +present while running the guest. This reduces the overhead of lightweight +exits, which are handled by KVM running in the host kernel. We keep three +copies of the TLB: + - guest TLB: contents of the TLB as the guest sees it + - shadow TLB: the TLB that is actually in hardware while guest is running + - host TLB: to restore TLB state when context switching guest -> host +When a TLB miss occurs because a mapping was not present in the shadow TLB, +but was present in the guest TLB, KVM handles the fault without invoking the +guest. Large guest pages are backed by multiple 4KB shadow pages through this +mechanism. 
+ +IO: MMIO and DCR accesses are emulated by userspace. We use virtio for network +and block IO, so those drivers must be enabled in the guest. It's possible +that some qemu device emulation (e.g. e1000 or rtl8139) may also work with +little effort. diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -722,3 +722,4 @@ config PPC_LIB_RHEAP bool +source "arch/powerpc/kvm/Kconfig" diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug --- a/arch/powerpc/Kconfig.debug +++ b/arch/powerpc/Kconfig.debug @@ -151,6 +151,9 @@ config PPC_EARLY_DEBUG bool "Early debugging (dangerous)" + # PPC_EARLY_DEBUG on 440 leaves AS=1 mappings above the TLB high water + # mark, which doesn't work with current 440 KVM. + depends on !KVM help Say Y to enable some early debugging facilities that may be available for your processor/board combination. Those facilities are hacks diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -147,6 +147,7 @@ arch/powerpc/platforms/ core-$(CONFIG_MATH_EMULATION) += arch/powerpc/math-emu/ core-$(CONFIG_XMON) += arch/powerpc/xmon/ +core-$(CONFIG_KVM) += arch/powerpc/kvm/ drivers-$(CONFIG_OPROFILE) += arch/powerpc/oprofile/ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -23,6 +23,7 @@ #include <linux/mm.h> #include <linux/suspend.h> #include <linux/hrtimer.h> +#include <linux/kvm_host.h> #ifdef CONFIG_PPC64 #include <linux/time.h> #include <linux/hardirq.h> @@ -329,5 +330,30 @@ DEFINE(PGD_TABLE_SIZE, PGD_TABLE_SIZE); +#ifdef CONFIG_KVM + DEFINE(TLBE_BYTES, sizeof(struct tlbe)); + + DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, arch.host_stack)); + DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid)); + DEFINE(VCPU_HOST_TLB, offsetof(struct kvm_vcpu, arch.host_tlb)); + DEFINE(VCPU_SHADOW_TLB, offsetof(struct kvm_vcpu, arch.shadow_tlb)); + DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr)); + DEFINE(VCPU_LR, offsetof(struct kvm_vcpu, arch.lr)); + DEFINE(VCPU_CR, offsetof(struct kvm_vcpu, arch.cr)); + DEFINE(VCPU_XER, offsetof(struct kvm_vcpu, arch.xer)); + DEFINE(VCPU_CTR, offsetof(struct kvm_vcpu, arch.ctr)); + DEFINE(VCPU_PC, offsetof(struct kvm_vcpu, arch.pc)); + DEFINE(VCPU_MSR, offsetof(struct kvm_vcpu, arch.msr)); + DEFINE(VCPU_SPRG4, offsetof(struct kvm_vcpu, arch.sprg4)); + DEFINE(VCPU_SPRG5, offsetof(struct kvm_vcpu, arch.sprg5)); + DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6)); + DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7)); + DEFINE(VCPU_PID, offsetof(struct kvm_vcpu, arch.pid)); + + DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst)); + DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear)); + DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr)); +#endif + return 0; } diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c new file mode 100644 --- /dev/null +++ b/arch/powerpc/kvm/44x_tlb.c @@ -0,0 +1,224 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 2007 + * + * Authors: Hollis Blanchard <ho...@us...> + */ + +#include <linux/types.h> +#include <linux/string.h> +#include <linux/kvm_host.h> +#include <linux/highmem.h> +#include <asm/mmu-44x.h> +#include <asm/kvm_ppc.h> + +#include "44x_tlb.h" + +#define PPC44x_TLB_USER_PERM_MASK (PPC44x_TLB_UX|PPC44x_TLB_UR|PPC44x_TLB_UW) +#define PPC44x_TLB_SUPER_PERM_MASK (PPC44x_TLB_SX|PPC44x_TLB_SR|PPC44x_TLB_SW) + +static unsigned int kvmppc_tlb_44x_pos; + +static u32 kvmppc_44x_tlb_shadow_attrib(u32 attrib, int usermode) +{ + /* Mask off reserved bits. */ + attrib &= PPC44x_TLB_PERM_MASK|PPC44x_TLB_ATTR_MASK; + + if (!usermode) { + /* Guest is in supervisor mode, so we need to translate guest + * supervisor permissions into user permissions. */ + attrib &= ~PPC44x_TLB_USER_PERM_MASK; + attrib |= (attrib & PPC44x_TLB_SUPER_PERM_MASK) << 3; + } + + /* Make sure host can always access this memory. */ + attrib |= PPC44x_TLB_SX|PPC44x_TLB_SR|PPC44x_TLB_SW; + + return attrib; +} + +/* Search the guest TLB for a matching entry. */ +int kvmppc_44x_tlb_index(struct kvm_vcpu *vcpu, gva_t eaddr, unsigned int pid, + unsigned int as) +{ + int i; + + /* XXX Replace loop with fancy data structures. */ + for (i = 0; i < PPC44x_TLB_SIZE; i++) { + struct tlbe *tlbe = &vcpu->arch.guest_tlb[i]; + unsigned int tid; + + if (eaddr < get_tlb_eaddr(tlbe)) + continue; + + if (eaddr > get_tlb_end(tlbe)) + continue; + + tid = get_tlb_tid(tlbe); + if (tid && (tid != pid)) + continue; + + if (!get_tlb_v(tlbe)) + continue; + + if (get_tlb_ts(tlbe) != as) + continue; + + return i; + } + + return -1; +} + +struct tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu, gva_t eaddr) +{ + unsigned int as = !!(vcpu->arch.msr & MSR_IS); + unsigned int index; + + index = kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as); + if (index == -1) + return NULL; + return &vcpu->arch.guest_tlb[index]; +} + +struct tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu, gva_t eaddr) +{ + unsigned int as = !!(vcpu->arch.msr & MSR_DS); + unsigned int index; + + index = kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as); + if (index == -1) + return NULL; + return &vcpu->arch.guest_tlb[index]; +} + +static int kvmppc_44x_tlbe_is_writable(struct tlbe *tlbe) +{ + return tlbe->word2 & (PPC44x_TLB_SW|PPC44x_TLB_UW); +} + +/* Must be called with mmap_sem locked for writing. */ +static void kvmppc_44x_shadow_release(struct kvm_vcpu *vcpu, + unsigned int index) +{ + struct tlbe *stlbe = &vcpu->arch.shadow_tlb[index]; + struct page *page = vcpu->arch.shadow_pages[index]; + + kunmap(vcpu->arch.shadow_pages[index]); + + if (get_tlb_v(stlbe)) { + if (kvmppc_44x_tlbe_is_writable(stlbe)) + kvm_release_page_dirty(page); + else + kvm_release_page_clean(page); + } +} + +/* Caller must ensure that the specified guest TLB entry is safe to insert into + * the shadow TLB. */ +void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn, u64 asid, + u32 flags) +{ + struct page *new_page; + struct tlbe *stlbe; + hpa_t hpaddr; + unsigned int victim; + + /* Future optimization: don't overwrite the TLB entry containing the + * current PC (or stack?). 
*/ + victim = kvmppc_tlb_44x_pos++; + if (kvmppc_tlb_44x_pos > tlb_44x_hwater) + kvmppc_tlb_44x_pos = 0; + stlbe = &vcpu->arch.shadow_tlb[victim]; + + /* Get reference to new page. */ + down_write(¤t->mm->mmap_sem); + new_page = gfn_to_page(vcpu->kvm, gfn); + if (is_error_page(new_page)) { + printk(KERN_ERR "Couldn't get guest page!\n"); + kvm_release_page_clean(new_page); + return; + } + hpaddr = page_to_phys(new_page); + + /* Drop reference to old page. */ + kvmppc_44x_shadow_release(vcpu, victim); + up_write(¤t->mm->mmap_sem); + + vcpu->arch.shadow_pages[victim] = new_page; + + /* XXX Make sure (va, size) doesn't overlap any other + * entries. 440x6 user manual says the result would be + * "undefined." */ + + /* XXX what about AS? */ + + stlbe->tid = asid & 0xff; + + /* Force TS=1 for all guest mappings. */ + /* For now we hardcode 4KB mappings, but it will be important to + * use host large pages in the future. */ + stlbe->word0 = (gvaddr & PAGE_MASK) | PPC44x_TLB_VALID | PPC44x_TLB_TS + | PPC44x_TLB_4K; + + stlbe->word1 = (hpaddr & 0xfffffc00) | ((hpaddr >> 32) & 0xf); + stlbe->word2 = kvmppc_44x_tlb_shadow_attrib(flags, + vcpu->arch.msr & MSR_PR); +} + +void kvmppc_mmu_invalidate(struct kvm_vcpu *vcpu, u64 eaddr, u64 asid) +{ + unsigned int pid = asid & 0xff; + int i; + + /* XXX Replace loop with fancy data structures. */ + down_write(¤t->mm->mmap_sem); + for (i = 0; i <= tlb_44x_hwater; i++) { + struct tlbe *stlbe = &vcpu->arch.shadow_tlb[i]; + unsigned int tid; + + if (!get_tlb_v(stlbe)) + continue; + + if (eaddr < get_tlb_eaddr(stlbe)) + continue; + + if (eaddr > get_tlb_end(stlbe)) + continue; + + tid = get_tlb_tid(stlbe); + if (tid && (tid != pid)) + continue; + + kvmppc_44x_shadow_release(vcpu, i); + stlbe->word0 = 0; + } + up_write(¤t->mm->mmap_sem); +} + +/* Invalidate all mappings, so that when they fault back in they will get the + * proper permission bits. */ +void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode) +{ + int i; + + /* XXX Replace loop with fancy data structures. */ + down_write(¤t->mm->mmap_sem); + for (i = 0; i <= tlb_44x_hwater; i++) { + kvmppc_44x_shadow_release(vcpu, i); + vcpu->arch.shadow_tlb[i].word0 = 0; + } + up_write(¤t->mm->mmap_sem); +} diff --git a/arch/powerpc/kvm/44x_tlb.h b/arch/powerpc/kvm/44x_tlb.h new file mode 100644 --- /dev/null +++ b/arch/powerpc/kvm/44x_tlb.h @@ -0,0 +1,91 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 
2007 + * + * Authors: Hollis Blanchard <ho...@us...> + */ + +#ifndef __KVM_POWERPC_TLB_H__ +#define __KVM_POWERPC_TLB_H__ + +#include <linux/kvm_host.h> +#include <asm/mmu-44x.h> + +extern int kvmppc_44x_tlb_index(struct kvm_vcpu *vcpu, gva_t eaddr, + unsigned int pid, unsigned int as); +extern struct tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu, gva_t eaddr); +extern struct tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu, gva_t eaddr); + +/* TLB helper functions */ +static inline unsigned int get_tlb_size(const struct tlbe *tlbe) +{ + return (tlbe->word0 >> 4) & 0xf; +} + +static inline gva_t get_tlb_eaddr(const struct tlbe *tlbe) +{ + return tlbe->word0 & 0xfffffc00; +} + +static inline gva_t get_tlb_bytes(const struct tlbe *tlbe) +{ + unsigned int pgsize = get_tlb_size(tlbe); + return 1 << 10 << (pgsize << 1); +} + +static inline gva_t get_tlb_end(const struct tlbe *tlbe) +{ + return get_tlb_eaddr(tlbe) + get_tlb_bytes(tlbe) - 1; +} + +static inline u64 get_tlb_raddr(const struct tlbe *tlbe) +{ + u64 word1 = tlbe->word1; + return ((word1 & 0xf) << 32) | (word1 & 0xfffffc00); +} + +static inline unsigned int get_tlb_tid(const struct tlbe *tlbe) +{ + return tlbe->tid & 0xff; +} + +static inline unsigned int get_tlb_ts(const struct tlbe *tlbe) +{ + return (tlbe->word0 >> 8) & 0x1; +} + +static inline unsigned int get_tlb_v(const struct tlbe *tlbe) +{ + return (tlbe->word0 >> 9) & 0x1; +} + +static inline unsigned int get_mmucr_stid(const struct kvm_vcpu *vcpu) +{ + return vcpu->arch.mmucr & 0xff; +} + +static inline unsigned int get_mmucr_sts(const struct kvm_vcpu *vcpu) +{ + return (vcpu->arch.mmucr >> 16) & 0x1; +} + +static inline gpa_t tlb_xlate(struct tlbe *tlbe, gva_t eaddr) +{ + unsigned int pgmask = get_tlb_bytes(tlbe) - 1; + + return get_tlb_raddr(tlbe) | (eaddr & pgmask); +} + +#endif /* __KVM_POWERPC_TLB_H__ */ diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig new file mode 100644 --- /dev/null +++ b/arch/powerpc/kvm/Kconfig @@ -0,0 +1,44 @@ +# +# KVM configuration +# + +menuconfig VIRTUALIZATION + bool "Virtualization" + ---help--- + Say Y here to get to see options for using your Linux host to run + other operating systems inside virtual machines (guests). + This option alone does not add any kernel code. + + If you say N, all options in this submenu will be skipped and + disabled. + +if VIRTUALIZATION + +config KVM + tristate "Kernel-based Virtual Machine (KVM) support" + depends on EXPERIMENTAL + select PREEMPT_NOTIFIERS + select ANON_INODES + ---help--- + Support hosting virtualized guest machines. You will also + need to select one or more of the processor modules below. + + This module provides access to the hardware capabilities through + a character device node named /dev/kvm. + + To compile this as a module, choose M here: the module + will be called kvm. + + If unsure, say N. + +config KVM_BOOKE_HOST + tristate "KVM host support for Book E PowerPC processors" + depends on KVM && 44x + ---help--- + Provides host support for KVM on Book E PowerPC processors. Currently + this works on 440 processors only. 
+ +source drivers/virtio/Kconfig + +endif # VIRTUALIZATION + diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile new file mode 100644 --- /dev/null +++ b/arch/powerpc/kvm/Makefile @@ -0,0 +1,15 @@ +# +# Makefile for Kernel-based Virtual Machine module +# + +EXTRA_CFLAGS += -Ivirt/kvm -Iarch/powerpc/kvm + +common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o) + +kvm-objs := $(common-objs) powerpc.o emulate.o booke_guest.o +obj-$(CONFIG_KVM) += kvm.o + +AFLAGS_booke_interrupts.o := -I$(obj) + +kvm-booke-host-objs := booke_host.o booke_interrupts.o 44x_tlb.o +obj-$(CONFIG_KVM_BOOKE_HOST) += kvm-booke-host.o diff --git a/arch/powerpc/kvm/booke_guest.c b/arch/powerpc/kvm/booke_guest.c new file mode 100644 --- /dev/null +++ b/arch/powerpc/kvm/booke_guest.c @@ -0,0 +1,615 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 2007 + * + * Authors: Hollis Blanchard <ho...@us...> + * Christian Ehrhardt <ehr...@li...> + */ + +#include <linux/errno.h> +#include <linux/err.h> +#include <linux/kvm_host.h> +#include <linux/module.h> +#include <linux/vmalloc.h> +#include <linux/fs.h> +#include <asm/cputable.h> +#include <asm/uaccess.h> +#include <asm/kvm_ppc.h> + +#include "44x_tlb.h" + +#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM +#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU + +struct kvm_stats_debugfs_item debugfs_entries[] = { + { "exits", VCPU_STAT(sum_exits) }, + { "mmio", VCPU_STAT(mmio_exits) }, + { "dcr", VCPU_STAT(dcr_exits) }, + { "sig", VCPU_STAT(signal_exits) }, + { "light", VCPU_STAT(light_exits) }, + { "itlb_r", VCPU_STAT(itlb_real_miss_exits) }, + { "itlb_v", VCPU_STAT(itlb_virt_miss_exits) }, + { "dtlb_r", VCPU_STAT(dtlb_real_miss_exits) }, + { "dtlb_v", VCPU_STAT(dtlb_virt_miss_exits) }, + { "sysc", VCPU_STAT(syscall_exits) }, + { "isi", VCPU_STAT(isi_exits) }, + { "dsi", VCPU_STAT(dsi_exits) }, + { "inst_emu", VCPU_STAT(emulated_inst_exits) }, + { "dec", VCPU_STAT(dec_exits) }, + { "ext_intr", VCPU_STAT(ext_intr_exits) }, + { NULL } +}; + +static const u32 interrupt_msr_mask[16] = { + [BOOKE_INTERRUPT_CRITICAL] = MSR_ME, + [BOOKE_INTERRUPT_MACHINE_CHECK] = 0, + [BOOKE_INTERRUPT_DATA_STORAGE] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_INST_STORAGE] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_EXTERNAL] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_ALIGNMENT] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_PROGRAM] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_FP_UNAVAIL] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_SYSCALL] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_AP_UNAVAIL] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_DECREMENTER] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_FIT] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_WATCHDOG] = MSR_ME, + [BOOKE_INTERRUPT_DTLB_MISS] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_ITLB_MISS] = MSR_CE|MSR_ME|MSR_DE, + [BOOKE_INTERRUPT_DEBUG] = MSR_ME, +}; + +const unsigned char 
exception_priority[] = { + [BOOKE_INTERRUPT_DATA_STORAGE] = 0, + [BOOKE_INTERRUPT_INST_STORAGE] = 1, + [BOOKE_INTERRUPT_ALIGNMENT] = 2, + [BOOKE_INTERRUPT_PROGRAM] = 3, + [BOOKE_INTERRUPT_FP_UNAVAIL] = 4, + [BOOKE_INTERRUPT_SYSCALL] = 5, + [BOOKE_INTERRUPT_AP_UNAVAIL] = 6, + [BOOKE_INTERRUPT_DTLB_MISS] = 7, + [BOOKE_INTERRUPT_ITLB_MISS] = 8, + [BOOKE_INTERRUPT_MACHINE_CHECK] = 9, + [BOOKE_INTERRUPT_DEBUG] = 10, + [BOOKE_INTERRUPT_CRITICAL] = 11, + [BOOKE_INTERRUPT_WATCHDOG] = 12, + [BOOKE_INTERRUPT_EXTERNAL] = 13, + [BOOKE_INTERRUPT_FIT] = 14, + [BOOKE_INTERRUPT_DECREMENTER] = 15, +}; + +const unsigned char priority_exception[] = { + BOOKE_INTERRUPT_DATA_STORAGE, + BOOKE_INTERRUPT_INST_STORAGE, + BOOKE_INTERRUPT_ALIGNMENT, + BOOKE_INTERRUPT_PROGRAM, + BOOKE_INTERRUPT_FP_UNAVAIL, + BOOKE_INTERRUPT_SYSCALL, + BOOKE_INTERRUPT_AP_UNAVAIL, + BOOKE_INTERRUPT_DTLB_MISS, + BOOKE_INTERRUPT_ITLB_MISS, + BOOKE_INTERRUPT_MACHINE_CHECK, + BOOKE_INTERRUPT_DEBUG, + BOOKE_INTERRUPT_CRITICAL, + BOOKE_INTERRUPT_WATCHDOG, + BOOKE_INTERRUPT_EXTERNAL, + BOOKE_INTERRUPT_FIT, + BOOKE_INTERRUPT_DECREMENTER, +}; + + +void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu) +{ + struct tlbe *tlbe; + int i; + + printk("vcpu %d TLB dump:\n", vcpu->vcpu_id); + printk("| %2s | %3s | %8s | %8s | %8s |\n", + "nr", "tid", "word0", "word1", "word2"); + + for (i = 0; i < PPC44x_TLB_SIZE; i++) { + tlbe = &vcpu->arch.guest_tlb[i]; + if (tlbe->word0 & PPC44x_TLB_VALID) + printk(" G%2d | %02X | %08X | %08X | %08X |\n", + i, tlbe->tid, tlbe->word0, tlbe->word1, + tlbe->word2); + } + + for (i = 0; i < PPC44x_TLB_SIZE; i++) { + tlbe = &vcpu->arch.shadow_tlb[i]; + if (tlbe->word0 & PPC44x_TLB_VALID) + printk(" S%2d | %02X | %08X | %08X | %08X |\n", + i, tlbe->tid, tlbe->word0, tlbe->word1, + tlbe->word2); + } +} + +/* TODO: use vcpu_printf() */ +void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu) +{ + int i; + + printk("pc: %08x msr: %08x\n", vcpu->arch.pc, vcpu->arch.msr); + printk("lr: %08x ctr: %08x\n", vcpu->arch.lr, vcpu->arch.ctr); + printk("srr0: %08x srr1: %08x\n", vcpu->arch.srr0, vcpu->arch.srr1); + + printk("exceptions: %08lx\n", vcpu->arch.pending_exceptions); + + for (i = 0; i < 32; i += 4) { + printk("gpr%02d: %08x %08x %08x %08x\n", i, + vcpu->arch.gpr[i], + vcpu->arch.gpr[i+1], + vcpu->arch.gpr[i+2], + vcpu->arch.gpr[i+3]); + } +} + +/* Check if we are ready to deliver the interrupt */ +static int kvmppc_can_deliver_interrupt(struct kvm_vcpu *vcpu, int interrupt) +{ + int r; + + switch (interrupt) { + case BOOKE_INTERRUPT_CRITICAL: + r = vcpu->arch.msr & MSR_CE; + break; + case BOOKE_INTERRUPT_MACHINE_CHECK: + r = vcpu->arch.msr & MSR_ME; + break; + case BOOKE_INTERRUPT_EXTERNAL: + r = vcpu->arch.msr & MSR_EE; + break; + case BOOKE_INTERRUPT_DECREMENTER: + r = vcpu->arch.msr & MSR_EE; + break; + case BOOKE_INTERRUPT_FIT: + r = vcpu->arch.msr & MSR_EE; + break; + case BOOKE_INTERRUPT_WATCHDOG: + r = vcpu->arch.msr & MSR_CE; + break; + case BOOKE_INTERRUPT_DEBUG: + r = vcpu->arch.msr & MSR_DE; + break; + default: + r = 1; + } + + return r; +} + +static void kvmppc_deliver_interrupt(struct kvm_vcpu *vcpu, int interrupt) +{ + switch (interrupt) { + case BOOKE_INTERRUPT_DECREMENTER: + vcpu->arch.tsr |= TSR_DIS; + break; + } + + vcpu->arch.srr0 = vcpu->arch.pc; + vcpu->arch.srr1 = vcpu->arch.msr; + vcpu->arch.pc = vcpu->arch.ivpr | vcpu->arch.ivor[interrupt]; + kvmppc_set_msr(vcpu, vcpu->arch.msr & interrupt_msr_mask[interrupt]); +} + +/* Check pending exceptions and deliver one, if possible. 
*/ +void kvmppc_check_and_deliver_interrupts(struct kvm_vcpu *vcpu) +{ + unsigned long *pending = &vcpu->arch.pending_exceptions; + unsigned int exception; + unsigned int priority; + + priority = find_first_bit(pending, BITS_PER_BYTE * sizeof(*pending)); + while (priority <= BOOKE_MAX_INTERRUPT) { + exception = priority_exception[priority]; + if (kvmppc_can_deliver_interrupt(vcpu, exception)) { + kvmppc_clear_exception(vcpu, exception); + kvmppc_deliver_interrupt(vcpu, exception); + break; + } + + priority = find_next_bit(pending, + BITS_PER_BYTE * sizeof(*pending), + priority + 1); + } +} + +static int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu) +{ + enum emulation_result er; + int r; + + er = kvmppc_emulate_instruction(run, vcpu); + switch (er) { + case EMULATE_DONE: + /* Future optimization: only reload non-volatiles if they were + * actually modified. */ + r = RESUME_GUEST_NV; + break; + case EMULATE_DO_MMIO: + run->exit_reason = KVM_EXIT_MMIO; + /* We must reload nonvolatiles because "update" load/store + * instructions modify register state. */ + /* Future optimization: only reload non-volatiles if they were + * actually modified. */ + r = RESUME_HOST_NV; + break; + case EMULATE_FAIL: + /* XXX Deliver Program interrupt to guest. */ + printk(KERN_EMERG "%s: emulation failed (%08x)\n", __func__, + vcpu->arch.last_inst); + r = RESUME_HOST; + break; + default: + BUG(); + } + + return r; +} + +/** + * kvmppc_handle_exit + * + * Return value is in the form (errcode<<2 | RESUME_FLAG_HOST | RESUME_FLAG_NV) + */ +int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, + unsigned int exit_nr) +{ + enum emulation_result er; + int r = RESUME_HOST; + + local_irq_enable(); + + run->exit_reason = KVM_EXIT_UNKNOWN; + run->ready_for_interrupt_injection = 1; + + switch (exit_nr) { + case BOOKE_INTERRUPT_MACHINE_CHECK: + printk("MACHINE CHECK: %lx\n", mfspr(SPRN_MCSR)); + kvmppc_dump_vcpu(vcpu); + r = RESUME_HOST; + break; + + case BOOKE_INTERRUPT_EXTERNAL: + case BOOKE_INTERRUPT_DECREMENTER: + /* Since we switched IVPR back to the host's value, the host + * handled this interrupt the moment we enabled interrupts. + * Now we just offer it a chance to reschedule the guest. */ + + /* XXX At this point the TLB still holds our shadow TLB, so if + * we do reschedule the host will fault over it. Perhaps we + * should politely restore the host's entries to minimize + * misses before ceding control. */ + if (need_resched()) + cond_resched(); + if (exit_nr == BOOKE_INTERRUPT_DECREMENTER) + vcpu->stat.dec_exits++; + else + vcpu->stat.ext_intr_exits++; + r = RESUME_GUEST; + break; + + case BOOKE_INTERRUPT_PROGRAM: + if (vcpu->arch.msr & MSR_PR) { + /* Program traps generated by user-level software must be handled + * by the guest kernel. */ + vcpu->arch.esr = vcpu->arch.fault_esr; + kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_PROGRAM); + r = RESUME_GUEST; + break; + } + + er = kvmppc_emulate_instruction(run, vcpu); + switch (er) { + case EMULATE_DONE: + /* Future optimization: only reload non-volatiles if + * they were actually modified by emulation. */ + vcpu->stat.emulated_inst_exits++; + r = RESUME_GUEST_NV; + break; + case EMULATE_DO_DCR: + run->exit_reason = KVM_EXIT_DCR; + r = RESUME_HOST; + break; + case EMULATE_FAIL: + /* XXX Deliver Program interrupt to guest. */ + printk(KERN_CRIT "%s: emulation at %x failed (%08x)\n", + __func__, vcpu->arch.pc, vcpu->arch.last_inst); + /* For debugging, encode the failing instruction and + * report it to userspace. 
*/ + run->hw.hardware_exit_reason = ~0ULL << 32; + run->hw.hardware_exit_reason |= vcpu->arch.last_inst; + r = RESUME_HOST; + break; + default: + BUG(); + } + break; + + case BOOKE_INTERRUPT_DATA_STORAGE: + vcpu->arch.dear = vcpu->arch.fault_dear; + vcpu->arch.esr = vcpu->arch.fault_esr; + kvmppc_queue_exception(vcpu, exit_nr); + vcpu->stat.dsi_exits++; + r = RESUME_GUEST; + break; + + case BOOKE_INTERRUPT_INST_STORAGE: + vcpu->arch.esr = vcpu->arch.fault_esr; + kvmppc_queue_exception(vcpu, exit_nr); + vcpu->stat.isi_exits++; + r = RESUME_GUEST; + break; + + case BOOKE_INTERRUPT_SYSCALL: + kvmppc_queue_exception(vcpu, exit_nr); + vcpu->stat.syscall_exits++; + r = RESUME_GUEST; + break; + + case BOOKE_INTERRUPT_DTLB_MISS: { + struct tlbe *gtlbe; + unsigned long eaddr = vcpu->arch.fault_dear; + gfn_t gfn; + + /* Check the guest TLB. */ + gtlbe = kvmppc_44x_dtlb_search(vcpu, eaddr); + if (!gtlbe) { + /* The guest didn't have a mapping for it. */ + kvmppc_queue_exception(vcpu, exit_nr); + vcpu->arch.dear = vcpu->arch.fault_dear; + vcpu->arch.esr = vcpu->arch.fault_esr; + vcpu->stat.dtlb_real_miss_exits++; + r = RESUME_GUEST; + break; + } + + vcpu->arch.paddr_accessed = tlb_xlate(gtlbe, eaddr); + gfn = vcpu->arch.paddr_accessed >> PAGE_SHIFT; + + if (kvm_is_visible_gfn(vcpu->kvm, gfn)) { + /* The guest TLB had a mapping, but the shadow TLB + * didn't, and it is RAM. This could be because: + * a) the entry is mapping the host kernel, or + * b) the guest used a large mapping which we're faking + * Either way, we need to satisfy the fault without + * invoking the guest. */ + kvmppc_mmu_map(vcpu, eaddr, gfn, gtlbe->tid, + gtlbe->word2); + vcpu->stat.dtlb_virt_miss_exits++; + r = RESUME_GUEST; + } else { + /* Guest has mapped and accessed a page which is not + * actually RAM. */ + r = kvmppc_emulate_mmio(run, vcpu); + } + + break; + } + + case BOOKE_INTERRUPT_ITLB_MISS: { + struct tlbe *gtlbe; + unsigned long eaddr = vcpu->arch.pc; + gfn_t gfn; + + r = RESUME_GUEST; + + /* Check the guest TLB. */ + gtlbe = kvmppc_44x_itlb_search(vcpu, eaddr); + if (!gtlbe) { + /* The guest didn't have a mapping for it. */ + kvmppc_queue_exception(vcpu, exit_nr); + vcpu->stat.itlb_real_miss_exits++; + break; + } + + vcpu->stat.itlb_virt_miss_exits++; + + gfn = tlb_xlate(gtlbe, eaddr) >> PAGE_SHIFT; + + if (kvm_is_visible_gfn(vcpu->kvm, gfn)) { + /* The guest TLB had a mapping, but the shadow TLB + * didn't. This could be because: + * a) the entry is mapping the host kernel, or + * b) the guest used a large mapping which we're faking + * Either way, we need to satisfy the fault without + * invoking the guest. */ + kvmppc_mmu_map(vcpu, eaddr, gfn, gtlbe->tid, + gtlbe->word2); + } else { + /* Guest mapped and leaped at non-RAM! */ + kvmppc_queue_exception(vcpu, + BOOKE_INTERRUPT_MACHINE_CHECK); + } + + break; + } + + default: + printk(KERN_EMERG "exit_nr %d\n", exit_nr); + BUG(); + } + + local_irq_disable(); + + kvmppc_check_and_deliver_interrupts(vcpu); + + /* Do some exit accounting. */ + vcpu->stat.sum_exits++; + if (!(r & RESUME_HOST)) { + /* To avoid clobbering exit_reason, only check for signals if + * we aren't already exiting to userspace for some other + * reason. 
*/ + if (signal_pending(current)) { + run->exit_reason = KVM_EXIT_INTR; + r = (-EINTR << 2) | RESUME_HOST | (r & RESUME_FLAG_NV); + + vcpu->stat.signal_exits++; + } else { + vcpu->stat.light_exits++; + } + } else { + switch (run->exit_reason) { + case KVM_EXIT_MMIO: + vcpu->stat.mmio_exits++; + break; + case KVM_EXIT_DCR: + vcpu->stat.dcr_exits++; + break; + case KVM_EXIT_INTR: + vcpu->stat.signal_exits++; + break; + } + } + + return r; +} + +/* Initial guest state: 16MB mapping 0 -> 0, PC = 0, MSR = 0, R1 = 16MB */ +int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) +{ + struct tlbe *tlbe = &vcpu->arch.guest_tlb[0]; + + tlbe->tid = 0; + tlbe->word0 = PPC44x_TLB_16M | PPC44x_TLB_VALID; + tlbe->word1 = 0; + tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR; + + tlbe++; + tlbe->tid = 0; + tlbe->word0 = 0xef600000 | PPC44x_TLB_4K | PPC44x_TLB_VALID; + tlbe->word1 = 0xef600000; + tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR + | PPC44x_TLB_I | PPC44x_TLB_G; + + vcpu->arch.pc = 0; + vcpu->arch.msr = 0; + vcpu->arch.gpr[1] = (16<<20) - 8; /* -8 for the callee-save LR slot */ + + /* Eye-catching number so we know if the guest takes an interrupt + * before it's programmed its own IVPR. */ + vcpu->arch.ivpr = 0x55550000; + + /* Since the guest can directly access the timebase, it must know the + * real timebase frequency. Accordingly, it must see the state of + * CCR1[TCS]. */ + vcpu->arch.ccr1 = mfspr(SPRN_CCR1); + + return 0; +} + +int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) +{ + int i; + + regs->pc = vcpu->arch.pc; + regs->cr = vcpu->arch.cr; + regs->ctr = vcpu->arch.ctr; + regs->lr = vcpu->arch.lr; + regs->xer = vcpu->arch.xer; + regs->msr = vcpu->arch.msr; + regs->srr0 = vcpu->arch.srr0; + regs->srr1 = vcpu->arch.srr1; + regs->pid = vcpu->arch.pid; + regs->sprg0 = vcpu->arch.sprg0; + regs->sprg1 = vcpu->arch.sprg1; + regs->sprg2 = vcpu->arch.sprg2; + regs->sprg3 = vcpu->arch.sprg3; + regs->sprg5 = vcpu->arch.sprg4; + regs->sprg6 = vcpu->arch.sprg5; + regs->sprg7 = vcpu->arch.sprg6; + + for (i = 0; i < ARRAY_SIZE(regs->gpr); i++) + regs->gpr[i] = vcpu->arch.gpr[i]; + + return 0; +} + +int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) +{ + int i; + + vcpu->arch.pc = regs->pc; + vcpu->arch.cr = regs->cr; + vcpu->arch.ctr = regs->ctr; + vcpu->arch.lr = regs->lr; + vcpu->arch.xer = regs->xer; + vcpu->arch.msr = regs->msr; + vcpu->arch.srr0 = regs->srr0; + vcpu->arch.srr1 = regs->srr1; + vcpu->arch.sprg0 = regs->sprg0; + vcpu->arch.sprg1 = regs->sprg1; + vcpu->arch.sprg2 = regs->sprg2; + vcpu->arch.sprg3 = regs->sprg3; + vcpu->arch.sprg5 = regs->sprg4; + vcpu->arch.sprg6 = regs->sprg5; + vcpu->arch.sprg7 = regs->sprg6; + + for (i = 0; i < ARRAY_SIZE(vcpu->arch.gpr); i++) + vcpu->arch.gpr[i] = regs->gpr[i]; + + return 0; +} + +int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu, + struct kvm_sregs *sregs) +{ + return -ENOTSUPP; +} + +int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, + struct kvm_sregs *sregs) +{ + return -ENOTSUPP; +} + +int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) +{ + return -ENOTSUPP; +} + +int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) +{ + return -ENOTSUPP; +} + +/* 'linear_address' is actually an encoding of AS|PID|EADDR . 
*/ +int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, + struct kvm_translation *tr) +{ + struct tlbe *gtlbe; + int index; + gva_t eaddr; + u8 pid; + u8 as; + + eaddr = tr->linear_address; + pid = (tr->linear_address >> 32) & 0xff; + as = (tr->linear_address >> 40) & 0x1; + + index = kvmppc_44x_tlb_index(vcpu, eaddr, pid, as); + if (index == -1) { + tr->valid = 0; + return 0; + } + + gtlbe = &vcpu->arch.guest_tlb[index]; + + tr->physical_address = tlb_xlate(gtlbe, eaddr); + /* XXX what does "writeable" and "usermode" even mean? */ + tr->valid = 1; + + return 0; +} diff --git a/arch/powerpc/kvm/booke_host.c b/arch/powerpc/kvm/booke_host.c new file mode 100644 --- /dev/null +++ b/arch/powerpc/kvm/booke_host.c @@ -0,0 +1,83 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 2008 + * + * Authors: Hollis Blanchard <ho...@us...> + */ + +#include <linux/errno.h> +#include <linux/kvm_host.h> +#include <linux/module.h> +#include <asm/cacheflush.h> +#include <asm/kvm_ppc.h> + +unsigned long kvmppc_booke_handlers; + +static int kvmppc_booke_init(void) +{ + unsigned long ivor[16]; + unsigned long max_ivor = 0; + int i; + + /* We install our own exception handlers by hijacking IVPR. IVPR must + * be 16-bit aligned, so we need a 64KB allocation. */ + kvmppc_booke_handlers = __get_free_pages(GFP_KERNEL | __GFP_ZERO, + VCPU_SIZE_ORDER); + if (!kvmppc_booke_handlers) + return -ENOMEM; + + /* XXX make sure our handlers are smaller than Linux's */ + + /* Copy our interrupt handlers to match host IVORs. That way we don't + * have to swap the IVORs on every guest/host transition. 
*/ + ivor[0] = mfspr(SPRN_IVOR0); + ivor[1] = mfspr(SPRN_IVOR1); + ivor[2] = mfspr(SPRN_IVOR2); + ivor[3] = mfspr(SPRN_IVOR3); + ivor[4] = mfspr(SPRN_IVOR4); + ivor[5] = mfspr(SPRN_IVOR5); + ivor[6] = mfspr(SPRN_IVOR6); + ivor[7] = mfspr(SPRN_IVOR7); + ivor[8] = mfspr(SPRN_IVOR8); + ivor[9] = mfspr(SPRN_IVOR9); + ivor[10] = mfspr(SPRN_IVOR10); + ivor[11] = mfspr(SPRN_IVOR11); + ivor[12] = mfspr(SPRN_IVOR12); + ivor[13] = mfspr(SPRN_IVOR13); + ivor[14] = mfspr(SPRN_IVOR14); + ivor[15] = mfspr(SPRN_IVOR15); + + for (i = 0; i < 16; i++) { + if (ivor[i] > max_ivor) + max_ivor = ivor[i]; + + memcpy((void *)kvmppc_booke_handlers + ivor[i], + kvmppc_handlers_start + i * kvmppc_handler_len, + kvmppc_handler_len); + } + flush_icache_range(kvmppc_booke_handlers, + kvmppc_booke_handlers + max_ivor + kvmppc_handler_len); + + return kvm_init(NULL, sizeof(struct kvm_vcpu), THIS_MODULE); +} + +static void __exit kvmppc_booke_exit(void) +{ + free_pages(kvmppc_booke_handlers, VCPU_SIZE_ORDER); + kvm_exit(); +} + +module_init(kvmppc_booke_init) +module_exit(kvmppc_booke_exit) diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S new file mode 100644 --- /dev/null +++ b/arch/powerpc/kvm/booke_interrupts.S @@ -0,0 +1,436 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 2007 + * + * Authors: Hollis Blanchard <ho...@us...> + */ + +#include <asm/ppc_asm.h> +#include <asm/kvm_asm.h> +#include <asm/reg.h> +#include <asm/mmu-44x.h> +#include <asm/page.h> +#include <asm/asm-offsets.h> + +#define KVMPPC_MSR_MASK (MSR_CE|MSR_EE|MSR_PR|MSR_DE|MSR_ME|MSR_IS|MSR_DS) + +#define VCPU_GPR(n) (VCPU_GPRS + (n * 4)) + +/* The host stack layout: */ +#define HOST_R1 0 /* Implied by stwu. */ +#define HOST_CALLEE_LR 4 +#define HOST_RUN 8 +/* r2 is special: it holds 'current', and it made nonvolatile in the + * kernel with the -ffixed-r2 gcc option. */ +#define HOST_R2 12 +#define HOST_NV_GPRS 16 +#define HOST_NV_GPR(n) (HOST_NV_GPRS + ((n - 14) * 4)) +#define HOST_MIN_STACK_SIZE (HOST_NV_GPR(31) + 4) +#define HOST_STACK_SIZE (((HOST_MIN_STACK_SIZE + 15) / 16) * 16) /* Align. */ +#define HOST_STACK_LR (HOST_STACK_SIZE + 4) /* In caller stack frame. */ + +#define NEED_INST_MASK ((1<<BOOKE_INTERRUPT_PROGRAM) | \ + (1<<BOOKE_INTERRUPT_DTLB_MISS)) + +#define NEED_DEAR_MASK ((1<<BOOKE_INTERRUPT_DATA_STORAGE) | \ + (1<<BOOKE_INTERRUPT_DTLB_MISS)) + +#define NEED_ESR_MASK ((1<<BOOKE_INTERRUPT_DATA_STORAGE) | \ + (1<<BOOKE_INTERRUPT_INST_STORAGE) | \ + (1<<BOOKE_INTERRUPT_PROGRAM) | \ + (1<<BOOKE_INTERRUPT_DTLB_MISS)) + +.macro KVM_HANDLER ivor_nr +_GLOBAL(kvmppc_handler_\ivor_nr) + /* Get pointer to vcpu and record exit number. 
*/ + mtspr SPRN_SPRG0, r4 + mfspr r4, SPRN_SPRG1 + stw r5, VCPU_GPR(r5)(r4) + stw r6, VCPU_GPR(r6)(r4) + mfctr r5 + lis r6, kvmppc_resume_host@h + stw r5, VCPU_CTR(r4) + li r5, \ivor_nr + ori r6, r6, kvmppc_resume_host@l + mtctr r6 + bctr +.endm + +_GLOBAL(kvmppc_handlers_start) +KVM_HANDLER BOOKE_INTERRUPT_CRITICAL +KVM_HANDLER BOOKE_INTERRUPT_MACHINE_CHECK +KVM_HANDLER BOOKE_INTERRUPT_DATA_STORAGE +KVM_HANDLER BOOKE_INTERRUPT_INST_STORAGE +KVM_HANDLER BOOKE_INTERRUPT_EXTERNAL +KVM_HANDLER BOOKE_INTERRUPT_ALIGNMENT +KVM_HANDLER BOOKE_INTERRUPT_PROGRAM +KVM_HANDLER BOOKE_INTERRUPT_FP_UNAVAIL +KVM_HANDLER BOOKE_INTERRUPT_SYSCALL +KVM_HANDLER BOOKE_INTERRUPT_AP_UNAVAIL +KVM_HANDLER BOOKE_INTERRUPT_DECREMENTER +KVM_HANDLER BOOKE_INTERRUPT_FIT +KVM_HANDLER BOOKE_INTERRUPT_WATCHDOG +KVM_HANDLER BOOKE_INTERRUPT_DTLB_MISS +KVM_HANDLER BOOKE_INTERRUPT_ITLB_MISS +KVM_HANDLER BOOKE_INTERRUPT_DEBUG + +_GLOBAL(kvmppc_handler_len) + .long kvmppc_handler_1 - kvmppc_handler_0 + + +/* Registers: + * SPRG0: guest r4 + * r4: vcpu pointer + * r5: KVM exit number + */ +_GLOBAL(kvmppc_resume_host) + stw r3, VCPU_GPR(r3)(r4) + mfcr r3 + stw r3, VCPU_CR(r4) + stw r7, VCPU_GPR(r7)(r4) + stw r8, VCPU_GPR(r8)(r4) + stw r9, VCPU_GPR(r9)(r4) + + li r6, 1 + slw r6, r6, r5 + + /* Save the faulting instruction and all GPRs for emulation. */ + andi. r7, r6, NEED_INST_MASK + beq ..skip_inst_copy + mfspr r9, SPRN_SRR0 + mfmsr r8 + ori r7, r8, MSR_DS + mtmsr r7 + isync + lwz r9, 0(r9) + mtmsr r8 + isync + stw r9, VCPU_LAST_INST(r4) + + stw r15, VCPU_GPR(r15)(r4) + stw r16, VCPU_GPR(r16)(r4) + stw r17, VCPU_GPR(r17)(r4) + stw r18, VCPU_GPR(r18)(r4) + stw r19, VCPU_GPR(r19)(r4) + stw r20, VCPU_GPR(r20)(r4) + stw r21, VCPU_GPR(r21)(r4) + stw r22, VCPU_GPR(r22)(r4) + stw r23, VCPU_GPR(r23)(r4) + stw r24, VCPU_GPR(r24)(r4) + stw r25, VCPU_GPR(r25)(r4) + stw r26, VCPU_GPR(r26)(r4) + stw r27, VCPU_GPR(r27)(r4) + stw r28, VCPU_GPR(r28)(r4) + stw r29, VCPU_GPR(r29)(r4) + stw r30, VCPU_GPR(r30)(r4) + stw r31, VCPU_GPR(r31)(r4) +..skip_inst_copy: + + /* Also grab DEAR and ESR before the host can clobber them. */ + + andi. r7, r6, NEED_DEAR_MASK + beq ..skip_dear + mfspr r9, SPRN_DEAR + stw r9, VCPU_FAULT_DEAR(r4) +..skip_dear: + + andi. r7, r6, NEED_ESR_MASK + beq ..skip_esr + mfspr r9, SPRN_ESR + stw r9, VCPU_FAULT_ESR(r4) +..skip_esr: + + /* Save remaining volatile guest register state to vcpu. */ + stw r0, VCPU_GPR(r0)(r4) + stw r1, VCPU_GPR(r1)(r4) + stw r2, VCPU_GPR(r2)(r4) + stw r10, VCPU_GPR(r10)(r4) + stw r11, VCPU_GPR(r11)(r4) + stw r12, VCPU_GPR(r12)(r4) + stw r13, VCPU_GPR(r13)(r4) + stw r14, VCPU_GPR(r14)(r4) /* We need a NV GPR below. */ + mflr r3 + stw r3, VCPU_LR(r4) + mfxer r3 + stw r3, VCPU_XER(r4) + mfspr r3, SPRN_SPRG0 + stw r3, VCPU_GPR(r4)(r4) + mfspr r3, SPRN_SRR0 + stw r3, VCPU_PC(r4) + + /* Restore host stack pointer and PID before IVPR, since the host + * exception handlers use them. */ + lwz r1, VCPU_HOST_STACK(r4) + lwz r3, VCPU_HOST_PID(r4) + mtspr SPRN_PID, r3 + + /* Restore host IVPR before re-enabling interrupts. We cheat and know + * that Linux IVPR is always 0xc0000000. */ + lis r3, 0xc000 + mtspr SPRN_IVPR, r3 + + /* Switch to kernel stack and jump to handler. */ + LOAD_REG_ADDR(r3, kvmppc_handle_exit) + mtctr r3 + lwz r3, HOST_RUN(r1) + lwz r2, HOST_R2(r1) + mr r14, r4 /* Save vcpu pointer. */ + + bctrl /* kvmppc_handle_exit() */ + + /* Restore vcpu pointer and the nonvolatiles we used. 
*/ + mr r4, r14 + lwz r14, VCPU_GPR(r14)(r4) + + /* Sometimes instruction emulation must restore complete GPR state. */ + andi. r5, r3, RESUME_FLAG_NV + beq ..skip_nv_load + lwz r15, VCPU_GPR(r15)(r4) + lwz r16, VCPU_GPR(r16)(r4) + lwz r17, VCPU_GPR(r17)(r4) + lwz r18, VCPU_GPR(r18)(r4) + lwz r19, VCPU_GPR(r19)(r4) + lwz r20, VCPU_GPR(r20)(r4) + lwz r21, VCPU_GPR(r21)(r4) + lwz r22, VCPU_GPR(r22)(r4) + lwz r23, VCPU_GPR(r23)(r4) + lwz r24, VCPU_GPR(r24)(r4) + lwz r25, VCPU_GPR(r25)(r4) + lwz r26, VCPU_GPR(r26)(r4) + lwz r27, VCPU_GPR(r27)(r4) + lwz r28, VCPU_GPR(r28)(r4) + lwz r29, VCPU_GPR(r29)(r4) + lwz r30, VCPU_GPR(r30)(r4) + lwz r31, VCPU_GPR(r31)(r4) +..skip_nv_load: + + /* Should we return to the guest? */ + andi. r5, r3, RESUME_FLAG_HOST + beq lightweight_exit + + srawi r3, r3, 2 /* Shift -ERR back down. */ + +heavyweight_exit: + /* Not returning to guest. */ + + /* We already saved guest volatile register state; now save the + * non-volatiles. */ + stw r15, VCPU_GPR(r15)(r4) + stw r16, VCPU_GPR(r16)(r4) + stw r17, VCPU_GPR(r17)(r4) + stw r18, VCPU_GPR(r18)(r4) + stw r19, VCPU_GPR(r19)(r4) + stw r20, VCPU_GPR(r20)(r4) + stw r21, VCPU_GPR(r21)(r4) + stw r22, VCPU_GPR(r22)(r4) + stw r23, VCPU_GPR(r23)(r4) + stw r24, VCPU_GPR(r24)(r4) + stw r25, VCPU_GPR(r25)(r4) + stw r26, VCPU_GPR(r26)(r4) + stw r27, VCPU_GPR(r27)(r4) + stw r28, VCPU_GPR(r28)(r4) + stw r29, VCPU_GPR(r29)(r4) + stw r30, VCPU_GPR(r30)(r4) + stw r31, VCPU_GPR(r31)(r4) + + /* Load host non-volatile register state from host stack. */ + lwz r14, HOST_NV_GPR(r14)(r1) + lwz r15, HOST_NV_GPR(r15)(r1) + lwz r16, HOST_NV_GPR(r16)(r1) + lwz r17, HOST_NV_GPR(r17)(r1) + lwz r18, HOST_NV_GPR(r18)(r1) + lwz r19, HOST_NV_GPR(r19)(r1) + lwz r20, HOST_NV_GPR(r20)(r1) + lwz r21, HOST_NV_GPR(r21)(r1) + lwz r22, HOST_NV_GPR(r22)(r1) + lwz r23, HOST_NV_GPR(r23)(r1) + lwz r24, HOST_NV_GPR(r24)(r1) + lwz r25, HOST_NV_GPR(r25)(r1) + lwz r26, HOST_NV_GPR(r26)(r1) + lwz r27, HOST_NV_GPR(r27)(r1) + lwz r28, HOST_NV_GPR(r28)(r1) + lwz r29, HOST_NV_GPR(r29)(r1) + lwz r30, HOST_NV_GPR(r30)(r1) + lwz r31, HOST_NV_GPR(r31)(r1) + + /* Return to kvm_vcpu_run(). */ + lwz r4, HOST_STACK_LR(r1) + addi r1, r1, HOST_STACK_SIZE + mtlr r4 + /* r3 still contains the return code from kvmppc_handle_exit(). */ + blr + + +/* Registers: + * r3: kvm_run pointer + * r4: vcpu pointer + */ +_GLOBAL(__kvmppc_vcpu_run) + stwu r1, -HOST_STACK_SIZE(r1) + stw r1, VCPU_HOST_STACK(r4) /* Save stack pointer to vcpu. */ + + /* Save host state to stack. */ + stw r3, HOST_RUN(r1) + mflr r3 + stw r3, HOST_STACK_LR(r1) + + /* Save host non-volatile register state to stack. */ + stw r14, HOST_NV_GPR(r14)(r1) + stw r15, HOST_NV_GPR(r15)(r1) + stw r16, HOST_NV_GPR(r16)(r1) + stw r17, HOST_NV_GPR(r17)(r1) + stw r18, HOST_NV_GPR(r18)(r1) + stw r19, HOST_NV_GPR(r19)(r1) + stw r20, HOST_NV_GPR(r20)(r1) + stw r21, HOST_NV_GPR(r21)(r1) + stw r22, HOST_NV_GPR(r22)(r1) + stw r23, HOST_NV_GPR(r23)(r1) + stw r24, HOST_NV_GPR(r24)(r1) + stw r25, HOST_NV_GPR(r25)(r1) + stw r26, HOST_NV_GPR(r26)(r1) + stw r27, HOST_NV_GPR(r27)(r1) + stw r28, HOST_NV_GPR(r28)(r1) + stw r29, HOST_NV_GPR(r29)(r1) + stw r30, HOST_NV_GPR(r30)(r1) + stw r31, HOST_NV_GPR(r31)(r1) + + /* Load guest non-volatiles. 
*/ + lwz r14, VCPU_GPR(r14)(r4) + lwz r15, VCPU_GPR(r15)(r4) + lwz r16, VCPU_GPR(r16)(r4) + lwz r17, VCPU_GPR(r17)(r4) + lwz r18, VCPU_GPR(r18)(r4) + lwz r19, VCPU_GPR(r19)(r4) + lwz r20, VCPU_GPR(r20)(r4) + lwz r21, VCPU_GPR(r21)(r4) + lwz r22, VCPU_GPR(r22)(r4) + lwz r23, VCPU_GPR(r23)(r4) + lwz r24, VCPU_GPR(r24)(r4) + lwz r25, VCPU_GPR(r25)(r4) + lwz r26, VCPU_GPR(r26)(r4) + lwz r27, VCPU_GPR(r27)(r4) + lwz r28, VCPU_GPR(r28)(r4) + lwz r29, VCPU_GPR(r29)(r4) + lwz r30, VCPU_GPR(r30)(r4) + lwz r31, VCPU_GPR(r31)(r4) + +lightweight_exit: + stw r2, HOST_R2(r1) + + mfspr r3, SPRN_PID + stw r3, VCPU_HOST_PID(r4) + lwz r3, VCPU_PID(r4) + mtspr SPRN_PID, r3 + + /* Prevent all TLB updates. */ + mfmsr r5 + lis r6, (MSR_EE|MSR_CE|MSR_ME|MSR_DE)@h + ori r6, r6, (MSR_EE|MSR_CE|MSR_ME|MSR_DE)@l + andc r6, r5, r6 + mtmsr r6 + + /* Save the host's non-pinned TLB mappings, and load the guest mappings + * over them. Leave the host's "pinned" kernel mappings in place. */ + /* XXX optimization: use generation count to avoid swapping unmodified + * entries. */ + mfspr r10, SPRN_MMUCR /* Save host MMUCR. */ + lis r8, tlb_44x_hwater@ha + lwz r8, tlb_44x_hwater@l(r8) + addi r3, r4, VCPU_HOST_TLB - 4 + addi r9, r4, VCPU_SHADOW_TLB - 4 + li r6, 0 +1: + /* Save host entry. */ + tlbre r7, r6, PPC44x_TLB_PAGEID + mfspr r5, SPRN_MMUCR + stwu r5, 4(r3) + stwu r7, 4(r3) + tlbre r7, r6, PPC44x_TLB_XLAT + stwu r7, 4(r3) + tlbre r7, r6, PPC44x_TLB_ATTRIB + stwu r7, 4(r3) + /* Load guest entry. */ + lwzu r7, 4(r9) + mtspr SPRN_MMUCR, r7 + lwzu r7, 4(r9) + tlbwe r7, r6, PPC44x_TLB_PAGEID + lwzu r7, 4(r9) + tlbwe r7, r6, PPC44x_TLB_XLAT + lwzu r7, 4(r9) + tlbwe r7, r6, PPC44x_TLB_ATTRIB + /* Increment index. */ + addi r6, r6, 1 + cmpw r6, r8 + blt 1b + mtspr SPRN_MMUCR, r10 /* Restore host MMUCR. */ + + iccci 0, 0 /* XXX hack */ + + /* Load some guest volatiles. */ + lwz r0, VCPU_GPR(r0)(r4) + lwz r2, VCPU_GPR(r2)(r4) + lwz r9, VCPU_GPR(r9)(r4) + lwz r10, VCPU_GPR(r10)(r4) + lwz r11, VCPU_GPR(r11)(r4) + lwz r12, VCPU_GPR(r12)(r4) + lwz r13, VCPU_GPR(r13)(r4) + lwz r3, VCPU_LR(r4) + mtlr r3 + lwz r3, VCPU_XER(r4) + mtxer r3 + + /* Switch the IVPR. XXX If we take a TLB miss after this we're screwed, + * so how do we make sure vcpu won't fault? */ + lis r8, kvmppc_booke_handlers@ha + lwz r8, kvmppc_booke_handlers@l(r8) + mtspr SPRN_IVPR, r8 + + /* Save vcpu pointer for the exception handlers. */ + mtspr SPRN_SPRG1, r4 + + /* Can't switch the stack pointer until after IVPR is switched, + * because host interrupt handlers would get confused. */ + lwz r1, VCPU_GPR(r1)(r4) + + /* XXX handle USPRG0 */ + /* Host interrupt handlers may have clobbered these guest-readable + * SPRGs, so we need to reload them here with the guest's values. */ + lwz r3, VCPU_SPRG4(r4) + mtspr SPRN_SPRG4, r3 + lwz r3, VCPU_SPRG5(r4) + mtspr SPRN_SPRG5, r3 + lwz r3, VCPU_SPRG6(r4) + mtspr SPRN_SPRG6, r3 + lwz r3, VCPU_SPRG7(r4) + mtspr SPRN_SPRG7, r3 + + /* Finish loading guest volatiles and jump to guest. 
*/ + lwz r3, VCPU_CTR(r4) + mtctr r3 + lwz r3, VCPU_CR(r4) + mtcr r3 + lwz r5, VCPU_GPR(r5)(r4) + lwz r6, VCPU_GPR(r6)(r4) + lwz r7, VCPU_GPR(r7)(r4) + lwz r8, VCPU_GPR(r8)(r4) + lwz r3, VCPU_PC(r4) + mtsrr0 r3 + lwz r3, VCPU_MSR(r4) + oris r3, r3, KVMPPC_MSR_MASK@h + ori r3, r3, KVMPPC_MSR_MASK@l + mtsrr1 r3 + lwz r3, VCPU_GPR(r3)(r4) + lwz r4, VCPU_GPR(r4)(r4) + rfi diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c new file mode 100644 --- /dev/null +++ b/arch/powerpc/kvm/emulate.c @@ -0,0 +1,760 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright IBM Corp. 2007 + * + * Authors: Hollis Blanchard <ho...@us...> + */ + +#include <linux/jiffies.h> +#include <linux/timer.h> +#include <linux/types.h> +#include <linux/string.h> +#include <linux/kvm_host.h> + +#include <asm/dcr.h> +#include <asm/dcr-regs.h> +#include <asm/time.h> +#include <asm/byteorder.h> +#include <asm/kvm_ppc.h> + +#include "44x_tlb.h" + +/* Instruction decoding */ +static inline unsigned int get_op(u32 inst) +{ + return inst >> 26; +} + +static inline unsigned int get_xop(u32 inst) +{ + return (inst >> 1) & 0x3ff; +} + +static inline unsigned int get_sprn(u32 inst) +{ + return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0); +} + +static inline unsigned int get_dcrn(u32 inst) +{ + return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0); +} + +static inline unsigned int get_rt(u32 inst) +{ + return (inst >> 21) & 0x1f; +} + +static inline unsigned int get_rs(u32 inst) +{ + return (inst >> 21) & 0x1f; +} + +static inline unsigned int get_ra(u32 inst) +{ + return (inst >> 16) & 0x1f; +} + +static inline unsigned int get_rb(u32 inst) +{ + return (inst >> 11) & 0x1f; +} + +static inline unsigned int get_rc(u32 inst) +{ + return inst & 0x1; +} + +static inline unsigned int get_ws(u32 inst) +{ + return (inst >> 11) & 0x1f; +} + +static inline unsigned int get_d(u32 inst) +{ + return inst & 0xffff; +} + +static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu, + const struct tlbe *tlbe) +{ + gpa_t gpa; + + if (!get_tlb_v(tlbe)) + return 0; + + /* Does it match current guest AS? */ + /* XXX what about IS != DS? */ + if (get_tlb_ts(tlbe) != !!(vcpu->arch.msr & MSR_IS)) + return 0; + + gpa = get_tlb_raddr(tlbe); + if (!gfn_to_memslot(vcpu->kvm, gpa >> PAGE_SHIFT)) + /* Mapping is not for RAM. */ + return 0; + + return 1; +} + +static int kvmppc_emul_tlbwe(struct kvm_vcpu *vcpu, u32 inst) +{ + u64 eaddr; + u64 raddr; + u64 asid; + u32 flags; + struct tlbe *tlbe; + unsigned int ra; + unsigned int rs; + unsigned int ws; + unsigned int index; + + ra = get_ra(inst); + rs = get_rs(inst); + ws = get_ws(inst); + + index = vcpu->arch.gpr[ra]; + if (index > PPC44x_TLB_SIZE) { + printk("%s: index %d\n", __func__, index); + kvmppc_dump_vcpu(vcpu); + return EMULATE_FAIL; + } + + tlbe = &vcpu->arch.guest_tlb[index]; + + /* Invalidate shadow mappings for the about-to-be-clobbered TLBE. 
*/ + if (tlbe->word0 & PPC44x_TLB_VALID) { + eaddr = get_tlb_eaddr(tlbe); + asid = (tlbe->word0 & PPC44x_TLB_TS) | tlbe->tid; + kvmppc_mmu_invalidate(vcpu, eaddr, asid); + } + + switch (ws) { + case PPC44x_TLB_PAGEID: + tlbe->tid = vcpu->arch.mmucr & 0xff; + tlbe->word0 = vcpu->arch.gpr[rs]; + break; + + case PPC44x_TLB_XLAT: + tlbe->word1 = vcpu->arch.gpr[rs]; + break; + + case PPC44x_TLB_ATTRIB: + tlbe->word2 = vcpu->arch.gpr[rs]; + break; + + default: + return EMULATE_FAIL; + } + + if (tlbe_is_host_safe(vcpu, tlbe)) { + eaddr = get_tlb_eaddr(tlbe); + raddr = get_tlb_raddr(tlbe); + asid = (tlbe->word0 & PPC44x_TLB_TS) | tlbe->tid; + flags = tlbe->word2 & 0xffff; + + /* Create a 4KB mapping on the host. If the guest wanted a + * large page, only the first 4KB is mapped here and the rest + * are mapped on the fly. */ + kvmppc_mmu_map(vcpu, eaddr, raddr >> PAGE_SHIFT, asid, flags); + } + + return EMULATE_DONE; +} + +static void kvmppc_emulate_dec(struct kvm_vcpu *vcpu) +{ + if (vcpu->arch.tcr & TCR_DIE) { + /* The decrementer ticks at the same rate as the timebase, so + * that's how we convert the guest DEC value to the number of + * host ticks. */ + unsigned long nr_jiffies; + + nr_jiffies = vcpu->arch.dec / tb_ticks_per_jiffy; + mod_timer(&vcpu->arch.dec_timer, + get_jiffies_64() + nr_jiffies); + } else { + del_timer(&vcpu->arch.dec_timer); + } +} + +static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu) +{ + vcpu->arch.pc = vcpu->arch.srr0; + kvmppc_set_msr(vcpu, vcpu->arch.srr1); +} + +/* XXX to do: + * lhax + * lhaux + * lswx + * lswi + * stswx + * stswi + * lha + * lhau + * lmw + * stmw + * + * XXX is_bigendian should depend on MMU mapping or MSR[LE] + */ +int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) +{ + u32 inst = vcpu->arch.last_inst; + u32 ea; + int ra; + int rb; + int rc; + int rs; + int rt; + int sprn; + int dcrn; + enum emulation_result emulated = EMULATE_DONE; + int advance = 1; + + switch (get_op(inst)) { + case 3: /* trap */ + printk("trap!\n"); + kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_PROGRAM); + advance = 0; + break; + + case 19: + switch (get_xop(inst)) { + case 50: /* rfi */ + kvmppc_emul_rfi(vcpu); + advance = 0; + break; + + default: + emulated = EMULATE_FAIL; + break; + } + break; + + case 31: + switch (get_xop(inst)) { + + case 83: /* mfmsr */ + rt = get_rt(inst); + vcpu->arch.gpr[rt] = vcpu->arch.msr; + break; + + case 87: /* lbzx */ + rt = get_rt(inst); + emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1); + break; + + case 131: /* wrtee */ + rs = get_rs(inst); + vcpu->arch.msr = (vcpu->arch.msr & ~MSR_EE) + | (vcpu->arch.gpr[rs] & MSR_EE); + break; + + case 146: /* mtmsr */ + rs = get_rs(inst); + kvmppc_set_msr(vcpu, vcpu->arch.gpr[rs]); + break; + + case 163: /* wrteei */ + vcpu->arch.msr = (vcpu->arch.msr & ~MSR_EE) + | (inst & MSR_EE); + break; + + case 215: /* stbx */ + rs = get_rs(inst); + emulated = kvmppc_handle_store(run, vcpu, + vcpu->arch.gpr[rs], + 1, 1); + break; + + case 247: /* stbux */ + rs = get_rs(inst); + ra = get_ra(inst); + rb = get_rb(inst); + + ea = vcpu->arch.gpr[rb]; + if (ra) + ea += vcpu->arch.gpr[ra]; + + emulated = kvmppc_handle_store(run, vcpu, + vcpu->arch.gpr[rs], + 1, 1); + vcpu->arch.gpr[rs] = ea; + break; + + case 279: /* lhzx */ + rt = get_rt(inst); + emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1); + break; + + case 311: /* lhzux */ + rt = get_rt(inst); + ra = get_ra(inst); + rb = get_rb(inst); + + ea = vcpu->arch.gpr[rb]; + if (ra) + ea += vcpu->arch.gpr[ra]; + + emulated = 
kvmppc_handle_load(run, vcpu, rt, 2, 1); + vcpu->arch.gpr[ra] = ea; + break; + + case 323: /* mfdcr */ + dcrn = get_dcrn(inst); + rt = get_rt(inst); + + /* The guest may access CPR0 registers to determine the timebase + * frequency, and it must know the real host frequency because it + * can directly access the timebase registers. + * + * It would be possible to emulate those accesses in userspace, + * but userspace can really only figure out the end frequency. + * We could decompose that into the factors that compute it, but + * that's tricky math, and it's easier to just report the real + * CPR0 values. + */ + switch (dcrn) { + case DCRN_CPR0_CONFIG_ADDR: + vcpu->arch.gpr[rt] = vcpu->arch.cpr0_cfgaddr; + break; + case DCRN_CPR0_CONFIG_DATA: + local_irq_disable(); + mtdcr(DCRN_CPR0_CONFIG_ADDR, + vcpu->arch.cpr0_cfgaddr); + vcpu->arch.gpr[rt] = mfdcr(DCRN_CPR0_CONFIG_DATA); + local_irq_enable(); + break; + default: + run->dcr.dcrn = dcrn; + run->dcr.data = 0; + run->dcr.is_write = 0; + vcpu->arch.io_gpr = rt; + vcpu->arch.dcr_needed = 1; + emulated = EMULATE_DO_DCR; + } + + break; + + case 339: /* mfspr */ + sprn = get_sprn(inst); + rt = get_rt(inst); + + switch (sprn) { + case SPRN_SRR0: + vcpu->arch.gpr[rt] = vcpu->arch.srr0; break; + case SPRN_SRR1: + vcpu->arch.gpr[rt] = vcpu->arch.srr1; break; + case SPRN_MMUCR: + vcpu->arch.gpr[rt] = vcpu->arch.mmucr; break; + case SPRN_PID: + vcpu->arch.gpr[rt] = vcpu->arch.pid; break; + case SPRN_IVPR: + vcpu->arch.gpr[rt] = vcpu->arch.ivpr; break; + case SPRN_CCR0: + vcpu->arch.gpr[rt] = vcpu->arch.ccr0; break; + case SPRN_CCR1: + vcpu->arch.gpr[rt] = vcpu->arch.ccr1; break; + case SPRN_PVR: + vcpu->arch.gpr[rt] = vcpu->arch.pvr; break; + case SPRN_DEAR: + vcpu->arch.gpr[rt] = vcpu->arch.dear; break; + case SPRN_ESR: + vcpu->arch.gpr[rt] = vcpu->arch.esr; break; + case SPRN_DBCR0: + vcpu->arch.gpr[rt] = vcpu->arch.dbcr0; break; + case SPRN_DBCR1: + vcpu->arch.gpr[rt] = vcpu->arch.dbcr1; break; + + /* Note: mftb and TBR... [truncated message content] |
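The emulate.c hunk above decodes guest instructions with plain shift-and-mask helpers. As a minimal, self-contained sketch (not part of the patch) of how those masks pick an instruction word apart, the following userspace snippet decodes 0x7c7a02a6, which encodes mfspr r3, SRR0:

#include <stdio.h>

typedef unsigned int u32;

/* Same accessors as in the emulate.c hunk above. */
static unsigned int get_op(u32 inst)  { return inst >> 26; }
static unsigned int get_xop(u32 inst) { return (inst >> 1) & 0x3ff; }
static unsigned int get_rt(u32 inst)  { return (inst >> 21) & 0x1f; }
static unsigned int get_sprn(u32 inst)
{
        /* The SPR number's two 5-bit halves are swapped in the encoding. */
        return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
}

int main(void)
{
        u32 inst = 0x7c7a02a6; /* mfspr r3, SRR0 (SPR 26) */

        /* Prints: op=31 xop=339 rt=3 sprn=26 */
        printf("op=%u xop=%u rt=%u sprn=%u\n",
               get_op(inst), get_xop(inst), get_rt(inst), get_sprn(inst));
        return 0;
}

The printed values match the mfspr case in the switch above: primary opcode 31, extended opcode 339, destination r3, SPR 26 (SRR0).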
From: Hollis B. <ho...@us...> - 2008-04-17 04:28:14
1 file changed, 7 insertions(+) MAINTAINERS | 7 +++++++ Signed-off-by: Hollis Blanchard <ho...@us...> Acked-by: Paul Mackerras <pa...@sa...> diff --git a/MAINTAINERS b/MAINTAINERS --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2302,6 +2302,13 @@ W: kvm.sourceforge.net S: Supported +KERNEL VIRTUAL MACHINE (KVM) FOR POWERPC +P: Hollis Blanchard +M: ho...@us... +L: kvm...@li... +W: kvm.sourceforge.net +S: Supported + KERNEL VIRTUAL MACHINE For Itanium(KVM/IA64) P: Anthony Xu M: ant...@in... |
From: Liu, E. E <eri...@in...> - 2008-04-16 06:45:46
Hollis Blanchard wrote: > On Tuesday 15 April 2008 22:13:28 Liu, Eric E wrote: >> Hollis Blanchard wrote: >>> On Wednesday 09 April 2008 05:01:36 Liu, Eric E wrote: >>>> +/* This structure represents a single trace buffer record. */ >>>> +struct kvm_trace_rec { + __u32 event:28; >>>> + __u32 extra_u32:3; >>>> + __u32 cycle_in:1; >>>> + __u32 pid; >>>> + __u32 vcpu_id; >>>> + union { >>>> + struct { >>>> + __u32 cycle_lo, cycle_hi; >>>> + __u32 extra_u32[KVM_TRC_EXTRA_MAX]; >>>> + } cycle; + struct { >>>> + __u32 extra_u32[KVM_TRC_EXTRA_MAX]; >>>> + } nocycle; + } u; >>>> +}; >>> >>> Do we really need bitfields here? They are notoriously non-portable. >>> >>> Practically speaking, this will prevent me from copying a trace file >>> from my big-endian target to my little-endian workstation for >>> analysis, at least without some ugly hacking in the userland tool. >> Here the main consideration using bitfields is to save storage space >> for > each record, but as you said it is non-portable for your mentioned > case, so should we need to adjust the struct like this? >> __u32 event; >> __16 extra_u32; >> __16 cycle_in; > > If space really is a worry, you could still combine the fields, and > just use masks to extract the data later. No matter what, > byteswapping is required in the userland tool. I suspect this isn't > there already, but it will be easier to add without the bitfields. > > Hmm, while we're on the subject, I'm not sure what the best way to > automatically byteswap will be. It probably isn't worth it to convert > all trace data to a standard ordering (which would add overhead to > tracing), but I suppose there is no metadata in the trace log? A > command line switch might be inconvenient but inevitable. A tricky approach is that we insert metadata into the trace file before reading the trace log, so that the analysis tool can look at the metadata to check whether we need to convert byte order? |
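To make the metadata suggestion concrete, here is one illustrative sketch of a magic-based header and the byte-order check a reader could apply. All names and the magic value are hypothetical, not from any posted patch:

#include <stdint.h>

#define KVM_TRACE_MAGIC 0x12345678u             /* hypothetical value */

/* Written once by the userspace tool at the head of the trace file,
 * in the producer's native byte order. */
struct kvm_trace_metadata {
        uint32_t magic;
        uint64_t extra;  /* padding so the header fits the record size */
};

static uint32_t swab32(uint32_t x)
{
        return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
               ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
}

/* 0: no swap needed; 1: swap every record; -1: not a trace file. */
static int trace_needs_swap(const struct kvm_trace_metadata *md)
{
        if (md->magic == KVM_TRACE_MAGIC)
                return 0;
        if (md->magic == swab32(KVM_TRACE_MAGIC))
                return 1;
        return -1;
}

Because the header is written in the producer's byte order, the analysis tool never needs a command-line switch: a swapped magic value tells it everything.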
From: Hollis B. <ho...@us...> - 2008-04-16 05:35:19
On Tuesday 15 April 2008 22:13:28 Liu, Eric E wrote: > Hollis Blanchard wrote: > > On Wednesday 09 April 2008 05:01:36 Liu, Eric E wrote: > >> +/* This structure represents a single trace buffer record. */ > >> +struct kvm_trace_rec { + __u32 event:28; > >> + __u32 extra_u32:3; > >> + __u32 cycle_in:1; > >> + __u32 pid; > >> + __u32 vcpu_id; > >> + union { > >> + struct { > >> + __u32 cycle_lo, cycle_hi; > >> + __u32 extra_u32[KVM_TRC_EXTRA_MAX]; > >> + } cycle; + struct { > >> + __u32 extra_u32[KVM_TRC_EXTRA_MAX]; > >> + } nocycle; + } u; > >> +}; > > > > Do we really need bitfields here? They are notoriously non-portable. > > > > Practically speaking, this will prevent me from copying a trace file > > from my big-endian target to my little-endian workstation for > > analysis, at least without some ugly hacking in the userland tool. > Here the main consideration using bitfields is to save storage space for each record, but as you said it is non-portable for your mentioned case, so should we need to adjust the struct like this? > __u32 event; > __16 extra_u32; > __16 cycle_in; If space really is a worry, you could still combine the fields, and just use masks to extract the data later. No matter what, byteswapping is required in the userland tool. I suspect this isn't there already, but it will be easier to add without the bitfields. Hmm, while we're on the subject, I'm not sure what the best way to automatically byteswap will be. It probably isn't worth it to convert all trace data to a standard ordering (which would add overhead to tracing), but I suppose there is no metadata in the trace log? A command line switch might be inconvenient but inevitable. -- Hollis Blanchard IBM Linux Technology Center |
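For reference, the mask-based alternative Hollis describes could look like the sketch below. The field positions are illustrative only (they mirror the 28/3/1 split of the posted bitfield struct). Since the packed word is an ordinary __u32, the reader can byteswap it as a whole and then apply the same masks:

#include <linux/types.h>

/* Illustrative layout: event in bits 0-27, extra_u32 count in
 * bits 28-30, cycle_in flag in bit 31. */
#define KVM_TRC_EVENT_MASK      0x0fffffffu
#define KVM_TRC_EXTRA_SHIFT     28
#define KVM_TRC_EXTRA_MASK      0x7u
#define KVM_TRC_CYCLE_SHIFT     31

static inline __u32 kvm_trc_pack(__u32 event, __u32 n_extra, __u32 cycle_in)
{
        return (event & KVM_TRC_EVENT_MASK) |
               ((n_extra & KVM_TRC_EXTRA_MASK) << KVM_TRC_EXTRA_SHIFT) |
               ((cycle_in & 1u) << KVM_TRC_CYCLE_SHIFT);
}

static inline __u32 kvm_trc_event(__u32 w)
{
        return w & KVM_TRC_EVENT_MASK;
}

static inline __u32 kvm_trc_n_extra(__u32 w)
{
        return (w >> KVM_TRC_EXTRA_SHIFT) & KVM_TRC_EXTRA_MASK;
}

static inline __u32 kvm_trc_cycle_in(__u32 w)
{
        return (w >> KVM_TRC_CYCLE_SHIFT) & 1u;
}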
From: Hollis B. <ho...@us...> - 2008-04-15 21:02:57
On Monday 14 April 2008 07:23:57 ehr...@li... wrote: > From: Christian Ehrhardt <ehr...@li...> > > This adds a relayfs based kernel interface that allows tracing of all > emulated guest instructions. It is based on Hollis tracing of tlb > activities. A suitable userspace program to read from that interface and a > post processing script that give users a readable output follow. > This feature is configurable in .config under the kvmppc virtualization > itself. > > Signed-off-by: Christian Ehrhardt <ehr...@li...> By the way, check out the new kvm_trace stuff checked in to kvm.git last week. Looks like we'll probably want to use that. -- Hollis Blanchard IBM Linux Technology Center |
From: Jerone Y. <jy...@us...> - 2008-04-15 20:35:04
On Tue, 2008-04-15 at 13:56 -0500, Hollis Blanchard wrote: > On Tuesday 15 April 2008 13:29:19 Jerone Young wrote: > > On Tue, 2008-04-15 at 13:19 -0500, Hollis Blanchard wrote: > > > On Tuesday 15 April 2008 11:14:52 Jerone Young wrote: > > > > Actually there appears to be a real problem with preempt notify in 44x. > > > > I had no gotten a chance to get back with you about it. But I did some > > > > investigation into it last week. We are following the same code paths > > > > (common code) as x86 for preempt initalization. But I ran some tests > > > > using preempt_disable() & preempt_enable() around some places where it > > > > would make since (places where we disable interrupts), but just using > > > > these functions whould cause the kernel to dump sig #11. > > > > > > preempt_disable() and preempt_enable() are completely different, and I've > > > called those in the past and had no problems. > > > > But it's not used anywhere in our code. And places where it can go .. > > blows up big time. > > We don't explicitly use preempt_disable() now, because we use > local_irq_disable() instead. > > However, as I said before, when we *did* use preempt_disable() explicitly, it > didn't even blow up a little bit. > > preempt_disable() IS NOT THE SAME as preempt_notify_unregister(). Please read > that again, because I think you're skimming over the function names. These > are very different things. > > > > > The issue we have using function kvm_vcpu_block() that it is identical > > > > to the code below, BUT it calls vcpu_put which then calls > > > > preempt_notify_unregister() if it is called it will also sig #11. > > > > > > If preempt_notify_unregister() dereferences a bad pointer, that shouldn't > > > be too difficult to track down. > > > > > > > I'm not sure if what is going on honestly. Based on what I found it > > > > should "just work" as we are initializing everything like x86 (we are > > > > calling preempt_notify_init() in the same place). But for 44x any > > > > preempt notfication calls blow up. So it appears calling anything > > > > preempt notify related just blows up. > > > > > > > > This is a much bigger issue. I'm not sure that we honest want to be > > > > stuck on this for long periods of time just to have a function call in > > > > a place where we honestly do not absolutely need to have it at this > > > > time. Plus I'm no expert with these scheduling frameworks. But givin > > > > what have read around the net what we have now "should" work. It just > > > > doesn't. > > > > > > > > Something can come back to later. But for now we should just roll with > > > > the working code. > > > > > > No, I don't want to commit the hack. Sorry if I didn't make that clear > > > before. > > > > But this isn't a hack. You just want to use the common function. The > > issue is there are a lot place we do not use the common function. > > We use common code wherever it makes sense, and it makes a lot of sense here. > If you can point out other areas we could or should share code, I would be > interested in taking a look. > > > Since > > we really do need to get functional code up and going this is something > > that can be postponed to be fixed. I will (and have been) take a look at > > this to change this. But if we put a ton of emphasis on something that > > isn't needed to get us going, then we will never move forward to solve > > the real issues left to get this stuff usable. 
> > > > > Accessing a bad pointer doesn't seem like a "much bigger issue" to me, so > > > if it's more complicated than that please elaborate. > > > > The problem is the pointer is assigned by the frame work. I'm not > > passing in any pointer to preempt. So something more is going on. > > Obviously you're not passing a pointer in, yet sig11 means the code is > dereferencing a bad pointer. The backtrace probably indicates exactly what > pointer is causing the problem, and it's 100% reproducible, so it doesn't > sound like anything too subtle. > > > I will > > have to look into this more after I finish the next task. I don't > > honestly know all that much about the preempt frame work. But I have > > spent time trying to figure it out. > > Great, I will look forward to the corrected patch. I would suggest that you include the current patch so that other can use the functionality and it can be changed to use the common code function once I figure out the preempt issue. Otherwise anyone trying to use our source is still going to suffer from guest eating 100% cpu when idle. > |
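For readers following this sub-thread: the generic pattern at issue is roughly the one below, paraphrased and simplified from the common kvm_main.c of the time (the vcpu mutex handling is omitted). The distinction Hollis draws is that preempt_notifier_register()/_unregister() are called inside vcpu_load()/vcpu_put(), with preemption disabled, and are unrelated to preempt_disable()/preempt_enable() themselves:

#include <linux/preempt.h>
#include <linux/kvm_host.h>

/* Hooks invoked when the vcpu task is scheduled out and back in. */
static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
{
        struct kvm_vcpu *vcpu = container_of(pn, struct kvm_vcpu,
                                             preempt_notifier);
        kvm_arch_vcpu_load(vcpu, cpu);
}

static void kvm_sched_out(struct preempt_notifier *pn,
                          struct task_struct *next)
{
        struct kvm_vcpu *vcpu = container_of(pn, struct kvm_vcpu,
                                             preempt_notifier);
        kvm_arch_vcpu_put(vcpu);
}

static struct preempt_ops kvm_preempt_ops = {
        .sched_in  = kvm_sched_in,
        .sched_out = kvm_sched_out,
};

/* Once, at vcpu creation:
 *      preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
 * Then every load/put cycle (simplified sketch): */
static void example_vcpu_load(struct kvm_vcpu *vcpu)
{
        int cpu = get_cpu();
        preempt_notifier_register(&vcpu->preempt_notifier);
        kvm_arch_vcpu_load(vcpu, cpu);
        put_cpu();
}

static void example_vcpu_put(struct kvm_vcpu *vcpu)
{
        preempt_disable();
        kvm_arch_vcpu_put(vcpu);
        preempt_notifier_unregister(&vcpu->preempt_notifier);
        preempt_enable();
}

A notifier that was never initialized with preempt_notifier_init(), or registered/unregistered outside the preemption-disabled window, would be one plausible source of the sig11 described above.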
From: Hollis B. <ho...@us...> - 2008-04-15 19:58:11
On Tuesday 15 April 2008 14:43:12 Jerone Young wrote: > 1 file changed, 31 insertions(+), 17 deletions(-) > kernel/Makefile | 48 +++++++++++++++++++++++++++++++----------------- > > > This patch add the ability for make sync in the kernel directory to work > for mulitiple architectures and not just x86. > > Signed-off-by: Jerone Young <jy...@us...> > > diff --git a/kernel/Makefile b/kernel/Makefile > --- a/kernel/Makefile > +++ b/kernel/Makefile > @@ -1,5 +1,10 @@ include ../config.mak > include ../config.mak > > +ARCH_DIR=$(ARCH) > +ifneq '$(filter $(ARCH_DIR), x86_64 i386)' '' > + ARCH_DIR=x86 > +endif > + > KVERREL = $(patsubst /lib/modules/%/build,%,$(KERNELDIR)) > > DESTDIR= > @@ -18,11 +23,25 @@ _hack = mv $1 $1.orig && \ > > | sed '/\#include/! s/\blapic\b/l_apic/g' > $1 && rm $1.orig > > _unifdef = mv $1 $1.orig && \ > - unifdef -DCONFIG_X86 $1.orig > $1; \ > + unifdef -DCONFIG_$(ARCH_DIR) $1.orig > $1; \ > [ $$? -le 1 ] && rm $1.orig This isn't going to work because you've changed -DCONFIG_X86 to -DCONFIG_x86 . -- Hollis Blanchard IBM Linux Technology Center |
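One possible shape of a fix, sketched and untested: keep the lower-case directory name for paths, and map it to the kernel's actual config symbol in a separate variable for unifdef (CONFIG_PPC is assumed to be the right symbol for powerpc here):

# Sketch only: ARCH_DIR keeps the lower-case directory name; a separate
# ARCH_CONFIG carries the upper-case config symbol for unifdef.
ARCH_DIR = $(ARCH)
ifneq '$(filter $(ARCH_DIR), x86_64 i386)' ''
  ARCH_DIR = x86
endif

ifeq ($(ARCH_DIR), x86)
  ARCH_CONFIG = X86
endif
ifeq ($(ARCH_DIR), powerpc)
  ARCH_CONFIG = PPC
endif

_unifdef = mv $1 $1.orig && \
        unifdef -DCONFIG_$(ARCH_CONFIG) $1.orig > $1; \
        [ $$? -le 1 ] && rm $1.orig

A per-arch mapping is used rather than upper-casing $(ARCH_DIR) mechanically, because the directory name and the config symbol do not always agree (powerpc vs. CONFIG_PPC).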
From: Jerone Y. <jy...@us...> - 2008-04-15 19:43:15
1 file changed, 31 insertions(+), 17 deletions(-) kernel/Makefile | 48 +++++++++++++++++++++++++++++++----------------- This patch adds the ability for make sync in the kernel directory to work for multiple architectures and not just x86. Signed-off-by: Jerone Young <jy...@us...> diff --git a/kernel/Makefile b/kernel/Makefile --- a/kernel/Makefile +++ b/kernel/Makefile @@ -1,5 +1,10 @@ include ../config.mak include ../config.mak +ARCH_DIR=$(ARCH) +ifneq '$(filter $(ARCH_DIR), x86_64 i386)' '' + ARCH_DIR=x86 +endif + KVERREL = $(patsubst /lib/modules/%/build,%,$(KERNELDIR)) DESTDIR= @@ -18,11 +23,25 @@ _hack = mv $1 $1.orig && \ | sed '/\#include/! s/\blapic\b/l_apic/g' > $1 && rm $1.orig _unifdef = mv $1 $1.orig && \ - unifdef -DCONFIG_X86 $1.orig > $1; \ + unifdef -DCONFIG_$(ARCH_DIR) $1.orig > $1; \ [ $$? -le 1 ] && rm $1.orig hack = $(call _hack,tmp/$(strip $1)) unifdef = $(call _unifdef,tmp/$(strip $1)) + +ifneq '$(filter $(ARCH_DIR), x86_64 i386)' '' +UNIFDEF_FILES = include/linux/kvm.h \ + include/linux/kvm_para.h \ + include/asm-$(ARCH_DIR)/kvm.h \ + include/asm-x86/kvm_para.h + +HACK_FILES = kvm_main.c \ + mmu.c \ + vmx.c \ + svm.c \ + x86.c \ + irq.h +endif all:: # include header priority 1) $LINUX 2) $KERNELDIR 3) include-compat @@ -34,26 +53,21 @@ sync: sync: rm -rf tmp include rsync --exclude='*.mod.c' -R \ - "$(LINUX)"/arch/x86/kvm/./*.[ch] \ + "$(LINUX)"/arch/$(ARCH_DIR)/kvm/./*.[ch] \ "$(LINUX)"/virt/kvm/./*.[ch] \ "$(LINUX)"/./include/linux/kvm*.h \ - "$(LINUX)"/./include/asm-x86/kvm*.h \ + "$(LINUX)"/./include/asm-$(ARCH_DIR)/kvm*.h \ tmp/ - mkdir -p include/linux include/asm-x86 - ln -s asm-x86 include/asm - ln -sf asm-x86 include-compat/asm + mkdir -p include/linux include/asm-$(ARCH_DIR) + ln -s asm-$(ARCH_DIR) include/asm + ln -sf asm-$(ARCH_DIR) include-compat/asm - $(call unifdef, include/linux/kvm.h) - $(call unifdef, include/linux/kvm_para.h) - $(call unifdef, include/asm-x86/kvm.h) - $(call unifdef, include/asm-x86/kvm_para.h) - $(call hack, include/linux/kvm.h) - $(call hack, kvm_main.c) - $(call hack, mmu.c) - $(call hack, vmx.c) - $(call hack, svm.c) - $(call hack, x86.c) - $(call hack, irq.h) + for i in $(UNIFDEF_FILES); \ + do $(call unifdef, $$i); done + + for i in $(HACK_FILES); \ + do $(call hack, $$i); done + for i in $$(find tmp -type f -printf '%P '); \ do cmp -s $$i tmp/$$i || cp tmp/$$i $$i; done rm -rf tmp |
From: Hollis B. <ho...@us...> - 2008-04-15 19:28:46
On Tuesday 15 April 2008 14:09:22 Jerone Young wrote: > On Tue, 2008-04-15 at 13:56 -0500, Hollis Blanchard wrote: > > On Tuesday 15 April 2008 13:29:19 Jerone Young wrote: > > > I will > > > have to look into this more after I finish the next task. I don't > > > honestly know all that much about the preempt frame work. But I have > > > spent time trying to figure it out. > > > > Great, I will look forward to the corrected patch. > > I would suggest that you include the current patch so that other can use > the functionality and it can be changed to use the common code function > once I figure out the preempt issue. Otherwise anyone trying to use our > source is still going to suffer from guest eating 100% cpu when idle. Look, I won't commit this. Just fix it. I don't know how else to say it. -- Hollis Blanchard IBM Linux Technology Center |
From: Jerone Y. <jy...@us...> - 2008-04-15 19:27:02
On Tue, 2008-04-15 at 13:19 -0500, Hollis Blanchard wrote: > On Tuesday 15 April 2008 11:14:52 Jerone Young wrote: > > Actually there appears to be a real problem with preempt notify in 44x. > > I had no gotten a chance to get back with you about it. But I did some > > investigation into it last week. We are following the same code paths > > (common code) as x86 for preempt initalization. But I ran some tests > > using preempt_disable() & preempt_enable() around some places where it > > would make since (places where we disable interrupts), but just using > > these functions whould cause the kernel to dump sig #11. > > preempt_disable() and preempt_enable() are completely different, and I've > called those in the past and had no problems. But it's not used anywhere in our code. And places where it can go .. blows up big time. > > > The issue we have using function kvm_vcpu_block() that it is identical > > to the code below, BUT it calls vcpu_put which then calls > > preempt_notify_unregister() if it is called it will also sig #11. > > If preempt_notify_unregister() dereferences a bad pointer, that shouldn't be > too difficult to track down. > > > I'm not sure if what is going on honestly. Based on what I found it > > should "just work" as we are initializing everything like x86 (we are > > calling preempt_notify_init() in the same place). But for 44x any > > preempt notfication calls blow up. So it appears calling anything > > preempt notify related just blows up. > > > > This is a much bigger issue. I'm not sure that we honest want to be > > stuck on this for long periods of time just to have a function call in a > > place where we honestly do not absolutely need to have it at this time. > > Plus I'm no expert with these scheduling frameworks. But givin what have > > read around the net what we have now "should" work. It just doesn't. > > > > Something can come back to later. But for now we should just roll with > > the working code. > > No, I don't want to commit the hack. Sorry if I didn't make that clear before. But this isn't a hack. You just want to use the common function. The issue is there are a lot place we do not use the common function. Since we really do need to get functional code up and going this is something that can be postponed to be fixed. I will (and have been) take a look at this to change this. But if we put a ton of emphasis on something that isn't needed to get us going, then we will never move forward to solve the real issues left to get this stuff usable. > > Accessing a bad pointer doesn't seem like a "much bigger issue" to me, so if > it's more complicated than that please elaborate. The problem is the pointer is assigned by the frame work. I'm not passing in any pointer to preempt. So something more is going on. I will have to look into this more after I finish the next task. I don't honestly know all that much about the preempt frame work. But I have spent time trying to figure it out. |
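For context, the generic kvm_vcpu_block() under discussion looked roughly like the following at the time (a paraphrase, not a verbatim quote of kvm.git); the vcpu_put()/vcpu_load() pair inside the loop is exactly where the preempt notifier unregister/register calls come in:

/* Rough shape of the generic helper (paraphrased). */
void kvm_vcpu_block(struct kvm_vcpu *vcpu)
{
        DECLARE_WAITQUEUE(wait, current);

        add_wait_queue(&vcpu->wq, &wait);

        /* Block until an interrupt or a signal wakes us up. */
        while (!kvm_cpu_has_interrupt(vcpu) && !signal_pending(current)) {
                set_current_state(TASK_INTERRUPTIBLE);
                vcpu_put(vcpu);         /* unregisters the preempt notifier */
                schedule();
                vcpu_load(vcpu);        /* re-registers it */
        }

        __set_current_state(TASK_RUNNING);
        remove_wait_queue(&vcpu->wq, &wait);
}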
From: Jerone Y. <jy...@us...> - 2008-04-15 18:57:11
1 file changed, 6 insertions(+) qemu/qemu-kvm-powerpc.c | 6 ++++++ This patch adds a call to kvm_load_registers() after creation of the vcpu. This is required for ppc, since certain registers must be set before boot. Signed-off-by: Jerone Young <jy...@us...> diff --git a/qemu/qemu-kvm-powerpc.c b/qemu/qemu-kvm-powerpc.c --- a/qemu/qemu-kvm-powerpc.c +++ b/qemu/qemu-kvm-powerpc.c @@ -121,6 +121,12 @@ void kvm_arch_save_regs(CPUState *env) int kvm_arch_qemu_init_env(CPUState *cenv) { + if (cenv->cpu_index == 0) { + /* load any registers set in env into + kvm for the first guest vcpu */ + kvm_load_registers(cenv); + } + return 0; } |
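To illustrate the "registers that must be set before boot": for a 440 guest booted with a flat device tree, the kernel entry convention expects the program counter at the image entry point and r3 pointing at the device tree. A hypothetical sketch of the env state that kvm_load_registers() would then push into the kernel (field names follow qemu's PowerPC CPUState; the real setup code lives elsewhere in qemu):

/* Hypothetical illustration, not from the qemu tree. */
static void setup_guest_boot_state(CPUState *env, target_ulong entry,
                                   target_ulong dt_base)
{
        env->nip = entry;       /* initial program counter: kernel entry */
        env->gpr[3] = dt_base;  /* 44x boot convention: r3 = device tree */
}

/* After vcpu creation, the new hook above propagates this state:
 *      kvm_load_registers(cenv);
 */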