kvm-devel Mailing List for kernel virtual machine (Page 13)

Brought to you by: avik, mtosatti

kvm-devel — kernel virtual machine development

You can subscribe to this list here.

2006	_Jan	_Feb	_Mar	_Apr	_May	_Jun	_Jul	_Aug	_Sep	_Oct (33)	_Nov (325)	_Dec (320)
2007	_Jan (484)	_Feb (438)	_Mar (407)	_Apr (713)	_May (831)	_Jun (806)	_Jul (1023)	_Aug (1184)	_Sep (1118)	_Oct (1461)	_Nov (1224)	_Dec (1042)
2008	_Jan (1449)	_Feb (1110)	_Mar (1428)	_Apr (1643)	_May (682)	_Jun	_Jul	_Aug	_Sep	_Oct	_Nov	_Dec

Flat | Threaded

<< < 1 .. 11 12 13 14 15 .. 703 > >> (Page 13 of 703)

Re: [kvm-devel] [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly

From: Christian B. <bor...@de...> - 2008-05-08 14:43:54

Am Donnerstag, 8. Mai 2008 schrieb Jeremy Fitzhardinge:
> Christian Borntraeger wrote:
> > On kvm I have seen some rare hangs in stop_machine when I used more guest
> > cpus than hosts cpus. e.g. 32 guest cpus on 1 host cpu triggered the
> > hang quite often. I could also reproduce the problem on a 4 way z/VM host 
with 
> > a 64 way guest.
> >   
> 
> I think that's one of those "don't do that then" cases ;)

I really like 64 guest cpus as a good testcase for all kind of things. 

> 
> I think x86 (at least) is now using ticket locks, which is fair.  Which 
> kernel are you seeing this problem on?

Sorry, forgot to mention. Its kvm.git from 2 days ago on s390.

Re: [kvm-devel] QEMU "drive_init()" Disk Format Security Bypass

From: Daniel P. B. <ber...@re...> - 2008-05-08 14:42:19

On Thu, May 08, 2008 at 05:02:28PM +0300, Eren T?rkay wrote:
> Hello,
> 
> An advisory about $subject was released today by secunia. The security flaw 
> was fixed in QEmu SVN repository.
> 
> Kvm uses some of the old version of qemu that I can't backport patch I grabbed 
> from qemu svn repository. Could you look at this issue and provide a patch?

KVM is synced to latest CVS version of QEMU on a regular basis.

> http://secunia.com/advisories/30111/
> 
> Svn commit: 
> http://svn.savannah.gnu.org/viewvc/?view=rev&root=qemu&revision=4277

If you look at the KVM userspace code you'll see this patch is already
included:

http://git.kernel.org/?p=virt/kvm/kvm-userspace.git;a=commit;h=ce486fc1116eb53d40635be926bfa147ad520908

Regards,
Daniel
-- 
|: Red Hat, Engineering, Boston   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Re: [kvm-devel] QEMU "drive_init()" Disk Format Security Bypass

From: Eren T. <tur...@gm...> - 2008-05-08 14:18:22

On 08 May 2008 Thu 17:12:14 Daniel P. Berrange wrote:
> If you look at the KVM userspace code you'll see this patch is already
> included:
>
> http://git.kernel.org/?p=virt/kvm/kvm-userspace.git;a=commit;h=ce486fc1116e
>b53d40635be926bfa147ad520908

Thank you, I'll grab the patch and apply it to tarball.

> Regards,
> Daniel

Regards,
Eren

Re: [kvm-devel] [PATCH] Fix e1000 can_receive handler

From: Anthony L. <ali...@us...> - 2008-05-08 14:04:19

Aurelien Jarno wrote:
> On Wed, May 07, 2008 at 04:40:58PM -0500, Anthony Liguori wrote:
>   
>> The current logic of the can_receive handler is to allow packets whenever the
>> receiver is disabled or when there are descriptors available in the ring.
>>
>> I think the logic ought to be to allow packets whenever the receiver is enabled
>> and there are descriptors available in the ring.
>>     
>
> The current behaviour is actually correct, this is the way QEMU works:
> when the card is stopped, it should always accept packets, and then
> discard them.
>   

The previous patches in my virtio series change that behavior.  Before 
delivering a packet to a VLAN, it checks to see if any of the VLAN 
clients are able to receive a packet.

This is very important for achieving good performance with tap.  With 
virtio-net, we were dropping a ton of packets with tap because there 
weren't descriptors available on the RX ring.

I plan to submit that behavioral change to QEMU upstream along with the 
virtio drivers.  I'm still optimizing phys_page_find() though.  The 
performance impact of switching the ring manipulation to using the 
stl_phys accessors is unacceptable for KVM.

Regards,

Anthony Liguori

[kvm-devel] QEMU "drive_init()" Disk Format Security Bypass

From: Eren T. <tur...@gm...> - 2008-05-08 14:03:10

Hello,

An advisory about $subject was released today by secunia. The security flaw 
was fixed in QEmu SVN repository.

Kvm uses some of the old version of qemu that I can't backport patch I grabbed 
from qemu svn repository. Could you look at this issue and provide a patch?

http://secunia.com/advisories/30111/

Svn commit: 
http://svn.savannah.gnu.org/viewvc/?view=rev&root=qemu&revision=4277

Discussion: http://lists.gnu.org/archive/html/qemu-devel/2008-04/msg00675.html

Regards,
Eren

Re: [kvm-devel] [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly

From: Jeremy F. <je...@go...> - 2008-05-08 13:33:56

Christian Borntraeger wrote:
> On kvm I have seen some rare hangs in stop_machine when I used more guest
> cpus than hosts cpus. e.g. 32 guest cpus on 1 host cpu triggered the
> hang quite often. I could also reproduce the problem on a 4 way z/VM host with 
> a 64 way guest.
>   

I think that's one of those "don't do that then" cases ;)

> It turned out that the guest was consuming all available cpus mostly for
> spinning on scheduler locks like rq->lock. This is expected as the threads are 
> calling yield all the time. 
> The problem is now, that the host scheduling decisings together with the guest 
> scheduling decisions and spinlocks not being fair managed to create an 
> interesting scenario similar to a live lock. (Sometimes the hang resolved 
> itself after some minutes)
>   

I think x86 (at least) is now using ticket locks, which is fair.  Which 
kernel are you seeing this problem on?

> Changing stop_machine to yield the cpu to the hypervisor when yielding inside 
> the guest fixed the problem for me. While I am not completely happy with this 
> patch, I think it causes no harm and it really improves the situation for me.
>
> I used cpu_relax for yielding to the hypervisor, does that work on all 
> architectures?
>   

On x86, cpu_relax is just a "pause" instruction ("rep;nop").  We don't 
hook it in paravirt_ops, and while VT/SVM can be used to fault into the 
hypervisor on this instruction, I don't know if kvm actually does so.  
Either way, it wouldn't work for VMI, Xen or lguest.

    J

> p.s.: If you want to reproduce the problem, cpu hotplug and kprobes use 
> stop_machine_run and both triggered the problem after some retries. 
>
>
> Signed-off-by: Christian Borntraeger <bor...@de...>
> CC: Ingo Molnar <mi...@el...>
> CC: Rusty Russell <ru...@ru...>
>
> ---
>  kernel/stop_machine.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> Index: kvm/kernel/stop_machine.c
> ===================================================================
> --- kvm.orig/kernel/stop_machine.c
> +++ kvm/kernel/stop_machine.c
> @@ -62,8 +62,7 @@ static int stopmachine(void *cpu)
>  		 * help our sisters onto their CPUs. */
>  		if (!prepared && !irqs_disabled)
>  			yield();
> -		else
> -			cpu_relax();
> +		cpu_relax();
>  	}
>  
>  	/* Ack: we are exiting. */
> @@ -106,8 +105,10 @@ static int stop_machine(void)
>  	}
>  
>  	/* Wait for them all to come to life. */
> -	while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads)
> +	while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads) {
>  		yield();
> +		cpu_relax();
> +	}
>  
>  	/* If some failed, kill them all. */
>  	if (ret < 0) {
>
> _______________________________________________
> Virtualization mailing list
> Vir...@li...
> https://lists.linux-foundation.org/mailman/listinfo/virtualization
>

Re: [kvm-devel] [PATCH] Fix e1000 can_receive handler

From: Aurelien J. <aur...@au...> - 2008-05-08 13:27:57

On Wed, May 07, 2008 at 04:40:58PM -0500, Anthony Liguori wrote:
> The current logic of the can_receive handler is to allow packets whenever the
> receiver is disabled or when there are descriptors available in the ring.
> 
> I think the logic ought to be to allow packets whenever the receiver is enabled
> and there are descriptors available in the ring.

The current behaviour is actually correct, this is the way QEMU works:
when the card is stopped, it should always accept packets, and then
discard them.

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   au...@de...         | aur...@au...
   `-    people.debian.org/~aurel32 | www.aurel32.net

[kvm-devel] [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly

From: Christian B. <bor...@de...> - 2008-05-08 13:23:57

On kvm I have seen some rare hangs in stop_machine when I used more guest
cpus than hosts cpus. e.g. 32 guest cpus on 1 host cpu triggered the
hang quite often. I could also reproduce the problem on a 4 way z/VM host with 
a 64 way guest.

It turned out that the guest was consuming all available cpus mostly for
spinning on scheduler locks like rq->lock. This is expected as the threads are 
calling yield all the time. 
The problem is now, that the host scheduling decisings together with the guest 
scheduling decisions and spinlocks not being fair managed to create an 
interesting scenario similar to a live lock. (Sometimes the hang resolved 
itself after some minutes)

Changing stop_machine to yield the cpu to the hypervisor when yielding inside 
the guest fixed the problem for me. While I am not completely happy with this 
patch, I think it causes no harm and it really improves the situation for me.

I used cpu_relax for yielding to the hypervisor, does that work on all 
architectures?

p.s.: If you want to reproduce the problem, cpu hotplug and kprobes use 
stop_machine_run and both triggered the problem after some retries. 


Signed-off-by: Christian Borntraeger <bor...@de...>
CC: Ingo Molnar <mi...@el...>
CC: Rusty Russell <ru...@ru...>

---
 kernel/stop_machine.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: kvm/kernel/stop_machine.c
===================================================================
--- kvm.orig/kernel/stop_machine.c
+++ kvm/kernel/stop_machine.c
@@ -62,8 +62,7 @@ static int stopmachine(void *cpu)
 		 * help our sisters onto their CPUs. */
 		if (!prepared && !irqs_disabled)
 			yield();
-		else
-			cpu_relax();
+		cpu_relax();
 	}
 
 	/* Ack: we are exiting. */
@@ -106,8 +105,10 @@ static int stop_machine(void)
 	}
 
 	/* Wait for them all to come to life. */
-	while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads)
+	while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads) {
 		yield();
+		cpu_relax();
+	}
 
 	/* If some failed, kill them all. */
 	if (ret < 0) {

[kvm-devel] [PATCH 4/4] kvm/guest: fix paravirt clocksource to be compartible with xen.

From: Gerd H. <kr...@re...> - 2008-05-08 11:49:12

This patch switches the kvm clocksource code over to use the
paravirt clock helpers, thereby making it compatible with xen.

Signed-off-by: Gerd Hoffmann <kr...@re...>
---
 arch/x86/Kconfig           |    1 +
 arch/x86/kernel/kvmclock.c |   84 ++++++++++++++++---------------------------
 2 files changed, 32 insertions(+), 53 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b12e188..30feb9f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -388,6 +388,7 @@ config VMI
 config KVM_CLOCK
 	bool "KVM paravirtualized clock"
 	select PARAVIRT
+	select PARAVIRT_CLOCK
 	depends on !(X86_VISWS || X86_VOYAGER)
 	help
 	  Turning on this option will allow you to run a paravirtualized clock
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 4bc1be5..1c63f75 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -18,6 +18,7 @@
 
 #include <linux/clocksource.h>
 #include <linux/kvm_para.h>
+#include <asm/pvclock.h>
 #include <asm/arch_hooks.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
@@ -37,17 +38,9 @@ early_param("no-kvmclock", parse_no_kvmclock);
 
 /* The hypervisor will put information about time periodically here */
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct kvm_vcpu_time_info, hv_clock);
-#define get_clock(cpu, field) per_cpu(hv_clock, cpu).field
-
-static inline u64 kvm_get_delta(u64 last_tsc)
-{
-	int cpu = smp_processor_id();
-	u64 delta = native_read_tsc() - last_tsc;
-	return (delta * get_clock(cpu, tsc_to_system_mul)) >> KVM_SCALE;
-}
 
 static struct kvm_wall_clock wall_clock;
-static cycle_t kvm_clock_read(void);
+
 /*
  * The wallclock is the time of day when we booted. Since then, some time may
  * have elapsed since the hypervisor wrote the data. So we try to account for
@@ -55,35 +48,19 @@ static cycle_t kvm_clock_read(void);
  */
 unsigned long kvm_get_wallclock(void)
 {
-	u32 wc_sec, wc_nsec;
-	u64 delta;
+	struct kvm_vcpu_time_info *vcpu_time;
 	struct timespec ts;
-	int version, nsec;
 	int low, high;
 
 	low = (int)__pa(&wall_clock);
 	high = ((u64)__pa(&wall_clock) >> 32);
+	native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
 
-	delta = kvm_clock_read();
+	vcpu_time = &get_cpu_var(hv_clock);
+	pvclock_read_wallclock(&wall_clock, vcpu_time, &ts);
+	put_cpu_var(hv_clock);
 
-	native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
-	do {
-		version = wall_clock.wc_version;
-		rmb();
-		wc_sec = wall_clock.wc_sec;
-		wc_nsec = wall_clock.wc_nsec;
-		rmb();
-	} while ((wall_clock.wc_version != version) || (version & 1));
-
-	delta = kvm_clock_read() - delta;
-	delta += wc_nsec;
-	nsec = do_div(delta, NSEC_PER_SEC);
-	set_normalized_timespec(&ts, wc_sec + delta, nsec);
-	/*
-	 * Of all mechanisms of time adjustment I've tested, this one
-	 * was the champion!
-	 */
-	return ts.tv_sec + 1;
+	return ts.tv_sec;
 }
 
 int kvm_set_wallclock(unsigned long now)
@@ -91,28 +68,17 @@ int kvm_set_wallclock(unsigned long now)
 	return 0;
 }
 
-/*
- * This is our read_clock function. The host puts an tsc timestamp each time
- * it updates a new time. Without the tsc adjustment, we can have a situation
- * in which a vcpu starts to run earlier (smaller system_time), but probes
- * time later (compared to another vcpu), leading to backwards time
- */
 static cycle_t kvm_clock_read(void)
 {
-	u64 last_tsc, now;
-	int cpu;
+	struct kvm_vcpu_time_info *src;
+	cycle_t ret;
 
-	preempt_disable();
-	cpu = smp_processor_id();
-
-	last_tsc = get_clock(cpu, tsc_timestamp);
-	now = get_clock(cpu, system_time);
-
-	now += kvm_get_delta(last_tsc);
-	preempt_enable();
-
-	return now;
+	src = &get_cpu_var(hv_clock);
+	ret = pvclock_clocksource_read(src);
+	put_cpu_var(hv_clock);
+	return ret;
 }
+
 static struct clocksource kvm_clock = {
 	.name = "kvm-clock",
 	.read = kvm_clock_read,
@@ -123,13 +89,14 @@ static struct clocksource kvm_clock = {
 	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
-static int kvm_register_clock(void)
+static int kvm_register_clock(char *txt)
 {
 	int cpu = smp_processor_id();
 	int low, high;
 	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
 	high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
-
+	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
+	       cpu, high, low, txt);
 	return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
 }
 
@@ -140,12 +107,20 @@ static void kvm_setup_secondary_clock(void)
 	 * Now that the first cpu already had this clocksource initialized,
 	 * we shouldn't fail.
 	 */
-	WARN_ON(kvm_register_clock());
+	WARN_ON(kvm_register_clock("secondary cpu clock"));
 	/* ok, done with our trickery, call native */
 	setup_secondary_APIC_clock();
 }
 #endif
 
+#ifdef CONFIG_SMP
+void __init kvm_smp_prepare_boot_cpu(void)
+{
+	WARN_ON(kvm_register_clock("primary cpu clock"));
+	native_smp_prepare_boot_cpu();
+}
+#endif
+
 /*
  * After the clock is registered, the host will keep writing to the
  * registered memory location. If the guest happens to shutdown, this memory
@@ -174,7 +149,7 @@ void __init kvmclock_init(void)
 		return;
 
 	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
-		if (kvm_register_clock())
+		if (kvm_register_clock("boot clock"))
 			return;
 		pv_time_ops.get_wallclock = kvm_get_wallclock;
 		pv_time_ops.set_wallclock = kvm_set_wallclock;
@@ -182,6 +157,9 @@ void __init kvmclock_init(void)
 #ifdef CONFIG_X86_LOCAL_APIC
 		pv_apic_ops.setup_secondary_clock = kvm_setup_secondary_clock;
 #endif
+#ifdef CONFIG_SMP
+		smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
+#endif
 		machine_ops.shutdown  = kvm_shutdown;
 #ifdef CONFIG_KEXEC
 		machine_ops.crash_shutdown  = kvm_crash_shutdown;
-- 
1.5.4.1

[kvm-devel] [PATCH 2/4] Make xen use the generic paravirt clocksource code.

From: Gerd H. <kr...@re...> - 2008-05-08 11:48:56

This patch switches the xen paravirt clock over to use the
generic paravirt clock code.

Cc: Jeremy Fitzhardinge <je...@go...>
Signed-off-by: Gerd Hoffmann <kr...@re...>
---
 arch/x86/xen/Kconfig |    1 +
 arch/x86/xen/time.c  |  110 +++++---------------------------------------------
 2 files changed, 12 insertions(+), 99 deletions(-)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 2e641be..3a4f16a 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -5,6 +5,7 @@
 config XEN
 	bool "Xen guest support"
 	select PARAVIRT
+	select PARAVIRT_CLOCK
 	depends on X86_32
 	depends on X86_CMPXCHG && X86_TSC && !(X86_VISWS || X86_VOYAGER)
 	help
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index c39e1a5..3d5f945 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -13,6 +13,7 @@
 #include <linux/clockchips.h>
 #include <linux/kernel_stat.h>
 
+#include <asm/pvclock.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 
@@ -30,17 +31,6 @@
 
 static cycle_t xen_clocksource_read(void);
 
-/* These are perodically updated in shared_info, and then copied here. */
-struct shadow_time_info {
-	u64 tsc_timestamp;     /* TSC at last update of time vals.  */
-	u64 system_timestamp;  /* Time, in nanosecs, since boot.    */
-	u32 tsc_to_nsec_mul;
-	int tsc_shift;
-	u32 version;
-};
-
-static DEFINE_PER_CPU(struct shadow_time_info, shadow_time);
-
 /* runstate info updated by Xen */
 static DEFINE_PER_CPU(struct vcpu_runstate_info, runstate);
 
@@ -230,95 +220,14 @@ unsigned long xen_cpu_khz(void)
 	return xen_khz;
 }
 
-/*
- * Reads a consistent set of time-base values from Xen, into a shadow data
- * area.
- */
-static unsigned get_time_values_from_xen(void)
-{
-	struct vcpu_time_info   *src;
-	struct shadow_time_info *dst;
-
-	/* src is shared memory with the hypervisor, so we need to
-	   make sure we get a consistent snapshot, even in the face of
-	   being preempted. */
-	src = &__get_cpu_var(xen_vcpu)->time;
-	dst = &__get_cpu_var(shadow_time);
-
-	do {
-		dst->version = src->version;
-		rmb();		/* fetch version before data */
-		dst->tsc_timestamp     = src->tsc_timestamp;
-		dst->system_timestamp  = src->system_time;
-		dst->tsc_to_nsec_mul   = src->tsc_to_system_mul;
-		dst->tsc_shift         = src->tsc_shift;
-		rmb();		/* test version after fetching data */
-	} while ((src->version & 1) | (dst->version ^ src->version));
-
-	return dst->version;
-}
-
-/*
- * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
- * yielding a 64-bit result.
- */
-static inline u64 scale_delta(u64 delta, u32 mul_frac, int shift)
-{
-	u64 product;
-#ifdef __i386__
-	u32 tmp1, tmp2;
-#endif
-
-	if (shift < 0)
-		delta >>= -shift;
-	else
-		delta <<= shift;
-
-#ifdef __i386__
-	__asm__ (
-		"mul  %5       ; "
-		"mov  %4,%%eax ; "
-		"mov  %%edx,%4 ; "
-		"mul  %5       ; "
-		"xor  %5,%5    ; "
-		"add  %4,%%eax ; "
-		"adc  %5,%%edx ; "
-		: "=A" (product), "=r" (tmp1), "=r" (tmp2)
-		: "a" ((u32)delta), "1" ((u32)(delta >> 32)), "2" (mul_frac) );
-#elif __x86_64__
-	__asm__ (
-		"mul %%rdx ; shrd $32,%%rdx,%%rax"
-		: "=a" (product) : "0" (delta), "d" ((u64)mul_frac) );
-#else
-#error implement me!
-#endif
-
-	return product;
-}
-
-static u64 get_nsec_offset(struct shadow_time_info *shadow)
-{
-	u64 now, delta;
-	now = native_read_tsc();
-	delta = now - shadow->tsc_timestamp;
-	return scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift);
-}
-
 static cycle_t xen_clocksource_read(void)
 {
-	struct shadow_time_info *shadow = &get_cpu_var(shadow_time);
+        struct vcpu_time_info *src;
 	cycle_t ret;
-	unsigned version;
-
-	do {
-		version = get_time_values_from_xen();
-		barrier();
-		ret = shadow->system_timestamp + get_nsec_offset(shadow);
-		barrier();
-	} while (version != __get_cpu_var(xen_vcpu)->time.version);
-
-	put_cpu_var(shadow_time);
 
+	src = &get_cpu_var(xen_vcpu)->time;
+	ret = pvclock_clocksource_read((void*)src);
+	put_cpu_var(xen_vcpu);
 	return ret;
 }
 
@@ -349,9 +258,14 @@ static void xen_read_wallclock(struct timespec *ts)
 
 unsigned long xen_get_wallclock(void)
 {
+	const struct shared_info *s = HYPERVISOR_shared_info;
+	struct kvm_wall_clock *wall_clock = (void*)&(s->wc_version);
+        struct vcpu_time_info *vcpu_time;
 	struct timespec ts;
 
-	xen_read_wallclock(&ts);
+	vcpu_time = &get_cpu_var(xen_vcpu)->time;
+	pvclock_read_wallclock(wall_clock, (void*)vcpu_time, &ts);
+	put_cpu_var(xen_vcpu);
 
 	return ts.tv_sec;
 }
@@ -576,8 +490,6 @@ __init void xen_time_init(void)
 {
 	int cpu = smp_processor_id();
 
-	get_time_values_from_xen();
-
 	clocksource_register(&xen_clocksource);
 
 	if (HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer, cpu, NULL) == 0) {
-- 
1.5.4.1

[kvm-devel] [PATCH 3/4] kvm/host: fix paravirt clocksource to be compatible with xen.

From: Gerd H. <kr...@re...> - 2008-05-08 11:48:52

Signed-off-by: Gerd Hoffmann <kr...@re...>
---
 arch/x86/kvm/x86.c |   63 +++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 979f983..6906d54 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -493,7 +493,7 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 {
 	static int version;
 	struct kvm_wall_clock wc;
-	struct timespec wc_ts;
+	struct timespec now,sys,boot;
 
 	if (!wall_clock)
 		return;
@@ -502,9 +502,16 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 
 	kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
 
-	wc_ts = current_kernel_time();
-	wc.wc_sec = wc_ts.tv_sec;
-	wc.wc_nsec = wc_ts.tv_nsec;
+#if 0
+	/* Hmm, getboottime() isn't exported to modules ... */
+	getboottime(&boot);
+#else
+	now = current_kernel_time();
+	ktime_get_ts(&sys);
+	boot = ns_to_timespec(timespec_to_ns(&now) - timespec_to_ns(&sys));
+#endif
+	wc.wc_sec = boot.tv_sec;
+	wc.wc_nsec = boot.tv_nsec;
 	wc.wc_version = version;
 
 	kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc));
@@ -537,20 +544,58 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
 	/*
 	 * The interface expects us to write an even number signaling that the
 	 * update is finished. Since the guest won't see the intermediate
-	 * state, we just write "2" at the end
+	 * state, we just increase by 2 at the end.
 	 */
-	vcpu->hv_clock.version = 2;
+	vcpu->hv_clock.version += 2;
 
 	shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0);
 
 	memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock,
-		sizeof(vcpu->hv_clock));
+	       sizeof(vcpu->hv_clock));
 
 	kunmap_atomic(shared_kaddr, KM_USER0);
 
 	mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT);
 }
 
+static uint32_t div_frac(uint32_t dividend, uint32_t divisor)
+{
+	uint32_t quotient, remainder;
+
+	__asm__ ( "divl %4"
+		  : "=a" (quotient), "=d" (remainder)
+		  : "0" (0), "1" (dividend), "r" (divisor) );
+	return quotient;
+}
+
+static void kvm_set_time_scale(uint32_t tsc_khz, struct kvm_vcpu_time_info *hv_clock)
+{
+	uint64_t nsecs = 1000000000LL;
+	int32_t  shift = 0;
+	uint64_t tps64;
+	uint32_t tps32;
+
+	tps64 = tsc_khz * 1000LL;
+	while (tps64 > nsecs*2) {
+		tps64 >>= 1;
+		shift--;
+	}
+
+	tps32 = (uint32_t)tps64;
+	while (tps32 <= (uint32_t)nsecs) {
+		tps32 <<= 1;
+		shift++;
+	}
+
+	hv_clock->tsc_shift = shift;
+	hv_clock->tsc_to_system_mul = div_frac(nsecs, tps32);
+
+#if 0
+	printk(KERN_DEBUG "%s: tsc_khz %u, tsc_shift %d, tsc_mul %u\n",
+	       __FUNCTION__, tsc_khz, hv_clock->tsc_shift,
+	       hv_clock->tsc_to_system_mul);
+#endif
+}
 
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
@@ -599,9 +644,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		/* ...but clean it before doing the actual write */
 		vcpu->arch.time_offset = data & ~(PAGE_MASK | 1);
 
-		vcpu->arch.hv_clock.tsc_to_system_mul =
-					clocksource_khz2mult(tsc_khz, 22);
-		vcpu->arch.hv_clock.tsc_shift = 22;
+		kvm_set_time_scale(tsc_khz, &vcpu->arch.hv_clock);
 
 		down_read(&current->mm->mmap_sem);
 		vcpu->arch.time_page =
-- 
1.5.4.1

[kvm-devel] [PATCH 0/4] paravirt clock series.

From: Gerd H. <kr...@re...> - 2008-05-08 11:48:52

Respin of the paravirt clock patch series.

On the host side the kvm paravirt clock is made compatible with the
xen clock.

On the guest side some xen code has been factored out into a separate
source file shared by both kvm and xen clock implementations.

This time it should work ok for kvm smp guests ;)

cheers,
  Gerd

[kvm-devel] [PATCH 1/4] Add helper functions for paravirtual clocksources.

From: Gerd H. <kr...@re...> - 2008-05-08 11:48:52

The helper functions are intended to be used by both xen and kvm
paravirtual clock sources.  Following patches of this series put
them into use.  They are based on the xen code.

Cc: Jeremy Fitzhardinge <je...@go...>
Signed-off-by: Gerd Hoffmann <kr...@re...>
---
 arch/x86/Kconfig          |    4 +
 arch/x86/kernel/Makefile  |    1 +
 arch/x86/kernel/pvclock.c |  148 +++++++++++++++++++++++++++++++++++++++++++++
 include/asm-x86/pvclock.h |    6 ++
 4 files changed, 159 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/pvclock.c
 create mode 100644 include/asm-x86/pvclock.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 845ea2b..b12e188 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -415,6 +415,10 @@ config PARAVIRT
 	  over full virtualization.  However, when run without a hypervisor
 	  the kernel is theoretically slower and slightly larger.
 
+config PARAVIRT_CLOCK
+	bool
+	default n
+
 endif
 
 config MEMTEST_BOOTPARAM
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index bbdacb3..5d8e086 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -82,6 +82,7 @@ obj-$(CONFIG_VMI)		+= vmi_32.o vmiclock_32.o
 obj-$(CONFIG_KVM_GUEST)		+= kvm.o
 obj-$(CONFIG_KVM_CLOCK)		+= kvmclock.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch_$(BITS).o
+obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 
 ifdef CONFIG_INPUT_PCSPKR
 obj-y				+= pcspeaker.o
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
new file mode 100644
index 0000000..33e526f
--- /dev/null
+++ b/arch/x86/kernel/pvclock.c
@@ -0,0 +1,148 @@
+/*  paravirtual clock -- common code used by kvm/xen
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program; if not, write to the Free Software
+    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+*/
+
+#include <linux/kernel.h>
+#include <linux/percpu.h>
+#include <asm/pvclock.h>
+
+/*
+ * These are perodically updated
+ *    xen: magic shared_info page
+ *    kvm: gpa registered via msr
+ * and then copied here.
+ */
+struct pvclock_shadow_time {
+	u64 tsc_timestamp;     /* TSC at last update of time vals.  */
+	u64 system_timestamp;  /* Time, in nanosecs, since boot.    */
+	u32 tsc_to_nsec_mul;
+	int tsc_shift;
+	u32 version;
+};
+
+/*
+ * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
+ * yielding a 64-bit result.
+ */
+static inline u64 scale_delta(u64 delta, u32 mul_frac, int shift)
+{
+	u64 product;
+#ifdef __i386__
+	u32 tmp1, tmp2;
+#endif
+
+	if (shift < 0)
+		delta >>= -shift;
+	else
+		delta <<= shift;
+
+#ifdef __i386__
+	__asm__ (
+		"mul  %5       ; "
+		"mov  %4,%%eax ; "
+		"mov  %%edx,%4 ; "
+		"mul  %5       ; "
+		"xor  %5,%5    ; "
+		"add  %4,%%eax ; "
+		"adc  %5,%%edx ; "
+		: "=A" (product), "=r" (tmp1), "=r" (tmp2)
+		: "a" ((u32)delta), "1" ((u32)(delta >> 32)), "2" (mul_frac) );
+#elif __x86_64__
+	__asm__ (
+		"mul %%rdx ; shrd $32,%%rdx,%%rax"
+		: "=a" (product) : "0" (delta), "d" ((u64)mul_frac) );
+#else
+#error implement me!
+#endif
+
+	return product;
+}
+
+static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
+{
+	u64 delta = native_read_tsc() - shadow->tsc_timestamp;
+	return scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift);
+}
+
+/*
+ * Reads a consistent set of time-base values from hypervisor,
+ * into a shadow data area.
+ */
+static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
+					struct kvm_vcpu_time_info *src)
+{
+	do {
+		dst->version = src->version;
+		rmb();		/* fetch version before data */
+		dst->tsc_timestamp     = src->tsc_timestamp;
+		dst->system_timestamp  = src->system_time;
+		dst->tsc_to_nsec_mul   = src->tsc_to_system_mul;
+		dst->tsc_shift         = src->tsc_shift;
+		rmb();		/* test version after fetching data */
+	} while ((src->version & 1) || (dst->version != src->version));
+
+	return dst->version;
+}
+
+/*
+ * This is our read_clock function. The host puts an tsc timestamp each time
+ * it updates a new time. Without the tsc adjustment, we can have a situation
+ * in which a vcpu starts to run earlier (smaller system_time), but probes
+ * time later (compared to another vcpu), leading to backwards time
+ */
+
+cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src)
+{
+	struct pvclock_shadow_time shadow;
+	unsigned version;
+	cycle_t ret, offset;
+
+	do {
+		version = pvclock_get_time_values(&shadow, src);
+		barrier();
+		offset = pvclock_get_nsec_offset(&shadow);
+		ret = shadow.system_timestamp + offset;
+		barrier();
+	} while (version != src->version);
+
+	return ret;
+}
+
+void pvclock_read_wallclock(struct kvm_wall_clock *wall_clock,
+			    struct kvm_vcpu_time_info *vcpu_time,
+			    struct timespec *ts)
+{
+	u32 version;
+	u64 delta;
+	struct timespec now;
+
+	/* get wallclock at system boot */
+	do {
+		version = wall_clock->wc_version;
+		rmb();		/* fetch version before time */
+		now.tv_sec  = wall_clock->wc_sec;
+		now.tv_nsec = wall_clock->wc_nsec;
+		rmb();		/* fetch time before checking version */
+	} while ((wall_clock->wc_version & 1) || (version != wall_clock->wc_version));
+
+	delta = pvclock_clocksource_read(vcpu_time);	/* time since system boot */
+	delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
+
+	now.tv_nsec = do_div(delta, NSEC_PER_SEC);
+	now.tv_sec = delta;
+
+	set_normalized_timespec(ts, now.tv_sec, now.tv_nsec);
+}
diff --git a/include/asm-x86/pvclock.h b/include/asm-x86/pvclock.h
new file mode 100644
index 0000000..2b9812f
--- /dev/null
+++ b/include/asm-x86/pvclock.h
@@ -0,0 +1,6 @@
+#include <linux/clocksource.h>
+#include <asm/kvm_para.h>
+cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src);
+void pvclock_read_wallclock(struct kvm_wall_clock *wall,
+			    struct kvm_vcpu_time_info *vcpu,
+			    struct timespec *ts);
-- 
1.5.4.1

Re: [kvm-devel] [RFC][PATCH 0/4] Enabled NMI support for KVM

From: Jan K. <jan...@si...> - 2008-05-08 11:31:34

Yang, Sheng wrote:
> Hi
> 
> This patchset enabled NMI support for KVM.
> 
> The first three patches enabled NMI for in-kernel irqchip and NMI supporting 
> on VMX. The last patch enabled NMI watchdog in linux, can be used to test the 
> NMI injection.
> 
> In fact, this series should also included Jan Kiszka's patch to enable NMI for 
> userspace irqchip, but now I got little trouble to get the merged version 
> work on my machine... We would post it as soon as we solved it. 
> 
> Another thing is the vmx_intr_assist() and do_interrupt_requests() got some 
> duplication. And the logic for normal interrupt and NMI is similiar, but 
> vmx_intr_assist() seems a little implicit... 
> 
> Any comments welcome!

To make rebasing my work easier (hope I find some time later today):
Your patches show up line-wrapped here. Could you check and repost?

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

[kvm-devel] stock ubuntu 8.04 amd64 network instability (kernel oops included)

From: Paolo L. <pa...@hy...> - 2008-05-08 10:54:37

FYI.
Please review it at:

https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/228163

Regards
Paolo

[kvm-devel] [RFC][PATCH 2/4] KVM: IOAPIC/LAPIC: Enable NMI support

From: Yang, S. <she...@in...> - 2008-05-08 08:50:20

Attachments: 0002-KVM-IOAPIC-LAPIC-Enable-NMI-support.patch

From 3a5e332c32a2ec585447505e2503d91cf2fb2a54 Mon Sep 17 00:00:00 2001
From: Sheng Yang <she...@in...>
Date: Tue, 1 Apr 2008 14:47:59 +0800
Subject: [PATCH 2/4] KVM: IOAPIC/LAPIC: Enable NMI support


Signed-off-by: Sheng Yang <she...@in...>
---
 arch/x86/kvm/lapic.c       |    3 ++-
 arch/x86/kvm/x86.c         |    6 ++++++
 include/asm-x86/kvm_host.h |    4 ++++
 virt/kvm/ioapic.c          |   20 ++++++++++++++++++--
 4 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6226fe0..df5aba9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -351,8 +351,9 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
delivery_mode,
 	case APIC_DM_SMI:
 		printk(KERN_DEBUG "Ignoring guest SMI\n");
 		break;
+
 	case APIC_DM_NMI:
-		printk(KERN_DEBUG "Ignoring guest NMI\n");
+		kvm_inject_nmi(vcpu);
 		break;

 	case APIC_DM_INIT:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 979f983..f95ebdb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -173,6 +173,12 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, 
unsigned long addr,
 	kvm_queue_exception_e(vcpu, PF_VECTOR, error_code);
 }

+void kvm_inject_nmi(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.nmi_pending = 1;
+}
+EXPORT_SYMBOL_GPL(kvm_inject_nmi);
+
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 
error_code)
 {
 	WARN_ON(vcpu->arch.exception.pending);
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 1d8cd01..2df8416 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -285,6 +285,8 @@ struct kvm_vcpu_arch {
 	struct kvm_vcpu_time_info hv_clock;
 	unsigned int time_offset;
 	struct page *time_page;
+
+	bool nmi_pending;
 };

 struct kvm_mem_alias {
@@ -513,6 +515,8 @@ void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned 
nr, u32 error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long cr2,
 			   u32 error_code);

+void kvm_inject_nmi(struct kvm_vcpu *vcpu);
+
 void fx_init(struct kvm_vcpu *vcpu);

 int emulator_read_std(unsigned long addr,
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 4232fd7..99a1736 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -146,6 +146,11 @@ static void ioapic_inj_irq(struct kvm_ioapic *ioapic,
 	kvm_apic_set_irq(vcpu, vector, trig_mode);
 }

+static void ioapic_inj_nmi(struct kvm_vcpu *vcpu)
+{
+	kvm_inject_nmi(vcpu);
+}
+
 static u32 ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
 				       u8 dest_mode)
 {
@@ -239,8 +244,19 @@ static void ioapic_deliver(struct kvm_ioapic *ioapic, int 
irq)
 			}
 		}
 		break;
-
-		/* TODO: NMI */
+	case IOAPIC_NMI:
+		for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
+			if (!(deliver_bitmask & (1 << vcpu_id)))
+				continue;
+			deliver_bitmask &= ~(1 << vcpu_id);
+			vcpu = ioapic->kvm->vcpus[vcpu_id];
+			if (vcpu)
+				ioapic_inj_nmi(vcpu);
+			else
+				ioapic_debug("NMI to vcpu %d failed\n",
+						vcpu->vcpu_id);
+		}
+		break;
 	default:
 		printk(KERN_WARNING "Unsupported delivery mode %d\n",
 		       delivery_mode);
--
1.5.5

[kvm-devel] [RFC][PATCH 1/4] KVM: LAPIC: Unified the duplicate calling of setting IRR

From: Yang, S. <she...@in...> - 2008-05-08 08:50:19

Attachments: 0001-KVM-LAPIC-Unified-the-duplicate-calling-of-setting.patch

From 650cad44069541fcd9fea8be6a78837e812b3dfd Mon Sep 17 00:00:00 2001
From: Sheng Yang <she...@in...>
Date: Thu, 8 May 2008 09:58:50 +0800
Subject: [PATCH 1/4] KVM: LAPIC: Unified the duplicate calling of setting IRR

It's strange got two callings of setting IRR seperately for IOAPIC and IPI in
lapic. The patch unified them into __apic_set_irq().

Signed-off-by: Sheng Yang <she...@in...>
---
 arch/x86/kvm/lapic.c |   69 +++++++++++++++++++++++--------------------------
 1 files changed, 32 insertions(+), 37 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 7652f88..6226fe0 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -184,20 +184,40 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_find_highest_irr);

-int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 trig)
+static int __apic_set_irq(struct kvm_vcpu *vcpu, u8 vector, u8 trig_mode)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;

-	if (!apic_test_and_set_irr(vec, apic)) {
-		/* a new pending irq is set in IRR */
-		if (trig)
-			apic_set_vector(vec, apic->regs + APIC_TMR);
-		else
-			apic_clear_vector(vec, apic->regs + APIC_TMR);
-		kvm_vcpu_kick(apic->vcpu);
-		return 1;
+	/* FIXME add logic for vcpu on reset */
+	if (unlikely(!apic_enabled(apic)))
+		return 0;
+
+	if (apic_test_and_set_irr(vector, apic)) {
+		if (trig_mode)
+			apic_debug("level trig mode repeatedly for vector %d\n",
+					vector);
+		return 0;
 	}
-	return 0;
+
+	if (trig_mode) {
+		apic_debug("level trig mode for vector %d\n", vector);
+		apic_set_vector(vector, apic->regs + APIC_TMR);
+	} else
+		apic_clear_vector(vector, apic->regs + APIC_TMR);
+
+	if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE)
+		kvm_vcpu_kick(vcpu);
+	else if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) {
+		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+		if (waitqueue_active(&vcpu->wq))
+			wake_up_interruptible(&vcpu->wq);
+	}
+	return 1;
+}
+
+int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 trig)
+{
+	return __apic_set_irq(vcpu, vec, trig);
 }

 static inline int apic_find_highest_isr(struct kvm_lapic *apic)
@@ -315,38 +335,13 @@ static int apic_match_dest(struct kvm_vcpu *vcpu, struct 
kvm_lapic *source,
 static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 			     int vector, int level, int trig_mode)
 {
-	int orig_irr, result = 0;
+	int result = 0;
 	struct kvm_vcpu *vcpu = apic->vcpu;

 	switch (delivery_mode) {
 	case APIC_DM_FIXED:
 	case APIC_DM_LOWEST:
-		/* FIXME add logic for vcpu on reset */
-		if (unlikely(!apic_enabled(apic)))
-			break;
-
-		orig_irr = apic_test_and_set_irr(vector, apic);
-		if (orig_irr && trig_mode) {
-			apic_debug("level trig mode repeatedly for vector %d",
-				   vector);
-			break;
-		}
-
-		if (trig_mode) {
-			apic_debug("level trig mode for vector %d", vector);
-			apic_set_vector(vector, apic->regs + APIC_TMR);
-		} else
-			apic_clear_vector(vector, apic->regs + APIC_TMR);
-
-		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE)
-			kvm_vcpu_kick(vcpu);
-		else if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) {
-			vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
-			if (waitqueue_active(&vcpu->wq))
-				wake_up_interruptible(&vcpu->wq);
-		}
-
-		result = (orig_irr == 0);
+		result = __apic_set_irq(vcpu, vector, trig_mode);
 		break;

 	case APIC_DM_REMRD:
--
1.5.5

[kvm-devel] [RFC][PATCH 4/4] KVM: Enable NMI Watchdog by PIT source

From: Yang, S. <she...@in...> - 2008-05-08 08:49:55

Attachments: 0004-KVM-Enable-NMI-Watchdog-by-PIT-source.patch

From 176a066e5fd0d98cb63e910c93d57f7ec2850105 Mon Sep 17 00:00:00 2001
From: Sheng Yang <she...@in...>
Date: Thu, 8 May 2008 16:00:59 +0800
Subject: [PATCH 4/4] KVM: Enable NMI Watchdog by PIT source

The NMI watchdog used LINT0 of LAPIC to deliver NMI. It didn't disable PIC 
after switch to IOAPIC, but program LVT0 of every LAPIC as NMI, then deliver 
PIT interrupt to LINT0. So NMIs got the same generate freqency as PIT 
interrupts.

The patch emulated this process and enabled NMI watchdog. For currently KVM, 
in fact we didn't connected PIC to LAPIC, so the patch bypassed PIC, sent the
signal directly to the LAPIC.

Signed-off-by: Sheng Yang <she...@in...>
---
 arch/x86/kvm/i8254.c |   16 ++++++++++++++++
 arch/x86/kvm/irq.h   |    1 +
 arch/x86/kvm/lapic.c |   32 ++++++++++++++++++++++++++++----
 arch/x86/kvm/vmx.c   |    1 -
 4 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 6d6dc6c..7c6ea62 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -563,12 +563,28 @@ void kvm_free_pit(struct kvm *kvm)

 static void __inject_pit_timer_intr(struct kvm *kvm)
 {
+	int i;
+	struct kvm_vcpu *vcpu;
+
 	mutex_lock(&kvm->lock);
 	kvm_ioapic_set_irq(kvm->arch.vioapic, 0, 1);
 	kvm_ioapic_set_irq(kvm->arch.vioapic, 0, 0);
 	kvm_pic_set_irq(pic_irqchip(kvm), 0, 1);
 	kvm_pic_set_irq(pic_irqchip(kvm), 0, 0);
 	mutex_unlock(&kvm->lock);
+
+	/*
+	 * For NMI watchdog in IOAPIC mode
+	 * After IOAPIC enabled, NMI watchdog programmed LVT0 of lapic as NMI,
+	 * then a timer interrupt through IOAPIC and a NMI through PIC to lapic
+	 * would be delivered when PIT time up.
+	 */
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		vcpu = kvm->vcpus[i];
+		if (!vcpu)
+			continue;
+		kvm_apic_local_deliver(vcpu, APIC_LVT0);
+	}
 }

 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 1802134..7066660 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -84,6 +84,7 @@ void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
 void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
+int kvm_apic_local_deliver(struct kvm_vcpu *vcpu, int lvt_type);

 int pit_has_pending_timer(struct kvm_vcpu *vcpu);
 int apic_has_pending_timer(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index df5aba9..d790996 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -383,6 +383,14 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
delivery_mode,
 		}
 		break;

+	case APIC_DM_EXTINT:
+		/*
+		 * Should only be called by kvm_apic_local_deliver() with LVT0,
+		 * before NMI watchdog was enabled. Already handled by
+		 * kvm_apic_accept_pic_intr().
+		 */
+		break;
+
 	default:
 		printk(KERN_ERR "TODO: unsupported delivery mode %x\n",
 		       delivery_mode);
@@ -748,6 +756,9 @@ static void apic_mmio_write(struct kvm_io_device *this,
 	case APIC_LVTTHMR:
 	case APIC_LVTPC:
 	case APIC_LVT0:
+		if (val == APIC_DM_NMI)
+			apic_debug("Receive NMI setting on APIC_LVT0 "
+				"for cpu %d\n", apic->vcpu->vcpu_id);
 	case APIC_LVT1:
 	case APIC_LVTERR:
 		/* TODO: Check vector */
@@ -963,12 +974,25 @@ int apic_has_pending_timer(struct kvm_vcpu *vcpu)
 	return 0;
 }

-static int __inject_apic_timer_irq(struct kvm_lapic *apic)
+int kvm_apic_local_deliver(struct kvm_vcpu *vcpu, int lvt_type)
 {
-	int vector;
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	int vector, mode, trig_mode;
+	u32 reg;
+
+	if (apic && apic_enabled(apic)) {
+		reg = apic_get_reg(apic, lvt_type);
+		vector = reg & APIC_VECTOR_MASK;
+		mode = reg & APIC_MODE_MASK;
+		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
+		return __apic_accept_irq(apic, mode, vector, 1, trig_mode);
+	}
+	return 0;
+}

-	vector = apic_lvt_vector(apic, APIC_LVTT);
-	return __apic_accept_irq(apic, APIC_DM_FIXED, vector, 1, 0);
+static int __inject_apic_timer_irq(struct kvm_lapic *apic)
+{
+	return kvm_apic_local_deliver(apic->vcpu, APIC_LVTT);
 }

 static enum hrtimer_restart apic_timer_fn(struct hrtimer *data)
--
1.5.5

[kvm-devel] [RFC][PATCH 3/4] KVM: VMX: Enable NMI with in-kernel irqchip

From: Yang, S. <she...@in...> - 2008-05-08 08:49:47

Attachments: 0003-KVM-VMX-Enable-NMI-with-in-kernel-irqchip.patch

From 4942a5c35c97e5edb6fe1303e04fb86f25cac345 Mon Sep 17 00:00:00 2001
From: Sheng Yang <she...@in...>
Date: Thu, 8 May 2008 16:00:57 +0800
Subject: [PATCH 3/4] KVM: VMX: Enable NMI with in-kernel irqchip


Signed-off-by: Sheng Yang <she...@in...>
---
 arch/x86/kvm/vmx.c         |  133 
+++++++++++++++++++++++++++++++++++++-------
 arch/x86/kvm/vmx.h         |   12 ++++-
 arch/x86/kvm/x86.c         |    1 +
 include/asm-x86/kvm_host.h |    1 +
 4 files changed, 125 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 26c4f02..55fe525 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -260,6 +260,12 @@ static inline int cpu_has_vmx_vpid(void)
 		SECONDARY_EXEC_ENABLE_VPID);
 }

+static inline int cpu_has_virtual_nmis(void)
+{
+	return (vmcs_config.pin_based_exec_ctrl &
+		PIN_BASED_VIRTUAL_NMIS);
+}
+
 static int __find_msr_index(struct vcpu_vmx *vmx, u32 msr)
 {
 	int i;
@@ -1068,7 +1074,7 @@ static __init int setup_vmcs_config(struct vmcs_config 
*vmcs_conf)
 	u32 _vmentry_control = 0;

 	min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
-	opt = 0;
+	opt = PIN_BASED_VIRTUAL_NMIS;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
 				&_pin_based_exec_control) < 0)
 		return -EIO;
@@ -2110,6 +2116,13 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu, int 
irq)
 			irq | INTR_TYPE_EXT_INTR | INTR_INFO_VALID_MASK);
 }

+static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
+{
+	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
+			INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
+	vcpu->arch.nmi_pending = 0;
+}
+
 static void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
 {
 	int word_index = __ffs(vcpu->arch.irq_summary);
@@ -2146,9 +2159,11 @@ static void do_interrupt_requests(struct kvm_vcpu 
*vcpu,
 		/*
 		 * Interrupts blocked.  Wait for unblock.
 		 */
-		cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
+		cpu_based_vm_exec_control |=
+			CPU_BASED_VIRTUAL_INTR_PENDING;
 	else
-		cpu_based_vm_exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
+		cpu_based_vm_exec_control &=
+			~CPU_BASED_VIRTUAL_INTR_PENDING;
 	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
 }

@@ -2633,6 +2648,19 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run)
 	return 1;
 }

+static int handle_nmi_window(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+	u32 cpu_based_vm_exec_control;
+
+	/* clear pending NMI */
+	cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+	cpu_based_vm_exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
+	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
+	++vcpu->stat.nmi_window_exits;
+
+	return 1;
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest 
execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what 
needs
@@ -2643,6 +2671,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu 
*vcpu,
 	[EXIT_REASON_EXCEPTION_NMI]           = handle_exception,
 	[EXIT_REASON_EXTERNAL_INTERRUPT]      = handle_external_interrupt,
 	[EXIT_REASON_TRIPLE_FAULT]            = handle_triple_fault,
+	[EXIT_REASON_NMI_WINDOW]	      = handle_nmi_window,
 	[EXIT_REASON_IO_INSTRUCTION]          = handle_io,
 	[EXIT_REASON_CR_ACCESS]               = handle_cr,
 	[EXIT_REASON_DR_ACCESS]               = handle_dr,
@@ -2730,17 +2759,52 @@ static void enable_irq_window(struct kvm_vcpu *vcpu)
 	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
 }

+static void enable_nmi_window(struct kvm_vcpu *vcpu)
+{
+	u32 cpu_based_vm_exec_control;
+
+	if (!cpu_has_virtual_nmis())
+		return;
+
+	cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+	cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_NMI_PENDING;
+	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
+}
+
+static int vmx_nmi_enabled(struct kvm_vcpu *vcpu)
+{
+	u32 guest_intr = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
+	return !(guest_intr & (GUEST_INTR_STATE_NMI |
+			       GUEST_INTR_STATE_MOV_SS |
+			       GUEST_INTR_STATE_STI));
+}
+
+static int vmx_irq_enabled(struct kvm_vcpu *vcpu)
+{
+	u32 guest_intr = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
+	return (!(guest_intr & (GUEST_INTR_STATE_MOV_SS |
+			       GUEST_INTR_STATE_STI)) &&
+		(vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF));
+}
+
+static void enable_intr_window(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.nmi_pending)
+		enable_nmi_window(vcpu);
+	else if (kvm_cpu_has_interrupt(vcpu))
+		enable_irq_window(vcpu);
+}
+
 static void vmx_intr_assist(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u32 idtv_info_field, intr_info_field;
-	int has_ext_irq, interrupt_window_open;
+	u32 idtv_info_field, intr_info_field, exit_intr_info_field;
 	int vector;

 	update_tpr_threshold(vcpu);

-	has_ext_irq = kvm_cpu_has_interrupt(vcpu);
 	intr_info_field = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
+	exit_intr_info_field = vmcs_read32(VM_EXIT_INTR_INFO);
 	idtv_info_field = vmx->idt_vectoring_info;
 	if (intr_info_field & INTR_INFO_VALID_MASK) {
 		if (idtv_info_field & INTR_INFO_VALID_MASK) {
@@ -2748,8 +2812,7 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
 			if (printk_ratelimit())
 				printk(KERN_ERR "Fault when IDT_Vectoring\n");
 		}
-		if (has_ext_irq)
-			enable_irq_window(vcpu);
+		enable_intr_window(vcpu);
 		return;
 	}
 	if (unlikely(idtv_info_field & INTR_INFO_VALID_MASK)) {
@@ -2759,30 +2822,56 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
 			u8 vect = idtv_info_field & VECTORING_INFO_VECTOR_MASK;

 			vmx_inject_irq(vcpu, vect);
-			if (unlikely(has_ext_irq))
-				enable_irq_window(vcpu);
+			enable_intr_window(vcpu);
 			return;
 		}

 		KVMTRACE_1D(REDELIVER_EVT, vcpu, idtv_info_field, handler);

-		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, idtv_info_field);
+		/*
+		 * SDM 3: 25.7.1.2
+		 * Clear bit "block by NMI" before VM entry if a NMI delivery
+		 * faulted.
+		 */
+		if ((idtv_info_field & VECTORING_INFO_TYPE_MASK)
+		    == INTR_TYPE_NMI_INTR && cpu_has_virtual_nmis())
+			vmcs_write32(GUEST_INTERRUPTIBILITY_INFO,
+				vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
+				~GUEST_INTR_STATE_NMI);
+
+		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, idtv_info_field
+				& ~INTR_INFO_RESVD_BITS_MASK);
 		vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
 				vmcs_read32(VM_EXIT_INSTRUCTION_LEN));

 		if (unlikely(idtv_info_field & INTR_INFO_DELIVER_CODE_MASK))
 			vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
 				vmcs_read32(IDT_VECTORING_ERROR_CODE));
-		if (unlikely(has_ext_irq))
-			enable_irq_window(vcpu);
+		enable_intr_window(vcpu);
 		return;
 	}
-	if (!has_ext_irq)
+	if (cpu_has_virtual_nmis()) {
+		/*
+		 * SDM 3: 25.7.1.2
+		 * Re-set bit "block by NMI" before VM entry if vmexit caused by
+		 * a guest IRET fault.
+		 */
+		if ((exit_intr_info_field & INTR_INFO_UNBLOCK_NMI) &&
+		    (exit_intr_info_field & INTR_INFO_VECTOR_MASK) != 8)
+			vmcs_write32(GUEST_INTERRUPTIBILITY_INFO,
+				vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) |
+				GUEST_INTR_STATE_NMI);
+		else if (vcpu->arch.nmi_pending) {
+			if (vmx_nmi_enabled(vcpu))
+				vmx_inject_nmi(vcpu);
+			enable_intr_window(vcpu);
+			return;
+		}
+
+	}
+	if (!kvm_cpu_has_interrupt(vcpu))
 		return;
-	interrupt_window_open =
-		((vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
-		 (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & 3) == 0);
-	if (interrupt_window_open) {
+	if (vmx_irq_enabled(vcpu)) {
 		vector = kvm_cpu_get_interrupt(vcpu);
 		vmx_inject_irq(vcpu, vector);
 		kvm_timer_intr_post(vcpu, vector);
@@ -2943,7 +3032,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
 		fixup_rmode_irq(vmx);

 	vcpu->arch.interrupt_window_open =
-		(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & 3) == 0;
+		(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
+		 (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)) == 0;

 	asm("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
 	vmx->launched = 1;
@@ -2951,9 +3041,10 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
 	intr_info = vmcs_read32(VM_EXIT_INTR_INFO);

 	/* We need to handle NMIs before interrupts are enabled */
-	if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == 0x200) { /* nmi */
+	if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == 0x200) {
 		KVMTRACE_0D(NMI, vcpu, handler);
-		asm("int $2");
+		if (!cpu_has_virtual_nmis())
+			asm("int $2");
 	}
 }

diff --git a/arch/x86/kvm/vmx.h b/arch/x86/kvm/vmx.h
index 79d94c6..425a134 100644
--- a/arch/x86/kvm/vmx.h
+++ b/arch/x86/kvm/vmx.h
@@ -40,6 +40,7 @@
 #define CPU_BASED_CR8_LOAD_EXITING              0x00080000
 #define CPU_BASED_CR8_STORE_EXITING             0x00100000
 #define CPU_BASED_TPR_SHADOW                    0x00200000
+#define CPU_BASED_VIRTUAL_NMI_PENDING		0x00400000
 #define CPU_BASED_MOV_DR_EXITING                0x00800000
 #define CPU_BASED_UNCOND_IO_EXITING             0x01000000
 #define CPU_BASED_USE_IO_BITMAPS                0x02000000
@@ -216,7 +217,7 @@ enum vmcs_field {
 #define EXIT_REASON_TRIPLE_FAULT        2

 #define EXIT_REASON_PENDING_INTERRUPT   7
-
+#define EXIT_REASON_NMI_WINDOW		8
 #define EXIT_REASON_TASK_SWITCH         9
 #define EXIT_REASON_CPUID               10
 #define EXIT_REASON_HLT                 12
@@ -251,7 +252,9 @@ enum vmcs_field {
 #define INTR_INFO_VECTOR_MASK           0xff            /* 7:0 */
 #define INTR_INFO_INTR_TYPE_MASK        0x700           /* 10:8 */
 #define INTR_INFO_DELIVER_CODE_MASK     0x800           /* 11 */
+#define INTR_INFO_UNBLOCK_NMI		0x1000		/* 12 */
 #define INTR_INFO_VALID_MASK            0x80000000      /* 31 */
+#define INTR_INFO_RESVD_BITS_MASK       0x7ffff000

 #define VECTORING_INFO_VECTOR_MASK           	INTR_INFO_VECTOR_MASK
 #define VECTORING_INFO_TYPE_MASK        	INTR_INFO_INTR_TYPE_MASK
@@ -259,9 +262,16 @@ enum vmcs_field {
 #define VECTORING_INFO_VALID_MASK       	INTR_INFO_VALID_MASK

 #define INTR_TYPE_EXT_INTR              (0 << 8) /* external interrupt */
+#define INTR_TYPE_NMI_INTR		(2 << 8) /* NMI */
 #define INTR_TYPE_EXCEPTION             (3 << 8) /* processor exception */
 #define INTR_TYPE_SOFT_INTR             (4 << 8) /* software interrupt */

+/* GUEST_INTERRUPTIBILITY_INFO flags. */
+#define GUEST_INTR_STATE_STI		0x00000001
+#define GUEST_INTR_STATE_MOV_SS		0x00000002
+#define GUEST_INTR_STATE_SMI		0x00000004
+#define GUEST_INTR_STATE_NMI		0x00000008
+
 /*
  * Exit Qualifications for MOV for Control Register Access
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f95ebdb..16e9cd2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -72,6 +72,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ "mmio_exits", VCPU_STAT(mmio_exits) },
 	{ "signal_exits", VCPU_STAT(signal_exits) },
 	{ "irq_window", VCPU_STAT(irq_window_exits) },
+	{ "nmi_window", VCPU_STAT(nmi_window_exits) },
 	{ "halt_exits", VCPU_STAT(halt_exits) },
 	{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
 	{ "hypercalls", VCPU_STAT(hypercalls) },
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 2df8416..277216c 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -344,6 +344,7 @@ struct kvm_vcpu_stat {
 	u32 mmio_exits;
 	u32 signal_exits;
 	u32 irq_window_exits;
+	u32 nmi_window_exits;
 	u32 halt_exits;
 	u32 halt_wakeup;
 	u32 request_irq_exits;
--
1.5.5

[kvm-devel] [RFC][PATCH 0/4] Enabled NMI support for KVM

From: Yang, S. <she...@in...> - 2008-05-08 08:49:41

Hi

This patchset enabled NMI support for KVM.

The first three patches enabled NMI for in-kernel irqchip and NMI supporting 
on VMX. The last patch enabled NMI watchdog in linux, can be used to test the 
NMI injection.

In fact, this series should also included Jan Kiszka's patch to enable NMI for 
userspace irqchip, but now I got little trouble to get the merged version 
work on my machine... We would post it as soon as we solved it. 

Another thing is the vmx_intr_assist() and do_interrupt_requests() got some 
duplication. And the logic for normal interrupt and NMI is similiar, but 
vmx_intr_assist() seems a little implicit... 

Any comments welcome!

--
Thanks
Yang, Sheng

[kvm-devel] [PATCH] qemu-kvm: fix monitor and gdbstub deadlocks

From: Jan K. <jan...@si...> - 2008-05-08 08:30:28

Some monitor commands as well as the vm_stop() issued by the gdbstub on
external interruption so far deadlock on some vcpu locks in the kernel.
Patch below resolve the issue by temporarily or permanently stopping all
vcpu threads before issuing the related KVM IOCTLs.

Among other things, this patch now allows to break into the BIOS or
other guest code spinning in the vcpu (*) and to use things like "info
cpus" in the monitor.

Signed-off-by: Jan Kiszka <jan...@si...>
---
 qemu/gdbstub.c  |   12 ++++--------
 qemu/monitor.c  |    3 +--
 qemu/qemu-kvm.c |   39 +++++++++++++++++++++++++--------------
 qemu/vl.c       |    2 +-
 4 files changed, 31 insertions(+), 25 deletions(-)

Index: b/qemu/monitor.c
===================================================================
--- a/qemu/monitor.c
+++ b/qemu/monitor.c
@@ -286,8 +286,7 @@ static CPUState *mon_get_cpu(void)
         mon_set_cpu(0);
     }
 
-    if (kvm_enabled())
-	kvm_save_registers(mon_cpu);
+    kvm_save_registers(mon_cpu);
 
     return mon_cpu;
 }
Index: b/qemu/qemu-kvm.c
===================================================================
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -130,18 +130,6 @@ static int pre_kvm_run(void *opaque, int
     return 0;
 }
 
-void kvm_load_registers(CPUState *env)
-{
-    if (kvm_enabled())
-	kvm_arch_load_regs(env);
-}
-
-void kvm_save_registers(CPUState *env)
-{
-    if (kvm_enabled())
-	kvm_arch_save_regs(env);
-}
-
 int kvm_cpu_exec(CPUState *env)
 {
     int r;
@@ -290,6 +278,24 @@ static void kvm_vm_state_change_handler(
 	pause_all_threads();
 }
 
+void kvm_load_registers(CPUState *env)
+{
+    if (kvm_enabled())
+	kvm_arch_load_regs(env);
+}
+
+void kvm_save_registers(CPUState *env)
+{
+    if (!kvm_enabled())
+        return;
+
+    if (vm_running)
+        pause_all_threads();
+    kvm_arch_save_regs(env);
+    if (vm_running)
+        resume_all_threads();
+}
+
 static void update_regs_for_sipi(CPUState *env)
 {
     kvm_arch_update_regs_for_sipi(env);
@@ -698,7 +704,7 @@ int kvm_qemu_init_env(CPUState *cenv)
 int kvm_update_debugger(CPUState *env)
 {
     struct kvm_debug_guest dbg;
-    int i;
+    int i, r;
 
     dbg.enabled = 0;
     if (env->nb_breakpoints || env->singlestep_enabled) {
@@ -709,7 +715,12 @@ int kvm_update_debugger(CPUState *env)
 	}
 	dbg.singlestep = env->singlestep_enabled;
     }
-    return kvm_guest_debug(kvm_context, env->cpu_index, &dbg);
+    if (vm_running)
+        pause_all_threads();
+    r = kvm_guest_debug(kvm_context, env->cpu_index, &dbg);
+    if (vm_running)
+        resume_all_threads();
+    return r;
 }
 
 
Index: b/qemu/vl.c
===================================================================
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -7165,12 +7165,12 @@ void vm_stop(int reason)
 {
     if (vm_running) {
         cpu_disable_ticks();
-        vm_running = 0;
         if (reason != 0) {
             if (vm_stop_cb) {
                 vm_stop_cb(vm_stop_opaque, reason);
             }
         }
+        vm_running = 0;
         vm_state_notify(0);
     }
 }
Index: b/qemu/gdbstub.c
===================================================================
--- a/qemu/gdbstub.c
+++ b/qemu/gdbstub.c
@@ -904,8 +904,7 @@ static int gdb_handle_packet(GDBState *s
             addr = strtoull(p, (char **)&p, 16);
 #if defined(TARGET_I386)
             env->eip = addr;
-	    if (kvm_enabled())
-		kvm_load_registers(env);
+            kvm_load_registers(env);
 #elif defined (TARGET_PPC)
             env->nip = addr;
 #elif defined (TARGET_SPARC)
@@ -928,8 +927,7 @@ static int gdb_handle_packet(GDBState *s
             addr = strtoull(p, (char **)&p, 16);
 #if defined(TARGET_I386)
             env->eip = addr;
-	    if (kvm_enabled())
-		kvm_load_registers(env);
+            kvm_load_registers(env);
 #elif defined (TARGET_PPC)
             env->nip = addr;
 #elif defined (TARGET_SPARC)
@@ -973,8 +971,7 @@ static int gdb_handle_packet(GDBState *s
         }
         break;
     case 'g':
-	if (kvm_enabled())
-	    kvm_save_registers(env);
+        kvm_save_registers(env);
         reg_size = cpu_gdb_read_registers(env, mem_buf);
         memtohex(buf, mem_buf, reg_size);
         put_packet(s, buf);
@@ -984,8 +981,7 @@ static int gdb_handle_packet(GDBState *s
         len = strlen(p) / 2;
         hextomem((uint8_t *)registers, p, len);
         cpu_gdb_write_registers(env, mem_buf, len);
-	if (kvm_enabled())
-	    kvm_load_registers(env);
+        kvm_load_registers(env);
         put_packet(s, "OK");
         break;
     case 'm':


(*) Fully functional guest debugging and guest debug register support is 
also working here, patches just wait to be reformatted...

[kvm-devel] [PATCH] qemu-kvm: fix guest resetting

From: Jan K. <jan...@si...> - 2008-05-08 08:30:20

Resetting guests used to be racy, deadlock-prone, or simply broken (for
SMP). This patch fixes the issues - at least for me on x86 (tested on
Intel SMP host, UP and SMP guest, in-kernel und user space irqchip,
guest- and monitor-issued resets). Note that ia64 and powerpc may need
to look into the SMP thing as well (=>kvm_arch_cpu_reset).

At this chance, the patch also cleans up some unneeded reset fragments.

Signed-off-by: Jan Kiszka <jan...@si...>
---
 qemu/qemu-kvm-ia64.c    |    4 ++++
 qemu/qemu-kvm-powerpc.c |    4 ++++
 qemu/qemu-kvm-x86.c     |   16 ++++++++++++++++
 qemu/qemu-kvm.c         |   38 ++++++++++++++++++--------------------
 qemu/qemu-kvm.h         |    1 +
 qemu/vl.c               |   11 ++++++-----
 6 files changed, 49 insertions(+), 25 deletions(-)

Index: b/qemu/qemu-kvm.c
===================================================================
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -28,8 +28,6 @@ kvm_context_t kvm_context;
 
 extern int smp_cpus;
 
-static int qemu_kvm_reset_requested;
-
 pthread_mutex_t qemu_mutex = PTHREAD_MUTEX_INITIALIZER;
 pthread_cond_t qemu_aio_cond = PTHREAD_COND_INITIALIZER;
 pthread_cond_t qemu_vcpu_cond = PTHREAD_COND_INITIALIZER;
@@ -56,7 +54,6 @@ struct vcpu_info {
     int signalled;
     int stop;
     int stopped;
-    int reload_regs;
     int created;
 } vcpu_info[256];
 
@@ -257,17 +254,20 @@ static int all_threads_paused(void)
 
 static void pause_all_threads(void)
 {
+    CPUState *cpu_single = cpu_single_env;
     int i;
 
-    for (i = 0; i < smp_cpus; ++i) {
-	vcpu_info[i].stop = 1;
-	pthread_kill(vcpu_info[i].thread, SIG_IPI);
-    }
+    for (i = 0; i < smp_cpus; ++i)
+	if (!cpu_single || cpu_single->cpu_index != i) {
+	    vcpu_info[i].stop = 1;
+	    pthread_kill(vcpu_info[i].thread, SIG_IPI);
+	}
+
     while (!all_threads_paused()) {
 	pthread_mutex_unlock(&qemu_mutex);
-	kvm_eat_signal(&io_signal_table, NULL, 1000);
+	kvm_eat_signal(&io_signal_table, cpu_single, 1000);
 	pthread_mutex_lock(&qemu_mutex);
-	cpu_single_env = NULL;
+	cpu_single_env = cpu_single;
     }
 }
 
@@ -317,11 +317,18 @@ void qemu_kvm_system_reset_request(void)
 {
     int i;
 
+    pause_all_threads();
+
+    qemu_system_reset();
+
+    for (i = 0; i < smp_cpus; ++i)
+	kvm_arch_cpu_reset(vcpu_info[i].env);
+
     for (i = 0; i < smp_cpus; ++i) {
-	vcpu_info[i].reload_regs = 1;
+	vcpu_info[i].stop = 0;
+	vcpu_info[i].stopped = 0;
 	pthread_kill(vcpu_info[i].thread, SIG_IPI);
     }
-    qemu_system_reset();
 }
 
 static int kvm_main_loop_cpu(CPUState *env)
@@ -354,11 +361,6 @@ static int kvm_main_loop_cpu(CPUState *e
 	    kvm_cpu_exec(env);
 	env->interrupt_request &= ~CPU_INTERRUPT_EXIT;
 	kvm_main_loop_wait(env, 0);
-        if (info->reload_regs) {
-	    info->reload_regs = 0;
-	    if (env->cpu_index == 0) /* ap needs to be placed in INIT */
-		kvm_arch_load_regs(env);
-	}
     }
     pthread_mutex_unlock(&qemu_mutex);
     return 0;
@@ -467,10 +469,6 @@ int kvm_main_loop(void)
             break;
         else if (qemu_powerdown_requested())
             qemu_system_powerdown();
-        else if (qemu_reset_requested()) {
-            pthread_kill(vcpu_info[0].thread, SIG_IPI);
-            qemu_kvm_reset_requested = 1;
-        }
         pthread_mutex_unlock(&qemu_mutex);
     }
 
Index: b/qemu/qemu-kvm-ia64.c
===================================================================
--- a/qemu/qemu-kvm-ia64.c
+++ b/qemu/qemu-kvm-ia64.c
@@ -61,3 +61,7 @@ int kvm_arch_try_push_interrupts(void *o
 void kvm_arch_update_regs_for_sipi(CPUState *env)
 {
 }
+
+void kvm_arch_cpu_reset(CPUState *env)
+{
+}
Index: b/qemu/qemu-kvm-powerpc.c
===================================================================
--- a/qemu/qemu-kvm-powerpc.c
+++ b/qemu/qemu-kvm-powerpc.c
@@ -213,3 +213,7 @@ int handle_powerpc_dcr_write(int vcpu, u
 
     return 0; /* XXX ignore failed DCR ops */
 }
+
+void kvm_arch_cpu_reset(CPUState *env)
+{
+}
Index: b/qemu/qemu-kvm-x86.c
===================================================================
--- a/qemu/qemu-kvm-x86.c
+++ b/qemu/qemu-kvm-x86.c
@@ -689,3 +689,19 @@ int handle_tpr_access(void *opaque, int 
     kvm_tpr_access_report(cpu_single_env, rip, is_write);
     return 0;
 }
+
+void kvm_arch_cpu_reset(CPUState *env)
+{
+    struct kvm_mp_state mp_state = { .mp_state = KVM_MP_STATE_UNINITIALIZED };
+
+    kvm_arch_load_regs(env);
+    if (env->cpu_index != 0) {
+	if (kvm_irqchip_in_kernel(kvm_context))
+	    kvm_set_mpstate(kvm_context, env->cpu_index, &mp_state);
+	else {
+	    env->interrupt_request &= ~CPU_INTERRUPT_HARD;
+	    env->hflags |= HF_HALTED_MASK;
+	    env->exception_index = EXCP_HLT;
+	}
+    }
+}
Index: b/qemu/qemu-kvm.h
===================================================================
--- a/qemu/qemu-kvm.h
+++ b/qemu/qemu-kvm.h
@@ -55,6 +55,7 @@ void kvm_arch_post_kvm_run(void *opaque,
 int kvm_arch_has_work(CPUState *env);
 int kvm_arch_try_push_interrupts(void *opaque);
 void kvm_arch_update_regs_for_sipi(CPUState *env);
+void kvm_arch_cpu_reset(CPUState *env);
 
 CPUState *qemu_kvm_cpu_env(int index);
 
Index: b/qemu/vl.c
===================================================================
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -7235,6 +7235,12 @@ void qemu_system_reset(void)
 
 void qemu_system_reset_request(void)
 {
+#ifdef USE_KVM
+    if (kvm_allowed && !no_reboot) {
+        qemu_kvm_system_reset_request();
+        return;
+    }
+#endif
     if (no_reboot) {
         shutdown_requested = 1;
     } else {
@@ -7242,11 +7248,6 @@ void qemu_system_reset_request(void)
     }
     if (cpu_single_env)
         cpu_interrupt(cpu_single_env, CPU_INTERRUPT_EXIT);
-#ifdef USE_KVM
-    if (kvm_allowed)
-        if (!no_reboot)
-            qemu_kvm_system_reset_request();
-#endif
 }
 
 void qemu_system_shutdown_request(void)

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

From: Andrea A. <an...@qu...> - 2008-05-08 05:49:32

On Thu, May 08, 2008 at 08:30:20AM +0300, Pekka Enberg wrote:
> On Thu, May 8, 2008 at 8:27 AM, Pekka Enberg <pe...@cs...> wrote:
> > You might want to read carefully what Linus wrote:
> >
> >  > The one that already has a 4 byte padding thing on x86-64 just after the
> >  > spinlock? And that on 32-bit x86 (with less than 256 CPU's) would have two
> >  > bytes of padding if we didn't just make the spinlock type unconditionally
> >  > 32 bits rather than the 16 bits we actually _use_?
> >
> >  So you need to add the flag _after_ ->lock and _before_ ->head....
> 
> Oh should have taken my morning coffee first, before ->lock works
> obviously as well.

Sorry, Linus's right: I didn't realize the "after the spinlock" was
literally after the spinlock, I didn't see the 4 byte padding when I
read the code and put the flag:1 in. If put between ->lock and ->head
it doesn't take more memory on x86-64 as described literlly. So the
next would be to find another place like that in the address
space. Perhaps after the private_lock using the same trick or perhaps
the slab alignment won't actually alter the number of slabs per page
regardless.

I leave that to Christoph, he's surely better than me at doing this, I
give it up entirely and I consider my attempt to merge a total failure
and I strongly regret it.

On a side note the anon_vma will change to this when XPMEM support is
compiled in:

 struct anon_vma {
-	spinlock_t lock;	/* Serialize access to vma list */
+	atomic_t refcount;	/* vmas on the list */
+	struct rw_semaphore sem;/* Serialize access to vma list */
 	struct list_head head;	   /* List of private "related" vmas
	*/
 };

not sure if it'll grow in size or not after that but let's say it's
not a big deal.

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

From: Pekka E. <pe...@cs...> - 2008-05-08 05:30:16

On Thu, May 8, 2008 at 8:27 AM, Pekka Enberg <pe...@cs...> wrote:
> You might want to read carefully what Linus wrote:
>
>  > The one that already has a 4 byte padding thing on x86-64 just after the
>  > spinlock? And that on 32-bit x86 (with less than 256 CPU's) would have two
>  > bytes of padding if we didn't just make the spinlock type unconditionally
>  > 32 bits rather than the 16 bits we actually _use_?
>
>  So you need to add the flag _after_ ->lock and _before_ ->head....

Oh should have taken my morning coffee first, before ->lock works
obviously as well.

Re: [kvm-devel] [PATCH 08 of 11] anon-vma-rwsem

From: Pekka E. <pe...@cs...> - 2008-05-08 05:27:44

On Thu, May 8, 2008 at 8:20 AM, Andrea Arcangeli <an...@qu...> wrote:
>  Actually I looked both at the struct and at the slab alignment just in
>  case it was changed recently. Now after reading your mail I also
>  compiled it just in case.
>
>  @@ -27,6 +27,7 @@ struct anon_vma {
>   struct anon_vma {
>
>         spinlock_t lock;        /* Serialize access to vma list */
>
>         struct list_head head;  /* List of private "related" vmas */
>  +       int flag:1;
>   };

You might want to read carefully what Linus wrote:

> The one that already has a 4 byte padding thing on x86-64 just after the
> spinlock? And that on 32-bit x86 (with less than 256 CPU's) would have two
> bytes of padding if we didn't just make the spinlock type unconditionally
> 32 bits rather than the 16 bits we actually _use_?

So you need to add the flag _after_ ->lock and _before_ ->head....

                        Pekka

157 messages has been excluded from this view by a project administrator.

Flat | Threaded

<< < 1 .. 11 12 13 14 15 .. 703 > >> (Page 13 of 703)

2006	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct (33)	Nov (325)	Dec (320)
2007	Jan (484)	Feb (438)	Mar (407)	Apr (713)	May (831)	Jun (806)	Jul (1023)	Aug (1184)	Sep (1118)	Oct (1461)	Nov (1224)	Dec (1042)
2008	Jan (1449)	Feb (1110)	Mar (1428)	Apr (1643)	May (682)	Jun	Jul	Aug	Sep	Oct	Nov	Dec