Thread: [kvm-devel] [PATCH 0/3] KVM clock, new iteration

Brought to you by: avik, mtosatti

kvm-devel

[kvm-devel] [PATCH 0/3] KVM clock, new iteration

From: Glauber de O. C. <gc...@re...> - 2008-01-11 13:11:02

Hi folks,

A lot of time has been elapsed since I last sent a version of kvm clock,
with a lot of things in the middle, including my vacations and a nasty bug
in the clock itself ;-)

Here it is a new version. I'm following avi's last suggestion of using an msr
to make it more migration-proof. If it's not approved, we can easily switch back
to the hypercall mechanism. That's the easy part ;-)

Clock updates are done every time we're scheduled. It is _not_ enough to check
if we're changing cpus, because the tsc will change its frequency when the cpu is
idle (if using the mwait idle). So after a long idle period, one cpu will end up stepping
ahead causing a backward time (remember the bug I talked about? :-) )

As I fell it ready for inclusion (which obviously does not mean it is perfect, nor that
I don't value any further comments ), I'm also posting the userspace patches in a separate
series that will follow shortly.

[kvm-devel] [PATCH 1/3] put kvm_para.h include outside __KERNEL__

From: Glauber de O. C. <gc...@re...> - 2008-01-11 13:11:11

kvm_para.h potentially contains definitions that are to be used by kvm-userspace,
so it should not be included inside the __KERNEL__ block. To protect its own data structures,
kvm_para.h already includes its own __KERNEL__ block.

Signed-off-by: Glauber de Oliveira Costa <gc...@re...>
---
 include/linux/kvm_para.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 6af91a5..5497aac 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -14,12 +14,12 @@
 
 #define KVM_HC_VAPIC_POLL_IRQ            1
 
-#ifdef __KERNEL__
 /*
  * hypercalls use architecture specific
  */
 #include <asm/kvm_para.h>
 
+#ifdef __KERNEL__
 static inline int kvm_para_has_feature(unsigned int feature)
 {
 	if (kvm_arch_para_features() & (1UL << feature))
-- 
1.5.0.6

[kvm-devel] [PATCH 2/3] [PATCH] kvmclock - the host part.

From: Glauber de O. C. <gc...@re...> - 2008-01-11 13:11:12

This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writting to it periodically.

The area is binary compatible with xen, as we use the same shadow_info structure.

Signed-off-by: Glauber de Oliveira Costa <gc...@re...>
---
 arch/x86/kvm/x86.c         |   64 +++++++++++++++++++++++++++++++++++++++++++-
 include/asm-x86/kvm_host.h |    5 +++
 include/asm-x86/kvm_para.h |    5 +++
 include/linux/kvm.h        |    1 +
 4 files changed, 74 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8a90403..d3fd920 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -19,6 +19,7 @@
 #include "irq.h"
 #include "mmu.h"
 
+#include <linux/clocksource.h>
 #include <linux/kvm.h>
 #include <linux/fs.h>
 #include <linux/vmalloc.h>
@@ -412,7 +413,7 @@ static u32 msrs_to_save[] = {
 #ifdef CONFIG_X86_64
 	MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
 #endif
-	MSR_IA32_TIME_STAMP_COUNTER,
+	MSR_IA32_TIME_STAMP_COUNTER, MSR_PARAVIRT_CLOCK,
 };
 
 static unsigned num_msrs_to_save;
@@ -467,6 +468,50 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 	return kvm_set_msr(vcpu, index, *data);
 }
 
+#define WC_OFFSET offsetof(struct shared_info, wc_version)
+
+static void kvm_write_guest_time(struct kvm_vcpu *v)
+{
+	struct timespec ts, wc_ts;
+	int wc_args[3]; /* version, wc_sec, wc_nsec */
+	unsigned long flags;
+	struct kvm_vcpu_arch *vcpu = &v->arch;
+
+	if (!vcpu->shared_info)
+		return;
+
+	/* Make the update of both version numbers visible to guest */
+	wc_args[0] = ++vcpu->wc_version;
+	kvm_write_guest(v->kvm, vcpu->shared_info + WC_OFFSET, wc_args,
+			sizeof(wc_args));
+	vcpu->hv_clock.version++;
+	kvm_write_guest(v->kvm, vcpu->clock_addr, &vcpu->hv_clock,
+			sizeof(vcpu->hv_clock));
+
+	/* Keep irq disabled to prevent changes to the clock */
+	local_irq_save(flags);
+	kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER,
+			  &vcpu->hv_clock.tsc_timestamp);
+	wc_ts = current_kernel_time();
+	ktime_get_ts(&ts);
+	local_irq_restore(flags);
+
+	/* With all the info we got, fill in the values */
+	wc_args[1] = wc_ts.tv_sec;
+	wc_args[2] = wc_ts.tv_nsec;
+	wc_args[0] = ++vcpu->wc_version;
+
+	vcpu->hv_clock.system_time = ts.tv_nsec +
+				     (NSEC_PER_SEC * (u64)ts.tv_sec);
+	vcpu->hv_clock.version++;
+
+	/* And finally, let the guest see them */
+	kvm_write_guest(v->kvm, vcpu->shared_info + WC_OFFSET, wc_args,
+			sizeof(wc_args));
+	kvm_write_guest(v->kvm, vcpu->clock_addr, &vcpu->hv_clock,
+			sizeof(vcpu->hv_clock));
+}
+
 
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
@@ -494,6 +539,18 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 	case MSR_IA32_MISC_ENABLE:
 		vcpu->arch.ia32_misc_enable_msr = data;
 		break;
+	case MSR_PARAVIRT_CLOCK: {
+		vcpu->arch.shared_info = data;
+
+		vcpu->arch.clock_addr = data + offsetof(struct vcpu_info, time)
+				 + sizeof(struct vcpu_info) * vcpu->vcpu_id;
+
+		vcpu->arch.hv_clock.tsc_to_system_mul =
+				clocksource_khz2mult(tsc_khz, 22);
+		vcpu->arch.hv_clock.tsc_shift = 22;
+		kvm_write_guest_time(vcpu);
+		break;
+	}
 	default:
 		pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data);
 		return 1;
@@ -553,6 +610,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 		data = vcpu->arch.shadow_efer;
 		break;
 #endif
+	case MSR_PARAVIRT_CLOCK:
+		data = vcpu->arch.shared_info;
+		break;
 	default:
 		pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr);
 		return 1;
@@ -680,6 +740,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_USER_MEMORY:
 	case KVM_CAP_SET_TSS_ADDR:
 	case KVM_CAP_EXT_CPUID:
+	case KVM_CAP_CLOCKSOURCE:
 		r = 1;
 		break;
 	case KVM_CAP_VAPIC:
@@ -737,6 +798,7 @@ out:
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	kvm_x86_ops->vcpu_load(vcpu, cpu);
+	kvm_write_guest_time(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index d6db0de..bbc4b51 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -261,6 +261,11 @@ struct kvm_vcpu_arch {
 	/* emulate context */
 
 	struct x86_emulate_ctxt emulate_ctxt;
+
+	struct vcpu_time_info hv_clock;
+	gpa_t shared_info;
+	gpa_t clock_addr; /* so we avoid computing it every time */
+	int wc_version;
 };
 
 struct kvm_mem_alias {
diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index c6f3fd8..abe412a 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -1,5 +1,6 @@
 #ifndef __X86_KVM_PARA_H
 #define __X86_KVM_PARA_H
+#include <xen/interface/xen.h>
 
 /* This CPUID returns the signature 'KVMKVMKVM' in ebx, ecx, and edx.  It
  * should be used to determine that a VM is running under KVM.
@@ -10,9 +11,13 @@
  * paravirtualization, the appropriate feature bit should be checked.
  */
 #define KVM_CPUID_FEATURES	0x40000001
+#define KVM_FEATURE_CLOCKSOURCE 0
 
+#define MSR_PARAVIRT_CLOCK	0x11
 #ifdef __KERNEL__
 #include <asm/processor.h>
+extern void kvmclock_init(void);
+
 
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
  * trap that we will then rewrite to the appropriate instruction.
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 4de4fd2..78ce53f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -232,6 +232,7 @@ struct kvm_vapic_addr {
 #define KVM_CAP_SET_TSS_ADDR 4
 #define KVM_CAP_EXT_CPUID 5
 #define KVM_CAP_VAPIC 6
+#define KVM_CAP_CLOCKSOURCE 7
 
 /*
  * ioctls for VM fds
-- 
1.5.0.6

[kvm-devel] [PATCH 3/3] [PATCH] kvmclock implementation, the guest part.

From: Glauber de O. C. <gc...@re...> - 2008-01-11 13:11:13

This is the guest part of kvm clock implementation
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthy to me to keep
adjusting them.

We do use it, however, for fine-grained adjustment.

Other than that, time comes from the host.

Signed-off-by: Glauber de Oliveira Costa <gc...@re...>
---
 arch/x86/Kconfig            |   10 +++
 arch/x86/kernel/Makefile_32 |    1 +
 arch/x86/kernel/kvmclock.c  |  160 +++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/setup_32.c  |    5 ++
 arch/x86/xen/time.c         |    6 +-
 5 files changed, 179 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/kernel/kvmclock.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ab2df55..968315e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -350,6 +350,16 @@ config VMI
 	  at the moment), by linking the kernel to a GPL-ed ROM module
 	  provided by the hypervisor.
 
+config KVM_CLOCK
+	bool "KVM paravirtualized clock"
+	select PARAVIRT
+	help
+	  Turning on this option will allow you to run a paravirtualized clock
+	  when running over the KVM hypervisor. Instead of relying on a PIT
+	  (or probably other) emulation by the underlying device model, the host
+	  provides the guest with timing infrastructure, as time of day, and
+	  timer expiration.
+
 source "arch/x86/lguest/Kconfig"
 
 endif
diff --git a/arch/x86/kernel/Makefile_32 b/arch/x86/kernel/Makefile_32
index a7bc93c..f6332b6 100644
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -44,6 +44,7 @@ obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_MGEODE_LX)		+= geode_32.o mfgpt_32.o
 
 obj-$(CONFIG_VMI)		+= vmi_32.o vmiclock_32.o
+obj-$(CONFIG_KVM_CLOCK)		+= kvmclock.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt_32.o
 obj-y				+= pcspeaker.o
 
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
new file mode 100644
index 0000000..f2094f4
--- /dev/null
+++ b/arch/x86/kernel/kvmclock.c
@@ -0,0 +1,160 @@
+/*  KVM paravirtual clock driver. A clocksource implementation
+    Copyright (C) 2007 Glauber de Oliveira Costa, Red Hat Inc.
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program; if not, write to the Free Software
+    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+*/
+
+#include <linux/clocksource.h>
+#include <linux/kvm_para.h>
+#include <asm/arch_hooks.h>
+#include <asm/msr.h>
+
+
+#define KVM_SCALE 22
+
+static int kvmclock = 1;
+
+static int parse_no_kvmclock(char *arg)
+{
+	kvmclock = 0;
+	return 0;
+}
+early_param("no-kvmclock", parse_no_kvmclock);
+
+struct shared_info shared_info __attribute__((__aligned__(PAGE_SIZE)));
+
+/* The hypervisor will put information about time periodically here */
+DEFINE_PER_CPU(struct vcpu_time_info *, hv_clock);
+#define get_clock(cpu, field) per_cpu(hv_clock, cpu)->field
+
+static inline u64 kvm_get_delta(u64 last_tsc)
+{
+	int cpu = smp_processor_id();
+	u64 delta = native_read_tsc() - last_tsc;
+	return (delta * get_clock(cpu, tsc_to_system_mul)) >> KVM_SCALE;
+}
+
+/*
+ * The wallclock is the time of day when we booted. Since then, some time may
+ * have elapsed since the hypervisor wrote the data. So we try to account for
+ * that. Even if the tsc is not accurate, it gives us a more accurate timing
+ * than not adjusting at all
+ */
+unsigned long kvm_get_wallclock(void)
+{
+	u32 wc_sec, wc_nsec;
+	u64 delta, last_tsc;
+	struct timespec ts;
+	int version, nsec, cpu = smp_processor_id();
+
+	do {
+		version = shared_info.wc_version;
+		rmb();
+		wc_sec = shared_info.wc_sec;
+		wc_nsec = shared_info.wc_nsec;
+		last_tsc = get_clock(cpu, tsc_timestamp);
+		rmb();
+	} while ((shared_info.wc_version != version) || (version & 1));
+
+	delta = kvm_get_delta(last_tsc);
+	delta += wc_nsec;
+	nsec = do_div(delta, NSEC_PER_SEC);
+	set_normalized_timespec(&ts, wc_sec + delta, nsec);
+	/*
+	 * Of all mechanisms of time adjustment I've tested, this one
+	 * was the champion!
+	 */
+	return ts.tv_sec + 1;
+}
+
+int kvm_set_wallclock(unsigned long now)
+{
+	return 0;
+}
+
+/*
+ * This is our read_clock function. The host puts an tsc timestamp each time
+ * it updates a new time, and then we can use it to derive a slightly more
+ * precise notion of elapsed time, converted to nanoseconds.
+ */
+static cycle_t kvm_clock_read(void)
+{
+	u64 last_tsc, now;
+	u32 version;
+	int cpu;
+
+	preempt_disable();
+	cpu = smp_processor_id();
+
+	do {
+		version = get_clock(cpu, version);
+		rmb();
+		last_tsc = get_clock(cpu, tsc_timestamp);
+		now = get_clock(cpu, system_time);
+		rmb();
+	} while ((get_clock(cpu, version) != version) || (version & 1));
+
+	now += kvm_get_delta(last_tsc);
+	preempt_enable();
+
+	return now;
+}
+
+static struct clocksource kvm_clock = {
+	.name = "kvm-clock",
+	.read = kvm_clock_read,
+	.rating = 400,
+	.mask = CLOCKSOURCE_MASK(64),
+	.mult = 1 << KVM_SCALE,
+	.shift = KVM_SCALE,
+	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+static int kvm_register_clock(void)
+{
+	int cpu = smp_processor_id();
+	u64 kvm_clock_info = (u64)(u64 *)&shared_info;
+	kvm_clock_info = __pa(kvm_clock_info);
+	__get_cpu_var(hv_clock) = &(shared_info.vcpu_info[cpu].time);
+
+	return native_write_msr_safe(MSR_PARAVIRT_CLOCK, kvm_clock_info);
+}
+
+static void kvm_setup_secondary_clock(void)
+{
+	/*
+	 * Now that the first cpu already had this clocksource initialized,
+	 * we shouldn't fail.
+	 */
+	WARN_ON(kvm_register_clock());
+	/* ok, done with our trickery, call native */
+	setup_secondary_APIC_clock();
+}
+
+void __init kvmclock_init(void)
+{
+	if (!kvm_para_available())
+		return;
+
+	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
+		if (kvm_register_clock())
+			return;
+		pv_time_ops.get_wallclock = kvm_get_wallclock;
+		pv_time_ops.set_wallclock = kvm_set_wallclock;
+		pv_time_ops.sched_clock = kvm_clock_read;
+		pv_apic_ops.setup_secondary_clock = kvm_setup_secondary_clock;
+		clocksource_register(&kvm_clock);
+	}
+}
diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c
index 9c24b45..89c0eb2 100644
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -44,6 +44,7 @@
 #include <linux/crash_dump.h>
 #include <linux/dmi.h>
 #include <linux/pfn.h>
+#include <linux/kvm_para.h>
 
 #include <video/edid.h>
 
@@ -614,6 +615,10 @@ void __init setup_arch(char **cmdline_p)
 
 	max_low_pfn = setup_memory();
 
+#ifdef CONFIG_KVM_CLOCK
+	kvmclock_init();
+#endif
+
 #ifdef CONFIG_VMI
 	/*
 	 * Must be after max_low_pfn is determined, and before kernel
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index d083ff5..7728b87 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -28,7 +28,7 @@
 #define TIMER_SLOP	100000
 #define NS_PER_TICK	(1000000000LL / HZ)
 
-static cycle_t xen_clocksource_read(void);
+cycle_t xen_clocksource_read(void);
 
 /* These are perodically updated in shared_info, and then copied here. */
 struct shadow_time_info {
@@ -304,7 +304,7 @@ static u64 get_nsec_offset(struct shadow_time_info *shadow)
 	return scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift);
 }
 
-static cycle_t xen_clocksource_read(void)
+cycle_t xen_clocksource_read(void)
 {
 	struct shadow_time_info *shadow = &get_cpu_var(shadow_time);
 	cycle_t ret;
@@ -322,7 +322,7 @@ static cycle_t xen_clocksource_read(void)
 	return ret;
 }
 
-static void xen_read_wallclock(struct timespec *ts)
+void xen_read_wallclock(struct timespec *ts)
 {
 	const struct shared_info *s = HYPERVISOR_shared_info;
 	u32 version;
-- 
1.5.0.6

Re: [kvm-devel] [PATCH 3/3] [PATCH] kvmclock implementation, the guest part.

From: Gerd H. <kr...@re...> - 2008-01-11 13:23:09

  Hi,

> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> index d083ff5..7728b87 100644
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> -static cycle_t xen_clocksource_read(void);
> +cycle_t xen_clocksource_read(void);

Huh?  You kill the static, but don't use the functions anywhere?  Looks
like half-done code sharing with xen paravirt clock ...

cheers,
  Gerd

Re: [kvm-devel] [PATCH 3/3] [PATCH] kvmclock implementation, the guest part.

From: Glauber de O. C. <gc...@re...> - 2008-01-11 13:34:11

Gerd Hoffmann wrote:
>   Hi,
> 
>> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
>> index d083ff5..7728b87 100644
>> --- a/arch/x86/xen/time.c
>> +++ b/arch/x86/xen/time.c
>> -static cycle_t xen_clocksource_read(void);
>> +cycle_t xen_clocksource_read(void);
> 
> Huh?  You kill the static, but don't use the functions anywhere?  Looks
> like half-done code sharing with xen paravirt clock ...
> 
> cheers,
>   Gerd
> 
It's called experimentation. I was sure I deleted this part of the patch 
altogheter, but I must have
missed it. Thanks for the point out.

Re: [kvm-devel] [PATCH 2/3] [PATCH] kvmclock - the host part.

From: Anthony L. <an...@co...> - 2008-01-11 15:47:18

Glauber de Oliveira Costa wrote:
> This is the host part of kvm clocksource implementation. As it does
> not include clockevents, it is a fairly simple implementation. We
> only have to register a per-vcpu area, and start writting to it periodically.
>
> The area is binary compatible with xen, as we use the same shadow_info structure.
>
> diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
> index c6f3fd8..abe412a 100644
> --- a/include/asm-x86/kvm_para.h
> +++ b/include/asm-x86/kvm_para.h
> @@ -1,5 +1,6 @@
>  #ifndef __X86_KVM_PARA_H
>  #define __X86_KVM_PARA_H
> +#include <xen/interface/xen.h>

Can we abstract that out into a neutral header instead of including the 
Xen headers directly in KVM.  Please rename the structure too to 
something neutral.

Including something as generic as xen/interface/xen.h when CONFIG_XEN 
may not be set is not a good thing, I believe.

Regards,

Anthony Liguori

>  
>  /* This CPUID returns the signature 'KVMKVMKVM' in ebx, ecx, and edx.  It
>   * should be used to determine that a VM is running under KVM.
> @@ -10,9 +11,13 @@
>   * paravirtualization, the appropriate feature bit should be checked.
>   */
>  #define KVM_CPUID_FEATURES	0x40000001
> +#define KVM_FEATURE_CLOCKSOURCE 0
>  
> +#define MSR_PARAVIRT_CLOCK	0x11
>  #ifdef __KERNEL__
>  #include <asm/processor.h>
> +extern void kvmclock_init(void);
> +
>  
>  /* This instruction is vmcall.  On non-VT architectures, it will generate a
>   * trap that we will then rewrite to the appropriate instruction.
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 4de4fd2..78ce53f 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -232,6 +232,7 @@ struct kvm_vapic_addr {
>  #define KVM_CAP_SET_TSS_ADDR 4
>  #define KVM_CAP_EXT_CPUID 5
>  #define KVM_CAP_VAPIC 6
> +#define KVM_CAP_CLOCKSOURCE 7
>  
>  /*
>   * ioctls for VM fds
>

Re: [kvm-devel] [PATCH 2/3] [PATCH] kvmclock - the host part.

From: Avi K. <av...@qu...> - 2008-01-12 20:45:06

Glauber de Oliveira Costa wrote:
> This is the host part of kvm clocksource implementation. As it does
> not include clockevents, it is a fairly simple implementation. We
> only have to register a per-vcpu area, and start writting to it periodically.
>
> The area is binary compatible with xen, as we use the same shadow_info structure.
>
> @@ -412,7 +413,7 @@ static u32 msrs_to_save[] = {
>  #ifdef CONFIG_X86_64
>  	MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
>  #endif
> -	MSR_IA32_TIME_STAMP_COUNTER,
> +	MSR_IA32_TIME_STAMP_COUNTER, MSR_PARAVIRT_CLOCK,
>  };
>   

Let us be good citizens and put the msr into our private MSR_KVM_* 
namespace.

>  
>  
> +#define WC_OFFSET offsetof(struct shared_info, wc_version)
> +
> +static void kvm_write_guest_time(struct kvm_vcpu *v)
> +{
> +	struct timespec ts, wc_ts;
> +	int wc_args[3]; /* version, wc_sec, wc_nsec */
> +	unsigned long flags;
> +	struct kvm_vcpu_arch *vcpu = &v->arch;
> +
> +	if (!vcpu->shared_info)
> +		return;
> +
> +	/* Make the update of both version numbers visible to guest */
> +	wc_args[0] = ++vcpu->wc_version;
> +	kvm_write_guest(v->kvm, vcpu->shared_info + WC_OFFSET, wc_args,
> +			sizeof(wc_args));
> +	vcpu->hv_clock.version++;
> +	kvm_write_guest(v->kvm, vcpu->clock_addr, &vcpu->hv_clock,
> +			sizeof(vcpu->hv_clock));
> +
> +	/* Keep irq disabled to prevent changes to the clock */
> +	local_irq_save(flags);
> +	kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER,
> +			  &vcpu->hv_clock.tsc_timestamp);
> +	wc_ts = current_kernel_time();
> +	ktime_get_ts(&ts);
> +	local_irq_restore(flags);
> +
> +	/* With all the info we got, fill in the values */
> +	wc_args[1] = wc_ts.tv_sec;
> +	wc_args[2] = wc_ts.tv_nsec;
> +	wc_args[0] = ++vcpu->wc_version;
> +
> +	vcpu->hv_clock.system_time = ts.tv_nsec +
> +				     (NSEC_PER_SEC * (u64)ts.tv_sec);
> +	vcpu->hv_clock.version++;
> +
> +	/* And finally, let the guest see them */
> +	kvm_write_guest(v->kvm, vcpu->shared_info + WC_OFFSET, wc_args,
> +			sizeof(wc_args));
> +	kvm_write_guest(v->kvm, vcpu->clock_addr, &vcpu->hv_clock,
> +			sizeof(vcpu->hv_clock));
> +}
> +
>   

Why write things twice?  This is per-vcpu, and the guest cannot see 
anything between the two writes.

Also, kvm_write_guest() needs mmap_sem for read.  Suggest you grab the 
page using gfn_to_page() and kmap_atomic() it instead.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.

Re: [kvm-devel] [PATCH 2/3] [PATCH] kvmclock - the host part.

From: Avi K. <av...@qu...> - 2008-01-12 20:55:50

Anthony Liguori wrote:
> Glauber de Oliveira Costa wrote:
>   
>> This is the host part of kvm clocksource implementation. As it does
>> not include clockevents, it is a fairly simple implementation. We
>> only have to register a per-vcpu area, and start writting to it periodically.
>>
>> The area is binary compatible with xen, as we use the same shadow_info structure.
>>
>> diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
>> index c6f3fd8..abe412a 100644
>> --- a/include/asm-x86/kvm_para.h
>> +++ b/include/asm-x86/kvm_para.h
>> @@ -1,5 +1,6 @@
>>  #ifndef __X86_KVM_PARA_H
>>  #define __X86_KVM_PARA_H
>> +#include <xen/interface/xen.h>
>>     
>
> Can we abstract that out into a neutral header instead of including the 
> Xen headers directly in KVM.  Please rename the structure too to 
> something neutral.
>
> Including something as generic as xen/interface/xen.h when CONFIG_XEN 
> may not be set is not a good thing, I believe.
>   

Well, one of the motivations behind this is to actually supply a Xen 
clock to Xen guests.  So maybe we can call the feature a "kvm xenclock" 
instead and feel justified in including Xen headers.

However, you are probably right in that we are getting into a dependency 
and kconfig hell we do not yet deserve.  Not even mentioning that 
CONFIG_XEN is a guest thing while kvmclock is also a host thing.

So we are probably better of using a non-Xen structure that accidentally 
happens to be binary compatible with the Xen interface.  Stranger things 
have happened.

If we agree on this, please define _only_ the time-related parts so we 
have a decoupled interface.  We'll need two msrs:  one for wc and one 
for vcpu time.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.

Re: [kvm-devel] [PATCH 2/3] [PATCH] kvmclock - the host part.

From: Anthony L. <an...@co...> - 2008-01-12 22:50:52

Avi Kivity wrote:
> Anthony Liguori wrote:
>> Glauber de Oliveira Costa wrote:
>>  
>>> This is the host part of kvm clocksource implementation. As it does
>>> not include clockevents, it is a fairly simple implementation. We
>>> only have to register a per-vcpu area, and start writting to it 
>>> periodically.
>>>
>>> The area is binary compatible with xen, as we use the same 
>>> shadow_info structure.
>>>
>>> diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
>>> index c6f3fd8..abe412a 100644
>>> --- a/include/asm-x86/kvm_para.h
>>> +++ b/include/asm-x86/kvm_para.h
>>> @@ -1,5 +1,6 @@
>>>  #ifndef __X86_KVM_PARA_H
>>>  #define __X86_KVM_PARA_H
>>> +#include <xen/interface/xen.h>
>>>     
>>
>> Can we abstract that out into a neutral header instead of including 
>> the Xen headers directly in KVM.  Please rename the structure too to 
>> something neutral.
>>
>> Including something as generic as xen/interface/xen.h when CONFIG_XEN 
>> may not be set is not a good thing, I believe.
>>   
>
> Well, one of the motivations behind this is to actually supply a Xen 
> clock to Xen guests.  So maybe we can call the feature a "kvm 
> xenclock" instead and feel justified in including Xen headers.
>
> However, you are probably right in that we are getting into a 
> dependency and kconfig hell we do not yet deserve.  Not even 
> mentioning that CONFIG_XEN is a guest thing while kvmclock is also a 
> host thing.
>
> So we are probably better of using a non-Xen structure that 
> accidentally happens to be binary compatible with the Xen interface.  
> Stranger things have happened.

Yes, this is what I meant.  My concerns is dependencies and namespace 
pollution.  I think using the same interface is a very Good Thing.

Regards,

Anthony Liguori

> If we agree on this, please define _only_ the time-related parts so we 
> have a decoupled interface.  We'll need two msrs:  one for wc and one 
> for vcpu time.
>

Re: [kvm-devel] [PATCH 1/3] put kvm_para.h include outside __KERNEL__

From: Amit S. <ami...@qu...> - 2008-01-11 15:06:26

On Friday 11 January 2008 18:40:54 Glauber de Oliveira Costa wrote:
> kvm_para.h potentially contains definitions that are to be used by
> kvm-userspace, so it should not be included inside the __KERNEL__ block. To
> protect its own data structures, kvm_para.h already includes its own
> __KERNEL__ block.
>
> Signed-off-by: Glauber de Oliveira Costa <gc...@re...>

Acked-by: Amit Shah <ami...@qu...>

Re: [kvm-devel] [PATCH 0/3] KVM clock, new iteration

From: Anthony L. <an...@co...> - 2008-01-11 15:49:42

Glauber de Oliveira Costa wrote:
> Hi folks,
>
> A lot of time has been elapsed since I last sent a version of kvm clock,
> with a lot of things in the middle, including my vacations and a nasty bug
> in the clock itself ;-)
>
> Here it is a new version. I'm following avi's last suggestion of using an msr
> to make it more migration-proof. If it's not approved, we can easily switch back
> to the hypercall mechanism. That's the easy part ;-)
>
> Clock updates are done every time we're scheduled. It is _not_ enough to check
> if we're changing cpus, because the tsc will change its frequency when the cpu is
> idle (if using the mwait idle). So after a long idle period, one cpu will end up stepping
> ahead causing a backward time (remember the bug I talked about? :-) )
>
> As I fell it ready for inclusion (which obviously does not mean it is perfect, nor that
> I don't value any further comments ), I'm also posting the userspace patches in a separate
> series that will follow shortly.
>
>   

This patch series is looking good.  I like the switch to an MSR.

Regards,

Anthony Liguori