Thread: [kvm-devel] [PATCH 0/3] Kvm clocksource, new spin

Brought to you by: avik, mtosatti

kvm-devel

[kvm-devel] [PATCH 0/3] Kvm clocksource, new spin

From: Glauber de O. C. <gc...@re...> - 2007-11-08 21:38:50

Hi folks,

Here's a new spin of the clocksource implementation.
In this new version:
* followed avi's suggestion of:
  - letting the cpu itself register its memory area.
  - using a gfn instead of a phys addr as a parameter, to be sure we can 
    cover the whole memory area
  - write guest time at exits.

Also, I 'm not using an anonymous struct in the kvm_hv_clock union, so the
vcpu struct can grab just what it needs, and not the whole padding the guest needs

This is it.

Have fun

[kvm-devel] [PATCH 1/3] include files for kvmclock

From: Glauber de O. C. <gc...@re...> - 2007-11-08 21:38:53

This patch introduces the include files for kvm clock.
They'll be needed for both guest and host part.

Signed-off-by: Glauber de Oliveira Costa <gc...@re...>
---
 include/asm-x86/kvm_para.h |   25 +++++++++++++++++++++++++
 include/linux/kvm.h        |    1 +
 include/linux/kvm_para.h   |    2 ++
 3 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index c6f3fd8..0f6b813 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -10,15 +10,40 @@
  * paravirtualization, the appropriate feature bit should be checked.
  */
 #define KVM_CPUID_FEATURES	0x40000001
+#define KVM_FEATURE_CLOCKSOURCE 0
 
 #ifdef __KERNEL__
 #include <asm/processor.h>
+extern void kvmclock_init(void);
+
+/*
+ * Guest has page alignment and padding requirements. At the host, it will
+ * only lead to wasted space at the vcpu struct. For this reason, the struct
+ * is not anonymous
+ */
+union kvm_hv_clock {
+	struct kvm_hv_clock_s {
+		u64 tsc_mult;
+		u64 now_ns;
+		/* That's the wall clock, not the water closet */
+		u64 wc_sec;
+		u64 last_tsc;
+		/* At first, we could use the tsc value as a marker, but Jeremy
+		 * well noted that it will cause us locking problems in 32-bit
+		 * sys, so we have a special version field */
+		u32 version;
+	} fields;
+	char page_align[PAGE_SIZE];
+};
+
 
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
  * trap that we will then rewrite to the appropriate instruction.
  */
 #define KVM_HYPERCALL ".byte 0x0f,0x01,0xc1"
 
+#define KVM_HCALL_REGISTER_CLOCK	1
+
 /* For KVM hypercalls, a three-byte sequence of either the vmrun or the vmmrun
  * instruction.  The hypervisor may replace it with something else but only the
  * instructions are guaranteed to be supported.
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 71d33d6..9862241 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -359,6 +359,7 @@ struct kvm_signal_mask {
 #define KVM_CAP_MMU_SHADOW_CACHE_CONTROL 2
 #define KVM_CAP_USER_MEMORY 3
 #define KVM_CAP_SET_TSS_ADDR 4
+#define KVM_CAP_CLOCKSOURCE	  5
 
 /*
  * ioctls for VM fds
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index e4db25f..094efc7 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -11,6 +11,8 @@
 
 /* Return values for hypercalls */
 #define KVM_ENOSYS		1000
+#define KVM_EINVAL		1019
+#define KVM_ENODEV		1022
 
 #ifdef __KERNEL__
 /*
-- 
1.5.0.6

[kvm-devel] [PATCH 2/3] kvmclock - the host part.

From: Glauber de O. C. <gc...@re...> - 2007-11-08 21:39:00

This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writting to it periodically.

Signed-off-by: Glauber de Oliveira Costa <gc...@re...>
---
 drivers/kvm/kvm_main.c |    1 +
 drivers/kvm/x86.c      |   32 ++++++++++++++++++++++++++++++++
 drivers/kvm/x86.h      |    4 ++++
 3 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index d095002..c2c79b8 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1243,6 +1243,7 @@ static long kvm_dev_ioctl(struct file *filp,
 		case KVM_CAP_MMU_SHADOW_CACHE_CONTROL:
 		case KVM_CAP_USER_MEMORY:
 		case KVM_CAP_SET_TSS_ADDR:
+		case KVM_CAP_CLOCKSOURCE:
 			r = 1;
 			break;
 		default:
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index e905d46..ef31fed 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -19,6 +19,7 @@
 #include "segment_descriptor.h"
 #include "irq.h"
 
+#include <linux/clocksource.h>
 #include <linux/kvm.h>
 #include <linux/fs.h>
 #include <linux/vmalloc.h>
@@ -1628,6 +1629,28 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_halt);
 
+static void kvm_write_guest_time(struct kvm_vcpu *vcpu)
+{
+	struct timespec ts;
+	int r;
+
+	if (!vcpu->clock_gpa)
+		return;
+
+	/* Updates version to the next odd number, indicating we're writing */
+	vcpu->hv_clock.version++;
+	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, &vcpu->hv_clock, PAGE_SIZE);
+
+	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
+			  &vcpu->hv_clock.last_tsc);
+
+	ktime_get_ts(&ts);
+	vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC * (u64)ts.tv_sec);
+	vcpu->hv_clock.wc_sec = get_seconds();
+	vcpu->hv_clock.version++;
+	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, &vcpu->hv_clock, PAGE_SIZE);
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -1648,7 +1671,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		a3 &= 0xFFFFFFFF;
 	}
 
+	ret = 0;
 	switch (nr) {
+	case  KVM_HCALL_REGISTER_CLOCK:
+
+		vcpu->clock_gpa = a0 << PAGE_SHIFT;
+		vcpu->hv_clock.tsc_mult = clocksource_khz2mult(tsc_khz, 22);
+
+		break;
+
 	default:
 		ret = -KVM_ENOSYS;
 		break;
@@ -1924,6 +1955,7 @@ out:
 		goto preempted;
 	}
 
+	kvm_write_guest_time(vcpu);
 	post_kvm_run_save(vcpu, kvm_run);
 
 	return r;
diff --git a/drivers/kvm/x86.h b/drivers/kvm/x86.h
index 663b822..fd77b66 100644
--- a/drivers/kvm/x86.h
+++ b/drivers/kvm/x86.h
@@ -83,6 +83,10 @@ struct kvm_vcpu {
 	/* emulate context */
 
 	struct x86_emulate_ctxt emulate_ctxt;
+
+	struct kvm_hv_clock_s hv_clock;
+	gpa_t clock_gpa; /* guest frame number, physical addr */
+
 };
 
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code);
-- 
1.5.0.6

[kvm-devel] [PATCH 3/3] kvmclock implementation, the guest part.

From: Glauber de O. C. <gc...@re...> - 2007-11-08 21:39:00

This is the guest part of kvm clock implementation
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthy to me to keep
adjusting them.

We do use it, however, for fine-grained adjustment.

Other than that, time comes from the host.

Signed-off-by: Glauber de Oliveira Costa <gc...@re...>
---
 arch/x86/Kconfig.i386       |   10 +++
 arch/x86/kernel/Makefile_32 |    1 +
 arch/x86/kernel/kvmclock.c  |  171 +++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/setup_32.c  |    5 +
 4 files changed, 187 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/kvmclock.c

diff --git a/arch/x86/Kconfig.i386 b/arch/x86/Kconfig.i386
index 7331efe..5fe4025 100644
--- a/arch/x86/Kconfig.i386
+++ b/arch/x86/Kconfig.i386
@@ -257,6 +257,16 @@ config VMI
 	  at the moment), by linking the kernel to a GPL-ed ROM module
 	  provided by the hypervisor.
 
+config KVM_CLOCK
+	bool "KVM paravirtualized clock"
+	select PARAVIRT
+	help
+	  Turning on this option will allow you to run a paravirtualized clock
+	  when running over the KVM hypervisor. Instead of relying on a PIT
+	  (or probably other) emulation by the underlying device model, the host
+	  provides the guest with timing infrastructure, as time of day, and
+	  timer expiration.
+
 source "arch/x86/lguest/Kconfig"
 
 endif
diff --git a/arch/x86/kernel/Makefile_32 b/arch/x86/kernel/Makefile_32
index b9d6798..df76d8c 100644
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -43,6 +43,7 @@ obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_MGEODE_LX)		+= geode_32.o mfgpt_32.o
 
 obj-$(CONFIG_VMI)		+= vmi_32.o vmiclock_32.o
+obj-$(CONFIG_KVM_CLOCK)		+= kvmclock.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt_32.o
 obj-y				+= pcspeaker.o
 
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
new file mode 100644
index 0000000..df14613
--- /dev/null
+++ b/arch/x86/kernel/kvmclock.c
@@ -0,0 +1,171 @@
+/*  KVM paravirtual clock driver. A clocksource implementation
+    Copyright (C) 2007 Glauber de Oliveira Costa, Red Hat Inc.
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program; if not, write to the Free Software
+    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+*/
+
+#include <linux/clocksource.h>
+#include <linux/clockchips.h>
+#include <linux/interrupt.h>
+#include <linux/kvm_para.h>
+#include <linux/ktime.h>
+#include <asm/arch_hooks.h>
+#include <asm/i8253.h>
+
+#include <mach_ipi.h>
+#include <irq_vectors.h>
+
+#define KVM_SCALE 22
+
+#define get_clock(cpu, field) hv_clock[cpu].fields.field
+
+static int kvmclock = 1;
+
+static int parse_no_kvmclock(char *arg)
+{
+	kvmclock = 0;
+	return 0;
+}
+early_param("no-kvmclock", parse_no_kvmclock);
+
+/* The hypervisor will put information about time periodically here */
+union kvm_hv_clock hv_clock[NR_CPUS] __attribute__((__aligned__(PAGE_SIZE)));
+
+static inline u64 kvm_get_delta(u64 last_tsc)
+{
+	int cpu = smp_processor_id();
+	u64 delta = native_read_tsc() - last_tsc;
+	return (delta * get_clock(cpu, tsc_mult)) >> KVM_SCALE;
+}
+
+/*
+ * The wallclock is the time of day when we booted. Since then, some time may
+ * have elapsed since the hypervisor wrote the data. So we try to account for
+ * that. Even if the tsc is not accurate, it gives us a more accurate timing
+ * than not adjusting at all
+ */
+unsigned long kvm_get_wallclock(void)
+{
+	u64 wc_sec, delta, last_tsc;
+	struct timespec ts;
+	int version, nsec, cpu = smp_processor_id();
+
+	do {
+		version = get_clock(cpu, version);
+		rmb();
+		last_tsc = get_clock(cpu, last_tsc);
+		rmb();
+		wc_sec = get_clock(cpu, wc_sec);
+		rmb();
+	} while ((get_clock(cpu, version) != version) && !(version & 1));
+
+	delta = kvm_get_delta(last_tsc);
+	nsec = do_div(delta, NSEC_PER_SEC);
+	set_normalized_timespec(&ts, wc_sec + delta, nsec);
+
+	/*
+	 * Of all mechanisms of time adjustment I've tested, this one
+	 * was the champion!
+	 */
+	return ts.tv_sec + 1;
+}
+
+int kvm_set_wallclock(unsigned long now)
+{
+	return 0;
+}
+
+/*
+ * This is our read_clock function. The host puts an tsc timestamp each time
+ * it updates a new time, and then we can use it to derive a slightly more
+ * precise notion of elapsed time, converted to nanoseconds.
+ */
+static cycle_t kvm_clock_read(void)
+{
+
+	u64 last_tsc, now;
+	u32 version;
+	int cpu = smp_processor_id();
+
+	do {
+		version = get_clock(cpu, version);
+		rmb();
+		last_tsc = get_clock(cpu, last_tsc);
+		rmb();
+		now = get_clock(cpu, now_ns);
+		rmb();
+	} while ((get_clock(cpu, version) != version) && !(version & 1));
+
+	return now + kvm_get_delta(last_tsc);
+}
+
+static struct clocksource kvm_clock = {
+	.name = "kvm-clock",
+	.read = kvm_clock_read,
+	.rating = 400,
+	.mask = CLOCKSOURCE_MASK(64),
+	.mult = 1 << KVM_SCALE,
+	.shift = KVM_SCALE,
+	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+unsigned long long kvm_sched_clock(void)
+{
+	return kvm_clock_read();
+}
+
+static int kvm_register_clock(unsigned int cpu)
+{
+	unsigned long kvm_clock_info = __pa((unsigned long)&hv_clock[cpu]);
+	kvm_clock_info >>= PAGE_SHIFT; /* page frame number */
+	return kvm_hypercall1(KVM_HCALL_REGISTER_CLOCK, kvm_clock_info);
+}
+
+void kvm_setup_secondary_clock(void)
+{
+	/*
+	 * Now that the first cpu already had this clocksource initialized,
+	 * we shouldn't fail.
+	 */
+	int cpu = smp_processor_id();
+	WARN_ON(kvm_register_clock(cpu));
+	/* ok, done with our trickery, call native */
+	setup_secondary_APIC_clock();
+}
+
+void __init kvmclock_init(void)
+{
+	int cpu = smp_processor_id();
+	int r;
+
+	/*
+	 * If we can't use the paravirt clock, just go with
+	 * the usual timekeeping
+	 */
+	if (!kvm_para_available())
+		return;
+
+	r = kvm_register_clock(cpu);
+	if (r)
+		return;
+
+	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
+		pv_time_ops.get_wallclock = kvm_get_wallclock;
+		pv_time_ops.set_wallclock = kvm_set_wallclock;
+		pv_time_ops.sched_clock = kvm_sched_clock;
+		pv_apic_ops.setup_secondary_clock = kvm_setup_secondary_clock;
+		clocksource_register(&kvm_clock);
+	}
+}
diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c
index e1e18c3..9f13ff6 100644
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -44,6 +44,7 @@
 #include <linux/crash_dump.h>
 #include <linux/dmi.h>
 #include <linux/pfn.h>
+#include <linux/kvm_para.h>
 
 #include <video/edid.h>
 
@@ -614,6 +615,10 @@ void __init setup_arch(char **cmdline_p)
 
 	max_low_pfn = setup_memory();
 
+#ifdef CONFIG_KVM_CLOCK
+	kvmclock_init();
+#endif
+
 #ifdef CONFIG_VMI
 	/*
 	 * Must be after max_low_pfn is determined, and before kernel
-- 
1.5.0.6

Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

From: Dong, E. <edd...@in...> - 2007-11-13 05:01:06

> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
> +	struct timespec ts;
> +	int r;
> +
> +	if (!vcpu->clock_gpa)
> +		return;
> +
> +	/* Updates version to the next odd number, indicating
> we're writing */
> +	vcpu->hv_clock.version++;
> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
> &vcpu->hv_clock, PAGE_SIZE);
> +
> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
> +			  &vcpu->hv_clock.last_tsc);
> +
> +	ktime_get_ts(&ts);

Do we need to disable preemption here?

> +	vcpu->hv_clock.now_ns =3D ts.tv_nsec + (NSEC_PER_SEC *
> (u64)ts.tv_sec); +	vcpu->hv_clock.wc_sec =3D get_seconds();
> +	vcpu->hv_clock.version++;
> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
> &vcpu->hv_clock, PAGE_SIZE);
> +}
> +
thx,eddie

Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

From: Glauber de O. C. <gc...@re...> - 2007-11-13 11:52:56

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dong, Eddie escreveu:
>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
>> +	struct timespec ts;
>> +	int r;
>> +
>> +	if (!vcpu->clock_gpa)
>> +		return;
>> +
>> +	/* Updates version to the next odd number, indicating
>> we're writing */
>> +	vcpu->hv_clock.version++;
>> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
>> &vcpu->hv_clock, PAGE_SIZE);
>> +
>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>> +			  &vcpu->hv_clock.last_tsc);
>> +
>> +	ktime_get_ts(&ts);
> 
> Do we need to disable preemption here?
After thinking for a little while, you are theoretically right.
In the current state, we could even be preempted between all operations
;-) Maybe after avi's suggestion of moving the call to it it will end up
in a preempt safe region, but anyway, it's safer to add the preempt
markers here.
I'll put it in next version, thanks

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHOZBrjYI8LaFUWXMRAo81AKCfbkzhLl7F6BUjzUHVyErCFeHxFACg1teB
eqsOnJiAqB3JiYf+2YdMZ4o=
=ENKU
-----END PGP SIGNATURE-----

Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

From: Izik E. <iz...@qu...> - 2007-11-13 12:09:59

Glauber de Oliveira Costa wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Dong, Eddie escreveu:
>   
>>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
>>> +	struct timespec ts;
>>> +	int r;
>>> +
>>> +	if (!vcpu->clock_gpa)
>>> +		return;
>>> +
>>> +	/* Updates version to the next odd number, indicating
>>> we're writing */
>>> +	vcpu->hv_clock.version++;
>>> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
>>> &vcpu->hv_clock, PAGE_SIZE);
>>> +
>>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>>> +			  &vcpu->hv_clock.last_tsc);
>>> +
>>> +	ktime_get_ts(&ts);
>>>       
>> Do we need to disable preemption here?
>>     
> After thinking for a little while, you are theoretically right.
> In the current state, we could even be preempted between all operations
> ;-) Maybe after avi's suggestion of moving the call to it it will end up
> in a preempt safe region, but anyway, it's safer to add the preempt
> markers here.
> I'll put it in next version, thanks
>   
note that you cant call to kvm_write_guest with preemption disabled 
(witch you do few lines below)

Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

From: Avi K. <av...@qu...> - 2007-11-13 14:48:52

Glauber de Oliveira Costa wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Dong, Eddie escreveu:
>   
>>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
>>> +	struct timespec ts;
>>> +	int r;
>>> +
>>> +	if (!vcpu->clock_gpa)
>>> +		return;
>>> +
>>> +	/* Updates version to the next odd number, indicating
>>> we're writing */
>>> +	vcpu->hv_clock.version++;
>>> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
>>> &vcpu->hv_clock, PAGE_SIZE);
>>> +
>>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>>> +			  &vcpu->hv_clock.last_tsc);
>>> +
>>> +	ktime_get_ts(&ts);
>>>       
>> Do we need to disable preemption here?
>>     
> After thinking for a little while, you are theoretically right.
> In the current state, we could even be preempted between all operations
> ;-) Maybe after avi's suggestion of moving the call to it it will end up
> in a preempt safe region, but anyway, it's safer to add the preempt
> markers here.
> I'll put it in next version, thanks
>
>   

Well, you can't kvm_write_guest() with preemption enabled.

preempt notifiers to the rescue!  We have a callout during preemption, 
so you can just zero out a flag there, and when we're scheduled again 
retry the whole thing.

-- 
error compiling committee.c: too many arguments to function

Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

From: Dong, E. <edd...@in...> - 2007-11-13 15:23:57

Avi Kivity wrote:
> Glauber de Oliveira Costa wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>=20
>> Dong, Eddie escreveu:
>>=20
>>>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
>>>> +	struct timespec ts; +	int r;
>>>> +
>>>> +	if (!vcpu->clock_gpa)
>>>> +		return;
>>>> +
>>>> +	/* Updates version to the next odd number, indicating we're
>>>> writing */ +	vcpu->hv_clock.version++;
>>>> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
>>>> &vcpu->hv_clock, PAGE_SIZE);
>>>> +
>>>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>>>> +			  &vcpu->hv_clock.last_tsc);
>>>> +
>>>> +	ktime_get_ts(&ts);
>>>>=20
>>> Do we need to disable preemption here?
>>>=20
>> After thinking for a little while, you are theoretically right.
>> In the current state, we could even be preempted between all
>> operations ;-) Maybe after avi's suggestion of moving the call to it
>> it will end up in a preempt safe region, but anyway, it's safer to
>> add the preempt markers here. I'll put it in next version, thanks
>>=20
>>=20
>=20
> Well, you can't kvm_write_guest() with preemption enabled.
>=20
> preempt notifiers to the rescue!  We have a callout during preemption,
> so you can just zero out a flag there, and when we're scheduled again
> retry the whole thing.=20
>=20

The preemption issue is within following code which need to be done in a
short enough period.

+	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
+			  &vcpu->hv_clock.last_tsc);
+
+	ktime_get_ts(&ts);
+	vcpu->hv_clock.now_ns =3D ts.tv_nsec + (NSEC_PER_SEC *
(u64)ts.tv_sec);
+	vcpu->hv_clock.wc_sec =3D get_seconds();

I am even thinking we have to disable interrupt between these lines,
otherwise
guest wall clock may see backward time source when calculating the
delta TSC since last vcpu->hv_clock.now_ns update.

Also why we use both ktime_get_ts(&ts) & get_seconds() together? they
are not atomic
and may cause issues?

thx,eddie

Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

From: Avi K. <av...@qu...> - 2007-11-13 16:14:01

Dong, Eddie wrote:
>>>
>>> After thinking for a little while, you are theoretically right.
>>> In the current state, we could even be preempted between all
>>> operations ;-) Maybe after avi's suggestion of moving the call to it
>>> it will end up in a preempt safe region, but anyway, it's safer to
>>> add the preempt markers here. I'll put it in next version, thanks
>>>
>>>
>>>       
>> Well, you can't kvm_write_guest() with preemption enabled.
>>
>> preempt notifiers to the rescue!  We have a callout during preemption,
>> so you can just zero out a flag there, and when we're scheduled again
>> retry the whole thing. 
>>
>>     
>
> The preemption issue is within following code which need to be done in a
> short enough period.
>
> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
> +			  &vcpu->hv_clock.last_tsc);
> +
> +	ktime_get_ts(&ts);
> +	vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC *
> (u64)ts.tv_sec);
> +	vcpu->hv_clock.wc_sec = get_seconds();
>
> I am even thinking we have to disable interrupt between these lines,
> otherwise
> guest wall clock may see backward time source when calculating the
> delta TSC since last vcpu->hv_clock.now_ns update.
>   

That's true.  While we do need to handle vcpu migration and 
descheduling, the code sequence you note needs to be as atomic as possible.


-- 
error compiling committee.c: too many arguments to function

Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

From: Dong, E. <edd...@in...> - 2007-11-14 00:42:12

>>=20
>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>> +			  &vcpu->hv_clock.last_tsc);
>> +
>> +	ktime_get_ts(&ts);
>> +	vcpu->hv_clock.now_ns =3D ts.tv_nsec + (NSEC_PER_SEC *
>> (u64)ts.tv_sec); +	vcpu->hv_clock.wc_sec =3D get_seconds();
>>=20
>> I am even thinking we have to disable interrupt between these lines,
>> otherwise guest wall clock may see backward time source when
>> calculating the delta TSC since last vcpu->hv_clock.now_ns update.
>>=20
>=20
> That's true.  While we do need to handle vcpu migration and
> descheduling, the code sequence you note needs to be as atomic as
> possible.=20
>=20

Yes VCPU migration is another issue. Since the physical platform TSC is=20
totally different across migration, so we can't expose host TSC to
guest.
For KVM, it should be simple to just use guest_TSC (=3DHOST_TSC + =
offset).

thanks, eddie

Re: [kvm-devel] [PATCH 1/3] include files for kvmclock

From: Gerd H. <kr...@re...> - 2007-11-09 08:37:42

> +/*
> + * Guest has page alignment and padding requirements. At the host, it will
> + * only lead to wasted space at the vcpu struct. For this reason, the struct
> + * is not anonymous
> + */
> +union kvm_hv_clock {
> +	struct kvm_hv_clock_s {
> +		u64 tsc_mult;
> +		u64 now_ns;
> +		/* That's the wall clock, not the water closet */
> +		u64 wc_sec;
> +		u64 last_tsc;
> +		/* At first, we could use the tsc value as a marker, but Jeremy
> +		 * well noted that it will cause us locking problems in 32-bit
> +		 * sys, so we have a special version field */
> +		u32 version;
> +	} fields;
> +	char page_align[PAGE_SIZE];
> +};

What is the point in using a whole page per vcpu?  You probably don't
want struct kvm_hv_clock_s cross a page border.  Is that the only reason
or are there other constrains too?

As the kvm clock looks quite simliar to what xen does, how about making
the structs binary-compatible or simply reuse the xen version (struct
vcpu_time_info in xen/interface/xen.h)?

cheers,
  Gerd

Re: [kvm-devel] [PATCH 1/3] include files for kvmclock

From: Avi K. <av...@qu...> - 2007-11-11 09:16:50

Gerd Hoffmann wrote:
>> +/*
>> + * Guest has page alignment and padding requirements. At the host, it will
>> + * only lead to wasted space at the vcpu struct. For this reason, the struct
>> + * is not anonymous
>> + */
>> +union kvm_hv_clock {
>> +	struct kvm_hv_clock_s {
>> +		u64 tsc_mult;
>> +		u64 now_ns;
>> +		/* That's the wall clock, not the water closet */
>> +		u64 wc_sec;
>> +		u64 last_tsc;
>> +		/* At first, we could use the tsc value as a marker, but Jeremy
>> +		 * well noted that it will cause us locking problems in 32-bit
>> +		 * sys, so we have a special version field */
>> +		u32 version;
>> +	} fields;
>> +	char page_align[PAGE_SIZE];
>> +};
>>     
>
> What is the point in using a whole page per vcpu?  You probably don't
> want struct kvm_hv_clock_s cross a page border.  Is that the only reason
> or are there other constrains too?
>   

We don't even have the cross-page-boundary constraint.

The real issue is that on i386 physical addresses are 64-bit while 
hypercall arguments are 32-bit.  A page frame number is 32-bit and so 
poses no issues.

I see two workarounds for this:
- make the first hypercall argument a u64 rather than a long (using two 
registers on i386)
- use an msr for registering the clock address

The first is more general, while the second has the nice property of 
taking care of live migration automatically.

> As the kvm clock looks quite simliar to what xen does, how about making
> the structs binary-compatible or simply reuse the xen version (struct
> vcpu_time_info in xen/interface/xen.h)?
>   

If there are no technical issues, I have no objections.

-- 
error compiling committee.c: too many arguments to function

Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

From: Avi K. <av...@qu...> - 2007-11-11 10:18:36

Glauber de Oliveira Costa wrote:
> This is the host part of kvm clocksource implementation. As it does
> not include clockevents, it is a fairly simple implementation. We
> only have to register a per-vcpu area, and start writting to it periodically.
>
>   

Missing live migration support  (a way for userspace to read and write 
the guest clock address).  Should probably be in a separate patch.

> @@ -1924,6 +1955,7 @@ out:
>  		goto preempted;
>  	}
>  
> +	kvm_write_guest_time(vcpu);
>  	post_kvm_run_save(vcpu, kvm_run);
>   

Why here?  Seems like we're leaving the guest for a while at this place.

Suggest putting it on top of __vcpu_run(), guarded by a flag, and 
setting the flag every time we put the vcpu.

-- 
error compiling committee.c: too many arguments to function

Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

From: Glauber de O. C. <gc...@re...> - 2007-11-13 11:27:31

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Avi Kivity escreveu:
> Glauber de Oliveira Costa wrote:
>> This is the host part of kvm clocksource implementation. As it does
>> not include clockevents, it is a fairly simple implementation. We
>> only have to register a per-vcpu area, and start writting to it
>> periodically.
>>
>>   
> 
> Missing live migration support  (a way for userspace to read and write
> the guest clock address).  Should probably be in a separate patch.

I think it's a matter of issuing a hypercall for reading the clock
address. It's fair simple, and can be done in a later version of this patch.
As for writting, the register hypercall itself can be used. It has no
special side-effects we should care about.

>> @@ -1924,6 +1955,7 @@ out:
>>          goto preempted;
>>      }
>>  
>> +    kvm_write_guest_time(vcpu);
>>      post_kvm_run_save(vcpu, kvm_run);
>>   
> 
> Why here?  Seems like we're leaving the guest for a while at this place.
> 
> Suggest putting it on top of __vcpu_run(), guarded by a flag, and
> setting the flag every time we put the vcpu.

No special preference. It just sounded exity enough to me. I can move to
where you suggest.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHOYpojYI8LaFUWXMRApf8AJ4jQ/ZTBlub1IwFkJrYZyart+f7bwCfT+9m
l1Rblsmw96ZatCf60g2dNYY=
=DBpn
-----END PGP SIGNATURE-----

Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

From: Avi K. <av...@qu...> - 2007-11-13 14:45:46

Glauber de Oliveira Costa wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Avi Kivity escreveu:
>   
>> Glauber de Oliveira Costa wrote:
>>     
>>> This is the host part of kvm clocksource implementation. As it does
>>> not include clockevents, it is a fairly simple implementation. We
>>> only have to register a per-vcpu area, and start writting to it
>>> periodically.
>>>
>>>   
>>>       
>> Missing live migration support  (a way for userspace to read and write
>> the guest clock address).  Should probably be in a separate patch.
>>     
>
> I think it's a matter of issuing a hypercall for reading the clock
> address. It's fair simple, and can be done in a later version of this patch.
> As for writting, the register hypercall itself can be used. It has no
> special side-effects we should care about.
>   

kvm live migration is done (at least thus far) without guest 
involvement.  So you need the host to be able to transfer this state.


>   
>>> @@ -1924,6 +1955,7 @@ out:
>>>          goto preempted;
>>>      }
>>>  
>>> +    kvm_write_guest_time(vcpu);
>>>      post_kvm_run_save(vcpu, kvm_run);
>>>   
>>>       
>> Why here?  Seems like we're leaving the guest for a while at this place.
>>
>> Suggest putting it on top of __vcpu_run(), guarded by a flag, and
>> setting the flag every time we put the vcpu.
>>     
>
> No special preference. It just sounded exity enough to me. I can move to
> where you suggest.
>
>   

It's more than a place, it's a set of rules:

- if the vcpu is migrated, we need a new timebase
- ditto if we're descheduled (well that's the same thing)
- if "some time passes"


-- 
error compiling committee.c: too many arguments to function