From: Glauber de O. C. <gc...@re...> - 2007-11-08 21:38:50
|
Hi folks, Here's a new spin of the clocksource implementation. In this new version: * followed avi's suggestion of: - letting the cpu itself register its memory area. - using a gfn instead of a phys addr as a parameter, to be sure we can cover the whole memory area - write guest time at exits. Also, I 'm not using an anonymous struct in the kvm_hv_clock union, so the vcpu struct can grab just what it needs, and not the whole padding the guest needs This is it. Have fun |
From: Glauber de O. C. <gc...@re...> - 2007-11-08 21:38:53
|
This patch introduces the include files for kvm clock. They'll be needed for both guest and host part. Signed-off-by: Glauber de Oliveira Costa <gc...@re...> --- include/asm-x86/kvm_para.h | 25 +++++++++++++++++++++++++ include/linux/kvm.h | 1 + include/linux/kvm_para.h | 2 ++ 3 files changed, 28 insertions(+), 0 deletions(-) diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h index c6f3fd8..0f6b813 100644 --- a/include/asm-x86/kvm_para.h +++ b/include/asm-x86/kvm_para.h @@ -10,15 +10,40 @@ * paravirtualization, the appropriate feature bit should be checked. */ #define KVM_CPUID_FEATURES 0x40000001 +#define KVM_FEATURE_CLOCKSOURCE 0 #ifdef __KERNEL__ #include <asm/processor.h> +extern void kvmclock_init(void); + +/* + * Guest has page alignment and padding requirements. At the host, it will + * only lead to wasted space at the vcpu struct. For this reason, the struct + * is not anonymous + */ +union kvm_hv_clock { + struct kvm_hv_clock_s { + u64 tsc_mult; + u64 now_ns; + /* That's the wall clock, not the water closet */ + u64 wc_sec; + u64 last_tsc; + /* At first, we could use the tsc value as a marker, but Jeremy + * well noted that it will cause us locking problems in 32-bit + * sys, so we have a special version field */ + u32 version; + } fields; + char page_align[PAGE_SIZE]; +}; + /* This instruction is vmcall. On non-VT architectures, it will generate a * trap that we will then rewrite to the appropriate instruction. */ #define KVM_HYPERCALL ".byte 0x0f,0x01,0xc1" +#define KVM_HCALL_REGISTER_CLOCK 1 + /* For KVM hypercalls, a three-byte sequence of either the vmrun or the vmmrun * instruction. The hypervisor may replace it with something else but only the * instructions are guaranteed to be supported. diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 71d33d6..9862241 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -359,6 +359,7 @@ struct kvm_signal_mask { #define KVM_CAP_MMU_SHADOW_CACHE_CONTROL 2 #define KVM_CAP_USER_MEMORY 3 #define KVM_CAP_SET_TSS_ADDR 4 +#define KVM_CAP_CLOCKSOURCE 5 /* * ioctls for VM fds diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h index e4db25f..094efc7 100644 --- a/include/linux/kvm_para.h +++ b/include/linux/kvm_para.h @@ -11,6 +11,8 @@ /* Return values for hypercalls */ #define KVM_ENOSYS 1000 +#define KVM_EINVAL 1019 +#define KVM_ENODEV 1022 #ifdef __KERNEL__ /* -- 1.5.0.6 |
From: Glauber de O. C. <gc...@re...> - 2007-11-08 21:39:00
|
This is the host part of kvm clocksource implementation. As it does not include clockevents, it is a fairly simple implementation. We only have to register a per-vcpu area, and start writting to it periodically. Signed-off-by: Glauber de Oliveira Costa <gc...@re...> --- drivers/kvm/kvm_main.c | 1 + drivers/kvm/x86.c | 32 ++++++++++++++++++++++++++++++++ drivers/kvm/x86.h | 4 ++++ 3 files changed, 37 insertions(+), 0 deletions(-) diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c index d095002..c2c79b8 100644 --- a/drivers/kvm/kvm_main.c +++ b/drivers/kvm/kvm_main.c @@ -1243,6 +1243,7 @@ static long kvm_dev_ioctl(struct file *filp, case KVM_CAP_MMU_SHADOW_CACHE_CONTROL: case KVM_CAP_USER_MEMORY: case KVM_CAP_SET_TSS_ADDR: + case KVM_CAP_CLOCKSOURCE: r = 1; break; default: diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c index e905d46..ef31fed 100644 --- a/drivers/kvm/x86.c +++ b/drivers/kvm/x86.c @@ -19,6 +19,7 @@ #include "segment_descriptor.h" #include "irq.h" +#include <linux/clocksource.h> #include <linux/kvm.h> #include <linux/fs.h> #include <linux/vmalloc.h> @@ -1628,6 +1629,28 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_emulate_halt); +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{ + struct timespec ts; + int r; + + if (!vcpu->clock_gpa) + return; + + /* Updates version to the next odd number, indicating we're writing */ + vcpu->hv_clock.version++; + kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, &vcpu->hv_clock, PAGE_SIZE); + + kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, + &vcpu->hv_clock.last_tsc); + + ktime_get_ts(&ts); + vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC * (u64)ts.tv_sec); + vcpu->hv_clock.wc_sec = get_seconds(); + vcpu->hv_clock.version++; + kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, &vcpu->hv_clock, PAGE_SIZE); +} + int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) { unsigned long nr, a0, a1, a2, a3, ret; @@ -1648,7 +1671,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) a3 &= 0xFFFFFFFF; } + ret = 0; switch (nr) { + case KVM_HCALL_REGISTER_CLOCK: + + vcpu->clock_gpa = a0 << PAGE_SHIFT; + vcpu->hv_clock.tsc_mult = clocksource_khz2mult(tsc_khz, 22); + + break; + default: ret = -KVM_ENOSYS; break; @@ -1924,6 +1955,7 @@ out: goto preempted; } + kvm_write_guest_time(vcpu); post_kvm_run_save(vcpu, kvm_run); return r; diff --git a/drivers/kvm/x86.h b/drivers/kvm/x86.h index 663b822..fd77b66 100644 --- a/drivers/kvm/x86.h +++ b/drivers/kvm/x86.h @@ -83,6 +83,10 @@ struct kvm_vcpu { /* emulate context */ struct x86_emulate_ctxt emulate_ctxt; + + struct kvm_hv_clock_s hv_clock; + gpa_t clock_gpa; /* guest frame number, physical addr */ + }; int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code); -- 1.5.0.6 |
From: Glauber de O. C. <gc...@re...> - 2007-11-08 21:39:00
|
This is the guest part of kvm clock implementation It does not do tsc-only timing, as tsc can have deltas between cpus, and it did not seem worthy to me to keep adjusting them. We do use it, however, for fine-grained adjustment. Other than that, time comes from the host. Signed-off-by: Glauber de Oliveira Costa <gc...@re...> --- arch/x86/Kconfig.i386 | 10 +++ arch/x86/kernel/Makefile_32 | 1 + arch/x86/kernel/kvmclock.c | 171 +++++++++++++++++++++++++++++++++++++++++++ arch/x86/kernel/setup_32.c | 5 + 4 files changed, 187 insertions(+), 0 deletions(-) create mode 100644 arch/x86/kernel/kvmclock.c diff --git a/arch/x86/Kconfig.i386 b/arch/x86/Kconfig.i386 index 7331efe..5fe4025 100644 --- a/arch/x86/Kconfig.i386 +++ b/arch/x86/Kconfig.i386 @@ -257,6 +257,16 @@ config VMI at the moment), by linking the kernel to a GPL-ed ROM module provided by the hypervisor. +config KVM_CLOCK + bool "KVM paravirtualized clock" + select PARAVIRT + help + Turning on this option will allow you to run a paravirtualized clock + when running over the KVM hypervisor. Instead of relying on a PIT + (or probably other) emulation by the underlying device model, the host + provides the guest with timing infrastructure, as time of day, and + timer expiration. + source "arch/x86/lguest/Kconfig" endif diff --git a/arch/x86/kernel/Makefile_32 b/arch/x86/kernel/Makefile_32 index b9d6798..df76d8c 100644 --- a/arch/x86/kernel/Makefile_32 +++ b/arch/x86/kernel/Makefile_32 @@ -43,6 +43,7 @@ obj-$(CONFIG_K8_NB) += k8.o obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o obj-$(CONFIG_VMI) += vmi_32.o vmiclock_32.o +obj-$(CONFIG_KVM_CLOCK) += kvmclock.o obj-$(CONFIG_PARAVIRT) += paravirt_32.o obj-y += pcspeaker.o diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c new file mode 100644 index 0000000..df14613 --- /dev/null +++ b/arch/x86/kernel/kvmclock.c @@ -0,0 +1,171 @@ +/* KVM paravirtual clock driver. A clocksource implementation + Copyright (C) 2007 Glauber de Oliveira Costa, Red Hat Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +*/ + +#include <linux/clocksource.h> +#include <linux/clockchips.h> +#include <linux/interrupt.h> +#include <linux/kvm_para.h> +#include <linux/ktime.h> +#include <asm/arch_hooks.h> +#include <asm/i8253.h> + +#include <mach_ipi.h> +#include <irq_vectors.h> + +#define KVM_SCALE 22 + +#define get_clock(cpu, field) hv_clock[cpu].fields.field + +static int kvmclock = 1; + +static int parse_no_kvmclock(char *arg) +{ + kvmclock = 0; + return 0; +} +early_param("no-kvmclock", parse_no_kvmclock); + +/* The hypervisor will put information about time periodically here */ +union kvm_hv_clock hv_clock[NR_CPUS] __attribute__((__aligned__(PAGE_SIZE))); + +static inline u64 kvm_get_delta(u64 last_tsc) +{ + int cpu = smp_processor_id(); + u64 delta = native_read_tsc() - last_tsc; + return (delta * get_clock(cpu, tsc_mult)) >> KVM_SCALE; +} + +/* + * The wallclock is the time of day when we booted. Since then, some time may + * have elapsed since the hypervisor wrote the data. So we try to account for + * that. Even if the tsc is not accurate, it gives us a more accurate timing + * than not adjusting at all + */ +unsigned long kvm_get_wallclock(void) +{ + u64 wc_sec, delta, last_tsc; + struct timespec ts; + int version, nsec, cpu = smp_processor_id(); + + do { + version = get_clock(cpu, version); + rmb(); + last_tsc = get_clock(cpu, last_tsc); + rmb(); + wc_sec = get_clock(cpu, wc_sec); + rmb(); + } while ((get_clock(cpu, version) != version) && !(version & 1)); + + delta = kvm_get_delta(last_tsc); + nsec = do_div(delta, NSEC_PER_SEC); + set_normalized_timespec(&ts, wc_sec + delta, nsec); + + /* + * Of all mechanisms of time adjustment I've tested, this one + * was the champion! + */ + return ts.tv_sec + 1; +} + +int kvm_set_wallclock(unsigned long now) +{ + return 0; +} + +/* + * This is our read_clock function. The host puts an tsc timestamp each time + * it updates a new time, and then we can use it to derive a slightly more + * precise notion of elapsed time, converted to nanoseconds. + */ +static cycle_t kvm_clock_read(void) +{ + + u64 last_tsc, now; + u32 version; + int cpu = smp_processor_id(); + + do { + version = get_clock(cpu, version); + rmb(); + last_tsc = get_clock(cpu, last_tsc); + rmb(); + now = get_clock(cpu, now_ns); + rmb(); + } while ((get_clock(cpu, version) != version) && !(version & 1)); + + return now + kvm_get_delta(last_tsc); +} + +static struct clocksource kvm_clock = { + .name = "kvm-clock", + .read = kvm_clock_read, + .rating = 400, + .mask = CLOCKSOURCE_MASK(64), + .mult = 1 << KVM_SCALE, + .shift = KVM_SCALE, + .flags = CLOCK_SOURCE_IS_CONTINUOUS, +}; + +unsigned long long kvm_sched_clock(void) +{ + return kvm_clock_read(); +} + +static int kvm_register_clock(unsigned int cpu) +{ + unsigned long kvm_clock_info = __pa((unsigned long)&hv_clock[cpu]); + kvm_clock_info >>= PAGE_SHIFT; /* page frame number */ + return kvm_hypercall1(KVM_HCALL_REGISTER_CLOCK, kvm_clock_info); +} + +void kvm_setup_secondary_clock(void) +{ + /* + * Now that the first cpu already had this clocksource initialized, + * we shouldn't fail. + */ + int cpu = smp_processor_id(); + WARN_ON(kvm_register_clock(cpu)); + /* ok, done with our trickery, call native */ + setup_secondary_APIC_clock(); +} + +void __init kvmclock_init(void) +{ + int cpu = smp_processor_id(); + int r; + + /* + * If we can't use the paravirt clock, just go with + * the usual timekeeping + */ + if (!kvm_para_available()) + return; + + r = kvm_register_clock(cpu); + if (r) + return; + + if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) { + pv_time_ops.get_wallclock = kvm_get_wallclock; + pv_time_ops.set_wallclock = kvm_set_wallclock; + pv_time_ops.sched_clock = kvm_sched_clock; + pv_apic_ops.setup_secondary_clock = kvm_setup_secondary_clock; + clocksource_register(&kvm_clock); + } +} diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c index e1e18c3..9f13ff6 100644 --- a/arch/x86/kernel/setup_32.c +++ b/arch/x86/kernel/setup_32.c @@ -44,6 +44,7 @@ #include <linux/crash_dump.h> #include <linux/dmi.h> #include <linux/pfn.h> +#include <linux/kvm_para.h> #include <video/edid.h> @@ -614,6 +615,10 @@ void __init setup_arch(char **cmdline_p) max_low_pfn = setup_memory(); +#ifdef CONFIG_KVM_CLOCK + kvmclock_init(); +#endif + #ifdef CONFIG_VMI /* * Must be after max_low_pfn is determined, and before kernel -- 1.5.0.6 |
From: Dong, E. <edd...@in...> - 2007-11-13 05:01:06
|
> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{ > + struct timespec ts; > + int r; > + > + if (!vcpu->clock_gpa) > + return; > + > + /* Updates version to the next odd number, indicating > we're writing */ > + vcpu->hv_clock.version++; > + kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, > &vcpu->hv_clock, PAGE_SIZE); > + > + kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, > + &vcpu->hv_clock.last_tsc); > + > + ktime_get_ts(&ts); Do we need to disable preemption here? > + vcpu->hv_clock.now_ns =3D ts.tv_nsec + (NSEC_PER_SEC * > (u64)ts.tv_sec); + vcpu->hv_clock.wc_sec =3D get_seconds(); > + vcpu->hv_clock.version++; > + kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, > &vcpu->hv_clock, PAGE_SIZE); > +} > + thx,eddie |
From: Glauber de O. C. <gc...@re...> - 2007-11-13 11:52:56
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Dong, Eddie escreveu: >> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{ >> + struct timespec ts; >> + int r; >> + >> + if (!vcpu->clock_gpa) >> + return; >> + >> + /* Updates version to the next odd number, indicating >> we're writing */ >> + vcpu->hv_clock.version++; >> + kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, >> &vcpu->hv_clock, PAGE_SIZE); >> + >> + kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, >> + &vcpu->hv_clock.last_tsc); >> + >> + ktime_get_ts(&ts); > > Do we need to disable preemption here? After thinking for a little while, you are theoretically right. In the current state, we could even be preempted between all operations ;-) Maybe after avi's suggestion of moving the call to it it will end up in a preempt safe region, but anyway, it's safer to add the preempt markers here. I'll put it in next version, thanks -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Remi - http://enigmail.mozdev.org iD8DBQFHOZBrjYI8LaFUWXMRAo81AKCfbkzhLl7F6BUjzUHVyErCFeHxFACg1teB eqsOnJiAqB3JiYf+2YdMZ4o= =ENKU -----END PGP SIGNATURE----- |
From: Izik E. <iz...@qu...> - 2007-11-13 12:09:59
|
Glauber de Oliveira Costa wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Dong, Eddie escreveu: > >>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{ >>> + struct timespec ts; >>> + int r; >>> + >>> + if (!vcpu->clock_gpa) >>> + return; >>> + >>> + /* Updates version to the next odd number, indicating >>> we're writing */ >>> + vcpu->hv_clock.version++; >>> + kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, >>> &vcpu->hv_clock, PAGE_SIZE); >>> + >>> + kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, >>> + &vcpu->hv_clock.last_tsc); >>> + >>> + ktime_get_ts(&ts); >>> >> Do we need to disable preemption here? >> > After thinking for a little while, you are theoretically right. > In the current state, we could even be preempted between all operations > ;-) Maybe after avi's suggestion of moving the call to it it will end up > in a preempt safe region, but anyway, it's safer to add the preempt > markers here. > I'll put it in next version, thanks > note that you cant call to kvm_write_guest with preemption disabled (witch you do few lines below) |
From: Avi K. <av...@qu...> - 2007-11-13 14:48:52
|
Glauber de Oliveira Costa wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Dong, Eddie escreveu: > >>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{ >>> + struct timespec ts; >>> + int r; >>> + >>> + if (!vcpu->clock_gpa) >>> + return; >>> + >>> + /* Updates version to the next odd number, indicating >>> we're writing */ >>> + vcpu->hv_clock.version++; >>> + kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, >>> &vcpu->hv_clock, PAGE_SIZE); >>> + >>> + kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, >>> + &vcpu->hv_clock.last_tsc); >>> + >>> + ktime_get_ts(&ts); >>> >> Do we need to disable preemption here? >> > After thinking for a little while, you are theoretically right. > In the current state, we could even be preempted between all operations > ;-) Maybe after avi's suggestion of moving the call to it it will end up > in a preempt safe region, but anyway, it's safer to add the preempt > markers here. > I'll put it in next version, thanks > > Well, you can't kvm_write_guest() with preemption enabled. preempt notifiers to the rescue! We have a callout during preemption, so you can just zero out a flag there, and when we're scheduled again retry the whole thing. -- error compiling committee.c: too many arguments to function |
From: Dong, E. <edd...@in...> - 2007-11-13 15:23:57
|
Avi Kivity wrote: > Glauber de Oliveira Costa wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >>=20 >> Dong, Eddie escreveu: >>=20 >>>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{ >>>> + struct timespec ts; + int r; >>>> + >>>> + if (!vcpu->clock_gpa) >>>> + return; >>>> + >>>> + /* Updates version to the next odd number, indicating we're >>>> writing */ + vcpu->hv_clock.version++; >>>> + kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, >>>> &vcpu->hv_clock, PAGE_SIZE); >>>> + >>>> + kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, >>>> + &vcpu->hv_clock.last_tsc); >>>> + >>>> + ktime_get_ts(&ts); >>>>=20 >>> Do we need to disable preemption here? >>>=20 >> After thinking for a little while, you are theoretically right. >> In the current state, we could even be preempted between all >> operations ;-) Maybe after avi's suggestion of moving the call to it >> it will end up in a preempt safe region, but anyway, it's safer to >> add the preempt markers here. I'll put it in next version, thanks >>=20 >>=20 >=20 > Well, you can't kvm_write_guest() with preemption enabled. >=20 > preempt notifiers to the rescue! We have a callout during preemption, > so you can just zero out a flag there, and when we're scheduled again > retry the whole thing.=20 >=20 The preemption issue is within following code which need to be done in a short enough period. + kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, + &vcpu->hv_clock.last_tsc); + + ktime_get_ts(&ts); + vcpu->hv_clock.now_ns =3D ts.tv_nsec + (NSEC_PER_SEC * (u64)ts.tv_sec); + vcpu->hv_clock.wc_sec =3D get_seconds(); I am even thinking we have to disable interrupt between these lines, otherwise guest wall clock may see backward time source when calculating the delta TSC since last vcpu->hv_clock.now_ns update. Also why we use both ktime_get_ts(&ts) & get_seconds() together? they are not atomic and may cause issues? thx,eddie |
From: Avi K. <av...@qu...> - 2007-11-13 16:14:01
|
Dong, Eddie wrote: >>> >>> After thinking for a little while, you are theoretically right. >>> In the current state, we could even be preempted between all >>> operations ;-) Maybe after avi's suggestion of moving the call to it >>> it will end up in a preempt safe region, but anyway, it's safer to >>> add the preempt markers here. I'll put it in next version, thanks >>> >>> >>> >> Well, you can't kvm_write_guest() with preemption enabled. >> >> preempt notifiers to the rescue! We have a callout during preemption, >> so you can just zero out a flag there, and when we're scheduled again >> retry the whole thing. >> >> > > The preemption issue is within following code which need to be done in a > short enough period. > > + kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, > + &vcpu->hv_clock.last_tsc); > + > + ktime_get_ts(&ts); > + vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC * > (u64)ts.tv_sec); > + vcpu->hv_clock.wc_sec = get_seconds(); > > I am even thinking we have to disable interrupt between these lines, > otherwise > guest wall clock may see backward time source when calculating the > delta TSC since last vcpu->hv_clock.now_ns update. > That's true. While we do need to handle vcpu migration and descheduling, the code sequence you note needs to be as atomic as possible. -- error compiling committee.c: too many arguments to function |
From: Dong, E. <edd...@in...> - 2007-11-14 00:42:12
|
>>=20 >> + kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, >> + &vcpu->hv_clock.last_tsc); >> + >> + ktime_get_ts(&ts); >> + vcpu->hv_clock.now_ns =3D ts.tv_nsec + (NSEC_PER_SEC * >> (u64)ts.tv_sec); + vcpu->hv_clock.wc_sec =3D get_seconds(); >>=20 >> I am even thinking we have to disable interrupt between these lines, >> otherwise guest wall clock may see backward time source when >> calculating the delta TSC since last vcpu->hv_clock.now_ns update. >>=20 >=20 > That's true. While we do need to handle vcpu migration and > descheduling, the code sequence you note needs to be as atomic as > possible.=20 >=20 Yes VCPU migration is another issue. Since the physical platform TSC is=20 totally different across migration, so we can't expose host TSC to guest. For KVM, it should be simple to just use guest_TSC (=3DHOST_TSC + = offset). thanks, eddie |
From: Gerd H. <kr...@re...> - 2007-11-09 08:37:42
|
> +/* > + * Guest has page alignment and padding requirements. At the host, it will > + * only lead to wasted space at the vcpu struct. For this reason, the struct > + * is not anonymous > + */ > +union kvm_hv_clock { > + struct kvm_hv_clock_s { > + u64 tsc_mult; > + u64 now_ns; > + /* That's the wall clock, not the water closet */ > + u64 wc_sec; > + u64 last_tsc; > + /* At first, we could use the tsc value as a marker, but Jeremy > + * well noted that it will cause us locking problems in 32-bit > + * sys, so we have a special version field */ > + u32 version; > + } fields; > + char page_align[PAGE_SIZE]; > +}; What is the point in using a whole page per vcpu? You probably don't want struct kvm_hv_clock_s cross a page border. Is that the only reason or are there other constrains too? As the kvm clock looks quite simliar to what xen does, how about making the structs binary-compatible or simply reuse the xen version (struct vcpu_time_info in xen/interface/xen.h)? cheers, Gerd |
From: Avi K. <av...@qu...> - 2007-11-11 09:16:50
|
Gerd Hoffmann wrote: >> +/* >> + * Guest has page alignment and padding requirements. At the host, it will >> + * only lead to wasted space at the vcpu struct. For this reason, the struct >> + * is not anonymous >> + */ >> +union kvm_hv_clock { >> + struct kvm_hv_clock_s { >> + u64 tsc_mult; >> + u64 now_ns; >> + /* That's the wall clock, not the water closet */ >> + u64 wc_sec; >> + u64 last_tsc; >> + /* At first, we could use the tsc value as a marker, but Jeremy >> + * well noted that it will cause us locking problems in 32-bit >> + * sys, so we have a special version field */ >> + u32 version; >> + } fields; >> + char page_align[PAGE_SIZE]; >> +}; >> > > What is the point in using a whole page per vcpu? You probably don't > want struct kvm_hv_clock_s cross a page border. Is that the only reason > or are there other constrains too? > We don't even have the cross-page-boundary constraint. The real issue is that on i386 physical addresses are 64-bit while hypercall arguments are 32-bit. A page frame number is 32-bit and so poses no issues. I see two workarounds for this: - make the first hypercall argument a u64 rather than a long (using two registers on i386) - use an msr for registering the clock address The first is more general, while the second has the nice property of taking care of live migration automatically. > As the kvm clock looks quite simliar to what xen does, how about making > the structs binary-compatible or simply reuse the xen version (struct > vcpu_time_info in xen/interface/xen.h)? > If there are no technical issues, I have no objections. -- error compiling committee.c: too many arguments to function |
From: Avi K. <av...@qu...> - 2007-11-11 10:18:36
|
Glauber de Oliveira Costa wrote: > This is the host part of kvm clocksource implementation. As it does > not include clockevents, it is a fairly simple implementation. We > only have to register a per-vcpu area, and start writting to it periodically. > > Missing live migration support (a way for userspace to read and write the guest clock address). Should probably be in a separate patch. > @@ -1924,6 +1955,7 @@ out: > goto preempted; > } > > + kvm_write_guest_time(vcpu); > post_kvm_run_save(vcpu, kvm_run); > Why here? Seems like we're leaving the guest for a while at this place. Suggest putting it on top of __vcpu_run(), guarded by a flag, and setting the flag every time we put the vcpu. -- error compiling committee.c: too many arguments to function |
From: Glauber de O. C. <gc...@re...> - 2007-11-13 11:27:31
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Avi Kivity escreveu: > Glauber de Oliveira Costa wrote: >> This is the host part of kvm clocksource implementation. As it does >> not include clockevents, it is a fairly simple implementation. We >> only have to register a per-vcpu area, and start writting to it >> periodically. >> >> > > Missing live migration support (a way for userspace to read and write > the guest clock address). Should probably be in a separate patch. I think it's a matter of issuing a hypercall for reading the clock address. It's fair simple, and can be done in a later version of this patch. As for writting, the register hypercall itself can be used. It has no special side-effects we should care about. >> @@ -1924,6 +1955,7 @@ out: >> goto preempted; >> } >> >> + kvm_write_guest_time(vcpu); >> post_kvm_run_save(vcpu, kvm_run); >> > > Why here? Seems like we're leaving the guest for a while at this place. > > Suggest putting it on top of __vcpu_run(), guarded by a flag, and > setting the flag every time we put the vcpu. No special preference. It just sounded exity enough to me. I can move to where you suggest. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Remi - http://enigmail.mozdev.org iD8DBQFHOYpojYI8LaFUWXMRApf8AJ4jQ/ZTBlub1IwFkJrYZyart+f7bwCfT+9m l1Rblsmw96ZatCf60g2dNYY= =DBpn -----END PGP SIGNATURE----- |
From: Avi K. <av...@qu...> - 2007-11-13 14:45:46
|
Glauber de Oliveira Costa wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Avi Kivity escreveu: > >> Glauber de Oliveira Costa wrote: >> >>> This is the host part of kvm clocksource implementation. As it does >>> not include clockevents, it is a fairly simple implementation. We >>> only have to register a per-vcpu area, and start writting to it >>> periodically. >>> >>> >>> >> Missing live migration support (a way for userspace to read and write >> the guest clock address). Should probably be in a separate patch. >> > > I think it's a matter of issuing a hypercall for reading the clock > address. It's fair simple, and can be done in a later version of this patch. > As for writting, the register hypercall itself can be used. It has no > special side-effects we should care about. > kvm live migration is done (at least thus far) without guest involvement. So you need the host to be able to transfer this state. > >>> @@ -1924,6 +1955,7 @@ out: >>> goto preempted; >>> } >>> >>> + kvm_write_guest_time(vcpu); >>> post_kvm_run_save(vcpu, kvm_run); >>> >>> >> Why here? Seems like we're leaving the guest for a while at this place. >> >> Suggest putting it on top of __vcpu_run(), guarded by a flag, and >> setting the flag every time we put the vcpu. >> > > No special preference. It just sounded exity enough to me. I can move to > where you suggest. > > It's more than a place, it's a set of rules: - if the vcpu is migrated, we need a new timebase - ditto if we're descheduled (well that's the same thing) - if "some time passes" -- error compiling committee.c: too many arguments to function |