From: Glauber de O. C. <gc...@re...> - 2008-01-11 13:11:02
|
Hi folks, A lot of time has been elapsed since I last sent a version of kvm clock, with a lot of things in the middle, including my vacations and a nasty bug in the clock itself ;-) Here it is a new version. I'm following avi's last suggestion of using an msr to make it more migration-proof. If it's not approved, we can easily switch back to the hypercall mechanism. That's the easy part ;-) Clock updates are done every time we're scheduled. It is _not_ enough to check if we're changing cpus, because the tsc will change its frequency when the cpu is idle (if using the mwait idle). So after a long idle period, one cpu will end up stepping ahead causing a backward time (remember the bug I talked about? :-) ) As I fell it ready for inclusion (which obviously does not mean it is perfect, nor that I don't value any further comments ), I'm also posting the userspace patches in a separate series that will follow shortly. |
From: Glauber de O. C. <gc...@re...> - 2008-01-11 13:11:11
|
kvm_para.h potentially contains definitions that are to be used by kvm-userspace, so it should not be included inside the __KERNEL__ block. To protect its own data structures, kvm_para.h already includes its own __KERNEL__ block. Signed-off-by: Glauber de Oliveira Costa <gc...@re...> --- include/linux/kvm_para.h | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h index 6af91a5..5497aac 100644 --- a/include/linux/kvm_para.h +++ b/include/linux/kvm_para.h @@ -14,12 +14,12 @@ #define KVM_HC_VAPIC_POLL_IRQ 1 -#ifdef __KERNEL__ /* * hypercalls use architecture specific */ #include <asm/kvm_para.h> +#ifdef __KERNEL__ static inline int kvm_para_has_feature(unsigned int feature) { if (kvm_arch_para_features() & (1UL << feature)) -- 1.5.0.6 |
From: Glauber de O. C. <gc...@re...> - 2008-01-11 13:11:12
|
This is the host part of kvm clocksource implementation. As it does not include clockevents, it is a fairly simple implementation. We only have to register a per-vcpu area, and start writting to it periodically. The area is binary compatible with xen, as we use the same shadow_info structure. Signed-off-by: Glauber de Oliveira Costa <gc...@re...> --- arch/x86/kvm/x86.c | 64 +++++++++++++++++++++++++++++++++++++++++++- include/asm-x86/kvm_host.h | 5 +++ include/asm-x86/kvm_para.h | 5 +++ include/linux/kvm.h | 1 + 4 files changed, 74 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8a90403..d3fd920 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -19,6 +19,7 @@ #include "irq.h" #include "mmu.h" +#include <linux/clocksource.h> #include <linux/kvm.h> #include <linux/fs.h> #include <linux/vmalloc.h> @@ -412,7 +413,7 @@ static u32 msrs_to_save[] = { #ifdef CONFIG_X86_64 MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR, #endif - MSR_IA32_TIME_STAMP_COUNTER, + MSR_IA32_TIME_STAMP_COUNTER, MSR_PARAVIRT_CLOCK, }; static unsigned num_msrs_to_save; @@ -467,6 +468,50 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data) return kvm_set_msr(vcpu, index, *data); } +#define WC_OFFSET offsetof(struct shared_info, wc_version) + +static void kvm_write_guest_time(struct kvm_vcpu *v) +{ + struct timespec ts, wc_ts; + int wc_args[3]; /* version, wc_sec, wc_nsec */ + unsigned long flags; + struct kvm_vcpu_arch *vcpu = &v->arch; + + if (!vcpu->shared_info) + return; + + /* Make the update of both version numbers visible to guest */ + wc_args[0] = ++vcpu->wc_version; + kvm_write_guest(v->kvm, vcpu->shared_info + WC_OFFSET, wc_args, + sizeof(wc_args)); + vcpu->hv_clock.version++; + kvm_write_guest(v->kvm, vcpu->clock_addr, &vcpu->hv_clock, + sizeof(vcpu->hv_clock)); + + /* Keep irq disabled to prevent changes to the clock */ + local_irq_save(flags); + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, + &vcpu->hv_clock.tsc_timestamp); + wc_ts = current_kernel_time(); + ktime_get_ts(&ts); + local_irq_restore(flags); + + /* With all the info we got, fill in the values */ + wc_args[1] = wc_ts.tv_sec; + wc_args[2] = wc_ts.tv_nsec; + wc_args[0] = ++vcpu->wc_version; + + vcpu->hv_clock.system_time = ts.tv_nsec + + (NSEC_PER_SEC * (u64)ts.tv_sec); + vcpu->hv_clock.version++; + + /* And finally, let the guest see them */ + kvm_write_guest(v->kvm, vcpu->shared_info + WC_OFFSET, wc_args, + sizeof(wc_args)); + kvm_write_guest(v->kvm, vcpu->clock_addr, &vcpu->hv_clock, + sizeof(vcpu->hv_clock)); +} + int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) { @@ -494,6 +539,18 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) case MSR_IA32_MISC_ENABLE: vcpu->arch.ia32_misc_enable_msr = data; break; + case MSR_PARAVIRT_CLOCK: { + vcpu->arch.shared_info = data; + + vcpu->arch.clock_addr = data + offsetof(struct vcpu_info, time) + + sizeof(struct vcpu_info) * vcpu->vcpu_id; + + vcpu->arch.hv_clock.tsc_to_system_mul = + clocksource_khz2mult(tsc_khz, 22); + vcpu->arch.hv_clock.tsc_shift = 22; + kvm_write_guest_time(vcpu); + break; + } default: pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data); return 1; @@ -553,6 +610,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) data = vcpu->arch.shadow_efer; break; #endif + case MSR_PARAVIRT_CLOCK: + data = vcpu->arch.shared_info; + break; default: pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr); return 1; @@ -680,6 +740,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_USER_MEMORY: case KVM_CAP_SET_TSS_ADDR: case KVM_CAP_EXT_CPUID: + case KVM_CAP_CLOCKSOURCE: r = 1; break; case KVM_CAP_VAPIC: @@ -737,6 +798,7 @@ out: void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { kvm_x86_ops->vcpu_load(vcpu, cpu); + kvm_write_guest_time(vcpu); } void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h index d6db0de..bbc4b51 100644 --- a/include/asm-x86/kvm_host.h +++ b/include/asm-x86/kvm_host.h @@ -261,6 +261,11 @@ struct kvm_vcpu_arch { /* emulate context */ struct x86_emulate_ctxt emulate_ctxt; + + struct vcpu_time_info hv_clock; + gpa_t shared_info; + gpa_t clock_addr; /* so we avoid computing it every time */ + int wc_version; }; struct kvm_mem_alias { diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h index c6f3fd8..abe412a 100644 --- a/include/asm-x86/kvm_para.h +++ b/include/asm-x86/kvm_para.h @@ -1,5 +1,6 @@ #ifndef __X86_KVM_PARA_H #define __X86_KVM_PARA_H +#include <xen/interface/xen.h> /* This CPUID returns the signature 'KVMKVMKVM' in ebx, ecx, and edx. It * should be used to determine that a VM is running under KVM. @@ -10,9 +11,13 @@ * paravirtualization, the appropriate feature bit should be checked. */ #define KVM_CPUID_FEATURES 0x40000001 +#define KVM_FEATURE_CLOCKSOURCE 0 +#define MSR_PARAVIRT_CLOCK 0x11 #ifdef __KERNEL__ #include <asm/processor.h> +extern void kvmclock_init(void); + /* This instruction is vmcall. On non-VT architectures, it will generate a * trap that we will then rewrite to the appropriate instruction. diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 4de4fd2..78ce53f 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -232,6 +232,7 @@ struct kvm_vapic_addr { #define KVM_CAP_SET_TSS_ADDR 4 #define KVM_CAP_EXT_CPUID 5 #define KVM_CAP_VAPIC 6 +#define KVM_CAP_CLOCKSOURCE 7 /* * ioctls for VM fds -- 1.5.0.6 |
From: Glauber de O. C. <gc...@re...> - 2008-01-11 13:11:13
|
This is the guest part of kvm clock implementation It does not do tsc-only timing, as tsc can have deltas between cpus, and it did not seem worthy to me to keep adjusting them. We do use it, however, for fine-grained adjustment. Other than that, time comes from the host. Signed-off-by: Glauber de Oliveira Costa <gc...@re...> --- arch/x86/Kconfig | 10 +++ arch/x86/kernel/Makefile_32 | 1 + arch/x86/kernel/kvmclock.c | 160 +++++++++++++++++++++++++++++++++++++++++++ arch/x86/kernel/setup_32.c | 5 ++ arch/x86/xen/time.c | 6 +- 5 files changed, 179 insertions(+), 3 deletions(-) create mode 100644 arch/x86/kernel/kvmclock.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index ab2df55..968315e 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -350,6 +350,16 @@ config VMI at the moment), by linking the kernel to a GPL-ed ROM module provided by the hypervisor. +config KVM_CLOCK + bool "KVM paravirtualized clock" + select PARAVIRT + help + Turning on this option will allow you to run a paravirtualized clock + when running over the KVM hypervisor. Instead of relying on a PIT + (or probably other) emulation by the underlying device model, the host + provides the guest with timing infrastructure, as time of day, and + timer expiration. + source "arch/x86/lguest/Kconfig" endif diff --git a/arch/x86/kernel/Makefile_32 b/arch/x86/kernel/Makefile_32 index a7bc93c..f6332b6 100644 --- a/arch/x86/kernel/Makefile_32 +++ b/arch/x86/kernel/Makefile_32 @@ -44,6 +44,7 @@ obj-$(CONFIG_K8_NB) += k8.o obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o obj-$(CONFIG_VMI) += vmi_32.o vmiclock_32.o +obj-$(CONFIG_KVM_CLOCK) += kvmclock.o obj-$(CONFIG_PARAVIRT) += paravirt_32.o obj-y += pcspeaker.o diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c new file mode 100644 index 0000000..f2094f4 --- /dev/null +++ b/arch/x86/kernel/kvmclock.c @@ -0,0 +1,160 @@ +/* KVM paravirtual clock driver. A clocksource implementation + Copyright (C) 2007 Glauber de Oliveira Costa, Red Hat Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +*/ + +#include <linux/clocksource.h> +#include <linux/kvm_para.h> +#include <asm/arch_hooks.h> +#include <asm/msr.h> + + +#define KVM_SCALE 22 + +static int kvmclock = 1; + +static int parse_no_kvmclock(char *arg) +{ + kvmclock = 0; + return 0; +} +early_param("no-kvmclock", parse_no_kvmclock); + +struct shared_info shared_info __attribute__((__aligned__(PAGE_SIZE))); + +/* The hypervisor will put information about time periodically here */ +DEFINE_PER_CPU(struct vcpu_time_info *, hv_clock); +#define get_clock(cpu, field) per_cpu(hv_clock, cpu)->field + +static inline u64 kvm_get_delta(u64 last_tsc) +{ + int cpu = smp_processor_id(); + u64 delta = native_read_tsc() - last_tsc; + return (delta * get_clock(cpu, tsc_to_system_mul)) >> KVM_SCALE; +} + +/* + * The wallclock is the time of day when we booted. Since then, some time may + * have elapsed since the hypervisor wrote the data. So we try to account for + * that. Even if the tsc is not accurate, it gives us a more accurate timing + * than not adjusting at all + */ +unsigned long kvm_get_wallclock(void) +{ + u32 wc_sec, wc_nsec; + u64 delta, last_tsc; + struct timespec ts; + int version, nsec, cpu = smp_processor_id(); + + do { + version = shared_info.wc_version; + rmb(); + wc_sec = shared_info.wc_sec; + wc_nsec = shared_info.wc_nsec; + last_tsc = get_clock(cpu, tsc_timestamp); + rmb(); + } while ((shared_info.wc_version != version) || (version & 1)); + + delta = kvm_get_delta(last_tsc); + delta += wc_nsec; + nsec = do_div(delta, NSEC_PER_SEC); + set_normalized_timespec(&ts, wc_sec + delta, nsec); + /* + * Of all mechanisms of time adjustment I've tested, this one + * was the champion! + */ + return ts.tv_sec + 1; +} + +int kvm_set_wallclock(unsigned long now) +{ + return 0; +} + +/* + * This is our read_clock function. The host puts an tsc timestamp each time + * it updates a new time, and then we can use it to derive a slightly more + * precise notion of elapsed time, converted to nanoseconds. + */ +static cycle_t kvm_clock_read(void) +{ + u64 last_tsc, now; + u32 version; + int cpu; + + preempt_disable(); + cpu = smp_processor_id(); + + do { + version = get_clock(cpu, version); + rmb(); + last_tsc = get_clock(cpu, tsc_timestamp); + now = get_clock(cpu, system_time); + rmb(); + } while ((get_clock(cpu, version) != version) || (version & 1)); + + now += kvm_get_delta(last_tsc); + preempt_enable(); + + return now; +} + +static struct clocksource kvm_clock = { + .name = "kvm-clock", + .read = kvm_clock_read, + .rating = 400, + .mask = CLOCKSOURCE_MASK(64), + .mult = 1 << KVM_SCALE, + .shift = KVM_SCALE, + .flags = CLOCK_SOURCE_IS_CONTINUOUS, +}; + +static int kvm_register_clock(void) +{ + int cpu = smp_processor_id(); + u64 kvm_clock_info = (u64)(u64 *)&shared_info; + kvm_clock_info = __pa(kvm_clock_info); + __get_cpu_var(hv_clock) = &(shared_info.vcpu_info[cpu].time); + + return native_write_msr_safe(MSR_PARAVIRT_CLOCK, kvm_clock_info); +} + +static void kvm_setup_secondary_clock(void) +{ + /* + * Now that the first cpu already had this clocksource initialized, + * we shouldn't fail. + */ + WARN_ON(kvm_register_clock()); + /* ok, done with our trickery, call native */ + setup_secondary_APIC_clock(); +} + +void __init kvmclock_init(void) +{ + if (!kvm_para_available()) + return; + + if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) { + if (kvm_register_clock()) + return; + pv_time_ops.get_wallclock = kvm_get_wallclock; + pv_time_ops.set_wallclock = kvm_set_wallclock; + pv_time_ops.sched_clock = kvm_clock_read; + pv_apic_ops.setup_secondary_clock = kvm_setup_secondary_clock; + clocksource_register(&kvm_clock); + } +} diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c index 9c24b45..89c0eb2 100644 --- a/arch/x86/kernel/setup_32.c +++ b/arch/x86/kernel/setup_32.c @@ -44,6 +44,7 @@ #include <linux/crash_dump.h> #include <linux/dmi.h> #include <linux/pfn.h> +#include <linux/kvm_para.h> #include <video/edid.h> @@ -614,6 +615,10 @@ void __init setup_arch(char **cmdline_p) max_low_pfn = setup_memory(); +#ifdef CONFIG_KVM_CLOCK + kvmclock_init(); +#endif + #ifdef CONFIG_VMI /* * Must be after max_low_pfn is determined, and before kernel diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index d083ff5..7728b87 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -28,7 +28,7 @@ #define TIMER_SLOP 100000 #define NS_PER_TICK (1000000000LL / HZ) -static cycle_t xen_clocksource_read(void); +cycle_t xen_clocksource_read(void); /* These are perodically updated in shared_info, and then copied here. */ struct shadow_time_info { @@ -304,7 +304,7 @@ static u64 get_nsec_offset(struct shadow_time_info *shadow) return scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift); } -static cycle_t xen_clocksource_read(void) +cycle_t xen_clocksource_read(void) { struct shadow_time_info *shadow = &get_cpu_var(shadow_time); cycle_t ret; @@ -322,7 +322,7 @@ static cycle_t xen_clocksource_read(void) return ret; } -static void xen_read_wallclock(struct timespec *ts) +void xen_read_wallclock(struct timespec *ts) { const struct shared_info *s = HYPERVISOR_shared_info; u32 version; -- 1.5.0.6 |
From: Gerd H. <kr...@re...> - 2008-01-11 13:23:09
|
Hi, > diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c > index d083ff5..7728b87 100644 > --- a/arch/x86/xen/time.c > +++ b/arch/x86/xen/time.c > -static cycle_t xen_clocksource_read(void); > +cycle_t xen_clocksource_read(void); Huh? You kill the static, but don't use the functions anywhere? Looks like half-done code sharing with xen paravirt clock ... cheers, Gerd |
From: Glauber de O. C. <gc...@re...> - 2008-01-11 13:34:11
|
Gerd Hoffmann wrote: > Hi, > >> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c >> index d083ff5..7728b87 100644 >> --- a/arch/x86/xen/time.c >> +++ b/arch/x86/xen/time.c >> -static cycle_t xen_clocksource_read(void); >> +cycle_t xen_clocksource_read(void); > > Huh? You kill the static, but don't use the functions anywhere? Looks > like half-done code sharing with xen paravirt clock ... > > cheers, > Gerd > It's called experimentation. I was sure I deleted this part of the patch altogheter, but I must have missed it. Thanks for the point out. |
From: Anthony L. <an...@co...> - 2008-01-11 15:47:18
|
Glauber de Oliveira Costa wrote: > This is the host part of kvm clocksource implementation. As it does > not include clockevents, it is a fairly simple implementation. We > only have to register a per-vcpu area, and start writting to it periodically. > > The area is binary compatible with xen, as we use the same shadow_info structure. > > diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h > index c6f3fd8..abe412a 100644 > --- a/include/asm-x86/kvm_para.h > +++ b/include/asm-x86/kvm_para.h > @@ -1,5 +1,6 @@ > #ifndef __X86_KVM_PARA_H > #define __X86_KVM_PARA_H > +#include <xen/interface/xen.h> Can we abstract that out into a neutral header instead of including the Xen headers directly in KVM. Please rename the structure too to something neutral. Including something as generic as xen/interface/xen.h when CONFIG_XEN may not be set is not a good thing, I believe. Regards, Anthony Liguori > > /* This CPUID returns the signature 'KVMKVMKVM' in ebx, ecx, and edx. It > * should be used to determine that a VM is running under KVM. > @@ -10,9 +11,13 @@ > * paravirtualization, the appropriate feature bit should be checked. > */ > #define KVM_CPUID_FEATURES 0x40000001 > +#define KVM_FEATURE_CLOCKSOURCE 0 > > +#define MSR_PARAVIRT_CLOCK 0x11 > #ifdef __KERNEL__ > #include <asm/processor.h> > +extern void kvmclock_init(void); > + > > /* This instruction is vmcall. On non-VT architectures, it will generate a > * trap that we will then rewrite to the appropriate instruction. > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > index 4de4fd2..78ce53f 100644 > --- a/include/linux/kvm.h > +++ b/include/linux/kvm.h > @@ -232,6 +232,7 @@ struct kvm_vapic_addr { > #define KVM_CAP_SET_TSS_ADDR 4 > #define KVM_CAP_EXT_CPUID 5 > #define KVM_CAP_VAPIC 6 > +#define KVM_CAP_CLOCKSOURCE 7 > > /* > * ioctls for VM fds > |
From: Avi K. <av...@qu...> - 2008-01-12 20:45:06
|
Glauber de Oliveira Costa wrote: > This is the host part of kvm clocksource implementation. As it does > not include clockevents, it is a fairly simple implementation. We > only have to register a per-vcpu area, and start writting to it periodically. > > The area is binary compatible with xen, as we use the same shadow_info structure. > > @@ -412,7 +413,7 @@ static u32 msrs_to_save[] = { > #ifdef CONFIG_X86_64 > MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR, > #endif > - MSR_IA32_TIME_STAMP_COUNTER, > + MSR_IA32_TIME_STAMP_COUNTER, MSR_PARAVIRT_CLOCK, > }; > Let us be good citizens and put the msr into our private MSR_KVM_* namespace. > > > +#define WC_OFFSET offsetof(struct shared_info, wc_version) > + > +static void kvm_write_guest_time(struct kvm_vcpu *v) > +{ > + struct timespec ts, wc_ts; > + int wc_args[3]; /* version, wc_sec, wc_nsec */ > + unsigned long flags; > + struct kvm_vcpu_arch *vcpu = &v->arch; > + > + if (!vcpu->shared_info) > + return; > + > + /* Make the update of both version numbers visible to guest */ > + wc_args[0] = ++vcpu->wc_version; > + kvm_write_guest(v->kvm, vcpu->shared_info + WC_OFFSET, wc_args, > + sizeof(wc_args)); > + vcpu->hv_clock.version++; > + kvm_write_guest(v->kvm, vcpu->clock_addr, &vcpu->hv_clock, > + sizeof(vcpu->hv_clock)); > + > + /* Keep irq disabled to prevent changes to the clock */ > + local_irq_save(flags); > + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, > + &vcpu->hv_clock.tsc_timestamp); > + wc_ts = current_kernel_time(); > + ktime_get_ts(&ts); > + local_irq_restore(flags); > + > + /* With all the info we got, fill in the values */ > + wc_args[1] = wc_ts.tv_sec; > + wc_args[2] = wc_ts.tv_nsec; > + wc_args[0] = ++vcpu->wc_version; > + > + vcpu->hv_clock.system_time = ts.tv_nsec + > + (NSEC_PER_SEC * (u64)ts.tv_sec); > + vcpu->hv_clock.version++; > + > + /* And finally, let the guest see them */ > + kvm_write_guest(v->kvm, vcpu->shared_info + WC_OFFSET, wc_args, > + sizeof(wc_args)); > + kvm_write_guest(v->kvm, vcpu->clock_addr, &vcpu->hv_clock, > + sizeof(vcpu->hv_clock)); > +} > + > Why write things twice? This is per-vcpu, and the guest cannot see anything between the two writes. Also, kvm_write_guest() needs mmap_sem for read. Suggest you grab the page using gfn_to_page() and kmap_atomic() it instead. -- Any sufficiently difficult bug is indistinguishable from a feature. |
From: Avi K. <av...@qu...> - 2008-01-12 20:55:50
|
Anthony Liguori wrote: > Glauber de Oliveira Costa wrote: > >> This is the host part of kvm clocksource implementation. As it does >> not include clockevents, it is a fairly simple implementation. We >> only have to register a per-vcpu area, and start writting to it periodically. >> >> The area is binary compatible with xen, as we use the same shadow_info structure. >> >> diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h >> index c6f3fd8..abe412a 100644 >> --- a/include/asm-x86/kvm_para.h >> +++ b/include/asm-x86/kvm_para.h >> @@ -1,5 +1,6 @@ >> #ifndef __X86_KVM_PARA_H >> #define __X86_KVM_PARA_H >> +#include <xen/interface/xen.h> >> > > Can we abstract that out into a neutral header instead of including the > Xen headers directly in KVM. Please rename the structure too to > something neutral. > > Including something as generic as xen/interface/xen.h when CONFIG_XEN > may not be set is not a good thing, I believe. > Well, one of the motivations behind this is to actually supply a Xen clock to Xen guests. So maybe we can call the feature a "kvm xenclock" instead and feel justified in including Xen headers. However, you are probably right in that we are getting into a dependency and kconfig hell we do not yet deserve. Not even mentioning that CONFIG_XEN is a guest thing while kvmclock is also a host thing. So we are probably better of using a non-Xen structure that accidentally happens to be binary compatible with the Xen interface. Stranger things have happened. If we agree on this, please define _only_ the time-related parts so we have a decoupled interface. We'll need two msrs: one for wc and one for vcpu time. -- Any sufficiently difficult bug is indistinguishable from a feature. |
From: Anthony L. <an...@co...> - 2008-01-12 22:50:52
|
Avi Kivity wrote: > Anthony Liguori wrote: >> Glauber de Oliveira Costa wrote: >> >>> This is the host part of kvm clocksource implementation. As it does >>> not include clockevents, it is a fairly simple implementation. We >>> only have to register a per-vcpu area, and start writting to it >>> periodically. >>> >>> The area is binary compatible with xen, as we use the same >>> shadow_info structure. >>> >>> diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h >>> index c6f3fd8..abe412a 100644 >>> --- a/include/asm-x86/kvm_para.h >>> +++ b/include/asm-x86/kvm_para.h >>> @@ -1,5 +1,6 @@ >>> #ifndef __X86_KVM_PARA_H >>> #define __X86_KVM_PARA_H >>> +#include <xen/interface/xen.h> >>> >> >> Can we abstract that out into a neutral header instead of including >> the Xen headers directly in KVM. Please rename the structure too to >> something neutral. >> >> Including something as generic as xen/interface/xen.h when CONFIG_XEN >> may not be set is not a good thing, I believe. >> > > Well, one of the motivations behind this is to actually supply a Xen > clock to Xen guests. So maybe we can call the feature a "kvm > xenclock" instead and feel justified in including Xen headers. > > However, you are probably right in that we are getting into a > dependency and kconfig hell we do not yet deserve. Not even > mentioning that CONFIG_XEN is a guest thing while kvmclock is also a > host thing. > > So we are probably better of using a non-Xen structure that > accidentally happens to be binary compatible with the Xen interface. > Stranger things have happened. Yes, this is what I meant. My concerns is dependencies and namespace pollution. I think using the same interface is a very Good Thing. Regards, Anthony Liguori > If we agree on this, please define _only_ the time-related parts so we > have a decoupled interface. We'll need two msrs: one for wc and one > for vcpu time. > |
From: Amit S. <ami...@qu...> - 2008-01-11 15:06:26
|
On Friday 11 January 2008 18:40:54 Glauber de Oliveira Costa wrote: > kvm_para.h potentially contains definitions that are to be used by > kvm-userspace, so it should not be included inside the __KERNEL__ block. To > protect its own data structures, kvm_para.h already includes its own > __KERNEL__ block. > > Signed-off-by: Glauber de Oliveira Costa <gc...@re...> Acked-by: Amit Shah <ami...@qu...> |
From: Anthony L. <an...@co...> - 2008-01-11 15:49:42
|
Glauber de Oliveira Costa wrote: > Hi folks, > > A lot of time has been elapsed since I last sent a version of kvm clock, > with a lot of things in the middle, including my vacations and a nasty bug > in the clock itself ;-) > > Here it is a new version. I'm following avi's last suggestion of using an msr > to make it more migration-proof. If it's not approved, we can easily switch back > to the hypercall mechanism. That's the easy part ;-) > > Clock updates are done every time we're scheduled. It is _not_ enough to check > if we're changing cpus, because the tsc will change its frequency when the cpu is > idle (if using the mwait idle). So after a long idle period, one cpu will end up stepping > ahead causing a backward time (remember the bug I talked about? :-) ) > > As I fell it ready for inclusion (which obviously does not mean it is perfect, nor that > I don't value any further comments ), I'm also posting the userspace patches in a separate > series that will follow shortly. > > This patch series is looking good. I like the switch to an MSR. Regards, Anthony Liguori |