From: Gerd H. <kr...@re...> - 2008-04-11 07:33:26
|
Hi, Tried to use kvmclock with xenner and noticed that the kvmclock (MSR_KVM_SYSTEM_TIME msr) is incompatible with xen. kvm guests do this to translate the tsc delta into nsecs: #define get_clock(cpu, field) per_cpu(hv_clock, cpu).field static inline u64 kvm_get_delta(u64 last_tsc) { int cpu = smp_processor_id(); u64 delta = native_read_tsc() - last_tsc; return (delta * get_clock(cpu, tsc_to_system_mul)) >> 22; } whereas xen guests do this (64bit version): static inline u64 scale_delta(u64 delta, u32 mul_frac, int shift) { u64 product; if (shift < 0) delta >>= -shift; else delta <<= shift; __asm__ ( "mul %%rdx ; shrd $32,%%rdx,%%rax" : "=a" (product) : "0" (delta), "d" ((u64)mul_frac)); return product; } Note that xen does a 64bit multiply (of the 64bit delta and the 32bit factor) yielding a 128bit result, then picking bits 32-95 for the 64bit return value. In contrast kvm does a simple 64bit multiply, which is equivalent to using the lowest 64 bits. Thus kvm is off by factor 2^32, and that without even considering the ordering of two (shift + multiply) operations and any rounding errors ... cheers, Gerd -- http://kraxel.fedorapeople.org/xenner/ |
From: Avi K. <av...@qu...> - 2008-04-11 12:07:43
|
Gerd Hoffmann wrote: > Hi, > > Tried to use kvmclock with xenner and noticed that the kvmclock > (MSR_KVM_SYSTEM_TIME msr) is incompatible with xen. > > Patches are welcome, especially as kvmclock isn't merged yet, so there are no backward compatibility issues. -- Any sufficiently difficult bug is indistinguishable from a feature. |
From: Gerd H. <kr...@re...> - 2008-04-11 13:44:40
Attachments:
kvmclock-1.diff
|
Avi Kivity wrote: > Gerd Hoffmann wrote: >> Hi, >> >> Tried to use kvmclock with xenner and noticed that the kvmclock >> (MSR_KVM_SYSTEM_TIME msr) is incompatible with xen. > > Patches are welcome, especially as kvmclock isn't merged yet, so there > are no backward compatibility issues. Great, so I'll happily go break kvm guests ;) Patch revision #1 attached. It changes the way the tsc-delta-scaling fields are calculated to be compatible with xen. Code is taken from xenner (which got it from xen) and adapted a bit. Host only, kvm guest side not done (yet). With that patch applied xen guests with pv clock enabled happily boot to the login prompt, without complains about time going backwards. Fine. Wall clock is off a few hours though. Oops. I think the way wall clock and system clock work together in xen (Jeremy correct me if I'm wrong) is that the wall clock specifies the point in time where the system clock started going. As kvm fills in host system time into the guest system time fields the guest wall clock fields should be filled with the host boot time timestamp I'd say. Comments? Gerd -- http://kraxel.fedorapeople.org/xenner/ |
From: Gerd H. <kr...@re...> - 2008-04-11 15:01:34
Attachments:
kvmclock-2.diff
|
Gerd Hoffmann wrote: > Wall clock is off a few hours though. Oops. > > I think the way wall clock and system clock work together in xen (Jeremy > correct me if I'm wrong) is that the wall clock specifies the point in > time where the system clock started going. As kvm fills in host system > time into the guest system time fields the guest wall clock fields > should be filled with the host boot time timestamp I'd say. Following up myself with a quick&dirty patch to tackle this issue too. This one calculates the boot time. That should be solveable better, include/linux/time.h lists two functions which sound promising: extern void getboottime(struct timespec *ts); extern void monotonic_to_bootbased(struct timespec *ts); Neither of them is available to modules though, so I can't test without rebooting my laptop ... monotonic_to_bootbased() sounds like we would get hosts ntp adjustments in the guests for free. cheers, Gerd -- http://kraxel.fedorapeople.org/xenner/ |
From: Jeremy F. <je...@go...> - 2008-04-11 19:02:31
|
Gerd Hoffmann wrote: > Wall clock is off a few hours though. Oops. > > I think the way wall clock and system clock work together in xen (Jeremy > correct me if I'm wrong) is that the wall clock specifies the point in > time where the system clock started going. As kvm fills in host system > time into the guest system time fields the guest wall clock fields > should be filled with the host boot time timestamp I'd say. > Yes. The wallclock field in the shared info structure is the wallclock time at boot; you compute the current time by adding the system timestamp to it. System time changes are effected by retroactively changing the boot time of the machine, though that can also change because of suspend/resume/migrate. In general the kernel only reads the wallclock time at boot, and then maintains it for itself from then on. I think. J |
From: Gerd H. <kr...@re...> - 2008-04-18 14:26:30
|
Jeremy Fitzhardinge wrote: > Gerd Hoffmann wrote: >> Wall clock is off a few hours though. Oops. >> >> I think the way wall clock and system clock work together in xen (Jeremy >> correct me if I'm wrong) is that the wall clock specifies the point in >> time where the system clock started going. As kvm fills in host system >> time into the guest system time fields the guest wall clock fields >> should be filled with the host boot time timestamp I'd say. >> > > Yes. The wallclock field in the shared info structure is the wallclock > time at boot; you compute the current time by adding the system > timestamp to it. System time changes are effected by retroactively > changing the boot time of the machine, though that can also change > because of suspend/resume/migrate. > > In general the kernel only reads the wallclock time at boot, and then > maintains it for itself from then on. I think. Thanks. I'm looking at the guest side of the issue right now, trying to identify common code, and while doing so noticed that xen does the version-check-loop in both get_time_values_from_xen(void) and xen_clocksource_read(void), and I can't see any obvious reason for that. The loop in xen_clocksource_read(void) is not needed IMHO. Can I drop it? cheers, Gerd -- http://kraxel.fedorapeople.org/xenner/ |
From: Jeremy F. <je...@go...> - 2008-04-18 22:23:45
|
Gerd Hoffmann wrote: > I'm looking at the guest side of the issue right now, trying to identify > common code, and while doing so noticed that xen does the > version-check-loop in both get_time_values_from_xen(void) and > xen_clocksource_read(void), and I can't see any obvious reason for that. > The loop in xen_clocksource_read(void) is not needed IMHO. Can I drop it? > No. The get_nsec_offset() needs to be atomic with respect to the get_time_values() parameters. There could be a loopless __get_time_values() for use in this case, but given that it almost never loops, I don't think its worthwhile. J |
From: Gerd H. <kr...@re...> - 2008-04-21 07:31:51
|
Jeremy Fitzhardinge wrote: > Gerd Hoffmann wrote: >> I'm looking at the guest side of the issue right now, trying to identify >> common code, and while doing so noticed that xen does the >> version-check-loop in both get_time_values_from_xen(void) and >> xen_clocksource_read(void), and I can't see any obvious reason for that. >> The loop in xen_clocksource_read(void) is not needed IMHO. Can I >> drop it? > > No. The get_nsec_offset() needs to be atomic with respect to the > get_time_values() parameters. Hmm, I somehow fail to see a case where it could be non-atomic ... get_time_values() copies a consistent snapshot, thus xen_clocksource_read() doesn't race against xen updating the fields. The snapshot is in a per-cpu variable, thus it doesn't race against other guest vcpus running get_time_values() at the same time. > There could be a loopless > __get_time_values() for use in this case, but given that it almost never > loops, I don't think its worthwhile. "in this case" ??? I'm confused. There is only a single user of get_nsec_offset(), which is xen_clocksource_read() ... cheers, Gerd -- http://kraxel.fedorapeople.org/xenner/ |
From: Jeremy F. <je...@go...> - 2008-04-21 11:46:43
|
Gerd Hoffmann wrote: > Hmm, I somehow fail to see a case where it could be non-atomic ... > > get_time_values() copies a consistent snapshot, thus > xen_clocksource_read() doesn't race against xen updating the fields. > The snapshot is in a per-cpu variable, thus it doesn't race against > other guest vcpus running get_time_values() at the same time. > Xen could change the parameters in the instant after get_time_values(). That change could be as a result of suspend-resume, so the parameters and the tsc could be wildly different. It's definitely an edge-case, but it's easy enough to deal with. >> There could be a loopless >> __get_time_values() for use in this case, but given that it almost never >> loops, I don't think its worthwhile. >> > > "in this case" ??? I'm confused. There is only a single user of > get_nsec_offset(), which is xen_clocksource_read() ... > Sure, but get_time_values() has several other callers. If xen_clocksource_read() had its own loop to make sure the read_tsc is atomic with respect to get_time_values, then get_time_values itself needn't loop. But, given that it only loops in the very rare case that it races with Xen updating those parameters, it doesn't seem to make much difference either way. J |
From: Gerd H. <kr...@re...> - 2008-04-21 12:50:29
|
Jeremy Fitzhardinge wrote: > Xen could change the parameters in the instant after get_time_values(). > That change could be as a result of suspend-resume, so the parameters > and the tsc could be wildly different. Ah, ok, forgot the rdtsc in the picture. With that in mind I fully agree that the loop is needed. I think kvm guests can even hit that one with the vcpu migrating to a different physical cpu, so we better handle it correctly ;) > Sure, but get_time_values() has several other callers. Not really. There are only two calls, one in clocksource_read() and one in the init path. The later is superfluous I think because clocksource_read() is the only user of the shadowed time info. cheers, Gerd -- http://kraxel.fedorapeople.org/xenner/ |
From: Glauber C. <gc...@re...> - 2008-04-22 17:59:57
|
Gerd Hoffmann wrote: > Jeremy Fitzhardinge wrote: >> Xen could change the parameters in the instant after get_time_values(). >> That change could be as a result of suspend-resume, so the parameters >> and the tsc could be wildly different. > > Ah, ok, forgot the rdtsc in the picture. With that in mind I fully > agree that the loop is needed. I think kvm guests can even hit that one > with the vcpu migrating to a different physical cpu, so we better handle > it correctly ;) It's probably not needed for kvm, since we update everything everytime we get scheduled in the host side, which would cover the case for migration between physical cpus. But it's probably okay to do it to get a common denominator with xen, if needed. |
From: Gerd H. <kr...@re...> - 2008-04-23 06:03:47
|
Glauber Costa wrote: > Gerd Hoffmann wrote: >> Jeremy Fitzhardinge wrote: >>> Xen could change the parameters in the instant after >>> get_time_values(). That change could be as a result of >>> suspend-resume, so the parameters >>> and the tsc could be wildly different. >> >> Ah, ok, forgot the rdtsc in the picture. With that in mind I fully >> agree that the loop is needed. I think kvm guests can even hit that one >> with the vcpu migrating to a different physical cpu, so we better handle >> it correctly ;) > > It's probably not needed for kvm, since we update everything everytime > we get scheduled in the host side, which would cover the case for > migration between physical cpus. No, it wouldn't. The corner case we must catch is: guest reads time info, kvm reschedules the guest to another pcpu, guest reads the tsc. The time info used by the guest for the tsc delta is stale then, it belongs to the previous pcpu. cheers, Gerd -- http://kraxel.fedorapeople.org/xenner/ |
From: Glauber C. <gc...@re...> - 2008-04-24 13:04:09
|
Gerd Hoffmann wrote: > Glauber Costa wrote: >> Gerd Hoffmann wrote: >>> Jeremy Fitzhardinge wrote: >>>> Xen could change the parameters in the instant after >>>> get_time_values(). That change could be as a result of >>>> suspend-resume, so the parameters >>>> and the tsc could be wildly different. >>> Ah, ok, forgot the rdtsc in the picture. With that in mind I fully >>> agree that the loop is needed. I think kvm guests can even hit that one >>> with the vcpu migrating to a different physical cpu, so we better handle >>> it correctly ;) >> It's probably not needed for kvm, since we update everything everytime >> we get scheduled in the host side, which would cover the case for >> migration between physical cpus. > > No, it wouldn't. The corner case we must catch is: guest reads time > info, kvm reschedules the guest to another pcpu, guest reads the tsc. > The time info used by the guest for the tsc delta is stale then, it > belongs to the previous pcpu. > > cheers, > Gerd > Agreed. |
From: Jeremy F. <je...@go...> - 2008-04-21 13:34:42
|
Gerd Hoffmann wrote: > Jeremy Fitzhardinge wrote: > >> Xen could change the parameters in the instant after get_time_values(). >> That change could be as a result of suspend-resume, so the parameters >> and the tsc could be wildly different. >> > > Ah, ok, forgot the rdtsc in the picture. With that in mind I fully > agree that the loop is needed. I think kvm guests can even hit that one > with the vcpu migrating to a different physical cpu, so we better handle > it correctly ;) > Yes, same with Xen. >> Sure, but get_time_values() has several other callers. >> > > Not really. There are only two calls, one in clocksource_read() and one > in the init path. The later is superfluous I think because > clocksource_read() is the only user of the shadowed time info. > Hm. It doesn't look like shadow_time needs to be a static percpu at all. It could just be a local to clocksource_read, I think. J |
From: Gerd H. <kr...@re...> - 2008-04-21 14:20:15
|
Jeremy Fitzhardinge wrote: > Gerd Hoffmann wrote: >> Not really. There are only two calls, one in clocksource_read() and one >> in the init path. The later is superfluous I think because >> clocksource_read() is the only user of the shadowed time info. > > Hm. It doesn't look like shadow_time needs to be a static percpu at > all. It could just be a local to clocksource_read, I think. Good point, one more cleanup. thanks, Gerd -- http://kraxel.fedorapeople.org/xenner/ |