From: Ryan H. <ry...@us...> - 2008-05-12 21:24:17
* Anthony Liguori <an...@co...> [2008-05-12 15:05]:
> Ryan Harper wrote:
> >I've been digging into some of the instability we see when running
> >larger numbers of guests at the same time. The test I'm currently using
> >involves launching 64 1-vcpu guests on an 8-way AMD box.
>
> Note this is a Barcelona system and therefore should have a
> fixed-frequency TSC.
>
> > With the latest
> >kvm-userspace git and kvm.git + Gerd's kvmclock fixes, I can launch all
> >64 of these 1 second apart,
>
> BTW, what if you don't pace out the startups? Do we still have issues
> with that?
Do you mean without the 1 second delay, or with a longer delay? My
experience is that the delay helps (fewer hangs) but doesn't solve the
problem completely.
>
> > and only a handful (1 to 3) end up not
> >making it up. In dmesg on the host, I get a couple of messages:
> >
> >[321365.362534] vcpu not ready for apic_round_robin
> >
> >and
> >
> >[321503.023788] Unsupported delivery mode 7
> >
> >Now, the interesting bit for me: when I used numactl to pin each guest
> >to a processor, all of the guests came up with no issues at all. As I
> >looked into it, that means we're not running any of the vcpu
> >migration code, which on svm consists of tsc_offset recalibration and
> >apic migration, and on vmx, a little more per-vcpu work.
> >
>
> Another data point is that -no-kvm-irqchip doesn't make the situation
> better.
Right. Let me clarify: I still see hung guests, but I need to verify
whether the apic-related messages show up on the host; I don't recall
for certain.
> >I've convinced myself that svm.c's tsc offset calculation works and
> >handles the migration from cpu to cpu quite well. I added the following
> >snippet to trigger if we ever encountered the case where we migrated to
> >a tsc that was behind:
> >
> >     rdtscll(tsc_this);
> >     delta = vcpu->arch.host_tsc - tsc_this;
> >     /* guest-visible time on the old pcpu */
> >     old_time = vcpu->arch.host_tsc + svm->vmcb->control.tsc_offset;
> >     /*
> >      * guest-visible time on the new pcpu before compensation; adding
> >      * delta here would make new_time identical to old_time, so the
> >      * check below could never fire
> >      */
> >     new_time = tsc_this + svm->vmcb->control.tsc_offset;
> >     if (new_time < old_time) {
> >             printk(KERN_ERR "ACK! (CPU%d->CPU%d) time goes back %llu\n",
> >                    vcpu->cpu, cpu, old_time - new_time);
> >     }
> >     svm->vmcb->control.tsc_offset += delta;
> >
>
> Time will never go backwards, but what can happen is that the TSC
> frequency will slow down. This is because upon VCPU migration, we don't
> account for the time between vcpu_put on the old processor and vcpu_load
> on the new processor. This time then disappears.
In svm.c, I do think we account for most of that time since the delta
calculation will shift the guest time forward to the tsc value read in
svm_vcpu_load(). We'll still miss the time between fixing the offset
and when the guest can actually read its tsc.
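
To make the arithmetic concrete, here is a standalone userspace sketch
using the variable names from the snippet above (the values are made
up); it only demonstrates what the offset adjustment does to
guest-visible time:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
        /* guest_tsc = host_tsc + tsc_offset; all values are made up */
        uint64_t tsc_offset = 1000;
        uint64_t host_tsc = 50000;          /* rdtscll() in svm_vcpu_put() */
        uint64_t gap = 7000;                /* host cycles spent migrating */
        uint64_t tsc_this = host_tsc + gap; /* rdtscll() in svm_vcpu_load() */

        uint64_t old_time = host_tsc + tsc_offset;  /* guest time at put */

        uint64_t delta = host_tsc - tsc_this;  /* wraps negative; fine */
        tsc_offset += delta;

        uint64_t new_time = tsc_this + tsc_offset; /* guest time at load */

        /* prints old_time=51000 new_time=51000: guest time never goes
         * backwards, but the 7000-cycle gap never reaches the guest */
        printf("old_time=%" PRIu64 " new_time=%" PRIu64 "\n",
               old_time, new_time);
        return 0;
}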
>
> A possible way to fix this (that's only valid on a processor with a
> fixed-frequency TSC), is to take a high-res timestamp on vcpu_put, and
> then on vcpu_load, take the delta timestamp since the old TSC was saved,
> and use the TSC frequency on the new pcpu to calculate the number of
> elapsed cycles.
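
Sketching what that might look like in svm_vcpu_load() -- the
last_put_ns field is hypothetical, it assumes a fixed-frequency TSC,
and it's only an illustration of the idea, not a tested patch:

        /* in vcpu_put: remember when we left the old pcpu */
        vcpu->arch.last_put_ns = ktime_to_ns(ktime_get());

        /* in svm_vcpu_load(), when cpu != vcpu->cpu: */
        u64 elapsed_ns = ktime_to_ns(ktime_get()) - vcpu->arch.last_put_ns;
        u64 tsc_this, elapsed_cycles;

        rdtscll(tsc_this);

        /* cycles = ns * kHz / 10^6 (overflow on very long gaps ignored) */
        elapsed_cycles = elapsed_ns * tsc_khz;
        do_div(elapsed_cycles, 1000000);

        /* credit the guest for the time spent off-cpu instead of
         * hiding it: guest_tsc advances by exactly elapsed_cycles */
        svm->vmcb->control.tsc_offset +=
                (vcpu->arch.host_tsc + elapsed_cycles) - tsc_this;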
>
> Assuming a fixed frequency TSC, and a calibrated TSC across all
> processors, you could get the same effect by using the VT tsc delta
> logic. Basically, it always uses the new CPU's TSC unless that would
> cause the guest to move backwards in time. As long as you have a
> stable, calibrated TSC, this would work out.
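
For reference, that conditional logic would look something like this
on svm (again just a sketch under the stable, calibrated TSC
assumption, not the actual vmx.c code):

        rdtscll(tsc_this);
        if (tsc_this < vcpu->arch.host_tsc) {
                /* new pcpu's TSC is behind the one we left: bump the
                 * offset so the guest never sees time go backwards */
                delta = vcpu->arch.host_tsc - tsc_this;
                svm->vmcb->control.tsc_offset += delta;
        }
        /* otherwise leave tsc_offset alone: the cycles spent between
         * vcpu_put and vcpu_load stay visible to the guest, so no
         * guest time is lost */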
>
> Can you try your old patch that did this and see if it fixes the problem?
Yeah, I'll give it a spin.
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ry...@us...