From: Ryan H. <ry...@us...> - 2008-05-12 21:24:17
* Anthony Liguori <an...@co...> [2008-05-12 15:05]:
> Ryan Harper wrote:
> >I've been digging into some of the instability we see when running
> >larger numbers of guests at the same time. The test I'm currently using
> >involves launching 64 1-vcpu guests on an 8-way AMD box.
>
> Note this is a Barcelona system and therefore should have a
> fixed-frequency TSC.
>
> > With the latest
> >kvm-userspace git and kvm.git + Gerd's kvmclock fixes, I can launch all
> >64 of these 1 second apart,
>
> BTW, what if you don't pace out the startups? Do we still have issues
> with that?
Do you mean without the 1 second delay, or with a longer delay? My
experience is that the delay helps (fewer hangs) but doesn't solve the
problem completely.
>
> > and only a handful (1 to 3) end up not
> >making it up. In dmesg on the host, I get a couple of messages:
> >
> >[321365.362534] vcpu not ready for apic_round_robin
> >
> >and
> >
> >[321503.023788] Unsupported delivery mode 7
> >
> >Now, the interesting bit for me: when I used numactl to pin each guest
> >to a processor, all of the guests came up with no issues at all. As I
> >looked into it, that means we're not running any of the vcpu
> >migration code, which on svm consists of tsc_offset recalibration and
> >apic migration, and on vmx, a little more per-vcpu work.
> >
>
> Another data point is that -no-kvm-irqchip doesn't make the situation
> better.
Right. Let me clarify: I still see hung guests, but I need to verify
whether the apic-related messages show up on the host; I don't recall
for certain.
> >I've convinced myself that svm.c's tsc offset calculation works and
> >handles the migration from cpu to cpu quite well. I added the following
> >snippet to trigger if we ever encountered the case where we migrated to
> >a tsc that was behind:
> >
> >     rdtscll(tsc_this);
> >     delta = vcpu->arch.host_tsc - tsc_this;
> >     /* guest-visible time on the old pcpu */
> >     old_time = vcpu->arch.host_tsc + svm->vmcb->control.tsc_offset;
> >     /*
> >      * guest-visible time on the new pcpu before compensation; adding
> >      * delta here would make new_time identical to old_time, so the
> >      * check below could never fire
> >      */
> >     new_time = tsc_this + svm->vmcb->control.tsc_offset;
> >     if (new_time < old_time) {
> >             printk(KERN_ERR "ACK! (CPU%d->CPU%d) time goes back %llu\n",
> >                    vcpu->cpu, cpu, old_time - new_time);
> >     }
> >     svm->vmcb->control.tsc_offset += delta;
> >
>
> Time will never go backwards, but what can happen is that the TSC
> frequency will slow down. This is because upon VCPU migration, we don't
> account for the time between vcpu_put on the old processor and vcpu_load
> on the new processor. This time then disappears.
In svm.c, I do think we account for most of that time since the delta
calculation will shift the guest time forward to the tsc value read in
svm_vcpu_load(). We'll still miss the time between fixing the offset
and when the guest can actually read its tsc.
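
To make the arithmetic concrete, here is a standalone userspace sketch
using the variable names from the snippet above (the values are made
up); it only demonstrates what the offset adjustment does to
guest-visible time:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
        /* guest_tsc = host_tsc + tsc_offset; all values are made up */
        uint64_t tsc_offset = 1000;
        uint64_t host_tsc = 50000;          /* rdtscll() in svm_vcpu_put() */
        uint64_t gap = 7000;                /* host cycles spent migrating */
        uint64_t tsc_this = host_tsc + gap; /* rdtscll() in svm_vcpu_load() */

        uint64_t old_time = host_tsc + tsc_offset;  /* guest time at put */

        uint64_t delta = host_tsc - tsc_this;  /* wraps negative; fine */
        tsc_offset += delta;

        uint64_t new_time = tsc_this + tsc_offset; /* guest time at load */

        /* prints old_time=51000 new_time=51000: guest time never goes
         * backwards, but the 7000-cycle gap never reaches the guest */
        printf("old_time=%" PRIu64 " new_time=%" PRIu64 "\n",
               old_time, new_time);
        return 0;
}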
>
> A possible way to fix this (that's only valid on a processor with a
> fixed-frequency TSC), is to take a high-res timestamp on vcpu_put, and
> then on vcpu_load, take the delta timestamp since the old TSC was saved,
> and use the TSC frequency on the new pcpu to calculate the number of
> elapsed cycles.
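
Sketching what that might look like in svm_vcpu_load() -- the
last_put_ns field is hypothetical, it assumes a fixed-frequency TSC,
and it's only an illustration of the idea, not a tested patch:

        /* in vcpu_put: remember when we left the old pcpu */
        vcpu->arch.last_put_ns = ktime_to_ns(ktime_get());

        /* in svm_vcpu_load(), when cpu != vcpu->cpu: */
        u64 elapsed_ns = ktime_to_ns(ktime_get()) - vcpu->arch.last_put_ns;
        u64 tsc_this, elapsed_cycles;

        rdtscll(tsc_this);

        /* cycles = ns * kHz / 10^6 (overflow on very long gaps ignored) */
        elapsed_cycles = elapsed_ns * tsc_khz;
        do_div(elapsed_cycles, 1000000);

        /* credit the guest for the time spent off-cpu instead of
         * hiding it: guest_tsc advances by exactly elapsed_cycles */
        svm->vmcb->control.tsc_offset +=
                (vcpu->arch.host_tsc + elapsed_cycles) - tsc_this;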
>
> Assuming a fixed frequency TSC, and a calibrated TSC across all
> processors, you could get the same effect by using the VT tsc delta
> logic. Basically, it always uses the new CPU's TSC unless that would
> cause the guest to move backwards in time. As long as you have a
> stable, calibrated TSC, this would work out.
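
For reference, that conditional logic would look something like this
on svm (again just a sketch under the stable, calibrated TSC
assumption, not the actual vmx.c code):

        rdtscll(tsc_this);
        if (tsc_this < vcpu->arch.host_tsc) {
                /* new pcpu's TSC is behind the one we left: bump the
                 * offset so the guest never sees time go backwards */
                delta = vcpu->arch.host_tsc - tsc_this;
                svm->vmcb->control.tsc_offset += delta;
        }
        /* otherwise leave tsc_offset alone: the cycles spent between
         * vcpu_put and vcpu_load stay visible to the guest, so no
         * guest time is lost */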
>
> Can you try your old patch that did this and see if it fixes the problem?
Yeah, I'll give it a spin.
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ry...@us...