From: Ryan H. <ry...@us...> - 2008-05-12 19:20:11
I've been digging into some of the instability we see when running larger numbers of guests at the same time. The test I'm currently using involves launching 64 1-vcpu guests on an 8-way AMD box. With the latest kvm-userspace git and kvm.git + Gerd's kvmclock fixes, I can launch all 64 of these 1 second apart, and only a handful (1 to 3) end up not making it up. In dmesg on the host, I get a couple of messages:

[321365.362534] vcpu not ready for apic_round_robin

and

[321503.023788] Unsupported delivery mode 7

Now, the interesting bit for me was that when I used numactl to pin each guest to a processor, all of the guests came up with no issues at all. As I looked into it, pinning means that we're not running any of the vcpu migration code, which on svm consists of tsc_offset recalibration and apic migration, and on vmx, a little more per-vcpu work.

I've convinced myself that svm.c's tsc offset calculation works and handles the migration from cpu to cpu quite well. I added the following snippet to trigger if we ever encountered the case where we migrated to a tsc that was behind:

    rdtscll(tsc_this);
    delta = vcpu->arch.host_tsc - tsc_this;
    old_time = vcpu->arch.host_tsc + svm->vmcb->control.tsc_offset;
    new_time = tsc_this + svm->vmcb->control.tsc_offset + delta;
    if (new_time < old_time) {
        printk(KERN_ERR "ACK! (CPU%d->CPU%d) time goes back %llu\n",
               vcpu->cpu, cpu, old_time - new_time);
    }
    svm->vmcb->control.tsc_offset += delta;

Note that vcpu->arch.host_tsc is the tsc of the previous cpu the vcpu was running on (see svm_put_vcpu()). This allows me to check whether we are in fact increasing the guest's view of the tsc. I've not been able to trigger this at all when the vcpus are migrating.
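For anyone who wants to poke at the offset arithmetic outside the kernel, here is a minimal user-space sketch of the same adjustment. The struct and function names (vcpu_tsc, guest_tsc, migrate_tsc) are made up for illustration and are not kvm code; the TSC readings are passed in as plain integers rather than read with rdtscll().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vcpu state used by the snippet above. */
struct vcpu_tsc {
    uint64_t host_tsc;   /* host TSC sampled on the cpu the vcpu last ran on */
    uint64_t tsc_offset; /* added to the host TSC to form the guest's TSC */
};

/* Guest-visible TSC for a given host TSC reading. */
static uint64_t guest_tsc(const struct vcpu_tsc *v, uint64_t host_now)
{
    return host_now + v->tsc_offset;
}

/*
 * Apply the same adjustment as the snippet when the vcpu lands on a cpu
 * whose TSC currently reads tsc_this. Returns how far the guest's view
 * would step backwards, or 0 if it does not.
 */
static uint64_t migrate_tsc(struct vcpu_tsc *v, uint64_t tsc_this)
{
    uint64_t delta = v->host_tsc - tsc_this; /* wraps mod 2^64 if tsc_this is ahead */
    uint64_t old_time = v->host_tsc + v->tsc_offset;
    uint64_t new_time = tsc_this + v->tsc_offset + delta;
    uint64_t went_back = (new_time < old_time) ? old_time - new_time : 0;

    v->tsc_offset += delta;
    return went_back;
}
```

One thing this model makes visible: with delta defined as host_tsc - tsc_this, new_time reduces to host_tsc + tsc_offset, which is exactly old_time, so the comparison checks the arithmetic itself and cannot fire. It says nothing about how much time elapsed between leaving the old cpu and arriving on the new one, so take the "never triggered" result with that caveat.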
As for the apic, the migrate code seems to be rather simple, but I've not yet dived in to see if we've got anything racy in there:

lapic.c:

    void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
    {
        struct kvm_lapic *apic = vcpu->arch.apic;
        struct hrtimer *timer;

        if (!apic)
            return;

        timer = &apic->timer.dev;
        if (hrtimer_cancel(timer))
            hrtimer_start(timer, timer->expires, HRTIMER_MODE_ABS);
    }

Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ry...@us...