From: Marcelo T. <mto...@re...> - 2008-05-08 22:44:02
|
There's still a race in kvm_vcpu_block(), if a wake_up_interruptible() call happens before the task state is set to TASK_INTERRUPTIBLE: CPU0 CPU1 kvm_vcpu_block add_wait_queue kvm_cpu_has_interrupt = 0 set interrupt if (waitqueue_active()) wake_up_interruptible() kvm_cpu_has_pending_timer kvm_arch_vcpu_runnable signal_pending set_current_state(TASK_INTERRUPTIBLE) schedule() Can be fixed by using prepare_to_wait() which sets the task state before testing for the wait condition. Unfortunately it can't use wait_event_interruptible() due to vcpu_put/vcpu_load. Signed-off-by: Marcelo Tosatti <mto...@re...> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 0846d3d..fcc08c2 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -783,25 +783,26 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn) */ void kvm_vcpu_block(struct kvm_vcpu *vcpu) { - DECLARE_WAITQUEUE(wait, current); - - add_wait_queue(&vcpu->wq, &wait); - - /* - * We will block until either an interrupt or a signal wakes us up - */ - while (!kvm_cpu_has_interrupt(vcpu) - && !kvm_cpu_has_pending_timer(vcpu) - && !signal_pending(current) - && !kvm_arch_vcpu_runnable(vcpu)) { - set_current_state(TASK_INTERRUPTIBLE); + DEFINE_WAIT(wait); + + for (;;) { + prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); + + if (kvm_cpu_has_interrupt(vcpu)) + break; + if (kvm_cpu_has_pending_timer(vcpu)) + break; + if (kvm_arch_vcpu_runnable(vcpu)) + break; + if (signal_pending(current)) + break; + vcpu_put(vcpu); schedule(); vcpu_load(vcpu); } - - __set_current_state(TASK_RUNNING); - remove_wait_queue(&vcpu->wq, &wait); + + finish_wait(&vcpu->wq, &wait); } void kvm_resched(struct kvm_vcpu *vcpu) |
From: Avi K. <av...@qu...> - 2008-05-09 07:40:47
|
Marcelo Tosatti wrote: > There's still a race in kvm_vcpu_block(), if a wake_up_interruptible() > call happens before the task state is set to TASK_INTERRUPTIBLE: > > CPU0 CPU1 > > kvm_vcpu_block > > add_wait_queue > > kvm_cpu_has_interrupt = 0 > set interrupt > if (waitqueue_active()) > wake_up_interruptible() > > kvm_cpu_has_pending_timer > kvm_arch_vcpu_runnable > signal_pending > > set_current_state(TASK_INTERRUPTIBLE) > schedule() > > Can be fixed by using prepare_to_wait() which sets the task state before > testing for the wait condition. > > Unfortunately it can't use wait_event_interruptible() due to > vcpu_put/vcpu_load. > > schedule() will call vcpu_put()/vcpu_load() for us through preempt notifiers. I feel a little uneasy about it, but no concreate reason why not to rely on it. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. |
From: Marcelo T. <mto...@re...> - 2008-05-09 14:18:33
|
On Fri, May 09, 2008 at 10:40:47AM +0300, Avi Kivity wrote: > >Unfortunately it can't use wait_event_interruptible() due to > >vcpu_put/vcpu_load. > > > > > > schedule() will call vcpu_put()/vcpu_load() for us through preempt > notifiers. I feel a little uneasy about it, but no concreate reason why > not to rely on it. The preempt notifiers hook call kvm_arch_vcpu_load / kvm_arch_vcpu_put, which won't unlock the vcpu mutex, right? I worry about a possible deadlock where some other operation that requires the vcpu mutex happens but the vcpu thread itself is in hlt. |
From: Avi K. <av...@qu...> - 2008-05-09 15:10:20
|
Marcelo Tosatti wrote: > On Fri, May 09, 2008 at 10:40:47AM +0300, Avi Kivity wrote: > > >>> Unfortunately it can't use wait_event_interruptible() due to >>> vcpu_put/vcpu_load. >>> >>> >>> >> schedule() will call vcpu_put()/vcpu_load() for us through preempt >> notifiers. I feel a little uneasy about it, but no concreate reason why >> not to rely on it. >> > > The preempt notifiers hook call kvm_arch_vcpu_load / kvm_arch_vcpu_put, > which won't unlock the vcpu mutex, right? > > Yes. > I worry about a possible deadlock where some other operation that > requires the vcpu mutex happens but the vcpu thread itself is in hlt. > Suppose the guest executed a busy-spin waiting for an interrupt instead of a hlt? We need to be able to handle that too. The best practice is to issue all vcpu ioctls from the thread that created the vcpu; this becomes mandatory if we ever switch to a syscall interface and remove the mutex. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. |
From: Marcelo T. <mto...@re...> - 2008-05-09 19:18:47
|
On Fri, May 09, 2008 at 06:09:41PM +0300, Avi Kivity wrote: > Marcelo Tosatti wrote: > >On Fri, May 09, 2008 at 10:40:47AM +0300, Avi Kivity wrote: > > > > > >>>Unfortunately it can't use wait_event_interruptible() due to > >>>vcpu_put/vcpu_load. > >>> > >>> > >>> > >>schedule() will call vcpu_put()/vcpu_load() for us through preempt > >>notifiers. I feel a little uneasy about it, but no concreate reason why > >>not to rely on it. > >> > > > >The preempt notifiers hook call kvm_arch_vcpu_load / kvm_arch_vcpu_put, > >which won't unlock the vcpu mutex, right? > > > > > > Yes. > > >I worry about a possible deadlock where some other operation that > >requires the vcpu mutex happens but the vcpu thread itself is in hlt. > > > > Suppose the guest executed a busy-spin waiting for an interrupt instead > of a hlt? We need to be able to handle that too. True. > The best practice is to issue all vcpu ioctls from the thread that > created the vcpu; this becomes mandatory if we ever switch to a syscall > interface and remove the mutex. For things like register dumps I don't believe its worthwhile. Much simpler to stop the vcpu with SIG_IPI, retrieve registers, and run it again (now that you mention the busy-spin, it is broken right now, if a vcpu is spinning without exiting to userspace). So do you want to give wait_event_interruptible() a try or wait for that change until userspace never issues vcpu ioctl's to a possibly busy vcpu (and go with the patch above)? |
From: Marcelo T. <mto...@re...> - 2008-05-09 19:24:25
|
On Fri, May 09, 2008 at 04:22:08PM -0300, Marcelo Tosatti wrote: > For things like register dumps I don't believe its worthwhile. Much > simpler to stop the vcpu with SIG_IPI, retrieve registers, and run it > again (now that you mention the busy-spin, it is broken right now, if a > vcpu is spinning without exiting to userspace). ... which is what Jan's gdb/monitor patch does. |
From: Avi K. <av...@qu...> - 2008-05-11 14:24:36
|
Marcelo Tosatti wrote: > On Fri, May 09, 2008 at 04:22:08PM -0300, Marcelo Tosatti wrote: > >> For things like register dumps I don't believe its worthwhile. Much >> simpler to stop the vcpu with SIG_IPI, retrieve registers, and run it >> again (now that you mention the busy-spin, it is broken right now, if a >> vcpu is spinning without exiting to userspace). >> > > ... which is what Jan's gdb/monitor patch does. > We'd better change it. Suppose your vcpu is spinning, and you want to find out why? One nice thing would be to add an on_vcpu(void (*func)(void *data), void *data), so that people don't have to open-code it for things like this. I tried to do this once, but backed off due to the mess that qemu-kvm threading was at the time. I think it's much better off now. -- error compiling committee.c: too many arguments to function |
From: Avi K. <av...@qu...> - 2008-05-11 14:26:04
|
Marcelo Tosatti wrote: >> The best practice is to issue all vcpu ioctls from the thread that >> created the vcpu; this becomes mandatory if we ever switch to a syscall >> interface and remove the mutex. >> > > For things like register dumps I don't believe its worthwhile. Much > simpler to stop the vcpu with SIG_IPI, retrieve registers, and run it > again (now that you mention the busy-spin, it is broken right now, if a > vcpu is spinning without exiting to userspace). > > So do you want to give wait_event_interruptible() a try or wait for that > change until userspace never issues vcpu ioctl's to a possibly busy vcpu > (and go with the patch above)? > Do we have anything critical that issues vcpu ioctls outside its thread? While I much prefer wait_event_interruptible(), I don't want to break existing userspace. -- error compiling committee.c: too many arguments to function |
From: Marcelo T. <mto...@re...> - 2008-05-14 05:17:50
|
On Sun, May 11, 2008 at 05:26:06PM +0300, Avi Kivity wrote: > >So do you want to give wait_event_interruptible() a try or wait for that > >change until userspace never issues vcpu ioctl's to a possibly busy vcpu > >(and go with the patch above)? > > > > Do we have anything critical that issues vcpu ioctls outside its > thread? While I much prefer wait_event_interruptible(), I don't want to > break existing userspace. Well debugging can be critical, so IMO better avoid wait_event_interruptible() for now. |
From: Avi K. <av...@qu...> - 2008-05-14 07:55:18
|
Marcelo Tosatti wrote: > On Sun, May 11, 2008 at 05:26:06PM +0300, Avi Kivity wrote: > >>> So do you want to give wait_event_interruptible() a try or wait for that >>> change until userspace never issues vcpu ioctl's to a possibly busy vcpu >>> (and go with the patch above)? >>> >>> >> Do we have anything critical that issues vcpu ioctls outside its >> thread? While I much prefer wait_event_interruptible(), I don't want to >> break existing userspace. >> > > Well debugging can be critical, so IMO better avoid wait_event_interruptible() > for now. > The vast majority of users don't care about debugging, and debugging will be broken anyway if a vcpu is spinning (which might be the reason for debugging in the first place). But the w_e_i() conversion can be done later, so I'll apply the patch. -- error compiling committee.c: too many arguments to function |