From: Anthony L. <ali...@us...> - 2007-08-27 16:02:50
|
I've never really thought much about them until now. What's the case for supporting userspace hypercalls?

The current way the code works is a little scary. Hypercalls that aren't handled by kernelspace are deferred to userspace. Of course, kernelspace has no idea whether userspace is actually using a given hypercall, so if kernelspace needs another one, the two may clash.

AFAICT, the primary reason to use hypercalls is performance. A vmcall is a few hundred cycles faster than a PIO exit. In the light-weight exit path, this may make a significant difference. However, when going to userspace, it's not only a heavy-weight exit but it's also paying the cost of a ring transition. The few hundred cycle savings is small in comparison to the total cost, so I don't think performance is a real benefit here.

The hypercall namespace is much smaller than the PIO namespace, and there's no "plug-and-play"-like mechanism to resolve conflicts. PIO/MMIO has this via PCI, and it seems like any userspace device ought to be either a PCI device or use a static PIO port. Plus, paravirtual devices that use PCI/PIO/MMIO are much more likely to be reusable by other VMMs (Xen, QEMU, even VMware).

In the future, if we decide a certain hypercall could be done better in userspace, and we have guests using those hypercalls, it makes sense to plumb the hypercalls down.

My question is, should we support userspace hypercalls until that point?

Regards,

Anthony Liguori
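[For illustration, a minimal sketch of the two guest-side exit mechanisms being compared: a vmcall hypercall versus a trapped port write. The hypercall number and port are hypothetical, and real guest code would choose vmcall vs. vmmcall by CPU vendor.]

    #include <stdint.h>

    #define HC_NOTIFY   42      /* hypothetical hypercall number */
    #define NOTIFY_PORT 0x1234  /* hypothetical PIO port */

    /* Trap to the hypervisor via vmcall (Intel VT; AMD SVM uses vmmcall).
     * The instruction itself exits a few hundred cycles faster than PIO. */
    static inline long hypercall_notify(unsigned long arg)
    {
            long ret;
            asm volatile("vmcall"
                         : "=a"(ret)
                         : "a"((long)HC_NOTIFY), "b"(arg)
                         : "memory");
            return ret;
    }

    /* Trap to the hypervisor via a port write; the I/O access is what exits. */
    static inline void pio_notify(uint32_t val)
    {
            asm volatile("outl %0, %1"
                         : : "a"(val), "Nd"((uint16_t)NOTIFY_PORT));
    }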
From: Avi K. <av...@qu...> - 2007-08-27 16:19:19
|
Anthony Liguori wrote:
> I've never really thought much about them until now. What's the case for supporting userspace hypercalls?
>
> The current way the code works is a little scary. Hypercalls that aren't handled by kernelspace are deferred to userspace. Of course, kernelspace has no idea whether userspace is actually using a given hypercall, so if kernelspace needs another one, the two may clash.
>
> AFAICT, the primary reason to use hypercalls is performance. A vmcall is a few hundred cycles faster than a PIO exit. In the light-weight exit path, this may make a significant difference. However, when going to userspace, it's not only a heavy-weight exit but it's also paying the cost of a ring transition. The few hundred cycle savings is small in comparison to the total cost, so I don't think performance is a real benefit here.

Actually the heavyweight exit is much more expensive than the ring transition.

> The hypercall namespace is much smaller than the PIO namespace, and there's no "plug-and-play"-like mechanism to resolve conflicts. PIO/MMIO has this via PCI, and it seems like any userspace device ought to be either a PCI device or use a static PIO port. Plus, paravirtual devices that use PCI/PIO/MMIO are much more likely to be reusable by other VMMs (Xen, QEMU, even VMware).
>
> In the future, if we decide a certain hypercall could be done better in userspace, and we have guests using those hypercalls, it makes sense to plumb the hypercalls down.
>
> My question is, should we support userspace hypercalls until that point?

I've already mentioned this but I'll repeat it for google: allowing hypercalls to fall back to userspace gives you flexibility to have either a kernel implementation or a userspace implementation of the same functionality. This means a pvnet driver can be used either directly with a virtual interface on the host, or with some userspace processing in qemu. Similarly, pvblock can be processed in the kernel for real block devices, or in userspace for qcow format files, without the need to teach the kernel about the qcow format somehow.

Dor's initial pv devices are implemented in qemu with a view to having a faster implementation in the kernel, so userspace hypercalls are on the table now.

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Avi K. <av...@qu...> - 2007-08-27 16:21:24
|
Avi Kivity wrote:
> This means a pvnet driver can be used either directly with a virtual interface on the host, or with some userspace processing in qemu.

Also, 'qemu -net user -net model=pv' if you're unlucky enough not to have root.

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Avi K. <av...@qu...> - 2007-08-27 17:53:40
|
Anthony Liguori wrote:
> On Mon, 2007-08-27 at 20:36 +0300, Avi Kivity wrote:
>> Anthony Liguori wrote:
>>> On Mon, 2007-08-27 at 19:47 +0300, Avi Kivity wrote:
>>>> Avi Kivity wrote:
>>>>> Thinking a little more about this, it isn't about handling hypercalls in userspace, but about handling a virtio sync() in userspace.
>>>>>
>>>>> So how about having a KVM_HC_WAKE_CHANNEL hypercall (similar to Xen's event channel, but asymmetric) that has a channel parameter. The kernel handler for that hypercall dispatches calls to either a kernel handler or a userspace handler. That means we don't need separate ETH_SEND, ETH_RECEIVE, or BLOCK_SEND hypercalls.
>>>>
>>>> And thinking a tiny little bit more about this, we can have the kernel (optionally) fire an eventfd, so a separate userspace thread or process can be woken up to service the device, without a heavyweight exit.
>>>
>>> Yes, I think this is much nicer. By "calls to ... a userspace handler" I presume you mean generating an exit to userspace with a new exit type, similar to how hypercalls work today?
>>
>> There are two options:
>> - hypercall handler sets some fields in vcpu->run and exits to userspace
>> - hypercall handler triggers an eventfd and returns to guest
>>
>> Maybe we can unify the two by only allowing eventfd;
>
> Yes, that would be better except that the latency may be unacceptable.

Hmm. Good point. I keep saying kvm can have great I/O because the scheduler is not involved in ordinary I/O. Let's not break that.

>> userspace can attach a signal to the eventfd if it wants a synchronous exit (does eventfd allow fcntl(F_SETOWN)?)
>
> Which would address the latency issue nicely. Looking at the fs code, it looks like eventfd shouldn't have to do anything special for it.

I'm not sure now. Which thread will be selected for accepting the signal? If it isn't guaranteed to be the current thread, we're back with scheduler involvement, and possibly cacheline bouncing.

--
Any sufficiently difficult bug is indistinguishable from a feature.
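[A sketch of what the F_SETOWN idea under discussion would look like from userspace, assuming eventfd honored O_ASYNC at all; whether it does, and which thread receives the signal, is exactly the open question in this thread. All names beyond the standard APIs are illustrative.]

    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/eventfd.h>   /* eventfd() wrapper; older glibc needs syscall() */
    #include <unistd.h>

    static void sigio_handler(int sig)
    {
            /* Synchronous-ish notification: service the device here,
             * or flag the vcpu loop to exit. */
    }

    int main(void)
    {
            int efd = eventfd(0, 0);
            if (efd < 0) { perror("eventfd"); return 1; }

            signal(SIGIO, sigio_handler);

            /* Ask the kernel to deliver SIGIO when efd becomes readable.
             * Whether eventfd implements async notification, and which
             * thread the signal lands on, is the unresolved part. */
            if (fcntl(efd, F_SETOWN, getpid()) < 0)
                    perror("F_SETOWN");
            if (fcntl(efd, F_SETFL, fcntl(efd, F_GETFL) | O_ASYNC) < 0)
                    perror("O_ASYNC");

            /* ... hand efd to kvm so the hypercall handler can signal it ... */
            return 0;
    }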
From: Anthony L. <ali...@us...> - 2007-08-27 18:07:46
|
On Mon, 2007-08-27 at 20:51 +0300, Avi Kivity wrote:
> Anthony Liguori wrote:
>> On Mon, 2007-08-27 at 20:36 +0300, Avi Kivity wrote:
>>
>> Yes, that would be better except that the latency may be unacceptable.
>
> Hmm. Good point. I keep saying kvm can have great I/O because the scheduler is not involved in ordinary I/O. Let's not break that.

Most definitely!

>>> userspace can attach a signal to the eventfd if it wants a synchronous exit (does eventfd allow fcntl(F_SETOWN)?)
>>
>> Which would address the latency issue nicely. Looking at the fs code, it looks like eventfd shouldn't have to do anything special for it.
>
> I'm not sure now. Which thread will be selected for accepting the signal? If it isn't guaranteed to be the current thread, we're back with scheduler involvement, and possibly cacheline bouncing.

I don't know enough about this in the kernel, but I agree in principle: we need to be able to guarantee that the current thread receives the signal, or we have to go back to doing an exit.

Regards,

Anthony Liguori
From: Dor L. <dor...@qu...> - 2007-08-29 06:59:13
|
>>> I've never really thought much about them until now. What's the case for supporting userspace hypercalls?
>>>
>>> The current way the code works is a little scary. Hypercalls that aren't handled by kernelspace are deferred to userspace. Of course, kernelspace has no idea whether userspace is actually using a given hypercall, so if kernelspace needs another one, the two may clash.
>>>
>>> AFAICT, the primary reason to use hypercalls is performance. A vmcall is a few hundred cycles faster than a PIO exit. In the light-weight exit path, this may make a significant difference. However, when going to userspace, it's not only a heavy-weight exit but it's also paying the cost of a ring transition. The few hundred cycle savings is small in comparison to the total cost, so I don't think performance is a real benefit here.
>>
>> Actually the heavyweight exit is much more expensive than the ring transition.
>>
>>> The hypercall namespace is much smaller than the PIO namespace, and there's no "plug-and-play"-like mechanism to resolve conflicts. PIO/MMIO has this via PCI, and it seems like any userspace device ought to be either a PCI device or use a static PIO port. Plus, paravirtual devices that use PCI/PIO/MMIO are much more likely to be reusable by other VMMs (Xen, QEMU, even VMware).
>>>
>>> In the future, if we decide a certain hypercall could be done better in userspace, and we have guests using those hypercalls, it makes sense to plumb the hypercalls down.
>>>
>>> My question is, should we support userspace hypercalls until that point?
>>
>> I've already mentioned this but I'll repeat it for google: allowing hypercalls to fall back to userspace gives you flexibility to have either a kernel implementation or a userspace implementation of the same functionality. This means a pvnet driver can be used either directly with a virtual interface on the host, or with some userspace processing in qemu. Similarly, pvblock can be processed in the kernel for real block devices, or in userspace for qcow format files, without the need to teach the kernel about the qcow format somehow.
>>
>> Dor's initial pv devices are implemented in qemu with a view to having a faster implementation in the kernel, so userspace hypercalls are on the table now.
>
> Thinking a little more about this, it isn't about handling hypercalls in userspace, but about handling a virtio sync() in userspace.
>
> So how about having a KVM_HC_WAKE_CHANNEL hypercall (similar to Xen's event channel, but asymmetric) that has a channel parameter. The kernel handler for that hypercall dispatches calls to either a kernel handler or a userspace handler. That means we don't need separate ETH_SEND, ETH_RECEIVE, or BLOCK_SEND hypercalls.

Some points:

- There were no receive/send/block_send hypercalls in the first place; there were just register and notify hypercalls.
- The balloon code also uses hypercalls and lets userspace handle them, so a higher layer can control the guest's inflate/deflate actions.
- The good thing about using hypercalls rather than PIO is that they're CPU architecture agnostic.
- It's also more complex to assign an I/O range for a driver inside the guest (not that complex, but harder than issuing a simple hypercall).

Regards,
Dor.
From: Avi K. <av...@qu...> - 2007-08-29 21:26:37
|
Dor Laor wrote:
>> Thinking a little more about this, it isn't about handling hypercalls in userspace, but about handling a virtio sync() in userspace.
>>
>> So how about having a KVM_HC_WAKE_CHANNEL hypercall (similar to Xen's event channel, but asymmetric) that has a channel parameter. The kernel handler for that hypercall dispatches calls to either a kernel handler or a userspace handler. That means we don't need separate ETH_SEND, ETH_RECEIVE, or BLOCK_SEND hypercalls.
>
> Some points:
> - There were no receive/send/block_send hypercalls in the first place; there were just register and notify hypercalls.

But they were ethernet/block/whatever specific. I'm proposing a single "wake this channel up" hypercall.

> - The balloon code also uses hypercalls and lets userspace handle them, so a higher layer can control the guest's inflate/deflate actions.

That could be ported to virtio. It is actually advantageous to balloon asynchronously.

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Avi K. <av...@qu...> - 2007-08-27 16:35:56
|
Avi Kivity wrote:
> Anthony Liguori wrote:
>> I've never really thought much about them until now. What's the case for supporting userspace hypercalls?
>>
>> The current way the code works is a little scary. Hypercalls that aren't handled by kernelspace are deferred to userspace. Of course, kernelspace has no idea whether userspace is actually using a given hypercall, so if kernelspace needs another one, the two may clash.
>>
>> AFAICT, the primary reason to use hypercalls is performance. A vmcall is a few hundred cycles faster than a PIO exit. In the light-weight exit path, this may make a significant difference. However, when going to userspace, it's not only a heavy-weight exit but it's also paying the cost of a ring transition. The few hundred cycle savings is small in comparison to the total cost, so I don't think performance is a real benefit here.
>
> Actually the heavyweight exit is much more expensive than the ring transition.
>
>> The hypercall namespace is much smaller than the PIO namespace, and there's no "plug-and-play"-like mechanism to resolve conflicts. PIO/MMIO has this via PCI, and it seems like any userspace device ought to be either a PCI device or use a static PIO port. Plus, paravirtual devices that use PCI/PIO/MMIO are much more likely to be reusable by other VMMs (Xen, QEMU, even VMware).
>>
>> In the future, if we decide a certain hypercall could be done better in userspace, and we have guests using those hypercalls, it makes sense to plumb the hypercalls down.
>>
>> My question is, should we support userspace hypercalls until that point?
>
> I've already mentioned this but I'll repeat it for google: allowing hypercalls to fall back to userspace gives you flexibility to have either a kernel implementation or a userspace implementation of the same functionality. This means a pvnet driver can be used either directly with a virtual interface on the host, or with some userspace processing in qemu. Similarly, pvblock can be processed in the kernel for real block devices, or in userspace for qcow format files, without the need to teach the kernel about the qcow format somehow.
>
> Dor's initial pv devices are implemented in qemu with a view to having a faster implementation in the kernel, so userspace hypercalls are on the table now.

Thinking a little more about this, it isn't about handling hypercalls in userspace, but about handling a virtio sync() in userspace.

So how about having a KVM_HC_WAKE_CHANNEL hypercall (similar to Xen's event channel, but asymmetric) that has a channel parameter. The kernel handler for that hypercall dispatches calls to either a kernel handler or a userspace handler. That means we don't need separate ETH_SEND, ETH_RECEIVE, or BLOCK_SEND hypercalls.

--
Any sufficiently difficult bug is indistinguishable from a feature.
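[A rough sketch of the proposed dispatch, in kvm-style kernel C. Everything here is hypothetical, since no channel mechanism existed in the tree at the time; struct kvm_vcpu, vcpu->run, and KVM_EXIT_HYPERCALL are kvm's, the rest is invented.]

    #include <linux/errno.h>
    #include <linux/kvm_host.h>  /* struct kvm_vcpu; in-tree location has moved over time */

    #define KVM_MAX_CHANNELS 256

    struct kvm_channel {
            /* Non-NULL if the channel is serviced in the kernel. */
            int (*kernel_handler)(struct kvm_vcpu *vcpu, unsigned long chan);
            bool in_use;
    };

    static struct kvm_channel channels[KVM_MAX_CHANNELS];

    static int kvm_hc_wake_channel(struct kvm_vcpu *vcpu, unsigned long chan)
    {
            struct kvm_channel *c;

            if (chan >= KVM_MAX_CHANNELS)
                    return -EINVAL;

            c = &channels[chan];
            if (c->in_use && c->kernel_handler)
                    /* Fast path: service the channel without leaving the kernel. */
                    return c->kernel_handler(vcpu, chan);

            /* Slow path: punt to userspace.  Fill in vcpu->run and make
             * KVM_RUN return, so qemu can service the channel itself. */
            vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
            return 0;
    }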
From: Avi K. <av...@qu...> - 2007-08-27 16:49:45
|
Avi Kivity wrote:
> Thinking a little more about this, it isn't about handling hypercalls in userspace, but about handling a virtio sync() in userspace.
>
> So how about having a KVM_HC_WAKE_CHANNEL hypercall (similar to Xen's event channel, but asymmetric) that has a channel parameter. The kernel handler for that hypercall dispatches calls to either a kernel handler or a userspace handler. That means we don't need separate ETH_SEND, ETH_RECEIVE, or BLOCK_SEND hypercalls.

And thinking a tiny little bit more about this, we can have the kernel (optionally) fire an eventfd, so a separate userspace thread or process can be woken up to service the device, without a heavyweight exit.

--
Any sufficiently difficult bug is indistinguishable from a feature.
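[The optional eventfd variant might look roughly like this on the kernel side; the names are again invented. eventfd_signal() is real, though its exact signature has varied across kernel versions.]

    #include <linux/errno.h>
    #include <linux/eventfd.h>

    /* Assume the hypothetical per-channel state grew an eventfd context
     * that userspace registered for this channel. */
    struct kvm_channel_async {
            struct eventfd_ctx *eventfd;    /* NULL if nothing registered */
    };

    static int kvm_wake_channel_async(struct kvm_channel_async *c)
    {
            if (!c->eventfd)
                    return -EINVAL;
            /* Bump the eventfd counter and wake any waiters; the vcpu
             * keeps running while an I/O thread services the device. */
            eventfd_signal(c->eventfd, 1);
            return 0;
    }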
From: Anthony L. <ali...@us...> - 2007-08-27 17:32:28
|
On Mon, 2007-08-27 at 19:47 +0300, Avi Kivity wrote:
> Avi Kivity wrote:
>> Thinking a little more about this, it isn't about handling hypercalls in userspace, but about handling a virtio sync() in userspace.
>>
>> So how about having a KVM_HC_WAKE_CHANNEL hypercall (similar to Xen's event channel, but asymmetric) that has a channel parameter. The kernel handler for that hypercall dispatches calls to either a kernel handler or a userspace handler. That means we don't need separate ETH_SEND, ETH_RECEIVE, or BLOCK_SEND hypercalls.
>
> And thinking a tiny little bit more about this, we can have the kernel (optionally) fire an eventfd, so a separate userspace thread or process can be woken up to service the device, without a heavyweight exit.

Yes, I think this is much nicer. By "calls to ... a userspace handler" I presume you mean generating an exit to userspace with a new exit type, similar to how hypercalls work today?

Regards,

Anthony Liguori
From: Avi K. <av...@qu...> - 2007-08-27 17:39:13
|
Anthony Liguori wrote:
> On Mon, 2007-08-27 at 19:47 +0300, Avi Kivity wrote:
>> Avi Kivity wrote:
>>> Thinking a little more about this, it isn't about handling hypercalls in userspace, but about handling a virtio sync() in userspace.
>>>
>>> So how about having a KVM_HC_WAKE_CHANNEL hypercall (similar to Xen's event channel, but asymmetric) that has a channel parameter. The kernel handler for that hypercall dispatches calls to either a kernel handler or a userspace handler. That means we don't need separate ETH_SEND, ETH_RECEIVE, or BLOCK_SEND hypercalls.
>>
>> And thinking a tiny little bit more about this, we can have the kernel (optionally) fire an eventfd, so a separate userspace thread or process can be woken up to service the device, without a heavyweight exit.
>
> Yes, I think this is much nicer. By "calls to ... a userspace handler" I presume you mean generating an exit to userspace with a new exit type, similar to how hypercalls work today?

There are two options:
- hypercall handler sets some fields in vcpu->run and exits to userspace
- hypercall handler triggers an eventfd and returns to guest

Maybe we can unify the two by only allowing eventfd; userspace can attach a signal to the eventfd if it wants a synchronous exit (does eventfd allow fcntl(F_SETOWN)?)

--
Any sufficiently difficult bug is indistinguishable from a feature.
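[For the first option, the userspace half would be the familiar KVM_RUN loop growing one more exit reason. KVM_EXIT_HYPERCALL and kvm_run's hypercall field exist in the kvm ABI; handle_channel() is a made-up placeholder for the device work.]

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    void handle_channel(unsigned long chan);    /* placeholder device work */

    /* run points at the mmap'ed kvm_run area of this vcpu fd. */
    void vcpu_loop(int vcpu_fd, struct kvm_run *run)
    {
            for (;;) {
                    if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
                            break;
                    switch (run->exit_reason) {
                    case KVM_EXIT_HYPERCALL:
                            /* Synchronous path: service the channel, then
                             * re-enter the guest on the next iteration. */
                            handle_channel(run->hypercall.args[0]);
                            break;
                    /* ... KVM_EXIT_IO, KVM_EXIT_MMIO, etc. ... */
                    }
            }
    }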
From: Anthony L. <ali...@us...> - 2007-08-27 17:47:37
|
On Mon, 2007-08-27 at 20:36 +0300, Avi Kivity wrote:
> Anthony Liguori wrote:
>> On Mon, 2007-08-27 at 19:47 +0300, Avi Kivity wrote:
>>> Avi Kivity wrote:
>>>> Thinking a little more about this, it isn't about handling hypercalls in userspace, but about handling a virtio sync() in userspace.
>>>>
>>>> So how about having a KVM_HC_WAKE_CHANNEL hypercall (similar to Xen's event channel, but asymmetric) that has a channel parameter. The kernel handler for that hypercall dispatches calls to either a kernel handler or a userspace handler. That means we don't need separate ETH_SEND, ETH_RECEIVE, or BLOCK_SEND hypercalls.
>>>
>>> And thinking a tiny little bit more about this, we can have the kernel (optionally) fire an eventfd, so a separate userspace thread or process can be woken up to service the device, without a heavyweight exit.
>>
>> Yes, I think this is much nicer. By "calls to ... a userspace handler" I presume you mean generating an exit to userspace with a new exit type, similar to how hypercalls work today?
>
> There are two options:
> - hypercall handler sets some fields in vcpu->run and exits to userspace
> - hypercall handler triggers an eventfd and returns to guest
>
> Maybe we can unify the two by only allowing eventfd;

Yes, that would be better except that the latency may be unacceptable.

> userspace can attach a signal to the eventfd if it wants a synchronous exit (does eventfd allow fcntl(F_SETOWN)?)

Which would address the latency issue nicely. Looking at the fs code, it looks like eventfd shouldn't have to do anything special for it.

Regards,

Anthony Liguori
From: Luca <kro...@gm...> - 2007-08-27 19:58:24
|
On 8/27/07, Avi Kivity <av...@qu...> wrote:
> Anthony Liguori wrote:
>> On Mon, 2007-08-27 at 20:36 +0300, Avi Kivity wrote:
>>> userspace can attach a signal to the eventfd if it wants a synchronous exit (does eventfd allow fcntl(F_SETOWN)?)
>>
>> Which would address the latency issue nicely. Looking at the fs code, it looks like eventfd shouldn't have to do anything special for it.
>
> I'm not sure now. Which thread will be selected for accepting the signal?

It's not specified.

> if it isn't guaranteed to be the current thread, we're back with scheduler involvement, and possibly cacheline bouncing.

It's possible to use pthread_sigmask() to block the signal on all threads but one. But this would require changing the rest of the emulator; why not just select() the fd in a dedicated thread?

Luca
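[A sketch of the dedicated-thread approach suggested here: one I/O thread blocks in select() on the eventfd and services the device when it fires. service_device() is a placeholder; the fd would be the eventfd that kvm signals. The thread would be started with pthread_create(&tid, NULL, io_thread, &efd).]

    #include <stdint.h>
    #include <sys/select.h>
    #include <unistd.h>

    void service_device(void);   /* placeholder for the actual device work */

    static void *io_thread(void *arg)
    {
            int efd = *(int *)arg;
            fd_set rfds;
            uint64_t count;

            for (;;) {
                    FD_ZERO(&rfds);
                    FD_SET(efd, &rfds);
                    if (select(efd + 1, &rfds, NULL, NULL, NULL) <= 0)
                            continue;
                    /* Drain the eventfd counter so it can fire again. */
                    if (read(efd, &count, sizeof(count)) < 0)
                            continue;
                    service_device();
            }
            return NULL;
    }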
From: Avi K. <av...@qu...> - 2007-08-27 20:04:06
|
Luca wrote:
> On 8/27/07, Avi Kivity <av...@qu...> wrote:
>> Anthony Liguori wrote:
>>> On Mon, 2007-08-27 at 20:36 +0300, Avi Kivity wrote:
>>>> userspace can attach a signal to the eventfd if it wants a synchronous exit (does eventfd allow fcntl(F_SETOWN)?)
>>>
>>> Which would address the latency issue nicely. Looking at the fs code, it looks like eventfd shouldn't have to do anything special for it.
>>
>> I'm not sure now. Which thread will be selected for accepting the signal?
>
> It's not specified.

So that option's down.

>> if it isn't guaranteed to be the current thread, we're back with scheduler involvement, and possibly cacheline bouncing.
>
> It's possible to use pthread_sigmask() to block the signal on all threads but one. But this would require changing the rest of the emulator; why not just select() the fd in a dedicated thread?

When the guest issues that hypercall, it really wants the I/O to start. If it's a separate thread, the scheduler could choose to let the guest execute and keep the I/O thread waiting (CFS isn't likely to do that, but it's possible).

The scheduler could also choose to run the I/O thread on a different processor, and now all the data structures carefully loaded into cache by the guest need to be bounced to the processor running the I/O thread.

I think the best policy here is to cover our asses and allow userspace to choose which method it wants to use, and we should start with the guest exiting to userspace to avoid the costs I mentioned.

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Anthony L. <ali...@us...> - 2007-08-27 21:06:07
|
On Mon, 2007-08-27 at 23:01 +0300, Avi Kivity wrote:
> Luca wrote:
>>> if it isn't guaranteed to be the current thread, we're back with scheduler involvement, and possibly cacheline bouncing.
>>
>> It's possible to use pthread_sigmask() to block the signal on all threads but one. But this would require changing the rest of the emulator; why not just select() the fd in a dedicated thread?
>
> When the guest issues that hypercall, it really wants the I/O to start. If it's a separate thread, the scheduler could choose to let the guest execute and keep the I/O thread waiting (CFS isn't likely to do that, but it's possible).
>
> The scheduler could also choose to run the I/O thread on a different processor, and now all the data structures carefully loaded into cache by the guest need to be bounced to the processor running the I/O thread.
>
> I think the best policy here is to cover our asses and allow userspace to choose which method it wants to use, and we should start with the guest exiting to userspace to avoid the costs I mentioned.

I agree that we should start with an exit and then later add an eventfd mechanism if it is needed.

Regards,

Anthony Liguori