From: Avi K. <av...@qu...> - 2007-11-08 14:38:57
Anthony Liguori wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>> This is a PCI device that implements a transport for virtio.  It
>>> allows virtio devices to be used by QEMU based VMMs like KVM or Xen.
>>
>> Didn't see support for dma.
>
> Not sure what you're expecting there.  Using dma_ops in virtio_ring?

If a pci device is capable of dma (or issuing interrupts), it will be
useless with pv pci.

>> I think that with Amit's pvdma patches you can support dma-capable
>> devices as well without too much fuss.
>
> What is the use case you're thinking of?  A semi-paravirt driver that
> does dma directly to a device?

No, an unmodified driver that, by using clever tricks with dma_ops, can
do dma directly to guest memory.  See Amit's patches.

In fact, why do a virtio transport at all?  It can be done either with
trap'n'emulate, or by directly mapping the device mmio space into the
guest.

(what use case are you considering?  devices without interrupts and
dma?  pci door stoppers?)

-- 
error compiling committee.c: too many arguments to function
From: Anthony L. <ali...@us...> - 2007-11-08 15:06:12
Avi Kivity wrote:
> If a pci device is capable of dma (or issuing interrupts), it will be
> useless with pv pci.

Hrm, I think we may be talking about different things.  Are you
thinking that the driver I posted allows you to do PCI pass-through
over virtio?  That's not what it is.

The driver I posted is a virtio implementation that uses a PCI device.
This lets you use virtio-blk and virtio-net under KVM.  The alternative
to this virtio PCI device would be a virtio transport built with
hypercalls like lguest has.  I chose a PCI device because it ensured
that each virtio device showed up like a normal PCI device.

Am I misunderstanding what you're asking about?

Regards,

Anthony Liguori
From: Avi K. <av...@qu...> - 2007-11-08 15:14:32
Anthony Liguori wrote:
> Avi Kivity wrote:
>> If a pci device is capable of dma (or issuing interrupts), it will
>> be useless with pv pci.
>
> Hrm, I think we may be talking about different things.  Are you
> thinking that the driver I posted allows you to do PCI pass-through
> over virtio?  That's not what it is.
>
> The driver I posted is a virtio implementation that uses a PCI
> device.  This lets you use virtio-blk and virtio-net under KVM.  The
> alternative to this virtio PCI device would be a virtio transport
> built with hypercalls like lguest has.  I chose a PCI device because
> it ensured that each virtio device showed up like a normal PCI
> device.
>
> Am I misunderstanding what you're asking about?

No, I completely misunderstood the patch.  Should review complete
patches rather than random hunks.  Sorry for the noise.

-- 
error compiling committee.c: too many arguments to function
From: Dor L. <dor...@gm...> - 2007-11-09 00:39:20
Anthony Liguori wrote:
> This is a PCI device that implements a transport for virtio.  It
> allows virtio devices to be used by QEMU based VMMs like KVM or Xen.
>
> ....

While it's a little premature, we can start thinking of irq path
improvements.

The current patch acks a private isr, and afterwards the apic eoi will
also be hit since it's a level-triggered irq.  This means 2 vmexits per
irq.  We can start with regular pci irqs and move afterwards to msi.

Some other ugly hack options [we're better off using msi]:
 - Read the eoi directly from the apic and save the first private isr ack
 - Convert the specific irq line to edge-triggered and don't share it

What do you guys think?

> +/* A small wrapper to also acknowledge the interrupt when it's handled.
> + * I really need an EOI hook for the vring so I can ack the interrupt
> + * once we know that we'll be handling the IRQ but before we invoke the
> + * callback since the callback may notify the host which results in the
> + * host attempting to raise an interrupt that we would then mask once
> + * we acknowledged the interrupt. */
> +static irqreturn_t vp_interrupt(int irq, void *opaque)
> +{
> +	struct virtio_pci_device *vp_dev = opaque;
> +	struct virtio_pci_vq_info *info;
> +	irqreturn_t ret = IRQ_NONE;
> +	u8 isr;
> +
> +	/* reading the ISR has the effect of also clearing it so it's very
> +	 * important to save off the value. */
> +	isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
> +
> +	/* It's definitely not us if the ISR was not high */
> +	if (!isr)
> +		return IRQ_NONE;
> +
> +	spin_lock(&vp_dev->lock);
> +	list_for_each_entry(info, &vp_dev->virtqueues, node) {
> +		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
> +			ret = IRQ_HANDLED;
> +	}
> +	spin_unlock(&vp_dev->lock);
> +
> +	return ret;
> +}
From: Anthony L. <ali...@us...> - 2007-11-09 02:17:18
Dor Laor wrote:
> While it's a little premature, we can start thinking of irq path
> improvements.  The current patch acks a private isr and afterwards
> apic eoi will also be hit since it's a level-triggered irq.  This
> means 2 vmexits per irq.  We can start with regular pci irqs and move
> afterwards to msi.  Some other ugly hack options [we're better off
> using msi]:
> - Read the eoi directly from apic and save the first private isr ack

I must admit that I don't know a whole lot about interrupt delivery.
If we can avoid the private ISR ack then that would certainly be a
good thing to do!  I think that would involve adding another bit to
the virtqueues to indicate whether or not there is work to be handled.
It's really just moving the ISR to shared memory so that there's no
penalty for accessing it.

Regards,

Anthony Liguori

> - Convert the specific irq line to edge-triggered and don't share it
> What do you guys think?
From: Arnd B. <ar...@ar...> - 2007-11-08 17:46:59
On Thursday 08 November 2007, Anthony Liguori wrote:
> +/* A PCI device has it's own struct device and so does a virtio device so
> + * we create a place for the virtio devices to show up in sysfs.  I think it
> + * would make more sense for virtio to not insist on having it's own device. */
> +static struct device virtio_pci_root = {
> +	.parent		= NULL,
> +	.bus_id		= "virtio-pci",
> +};
> +
> +/* Unique numbering for devices under the kvm root */
> +static unsigned int dev_index;
> +

...

> +/* the PCI probing function */
> +static int __devinit virtio_pci_probe(struct pci_dev *pci_dev,
> +				      const struct pci_device_id *id)
> +{
> +	struct virtio_pci_device *vp_dev;
> +	int err;
> +
> +	/* allocate our structure and fill it out */
> +	vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
> +	if (vp_dev == NULL)
> +		return -ENOMEM;
> +
> +	vp_dev->pci_dev = pci_dev;
> +	vp_dev->vdev.dev.parent = &virtio_pci_root;

If you use

	vp_dev->vdev.dev.parent = &pci_dev->dev;

then there is no need for the special kvm root device, and the actual
virtio device shows up in a more logical place, under where it is
really (virtually) attached.

	Arnd <><
From: Anthony L. <ali...@us...> - 2007-11-08 19:04:41
Arnd Bergmann wrote:
> On Thursday 08 November 2007, Anthony Liguori wrote:
>> +/* A PCI device has it's own struct device and so does a virtio device so
>> + * we create a place for the virtio devices to show up in sysfs.  I think it
>> + * would make more sense for virtio to not insist on having it's own device. */
>> +static struct device virtio_pci_root = {
>> +	.parent		= NULL,
>> +	.bus_id		= "virtio-pci",
>> +};
>> +
>> +/* Unique numbering for devices under the kvm root */
>> +static unsigned int dev_index;
>> +
>
> ...
>
>> +/* the PCI probing function */
>> +static int __devinit virtio_pci_probe(struct pci_dev *pci_dev,
>> +				      const struct pci_device_id *id)
>> +{
>> +	struct virtio_pci_device *vp_dev;
>> +	int err;
>> +
>> +	/* allocate our structure and fill it out */
>> +	vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
>> +	if (vp_dev == NULL)
>> +		return -ENOMEM;
>> +
>> +	vp_dev->pci_dev = pci_dev;
>> +	vp_dev->vdev.dev.parent = &virtio_pci_root;
>
> If you use
>
> 	vp_dev->vdev.dev.parent = &pci_dev->dev;
>
> then there is no need for the special kvm root device, and the actual
> virtio device shows up in a more logical place, under where it is
> really (virtually) attached.

They already show up underneath of the PCI bus.  The issue is that
there are two separate 'struct device's for each virtio device.
There's the PCI device (that's part of the pci_dev structure) and then
there's the virtio_device one.  I thought that setting the dev.parent
of the virtio_device struct device would result in having two separate
entries under the PCI bus directory, which would be pretty confusing :-)

Regards,

Anthony Liguori
From: Arnd B. <ar...@ar...> - 2007-11-09 11:04:12
On Thursday 08 November 2007, Anthony Liguori wrote:
> They already show up underneath of the PCI bus.  The issue is that
> there are two separate 'struct device's for each virtio device.
> There's the PCI device (that's part of the pci_dev structure) and
> then there's the virtio_device one.  I thought that setting the
> dev.parent of the virtio_device struct device would result in having
> two separate entries under the PCI bus directory which would be
> pretty confusing

But that's what a device tree means.  Think about a USB disk drive:
the drive shows up as a child of the USB controller, which in turn is
a child of the PCI bridge.  Note that I did not suggest having the
virtio parent set to the parent of the PCI device, but to the PCI
device itself.  I find it more confusing to have a device just hanging
off the root when it is actually handled by the PCI subsystem.

	Arnd <><
From: Zachary A. <za...@vm...> - 2007-11-21 18:19:29
On Wed, 2007-11-21 at 09:13 +0200, Avi Kivity wrote:
> Where the device is implemented is an implementation detail that
> should be hidden from the guest, isn't that one of the strengths of
> virtualization?  Two examples: a file-based block device implemented
> in qemu gives you fancy file formats with encryption and compression,
> while the same device implemented in the kernel gives you a
> low-overhead path directly to a zillion-disk SAN volume.  Or a
> user-level network device capable of running with the slirp stack and
> no permissions vs. the kernel device running copyless most of the
> time and using a dma engine for the rest but requiring you to be good
> friends with the admin.
>
> The user should expect zero reconfigurations moving a VM from one
> model to the other.

I think that is pretty insightful, and indeed, is probably the only
reason we would ever consider using a virtio based driver.

But is this really a virtualization problem, and is virtio the right
place to solve it?  Doesn't I/O hotplug with multipathing or NIC
teaming provide the same infrastructure in a way that is useful in
more than just a virtualization context?

Zach
From: Avi K. <av...@qu...> - 2007-11-22 07:31:33
Zachary Amsden wrote:
> On Wed, 2007-11-21 at 09:13 +0200, Avi Kivity wrote:
>> Where the device is implemented is an implementation detail that
>> should be hidden from the guest, isn't that one of the strengths of
>> virtualization?  Two examples: a file-based block device implemented
>> in qemu gives you fancy file formats with encryption and
>> compression, while the same device implemented in the kernel gives
>> you a low-overhead path directly to a zillion-disk SAN volume.  Or a
>> user-level network device capable of running with the slirp stack
>> and no permissions vs. the kernel device running copyless most of
>> the time and using a dma engine for the rest but requiring you to be
>> good friends with the admin.
>>
>> The user should expect zero reconfigurations moving a VM from one
>> model to the other.
>
> I think that is pretty insightful, and indeed, is probably the only
> reason we would ever consider using a virtio based driver.
>
> But is this really a virtualization problem, and is virtio the right
> place to solve it?  Doesn't I/O hotplug with multipathing or NIC
> teaming provide the same infrastructure in a way that is useful in
> more than just a virtualization context?

With the aid of a dictionary I was able to understand about half the
words in the last sentence.

Moving from device to device using hotplug+multipath is complex to
configure, available on only some guests, uses rarely-exercised paths
in the guest OS, and only works for a few types of devices (network
and block).

Having host independence in the device means you can change the device
implementation for, say, a display driver (consider, for example, a
vmgl+virtio driver, which can be implemented in userspace or tunneled
via virtio-over-tcp to some remote display without going through
userspace, without the guest knowing about it).

-- 
error compiling committee.c: too many arguments to function