From: Anthony L. <ali...@us...> - 2007-11-11 03:20:25
|
This patch series implements QEMU support for PCI based virtio devices. The series also updates the kvm-userspace tree to have the ability to build virtio devices as modules from an external tree (just as we do with the kvm modules). There's still some notable TODO's left so it's not mergable in it's current state. I just wanted to get it out there for some feedback. Things seem to work pretty well right now. These patches depend on having a guest kernel using Rusty's virtio tree (and patches/1 branch). They also require my virtio patch queue. The latest version of these patches are available at: http://hg.codemonkey.ws/qemu-virtio And my patch queue against Rusty's tree is at: http://hg.codemonkey.ws/virtio-pci |
From: Anthony L. <ali...@us...> - 2007-11-11 03:20:25
|
This patch implements a very simple virtio transport for 9p in QEMU. The kernel side of this is going to change dramatically in the very near future so this is just a demonstration for the moment. It requires an external 9p file server and communicates with it over a TCP port. In future versions of this patch, we'll link directly against a 9p file server library and expose the ability to expose filesystems directly from the QEMU command line. Signed-off-by: Anthony Liguori <ali...@us...> diff --git a/qemu/Makefile.target b/qemu/Makefile.target index 49c0fc7..fedf1e0 100644 --- a/qemu/Makefile.target +++ b/qemu/Makefile.target @@ -449,7 +449,7 @@ VL_OBJS += rtl8139.o VL_OBJS+= hypercall.o # virtio devices -VL_OBJS += virtio.o virtio-blk.o +VL_OBJS += virtio.o virtio-blk.o virtio-9p.o ifeq ($(TARGET_BASE_ARCH), i386) # Hardware support diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c index c7483e8..fa0346b 100644 --- a/qemu/hw/pc.c +++ b/qemu/hw/pc.c @@ -951,6 +951,9 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size, int boot_device, virtio_blk_init(pci_bus, 0x5002, 0x2258, bs); } + if (1) + virtio_9p_init(pci_bus, 0x5002, 0x2258); + if (pci_enabled) { pci_piix3_ide_init(pci_bus, bs_table, piix3_devfn + 1, i8259); } else { diff --git a/qemu/hw/virtio-9p.c b/qemu/hw/virtio-9p.c new file mode 100644 index 0000000..c77b594 --- /dev/null +++ b/qemu/hw/virtio-9p.c @@ -0,0 +1,121 @@ +/* + * Virtio 9p backend + * + * Copyright IBM, Corp. 2007 + * + * Authors: + * Anthony Liguori <ali...@us...> + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#include "vl.h" +#include "qemu_socket.h" +#include "virtio.h" +#include <sys/uio.h> +#include <arpa/inet.h> + +/* from Linux's linux/virtio_9p.h */ + +/* The ID for virtio console */ +#define VIRTIO_ID_9P 9 +/* Maximum number of virtio channels per partition (1 for now) */ +#define MAX_9P_CHAN 1 + +typedef struct V9fsState +{ + VirtIODevice vdev; + + int fd; + int listening; + VirtQueue *in_vq; +} V9fsState; + +static V9fsState *to_v9fs_state(VirtIODevice *vdev) +{ + return (V9fsState *)vdev; +} + +static void virtio_9p_handle_output(VirtIODevice *vdev, VirtQueue *vq) +{ + V9fsState *s = to_v9fs_state(vdev); + VirtQueueElement elem; + ssize_t len; + + while ((len = virtqueue_pop(vq, &elem)) != 0) { + len = writev(s->fd, elem.out_sg, elem.out_num); + if (len == -1 && (errno == EINTR || errno == EAGAIN)) + continue; + + virtqueue_push(vq, &elem, len); + virtio_notify(vdev, vq); + } +} + +static void virtio_9p_kick_input(VirtIODevice *vdev, VirtQueue *vq) +{ + V9fsState *s = to_v9fs_state(vdev); + s->listening = 1; +} + +static int virtio_9p_can_read(void *opaque) +{ + V9fsState *s = opaque; + return s->listening; +} + +static void virtio_9p_handle_input(void *opaque) +{ + V9fsState *s = opaque; + VirtQueueElement elem; + ssize_t len; + + len = virtqueue_pop(s->in_vq, &elem); + if (!len) { + /* stop listening until more queue space frees up */ + s->listening = 0; + return; + } + +again: + len = readv(s->fd, elem.in_sg, elem.in_num); + if (len == -1 && (errno == EINTR || errno == EAGAIN)) + goto again; + + virtqueue_push(s->in_vq, &elem, len); + virtio_notify(&s->vdev, s->in_vq); +} + +VirtIODevice *virtio_9p_init(PCIBus *bus, uint16_t vendor, uint16_t device) +{ + V9fsState *s; + struct sockaddr_in addr; + struct in_addr in; + + s = (V9fsState *)virtio_init_pci(bus, "virtio-9p", vendor, device, + vendor, VIRTIO_ID_9P, + 0, sizeof(V9fsState)); + + s->fd = socket(PF_INET, SOCK_STREAM, 0); + if (s->fd == -1) + exit(1); + addr.sin_family = AF_INET; + addr.sin_port = htons(1025); + inet_aton("127.0.0.1", &in); + memcpy(&addr.sin_addr, &in, sizeof(in)); + + if (connect(s->fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) + exit(2); + + s->listening = 0; /* don't start listening until the queue is inited */ + + s->in_vq = virtio_add_queue(&s->vdev, virtio_9p_kick_input); + virtio_add_queue(&s->vdev, virtio_9p_handle_output); + + qemu_set_fd_handler2(s->fd, virtio_9p_can_read, virtio_9p_handle_input, + NULL, s); + + return &s->vdev; +} diff --git a/qemu/vl.h b/qemu/vl.h index 249ede2..7b5cc8d 100644 --- a/qemu/vl.h +++ b/qemu/vl.h @@ -1399,6 +1399,8 @@ typedef struct VirtIODevice VirtIODevice; VirtIODevice *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t device, BlockDriverState *bs); +VirtIODevice *virtio_9p_init(PCIBus *bus, uint16_t vendor, uint16_t device); + /* buf = NULL means polling */ typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out, const uint8_t *buf, int len); |
From: Anthony L. <ali...@us...> - 2007-11-11 03:20:29
|
This patch implements a very naive virtio network device backend in QEMU. Even with this simple implementation, it get's about 3x the throughput of rtl8139. Of course, there's a whole lot of room for optimization to eliminate the unnecessary copying and support more advanced features. To use a virtio network device, simply specific '-net nic,model=virtio' Signed-off-by: Anthony Liguori <ali...@us...> diff --git a/qemu/Makefile.target b/qemu/Makefile.target index fedf1e0..8ed5ea9 100644 --- a/qemu/Makefile.target +++ b/qemu/Makefile.target @@ -449,7 +449,7 @@ VL_OBJS += rtl8139.o VL_OBJS+= hypercall.o # virtio devices -VL_OBJS += virtio.o virtio-blk.o virtio-9p.o +VL_OBJS += virtio.o virtio-blk.o virtio-9p.o virtio-net.o ifeq ($(TARGET_BASE_ARCH), i386) # Hardware support diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c index 7e8adc4..d370988 100644 --- a/qemu/hw/pci.c +++ b/qemu/hw/pci.c @@ -588,6 +588,8 @@ void pci_nic_init(PCIBus *bus, NICInfo *nd, int devfn) pci_rtl8139_init(bus, nd, devfn); } else if (strcmp(nd->model, "pcnet") == 0) { pci_pcnet_init(bus, nd, devfn); + } else if (strcmp(nd->model, "virtio") == 0) { + virtio_net_init(bus, 0x5002, 0x2258, nd, devfn); } else if (strcmp(nd->model, "?") == 0) { fprintf(stderr, "qemu: Supported PCI NICs: i82551 i82557b i82559er" " ne2k_pci pcnet rtl8139\n"); diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c new file mode 100644 index 0000000..9ed7df7 --- /dev/null +++ b/qemu/hw/virtio-net.c @@ -0,0 +1,175 @@ +/* + * Virtio Network Device + * + * Copyright IBM, Corp. 2007 + * + * Authors: + * Anthony Liguori <ali...@us...> + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#include "vl.h" +#include "virtio.h" + +/* from Linux's virtio_net.h */ + +/* The ID for virtio_net */ +#define VIRTIO_ID_NET 1 + +/* The feature bitmap for virtio net */ +#define VIRTIO_NET_F_NO_CSUM 0 +#define VIRTIO_NET_F_TSO4 1 +#define VIRTIO_NET_F_UFO 2 +#define VIRTIO_NET_F_TSO4_ECN 3 +#define VIRTIO_NET_F_TSO6 4 +#define VIRTIO_NET_F_MAC 5 + +/* The config defining mac address (6 bytes) */ +#define VIRTIO_CONFIG_NET_MAC_F 0 + +/* This is the first element of the scatter-gather list. If you don't + * specify GSO or CSUM features, you can simply ignore the header. */ +struct virtio_net_hdr +{ +#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 // Use csum_start, csum_offset + uint8_t flags; +#define VIRTIO_NET_HDR_GSO_NONE 0 // Not a GSO frame +#define VIRTIO_NET_HDR_GSO_TCPV4 1 // GSO frame, IPv4 TCP (TSO) +/* FIXME: Do we need this? If they said they can handle ECN, do they care? */ +#define VIRTIO_NET_HDR_GSO_TCPV4_ECN 2 // GSO frame, IPv4 TCP w/ ECN +#define VIRTIO_NET_HDR_GSO_UDP 3 // GSO frame, IPv4 UDP (UFO) +#define VIRTIO_NET_HDR_GSO_TCPV6 4 // GSO frame, IPv6 TCP + uint8_t gso_type; + uint16_t gso_size; + uint16_t csum_start; + uint16_t csum_offset; +}; + +typedef struct VirtIONet +{ + VirtIODevice vdev; + uint8_t mac[6]; + VirtQueue *rx_vq; + VirtQueue *tx_vq; + VLANClientState *vc; + int can_receive; +} VirtIONet; + +/* FIXME: there may be a bug where the first dns lookup doesn't work. have to + * see if this is a regression */ + +static VirtIONet *to_virtio_net(VirtIODevice *vdev) +{ + return (VirtIONet *)vdev; +} + +static void virtio_net_update_config(VirtIODevice *vdev, uint8_t *config) +{ + VirtIONet *n = to_virtio_net(vdev); + + memcpy(config + VIRTIO_CONFIG_NET_MAC_F, n->mac, 6); +} + +static uint32_t virtio_net_get_features(VirtIODevice *vdev) +{ + return (1 << VIRTIO_NET_F_MAC); +} + +/* RX */ + +static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq) +{ + VirtIONet *n = to_virtio_net(vdev); + n->can_receive = 1; +} + +static int virtio_net_can_receive(void *opaque) +{ + VirtIONet *n = opaque; + + return (n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK) && n->can_receive; +} + +static void virtio_net_receive(void *opaque, const uint8_t *buf, int size) +{ + VirtIONet *n = opaque; + VirtQueueElement elem; + struct virtio_net_hdr *hdr; + int offset, i; + + /* FIXME: the drivers really need to set their status better */ + if (n->rx_vq->vring.avail == NULL) { + n->can_receive = 0; + return; + } + + if (virtqueue_pop(n->rx_vq, &elem) == 0) { + /* wait until the guest adds some rx bufs */ + n->can_receive = 0; + return; + } + + hdr = (void *)elem.in_sg[0].iov_base; + hdr->flags = 0; + hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE; + + /* copy in packet. ugh */ + offset = 0; + i = 1; + while (offset < size && i < elem.in_num) { + int len = MIN(elem.in_sg[i].iov_len, size - offset); + memcpy(elem.in_sg[i].iov_base, buf + offset, len); + offset += len; + i++; + } + + /* signal other side */ + virtqueue_push(n->rx_vq, &elem, sizeof(*hdr) + offset); + virtio_notify(&n->vdev, n->rx_vq); +} + +/* TX */ +static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq) +{ + VirtIONet *n = to_virtio_net(vdev); + VirtQueueElement elem; + + while (virtqueue_pop(vq, &elem)) { + int i; + size_t len = 0; + + /* ignore the header for now */ + for (i = 1; i < elem.out_num; i++) { + qemu_send_packet(n->vc, elem.out_sg[i].iov_base, + elem.out_sg[i].iov_len); + len += elem.out_sg[i].iov_len; + } + + virtqueue_push(vq, &elem, sizeof(struct virtio_net_hdr) + len); + virtio_notify(&n->vdev, vq); + } +} + +VirtIODevice *virtio_net_init(PCIBus *bus, uint16_t vendor, uint16_t device, + NICInfo *nd, int devfn) +{ + VirtIONet *n; + + n = (VirtIONet *)virtio_init_pci(bus, "virtio-net", vendor, device, + vendor, VIRTIO_ID_NET, + 6, sizeof(VirtIONet)); + + n->vdev.update_config = virtio_net_update_config; + n->vdev.get_features = virtio_net_get_features; + n->rx_vq = virtio_add_queue(&n->vdev, virtio_net_handle_rx); + n->tx_vq = virtio_add_queue(&n->vdev, virtio_net_handle_tx); + n->can_receive = 0; + memcpy(n->mac, nd->macaddr, 6); + n->vc = qemu_new_vlan_client(nd->vlan, virtio_net_receive, + virtio_net_can_receive, n); + + return &n->vdev; +} diff --git a/qemu/vl.h b/qemu/vl.h index 7b5cc8d..4f26fbb 100644 --- a/qemu/vl.h +++ b/qemu/vl.h @@ -1401,6 +1401,9 @@ VirtIODevice *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t device, VirtIODevice *virtio_9p_init(PCIBus *bus, uint16_t vendor, uint16_t device); +VirtIODevice *virtio_net_init(PCIBus *bus, uint16_t vendor, uint16_t device, + NICInfo *nd, int devfn); + /* buf = NULL means polling */ typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out, const uint8_t *buf, int len); |
From: Anthony L. <ali...@us...> - 2007-11-11 03:20:29
|
This patch implements a very naive virtio block device backend in QEMU. There's a lot of room for future optimization. We need to merge a -disk patch before we can provide a mechanism to expose this to users. Signed-off-by: Anthony Liguori <ali...@us...> diff --git a/qemu/Makefile.target b/qemu/Makefile.target index c7686b2..49c0fc7 100644 --- a/qemu/Makefile.target +++ b/qemu/Makefile.target @@ -449,7 +449,7 @@ VL_OBJS += rtl8139.o VL_OBJS+= hypercall.o # virtio devices -VL_OBJS += virtio.o +VL_OBJS += virtio.o virtio-blk.o ifeq ($(TARGET_BASE_ARCH), i386) # Hardware support diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c index 8aae814..c7483e8 100644 --- a/qemu/hw/pc.c +++ b/qemu/hw/pc.c @@ -943,6 +943,14 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size, int boot_device, #ifdef USE_HYPERCALL pci_hypercall_init(pci_bus); #endif + + if (1) { + BlockDriverState *bs = bdrv_new("vda"); + if (bdrv_open(bs, "/home/anthony/images/linux.img", BDRV_O_SNAPSHOT)) + exit(1); + virtio_blk_init(pci_bus, 0x5002, 0x2258, bs); + } + if (pci_enabled) { pci_piix3_ide_init(pci_bus, bs_table, piix3_devfn + 1, i8259); } else { diff --git a/qemu/hw/virtio-blk.c b/qemu/hw/virtio-blk.c new file mode 100644 index 0000000..a1d5ea6 --- /dev/null +++ b/qemu/hw/virtio-blk.c @@ -0,0 +1,162 @@ +/* + * Virtio Block Device + * + * Copyright IBM, Corp. 2007 + * + * Authors: + * Anthony Liguori <ali...@us...> + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#include "vl.h" +#include "virtio.h" + +/* from Linux's linux/virtio_blk.h */ + +/* The ID for virtio_block */ +#define VIRTIO_ID_BLOCK 2 + +/* Feature bits */ +#define VIRTIO_BLK_F_BARRIER 0 /* Does host support barriers? */ +#define VIRTIO_BLK_F_SIZE_MAX 1 /* Indicates maximum segment size */ +#define VIRTIO_BLK_F_SEG_MAX 2 /* Indicates maximum # of segments */ + +/* The capacity (in 512-byte sectors). 8 bytes. */ +#define VIRTIO_CONFIG_BLK_F_CAPACITY 0 +/* The maximum segment size. 4 bytes. */ +#define VIRTIO_CONFIG_BLK_F_SIZE_MAX 0x08 +/* The maximum number of segments. 4 bytes. */ +#define VIRTIO_CONFIG_BLK_F_SEG_MAX 0x0A + +/* These two define direction. */ +#define VIRTIO_BLK_T_IN 0 +#define VIRTIO_BLK_T_OUT 1 + +/* This bit says it's a scsi command, not an actual read or write. */ +#define VIRTIO_BLK_T_SCSI_CMD 2 + +/* Barrier before this op. */ +#define VIRTIO_BLK_T_BARRIER 0x80000000 + +/* This is the first element of the read scatter-gather list. */ +struct virtio_blk_outhdr +{ + /* VIRTIO_BLK_T* */ + uint32_t type; + /* io priority. */ + uint32_t ioprio; + /* Sector (ie. 512 byte offset) */ + uint64_t sector; + /* Where to put reply. */ + uint64_t id; +}; + +#define VIRTIO_BLK_S_OK 0 +#define VIRTIO_BLK_S_IOERR 1 +#define VIRTIO_BLK_S_UNSUPP 2 + +/* This is the first element of the write scatter-gather list */ +struct virtio_blk_inhdr +{ + unsigned char status; +}; + +typedef struct VirtIOBlock +{ + VirtIODevice vdev; + BlockDriverState *bs; +} VirtIOBlock; + +static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev) +{ + return (VirtIOBlock *)vdev; +} + +static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq) +{ + VirtIOBlock *s = to_virtio_blk(vdev); + VirtQueueElement elem; + unsigned int count; + + while ((count = virtqueue_pop(vq, &elem)) != 0) { + struct virtio_blk_inhdr *in; + struct virtio_blk_outhdr *out; + unsigned int wlen; + off_t off; + int i; + + out = (void *)elem.out_sg[0].iov_base; + in = (void *)elem.in_sg[elem.in_num - 1].iov_base; + off = out->sector; + + if (out->type & VIRTIO_BLK_T_SCSI_CMD) { + wlen = sizeof(*in); + in->status = VIRTIO_BLK_S_UNSUPP; + } else if (out->type & VIRTIO_BLK_T_OUT) { + wlen = sizeof(*in); + + for (i = 1; i < elem.out_num; i++) { + bdrv_write(s->bs, off, + elem.out_sg[i].iov_base, + elem.out_sg[i].iov_len / 512); + off += elem.out_sg[i].iov_len / 512; + } + + in->status = VIRTIO_BLK_S_OK; + } else { + wlen = sizeof(*in); + + for (i = 0; i < elem.in_num - 1; i++) { + bdrv_read(s->bs, off, + elem.in_sg[i].iov_base, + elem.in_sg[i].iov_len / 512); + off += elem.in_sg[i].iov_len / 512; + wlen += elem.in_sg[i].iov_len; + } + + in->status = VIRTIO_BLK_S_OK; + } + + virtqueue_push(vq, &elem, wlen); + virtio_notify(vdev, vq); + } +} + +static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) +{ + VirtIOBlock *s = to_virtio_blk(vdev); + int64_t capacity; + uint32_t v; + + bdrv_get_geometry(s->bs, &capacity); + memcpy(config + VIRTIO_CONFIG_BLK_F_CAPACITY, &capacity, sizeof(capacity)); + + v = VIRTQUEUE_MAX_SIZE - 2; + memcpy(config + VIRTIO_CONFIG_BLK_F_SEG_MAX, &v, sizeof(v)); +} + +static uint32_t virtio_blk_get_features(VirtIODevice *vdev) +{ + return (1 << VIRTIO_BLK_F_SEG_MAX); +} + +VirtIODevice *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t device, + BlockDriverState *bs) +{ + VirtIOBlock *s; + + s = (VirtIOBlock *)virtio_init_pci(bus, "virtio-blk", vendor, device, + vendor, VIRTIO_ID_BLOCK, + 16, sizeof(VirtIOBlock)); + + s->vdev.update_config = virtio_blk_update_config; + s->vdev.get_features = virtio_blk_get_features; + s->bs = bs; + + virtio_add_queue(&s->vdev, virtio_blk_handle_output); + + return &s->vdev; +} diff --git a/qemu/vl.h b/qemu/vl.h index fafcf09..249ede2 100644 --- a/qemu/vl.h +++ b/qemu/vl.h @@ -1396,6 +1396,9 @@ void vmchannel_init(CharDriverState *hd, uint32_t deviceid, uint32_t index); typedef struct VirtIODevice VirtIODevice; +VirtIODevice *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t device, + BlockDriverState *bs); + /* buf = NULL means polling */ typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out, const uint8_t *buf, int len); |
From: Anthony L. <ali...@us...> - 2007-11-11 03:20:30
|
I don't think there's any plans for this driver to every be used seriously as virtio seems like the agreed upon layer. So let's remove the code from the tree so I can use the drivers/ directory for something else. Signed-off-by: Anthony Liguori <ali...@us...> diff --git a/drivers/Kbuild b/drivers/Kbuild deleted file mode 100644 index 16075ce..0000000 --- a/drivers/Kbuild +++ /dev/null @@ -1,3 +0,0 @@ -EXTRA_CFLAGS := -I$(src)/../ -obj-m := hypercall.o -#hypercall-objs := hypercall.o diff --git a/drivers/Makefile b/drivers/Makefile deleted file mode 100644 index 56facbb..0000000 --- a/drivers/Makefile +++ /dev/null @@ -1,21 +0,0 @@ -include ../config.mak -KERNELDIR ?= /lib/modules/$(shell uname -r)/build -KVERREL = $(patsubst /lib/modules/%/build,%,$(KERNELDIR)) - -DESTDIR= - -INSTALLDIR = $(patsubst %/build,%/extra,$(KERNELDIR)) - -all:: - $(MAKE) -C $(KERNELDIR) M=`pwd` "$$@" - -install: - mkdir -p $(DESTDIR)/$(INSTALLDIR) - cp *.ko $(DESTDIR)/$(INSTALLDIR) - /sbin/depmod -a - -clean: - $(MAKE) -C $(KERNELDIR) M=`pwd` $@ - -svnclean: - svn st | grep '^\?' | awk '{print $2}' | xargs rm -rf diff --git a/drivers/hypercall.c b/drivers/hypercall.c deleted file mode 100644 index 8f52c5e..0000000 --- a/drivers/hypercall.c +++ /dev/null @@ -1,497 +0,0 @@ - -#include <linux/module.h> -#include <linux/kernel.h> -#include <linux/compiler.h> -#include <linux/pci.h> -#include <linux/init.h> -#include <linux/ioport.h> -#include <linux/completion.h> -#include <linux/interrupt.h> -#include <asm/io.h> -#include <asm/uaccess.h> -#include <asm/irq.h> - -#define HYPERCALL_DRIVER_NAME "Qumranet_hypercall_driver" -#define HYPERCALL_DRIVER_VERSION "1" -#define PCI_VENDOR_ID_HYPERCALL 0x5002 -#define PCI_DEVICE_ID_HYPERCALL 0x2258 - -MODULE_AUTHOR ("Dor Laor <dor...@qu...>"); -MODULE_DESCRIPTION (HYPERCALL_DRIVER_NAME); -MODULE_LICENSE("GPL"); -MODULE_VERSION(HYPERCALL_DRIVER_VERSION); - -static int debug = 0; -module_param(debug, int, 0); -MODULE_PARM_DESC (debug, "toggle debug flag"); - -#define HYPERCALL_DEBUG 1 -#if HYPERCALL_DEBUG -# define DPRINTK(fmt, args...) printk(KERN_DEBUG "%s: " fmt, __FUNCTION__ , ## args) -# define assert(expr) \ - if(unlikely(!(expr))) { \ - printk(KERN_ERR "Assertion failed! %s,%s,%s,line=%d\n", \ - #expr,__FILE__,__FUNCTION__,__LINE__); \ - } -#else -# define DPRINTK(fmt, args...) -# define assert(expr) do {} while (0) -#endif - -static struct pci_device_id hypercall_pci_tbl[] = { - {PCI_VENDOR_ID_HYPERCALL, PCI_DEVICE_ID_HYPERCALL, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, - {0,} -}; -MODULE_DEVICE_TABLE (pci, hypercall_pci_tbl); - - - -/****** Hypercall device definitions ***************/ -#include <qemu/hw/hypercall.h> - -/* read PIO/MMIO register */ -#define HIO_READ8(reg, ioaddr) ioread8(ioaddr + (reg)) -#define HIO_READ16(reg, ioaddr) ioread16(ioaddr + (reg)) -#define HIO_READ32(reg, ioaddr) ioread32(ioaddr + (reg)) - -/* write PIO/MMIO register */ -#define HIO_WRITE8(reg, val8, ioaddr) iowrite8((val8), ioaddr + (reg)) -#define HIO_WRITE16(reg, val16, ioaddr) iowrite16((val16), ioaddr + (reg)) -#define HIO_WRITE32(reg, val32, ioaddr) iowrite32((val32), ioaddr + (reg)) - - -struct hypercall_dev { - struct pci_dev *pci_dev; - struct kobject kobject; - u32 state; - spinlock_t lock; - u8 name[128]; - u16 irq; - u32 regs_len; - void __iomem *io_addr; - unsigned long base_addr; /* device I/O address */ - unsigned long cmd; -}; - - -static int hypercall_close(struct hypercall_dev* dev); -static int hypercall_open(struct hypercall_dev *dev); -static void hypercall_cleanup_dev(struct hypercall_dev *dev); -static irqreturn_t hypercall_interrupt(int irq, void *dev_instance, - struct pt_regs *regs); - -static void __exit hypercall_sysfs_remove(struct hypercall_dev *dev); -static int hypercall_sysfs_add(struct hypercall_dev *dev); - - -static int __devinit hypercall_init_board(struct pci_dev *pdev, - struct hypercall_dev **dev_out) -{ - unsigned long ioaddr; - struct hypercall_dev *dev; - int rc; - u32 disable_dev_on_err = 0; - unsigned long pio_start, pio_end, pio_flags, pio_len; - unsigned long mmio_start, mmio_end, mmio_flags, mmio_len; - - assert(pdev != NULL); - - *dev_out = NULL; - - dev = kzalloc(sizeof(*dev), GFP_KERNEL); - if (dev == NULL) { - printk (KERN_ERR "%s: Unable to alloc hypercall device\n", pci_name(pdev)); - return -ENOMEM; - } - dev->pci_dev = pdev; - rc = pci_enable_device(pdev); - if (rc) - goto err_out; - disable_dev_on_err = 1; - - pio_start = pci_resource_start (pdev, 0); - pio_end = pci_resource_end (pdev, 0); - pio_flags = pci_resource_flags (pdev, 0); - pio_len = pci_resource_len (pdev, 0); - - mmio_start = pci_resource_start (pdev, 1); - mmio_end = pci_resource_end (pdev, 1); - mmio_flags = pci_resource_flags (pdev, 1); - mmio_len = pci_resource_len (pdev, 1); - - DPRINTK("PIO region size == 0x%02lX\n", pio_len); - DPRINTK("MMIO region size == 0x%02lX\n", mmio_len); - - rc = pci_request_regions (pdev, "hypercall"); - if (rc) - goto err_out; - -#define USE_IO_OPS 1 -#ifdef USE_IO_OPS - ioaddr = (unsigned long)pci_iomap(pdev, 0, 0); - //ioaddr = ioport_map(pio_start, pio_len); - if (!ioaddr) { - printk(KERN_ERR "%s: cannot map PIO, aborting\n", pci_name(pdev)); - rc = -EIO; - goto err_out; - } - dev->base_addr = (unsigned long)pio_start; - dev->io_addr = (void*)ioaddr; - dev->regs_len = pio_len; -#else - ioaddr = pci_iomap(pdev, 1, 0); - if (ioaddr == NULL) { - printk(KERN_ERR "%s: cannot remap MMIO, aborting\n", pci_name(pdev)); - rc = -EIO; - goto err_out; - } - dev->base_addr = ioaddr; - dev->io_addr = (void*)ioaddr; - dev->regs_len = mmio_len; -#endif /* USE_IO_OPS */ - - *dev_out = dev; - return 0; - -err_out: - hypercall_cleanup_dev(dev); - if (disable_dev_on_err) - pci_disable_device(pdev); - return rc; -} - -static int __devinit hypercall_init_one(struct pci_dev *pdev, - const struct pci_device_id *ent) -{ - struct hypercall_dev *dev; - u8 pci_rev; - - assert(pdev != NULL); - assert(ent != NULL); - - pci_read_config_byte(pdev, PCI_REVISION_ID, &pci_rev); - - if (pdev->vendor == PCI_VENDOR_ID_HYPERCALL && - pdev->device == PCI_DEVICE_ID_HYPERCALL) { - printk(KERN_INFO "pci dev %s (id %04x:%04x rev %02x) is a guest hypercall device\n", - pci_name(pdev), pdev->vendor, pdev->device, pci_rev); - } - - if (hypercall_init_board(pdev, &dev) != 0) - return -1; - - assert(dev != NULL); - - dev->irq = pdev->irq; - - spin_lock_init(&dev->lock); - pci_set_drvdata(pdev, dev); - - printk (KERN_INFO "name=%s: base_addr=0x%lx, io_addr=0x%lx, IRQ=%d\n", - dev->name, dev->base_addr, (unsigned long)dev->io_addr, dev->irq); - hypercall_open(dev); - - if (hypercall_sysfs_add(dev) != 0) - return -1; - - return 0; -} - -static void __devexit hypercall_remove_one(struct pci_dev *pdev) -{ - struct hypercall_dev *dev = pci_get_drvdata(pdev); - - assert(dev != NULL); - - hypercall_close(dev); - hypercall_sysfs_remove(dev); - hypercall_cleanup_dev(dev); - pci_disable_device(pdev); -} - -static int hypercall_tx(struct hypercall_dev *dev, unsigned char *buf, size_t len) -{ - void __iomem *ioaddr = (void __iomem*)dev->io_addr; - int i; - - if (len > HP_MEM_SIZE) - return -EINVAL; - - spin_lock(&dev->lock); - HIO_WRITE8(HP_TXSIZE, len, ioaddr); - for (i=0; i< len; i++) - HIO_WRITE8(HP_TXBUFF, buf[i], ioaddr); - spin_unlock(&dev->lock); - - return 0; -} - -/* - * The interrupt handler does all of the rx work and cleans up - * after the tx - */ -static irqreturn_t hypercall_interrupt(int irq, void *dev_instance, - struct pt_regs *regs) -{ - struct hypercall_dev *dev = (struct hypercall_dev *)dev_instance; - void __iomem *ioaddr = (void __iomem*)dev->io_addr; - u32 status; - int irq_handled = IRQ_NONE; - int rx_buf_size; - int i; - u8 buffer[HP_MEM_SIZE]; - u8 *pbuf; - - DPRINTK("base addr is 0x%lx, io_addr=0x%lx\n", dev->base_addr, (long)dev->io_addr); - - spin_lock(&dev->lock); - status = HIO_READ8(HSR_REGISTER, ioaddr); - DPRINTK("irq status is 0x%x\n", status); - - /* shared irq? */ - if (unlikely((status & HSR_VDR) == 0)) { - DPRINTK("not handeling irq, not ours\n"); - goto out; - } - - /* Disable device interrupts */ - HIO_WRITE8(HCR_REGISTER, HCR_DI, ioaddr); - DPRINTK("disable device interrupts\n"); - - rx_buf_size = HIO_READ8(HP_RXSIZE, ioaddr); - DPRINTK("Rx buffer size is %d\n", rx_buf_size); - - if (rx_buf_size > HP_MEM_SIZE) - rx_buf_size = HP_MEM_SIZE; - - for (i=0, pbuf=buffer; i<rx_buf_size; i++, pbuf++) { - *pbuf = HIO_READ8(HP_RXBUFF, ioaddr + i); - DPRINTK("Read 0x%x as dword %d\n", *pbuf, i); - } - *pbuf = '\0'; - DPRINTK("Read buffer %s", (char*)buffer); - - HIO_WRITE8(HCR_REGISTER, HCR_EI, ioaddr); - DPRINTK("Enable interrupt\n"); - irq_handled = IRQ_HANDLED; - out: - spin_unlock(&dev->lock); - - - hypercall_tx(dev, "hello host", sizeof("hello host")); - return irq_handled; -} - - -static int hypercall_open(struct hypercall_dev *dev) -{ - int rc; - - rc = request_irq(dev->irq, &hypercall_interrupt, - SA_SHIRQ, dev->name, dev); - if (rc) { - printk(KERN_ERR "%s failed to request an irq\n", __FUNCTION__); - return rc; - } - - //hypercall_thread_start(dev); - - return 0; -} - -static int hypercall_close(struct hypercall_dev* dev) -{ - //hypercall_thread_stop(dev); - synchronize_irq(dev->irq); - free_irq(dev->irq, dev); - - return 0; -} - -#ifdef CONFIG_PM - -static int hypercall_suspend(struct pci_dev *pdev, pm_message_t state) -{ - pci_save_state(pdev); - pci_set_power_state(pdev, PCI_D3hot); - DPRINTK("Power mgmt suspend, set power state to PCI_D3hot\n"); - - return 0; -} - -static int hypercall_resume(struct pci_dev *pdev) -{ - pci_restore_state(pdev); - pci_set_power_state(pdev, PCI_D0); - DPRINTK("Power mgmt resume, set power state to PCI_D0\n"); - - return 0; -} - -#endif /* CONFIG_PM */ - -static void hypercall_cleanup_dev(struct hypercall_dev *dev) -{ - DPRINTK("cleaning up\n"); - pci_release_regions(dev->pci_dev); - pci_iounmap(dev->pci_dev, (void*)dev->io_addr); - pci_set_drvdata (dev->pci_dev, NULL); - kfree(dev); -} - -static struct pci_driver hypercall_pci_driver = { - .name = HYPERCALL_DRIVER_NAME, - .id_table = hypercall_pci_tbl, - .probe = hypercall_init_one, - .remove = __devexit_p(hypercall_remove_one), -#ifdef CONFIG_PM - .suspend = hypercall_suspend, - .resume = hypercall_resume, -#endif /* CONFIG_PM */ -}; - -static int __init hypercall_init_module(void) -{ - printk (KERN_INFO HYPERCALL_DRIVER_NAME "\n"); - return pci_module_init(&hypercall_pci_driver); -} - -static void __exit hypercall_cleanup_module(void) -{ - pci_unregister_driver(&hypercall_pci_driver); -} - -/* - * sysfs support - */ - -struct hypercall_attribute { - struct attribute attr; - ssize_t (*show)(struct hypercall_dev*, char *buf); - ssize_t (*store)(struct hypercall_dev*, unsigned long val); -}; - -static ssize_t hypercall_attribute_show(struct kobject *kobj, - struct attribute *attr, char *buf) -{ - struct hypercall_attribute *hypercall_attr; - struct hypercall_dev *hdev; - - hypercall_attr = container_of(attr, struct hypercall_attribute, attr); - hdev = container_of(kobj, struct hypercall_dev, kobject); - - if (!hypercall_attr->show) - return -EIO; - - return hypercall_attr->show(hdev, buf); -} - -static ssize_t hypercall_attribute_store(struct kobject *kobj, - struct attribute *attr, const char *buf, size_t count) -{ - struct hypercall_attribute *hypercall_attr; - struct hypercall_dev *hdev; - char *endp; - unsigned long val; - int rc; - - val = simple_strtoul(buf, &endp, 0); - - hypercall_attr = container_of(attr, struct hypercall_attribute, attr); - hdev = container_of(kobj, struct hypercall_dev, kobject); - - if (!hypercall_attr->store) - return -EIO; - - rc = hypercall_attr->store(hdev, val); - if (!rc) - rc = count; - return rc; -} - -#define MAKE_HYPERCALL_R_ATTR(_name) \ -static ssize_t _name##_show(struct hypercall_dev *hdev, char *buf) \ -{ \ - return sprintf(buf, "%lu\n", (unsigned long)hdev->_name); \ -} \ -struct hypercall_attribute hypercall_attr_##_name = __ATTR_RO(_name) - -#define MAKE_HYPERCALL_WR_ATTR(_name) \ -static int _name##_store(struct hypercall_dev *hdev, unsigned long val) \ -{ \ - hdev->_name = (typeof(hdev->_name))val; \ - return 0; \ -} \ -static ssize_t _name##_show(struct hypercall_dev *hdev, char *buf) \ -{ \ - return sprintf(buf, "%lu\n", (unsigned long)hdev->_name); \ -} \ -struct hypercall_attribute hypercall_attr_##_name = \ - __ATTR(_name,S_IRUGO|S_IWUGO,_name##_show,_name##_store) - -MAKE_HYPERCALL_R_ATTR(base_addr); -MAKE_HYPERCALL_R_ATTR(irq); -MAKE_HYPERCALL_WR_ATTR(cmd); - -#define GET_HYPERCALL_ATTR(_name) (&hypercall_attr_##_name.attr) - -static struct attribute *hypercall_default_attrs[] = { - GET_HYPERCALL_ATTR(base_addr), - GET_HYPERCALL_ATTR(irq), - GET_HYPERCALL_ATTR(cmd), - NULL -}; - -static struct sysfs_ops hypercall_sysfs_ops = { - .show = hypercall_attribute_show, - .store = hypercall_attribute_store, -}; - -static void hypercall_sysfs_release(struct kobject *kobj) -{ - DPRINTK(" called for obj name %s\n", kobj->name); -} - -static struct kobj_type hypercall_ktype = { - .release = hypercall_sysfs_release, - .sysfs_ops = &hypercall_sysfs_ops, - .default_attrs = hypercall_default_attrs -}; - - -static int hypercall_sysfs_add(struct hypercall_dev *dev) -{ - int rc; - - kobject_init(&dev->kobject); - dev->kobject.ktype = &hypercall_ktype; - rc = kobject_set_name(&dev->kobject, "%s", HYPERCALL_DRIVER_NAME); - if (rc != 0) { - printk("%s: kobject_set_name failed, err=%d\n", __FUNCTION__, rc); - return rc; - } - - rc = kobject_add(&dev->kobject); - if (rc != 0) { - printk("%s: kobject_add failed, err=%d\n", __FUNCTION__, rc); - return rc; - } - - rc = sysfs_create_link(&dev->pci_dev->dev.kobj, &dev->kobject, - HYPERCALL_DRIVER_NAME); - if (rc != 0) { - printk("%s: sysfs_create_link failed, err=%d\n", __FUNCTION__, rc); - kobject_del(&dev->kobject); - } - - return rc; -} - -static void hypercall_sysfs_remove(struct hypercall_dev *dev) -{ - sysfs_remove_link(&dev->pci_dev->dev.kobj, HYPERCALL_DRIVER_NAME); - kobject_del(&dev->kobject); -} - -module_init(hypercall_init_module); -module_exit(hypercall_cleanup_module); |
From: Dor L. <dor...@gm...> - 2007-11-11 17:25:35
|
Anthony Liguori wrote: > I don't think there's any plans for this driver to every be used seriously as > virtio seems like the agreed upon layer. So let's remove the code from the > tree so I can use the drivers/ directory for something else. > > There's also some vmchannel cmdline parameter that can be erased. > Signed-off-by: Anthony Liguori <ali...@us...> > > diff --git a/drivers/Kbuild b/drivers/Kbuild > deleted file mode 100644 > index 16075ce..0000000 > --- a/drivers/Kbuild > +++ /dev/null > @@ -1,3 +0,0 @@ > -EXTRA_CFLAGS := -I$(src)/../ > -obj-m := hypercall.o > -#hypercall-objs := hypercall.o > diff --git a/drivers/Makefile b/drivers/Makefile > deleted file mode 100644 > index 56facbb..0000000 > --- a/drivers/Makefile > +++ /dev/null > @@ -1,21 +0,0 @@ > -include ../config.mak > -KERNELDIR ?= /lib/modules/$(shell uname -r)/build > -KVERREL = $(patsubst /lib/modules/%/build,%,$(KERNELDIR)) > - > -DESTDIR= > - > -INSTALLDIR = $(patsubst %/build,%/extra,$(KERNELDIR)) > - > -all:: > - $(MAKE) -C $(KERNELDIR) M=`pwd` "$$@" > - > -install: > - mkdir -p $(DESTDIR)/$(INSTALLDIR) > - cp *.ko $(DESTDIR)/$(INSTALLDIR) > - /sbin/depmod -a > - > -clean: > - $(MAKE) -C $(KERNELDIR) M=`pwd` $@ > - > -svnclean: > - svn st | grep '^\?' | awk '{print $2}' | xargs rm -rf > diff --git a/drivers/hypercall.c b/drivers/hypercall.c > deleted file mode 100644 > index 8f52c5e..0000000 > --- a/drivers/hypercall.c > +++ /dev/null > @@ -1,497 +0,0 @@ > - > -#include <linux/module.h> > -#include <linux/kernel.h> > -#include <linux/compiler.h> > -#include <linux/pci.h> > -#include <linux/init.h> > -#include <linux/ioport.h> > -#include <linux/completion.h> > -#include <linux/interrupt.h> > -#include <asm/io.h> > -#include <asm/uaccess.h> > -#include <asm/irq.h> > - > -#define HYPERCALL_DRIVER_NAME "Qumranet_hypercall_driver" > -#define HYPERCALL_DRIVER_VERSION "1" > -#define PCI_VENDOR_ID_HYPERCALL 0x5002 > -#define PCI_DEVICE_ID_HYPERCALL 0x2258 > - > -MODULE_AUTHOR ("Dor Laor <dor...@qu...>"); > -MODULE_DESCRIPTION (HYPERCALL_DRIVER_NAME); > -MODULE_LICENSE("GPL"); > -MODULE_VERSION(HYPERCALL_DRIVER_VERSION); > - > -static int debug = 0; > -module_param(debug, int, 0); > -MODULE_PARM_DESC (debug, "toggle debug flag"); > - > -#define HYPERCALL_DEBUG 1 > -#if HYPERCALL_DEBUG > -# define DPRINTK(fmt, args...) printk(KERN_DEBUG "%s: " fmt, __FUNCTION__ , ## args) > -# define assert(expr) \ > - if(unlikely(!(expr))) { \ > - printk(KERN_ERR "Assertion failed! %s,%s,%s,line=%d\n", \ > - #expr,__FILE__,__FUNCTION__,__LINE__); \ > - } > -#else > -# define DPRINTK(fmt, args...) > -# define assert(expr) do {} while (0) > -#endif > - > -static struct pci_device_id hypercall_pci_tbl[] = { > - {PCI_VENDOR_ID_HYPERCALL, PCI_DEVICE_ID_HYPERCALL, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, > - {0,} > -}; > -MODULE_DEVICE_TABLE (pci, hypercall_pci_tbl); > - > - > - > -/****** Hypercall device definitions ***************/ > -#include <qemu/hw/hypercall.h> > - > -/* read PIO/MMIO register */ > -#define HIO_READ8(reg, ioaddr) ioread8(ioaddr + (reg)) > -#define HIO_READ16(reg, ioaddr) ioread16(ioaddr + (reg)) > -#define HIO_READ32(reg, ioaddr) ioread32(ioaddr + (reg)) > - > -/* write PIO/MMIO register */ > -#define HIO_WRITE8(reg, val8, ioaddr) iowrite8((val8), ioaddr + (reg)) > -#define HIO_WRITE16(reg, val16, ioaddr) iowrite16((val16), ioaddr + (reg)) > -#define HIO_WRITE32(reg, val32, ioaddr) iowrite32((val32), ioaddr + (reg)) > - > - > -struct hypercall_dev { > - struct pci_dev *pci_dev; > - struct kobject kobject; > - u32 state; > - spinlock_t lock; > - u8 name[128]; > - u16 irq; > - u32 regs_len; > - void __iomem *io_addr; > - unsigned long base_addr; /* device I/O address */ > - unsigned long cmd; > -}; > - > - > -static int hypercall_close(struct hypercall_dev* dev); > -static int hypercall_open(struct hypercall_dev *dev); > -static void hypercall_cleanup_dev(struct hypercall_dev *dev); > -static irqreturn_t hypercall_interrupt(int irq, void *dev_instance, > - struct pt_regs *regs); > - > -static void __exit hypercall_sysfs_remove(struct hypercall_dev *dev); > -static int hypercall_sysfs_add(struct hypercall_dev *dev); > - > - > -static int __devinit hypercall_init_board(struct pci_dev *pdev, > - struct hypercall_dev **dev_out) > -{ > - unsigned long ioaddr; > - struct hypercall_dev *dev; > - int rc; > - u32 disable_dev_on_err = 0; > - unsigned long pio_start, pio_end, pio_flags, pio_len; > - unsigned long mmio_start, mmio_end, mmio_flags, mmio_len; > - > - assert(pdev != NULL); > - > - *dev_out = NULL; > - > - dev = kzalloc(sizeof(*dev), GFP_KERNEL); > - if (dev == NULL) { > - printk (KERN_ERR "%s: Unable to alloc hypercall device\n", pci_name(pdev)); > - return -ENOMEM; > - } > - dev->pci_dev = pdev; > - rc = pci_enable_device(pdev); > - if (rc) > - goto err_out; > - disable_dev_on_err = 1; > - > - pio_start = pci_resource_start (pdev, 0); > - pio_end = pci_resource_end (pdev, 0); > - pio_flags = pci_resource_flags (pdev, 0); > - pio_len = pci_resource_len (pdev, 0); > - > - mmio_start = pci_resource_start (pdev, 1); > - mmio_end = pci_resource_end (pdev, 1); > - mmio_flags = pci_resource_flags (pdev, 1); > - mmio_len = pci_resource_len (pdev, 1); > - > - DPRINTK("PIO region size == 0x%02lX\n", pio_len); > - DPRINTK("MMIO region size == 0x%02lX\n", mmio_len); > - > - rc = pci_request_regions (pdev, "hypercall"); > - if (rc) > - goto err_out; > - > -#define USE_IO_OPS 1 > -#ifdef USE_IO_OPS > - ioaddr = (unsigned long)pci_iomap(pdev, 0, 0); > - //ioaddr = ioport_map(pio_start, pio_len); > - if (!ioaddr) { > - printk(KERN_ERR "%s: cannot map PIO, aborting\n", pci_name(pdev)); > - rc = -EIO; > - goto err_out; > - } > - dev->base_addr = (unsigned long)pio_start; > - dev->io_addr = (void*)ioaddr; > - dev->regs_len = pio_len; > -#else > - ioaddr = pci_iomap(pdev, 1, 0); > - if (ioaddr == NULL) { > - printk(KERN_ERR "%s: cannot remap MMIO, aborting\n", pci_name(pdev)); > - rc = -EIO; > - goto err_out; > - } > - dev->base_addr = ioaddr; > - dev->io_addr = (void*)ioaddr; > - dev->regs_len = mmio_len; > -#endif /* USE_IO_OPS */ > - > - *dev_out = dev; > - return 0; > - > -err_out: > - hypercall_cleanup_dev(dev); > - if (disable_dev_on_err) > - pci_disable_device(pdev); > - return rc; > -} > - > -static int __devinit hypercall_init_one(struct pci_dev *pdev, > - const struct pci_device_id *ent) > -{ > - struct hypercall_dev *dev; > - u8 pci_rev; > - > - assert(pdev != NULL); > - assert(ent != NULL); > - > - pci_read_config_byte(pdev, PCI_REVISION_ID, &pci_rev); > - > - if (pdev->vendor == PCI_VENDOR_ID_HYPERCALL && > - pdev->device == PCI_DEVICE_ID_HYPERCALL) { > - printk(KERN_INFO "pci dev %s (id %04x:%04x rev %02x) is a guest hypercall device\n", > - pci_name(pdev), pdev->vendor, pdev->device, pci_rev); > - } > - > - if (hypercall_init_board(pdev, &dev) != 0) > - return -1; > - > - assert(dev != NULL); > - > - dev->irq = pdev->irq; > - > - spin_lock_init(&dev->lock); > - pci_set_drvdata(pdev, dev); > - > - printk (KERN_INFO "name=%s: base_addr=0x%lx, io_addr=0x%lx, IRQ=%d\n", > - dev->name, dev->base_addr, (unsigned long)dev->io_addr, dev->irq); > - hypercall_open(dev); > - > - if (hypercall_sysfs_add(dev) != 0) > - return -1; > - > - return 0; > -} > - > -static void __devexit hypercall_remove_one(struct pci_dev *pdev) > -{ > - struct hypercall_dev *dev = pci_get_drvdata(pdev); > - > - assert(dev != NULL); > - > - hypercall_close(dev); > - hypercall_sysfs_remove(dev); > - hypercall_cleanup_dev(dev); > - pci_disable_device(pdev); > -} > - > -static int hypercall_tx(struct hypercall_dev *dev, unsigned char *buf, size_t len) > -{ > - void __iomem *ioaddr = (void __iomem*)dev->io_addr; > - int i; > - > - if (len > HP_MEM_SIZE) > - return -EINVAL; > - > - spin_lock(&dev->lock); > - HIO_WRITE8(HP_TXSIZE, len, ioaddr); > - for (i=0; i< len; i++) > - HIO_WRITE8(HP_TXBUFF, buf[i], ioaddr); > - spin_unlock(&dev->lock); > - > - return 0; > -} > - > -/* > - * The interrupt handler does all of the rx work and cleans up > - * after the tx > - */ > -static irqreturn_t hypercall_interrupt(int irq, void *dev_instance, > - struct pt_regs *regs) > -{ > - struct hypercall_dev *dev = (struct hypercall_dev *)dev_instance; > - void __iomem *ioaddr = (void __iomem*)dev->io_addr; > - u32 status; > - int irq_handled = IRQ_NONE; > - int rx_buf_size; > - int i; > - u8 buffer[HP_MEM_SIZE]; > - u8 *pbuf; > - > - DPRINTK("base addr is 0x%lx, io_addr=0x%lx\n", dev->base_addr, (long)dev->io_addr); > - > - spin_lock(&dev->lock); > - status = HIO_READ8(HSR_REGISTER, ioaddr); > - DPRINTK("irq status is 0x%x\n", status); > - > - /* shared irq? */ > - if (unlikely((status & HSR_VDR) == 0)) { > - DPRINTK("not handeling irq, not ours\n"); > - goto out; > - } > - > - /* Disable device interrupts */ > - HIO_WRITE8(HCR_REGISTER, HCR_DI, ioaddr); > - DPRINTK("disable device interrupts\n"); > - > - rx_buf_size = HIO_READ8(HP_RXSIZE, ioaddr); > - DPRINTK("Rx buffer size is %d\n", rx_buf_size); > - > - if (rx_buf_size > HP_MEM_SIZE) > - rx_buf_size = HP_MEM_SIZE; > - > - for (i=0, pbuf=buffer; i<rx_buf_size; i++, pbuf++) { > - *pbuf = HIO_READ8(HP_RXBUFF, ioaddr + i); > - DPRINTK("Read 0x%x as dword %d\n", *pbuf, i); > - } > - *pbuf = '\0'; > - DPRINTK("Read buffer %s", (char*)buffer); > - > - HIO_WRITE8(HCR_REGISTER, HCR_EI, ioaddr); > - DPRINTK("Enable interrupt\n"); > - irq_handled = IRQ_HANDLED; > - out: > - spin_unlock(&dev->lock); > - > - > - hypercall_tx(dev, "hello host", sizeof("hello host")); > - return irq_handled; > -} > - > - > -static int hypercall_open(struct hypercall_dev *dev) > -{ > - int rc; > - > - rc = request_irq(dev->irq, &hypercall_interrupt, > - SA_SHIRQ, dev->name, dev); > - if (rc) { > - printk(KERN_ERR "%s failed to request an irq\n", __FUNCTION__); > - return rc; > - } > - > - //hypercall_thread_start(dev); > - > - return 0; > -} > - > -static int hypercall_close(struct hypercall_dev* dev) > -{ > - //hypercall_thread_stop(dev); > - synchronize_irq(dev->irq); > - free_irq(dev->irq, dev); > - > - return 0; > -} > - > -#ifdef CONFIG_PM > - > -static int hypercall_suspend(struct pci_dev *pdev, pm_message_t state) > -{ > - pci_save_state(pdev); > - pci_set_power_state(pdev, PCI_D3hot); > - DPRINTK("Power mgmt suspend, set power state to PCI_D3hot\n"); > - > - return 0; > -} > - > -static int hypercall_resume(struct pci_dev *pdev) > -{ > - pci_restore_state(pdev); > - pci_set_power_state(pdev, PCI_D0); > - DPRINTK("Power mgmt resume, set power state to PCI_D0\n"); > - > - return 0; > -} > - > -#endif /* CONFIG_PM */ > - > -static void hypercall_cleanup_dev(struct hypercall_dev *dev) > -{ > - DPRINTK("cleaning up\n"); > - pci_release_regions(dev->pci_dev); > - pci_iounmap(dev->pci_dev, (void*)dev->io_addr); > - pci_set_drvdata (dev->pci_dev, NULL); > - kfree(dev); > -} > - > -static struct pci_driver hypercall_pci_driver = { > - .name = HYPERCALL_DRIVER_NAME, > - .id_table = hypercall_pci_tbl, > - .probe = hypercall_init_one, > - .remove = __devexit_p(hypercall_remove_one), > -#ifdef CONFIG_PM > - .suspend = hypercall_suspend, > - .resume = hypercall_resume, > -#endif /* CONFIG_PM */ > -}; > - > -static int __init hypercall_init_module(void) > -{ > - printk (KERN_INFO HYPERCALL_DRIVER_NAME "\n"); > - return pci_module_init(&hypercall_pci_driver); > -} > - > -static void __exit hypercall_cleanup_module(void) > -{ > - pci_unregister_driver(&hypercall_pci_driver); > -} > - > -/* > - * sysfs support > - */ > - > -struct hypercall_attribute { > - struct attribute attr; > - ssize_t (*show)(struct hypercall_dev*, char *buf); > - ssize_t (*store)(struct hypercall_dev*, unsigned long val); > -}; > - > -static ssize_t hypercall_attribute_show(struct kobject *kobj, > - struct attribute *attr, char *buf) > -{ > - struct hypercall_attribute *hypercall_attr; > - struct hypercall_dev *hdev; > - > - hypercall_attr = container_of(attr, struct hypercall_attribute, attr); > - hdev = container_of(kobj, struct hypercall_dev, kobject); > - > - if (!hypercall_attr->show) > - return -EIO; > - > - return hypercall_attr->show(hdev, buf); > -} > - > -static ssize_t hypercall_attribute_store(struct kobject *kobj, > - struct attribute *attr, const char *buf, size_t count) > -{ > - struct hypercall_attribute *hypercall_attr; > - struct hypercall_dev *hdev; > - char *endp; > - unsigned long val; > - int rc; > - > - val = simple_strtoul(buf, &endp, 0); > - > - hypercall_attr = container_of(attr, struct hypercall_attribute, attr); > - hdev = container_of(kobj, struct hypercall_dev, kobject); > - > - if (!hypercall_attr->store) > - return -EIO; > - > - rc = hypercall_attr->store(hdev, val); > - if (!rc) > - rc = count; > - return rc; > -} > - > -#define MAKE_HYPERCALL_R_ATTR(_name) \ > -static ssize_t _name##_show(struct hypercall_dev *hdev, char *buf) \ > -{ \ > - return sprintf(buf, "%lu\n", (unsigned long)hdev->_name); \ > -} \ > -struct hypercall_attribute hypercall_attr_##_name = __ATTR_RO(_name) > - > -#define MAKE_HYPERCALL_WR_ATTR(_name) \ > -static int _name##_store(struct hypercall_dev *hdev, unsigned long val) \ > -{ \ > - hdev->_name = (typeof(hdev->_name))val; \ > - return 0; \ > -} \ > -static ssize_t _name##_show(struct hypercall_dev *hdev, char *buf) \ > -{ \ > - return sprintf(buf, "%lu\n", (unsigned long)hdev->_name); \ > -} \ > -struct hypercall_attribute hypercall_attr_##_name = \ > - __ATTR(_name,S_IRUGO|S_IWUGO,_name##_show,_name##_store) > - > -MAKE_HYPERCALL_R_ATTR(base_addr); > -MAKE_HYPERCALL_R_ATTR(irq); > -MAKE_HYPERCALL_WR_ATTR(cmd); > - > -#define GET_HYPERCALL_ATTR(_name) (&hypercall_attr_##_name.attr) > - > -static struct attribute *hypercall_default_attrs[] = { > - GET_HYPERCALL_ATTR(base_addr), > - GET_HYPERCALL_ATTR(irq), > - GET_HYPERCALL_ATTR(cmd), > - NULL > -}; > - > -static struct sysfs_ops hypercall_sysfs_ops = { > - .show = hypercall_attribute_show, > - .store = hypercall_attribute_store, > -}; > - > -static void hypercall_sysfs_release(struct kobject *kobj) > -{ > - DPRINTK(" called for obj name %s\n", kobj->name); > -} > - > -static struct kobj_type hypercall_ktype = { > - .release = hypercall_sysfs_release, > - .sysfs_ops = &hypercall_sysfs_ops, > - .default_attrs = hypercall_default_attrs > -}; > - > - > -static int hypercall_sysfs_add(struct hypercall_dev *dev) > -{ > - int rc; > - > - kobject_init(&dev->kobject); > - dev->kobject.ktype = &hypercall_ktype; > - rc = kobject_set_name(&dev->kobject, "%s", HYPERCALL_DRIVER_NAME); > - if (rc != 0) { > - printk("%s: kobject_set_name failed, err=%d\n", __FUNCTION__, rc); > - return rc; > - } > - > - rc = kobject_add(&dev->kobject); > - if (rc != 0) { > - printk("%s: kobject_add failed, err=%d\n", __FUNCTION__, rc); > - return rc; > - } > - > - rc = sysfs_create_link(&dev->pci_dev->dev.kobj, &dev->kobject, > - HYPERCALL_DRIVER_NAME); > - if (rc != 0) { > - printk("%s: sysfs_create_link failed, err=%d\n", __FUNCTION__, rc); > - kobject_del(&dev->kobject); > - } > - > - return rc; > -} > - > -static void hypercall_sysfs_remove(struct hypercall_dev *dev) > -{ > - sysfs_remove_link(&dev->pci_dev->dev.kobj, HYPERCALL_DRIVER_NAME); > - kobject_del(&dev->kobject); > -} > - > -module_init(hypercall_init_module); > -module_exit(hypercall_cleanup_module); > > |
From: Avi K. <av...@qu...> - 2007-11-27 12:26:22
|
Anthony Liguori wrote: > I don't think there's any plans for this driver to every be used seriously as > virtio seems like the agreed upon layer. So let's remove the code from the > tree so I can use the drivers/ directory for something else. > > Applied, thanks. -- error compiling committee.c: too many arguments to function |
From: Anthony L. <ali...@us...> - 2007-11-11 03:20:30
|
This provides a make sync within the drivers/ directory to allow virtio drivers to be built as third-party modules much as we do with the KVM driver. Signed-off-by: Anthony Liguori <ali...@us...> diff --git a/.gitignore b/.gitignore index 600508a..f9b95f8 100644 --- a/.gitignore +++ b/.gitignore @@ -26,4 +26,11 @@ kernel/x86_emulate.[ch] kernel/include/linux/kvm*.h kernel/Module.symvers kernel/Modules.symvers -kernel/.tmp_versions \ No newline at end of file +kernel/.tmp_versions +drivers/config.c +drivers/virtio.c +drivers/virtio_ring.c +drivers/virtio_pci.c +drivers/virtio_blk.c +drivers/virtio_net.c +drivers/include/linux/virtio*.h diff --git a/Makefile b/Makefile index 776ff01..dd1da6e 100644 --- a/Makefile +++ b/Makefile @@ -68,7 +68,7 @@ srpm: $(RM) $(tmpspec) clean: - for i in $(if $(WANT_MODULE), kernel) user libkvm qemu; do \ + for i in $(if $(WANT_MODULE), kernel drivers) user libkvm qemu; do \ make -C $$i clean; \ done rm -f config.mak user/config.mak diff --git a/drivers/Kbuild b/drivers/Kbuild new file mode 100644 index 0000000..74d1ed0 --- /dev/null +++ b/drivers/Kbuild @@ -0,0 +1,2 @@ +EXTRA_CFLAGS := -I$(src)/include -include $(src)/external-module-compat.h +obj-m := virtio.o virtio_ring.o virtio_pci.o virtio_net.o virtio_blk.o diff --git a/drivers/Makefile b/drivers/Makefile new file mode 100644 index 0000000..a0d232c --- /dev/null +++ b/drivers/Makefile @@ -0,0 +1,23 @@ +include ../config.mak + +KVERREL = $(patsubst /lib/modules/%/build,%,$(KERNELDIR)) + +DESTDIR= + +INSTALLDIR = $(patsubst %/build,%/extra,$(KERNELDIR)) +ORIGMODDIR = $(patsubst %/build,%/kernel,$(KERNELDIR)) + +LINUX = ../linux-2.6 + +all:: + $(MAKE) -C $(KERNELDIR) M=`pwd` "$$@" + +sync: + mkdir -p include/linux + rsync --exclude='*.mod.c' "$(LINUX)"/drivers/virtio/*.[ch] . + rsync "$(LINUX)"/include/linux/virtio*.h include/linux + rsync "$(LINUX)"/drivers/block/virtio_blk.c . + rsync "$(LINUX)"/drivers/net/virtio_net.c . + +clean: + $(MAKE) -C $(KERNELDIR) M=`pwd` $@ diff --git a/drivers/external-module-compat.h b/drivers/external-module-compat.h new file mode 100644 index 0000000..f0e4868 --- /dev/null +++ b/drivers/external-module-compat.h @@ -0,0 +1,14 @@ +#ifndef _VIRTIO_EXT_MOD_COMPAT_H +#define _VIRTIO_EXT_MOD_COMPAT_H + +/* make sure out copies of the headers get included */ + +#include "include/linux/virtio.h" +#include "include/linux/virtio_config.h" +#include "include/linux/virtio_ring.h" +#include "include/linux/virtio_pci.h" +#include "include/linux/virtio_blk.h" +#include "include/linux/virtio_net.h" +#include "include/linux/virtio_9p.h" + +#endif |
From: Avi K. <av...@qu...> - 2007-11-27 12:28:08
|
Anthony Liguori wrote: > This provides a make sync within the drivers/ directory to allow virtio drivers > to be built as third-party modules much as we do with the KVM driver. > > We will want to ship and build guest drivers as a separate package (that's not to say that this patch is wrong; but the drivers package will need its own configure/make/install thing). -- error compiling committee.c: too many arguments to function |
From: Anthony L. <ali...@us...> - 2007-11-11 03:20:47
|
This implements the basic virtio and virtio_ring infrastructure for QEMU. It's designed at the moment with intimate knowledge of the virtio_pci device. In the future, we may want to abstract that a little further so that the virtio backend device doesn't have any knowledge of the virtio transport. Signed-off-by: Anthony Liguori <ali...@us...> diff --git a/qemu/Makefile.target b/qemu/Makefile.target index 65f449e..c7686b2 100644 --- a/qemu/Makefile.target +++ b/qemu/Makefile.target @@ -448,6 +448,9 @@ VL_OBJS += rtl8139.o # PCI Hypercall VL_OBJS+= hypercall.o +# virtio devices +VL_OBJS += virtio.o + ifeq ($(TARGET_BASE_ARCH), i386) # Hardware support VL_OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o $(AUDIODRV) diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c new file mode 100644 index 0000000..e30ffb1 --- /dev/null +++ b/qemu/hw/virtio.c @@ -0,0 +1,336 @@ +/* + * Virtio Support + * + * Copyright IBM, Corp. 2007 + * + * Authors: + * Anthony Liguori <ali...@us...> + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#include "vl.h" +#include "virtio.h" +#include <err.h> + +/* from Linux's linux/virtio_pci.h */ + +/* A 32-bit r/o bitmask of the features supported by the host */ +#define VIRTIO_PCI_HOST_FEATURES 0 + +/* A 32-bit r/w bitmask of features activated by the guest */ +#define VIRTIO_PCI_GUEST_FEATURES 4 + +/* A 32-bit r/w PFN for the currently selected queue */ +#define VIRTIO_PCI_QUEUE_PFN 8 + +/* A 16-bit r/o queue size for the currently selected queue */ +#define VIRTIO_PCI_QUEUE_NUM 12 + +/* A 16-bit r/w queue selector */ +#define VIRTIO_PCI_QUEUE_SEL 14 + +/* A 16-bit r/w queue notifier */ +#define VIRTIO_PCI_QUEUE_NOTIFY 16 + +/* An 8-bit device status register. */ +#define VIRTIO_PCI_STATUS 18 + +/* An 8-bit r/o interrupt status register. Reading the value will return the + * current contents of the ISR and will also clear it. This is effectively + * a read-and-acknowledge. */ +#define VIRTIO_PCI_ISR 19 + +/* A 32-bit r/o configuration size. This is the amount of memory required + * to be allocated for VIRTIO_PCI_CONFIG_PFN. */ +#define VIRTIO_PCI_CONFIG_LEN 20 + +/* A 32-bit r/w PFN for the shared configuration information. The PA written + * by the host must point to at least VIRTIO_PCI_CONFIG_LEN bytes */ +#define VIRTIO_PCI_CONFIG_PFN 24 + +/* virt queue functions */ + +#define wmb() asm volatile("sfence" ::: "memory") + +static void virtqueue_init(VirtQueue *vq, void *p) +{ + vq->vring.desc = p; + vq->vring.avail = p + vq->vring.num * sizeof(VRingDesc); + vq->vring.used = p + (vq->vring.num + 1) * + (sizeof(VRingDesc) + sizeof(uint16_t)); +} + +static unsigned virtqueue_next_desc(VirtQueue *vq, unsigned int i) +{ + unsigned int next; + + /* If this descriptor says it doesn't chain, we're done. */ + if (!(vq->vring.desc[i].flags & VRING_DESC_F_NEXT)) + return vq->vring.num; + + /* Check they're not leading us off end of descriptors. */ + next = vq->vring.desc[i].next; + /* Make sure compiler knows to grab that: we don't want it changing! */ + wmb(); + + if (next >= vq->vring.num) + errx(1, "Desc next is %u", next); + + return next; +} + +void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem, + unsigned int len) +{ + VRingUsedElem *used; + + /* Get a pointer to the next entry in the used ring. */ + used = &vq->vring.used->ring[vq->vring.used->idx % vq->vring.num]; + used->id = elem->index; + used->len = len; + /* Make sure buffer is written before we update index. */ + wmb(); + vq->vring.used->idx++; +} + +int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) +{ + unsigned int i, head; + unsigned int position; + + /* Check it isn't doing very strange things with descriptor numbers. */ + if ((uint16_t)(vq->vring.avail->idx - vq->last_avail_idx) > vq->vring.num) + errx(1, "Guest moved used index from %u to %u", + vq->last_avail_idx, vq->vring.avail->idx); + + /* If there's nothing new since last we looked, return invalid. */ + if (vq->vring.avail->idx == vq->last_avail_idx) + return 0; + + /* Grab the next descriptor number they're advertising, and increment + * the index we've seen. */ + head = vq->vring.avail->ring[vq->last_avail_idx++ % vq->vring.num]; + + /* If their number is silly, that's a fatal mistake. */ + if (head >= vq->vring.num) + errx(1, "Guest says index %u is available", head); + + /* When we start there are none of either input nor output. */ + position = elem->out_num = elem->in_num = 0; + + i = head; + do { + struct iovec *sg; + + if ((vq->vring.desc[i].addr + vq->vring.desc[i].len) > ram_size) + errx(1, "Guest sent invalid pointer"); + + if (vq->vring.desc[i].flags & VRING_DESC_F_WRITE) + sg = &elem->in_sg[elem->in_num++]; + else + sg = &elem->out_sg[elem->out_num++]; + + /* Grab the first descriptor, and check it's OK. */ + sg->iov_len = vq->vring.desc[i].len; + sg->iov_base = phys_ram_base + vq->vring.desc[i].addr; + + /* If we've got too many, that implies a descriptor loop. */ + if ((elem->in_num + elem->out_num) > vq->vring.num) + errx(1, "Looped descriptor"); + } while ((i = virtqueue_next_desc(vq, i)) != vq->vring.num); + + elem->index = head; + + return elem->in_num + elem->out_num; +} + +/* virtio device */ + +static VirtIODevice *to_virtio_device(PCIDevice *pci_dev) +{ + return (VirtIODevice *)pci_dev; +} + +static void virtio_update_irq(VirtIODevice *vdev) +{ + qemu_set_irq(vdev->pci_dev.irq[0], vdev->isr & 1); +} + +static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val) +{ + VirtIODevice *vdev = to_virtio_device(opaque); + ram_addr_t pa; + + addr -= vdev->addr; + + switch (addr) { + case VIRTIO_PCI_GUEST_FEATURES: + if (vdev->set_features) + vdev->set_features(vdev, val); + vdev->features = val; + break; + case VIRTIO_PCI_QUEUE_PFN: + pa = (ram_addr_t)val << TARGET_PAGE_BITS; + vdev->vq[vdev->queue_sel].pfn = val; + if (pa == 0) { + vdev->vq[vdev->queue_sel].vring.desc = NULL; + vdev->vq[vdev->queue_sel].vring.avail = NULL; + vdev->vq[vdev->queue_sel].vring.used = NULL; + } else if (pa < (ram_size - TARGET_PAGE_SIZE)) { + virtqueue_init(&vdev->vq[vdev->queue_sel], phys_ram_base + pa); + /* FIXME if pa == 0, deal with device tear down */ + } + break; + case VIRTIO_PCI_QUEUE_SEL: + if (val < VIRTIO_PCI_QUEUE_MAX) + vdev->queue_sel = val; + break; + case VIRTIO_PCI_QUEUE_NOTIFY: + if (val < VIRTIO_PCI_QUEUE_MAX && vdev->vq[val].vring.desc) + vdev->vq[val].handle_output(vdev, &vdev->vq[val]); + break; + case VIRTIO_PCI_STATUS: + vdev->status = val & 0xFF; + break; + case VIRTIO_PCI_CONFIG_PFN: + pa = (ram_addr_t)val << TARGET_PAGE_BITS; + if (pa && pa < (ram_size - vdev->config_len) && vdev->update_config) + vdev->update_config(vdev, phys_ram_base + pa); + vdev->config_pfn = val; + break; + } +} + +static uint32_t virtio_ioport_read(void *opaque, uint32_t addr) +{ + VirtIODevice *vdev = to_virtio_device(opaque); + uint32_t ret = 0xFFFFFFFF; + + addr -= vdev->addr; + + switch (addr) { + case VIRTIO_PCI_HOST_FEATURES: + ret = vdev->get_features(vdev); + break; + case VIRTIO_PCI_GUEST_FEATURES: + ret = vdev->features; + break; + case VIRTIO_PCI_QUEUE_PFN: + ret = vdev->vq[vdev->queue_sel].pfn; + break; + case VIRTIO_PCI_QUEUE_NUM: + ret = vdev->vq[vdev->queue_sel].vring.num; + break; + case VIRTIO_PCI_QUEUE_SEL: + ret = vdev->queue_sel; + break; + case VIRTIO_PCI_STATUS: + ret = vdev->status; + break; + case VIRTIO_PCI_ISR: + /* reading from the ISR also clears it. */ + ret = vdev->isr; + vdev->isr = 0; + virtio_update_irq(vdev); + break; + case VIRTIO_PCI_CONFIG_LEN: + ret = vdev->config_len; + break; + case VIRTIO_PCI_CONFIG_PFN: + ret = vdev->config_pfn; + break; + default: + break; + } + + return ret; +} + +static void virtio_map(PCIDevice *pci_dev, int region_num, + uint32_t addr, uint32_t size, int type) +{ + VirtIODevice *vdev = to_virtio_device(pci_dev); + + vdev->addr = addr; + register_ioport_write(addr, size, 1, virtio_ioport_write, vdev); + register_ioport_write(addr, size, 2, virtio_ioport_write, vdev); + register_ioport_write(addr, size, 4, virtio_ioport_write, vdev); + register_ioport_read(addr, size, 1, virtio_ioport_read, vdev); + register_ioport_read(addr, size, 2, virtio_ioport_read, vdev); + register_ioport_read(addr, size, 4, virtio_ioport_read, vdev); +} + +VirtQueue *virtio_add_queue(VirtIODevice *vdev, + void (*handle_output)(VirtIODevice *, VirtQueue *)) +{ + int i; + + for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) { + if (vdev->vq[i].vring.num == 0) + break; + } + + /* FIXME bug_on i == VIRTIO_PCI_QUEUE_MAX */ + + vdev->vq[i].vring.num = VIRTQUEUE_MAX_SIZE; + vdev->vq[i].handle_output = handle_output; + vdev->vq[i].index = i; + + return &vdev->vq[i]; +} + +void virtio_notify(VirtIODevice *vdev, VirtQueue *vq) +{ + if (vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT) + return; + + vdev->isr = 1; + virtio_update_irq(vdev); +} + +VirtIODevice *virtio_init_pci(PCIBus *bus, const char *name, + uint16_t vendor, uint16_t device, + uint16_t subvendor, uint16_t subdevice, + size_t config_size, size_t struct_size) +{ + VirtIODevice *vdev; + PCIDevice *pci_dev; + uint8_t *config; + + pci_dev = pci_register_device(bus, name, struct_size, + -1, NULL, NULL); + vdev = to_virtio_device(pci_dev); + + vdev->status = 0; + vdev->isr = 0; + vdev->queue_sel = 0; + memset(vdev->vq, 0, sizeof(vdev->vq)); + + config = pci_dev->config; + config[0x00] = vendor & 0xFF; + config[0x01] = (vendor >> 8) & 0xFF; + config[0x02] = device & 0xFF; + config[0x03] = (device >> 8) & 0xFF; + + config[0x09] = 0x00; + config[0x0a] = 0x00; + config[0x0b] = 0x05; + config[0x0e] = 0x00; + + config[0x2c] = subvendor & 0xFF; + config[0x2d] = (subvendor >> 8) & 0xFF; + config[0x2e] = subdevice & 0xFF; + config[0x2f] = (subdevice >> 8) & 0xFF; + + config[0x3d] = 1; + + vdev->name = name; + vdev->config_len = config_size; + + pci_register_io_region(pci_dev, 0, 28, PCI_ADDRESS_SPACE_IO, virtio_map); + + return vdev; +} diff --git a/qemu/hw/virtio.h b/qemu/hw/virtio.h new file mode 100644 index 0000000..44d1245 --- /dev/null +++ b/qemu/hw/virtio.h @@ -0,0 +1,136 @@ +/* + * Virtio Support + * + * Copyright IBM, Corp. 2007 + * + * Authors: + * Anthony Liguori <ali...@us...> + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#ifndef _QEMU_VIRTIO_H +#define _QEMU_VIRTIO_H + +/* from Linux's linux/virtio_config.h */ + +/* Status byte for guest to report progress, and synchronize features. */ +/* We have seen device and processed generic fields (VIRTIO_CONFIG_F_VIRTIO) */ +#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1 +/* We have found a driver for the device. */ +#define VIRTIO_CONFIG_S_DRIVER 2 +/* Driver has used its parts of the config, and is happy */ +#define VIRTIO_CONFIG_S_DRIVER_OK 4 +/* We've given up on this device. */ +#define VIRTIO_CONFIG_S_FAILED 0x80 + +/* from Linux's linux/virtio_ring.h */ + +/* This marks a buffer as continuing via the next field. */ +#define VRING_DESC_F_NEXT 1 +/* This marks a buffer as write-only (otherwise read-only). */ +#define VRING_DESC_F_WRITE 2 + +/* This means don't notify other side when buffer added. */ +#define VRING_USED_F_NO_NOTIFY 1 +/* This means don't interrupt guest when buffer consumed. */ +#define VRING_AVAIL_F_NO_INTERRUPT 1 + +typedef struct VirtQueue VirtQueue; + +typedef struct VRingDesc +{ + uint64_t addr; + uint32_t len; + uint16_t flags; + uint16_t next; +} VRingDesc; + +typedef struct VRingAvail +{ + uint16_t flags; + uint16_t idx; + uint16_t ring[0]; +} VRingAvail; + +typedef struct VRingUsedElem +{ + uint32_t id; + uint32_t len; +} VRingUsedElem; + +typedef struct VRingUsed +{ + uint16_t flags; + uint16_t idx; + VRingUsedElem ring[0]; +} VRingUsed; + +typedef struct VRing +{ + unsigned int num; + VRingDesc *desc; + VRingAvail *avail; + VRingUsed *used; +} VRing; + +struct VirtQueue +{ + VRing vring; + uint32_t pfn; + uint16_t last_avail_idx; + void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq); + int index; +}; + +#define VIRTQUEUE_MAX_SIZE 127 + +typedef struct VirtQueueElement +{ + unsigned int index; + unsigned int out_num; + unsigned int in_num; + struct iovec in_sg[VIRTQUEUE_MAX_SIZE]; + struct iovec out_sg[VIRTQUEUE_MAX_SIZE]; +} VirtQueueElement; + +#define VIRTIO_PCI_QUEUE_MAX 16 + +struct VirtIODevice +{ + PCIDevice pci_dev; + const char *name; + uint32_t addr; + uint16_t vendor; + uint16_t device; + uint8_t status; + uint8_t isr; + uint16_t queue_sel; + uint32_t features; + uint32_t config_pfn; + uint32_t config_len; + uint32_t (*get_features)(VirtIODevice *vdev); + void (*set_features)(VirtIODevice *vdev, uint32_t val); + void (*update_config)(VirtIODevice *vdev, uint8_t *config); + VirtQueue vq[VIRTIO_PCI_QUEUE_MAX]; +}; + +VirtIODevice *virtio_init_pci(PCIBus *bus, const char *name, + uint16_t vendor, uint16_t device, + uint16_t subvendor, uint16_t subdevice, + size_t config_size, size_t struct_size); + +VirtQueue *virtio_add_queue(VirtIODevice *vdev, + void (*handle_output)(VirtIODevice *, + VirtQueue *)); + +void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem, + unsigned int len); + +int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem); + +void virtio_notify(VirtIODevice *vdev, VirtQueue *vq); + +#endif diff --git a/qemu/vl.h b/qemu/vl.h index 01aeabc..fafcf09 100644 --- a/qemu/vl.h +++ b/qemu/vl.h @@ -1392,6 +1392,10 @@ typedef struct ADBDevice ADBDevice; void pci_hypercall_init(PCIBus *bus); void vmchannel_init(CharDriverState *hd, uint32_t deviceid, uint32_t index); +/* virtio.c */ + +typedef struct VirtIODevice VirtIODevice; + /* buf = NULL means polling */ typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out, const uint8_t *buf, int len); |
From: Dor L. <dor...@gm...> - 2007-11-11 17:24:40
|
Anthony Liguori wrote: > This patch implements a very naive virtio block device backend in QEMU. > There's a lot of room for future optimization. We need to merge a -disk patch > before we can provide a mechanism to expose this to users. > > Signed-off-by: Anthony Liguori <ali...@us...> > > diff --git a/qemu/Makefile.target b/qemu/Makefile.target > index c7686b2..49c0fc7 100644 > [snip] > + > + if (1) { > + BlockDriverState *bs = bdrv_new("vda"); > + if (bdrv_open(bs, "/home/anthony/images/linux.img", BDRV_O_SNAPSHOT)) > + exit(1); > Can you add a printf to the exit(1). I had to gdb the code to find why my qemu is not running no more (in earlier version I did remember to change the path but not after the new patches. [snip] > + > +static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq) > +{ > + VirtIOBlock *s = to_virtio_blk(vdev); > + VirtQueueElement elem; > + unsigned int count; > + > + while ((count = virtqueue_pop(vq, &elem)) != 0) { > + struct virtio_blk_inhdr *in; > + struct virtio_blk_outhdr *out; > + unsigned int wlen; > + off_t off; > + int i; > + > + out = (void *)elem.out_sg[0].iov_base; > + in = (void *)elem.in_sg[elem.in_num - 1].iov_base; > + off = out->sector; > + > + if (out->type & VIRTIO_BLK_T_SCSI_CMD) { > + wlen = sizeof(*in); > + in->status = VIRTIO_BLK_S_UNSUPP; > + } else if (out->type & VIRTIO_BLK_T_OUT) { > + wlen = sizeof(*in); > + > + for (i = 1; i < elem.out_num; i++) { > + bdrv_write(s->bs, off, > + elem.out_sg[i].iov_base, > + elem.out_sg[i].iov_len / 512); > + off += elem.out_sg[i].iov_len / 512; > + } > + > + in->status = VIRTIO_BLK_S_OK; > + } else { > + wlen = sizeof(*in); > + > + for (i = 0; i < elem.in_num - 1; i++) { > + bdrv_read(s->bs, off, > + elem.in_sg[i].iov_base, > + elem.in_sg[i].iov_len / 512); > + off += elem.in_sg[i].iov_len / 512; > + wlen += elem.in_sg[i].iov_len; > + } > + > + in->status = VIRTIO_BLK_S_OK; > + } > + > + virtqueue_push(vq, &elem, wlen); > + virtio_notify(vdev, vq); > + } > You can move the notify out of the while loop. This way you save irqs. > +} > + > +static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) > +{ > + VirtIOBlock *s = to_virtio_blk(vdev); > + int64_t capacity; > + uint32_t v; > + > + bdrv_get_geometry(s->bs, &capacity); > + memcpy(config + VIRTIO_CONFIG_BLK_F_CAPACITY, &capacity, sizeof(capacity)); > + > + v = VIRTQUEUE_MAX_SIZE - 2; > + memcpy(config + VIRTIO_CONFIG_BLK_F_SEG_MAX, &v, sizeof(v)); > +} > + > +static uint32_t virtio_blk_get_features(VirtIODevice *vdev) > +{ > + return (1 << VIRTIO_BLK_F_SEG_MAX); > In general I think we need to add another feature or even version number ( I know you guys hate it). The reason is - Let's say you dont change functionality but change the irq protocol (for example the isr won't be zeroed on read), then an old guest driver wouldn't know it runs on a new host version and will have its irq line pulled up. So I suggest adding a capability of VIRTIO_ISR_CLEAR_XXX or adding a version number. Comments? > +} > + > +VirtIODevice *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t device, > + BlockDriverState *bs) > +{ > + VirtIOBlock *s; > + > + s = (VirtIOBlock *)virtio_init_pci(bus, "virtio-blk", vendor, device, > + vendor, VIRTIO_ID_BLOCK, > + 16, sizeof(VirtIOBlock)); > + > + s->vdev.update_config = virtio_blk_update_config; > + s->vdev.get_features = virtio_blk_get_features; > + s->bs = bs; > + > + virtio_add_queue(&s->vdev, virtio_blk_handle_output); > + > + return &s->vdev; > +} > diff --git a/qemu/vl.h b/qemu/vl.h > index fafcf09..249ede2 100644 > --- a/qemu/vl.h > +++ b/qemu/vl.h > @@ -1396,6 +1396,9 @@ void vmchannel_init(CharDriverState *hd, uint32_t deviceid, uint32_t index); > > typedef struct VirtIODevice VirtIODevice; > > +VirtIODevice *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t device, > + BlockDriverState *bs); > + > /* buf = NULL means polling */ > typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out, > const uint8_t *buf, int len); > > |
From: Dor L. <dor...@gm...> - 2007-11-11 17:31:21
|
Anthony Liguori wrote: > This patch implements a very naive virtio network device backend in QEMU. > Even with this simple implementation, it get's about 3x the throughput of > rtl8139. Of course, there's a whole lot of room for optimization to eliminate > the unnecessary copying and support more advanced features. > > To use a virtio network device, simply specific '-net nic,model=virtio' > > Signed-off-by: Anthony Liguori <ali...@us...> > [snip] > +/* TX */ > +static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq) > +{ > + VirtIONet *n = to_virtio_net(vdev); > + VirtQueueElement elem; > + > + while (virtqueue_pop(vq, &elem)) { > + int i; > + size_t len = 0; > + > + /* ignore the header for now */ > + for (i = 1; i < elem.out_num; i++) { > + qemu_send_packet(n->vc, elem.out_sg[i].iov_base, > + elem.out_sg[i].iov_len); > + len += elem.out_sg[i].iov_len; > + } > + > + virtqueue_push(vq, &elem, sizeof(struct virtio_net_hdr) + len); > + virtio_notify(&n->vdev, vq); > + } > +} > Get the notify out of the while please. Regardless of above, at the moment you copy the receive packets into virtio sg's. I had a did an incomplete job of adding vectored read to qemu. What's your opinion about it? Maybe using qemu's pending dma functions? Regards & cheers, Dor. > + > +VirtIODevice *virtio_net_init(PCIBus *bus, uint16_t vendor, uint16_t device, > + NICInfo *nd, int devfn) > +{ > + VirtIONet *n; > + > + n = (VirtIONet *)virtio_init_pci(bus, "virtio-net", vendor, device, > + vendor, VIRTIO_ID_NET, > + 6, sizeof(VirtIONet)); > + > + n->vdev.update_config = virtio_net_update_config; > + n->vdev.get_features = virtio_net_get_features; > + n->rx_vq = virtio_add_queue(&n->vdev, virtio_net_handle_rx); > + n->tx_vq = virtio_add_queue(&n->vdev, virtio_net_handle_tx); > + n->can_receive = 0; > + memcpy(n->mac, nd->macaddr, 6); > + n->vc = qemu_new_vlan_client(nd->vlan, virtio_net_receive, > + virtio_net_can_receive, n); > + > + return &n->vdev; > +} > diff --git a/qemu/vl.h b/qemu/vl.h > index 7b5cc8d..4f26fbb 100644 > --- a/qemu/vl.h > +++ b/qemu/vl.h > @@ -1401,6 +1401,9 @@ VirtIODevice *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t device, > > VirtIODevice *virtio_9p_init(PCIBus *bus, uint16_t vendor, uint16_t device); > > +VirtIODevice *virtio_net_init(PCIBus *bus, uint16_t vendor, uint16_t device, > + NICInfo *nd, int devfn); > + > /* buf = NULL means polling */ > typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out, > const uint8_t *buf, int len); > > |
From: Anthony L. <ali...@us...> - 2007-11-11 19:29:23
|
Dor Laor wrote: > Anthony Liguori wrote: >> This patch implements a very naive virtio network device backend in >> QEMU. >> Even with this simple implementation, it get's about 3x the >> throughput of >> rtl8139. Of course, there's a whole lot of room for optimization to >> eliminate >> the unnecessary copying and support more advanced features. >> >> To use a virtio network device, simply specific '-net nic,model=virtio' >> >> Signed-off-by: Anthony Liguori <ali...@us...> >> > [snip] > >> +/* TX */ >> +static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq) >> +{ >> + VirtIONet *n = to_virtio_net(vdev); >> + VirtQueueElement elem; >> + >> + while (virtqueue_pop(vq, &elem)) { >> + int i; >> + size_t len = 0; >> + >> + /* ignore the header for now */ >> + for (i = 1; i < elem.out_num; i++) { >> + qemu_send_packet(n->vc, elem.out_sg[i].iov_base, >> + elem.out_sg[i].iov_len); >> + len += elem.out_sg[i].iov_len; >> + } >> + >> + virtqueue_push(vq, &elem, sizeof(struct virtio_net_hdr) + len); >> + virtio_notify(&n->vdev, vq); >> + } >> +} >> > Get the notify out of the while please. > > Regardless of above, at the moment you copy the receive packets into > virtio sg's. > I had a did an incomplete job of adding vectored read to qemu. > What's your opinion about it? Maybe using qemu's pending dma functions? I did a quick hack where I had direct access to the tun's fd so that I could readv/writev directly. It didn't seem to help all that much but I'm not too surprised by that, it's was still synchronous. The dma API is definitely a step in the right direction. The way I'm approaching optimization is to first hack things up so that my particular case works, and then figure out how to fit it in within the rest of QEMU. For instance, I'm playing around right now with having the block driver open a file directly, and using the most recent linux-aio routines. It apparently now supports but fd notification (through eventfd) and asynchronous fdsync support (for barriers). I think we should be able to get very good performance with this. Once I get promising performance results, I'll figure out how to work it into the existing block API. Regards, Anthony Liguori > Regards & cheers, > Dor. >> + >> +VirtIODevice *virtio_net_init(PCIBus *bus, uint16_t vendor, uint16_t >> device, >> + NICInfo *nd, int devfn) >> +{ >> + VirtIONet *n; >> + >> + n = (VirtIONet *)virtio_init_pci(bus, "virtio-net", vendor, device, >> + vendor, VIRTIO_ID_NET, >> + 6, sizeof(VirtIONet)); >> + >> + n->vdev.update_config = virtio_net_update_config; >> + n->vdev.get_features = virtio_net_get_features; >> + n->rx_vq = virtio_add_queue(&n->vdev, virtio_net_handle_rx); >> + n->tx_vq = virtio_add_queue(&n->vdev, virtio_net_handle_tx); >> + n->can_receive = 0; >> + memcpy(n->mac, nd->macaddr, 6); >> + n->vc = qemu_new_vlan_client(nd->vlan, virtio_net_receive, >> + virtio_net_can_receive, n); >> + >> + return &n->vdev; >> +} >> diff --git a/qemu/vl.h b/qemu/vl.h >> index 7b5cc8d..4f26fbb 100644 >> --- a/qemu/vl.h >> +++ b/qemu/vl.h >> @@ -1401,6 +1401,9 @@ VirtIODevice *virtio_blk_init(PCIBus *bus, >> uint16_t vendor, uint16_t device, >> >> VirtIODevice *virtio_9p_init(PCIBus *bus, uint16_t vendor, uint16_t >> device); >> >> +VirtIODevice *virtio_net_init(PCIBus *bus, uint16_t vendor, uint16_t >> device, >> + NICInfo *nd, int devfn); >> + >> /* buf = NULL means polling */ >> typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out, >> const uint8_t *buf, int len); >> >> > |
From: Anthony L. <ali...@us...> - 2007-11-11 19:25:25
|
Dor Laor wrote: >> + >> + if (1) { >> + BlockDriverState *bs = bdrv_new("vda"); >> + if (bdrv_open(bs, "/home/anthony/images/linux.img", >> BDRV_O_SNAPSHOT)) >> + exit(1); >> > Can you add a printf to the exit(1). I had to gdb the code to find why > my qemu is not running no more (in earlier version > I did remember to change the path but not after the new patches. Heh, sorry about that. > [snip] >> + virtqueue_push(vq, &elem, wlen); >> + virtio_notify(vdev, vq); >> + } >> > You can move the notify out of the while loop. This way you save irqs. I don't think it does actually. This raises the line multiple times, but if the line is already raised, it's effectively a nop. Now, with KVM's in-kernel APIC, it ends up generating more syscalls than is necessary so it's worth changing. >> +} >> + >> +static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t >> *config) >> +{ >> + VirtIOBlock *s = to_virtio_blk(vdev); >> + int64_t capacity; >> + uint32_t v; >> + >> + bdrv_get_geometry(s->bs, &capacity); >> + memcpy(config + VIRTIO_CONFIG_BLK_F_CAPACITY, &capacity, >> sizeof(capacity)); >> + >> + v = VIRTQUEUE_MAX_SIZE - 2; >> + memcpy(config + VIRTIO_CONFIG_BLK_F_SEG_MAX, &v, sizeof(v)); >> +} >> + >> +static uint32_t virtio_blk_get_features(VirtIODevice *vdev) >> +{ >> + return (1 << VIRTIO_BLK_F_SEG_MAX); >> > In general I think we need to add another feature or even version > number ( I know you guys hate it). > The reason is - Let's say you dont change functionality but change the > irq protocol (for example the isr won't be zeroed on read), then an old > guest driver wouldn't know it runs on a new host version and will have > its irq line pulled up. > So I suggest adding a capability of VIRTIO_ISR_CLEAR_XXX or adding a > version number. > Comments? I don't think we'll actually change the ISR protocol. I think it's the best that it can actually be. However, if we do need to change the ABI for some reason, I think the right thing to do is just use a new device ID (since it's effectively a new device). Regards, Anthony Liguori >> +} >> + >> +VirtIODevice *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t >> device, >> + BlockDriverState *bs) >> +{ >> + VirtIOBlock *s; >> + >> + s = (VirtIOBlock *)virtio_init_pci(bus, "virtio-blk", vendor, >> device, >> + vendor, VIRTIO_ID_BLOCK, >> + 16, sizeof(VirtIOBlock)); >> + >> + s->vdev.update_config = virtio_blk_update_config; >> + s->vdev.get_features = virtio_blk_get_features; >> + s->bs = bs; >> + >> + virtio_add_queue(&s->vdev, virtio_blk_handle_output); >> + >> + return &s->vdev; >> +} >> diff --git a/qemu/vl.h b/qemu/vl.h >> index fafcf09..249ede2 100644 >> --- a/qemu/vl.h >> +++ b/qemu/vl.h >> @@ -1396,6 +1396,9 @@ void vmchannel_init(CharDriverState *hd, >> uint32_t deviceid, uint32_t index); >> >> typedef struct VirtIODevice VirtIODevice; >> >> +VirtIODevice *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t >> device, >> + BlockDriverState *bs); >> + >> /* buf = NULL means polling */ >> typedef int ADBDeviceRequest(ADBDevice *d, uint8_t *buf_out, >> const uint8_t *buf, int len); >> >> > |
From: Dor L. <dor...@gm...> - 2007-11-12 12:20:57
|
Anthony Liguori wrote: > Dor Laor wrote: >>> >> In general I think we need to add another feature or even version >> number ( I know you guys hate it). >> The reason is - Let's say you dont change functionality but change >> the irq protocol (for example the isr won't be zeroed on read), then >> an old >> guest driver wouldn't know it runs on a new host version and will >> have its irq line pulled up. >> So I suggest adding a capability of VIRTIO_ISR_CLEAR_XXX or adding a >> version number. >> Comments? > > I don't think we'll actually change the ISR protocol. I think it's > the best that it can actually be. However, if we do need to change > the ABI for some reason, I think the right thing to do is just use a > new device ID (since it's effectively a new device). > > Regards, > > Anthony Liguori Changing the devid is acceptable and much more easier then backward compatibility support, I prefer it too. Note that there are some disadvantages when changing the devid - For example, if you installed an old device drivers in the guest kernel and after a while you bring the guest down, upgrade the kvm host version and bring the guest back up. If it has a new device id (and since the abi is not maintained the driver won't match VENDOR=virtio; DEVID=*), then the guest won't have a driver for the new device. In that case I think a guest agent can switch to fully virtualized device and afterwards install the driver automatically. This is what we do in our production environment. Regards, Dor |
From: Rusty R. <ru...@ru...> - 2007-11-13 00:28:52
|
On Tuesday 13 November 2007 10:25:54 Anthony Liguori wrote: > Dor Laor wrote: > > Anthony Liguori wrote: > >> Dor Laor wrote: > >>> In general I think we need to add another feature or even version > >>> number ( I know you guys hate it). > >>> The reason is - Let's say you dont change functionality but change > >>> the irq protocol (for example the isr won't be zeroed on read), then > >>> an old > >>> guest driver wouldn't know it runs on a new host version and will > >>> have its irq line pulled up. > >>> So I suggest adding a capability of VIRTIO_ISR_CLEAR_XXX or adding a > >>> version number. > >>> Comments? > >> > >> I don't think we'll actually change the ISR protocol. I think it's > >> the best that it can actually be. However, if we do need to change > >> the ABI for some reason, I think the right thing to do is just use a > >> new device ID (since it's effectively a new device). > >> > >> Regards, > >> > >> Anthony Liguori > > > > Changing the devid is acceptable and much more easier then backward > > compatibility support, I prefer it too. > > Note that there are some disadvantages when changing the devid - For > > example, > > if you installed an old device drivers in the guest kernel and after a > > while you bring the guest down, > > upgrade the kvm host version and bring the guest back up. > > If it has a new device id (and since the abi is not maintained the > > driver won't match VENDOR=virtio; DEVID=*), > > then the guest won't have a driver for the new device. > > In that case I think a guest agent can switch to fully virtualized > > device and afterwards install the driver automatically. > > This is what we do in our production environment. > > Hrm, I have to think about backwards compatibility at the virtio-pci > layer. virtio-pci basically exposes two things, the first is an ABI for > doing bidirectional notification and setting driver status. The second > is a standard and transparent mechanism for virt_rings. > > I think that the first is simply enough that we don't need a feature > mask or a version number. Maybe perhaps with the status bits, I don't > know. For the virt_rings, if we had something more sophisticated than > virt_rings down the road, that can be discovered/configured in the > per-device configuration area so I don't think we need feature/version > info for that. I originally had feature bits on each virtqueue for just such a reason. Used the same "ack" logic as the normal feature bits (ie. set corresponding bit to indicate guest wants to use the feature). Probably worth engineering this in now. Cheers, Rusty. |
From: Avi K. <av...@qu...> - 2007-11-27 09:45:01
|
Rusty Russell wrote: > On Tuesday 13 November 2007 10:25:54 Anthony Liguori wrote: > >> Dor Laor wrote: >> >>> Anthony Liguori wrote: >>> >>>> Dor Laor wrote: >>>> >>>>> In general I think we need to add another feature or even version >>>>> number ( I know you guys hate it). >>>>> The reason is - Let's say you dont change functionality but change >>>>> the irq protocol (for example the isr won't be zeroed on read), then >>>>> an old >>>>> guest driver wouldn't know it runs on a new host version and will >>>>> have its irq line pulled up. >>>>> So I suggest adding a capability of VIRTIO_ISR_CLEAR_XXX or adding a >>>>> version number. >>>>> Comments? >>>>> >>>> I don't think we'll actually change the ISR protocol. I think it's >>>> the best that it can actually be. However, if we do need to change >>>> the ABI for some reason, I think the right thing to do is just use a >>>> new device ID (since it's effectively a new device). >>>> >>>> Regards, >>>> >>>> Anthony Liguori >>>> >>> Changing the devid is acceptable and much more easier then backward >>> compatibility support, I prefer it too. >>> Note that there are some disadvantages when changing the devid - For >>> example, >>> if you installed an old device drivers in the guest kernel and after a >>> while you bring the guest down, >>> upgrade the kvm host version and bring the guest back up. >>> If it has a new device id (and since the abi is not maintained the >>> driver won't match VENDOR=virtio; DEVID=*), >>> then the guest won't have a driver for the new device. >>> In that case I think a guest agent can switch to fully virtualized >>> device and afterwards install the driver automatically. >>> This is what we do in our production environment. >>> >> Hrm, I have to think about backwards compatibility at the virtio-pci >> layer. virtio-pci basically exposes two things, the first is an ABI for >> doing bidirectional notification and setting driver status. The second >> is a standard and transparent mechanism for virt_rings. >> >> I think that the first is simply enough that we don't need a feature >> mask or a version number. Maybe perhaps with the status bits, I don't >> know. For the virt_rings, if we had something more sophisticated than >> virt_rings down the road, that can be discovered/configured in the >> per-device configuration area so I don't think we need feature/version >> info for that. >> > > I originally had feature bits on each virtqueue for just such a reason. Used > the same "ack" logic as the normal feature bits (ie. set corresponding bit to > indicate guest wants to use the feature). > > Probably worth engineering this in now. > > I don't think a per-queue feature bit is worthwhile. What we need is feature bits for the ABI. I'd start with one feature bit for the base device ABI. We can add further bits if we add new descriptor types, new queues, or newer config features The feature bits would be explicitly named and documented, instead of "I have a third virtqueue so I can use this new ABI". -- error compiling committee.c: too many arguments to function |
From: Daniel P. B. <ber...@re...> - 2007-11-12 15:02:17
|
On Sat, Nov 10, 2007 at 09:20:36PM -0600, Anthony Liguori wrote: > This patch implements a very naive virtio block device backend in QEMU. > There's a lot of room for future optimization. We need to merge a -disk patch > before we can provide a mechanism to expose this to users. The latest generation of -disk patches posted to upstream qemu-devel are looking quite promising & with any luck should get accepted for merge in the very near future. Are these block & net drivers able todo hot-plug/unplug after a domain has been created ? If so we'd also want to have disk_add/disk_remove and net_add/net_remove monitor commands (cf usb_add/usb_remove) for on-the-fly changes. Regards Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=| |
From: Avi K. <av...@qu...> - 2007-11-12 15:06:12
|
Daniel P. Berrange wrote: > On Sat, Nov 10, 2007 at 09:20:36PM -0600, Anthony Liguori wrote: > >> This patch implements a very naive virtio block device backend in QEMU. >> There's a lot of room for future optimization. We need to merge a -disk patch >> before we can provide a mechanism to expose this to users. >> > > The latest generation of -disk patches posted to upstream qemu-devel are > looking quite promising & with any luck should get accepted for merge in > the very near future. > > Are these block & net drivers able todo hot-plug/unplug after a domain > has been created ? If so we'd also want to have disk_add/disk_remove > and net_add/net_remove monitor commands (cf usb_add/usb_remove) for > on-the-fly changes. > Some poor soul would need to add this functionality to kvm's acpi tables. -- error compiling committee.c: too many arguments to function |
From: Anthony L. <ali...@us...> - 2007-11-12 23:26:15
|
Avi Kivity wrote: > Daniel P. Berrange wrote: >> On Sat, Nov 10, 2007 at 09:20:36PM -0600, Anthony Liguori wrote: >> >>> This patch implements a very naive virtio block device backend in QEMU. >>> There's a lot of room for future optimization. We need to merge a >>> -disk patch >>> before we can provide a mechanism to expose this to users. >>> >> >> The latest generation of -disk patches posted to upstream qemu-devel are >> looking quite promising & with any luck should get accepted for merge in >> the very near future. >> >> Are these block & net drivers able todo hot-plug/unplug after a domain >> has been created ? If so we'd also want to have disk_add/disk_remove >> and net_add/net_remove monitor commands (cf usb_add/usb_remove) for >> on-the-fly changes. >> > > Some poor soul would need to add this functionality to kvm's acpi tables. I don't think there's anything I need to do in virtio-pci to support this. I would just see a ->probe() event I imagine. Regards, Anthony Liguori |
From: Anthony L. <ali...@us...> - 2007-11-12 23:26:15
|
Dor Laor wrote: > Anthony Liguori wrote: >> Dor Laor wrote: >>>> >>> In general I think we need to add another feature or even version >>> number ( I know you guys hate it). >>> The reason is - Let's say you dont change functionality but change >>> the irq protocol (for example the isr won't be zeroed on read), then >>> an old >>> guest driver wouldn't know it runs on a new host version and will >>> have its irq line pulled up. >>> So I suggest adding a capability of VIRTIO_ISR_CLEAR_XXX or adding a >>> version number. >>> Comments? >> >> I don't think we'll actually change the ISR protocol. I think it's >> the best that it can actually be. However, if we do need to change >> the ABI for some reason, I think the right thing to do is just use a >> new device ID (since it's effectively a new device). >> >> Regards, >> >> Anthony Liguori > Changing the devid is acceptable and much more easier then backward > compatibility support, I prefer it too. > Note that there are some disadvantages when changing the devid - For > example, > if you installed an old device drivers in the guest kernel and after a > while you bring the guest down, > upgrade the kvm host version and bring the guest back up. > If it has a new device id (and since the abi is not maintained the > driver won't match VENDOR=virtio; DEVID=*), > then the guest won't have a driver for the new device. > In that case I think a guest agent can switch to fully virtualized > device and afterwards install the driver automatically. > This is what we do in our production environment. Hrm, I have to think about backwards compatibility at the virtio-pci layer. virtio-pci basically exposes two things, the first is an ABI for doing bidirectional notification and setting driver status. The second is a standard and transparent mechanism for virt_rings. I think that the first is simply enough that we don't need a feature mask or a version number. Maybe perhaps with the status bits, I don't know. For the virt_rings, if we had something more sophisticated than virt_rings down the road, that can be discovered/configured in the per-device configuration area so I don't think we need feature/version info for that. Rusty, Avi: any thoughts? I'm a bit on the fence. Regards, Anthony Liguori > Regards, > Dor |