From: Hollis B. <ho...@us...> - 2008-05-14 20:03:31

On Wednesday 14 May 2008 14:10:06 Jan Kiszka wrote:
> Hollis Blanchard wrote:
> > On Wednesday 14 May 2008 10:28:51 Jan Kiszka wrote:
> > > So gdb on power relies only on those few hw-breakpoints? With x86 you
> > > can perfectly run gdb (with soft BPs) in parallel with the gdbstub
> > > (currently based on hw-BPs, but the same would be true for soft-BPs
> > > inserted by the gdbstub).
> >
> > GDB on Power inserts trap instructions, i.e. standard "soft" breakpoints.
> > It does not rely on the hardware breakpoints.
> >
> > It gets a little more complicated when you involve a GDB stub. IIRC, GDB
> > will ask the stub to set the breakpoint, and if the stub doesn't support
> > it, GDB will fall back to overwriting the instructions in memory. Does
> > the Qemu GDB stub advertise breakpoint support?
>
> Yes, QEMU reacts on both Z0 (soft-BP) and Z1 (hard-BP). That's something
> even gdbserver does not do! It just handles watchpoints (Z2..4).
>
> > If not, the only support needed in KVM would be to send all debug
> > interrupts to qemu, and allow qemu to send them back down for in-guest
> > breakpoints.
>
> Simply returning "unsupported" on Z0 is not yet enough for x86; KVM's
> kernel side does not yet inform QEMU about soft-BP exceptions. But in
> theory, this should be easily fixable (or is already the case for other
> archs). And it would save us from keeping track of N software
> breakpoints, where N could even become larger than 32, the current
> hardcoded limit for plain QEMU. :)
>
> Meanwhile I realized that the proposed KVM_DEBUG_GUEST API is
> insufficient: we need a return channel for the debug register state
> (specifically to figure out details about hit watchpoints). I'm now
> favoring KVM_SET_DEBUG and KVM_GET_DEBUG as two new IOCTLs, enabling us
> to write _and_ read back the suggested data structure.

How about simply extending kvm_exit.debug to contain the virtual address of the breakpoint hit?

In Qemu, when exit_reason == KVM_EXIT_DEBUG, it would just need to see whether that address is for a breakpoint Qemu set or not. If so, it's happy. If not, (commence handwaving) tell KVM to forward the debug interrupt to the guest. This way, the list of breakpoints is maintained in userspace (in the qemu gdb stub), which is nice because it could be arbitrarily large.

Also, this is not specific to hardware debug registers: soft and hard breakpoint interrupts would follow the same path. There's still a question of whether the GDB stub should set the breakpoint itself (Z0/Z1) or force GDB to modify memory, but either way the KVM code is simple.

-- Hollis Blanchard
IBM Linux Technology Center
From: Jan K. <jan...@we...> - 2008-05-14 19:50:09

Hollis Blanchard wrote:
> On Wednesday 14 May 2008 14:10:06 Jan Kiszka wrote:
>> Hollis Blanchard wrote:
>>> On Wednesday 14 May 2008 10:28:51 Jan Kiszka wrote:
>>>> So gdb on power relies only on those few hw-breakpoints? With x86 you
>>>> can perfectly run gdb (with soft BPs) in parallel with the gdbstub
>>>> (currently based on hw-BPs, but the same would be true for soft-BPs
>>>> inserted by the gdbstub).
>>>
>>> GDB on Power inserts trap instructions, i.e. standard "soft" breakpoints.
>>> It does not rely on the hardware breakpoints.
>>>
>>> It gets a little more complicated when you involve a GDB stub. IIRC, GDB
>>> will ask the stub to set the breakpoint, and if the stub doesn't support
>>> it, GDB will fall back to overwriting the instructions in memory. Does
>>> the Qemu GDB stub advertise breakpoint support?
>>
>> Yes, QEMU reacts on both Z0 (soft-BP) and Z1 (hard-BP). That's something
>> even gdbserver does not do! It just handles watchpoints (Z2..4).
>>
>>> If not, the only support needed in KVM would be to send all debug
>>> interrupts to qemu, and allow qemu to send them back down for in-guest
>>> breakpoints.
>>
>> Simply returning "unsupported" on Z0 is not yet enough for x86; KVM's
>> kernel side does not yet inform QEMU about soft-BP exceptions. But in
>> theory, this should be easily fixable (or is already the case for other
>> archs). And it would save us from keeping track of N software
>> breakpoints, where N could even become larger than 32, the current
>> hardcoded limit for plain QEMU. :)
>>
>> Meanwhile I realized that the proposed KVM_DEBUG_GUEST API is
>> insufficient: we need a return channel for the debug register state
>> (specifically to figure out details about hit watchpoints). I'm now
>> favoring KVM_SET_DEBUG and KVM_GET_DEBUG as two new IOCTLs, enabling us
>> to write _and_ read back the suggested data structure.
>
> How about simply extending kvm_exit.debug to contain the virtual address
> of the breakpoint hit?

Ah, there is an interface for such stuff already! And it can even take quite some payload...

> In Qemu, when exit_reason == KVM_EXIT_DEBUG, it would just need to see
> whether that address is for a breakpoint Qemu set or not. If so, it's
> happy. If not, (commence handwaving) tell KVM to forward the debug
> interrupt to the guest. This way, the list of breakpoints is maintained
> in userspace (in the qemu gdb stub), which is nice because it could be
> arbitrarily large.

Yes, but I would rather pass the debug registers (more generally: some arch-dependent state set) back in this slot. Those contain everything the gdbstub needs to know to catch relevant hardware-BP/watchpoint events (and report them to the gdb frontend).

> Also, this is not specific to hardware debug registers: soft and hard
> breakpoint interrupts would follow the same path. There's still a question
> of whether the GDB stub should set the breakpoint itself (Z0/Z1) or force
> GDB to modify memory, but either way the KVM code is simple.

Only rejecting Z0 will enable us to avoid any soft-BP tracking in qemu-kvm, and that is definitely my plan. Z1 may become an option to add later on and would be answered as "unsupported" for now.

Jan
From: Jan K. <jan...@we...> - 2008-05-14 19:10:10

Hollis Blanchard wrote:
> On Wednesday 14 May 2008 10:28:51 Jan Kiszka wrote:
>> So gdb on power relies only on those few hw-breakpoints? With x86 you
>> can perfectly run gdb (with soft BPs) in parallel with the gdbstub
>> (currently based on hw-BPs, but the same would be true for soft-BPs
>> inserted by the gdbstub).
>
> GDB on Power inserts trap instructions, i.e. standard "soft" breakpoints.
> It does not rely on the hardware breakpoints.
>
> It gets a little more complicated when you involve a GDB stub. IIRC, GDB
> will ask the stub to set the breakpoint, and if the stub doesn't support
> it, GDB will fall back to overwriting the instructions in memory. Does
> the Qemu GDB stub advertise breakpoint support?

Yes, QEMU reacts on both Z0 (soft-BP) and Z1 (hard-BP). That's something even gdbserver does not do! It just handles watchpoints (Z2..4).

> If not, the only support needed in KVM would be to send all debug
> interrupts to qemu, and allow qemu to send them back down for in-guest
> breakpoints.

Simply returning "unsupported" on Z0 is not yet enough for x86; KVM's kernel side does not yet inform QEMU about soft-BP exceptions. But in theory, this should be easily fixable (or is already the case for other archs). And it would save us from keeping track of N software breakpoints, where N could even become larger than 32, the current hardcoded limit for plain QEMU. :)

Meanwhile I realized that the proposed KVM_DEBUG_GUEST API is insufficient: we need a return channel for the debug register state (specifically to figure out details about hit watchpoints). I'm now favoring KVM_SET_DEBUG and KVM_GET_DEBUG as two new IOCTLs, enabling us to write _and_ read back the suggested data structure.

Jan
From: Muli Ben-Y. <mu...@il...> - 2008-05-14 18:31:52

On Wed, May 14, 2008 at 06:09:42PM +0300, Dor Laor wrote:
>> Do you have any performance numbers for networking to see how it
>> compares to the real hardware?
>>
>> - Linux host (or: real Windows running on that host)
>
> For host you can measure yourself, but a Linux guest (to host) currently
> does about 1G; using TSO (work in progress) it can do 2.5G, and there is
> also work in progress to make the kernel know virtio through the tap
> interface, which will further boost performance.

... with what kind of CPU utilization?

>> - PV Windows (network driver)
>
> About 700Mb +-; there is currently an extra copy that we need to omit.
> Thanks to Anthony, we just have to change the driver.

Same question (although it's less interesting if we can't even saturate the pipe).

>> - non-PV Windows
>
> What do you mean? Other fully emulated NICs like e1000? It does not
> perform as well as PV, but depending on the guest it can do up to
> 600Mb +-.

Same question (although again it's less interesting if we can't even saturate the pipe).

Cheers, Muli
From: Linus T. <tor...@li...> - 2008-05-14 18:27:38

On Wed, 14 May 2008, Christoph Lameter wrote:
>
> The problem is that the code in rmap.c try_to_unmap() and friends loops
> over reverse maps after taking a spinlock. The mm_struct is only known
> after the rmap has been accessed. This means *inside* the spinlock.

So you queue them. That's what we do with things like the dirty bit. We need to hold various spinlocks to look up pages, but then we can't actually call the filesystem with the spinlock held. Converting a spinlock to a waiting lock for things like that is simply not acceptable. You have to work with the system.

Yeah, there's only a single bit worth of information on whether a page is dirty or not, so "queueing" that information is trivial (it's just the return value from "page_mkclean_file()"). Some things are harder than others, and I suspect you need some kind of "gather" structure to queue up all the vmas that can be affected.

But it sounds like for the case of rmap, the approach would be:

 - The page lock is the higher-level "sleeping lock" (which makes sense,
   since this is very close to an IO event, and that is what the page lock
   is generally used for). But hey, it could be anything else - maybe you
   have some other even bigger lock to allow you to handle lots of pages
   in one go.

 - With that lock held, you do the whole rmap dance (which requires
   spinlocks) and gather up the vmas and the struct mms involved.

 - Outside the spinlocks you then do whatever it is you need to do.

This doesn't sound all that different from TLB shoot-down in SMP, and the "mmu_gather" structure. Now, admittedly we can do the TLB shoot-down while holding the spinlocks, but if we couldn't, that's how we'd still do it: it would get more involved (because we'd need to guarantee that the gather can hold *all* the pages - right now we can just flush in the middle if we need to), but it wouldn't be all that fundamentally different.

And no, I really haven't even wanted to look at what XPMEM really needs to do, so maybe the above thing doesn't work for you, and you have other issues. I'm just pointing you in a general direction, not trying to say "this is exactly how to get there".

Linus
From: Hollis B. <ho...@us...> - 2008-05-14 18:27:06

On Wednesday 14 May 2008 10:28:51 Jan Kiszka wrote:
> So gdb on power relies only on those few hw-breakpoints? With x86 you
> can perfectly run gdb (with soft BPs) in parallel with the gdbstub
> (currently based on hw-BPs, but the same would be true for soft-BPs
> inserted by the gdbstub).

GDB on Power inserts trap instructions, i.e. standard "soft" breakpoints. It does not rely on the hardware breakpoints.

It gets a little more complicated when you involve a GDB stub. IIRC, GDB will ask the stub to set the breakpoint, and if the stub doesn't support it, GDB will fall back to overwriting the instructions in memory. Does the Qemu GDB stub advertise breakpoint support?

If not, the only support needed in KVM would be to send all debug interrupts to qemu, and allow qemu to send them back down for in-guest breakpoints.

-- Hollis Blanchard
IBM Linux Technology Center
From: Christoph L. <cla...@sg...> - 2008-05-14 17:57:17

On Wed, 14 May 2008, Linus Torvalds wrote:
> One thing to realize is that most of the time (read: pretty much *always*)
> when we have the problem of wanting to sleep inside a spinlock, the
> solution is actually to just move the sleeping to outside the lock, and
> then have something else that serializes things.

The problem is that the code in rmap.c try_to_unmap() and friends loops over reverse maps after taking a spinlock. The mm_struct is only known after the rmap has been accessed. This means *inside* the spinlock. That is why I tried to convert the locks used to scan the reverse maps to semaphores. If that is done, then one can indeed do the callouts outside of atomic contexts.

> Can it be done? I don't know. But I do know that I'm unlikely to accept a
> noticeable slowdown in some very core code for a case that affects about
> 0.00001% of the population. In other words, I think you *have* to do it.

With a larger number of processors, semaphores make a lot of sense, since the holdoff times on spinlocks will increase. If we go to sleep, then the processor can do something useful instead of hogging a cacheline. A rw lock there can also increase concurrency during reclaim, especially if the anon_vma chains and the number of address spaces mapping a page are high.
From: Anthony L. <an...@co...> - 2008-05-14 17:50:18

Dor Laor wrote:
> On Wed, 2008-05-14 at 17:49 +0200, Tomasz Chmielewski wrote:
>> Dor Laor schrieb:
>>
>> (...)
>>
>>>> - PV Windows (network driver)
>>>
>>> About 700Mb +-; there is currently an extra copy that we need to omit.
>>> Thanks to Anthony, we just have to change the driver.
>>>
>>>> - non-PV Windows
>>>
>>> What do you mean? Other fully emulated NICs like e1000? It does not
>>> perform as well as PV, but depending on the guest it can do up to
>>> 600Mb +-.
>>
>> Just generally, how Windows PV drivers help to improve network performance.
>>
>> So, a PV network driver can do about 700Mb/s, and an emulated NIC can do
>> about 600Mb/s, Windows guest to host?
>>
>> That would be about 20% improvement?

FWIW, virtio-net is much better with my patches applied. The difference between e1000 and virtio-net is that e1000 consumes almost twice as much CPU as virtio-net, so in my testing, the performance improvement with virtio-net is about 2x. We were losing about 20-30% throughput because of the delays in handling incoming packets.

Regards,

Anthony Liguori

> It's work in progress: doing zero copy in the guest, adding TSO, and
> using virtio'd tap will drastically boost performance. There is no reason
> the performance won't match a Linux guest. Also, I don't exactly remember
> the numbers, but the gain in the tx path is greater.
From: Linus T. <tor...@li...> - 2008-05-14 16:57:10

On Wed, 14 May 2008, Robin Holt wrote:
>
> Would it be acceptable to always put a sleepable stall in, even if the
> code path did not require the pages be unwritable prior to continuing?
> If we did that, I would be freed from having a pool of invalidate
> threads ready for XPMEM to use for that work. Maybe there is a better
> way, but the sleeping requirement we would have on the threads makes
> most options seem unworkable.

I'm not understanding the question. If you can do your management outside of the spinlocks, then you can obviously do whatever you want, including sleeping. It's changing the existing spinlocks to be sleepable that is not acceptable, because it's such a performance problem.

Linus
From: Jerone Y. <jy...@us...> - 2008-05-14 16:46:31

On Wed, 2008-05-14 at 17:28 +0200, Jan Kiszka wrote:
> Jerone Young wrote:
>> On Mon, 2008-05-12 at 13:34 +0200, Jan Kiszka wrote:
>>> Hi,
>>>
>>> before going wild with my idea, I would like to collect some comments
>>> on this approach:
>>>
>>> While doing first kernel debugging with my debug register patches for
>>> kvm, I quickly ran into the 4-breakpoints-only limitation that comes
>>> from the fact that we blindly map software to hardware breakpoints.
>>> Unhandy, simply suboptimal. Also, having 4 breakpoint slots hard-coded
>>> in the generic interface is not fair to archs that may support more.
>>> Moreover, we do not support watchpoints although this would easily be
>>> feasible. But if we supported watchpoints (via debug registers on x86),
>>> we would need to break out of the 4-slot limitation even earlier. In
>>> short, I came to the conclusion that a rewrite of the KVM_DEBUG_GUEST
>>> interface is required.
>>
>> So embedded Power is also limited to 4 hardware registers for
>> breakpoints, but there are 2 separate registers for watchpoints. The
>> reason to use the registers is that the hardware does the work for you
>> and (at least on Power) will throw an exception or trap. Then you deal
>> with it.
>>
>> But you still face the fact that you can only have a small number of
>> breakpoints & watchpoints. Also, you cannot use gdb in the guest at the
>> same time as using the gdb stub on the guest itself (as there is only
>> one set of registers).
>
> So gdb on power relies only on those few hw-breakpoints? With x86 you
> can perfectly run gdb (with soft BPs) in parallel with the gdbstub
> (currently based on hw-BPs, but the same would be true for soft-BPs
> inserted by the gdbstub).
>
>>> Why do we set breakpoints in the kernel? Why not simply catch all
>>> debug traps, insert software breakpoint ops into the guest code, and
>>> handle all this stuff as normal debuggers do? And the hardware
>>> breakpoints should just be pushed through the kernel interface like
>>> ptrace does.
>>
>> See above... But the CPU basically does the work for you. So you don't
>> have to try and go through and first insert a trap into the code in
>> memory. But then you have to remember the code that you replaced with
>> the trap and execute it after you handle the trap. This can get a
>> little hairy.
>
> I cannot imagine that this is so hairy. It is basically daily (x86-)
> debugger business. Maybe we need to handle it differently if other
> arches prefer their own way. But for x86 I don't see a need to restrict
> ourselves to using hw-BPs _only_.
>
>> Currently I'm actually implementing breakpoint support now in Power. But
>> you do have to create some mappings to handle traps and see if you put
>> the trap there, and execute the code you replaced. Also, what if the
>> breakpoint is removed? Then you have to go back through and actually
>> replace the trap code. Doesn't sound hard, but I'm not sure of all the
>> pitfalls.
>
> Again, this /should/ not be different from what gdb does to applications
> or kgdb does to the kernel. (Looks like I need to get my feet wet
> soon. :) )
>
>>> The new KVM_DEBUG_GUEST interface I currently have in mind would look
>>> like this:
>>>
>>> #define KVM_DBGGUEST_ENABLE 0x01
>>> #define KVM_DBGGUEST_SINGLESTEP 0x02
>>>
>>> struct kvm_debug_guest {
>>> 	__u32 control;
>>> 	struct kvm_debug_guest_arch arch;
>>> }
>>>
>>> Setting KVM_DBGGUEST_ENABLE would forward all debug-related traps to
>>> userspace first, which can then decide to handle or re-inject them.
>>> KVM_DBGGUEST_SINGLESTEP would work as before. And the extension for
>>> x86 would look like this:
>>>
>>> struct kvm_debug_guest_arch {
>>> 	__u32 use_hw_breakpoints;
>>> 	__u64 debugreg[8];
>>> }
>>>
>>> If use_hw_breakpoints is non-zero, KVM would completely overwrite the
>>> guest's debug registers with the content of debugreg, giving full
>>> control of this feature to the host-side debugger (faking the content
>>> of debug registers, effectively disabling them for the guest - as we
>>> now do all the time).
>>
>> Hmmm... so today at least the gdbstub in qemu does not inject traps and
>> track code that it trapped (I could be mistaken). This would all need
>> to be implemented as well.
>
> gdbstub inserts "virtual" traps today, i.e. a call from the translated
> guest code to a helper which signals the breakpoint to the stub. And I
> don't want to change this. I want to add the BP injection/removal to
> qemu-kvm as it already takes over breakpoint (and soon also watchpoint)
> maintenance from qemu.
>
> However, would the proposed interface for KVM_DEBUG_GUEST (with an
> appropriate kvm_debug_guest_arch for power) restrict your plans in any
> way?

I think you should go ahead and go for it. I will be required to make changes around it for the use of hardware breakpoints if it goes in. But honestly, it would be better to only use software breakpoints if they work, as opposed to hardware breakpoints, due to the limits.

> Jan
From: Alex W. <ale...@hp...> - 2008-05-14 16:40:16

Trivial build warning fixes when the local DEBUG define is enabled.

Signed-off-by: Alex Williamson <ale...@hp...>
--

diff --git a/qemu/hw/acpi.c b/qemu/hw/acpi.c
index c4419c4..c305702 100644
--- a/qemu/hw/acpi.c
+++ b/qemu/hw/acpi.c
@@ -586,7 +586,7 @@ static uint32_t gpe_readb(void *opaque, uint32_t addr)
     }
 
 #if defined(DEBUG)
-    printf("gpe read %lx == %lx\n", addr, val);
+    printf("gpe read %x == %x\n", addr, val);
 #endif
     return val;
 }
@@ -619,7 +619,7 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
     }
 
 #if defined(DEBUG)
-    printf("gpe write %lx <== %d\n", addr, val);
+    printf("gpe write %x <== %d\n", addr, val);
 #endif
 }
@@ -639,7 +639,7 @@ static uint32_t pcihotplug_read(void *opaque, uint32_t addr)
     }
 
 #if defined(DEBUG)
-    printf("pcihotplug read %lx == %lx\n", addr, val);
+    printf("pcihotplug read %x == %x\n", addr, val);
 #endif
     return val;
 }
@@ -657,14 +657,14 @@ static void pcihotplug_write(void *opaque, uint32_t addr, uint32_t val)
     }
 
 #if defined(DEBUG)
-    printf("pcihotplug write %lx <== %d\n", addr, val);
+    printf("pcihotplug write %x <== %d\n", addr, val);
 #endif
 }
 
 static uint32_t pciej_read(void *opaque, uint32_t addr)
 {
 #if defined(DEBUG)
-    printf("pciej read %lx == %lx\n", addr, val);
+    printf("pciej read %x\n", addr);
 #endif
     return 0;
 }
@@ -676,7 +676,7 @@ static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
     device_hot_remove_success(0, slot);
 
 #if defined(DEBUG)
-    printf("pciej write %lx <== %d\n", addr, val);
+    printf("pciej write %x <== %d\n", addr, val);
 #endif
 }
From: Fabrice B. <fa...@be...> - 2008-05-14 16:25:05

Paul Brook wrote:
>> I suggested it because my original plan for the configuration file was
>> based on this syntax, with a strong inspiration from the OpenFirmware
>> device tree. The idea was that the object name ("drive" here) had no
>> hardcoded meaning, except for some predefined object names in order to
>> keep a kind of backward compatibility with the current QEMU options. In
>> order to create a new drive, for example, you just have to do:
>>
>> mydrive.class=drive
>> mydrive.if=scsi
>> mydrive.file=abc.img
>>
>> The "class" field is used to select the device model. Then all the other
>> parameters are used to initialize the device model. That way it is
>> possible to keep compatibility with the existing options and add a
>> provision to instantiate arbitrary new device models, such as:
>
> I like the idea, but I'm not so keen on the automatic allocation. I
> generally prefer explicit declaration over implicit things. The latter
> makes it very easy to not notice when you make a typo.
>
> It sounds like what you really want is something similar to an OF device
> tree. So you have something like:
>
> # pciide0 may be an alias (possibly provided by qemu)
> # e.g. pci0.slot1.func1.ide
> alias hda ide0.primary.master
>
> hda.type=disk
> hda.file=foo.img
>
> You can then define some form of magic aliases that select the next
> unused device, e.g.
>
> alias mydrive $next_ide_disk
>
> IMHO this provides the flexibility and structure that Fabrice is talking
> about, and with suitable aliases can be made to look a lot like the
> existing options.

Right. It is my intent too to allow aliases to keep the same "familiar" names as the command line. Moreover, the tree you suggest is necessary in order to derive the device instantiation order. In my idea, the tree has no relation to the actual device connections, which are specified by explicit fields such as slots, functions, interface indexes, disk indexes or anything else.

An interesting shortcut could be to automatically define a field "index" if the device name terminates with a number (if I remember correctly, OpenFirmware does something like this).

The initialization phase would consist of traversing the tree recursively and instantiating a device for all nodes containing a "class" (or "type" if you prefer) field. The parents would be instantiated before the children to ensure a coherent initialization order.

Regarding the syntax, quoted strings must be supported of course. I don't think there is great complexity in that :-) A cpp-like preprocessing step can be added, but it can be done later.

> This may require some internal restructuring to allow the machine
> descriptions to feed into the user config file.

Hopefully it is not necessary to fully implement the proposal now. But ultimately, each QEMU device would have to register its class name and an instantiation function. The machine descriptions would have to predefine some object names so that the user can modify parameters.

Regards,

Fabrice.
From: Dor L. <dor...@qu...> - 2008-05-14 16:23:04

On Wed, 2008-05-14 at 17:49 +0200, Tomasz Chmielewski wrote:
> Dor Laor schrieb:
>
> (...)
>
>>> - PV Windows (network driver)
>>
>> About 700Mb +-; there is currently an extra copy that we need to omit.
>> Thanks to Anthony, we just have to change the driver.
>>
>>> - non-PV Windows
>>
>> What do you mean? Other fully emulated NICs like e1000? It does not
>> perform as well as PV, but depending on the guest it can do up to
>> 600Mb +-.
>
> Just generally, how Windows PV drivers help to improve network performance.
>
> So, a PV network driver can do about 700Mb/s, and an emulated NIC can do
> about 600Mb/s, Windows guest to host?
>
> That would be about 20% improvement?

It's work in progress: doing zero copy in the guest, adding TSO, and using virtio'd tap will drastically boost performance. There is no reason the performance won't match a Linux guest. Also, I don't exactly remember the numbers, but the gain in the tx path is greater.
From: Robin H. <ho...@sg...> - 2008-05-14 16:22:25

On Wed, May 14, 2008 at 08:18:21AM -0700, Linus Torvalds wrote:
> On Wed, 14 May 2008, Robin Holt wrote:
>>
>> Are you suggesting the sending side would not need to sleep, or the
>> receiving side?
>
> One thing to realize is that most of the time (read: pretty much
> *always*) when we have the problem of wanting to sleep inside a
> spinlock, the solution is actually to just move the sleeping to outside
> the lock, and then have something else that serializes things.
>
> That way, the core code (protected by the spinlock, and in all the hot
> paths) doesn't sleep, but the special-case code (that wants to sleep)
> can have some other model of serialization that allows sleeping, and
> that includes as a small part the spinlocked region.
>
> I do not know how XPMEM actually works, or how you use it, but it
> seriously sounds like that is how things *should* work. And yes, that
> probably means that the mmu-notifiers as they are now are simply not
> workable: they'd need to be moved up so that they are inside the mmap
> semaphore but not the spinlocks.

We are in the process of attempting this now. Unfortunately for SGI, Christoph is on vacation right now, so we have been trying to work it internally. We are looking at two possible methods: in one, we add a callout to the TLB flush paths for both the mmu_gather and flush_tlb_page locations; in the other, we place a specific callout, separate from the gather callouts, in the paths we are concerned with. We will look at both more carefully before posting. In either implementation, not all call paths would require the stall to ensure data integrity.

Would it be acceptable to always put a sleepable stall in, even if the code path did not require the pages be unwritable prior to continuing? If we did that, I would be freed from having a pool of invalidate threads ready for XPMEM to use for that work. Maybe there is a better way, but the sleeping requirement we would have on the threads makes most options seem unworkable.

Thanks,
Robin
From: Kelly F. <kf...@fe...> - 2008-05-14 16:12:21

On Wed, 14 May 2008, Johannes Schindelin wrote:
> Hi,
>
> On Wed, 14 May 2008, Javier Guerra wrote:
>
>> On Wed, May 14, 2008 at 10:13 AM, Johannes Schindelin
>> <Joh...@gm...> wrote:
>>> On Wed, 14 May 2008, Javier Guerra wrote:
>>>> What about Lua? (http://www.lua.org)
>>>>
>>>> it started up as a configuration language, and evolved into a full
>>>> programming language, while remaining _very_ light (less than 200K
>>>> with all libraries), and wonderfully easy to embed into C programs.
>>>
>>> Okay, so much for the upsides. Now for the downsides: a language that
>>> nearly nobody knows, for something that is not meant to be executed
>>> (think security implications).
>>
>> when embedded, you get to choose what libraries are available. there
>> are several examples of fairly secure settings.
>
> Why artificially make it complicated, and then have to take care of such
> issues?
>
> That's like an ex-colleague of mine, who absolutely had to rewrite the
> database engine in-RAM, and when the application was too slow over modem
> (leeching in megabytes, where it got bytes from the SQL database before),
> he tried to force Windows Terminal Services, instead of reverting his
> change.
>
> Simplicity is underrated.
>
> Ciao,
> Dscho

Why not just bypass the whole config file idea and just use environment variables? No more parsing or dependencies on the language of the day. Yes, you wouldn't have the tree format that some people are asking for. You'd get all of the power of your favorite shell, plus maybe some benefits when migrating a VM to another machine.

-kf

#!/bin/sh
export drive_0_file=foo.img
export drive_0_if=scsi
export drive_1_file=bar.img
export drive_1_if=scsi
export drive_2_file=wiz.img
export drive_2_if=scsi
exec qemu
From: Tomasz C. <ma...@wp...> - 2008-05-14 15:49:32

Dor Laor schrieb:

(...)

>> - PV Windows (network driver)
>
> About 700Mb +-; there is currently an extra copy that we need to omit.
> Thanks to Anthony, we just have to change the driver.
>
>> - non-PV Windows
>
> What do you mean? Other fully emulated NICs like e1000? It does not
> perform as well as PV, but depending on the guest it can do up to
> 600Mb +-.

Just generally, how Windows PV drivers help to improve network performance.

So, a PV network driver can do about 700Mb/s, and an emulated NIC can do about 600Mb/s, Windows guest to host?

That would be about 20% improvement?

-- Tomasz Chmielewski
http://wpkg.org
From: Javier G. <ja...@gu...> - 2008-05-14 15:42:46

On Wed, May 14, 2008 at 10:37 AM, Johannes Schindelin
<Joh...@gm...> wrote:
> On Wed, 14 May 2008, Javier Guerra wrote:
>> when embedded, you get to choose what libraries are available. there
>> are several examples of fairly secure settings.
>
> Why artificially make it complicated, and then have to take care of such
> issues?

i should have said "Lua is secure by default, all libraries are optional", but it's quickly turning into a bikeshed (http://www.bikeshed.com)

my own personal preference would be to just stuff the whole command line parameters into the qcow2 image file.

-- Javier
From: Johannes S. <Joh...@gm...> - 2008-05-14 15:37:14
|
Hi,

On Wed, 14 May 2008, Javier Guerra wrote:

> On Wed, May 14, 2008 at 10:13 AM, Johannes Schindelin
> <Joh...@gm...> wrote:
> > On Wed, 14 May 2008, Javier Guerra wrote:
> > > What about Lua? (http://www.lua.org)
> > >
> > > it started up as a configuration language, and evolved into a full
> > > programming language, while remaining _very_ light (less than 200K
> > > with all libraries), and wonderfully easy to embed into C programs.
> >
> > Okay, so much for the upsides. Now for the downsides: a language that
> > nearly nobody knows, for something that is not meant to be executed
> > (think security implications).
>
> when embedded, you get to choose what libraries are available. there
> are several examples of fairly secure settings.

Why artificially make it complicated, and then have to take care of such
issues?

That's like an ex-colleague of mine, who absolutely had to rewrite the
database engine in-RAM, and when the application was too slow over modem
(leeching in megabytes, where it got bytes from the SQL database
before), he tried to force Windows Terminal Services, instead of
reverting his change.

Simplicity is underrated.

Ciao,
Dscho
|
From: Javier G. <ja...@gu...> - 2008-05-14 15:30:51
|
On Wed, May 14, 2008 at 10:13 AM, Johannes Schindelin
<Joh...@gm...> wrote:
> On Wed, 14 May 2008, Javier Guerra wrote:
> > What about Lua? (http://www.lua.org)
> >
> > it started up as a configuration language, and evolved into a full
> > programming language, while remaining _very_ light (less than 200K
> > with all libraries), and wonderfully easy to embed into C programs.
>
> Okay, so much for the upsides. Now for the downsides: a language that
> nearly nobody knows, for something that is not meant to be executed
> (think security implications).

when embedded, you get to choose what libraries are available. there
are several examples of fairly secure settings.

personally, i find shell scripts enough for setting up parameters. a
static config wouldn't bring much advantage.

-- 
Javier
|
From: Jan K. <jan...@we...> - 2008-05-14 15:29:04
|
Jerone Young wrote:
> On Mon, 2008-05-12 at 13:34 +0200, Jan Kiszka wrote:
>> Hi,
>>
>> before going wild with my idea, I would like to collect some comments
>> on this approach:
>>
>> While doing first kernel debugging with my debug register patches for
>> kvm, I quickly ran into the 4-breakpoints-only limitation that comes
>> from the fact that we blindly map software to hardware breakpoints.
>> Unhandy, simply suboptimal. Also, having 4 breakpoint slots hard-coded
>> in the generic interface is not fair to arches that may support more.
>> Moreover, we do not support watchpoints although this would easily be
>> feasible. But if we supported watchpoints (via debug registers on
>> x86), we would need to break out of the 4 slots limitation even
>> earlier. In short, I came to the conclusion that a rewrite of the
>> KVM_DEBUG_GUEST interface is required.
>
> So embedded Power is also limited to 4 hardware registers for
> breakpoints. But there are 2 separate registers for watchpoints. The
> reason to use the registers is that the hardware does the work for you
> and (at least on Power) will throw an exception or trap. Then you deal
> with it.
>
> But you still face the fact that you can only have a small number of
> breakpoints & watchpoints. Also you cannot use gdb in the guest at the
> same time while using the gdb stub on the guest itself (as there is
> only one set of registers).

So gdb on power relies only on those few hw-breakpoints? With x86 you
can perfectly run gdb (with soft BPs) in parallel with the gdbstub
(currently based on hw-BPs, but the same would be true for soft-BPs
inserted by the gdbstub).

>
>> Why do we set breakpoints in the kernel? Why not simply catch all
>> debug traps, insert software breakpoint ops into the guest code, and
>> handle all this stuff as normal debuggers do? And the hardware
>> breakpoints should just be pushed through the kernel interface like
>> ptrace does.
>
> See above...But the cpu basically does the work for you.
> So you don't have to try and go through and first insert a trap into
> the code in memory. But then you have to remember the code that you
> replaced with the trap and execute it after you handle the trap. This
> can get a little hairy.

I cannot imagine that this is so hairy. It is basically daily (x86-)
debugger business. Maybe we need to handle it differently if other
arches prefer their own way. But for x86 I don't see a need to restrict
ourselves to hw-BPs _only_.

>
> Currently I'm actually implementing breakpoint support now in Power.
> But you do have to create some mappings to handle traps and see if you
> put the trap there, and execute the code you replaced. Also, what if
> the breakpoint is removed? Then you have to go back through and
> actually replace the trap code. Doesn't sound hard, but I'm not sure of
> all the pitfalls.

Again, this /should/ not be different from what gdb does to applications
or kgdb does to the kernel. (Looks like I need to get my feet wet
soon. :) )

>
>> The new KVM_DEBUG_GUEST interface I currently have in mind would look
>> like this:
>>
>> #define KVM_DBGGUEST_ENABLE 0x01
>> #define KVM_DBGGUEST_SINGLESTEP 0x02
>>
>> struct kvm_debug_guest {
>>     __u32 control;
>>     struct kvm_debug_guest_arch arch;
>> };
>>
>> Setting KVM_DBGGUEST_ENABLE would forward all debug-related traps to
>> userspace first, which can then decide to handle or re-inject them.
>> KVM_DBGGUEST_SINGLESTEP would work as before. And the extension for
>> x86 would look like this:
>>
>> struct kvm_debug_guest_arch {
>>     __u32 use_hw_breakpoints;
>>     __u64 debugreg[8];
>> };
>>
>> If use_hw_breakpoints is non-zero, KVM would completely overwrite the
>> guest's debug registers with the content of debugreg, giving full
>> control of this feature to the host-side debugger (faking the content
>> of debug registers, effectively disabling them for the guest - as we
>> now do all the time).
>
> Hmmm...so today at least the gdbstub in qemu does not inject traps and
> track code that it trapped (I could be mistaken). This would all need
> to be implemented as well.

gdbstub inserts "virtual" traps today, i.e. a call from the translated
guest code to a helper which signals the breakpoint to the stub. And I
don't want to change this. I want to add the BP injection/removal to
qemu-kvm as it already takes over breakpoint (and soon also watchpoint)
maintenance from qemu.

However, would the proposed interface for KVM_DEBUG_GUEST (with an
appropriate kvm_debug_guest_arch for power) restrict your plans in any
way?

Jan
|
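For the x86 case, the soft-BP bookkeeping discussed in this thread (remember the byte you patch, write the one-byte trap opcode, restore the byte on removal) is small. A minimal sketch follows, with a plain byte array standing in for guest memory; a real qemu-kvm implementation would of course go through its guest-memory access functions instead:

```c
#include <stdint.h>

#define X86_INT3 0xCC  /* one-byte breakpoint opcode on x86 */

struct sw_bp {
    uint64_t addr;
    uint8_t  saved;   /* original instruction byte we overwrote */
    int      in_use;
};

/* Stand-in for guest memory; purely for illustration. */
static uint8_t guest_mem[64];

static int bp_set(struct sw_bp *bp, uint64_t addr)
{
    if (bp->in_use)
        return -1;
    bp->addr  = addr;
    bp->saved = guest_mem[addr];   /* remember what we overwrite */
    guest_mem[addr] = X86_INT3;    /* patch in the trap */
    bp->in_use = 1;
    return 0;
}

static int bp_clear(struct sw_bp *bp)
{
    if (!bp->in_use)
        return -1;
    guest_mem[bp->addr] = bp->saved;  /* put the original byte back */
    bp->in_use = 0;
    return 0;
}
```

On a breakpoint hit, the debugger restores the byte, single-steps the original instruction, and re-inserts the trap; that re-insert step is the part that needs care, but it is standard debugger business rather than anything kvm-specific.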
From: Daniel P. B. <ber...@re...> - 2008-05-14 15:25:39
|
On Wed, May 14, 2008 at 09:45:02AM -0500, Javier Guerra wrote:
> What about Lua? (http://www.lua.org)
>
> it started up as a configuration language, and evolved into a full
> programming language, while remaining _very_ light (less than 200K
> with all libraries), and wonderfully easy to embed into C programs.

Config files are data, not programs. Xen made this mistake originally
too, just having python config files that were eval'd, but thankfully
they've defined a sensible data format now.

Dan.
-- 
|: Red Hat, Engineering, Boston -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
|
From: Linus T. <tor...@li...> - 2008-05-14 15:18:58
|
On Wed, 14 May 2008, Robin Holt wrote:
>
> Are you suggesting the sending side would not need to sleep or the
> receiving side?

One thing to realize is that most of the time (read: pretty much
*always*) when we have the problem of wanting to sleep inside a
spinlock, the solution is actually to just move the sleeping to outside
the lock, and then have something else that serializes things.

That way, the core code (protected by the spinlock, and in all the hot
paths) doesn't sleep, but the special case code (that wants to sleep)
can have some other model of serialization that allows sleeping, and
that includes as a small part the spinlocked region.

I do not know how XPMEM actually works, or how you use it, but it
seriously sounds like that is how things *should* work. And yes, that
probably means that the mmu-notifiers as they are now are simply not
workable: they'd need to be moved up so that they are inside the mmap
semaphore but not the spinlocks.

Can it be done? I don't know. But I do know that I'm unlikely to accept
a noticeable slowdown in some very core code for a case that affects
about 0.00001% of the population. In other words, I think you *have* to
do it.

Linus
|
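The restructuring Linus describes (detach the pending work under the spinlock, then do the possibly-sleeping part with the lock dropped) can be sketched in userspace C like this; the "spinlock" here is a trivial stand-in just to mark where the lock is held, not a real kernel primitive:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct work {
    int value;
    struct work *next;
};

static struct work *pending;   /* protected by the "spinlock" */
static int lock_held;

/* Stand-ins so the structure is visible; a real spinlock may never
 * be held across anything that sleeps. */
static void spin_lock(void)   { assert(!lock_held); lock_held = 1; }
static void spin_unlock(void) { assert(lock_held);  lock_held = 0; }

static void queue_work(struct work *w)
{
    spin_lock();
    w->next = pending;         /* cheap, non-sleeping: fine under lock */
    pending = w;
    spin_unlock();
}

static int process_all(void)
{
    spin_lock();
    struct work *list = pending;   /* detach everything under the lock */
    pending = NULL;
    spin_unlock();

    int done = 0;
    while (list) {                 /* lock dropped: sleeping is OK here */
        struct work *w = list;
        list = w->next;
        /* ...this is where a notifier consumer could block/sleep... */
        done += w->value;
        free(w);
    }
    return done;
}
```

The hot path only takes the lock for a pointer swap; all the slow, sleepable work happens on the detached list, which is the shape Linus is suggesting for the mmu-notifier case.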
From: Paul B. <pa...@co...> - 2008-05-14 15:18:26
|
On Wednesday 14 May 2008, Anthony Liguori wrote:
> Paul Brook wrote:
> >> the "class" field is used to select the device model. Then all the
> >> other parameters are used to initialize the device model. That way
> >> it is possible to keep the compatibility with the existing options
> >> and add a provision to instantiate arbitrary new device models,
> >> such as:
> >
> > I like the idea, but I'm not so keen on the automatic allocation. I
> > generally prefer explicit declaration over implicit things. The
> > latter makes it very easy to not notice when you make a typo.
> >
> > It sounds like what you really want is something similar to an OF
> > device tree. So you have something like:
> >
> > # pciide0 may be an alias (possibly provided by qemu)
> > # e.g. pci0.slot1.func1.ide
> > alias hda ide0.primary.master
>
> What I don't like about the ide0.primary.master syntax is that there
> isn't enough structure. I would prefer:
>
> alias hda ide,bus=0,primary,master
>
> If you combine this with your magic variable idea, you could also do:
>
> alias hda ide,bus=0,unit=$next
>
> But you could also just fold that into Fabrice's syntax (which I
> prefer):

What I dislike about this is that it's a flat format, where you identify
things by setting some combination of attributes. I really like the idea
of having a tree structure.

Paul
|
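To make the contrast concrete, a tree-structured config in the OF-device-tree spirit might look something like the sketch below. The syntax is purely illustrative, not a concrete proposal; it just shows how `hda` becomes a path through the bus topology rather than a flat attribute bundle:

```
pci0 {
    slot1.func1 {
        ide0 {
            primary {
                master {            # this node is what "hda" aliases to
                    file = "foo.img"
                }
            }
        }
    }
}
```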
From: Anthony L. <an...@co...> - 2008-05-14 15:18:22
|
Dor Laor wrote:
> On Wed, 2008-05-14 at 17:41 +0300, Avi Kivity wrote:
>
> Please don't jump over me, but I think it is worth mentioning OVF, at
> least to know what your opinions are.
>
> Open Virtualization Format -
> http://www.vmware.com/appliances/learn/ovf.html
>
> It's xml based, supported by all major hypervisors, so qemu/kvm/xen
> users might eventually use a product that supports OVF.

xml is a non-starter for QEMU. We go out of our way to be portable.
Having an XML dependency that could be satisfied on Windows and Linux
would probably require more code to support the dependency than all of
QEMU itself.

Regards,

Anthony Liguori
|
From: Jerone Y. <jy...@us...> - 2008-05-14 15:15:23
|
On Mon, 2008-05-12 at 13:34 +0200, Jan Kiszka wrote:
> Hi,
>
> before going wild with my idea, I would like to collect some comments
> on this approach:
>
> While doing first kernel debugging with my debug register patches for
> kvm, I quickly ran into the 4-breakpoints-only limitation that comes
> from the fact that we blindly map software to hardware breakpoints.
> Unhandy, simply suboptimal. Also, having 4 breakpoint slots hard-coded
> in the generic interface is not fair to arches that may support more.
> Moreover, we do not support watchpoints although this would easily be
> feasible. But if we supported watchpoints (via debug registers on x86),
> we would need to break out of the 4 slots limitation even earlier. In
> short, I came to the conclusion that a rewrite of the KVM_DEBUG_GUEST
> interface is required.

So embedded Power is also limited to 4 hardware registers for
breakpoints. But there are 2 separate registers for watchpoints. The
reason to use the registers is that the hardware does the work for you
and (at least on Power) will throw an exception or trap. Then you deal
with it.

But you still face the fact that you can only have a small number of
breakpoints & watchpoints. Also you cannot use gdb in the guest at the
same time while using the gdb stub on the guest itself (as there is only
one set of registers).

> Why do we set breakpoints in the kernel? Why not simply catch all
> debug traps, insert software breakpoint ops into the guest code, and
> handle all this stuff as normal debuggers do? And the hardware
> breakpoints should just be pushed through the kernel interface like
> ptrace does.

See above...But the cpu basically does the work for you. So you don't
have to try and go through and first insert a trap into the code in
memory. But then you have to remember the code that you replaced with
the trap and execute it after you handle the trap. This can get a little
hairy.

Currently I'm actually implementing breakpoint support now in Power.
But you do have to create some mappings to handle traps and see if you
put the trap there, and execute the code you replaced. Also, what if the
breakpoint is removed? Then you have to go back through and actually
replace the trap code. Doesn't sound hard, but I'm not sure of all the
pitfalls.

> The new KVM_DEBUG_GUEST interface I currently have in mind would look
> like this:
>
> #define KVM_DBGGUEST_ENABLE 0x01
> #define KVM_DBGGUEST_SINGLESTEP 0x02
>
> struct kvm_debug_guest {
>     __u32 control;
>     struct kvm_debug_guest_arch arch;
> };
>
> Setting KVM_DBGGUEST_ENABLE would forward all debug-related traps to
> userspace first, which can then decide to handle or re-inject them.
> KVM_DBGGUEST_SINGLESTEP would work as before. And the extension for x86
> would look like this:
>
> struct kvm_debug_guest_arch {
>     __u32 use_hw_breakpoints;
>     __u64 debugreg[8];
> };
>
> If use_hw_breakpoints is non-zero, KVM would completely overwrite the
> guest's debug registers with the content of debugreg, giving full
> control of this feature to the host-side debugger (faking the content
> of debug registers, effectively disabling them for the guest - as we
> now do all the time).

Hmmm...so today at least the gdbstub in qemu does not inject traps and
track code that it trapped (I could be mistaken). This would all need
to be implemented as well.

> Questions:
> - Does anyone see traps and pitfalls in this approach?
> - May I replace the existing interface with this one, or am I
>   overlooking some use case that already worked with the current code
>   so that ABI compatibility is required (most debug stuff should have
>   been simply broken so far, also due to bugs in userland)?
>
> Jan

_______________________________________________
kvm-devel mailing list
kvm...@li...
https://lists.sourceforge.net/lists/listinfo/kvm-devel
|