From: David M. <da...@da...> - 2008-04-30 03:14:47
From: Avi Kivity <av...@qu...>
Date: Wed, 30 Apr 2008 00:46:13 +0300

> I sent an email a couple of days ago to pos...@vg...:
>
> > Hi, please create the following lists for kvm:
> >
> > kvm (x86 and general discussion)
> > kvm-ppc (powerpc, managed by Hollis Blanchard)
> > kvm-ia64 (ia64)
> > kvm-commits (read-only, tracks commits to kvm git HEAD)
> >
> > Thanks.

I've created (and tested) all of these lists.
From: Glauber C. <gc...@re...> - 2008-04-30 03:05:48
Anthony Liguori wrote:
> Glauber Costa wrote:
>> Avi Kivity wrote:
>>
>>> Glauber Costa wrote:
>>>
>>>> Hi. This is a proposal for reducing the impact of kvm functions in
>>>> core qemu code. This is by all means not ready, but I felt like
>>>> posting it, so a discussion on it could follow.
>>>>
>>>> The idea in this patch is to replace the specific kvm details from
>>>> core qemu files like vl.c with driver_yyy() functions. When kvm is
>>>> not running, those functions would just return (most of the time),
>>>> greatly reducing the impact of kvm code.
>>>>
>>>> As I wanted to test it, in this patch I changed the kvm functions to
>>>> be called driver_yyy(), but that's not my final goal. I intend to use
>>>> a function pointer scheme, similar to what the linux kernel already
>>>> does for a lot of its subsystems, to isolate the changes.
>>>>
>>>> Comments deeply welcome.
>>>>
>>> While I would be very annoyed if someone referred to kvm as a qemu
>>> accelerator, I think accelerator_yyy() is more descriptive than
>>> driver_yyy().
>>>
>> How about booster? ;-)
>>
> I don't think the concern from a QEMU perspective is that QEMU is too
> intimately tied to KVM. The concern is that over time, it will be very
> difficult to make changes to QEMU without breaking KVM support because
> of the sheer number of hooks we require. Fabrice had actually suggested
> merging libkvm into QEMU. We just need to reduce the overall number of
> if (kvm_enabled()) blocks.

They are not mutually exclusive. Even if we do merge libkvm into qemu,
having it more modular and separated will be way better than having kvm
hooks all over.
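
As an illustration of the function-pointer scheme being discussed, a minimal
sketch might look like the following. All names here are invented for the
example, not taken from the actual patch; CPUState is QEMU's existing vcpu
context type.

/* Core QEMU calls through an ops table; with no accelerator the no-op
 * table is installed, so vl.c needs no if (kvm_enabled()) blocks. */
struct accel_ops {
    int  (*init)(void);
    void (*cpu_synchronize_state)(CPUState *env);
    void (*vm_state_change)(int running);
};

/* Default table: every hook simply returns. */
static int  noop_init(void) { return 0; }
static void noop_sync(CPUState *env) { (void)env; }
static void noop_state_change(int running) { (void)running; }

static const struct accel_ops noop_ops = {
    .init                  = noop_init,
    .cpu_synchronize_state = noop_sync,
    .vm_state_change       = noop_state_change,
};

static const struct accel_ops *accel = &noop_ops;

/* kvm (or kqemu) would install its own table at startup, e.g.
 *     accel = &kvm_ops;
 * and core code simply calls accel->cpu_synchronize_state(env); */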
From: Marcelo T. <mto...@re...> - 2008-04-30 02:13:35
On Tue, Apr 29, 2008 at 07:47:54PM -0500, Anthony Liguori wrote:
> Marcelo Tosatti wrote:
> >>> Moving the signal handling + pipe write to a separate thread should get
> >>> rid of it.
> >>>
> >> Yeah, but then you just introduce buffering problems since if you're
> >> getting that many signals, the pipe will get full.
> >>
> >
> > It is OK to lose signals if you have at least one queued in the pipe.
> >
> If you're getting so many signals that you can't make forward progress
> on any system call, your application is not going to function
> anymore. A use of signals in this manner is broken by design.
>
> >> No point in designing for something that isn't likely to happen in
> >> practice.
> >>
> >
> > You should not design something making the assumption that this scenario
> > won't happen.
> >
> > For example this could happen in high throughput guests using POSIX AIO,
> > actually pretty likely to happen if data is cached in the host's pagecache.
> >
> We really just need to move away from signals as best as we can. I've
> got a patch started that implements a thread-pool based AIO mechanism
> for QEMU. Notifications are done over a pipe so we don't have to deal
> with the unreliability of signals.
>
> I can't imagine a guest trying to do so much IO though that this would
> really ever happen. POSIX AIO can only have one outstanding request
> per-fd. To complete the IO request, you would have to eventually go
> back to the guest and during that time, the IO thread is going to be
> able to make forward progress.

No. POSIX AIO can have one _in flight_ AIO per-fd, but many pending AIO
requests per-fd. You don't have to go back to guest mode to get more AIO
completions. And then you have drivers arming timers. I just don't like
the idea.

> You won't get a signal again until a new IO request is submitted.
>
> Regards,
>
> Anthony Liguori
>
> > It's somewhat similar to what happens with NAPI and interrupt mitigation.
> >
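
For reference, the point about pending requests can be illustrated with a
small stand-alone sketch: several aio_read() calls can be queued against the
same fd without returning to the submitter. The file name, request count and
sizes below are arbitrary; link with -lrt.

#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#define NREQ  4
#define BUFSZ 4096

int main(void)
{
    struct aiocb cb[NREQ];
    char buf[NREQ][BUFSZ];
    int fd = open("/tmp/testfile", O_RDONLY);   /* arbitrary test file */
    if (fd < 0) { perror("open"); return 1; }

    for (int i = 0; i < NREQ; i++) {
        memset(&cb[i], 0, sizeof(cb[i]));
        cb[i].aio_fildes = fd;
        cb[i].aio_buf    = buf[i];
        cb[i].aio_nbytes = BUFSZ;
        cb[i].aio_offset = (off_t)i * BUFSZ;
        if (aio_read(&cb[i]) < 0)      /* request is queued, not completed */
            perror("aio_read");
    }

    /* All NREQ requests are now pending on the same fd; completions can be
     * collected without submitting anything further. */
    for (int i = 0; i < NREQ; i++) {
        const struct aiocb *const list[1] = { &cb[i] };
        aio_suspend(list, 1, NULL);            /* wait for this request */
        printf("req %d returned %zd bytes\n", i, aio_return(&cb[i]));
    }
    return 0;
}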
From: SourceForge.net <no...@so...> - 2008-04-30 01:34:09
Bugs item #1953353, was opened at 2008-04-28 07:50
Message generated for change (Comment added) made by mtosatti
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1953353&group_id=180599

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request, not
just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Rafal Wijata (ravpl)
Assigned to: Nobody/Anonymous (nobody)
Summary: could not load PC BIOS '/path/to/bios.bin' on "-m 4096"

Initial Comment:
The maximum amount of memory I can give to kvm is ~3560M.
I run custom compiled kvm-66 on an F8 box with
Linux mailhub 2.6.24.4-64.fc8 #1 SMP Sat Mar 29 09:15:49 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
The modules are loaded from the F8 kernel, rather than those shipped with kvm-66.

----------------------------------------------------------------------

Comment By: Marcelo Tosatti (mtosatti)
Date: 2008-04-29 21:34

Message:
Logged In: YES
user_id=2022487
Originator: NO

Can you reproduce the problem with the modules shipped with kvm-66?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1953353&group_id=180599
From: Anthony L. <ali...@us...> - 2008-04-30 00:49:14
Marcelo Tosatti wrote:
>>> Moving the signal handling + pipe write to a separate thread should get
>>> rid of it.
>>>
>> Yeah, but then you just introduce buffering problems since if you're
>> getting that many signals, the pipe will get full.
>>
>
> It is OK to lose signals if you have at least one queued in the pipe.
>

If you're getting so many signals that you can't make forward progress
on any system call, your application is not going to function anymore.
A use of signals in this manner is broken by design.

>> No point in designing for something that isn't likely to happen in practice.
>>
>
> You should not design something making the assumption that this scenario
> won't happen.
>
> For example this could happen in high throughput guests using POSIX AIO,
> actually pretty likely to happen if data is cached in the host's pagecache.
>

We really just need to move away from signals as best as we can. I've
got a patch started that implements a thread-pool based AIO mechanism
for QEMU. Notifications are done over a pipe so we don't have to deal
with the unreliability of signals.

I can't imagine a guest trying to do so much IO though that this would
really ever happen. POSIX AIO can only have one outstanding request
per-fd. To complete the IO request, you would have to eventually go
back to the guest and during that time, the IO thread is going to be
able to make forward progress. You won't get a signal again until a new
IO request is submitted.

Regards,

Anthony Liguori

> It's somewhat similar to what happens with NAPI and interrupt mitigation.
>
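
The thread-pool idea mentioned above can be sketched roughly as follows.
This is only an illustration, not the patch Anthony refers to; the structure
and function names are invented. A worker thread performs the blocking
pread() and then writes one byte to a pipe that the main loop select()s on,
so completion delivery never depends on a signal.

#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

struct aio_req {
    int fd;
    void *buf;
    size_t len;
    off_t offset;
    ssize_t ret;               /* filled in by the worker */
};

static int notify_fd[2];       /* [0] read end watched by the main loop */

static void *aio_worker(void *opaque)
{
    struct aio_req *req = opaque;
    char byte = 0;

    req->ret = pread(req->fd, req->buf, req->len, req->offset);
    /* Completion notification: a pipe write is async-signal-safe and,
     * unlike a signal, is never lost by select(). */
    write(notify_fd[1], &byte, 1);
    return NULL;
}

static void aio_submit(struct aio_req *req)
{
    pthread_t tid;   /* a real implementation would reuse pooled threads */
    pthread_create(&tid, NULL, aio_worker, req);
    pthread_detach(tid);
}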
From: Marcelo T. <mto...@re...> - 2008-04-30 00:34:49
On Tue, Apr 29, 2008 at 07:22:53PM -0500, Anthony Liguori wrote:
> Marcelo Tosatti wrote:
> > True.
> >
> > Either way, this starvation due to signals can't happen with the current
> > scheme because signals are blocked. It seems a significant drawback.
> >
> Practically speaking, I don't see signal starving being a problem. Part
> of the benefit of this approach is that we're reducing the overall
> number of signals received. With the in-kernel PIT, the number of
> userspace signals is even further reduced.
>
> > Moving the signal handling + pipe write to a separate thread should get
> > rid of it.
> >
> Yeah, but then you just introduce buffering problems since if you're
> getting that many signals, the pipe will get full.

It is OK to lose signals if you have at least one queued in the pipe.

> No point in designing for something that isn't likely to happen in practice.

You should not design something making the assumption that this scenario
won't happen.

For example this could happen in high throughput guests using POSIX AIO,
actually pretty likely to happen if data is cached in the host's pagecache.

It's somewhat similar to what happens with NAPI and interrupt mitigation.
From: Anthony L. <an...@co...> - 2008-04-30 00:27:50
Glauber Costa wrote:
> Avi Kivity wrote:
>
>> Glauber Costa wrote:
>>
>>> Hi. This is a proposal for reducing the impact of kvm functions in
>>> core qemu code. This is by all means not ready, but I felt like
>>> posting it, so a discussion on it could follow.
>>>
>>> The idea in this patch is to replace the specific kvm details from
>>> core qemu files like vl.c with driver_yyy() functions. When kvm is
>>> not running, those functions would just return (most of the time),
>>> greatly reducing the impact of kvm code.
>>>
>>> As I wanted to test it, in this patch I changed the kvm functions to
>>> be called driver_yyy(), but that's not my final goal. I intend to use
>>> a function pointer scheme, similar to what the linux kernel already
>>> does for a lot of its subsystems, to isolate the changes.
>>>
>>> Comments deeply welcome.
>>>
>> While I would be very annoyed if someone referred to kvm as a qemu
>> accelerator, I think accelerator_yyy() is more descriptive than
>> driver_yyy().
>>
> How about booster? ;-)
>

I don't think the concern from a QEMU perspective is that QEMU is too
intimately tied to KVM. The concern is that over time, it will be very
difficult to make changes to QEMU without breaking KVM support because
of the sheer number of hooks we require. Fabrice had actually suggested
merging libkvm into QEMU. We just need to reduce the overall number of
if (kvm_enabled()) blocks.

You have to understand a lot about KVM to know why it's necessary to do
a register reload call-out in the vmport hw for instance. Instead of
introducing generic hooks in vmport, a more appropriate solution may be
to add wrappers to read individual register values within the QEMU
device code. We can then add our hooks to that. I think a good argument
can be made that devices should never access env-> directly.

Regards,

Anthony Liguori
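
The accessor idea could look something like the sketch below. The names
kvm_sync_registers() and cpu_get_reg() are hypothetical, used only to show
the shape of the wrapper; they are not existing QEMU functions.

/* Devices call this instead of touching env->regs[] directly, so any
 * accelerator-specific synchronization hides behind one hook. */
uint32_t cpu_get_reg(CPUState *env, int reg)
{
    if (kvm_enabled())
        kvm_sync_registers(env);   /* hypothetical: refresh env from the kernel */
    return env->regs[reg];
}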
From: Anthony L. <ali...@us...> - 2008-04-30 00:23:22
Marcelo Tosatti wrote:
> True.
>
> Either way, this starvation due to signals can't happen with the current
> scheme because signals are blocked. It seems a significant drawback.
>

Practically speaking, I don't see signal starving being a problem. Part
of the benefit of this approach is that we're reducing the overall
number of signals received. With the in-kernel PIT, the number of
userspace signals is even further reduced.

> Moving the signal handling + pipe write to a separate thread should get
> rid of it.
>

Yeah, but then you just introduce buffering problems since if you're
getting that many signals, the pipe will get full. No point in designing
for something that isn't likely to happen in practice.

Regards,

Anthony Liguori

>> No, they're independent of the patch. The symptom is that the guest
>> tends to hang during boot for prolonged periods of time. It tends to be
>> random whether it will hang at all, how long it will hang, and at what
>> point it will hang. It tends to hang more often in particular places.
>> In my ubuntu server guest, for instance, it tends to hang right after
>> partition probing, right after "Loading hardware drivers", and right
>> before spawning a getty on /dev/tty0.
>>
>> Normally, it will only hang for a few seconds. However, with -smp N
>> where N > number of cpus in the host, it tends to happen more
>> frequently. Using clock=jiffies in the guest makes the problem go away.
>>
>
> I'll try to reproduce that, thanks.
>
>> -no-kvm-pit doesn't make a difference (the guest is normally using the
>> pm-timer).
>>
>> Regards,
>>
>> Anthony Liguori
>>
From: Marcelo T. <mto...@re...> - 2008-04-30 00:05:18
On Tue, Apr 29, 2008 at 06:44:29PM -0500, Anthony Liguori wrote:
> Marcelo Tosatti wrote:
> > Why? If you leave data in the pipe the next select() won't block.
> >
> > Isn't there the possibility that this loop can be stuck for significant
> > amounts of time? If you're receiving lots of notifications through
> > signals.
> >
> If you're getting that many signals, then select() is never going to
> run anyway.

True.

Either way, this starvation due to signals can't happen with the current
scheme because signals are blocked. It seems a significant drawback.

Moving the signal handling + pipe write to a separate thread should get
rid of it.

> No, they're independent of the patch. The symptom is that the guest
> tends to hang during boot for prolonged periods of time. It tends to be
> random whether it will hang at all, how long it will hang, and at what
> point it will hang. It tends to hang more often in particular places.
> In my ubuntu server guest, for instance, it tends to hang right after
> partition probing, right after "Loading hardware drivers", and right
> before spawning a getty on /dev/tty0.
>
> Normally, it will only hang for a few seconds. However, with -smp N
> where N > number of cpus in the host, it tends to happen more
> frequently. Using clock=jiffies in the guest makes the problem go away.

I'll try to reproduce that, thanks.

> -no-kvm-pit doesn't make a difference (the guest is normally using the
> pm-timer).
>
> Regards,
>
> Anthony Liguori
From: Anthony L. <ali...@us...> - 2008-04-29 23:44:56
Marcelo Tosatti wrote:
> Why? If you leave data in the pipe the next select() won't block.
>
> Isn't there the possibility that this loop can be stuck for significant
> amounts of time? If you're receiving lots of notifications through
> signals.
>

If you're getting that many signals, then select() is never going to run
anyway.

>>>>>> -    kvm_eat_signal(&io_signal_table, NULL, 1000);
>>>>>>      pthread_mutex_lock(&qemu_mutex);
>>>>>> -    cpu_single_env = NULL;
>>>>>> -    main_loop_wait(0);
>>>>>> +    main_loop_wait(10);
>>>>>>
>>>>>>
>>>>> Increase that 1000 or something. Will make it easier to spot bugs.
>>>>>
>>>>>
>>>> I have actually and it does introduce some bugs. I'm not entirely clear
>>>> what is causing them though.
>>>>
>>>
>>> Should indicate that some event previously delivered through signals and
>>> received by sigtimedwait is not waking up the IO thread.
>>>
>>
>> I'll take a look and see. I'm having time keeping issues in the guest
>> so it's hard to tell what problems are caused by the IO thread versus time.
>>
>
> Time issues only with the patch? If not, please share details.
>

No, they're independent of the patch. The symptom is that the guest
tends to hang during boot for prolonged periods of time. It tends to be
random whether it will hang at all, how long it will hang, and at what
point it will hang. It tends to hang more often in particular places.
In my ubuntu server guest, for instance, it tends to hang right after
partition probing, right after "Loading hardware drivers", and right
before spawning a getty on /dev/tty0.

Normally, it will only hang for a few seconds. However, with -smp N
where N > number of cpus in the host, it tends to happen more
frequently. Using clock=jiffies in the guest makes the problem go away.

-no-kvm-pit doesn't make a difference (the guest is normally using the
pm-timer).

Regards,

Anthony Liguori
From: Glauber C. <gc...@re...> - 2008-04-29 23:41:59
Avi Kivity wrote:
> Glauber Costa wrote:
>> Hi. This is a proposal for reducing the impact of kvm functions in
>> core qemu code. This is by all means not ready, but I felt like
>> posting it, so a discussion on it could follow.
>>
>> The idea in this patch is to replace the specific kvm details from
>> core qemu files like vl.c with driver_yyy() functions. When kvm is
>> not running, those functions would just return (most of the time),
>> greatly reducing the impact of kvm code.
>>
>> As I wanted to test it, in this patch I changed the kvm functions to
>> be called driver_yyy(), but that's not my final goal. I intend to use
>> a function pointer scheme, similar to what the linux kernel already
>> does for a lot of its subsystems, to isolate the changes.
>>
>> Comments deeply welcome.
>>
>
> While I would be very annoyed if someone referred to kvm as a qemu
> accelerator, I think accelerator_yyy() is more descriptive than
> driver_yyy().

How about booster? ;-)

> I did not see any references to kqemu, but I imagine you mean this to
> abstract kqemu support as well.

Yeah, even the kvm part is not complete. As I said, just wanted to get
it going.

>
> Other than that, looks really good.
>

thanks
From: Marcelo T. <mto...@re...> - 2008-04-29 23:34:12
On Tue, Apr 29, 2008 at 06:15:58PM -0500, Anthony Liguori wrote:
> Marcelo Tosatti wrote:
> > Problem is if the IO thread _receives_ SIGIPI instead of some vcpu
> > thread.
> >
> > So there is potential harm in not blocking it.
> >
> Hrm, aren't SIG_IPIs delivered to a particular thread-id though? When
> would the IO thread receive a SIG_IPI?

Right, they are only delivered to specific threads. Still not harmful to
make it explicit.

> >>> What is the reason for this loop instead of a straight read?
> >>>
> >>> It's alright to be interrupted by a signal.
> >>>
> >> Just general habit with QEMU.
> >>
> >
> > Please don't :-)
> >
> I don't see the harm. In fact, I think it's more correct. Otherwise,
> we have to wait for another invocation of the fd callback.

Why? If you leave data in the pipe the next select() won't block.

Isn't there the possibility that this loop can be stuck for significant
amounts of time? If you're receiving lots of notifications through
signals.

> >>>> -    kvm_eat_signal(&io_signal_table, NULL, 1000);
> >>>>      pthread_mutex_lock(&qemu_mutex);
> >>>> -    cpu_single_env = NULL;
> >>>> -    main_loop_wait(0);
> >>>> +    main_loop_wait(10);
> >>>>
> >>>>
> >>> Increase that 1000 or something. Will make it easier to spot bugs.
> >>>
> >>>
> >> I have actually and it does introduce some bugs. I'm not entirely clear
> >> what is causing them though.
> >>
> >
> > Should indicate that some event previously delivered through signals and
> > received by sigtimedwait is not waking up the IO thread.
> >
> I'll take a look and see. I'm having time keeping issues in the guest
> so it's hard to tell what problems are caused by the IO thread versus time.

Time issues only with the patch? If not, please share details.
From: Avi K. <av...@qu...> - 2008-04-29 23:31:25
Anthony Liguori wrote:
> We hold qemu_mutex while machine->init() executes, which issues a VCPU create.
> We need to make sure to not return from the VCPU creation until the VCPU
> file descriptor is valid to ensure that APIC creation succeeds.
>
> However, we also need to make sure that the VCPU thread doesn't start running
> until the machine->init() is complete. This is addressed today because the
> VCPU thread tries to grab the qemu_mutex before doing anything interesting.
> If we release qemu_mutex to wait for VCPU creation, then we open a window for
> a race to occur.
>
> This patch introduces two wait conditions. The first lets the VCPU create
> code that runs in the IO thread wait for a VCPU to initialize. The second
> condition lets the VCPU thread wait for the machine to fully initialize
> before running.
>

Applied, thanks.

> An added benefit of this patch is it makes the dependencies now explicit.
>

Indeed.

--
Any sufficiently difficult bug is indistinguishable from a feature.
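
The two-condition handshake described in the changelog follows a standard
pthread condition-variable pattern; a stripped-down sketch is below. The
variable and function names are illustrative, not the actual kvm-userspace
code.

#include <pthread.h>

static pthread_mutex_t qemu_mutex    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  vcpu_created  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  machine_ready = PTHREAD_COND_INITIALIZER;
static int vcpu_fd_valid;          /* set by the VCPU thread */
static int machine_initialized;    /* set by the IO thread after init() */

/* IO thread: called from machine->init() with qemu_mutex held; the wait
 * drops the mutex while sleeping, so the VCPU thread can make progress. */
static void wait_for_vcpu_create(void)
{
    while (!vcpu_fd_valid)
        pthread_cond_wait(&vcpu_created, &qemu_mutex);
}

/* VCPU thread: announce that the VCPU fd exists, then block until machine
 * initialization is complete before running any guest code. */
static void vcpu_thread_startup(void)
{
    pthread_mutex_lock(&qemu_mutex);
    vcpu_fd_valid = 1;
    pthread_cond_signal(&vcpu_created);
    while (!machine_initialized)
        pthread_cond_wait(&machine_ready, &qemu_mutex);
    pthread_mutex_unlock(&qemu_mutex);
}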
From: Avi K. <av...@qu...> - 2008-04-29 23:26:01
Fabian Deutsch wrote:
> Hey.
>
> I've been trying Microsoft Windows 2003 a couple of times. The wiki
> tells me that "everything" should work okay. It does, when using -smp 1,
> but gets ugly when using -smp 2 or so.
>
> So might it be useful to add the column "smp" to the "Guest Support
> Status" page in the wiki?
>

SMP Windows works best if you have FlexPriority on your hardware. What
host cpu are you using?

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Avi K. <av...@qu...> - 2008-04-29 23:22:24
Laurent Vivier wrote:
>> Why dst.val is not 0x53e10?
>>
>
> I can answer myself to this one:
>
> emulate_2op_SrcB("sal", c->src, c->dst, ctxt->eflags);
>
> does nothing if dst.bytes == 0
>
> So the next question is the good question...
>
>
>> Why dst.bytes is 0?
>>

Because dst.bytes is only set if dst.type == OP_MEM, or ad hoc in the
instruction itself. Better to set it unconditionally (and adjust in the
instruction if necessary).

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Anthony L. <ali...@us...> - 2008-04-29 23:16:10
Marcelo Tosatti wrote:
> Problem is if the IO thread _receives_ SIGIPI instead of some vcpu
> thread.
>
> So there is potential harm in not blocking it.
>

Hrm, aren't SIG_IPIs delivered to a particular thread-id though? When
would the IO thread receive a SIG_IPI?

>>> What is the reason for this loop instead of a straight read?
>>>
>>> It's alright to be interrupted by a signal.
>>>
>> Just general habit with QEMU.
>>
>
> Please don't :-)
>

I don't see the harm. In fact, I think it's more correct. Otherwise,
we have to wait for another invocation of the fd callback.

>>>> -    kvm_eat_signal(&io_signal_table, NULL, 1000);
>>>>      pthread_mutex_lock(&qemu_mutex);
>>>> -    cpu_single_env = NULL;
>>>> -    main_loop_wait(0);
>>>> +    main_loop_wait(10);
>>>>
>>>>
>>> Increase that 1000 or something. Will make it easier to spot bugs.
>>>
>>>
>> I have actually and it does introduce some bugs. I'm not entirely clear
>> what is causing them though.
>>
>
> Should indicate that some event previously delivered through signals and
> received by sigtimedwait is not waking up the IO thread.
>

I'll take a look and see. I'm having time keeping issues in the guest
so it's hard to tell what problems are caused by the IO thread versus time.

Regards,

Anthony Liguori
From: Anthony L. <ali...@us...> - 2008-04-29 23:12:53
Hollis Blanchard wrote:
> On Tuesday 29 April 2008 17:17:49 Avi Kivity wrote:
>
>> I like this very much, as it only affects accessors and not the mmu core
>> itself.
>>
>> Hollis/Xiantao/Carsten, can you confirm that this approach works for
>> you? Carsten, I believe you don't have mmio, but at least this
>> shouldn't interfere.
>>
>
> OK, so the idea is to mmap /sys/bus/pci/.../region within the guest RAM
> area, and just include it within the normal guest RAM memslot?
>

In the case of x86, since the PCI IO region is pretty far away from
normal RAM, we'll probably use a different memory slot, but yes, that's
the general idea.

IIUC PPC correctly, all IO pages have corresponding struct pages. This
means that get_user_pages() would succeed and you can reference count
them? In this case, we would never take the VM_PFNMAP path. Is that
correct?

> How will the IOMMU be programmed? Wouldn't you still need to register a
> special type of memslot for that?
>

That's independent of this patchset. For non-aware guests, we'll have to
pin all of physical memory up front and then create an IOMMU table from
the pinned physical memory. For aware guests with a PV DMA window API,
we'll be able to build that mapping on the fly (enforcing mlock
allocation limits).

Regards,

Anthony Liguori
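
A rough userspace sketch of what "mmap the BAR and expose it as its own
memory slot" could look like is below. The sysfs path, slot number and
addresses are illustrative; KVM_SET_USER_MEMORY_REGION is the existing ioctl
for registering guest memory backed by a userspace mapping.

#include <stdint.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int map_bar_into_guest(int vm_fd, const char *res_path,
                       uint64_t guest_phys, uint64_t size)
{
    /* e.g. res_path = "/sys/bus/pci/devices/0000:00:19.0/resource0" */
    int fd = open(res_path, O_RDWR);
    if (fd < 0)
        return -1;

    void *va = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (va == MAP_FAILED)
        return -1;

    struct kvm_userspace_memory_region region = {
        .slot            = 4,                 /* a slot not used for normal RAM */
        .guest_phys_addr = guest_phys,
        .memory_size     = size,
        .userspace_addr  = (uint64_t)(unsigned long)va,
    };
    /* On x86 the mmap'd BAR typically has no struct page backing, which is
     * exactly the VM_PFNMAP case the patch series handles. */
    return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}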
From: Avi K. <av...@qu...> - 2008-04-29 23:11:55
Jan Kiszka wrote:
> This still leaves me with the question how to handle the case when the
> host sets and arms some debug registers to debug the guest and the
> latter does the same to debug itself. Guest access will be trapped, OK,
> but KVM will then have to decide which value should actually be
> transferred into the registers. Hmm, does SVM virtualize all debug
> registers, leaving the real ones to the host?
>

There's no way this can work. There are still only four debug registers,
and the guest and host together can ask for eight different addresses.

It is theoretically doable by hiding all mappings to pages that are
debug targets, but it would probably double the kvm code size.

A good short-term compromise is to abort if the guest enables a debug
address register.

A better solution might be to place host debug addresses into unused
guest debug registers, so that as long as nr_guest_debug + nr_host_debug
<= 4, we can still proceed.

--
Any sufficiently difficult bug is indistinguishable from a feature.
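
The "place host debug addresses into unused guest debug registers" idea
amounts to a tiny slot allocator over the four DRs. The sketch below is only
an illustration of that bookkeeping, not kvm code; all names are invented.

#define NR_DEBUG_REGS 4

struct dr_alloc {
    unsigned long addr[NR_DEBUG_REGS];
    int used[NR_DEBUG_REGS];       /* 0 = free, 1 = guest, 2 = host */
};

/* Returns the DR slot used for the host breakpoint, or -1 when guest and
 * host together want more than four addresses and we have to bail out. */
static int place_host_breakpoint(struct dr_alloc *dr, unsigned long addr)
{
    int i;

    for (i = 0; i < NR_DEBUG_REGS; i++) {
        if (!dr->used[i]) {
            dr->used[i] = 2;
            dr->addr[i] = addr;
            return i;
        }
    }
    return -1;   /* no free debug register left */
}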
From: Marcelo T. <mto...@re...> - 2008-04-29 23:10:01
On Tue, Apr 29, 2008 at 05:42:51PM -0500, Anthony Liguori wrote:
> Marcelo Tosatti wrote:
> > Hi Anthony,
> >
> > How is -no-kvm-irqchip working with the patch?
> >
> Seems to work fine. What is your expectation?

Just wondering if vcpu's are being properly woken up.

> > Make sure the IO thread has SIG_IPI blocked (those are for APIC vcpu
> > initialization only).
> >
> Just so I'm clear, there's really no harm in not blocking SIG_IPI
> because it would just be ignored by the IO thread (since the SIG_IPI
> handler is a nop). But yeah, we should explicitly block it.

Problem is if the IO thread _receives_ SIGIPI instead of some vcpu
thread.

So there is potential harm in not blocking it.

> > What is the reason for this loop instead of a straight read?
> >
> > It's alright to be interrupted by a signal.
> >
> Just general habit with QEMU.

Please don't :-)

> >> -    kvm_eat_signal(&io_signal_table, NULL, 1000);
> >>      pthread_mutex_lock(&qemu_mutex);
> >> -    cpu_single_env = NULL;
> >> -    main_loop_wait(0);
> >> +    main_loop_wait(10);
> >>
> >
> > Increase that 1000 or something. Will make it easier to spot bugs.
> >
> I have actually and it does introduce some bugs. I'm not entirely clear
> what is causing them though.

Should indicate that some event previously delivered through signals and
received by sigtimedwait is not waking up the IO thread.
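
Blocking SIG_IPI in the IO thread is a one-call affair with a per-thread
signal mask. In the sketch below, SIGUSR1 is only a stand-in; the real
signal number is whatever qemu-kvm defines SIG_IPI to be.

#include <pthread.h>
#include <signal.h>

#define SIG_IPI SIGUSR1            /* placeholder, see note above */

static void io_thread_block_sig_ipi(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIG_IPI);
    /* Per-thread mask: the IO thread never receives SIG_IPI, so a signal
     * meant to kick a vcpu thread cannot be swallowed here. */
    pthread_sigmask(SIG_BLOCK, &set, NULL);
}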
From: Hollis B. <ho...@us...> - 2008-04-29 22:57:27
On Tuesday 29 April 2008 17:17:49 Avi Kivity wrote:
> Anthony Liguori wrote:
> > This patch allows VMAs that contain no backing page to be used for guest
> > memory. This is a drop-in replacement for Ben-Ami's first patch in his
> > direct mmio series. Here, we continue to allow mmio pages to be
> > represented in the rmap.
> >
>
> I like this very much, as it only affects accessors and not the mmu core
> itself.
>
> Hollis/Xiantao/Carsten, can you confirm that this approach works for
> you? Carsten, I believe you don't have mmio, but at least this
> shouldn't interfere.

OK, so the idea is to mmap /sys/bus/pci/.../region within the guest RAM
area, and just include it within the normal guest RAM memslot?

How will the IOMMU be programmed? Wouldn't you still need to register a
special type of memslot for that?

--
Hollis Blanchard
IBM Linux Technology Center
From: Avi K. <av...@qu...> - 2008-04-29 22:54:00
Anthony Liguori wrote:
> Avi Kivity wrote:
>> It depends on what's going on? Does a page table point to mmio? Or
>> the glommerclock?
>>
>> Not sure there is a single answer.
>>
>>> Perhaps we should be replacing consumers of gfn_to_page() with
>>> copy_to_user() instead?
>>
>> Indeed we should. The problem is access in atomic contexts. It's
>> easy to detect failure, but not always easy to handle it.
>
> So I think we should replace it with a rate limited printk and
> returning bad_page. That way the guest can't exploit it and we'll
> still hopefully get printk()s to track down instances of things going
> bad.
>

Agreed. Add a stacktrace so we can see what causes the badness.

--
Any sufficiently difficult bug is indistinguishable from a feature.
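
Putting the two suggestions together, the error path being discussed might
look roughly like this. It is a sketch of the proposal, not the committed
fix; bad_page refers to kvm's existing preallocated error page.

struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
{
	pfn_t pfn = gfn_to_pfn(kvm, gfn);

	if (!pfn_valid(pfn)) {
		/* Rate-limited complaint plus a stack trace, so the caller
		 * that still needs converting to copy_to_user() shows up in
		 * the log without letting the guest flood the host. */
		if (printk_ratelimit()) {
			printk(KERN_WARNING "gfn_to_page: no struct page for gfn %llx\n",
			       (unsigned long long)gfn);
			dump_stack();
		}
		return bad_page;
	}

	return pfn_to_page(pfn);
}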
From: Anthony L. <ali...@us...> - 2008-04-29 22:51:48
Avi Kivity wrote:
> It depends on what's going on? Does a page table point to mmio? Or
> the glommerclock?
>
> Not sure there is a single answer.
>
>> Perhaps we should be replacing consumers of gfn_to_page() with
>> copy_to_user() instead?
>
> Indeed we should. The problem is access in atomic contexts. It's
> easy to detect failure, but not always easy to handle it.

So I think we should replace it with a rate limited printk and returning
bad_page. That way the guest can't exploit it and we'll still hopefully
get printk()s to track down instances of things going bad.

Regards,

Anthony Liguori
From: Avi K. <av...@qu...> - 2008-04-29 22:50:09
Amit Shah wrote:
>
>>> +	if (is_error_page(host_page)) {
>>> +		printk(KERN_INFO "%s: gfn %p not valid\n",
>>> +			__func__, (void *)page_gfn);
>>> +		r = -1;
>>>
>> r = -1 is not really informative. Better use some meaningful error.
>>
>
> The error's going to the guest. The guest, as we know, has already done a
> successful DMA allocation. Something went wrong in the hypercall, and we
> don't know why (bad page). Any kind of error here isn't going to be
> intelligible to the guest anyway. It's mostly a host thing if we ever hit
> this.
>

If the guest is not able to handle it, why bother returning an error?
Better to kill it. But in any case, -1 is not a good error number.

>>> +	if (find_pci_pt_dev(&vcpu->kvm->arch.pci_pt_dev_head,
>>> +			&pci_pt_info, 0, KVM_PT_SOURCE_ASSIGN))
>>> +		r++; /* We have assigned the device */
>>> +
>>> +	kunmap(host_page);
>>>
>> better use atomic mappings here.
>>
>
> We can't use atomic mappings for guest pages. They can be swapped out.
>

kmap()ed pages can't be swapped out either. The atomic in kmap_atomic()
only refers to the context in which the pages are used. Atomic kmaps are
greatly preferable to the nonatomic ones.

--
Any sufficiently difficult bug is indistinguishable from a feature.
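
For illustration, the atomic-mapping pattern being asked for looks like the
fragment below (kernel code; the KM_USER0 slot argument matches the
kmap_atomic() API of that era). Nothing between kmap_atomic() and
kunmap_atomic() may sleep; the helper name is invented.

static void copy_from_guest_page(struct page *host_page,
				 void *dst, unsigned int len)
{
	void *vaddr = kmap_atomic(host_page, KM_USER0);

	memcpy(dst, vaddr, len);
	kunmap_atomic(vaddr, KM_USER0);
}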
From: Anthony L. <ali...@us...> - 2008-04-29 22:44:28
Marcelo Tosatti wrote:
> Hi Anthony,
>
> How is -no-kvm-irqchip working with the patch?
>

Seems to work fine. What is your expectation?

> On Tue, Apr 29, 2008 at 09:28:14AM -0500, Anthony Liguori wrote:
>
>> This patch eliminates the use of sigtimedwait() in the IO thread. To avoid the
>> signal/select race condition, we use a pipe that we write to in the signal
>> handlers. This was suggested by Rusty and seems to work well.
>>
>> +static int kvm_eat_signal(CPUState *env, int timeout)
>>  {
>>      struct timespec ts;
>>      int r, e, ret = 0;
>>      siginfo_t siginfo;
>> +    sigset_t waitset;
>>
>> +    sigemptyset(&waitset);
>> +    sigaddset(&waitset, SIG_IPI);
>>      ts.tv_sec = timeout / 1000;
>>      ts.tv_nsec = (timeout % 1000) * 1000000;
>> -    r = sigtimedwait(&waitset->sigset, &siginfo, &ts);
>> +    qemu_kvm_unlock();
>> +    r = sigtimedwait(&waitset, &siginfo, &ts);
>> +    qemu_kvm_lock(env);
>> +    cpu_single_env = env;
>>
>
> This assignment seems redundant now.
>

Yeah, I have a bigger patch which eliminates all of the explicit
assignments to cpu_single_env.

>> @@ -263,12 +238,8 @@ static void pause_all_threads(void)
>>      vcpu_info[i].stop = 1;
>>      pthread_kill(vcpu_info[i].thread, SIG_IPI);
>>
>
> Make sure the IO thread has SIG_IPI blocked (those are for APIC vcpu
> initialization only).
>

Just so I'm clear, there's really no harm in not blocking SIG_IPI
because it would just be ignored by the IO thread (since the SIG_IPI
handler is a nop). But yeah, we should explicitly block it.

>> +static void sig_aio_fd_read(void *opaque)
>> +{
>> +    int signum;
>> +    ssize_t len;
>> +
>> +    do {
>> +        len = read(kvm_sigfd[0], &signum, sizeof(signum));
>> +    } while (len == -1 && errno == EINTR);
>>
>
> What is the reason for this loop instead of a straight read?
>
> It's alright to be interrupted by a signal.
>

Just general habit with QEMU.

>> +    signal(SIGUSR1, sig_aio_handler);
>> +    signal(SIGUSR2, sig_aio_handler);
>> +    signal(SIGALRM, sig_aio_handler);
>> +    signal(SIGIO, sig_aio_handler);
>> +
>> +    if (pipe(kvm_sigfd) == -1)
>> +        abort();
>>
>
> perror() would be nice.
>

Yeah, everything needs proper error handling.

>> -    kvm_eat_signal(&io_signal_table, NULL, 1000);
>>      pthread_mutex_lock(&qemu_mutex);
>> -    cpu_single_env = NULL;
>> -    main_loop_wait(0);
>> +    main_loop_wait(10);
>>
>
> Increase that 1000 or something. Will make it easier to spot bugs.
>

I have actually and it does introduce some bugs. I'm not entirely clear
what is causing them though.

Regards,

Anthony Liguori

> Similarly in qemu_kvm_aio_wait().
>
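
The technique the patch relies on is the classic self-pipe trick: the signal
handler only writes a byte to a pipe, and the main loop watches the pipe's
read end with select(), closing the signal/select race. A minimal
stand-alone illustration (not the QEMU code itself):

#include <signal.h>
#include <unistd.h>
#include <sys/select.h>
#include <errno.h>

static int sigfd[2];                       /* [1] written from handlers */

static void sig_handler(int signum)
{
    write(sigfd[1], &signum, sizeof(signum));   /* async-signal-safe */
}

static void main_loop(void)
{
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(sigfd[0], &rfds);

        if (select(sigfd[0] + 1, &rfds, NULL, NULL, NULL) < 0 &&
            errno == EINTR)
            continue;

        if (FD_ISSET(sigfd[0], &rfds)) {
            int signum;
            ssize_t len;
            /* Drain one notification; even if later writes are dropped
             * because the pipe is full, one queued byte is enough to wake
             * the loop and have pending work processed. */
            do {
                len = read(sigfd[0], &signum, sizeof(signum));
            } while (len < 0 && errno == EINTR);
            /* ... dispatch work for signum ... */
        }
    }
}

int main(void)
{
    pipe(sigfd);
    signal(SIGIO, sig_handler);
    signal(SIGALRM, sig_handler);
    main_loop();
    return 0;
}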
From: Avi K. <av...@qu...> - 2008-04-29 22:43:51
Anthony Liguori wrote:
>>
>>>  struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
>>>  {
>>> -	return pfn_to_page(gfn_to_pfn(kvm, gfn));
>>> +	pfn_t pfn;
>>> +
>>> +	pfn = gfn_to_pfn(kvm, gfn);
>>> +	if (pfn_valid(pfn))
>>> +		return pfn_to_page(pfn);
>>> +
>>> +	return NULL;
>>>  }
>>>
>>
>> You're returning NULL here, not bad_page.
>>
>
> My thinking was that bad_page indicates that the gfn is invalid. This
> is a different type of error though. The problem is that we are trying
> to kmap() a page that has no struct page associated with it. I'm not
> sure what the right thing to do here is.
>

It depends on what's going on? Does a page table point to mmio? Or the
glommerclock?

Not sure there is a single answer.

> Perhaps we should be replacing consumers of gfn_to_page() with
> copy_to_user() instead?

Indeed we should. The problem is access in atomic contexts. It's easy to
detect failure, but not always easy to handle it.

--
Any sufficiently difficult bug is indistinguishable from a feature.