You can subscribe to this list here.
2006 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(33) |
Nov
(325) |
Dec
(320) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2007 |
Jan
(484) |
Feb
(438) |
Mar
(407) |
Apr
(713) |
May
(831) |
Jun
(806) |
Jul
(1023) |
Aug
(1184) |
Sep
(1118) |
Oct
(1461) |
Nov
(1224) |
Dec
(1042) |
2008 |
Jan
(1449) |
Feb
(1110) |
Mar
(1428) |
Apr
(1643) |
May
(682) |
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
From: David S. A. <da...@ci...> - 2008-04-24 17:25:03
|
What is the rip (GUEST_RIP) value in the VMEXIT trace output? Is that the current instruction pointer for the guest? I take it the virt in the PAGE_FAULT trace output is the virtual address the guest was referencing when the page fault occurred. What I don't understand (one of many things really) is what the 0xfffb63b0 corresponds to in the guest. Any ideas? Also, the expensive page fault occurs on errorcode = 0x0000000b (PAGE_FAULT trace data). What does the 4th bit in 0xb mean? bit 0 set means PFERR_PRESENT_MASK is set, and bit 1 means PT_WRITABLE_MASK. What is bit 3? david David S. Ahern wrote: > > Avi Kivity wrote: >> Ah! The flood detector is not seeing the access through the >> kmap_atomic() pte, because that access has gone through the emulator. >> last_updated_pte_accessed(vcpu) will never return true. >> >> Can you verify that last_updated_pte_accessed(vcpu) indeed always >> returns false? >> > > It returns both true and false. I added a tracer to kvm_mmu_pte_write() to dump > the rc of last_updated_pte_accessed(vcpu). ie., > pte_access = last_updated_pte_accessed(vcpu); > KVMTRACE_1D(PTE_ACCESS, vcpu, (u32) pte_access, handler); > > A sample: > > (+ 4488) VMEXIT [ exitcode = 0x00000000, rip = 0x00000000 c016104a ] > (+ 0) PAGE_FAULT [ errorcode = 0x0000000b, virt = 0x00000000 fffb63b0 ] > (+ 2480) PAGE_FAULT1 [ write_count = 0 ] > (+ 424) PAGE_FAULT2 [ level = 2 metaphysical = 0 access 0x00000007 ] > (+ 51672) PAGE_FAULT3 > (+ 472) PAGE_FAULT4 > (+ 704) PAGE_FAULT5 [ shadow_ent = 0x80000001 2dfb5043 ] > (+ 1496) VMENTRY > (+ 4568) VMEXIT [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ] > (+ 0) PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ] > (+ 2352) PAGE_FAULT1 [ write_count = 0 ] > (+ 728) PAGE_FAULT5 [ shadow_ent = 0x00000001 91409041 ] > (+ 0) PTE_WRITE [ gpa = 0x00000000 00009db4 gpte = 0x00000000 41fb5363 ] > (+ 0) PTE_ACCESS [ pte_access = 1 ] > (+ 6864) VMENTRY > (+ 3896) VMEXIT [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ] > (+ 0) PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ] > (+ 2376) PAGE_FAULT1 [ write_count = 1 ] > (+ 720) PAGE_FAULT5 [ shadow_ent = 0x00000001 91409041 ] > (+ 0) PTE_WRITE [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ] > (+ 0) PTE_ACCESS [ pte_access = 0 ] > (+ 12344) VMENTRY > (+ 4688) VMEXIT [ exitcode = 0x00000000, rip = 0x00000000 c016127c ] > (+ 0) PAGE_FAULT [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ] > (+ 2416) PAGE_FAULT1 [ write_count = 2 ] > (+ 792) PAGE_FAULT5 [ shadow_ent = 0x00000001 91409043 ] > (+ 1128) VMENTRY > (+ 4512) VMEXIT [ exitcode = 0x00000000, rip = 0x00000000 c016104a ] > (+ 0) PAGE_FAULT [ errorcode = 0x0000000b, virt = 0x00000000 fffb63b0 ] > (+ 2448) PAGE_FAULT1 [ write_count = 0 ] > (+ 448) PAGE_FAULT2 [ level = 2 metaphysical = 0 access 0x00000007 ] > (+ 51520) PAGE_FAULT3 > (+ 432) PAGE_FAULT4 > (+ 696) PAGE_FAULT5 [ shadow_ent = 0x80000001 2df5a043 ] > (+ 1480) VMENTRY > > > david > |
From: Marcelo T. <mto...@re...> - 2008-04-24 17:11:24
|
On Wed, Apr 23, 2008 at 09:30:06AM +0300, Avi Kivity wrote: > > as I got no reply, I guess it is a bad setup on my part. If that might > > help, this happenned while I was doing a "make -j" on webkit svn tree > > (ie. heavy c++ compilation workload) . > > > > > > No this is not bad setup. No amount of bad setup should give this warning. > > You didn't get a reply because no one knows what to make of it, and > because it's much more fun to debate endianess or contemplete guests > with eighty thousand disks than to fix those impossible bugs. If you > can give clear instructions on how to reproduce this, we will try it > out. Please be sure to state OS name and versions for the guest as well > as the host. It is valid to have more than PAGES_PER_HPAGE in the largepage's shadowed count. If the gpte read races with a pte-update-from-guest (and the pte update results in a different sp->role), it might account twice for a single gfn. Such "zombie" shadow pages should eventually be removed through recycling, allowing for instantiation of a large page, unless references can be leaked. Can't spot such leakage problem though. Thoughts? diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 2ad6f54..8ae2118 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -387,7 +387,6 @@ static void account_shadowed(struct kvm *kvm, gfn_t gfn) write_count = slot_largepage_idx(gfn, gfn_to_memslot(kvm, gfn)); *write_count += 1; - WARN_ON(*write_count > KVM_PAGES_PER_HPAGE); } static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn) |
From: Avi K. <av...@qu...> - 2008-04-24 16:52:38
|
Marcelo Tosatti wrote: > On Thu, Apr 24, 2008 at 04:44:27PM +0300, Avi Kivity wrote: > >> Chris Lalancette wrote: >> >>> Avi Kivity wrote: >>> >>> >>>> Ok. __pit_timer_fn() is called from an interrupt, which then calls >>>> smp_call_function_single(), which calls spin_lock(). If we've already >>>> taken the lock, we hang. >>>> >>>> >>>> >>> Ah. Just adding a "me too"; I didn't get a chance to debug it yesterday, >>> but I >>> was seeing similar problems. If I disabled in-kernel pit with >>> -no-kvm-pit, all >>> was well. >>> >>> >> How to fix it, though? the only idea that comes to mind is to affine >> the hrtimer with vcpu0 (like the local apic timers) which would mean we >> only need to unwait the waitqueue, and never need to send the IPI. >> Would slightly improve performance as well. >> > > Yes, agree. > > For now I think just revert > I committed this, so this should be fixed for now. I'm not sure hrtimer migration would work 100% reliably (suppose it fired just after a vcpu migration) so I think a queue_work is better. -- error compiling committee.c: too many arguments to function |
From: Avi K. <av...@qu...> - 2008-04-24 16:39:03
|
Mohammed Gamal wrote: > Hi, > My Project proposal "Improving and Stabilizing Real-Mode Support for > Intel Hosts" has been accepted into Google Summer of Code under the > Linux Foundation. You may have a look at the proposal abstract here: > http://code.google.com/soc/2008/linux/appinfo.html?csaid=1CC1C8B4CCC1120E > . > > Excellent! It's an area where kvm has been lagging for a long time. > Any pointers on where to start, what would you like to see done, and > any other comments and suggestions would greatly be appreciated. > It boils down to: - removing the current hacks (fix_rmode_dataseg etc) - identifying when the vcpu state does not allow using VT (when a segment limit != 65535, or when ss.rpl != cpl in protected mode) - can be done by intercepting vmentry failures - or by looking at the state directly when entering real mode or protected mode - trapping to the emulator in this case - extending the emulator to support any opcodes which we will encounter when doing this Guillaume Thouvenin (copied) has been working on this lately. -- error compiling committee.c: too many arguments to function |
From: Hollis B. <ho...@us...> - 2008-04-24 16:38:03
|
On Thursday 24 April 2008 07:54:10 Avi Kivity wrote: > I propose moving the kvm lists to vger.kernel.org, for the following > benefits: > > - better spam control > - faster service (I see significant lag with the sourceforge lists) > - no ads appended to the end of each email > > If no objections are raised, and if the vger postmasters agree, I will > mass subscribe the current subscribers so that there will be no service > interruption. Sounds good to me. Will we continue to have manual moderation for non-subscribers? That seems to be a good way to handle the last 1% of spam that sneaks through the automated filters. -- Hollis Blanchard IBM Linux Technology Center |
From: Marcelo T. <mto...@re...> - 2008-04-24 16:28:55
|
On Thu, Apr 24, 2008 at 04:44:27PM +0300, Avi Kivity wrote: > Chris Lalancette wrote: > >Avi Kivity wrote: > > > >>Ok. __pit_timer_fn() is called from an interrupt, which then calls > >>smp_call_function_single(), which calls spin_lock(). If we've already > >>taken the lock, we hang. > >> > >> > > > >Ah. Just adding a "me too"; I didn't get a chance to debug it yesterday, > >but I > >was seeing similar problems. If I disabled in-kernel pit with > >-no-kvm-pit, all > >was well. > > > > How to fix it, though? the only idea that comes to mind is to affine > the hrtimer with vcpu0 (like the local apic timers) which would mean we > only need to unwait the waitqueue, and never need to send the IPI. > Would slightly improve performance as well. Yes, agree. For now I think just revert --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -200,10 +200,8 @@ int __pit_timer_fn(struct kvm_kpit_state *ps) atomic_inc(&pt->pending); smp_mb__after_atomic_inc(); - if (vcpu0 && waitqueue_active(&vcpu0->wq)) { - vcpu0->arch.mp_state = KVM_MP_STATE_RUNNABLE; - wake_up_interruptible(&vcpu0->wq); - } + if (vcpu0) + kvm_vcpu_kick(vcpu0); And add a big fat FIXME. |
From: Avi K. <av...@qu...> - 2008-04-24 16:14:13
|
Dor Laor wrote: > On Thu, 2008-04-24 at 15:47 +0300, Avi Kivity wrote: > >> Dor Laor wrote: >> >>> while investigating the revert of "fix sci irq set when acpi timer" I >>> discovered the reason. Please also re-revert the original patch. >>> >>> >> Applied, but system_powerdown still doesn't work with the sci acpi timer >> fix. >> >> >> > > With the above patch and revert the reversion of "fix sci irq set when > acpi timer" it works for me. Can you give some details or try again? > > My host is 2.6.24.3-50-fc8, your kvm head (user > da00aa5090cf2de4259960ea659ac783bd65a041 + re-revert, kernel > 1736e3f0a3e173a7f452b32ad19bfff085a1a255). > Guest is winXp with acpi hal. After it boots I go to the monitor and > issue a system_powerdown. > Everything works and there is no segfaults (tested multiple time) > > With current master, Windows and Linux (FC6, x86_64) both shut down correctly after system_powerdown. With the sci timer thing re-applied, Windows continues to work but Linux no longer responds to ACPI powerdown. -- error compiling committee.c: too many arguments to function |
From: Anthony L. <an...@co...> - 2008-04-24 16:05:40
|
Mohammed Gamal wrote: > Hi, > My Project proposal "Improving and Stabilizing Real-Mode Support for > Intel Hosts" has been accepted into Google Summer of Code under the > Linux Foundation. You may have a look at the proposal abstract here: > http://code.google.com/soc/2008/linux/appinfo.html?csaid=1CC1C8B4CCC1120E > . > > Any pointers on where to start, what would you like to see done, and > any other comments and suggestions would greatly be appreciated. > We have a two stage plan to address real-mode on Intel systems. Both involve using x86_emulate() to emulate 16-bit (and 32-bit) instructions that VT cannot handle. The first stage is to detect vmentry failures and run x86_emulate() for a single instruction. If you look at the mailing list, you'll see patches from myself and Guillaume. This should be enough to allow most Ubuntu installer CDs to work under KVM. In this case, the CDs are using a version of GFXBOOT that doesn't use big real mode, but still relies on an undefined architectural aspect that VT doesn't support. The second stage is to use a loop of x86_emulate() to run all 16-bit code (instead of using vm86 mode). This will allow us to support guests that use big real mode. The best place to start is probably to try out some of the patches on the list, and get familiar with the GFXBOOT assembly routines. There's a #kvm in FreeNode, that's a good place to start if you're having trouble getting started. Regards, Anthony Liguori > Regards, > Mohammed > > ------------------------------------------------------------------------- > This SF.net email is sponsored by the 2008 JavaOne(SM) Conference > Don't miss this year's exciting event. There's still time to save $100. > Use priority code J8TL2D2. > http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone > _______________________________________________ > kvm-devel mailing list > kvm...@li... > https://lists.sourceforge.net/lists/listinfo/kvm-devel > |
From: Mohammed G. <m.g...@gm...> - 2008-04-24 16:01:33
|
On Thu, Apr 24, 2008 at 5:47 PM, Avi Kivity <av...@qu...> wrote: > Mohammed Gamal wrote: > > > Hi, > > My Project proposal "Improving and Stabilizing Real-Mode Support for > > Intel Hosts" has been accepted into Google Summer of Code under the > > Linux Foundation. You may have a look at the proposal abstract here: > > http://code.google.com/soc/2008/linux/appinfo.html?csaid=1CC1C8B4CCC1120E > > . > > > > > > > > Excellent! It's an area where kvm has been lagging for a long time. > > > > > Any pointers on where to start, what would you like to see done, and > > any other comments and suggestions would greatly be appreciated. > > > > > > It boils down to: > > - removing the current hacks (fix_rmode_dataseg etc) > - identifying when the vcpu state does not allow using VT > (when a segment limit != 65535, or when ss.rpl != cpl in protected mode) > - can be done by intercepting vmentry failures > - or by looking at the state directly when entering real mode or protected > mode > - trapping to the emulator in this case > - extending the emulator to support any opcodes which we will encounter > when doing this > > Guillaume Thouvenin (copied) has been working on this lately. > > -- > error compiling committee.c: too many arguments to function > > Thanks for the feedback, this has been really helpful. I also saw Guillame's work and I'd be very much willing to cooperate :). Regards, Mohammed |
From: Avi K. <av...@qu...> - 2008-04-24 16:01:33
|
Hollis Blanchard wrote: > On Thursday 24 April 2008 07:54:10 Avi Kivity wrote: > >> I propose moving the kvm lists to vger.kernel.org, for the following >> benefits: >> >> - better spam control >> - faster service (I see significant lag with the sourceforge lists) >> - no ads appended to the end of each email >> >> If no objections are raised, and if the vger postmasters agree, I will >> mass subscribe the current subscribers so that there will be no service >> interruption. >> > > Sounds good to me. > > Will we continue to have manual moderation for non-subscribers? That seems to > be a good way to handle the last 1% of spam that sneaks through the automated > filters. > kvm-devel doesn't do manual moderation. If vger has the infrastructure, I don't see why you can't continue doing this on kvm-ppc-devel. btw, we can probably shorten the names to kvm@ and kvm-$arch@ while we're at it. -- error compiling committee.c: too many arguments to function |
From: Chris L. <cla...@re...> - 2008-04-24 15:44:07
|
Avi Kivity wrote: > >> 1811: 8d 86 00 00 ff 3f lea 0x3fff0000(%rsi),%eax >> 1817: 83 f8 03 cmp $0x3,%eax >> 181a: 0f 87 e2 01 00 00 ja 1a02 <svm_set_msr+0x27f> > Now it uses %rsi instead of %esi, and any junk in the upper bits will > cause the ja to be taken. > > We need to get a reduced testcase to the gcc folks, this is a serious > bug. Any changes in the code to work around this would be fragile. > Ouch, I missed this on my reading of it. I'll try to come up with a standalone program that shows this. Thanks, Avi. Chris Lalancette |
From: Andrea A. <an...@qu...> - 2008-04-24 15:39:47
|
On Thu, Apr 24, 2008 at 04:51:12AM -0500, Robin Holt wrote: > It seems to me the work done by mmu_notifier_mm_destroy should really > be done inside the mm_lock()/mm_unlock area of mmu_unregister and There's no mm_lock/unlock for mmu_unregister anymore. That's the whole point of using srcu so it becomes reliable and quick. > mm_notifier_release when we have removed the last entry. That would > give the users job the same performance after they are done using the > special device that they had prior to its use. That's not feasible. Otherwise mmu_notifier_mm will go away at any time under both _release from exit_mmap and under _unregister too. exit_mmap holds an mm_count implicit, so freeing mmu_notifier_mm after the last mmdrop makes it safe. mmu_notifier_unregister also holds the mm_count because mm_count was pinned by mmu_notifier_register. That solves the issue with mmu_notifier_mm going away from under mmu_notifier_unregister and _release and that's why it can only be freed after mm_count == 0. There's at least one small issue I noticed so far, that while _release don't need to care about _register, but _unregister definitely need to care about _register. I've to take the mmap_sem in addition or in replacement of the unregister_lock. The srcu_read_lock can also likely moved just before releasing the unregister_lock but that's just a minor optimization to make the code more strict. > On Thu, Apr 24, 2008 at 08:49:40AM +0200, Andrea Arcangeli wrote: > ... > > diff --git a/mm/memory.c b/mm/memory.c > > --- a/mm/memory.c > > +++ b/mm/memory.c > ... > > @@ -603,25 +605,39 @@ > > * readonly mappings. The tradeoff is that copy_page_range is more > > * efficient than faulting. > > */ > > + ret = 0; > > if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) { > > if (!vma->anon_vma) > > - return 0; > > + goto out; > > } > > > > - if (is_vm_hugetlb_page(vma)) > > - return copy_hugetlb_page_range(dst_mm, src_mm, vma); > > + if (unlikely(is_vm_hugetlb_page(vma))) { > > + ret = copy_hugetlb_page_range(dst_mm, src_mm, vma); > > + goto out; > > + } > > > > + if (is_cow_mapping(vma->vm_flags)) > > + mmu_notifier_invalidate_range_start(src_mm, addr, end); > > + > > + ret = 0; > > I don't think this is needed. It's not needed right, but I thought it was cleaner if they all use "ret" after I had to change the code at the end of the function. Anyway I'll delete this to make the patch shorter and only change the minimum, agreed. > ... > > +/* avoid memory allocations for mm_unlock to prevent deadlock */ > > +void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data) > > +{ > > + if (mm->map_count) { > > + if (data->nr_anon_vma_locks) > > + mm_unlock_vfree(data->anon_vma_locks, > > + data->nr_anon_vma_locks); > > + if (data->i_mmap_locks) > > I think you really want data->nr_i_mmap_locks. Indeed. It never happens that there are zero vmas with filebacked mappings, this is why this couldn't be triggered in practice, thanks! > The second paragraph of this comment seems extraneous. ok removed. > > + /* > > + * Wait ->release if mmu_notifier_unregister run list_del_rcu. > > + * srcu can't go away from under us because one mm_count is > > + * hold by exit_mmap. > > + */ > > These two sentences don't make any sense to me. Well that was a short explanation of why the mmu_notifier_mm structure can only be freed after the last mmdrop, which is what you asked at the top. I'll try to rephrase. > > +void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) > > +{ > > + int before_release = 0, srcu; > > + > > + BUG_ON(atomic_read(&mm->mm_count) <= 0); > > + > > + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); > > + spin_lock(&mm->mmu_notifier_mm->unregister_lock); > > + if (!hlist_unhashed(&mn->hlist)) { > > + hlist_del_rcu(&mn->hlist); > > + before_release = 1; > > + } > > + spin_unlock(&mm->mmu_notifier_mm->unregister_lock); > > + if (before_release) > > + /* > > + * exit_mmap will block in mmu_notifier_release to > > + * guarantee ->release is called before freeing the > > + * pages. > > + */ > > + mn->ops->release(mn, mm); > > I am not certain about the need to do the release callout when the driver > has already told this subsystem it is done. For XPMEM, this callout > would immediately return. I would expect it to be the same or GRU. The point is that you don't want to run it twice. And without this you will have to serialize against ->release yourself in the driver. It's much more convenient if you know that ->release will be called just once, and before mmu_notifier_unregister returns. It could be called by _release even after you're already inside _unregister, _release may reach the spinlock before _unregister, you won't notice the difference. Solving this race in the driver looked too complex, I rather solve it once inside the mmu notifier code to be sure. Missing a release event is fatal because all sptes have to be dropped before _release returns. The requirement is the same for _unregister, all sptes have to be dropped before it returns. ->release should be able to sleep as long as it wants even with only 1/N applied. exit_mmap can sleep too, no problem. You can't unregister inside ->release first of all because 'ret' instruction must be still allocated to return to mmu notifier code. |
From: Avi K. <av...@qu...> - 2008-04-24 15:20:41
|
Chris Lalancette wrote: > Avi Kivity wrote: > >> You mean the gcc generates wrong code? It seems fine here (though >> wonderfully obfuscated). >> >> Can you attach an objdump -Sr svm.o? Also, what gcc version are you using? >> >> > > (sending attachment in private mail, so I don't spam the whole list with 189K of > objdump). Note that this is an objdump -Sr of the current code, with my patch > *not* applied. > > gcc is gcc-4.3.0-7 in Fedora 9. > > It's a gcc bug. svm_set_msr() places ecx in %rsi, and consistently uses %esi to refer to the first 32 bits. But when it compiles this bit: > case MSR_K7_EVNTSEL0: > case MSR_K7_EVNTSEL1: > case MSR_K7_EVNTSEL2: > case MSR_K7_EVNTSEL3: > /* > * only support writing 0 to the performance counters for now > * to make Windows happy. Should be replaced by a real > * performance counter emulation later. > */ > if (data != 0) > goto unhandled; > break; (where MSR_K7_EVENTSEL[0123] == 0xc001000[0123]) it compiles it into > 1811: 8d 86 00 00 ff 3f lea 0x3fff0000(%rsi),%eax > 1817: 83 f8 03 cmp $0x3,%eax > 181a: 0f 87 e2 01 00 00 ja 1a02 <svm_set_msr+0x27f> Now it uses %rsi instead of %esi, and any junk in the upper bits will cause the ja to be taken. We need to get a reduced testcase to the gcc folks, this is a serious bug. Any changes in the code to work around this would be fragile. -- error compiling committee.c: too many arguments to function |
From: Mohammed G. <m.g...@gm...> - 2008-04-24 14:58:45
|
Hi, My Project proposal "Improving and Stabilizing Real-Mode Support for Intel Hosts" has been accepted into Google Summer of Code under the Linux Foundation. You may have a look at the proposal abstract here: http://code.google.com/soc/2008/linux/appinfo.html?csaid=1CC1C8B4CCC1120E . Any pointers on where to start, what would you like to see done, and any other comments and suggestions would greatly be appreciated. Regards, Mohammed |
From: Avi K. <av...@qu...> - 2008-04-24 13:44:37
|
Chris Lalancette wrote: > Avi Kivity wrote: > >> Ok. __pit_timer_fn() is called from an interrupt, which then calls >> smp_call_function_single(), which calls spin_lock(). If we've already >> taken the lock, we hang. >> >> > > Ah. Just adding a "me too"; I didn't get a chance to debug it yesterday, but I > was seeing similar problems. If I disabled in-kernel pit with -no-kvm-pit, all > was well. > How to fix it, though? the only idea that comes to mind is to affine the hrtimer with vcpu0 (like the local apic timers) which would mean we only need to unwait the waitqueue, and never need to send the IPI. Would slightly improve performance as well. -- error compiling committee.c: too many arguments to function |
From: Dor L. <dor...@qu...> - 2008-04-24 13:37:52
|
On Thu, 2008-04-24 at 15:47 +0300, Avi Kivity wrote: > Dor Laor wrote: > > while investigating the revert of "fix sci irq set when acpi timer" I > > discovered the reason. Please also re-revert the original patch. > > > > Applied, but system_powerdown still doesn't work with the sci acpi timer > fix. > > With the above patch and revert the reversion of "fix sci irq set when acpi timer" it works for me. Can you give some details or try again? My host is 2.6.24.3-50-fc8, your kvm head (user da00aa5090cf2de4259960ea659ac783bd65a041 + re-revert, kernel 1736e3f0a3e173a7f452b32ad19bfff085a1a255). Guest is winXp with acpi hal. After it boots I go to the monitor and issue a system_powerdown. Everything works and there is no segfaults (tested multiple time) Dor |
From: Chris L. <cla...@re...> - 2008-04-24 13:33:24
|
Avi Kivity wrote: > > Ok. __pit_timer_fn() is called from an interrupt, which then calls > smp_call_function_single(), which calls spin_lock(). If we've already > taken the lock, we hang. > Ah. Just adding a "me too"; I didn't get a chance to debug it yesterday, but I was seeing similar problems. If I disabled in-kernel pit with -no-kvm-pit, all was well. Chris Lalancette |
From: Avi K. <av...@qu...> - 2008-04-24 13:33:23
|
David Miller wrote: >> If no objections are raised, and if the vger postmasters agree, I will >> mass subscribe the current subscribers so that there will be no service >> interruption. >> > > I'm fine with adding the list, but we don't do mass subscribes like > that, and this rule applies to every list which moves to vger from > somewhere else. > > (three lists) > People will need to subscribe themselves to the new list. > Okay. Will give people a week's notice to subscribe. -- error compiling committee.c: too many arguments to function |
From: David M. <da...@da...> - 2008-04-24 13:26:40
|
From: Avi Kivity <av...@qu...> Date: Thu, 24 Apr 2008 15:54:10 +0300 > I propose moving the kvm lists to vger.kernel.org, for the following > benefits: > > - better spam control > - faster service (I see significant lag with the sourceforge lists) > - no ads appended to the end of each email > > If no objections are raised, and if the vger postmasters agree, I will > mass subscribe the current subscribers so that there will be no service > interruption. I'm fine with adding the list, but we don't do mass subscribes like that, and this rule applies to every list which moves to vger from somewhere else. People will need to subscribe themselves to the new list. We enforce this for two reasons: 1) If they don't go through the action of subscribing, they won't be able to unsubscribe, so they'll be upset and also bug us postmasters for help. 2) People are generally unhappy when they are subscribed to mailing lists at a site they didn't ask to be at. |
From: Glauber C. <gl...@gm...> - 2008-04-24 13:25:32
|
On Sat, Apr 19, 2008 at 6:11 PM, Glauber Costa <gl...@gm...> wrote: > > On Fri, Apr 18, 2008 at 1:27 PM, Avi Kivity <av...@qu...> wrote: > > > > Glauber de Oliveira Costa wrote: > > > > > Hi, > > > I've got some qemu crashes while trying to passthrough an ide device > > > to a kvm guest. After some investigation, it turned out that > > register_ioport_{read/write} will abort on errors instead of returning > > > a meaningful error. > > > > > > However, even if we do return an error, the asynchronous nature of pci > > > config space mapping updates makes it a little bit hard to treat. > > > > > > This series of patches basically treats errors in the mapping functions in > > > the pci layer. If anything goes wrong, we unregister the pci device, > > unmapping > > > any mappings that happened to be sucessfull already. > > > > > > After these patches are applied, a lot of warnings appears. And, you know, > > > everytime there is a warning, god kills a kitten. But I'm not planning on > > > touching the other pieces of qemu code for this until we set up (or not) > > in > > > this solution > > > > > > Comments are very welcome, specially from qemu folks (since it is a bit > > invasive) > > > > > > > > > > > > > Have you considered, instead of rolling back the changes you already made > > before the failure, to have a function which checks if an ioport > > registration will be successful? This may simplify the code. > > > Yes, I did. > > Basic problem is that I basically could not find this information > handy until we were deep in the stack, right before calling the update > mapping functions. I turned out preferring this option. I can, > however, take a fresh look at that. > Looked at this again, and it does seem to me that we don't have too much to gain from a "test-before" solution. We definitely can't test it reliably until update_mappings arrive, (since the mapping can change) and by this time, the pci device is already registered, and we would have to de-register it anyway. There is room for "improvement" (with a wide definition of improvement) if we test all the ports of a device in advance (inside update_mappings) instead of a port-by-port basis. We could get rid of the flag, but it would be traded off by another complexities. So unless someone have a very direct alternate solution for this I'm failing to see, I do advocate for those humble patches. -- Glauber Costa. "Free as in Freedom" http://glommer.net "The less confident you are, the more serious you have to act." |
From: Avi K. <av...@qu...> - 2008-04-24 13:21:25
|
Avi Kivity wrote: > Yang, Sheng wrote: >> On Thursday 24 April 2008 19:37:03 Avi Kivity wrote: >> >>> Yunfeng Zhao wrote: >>> >>>> Hi All, >>>> >>>> This is today's KVM test result against kvm.git >>>> 873c05fa7e6fea27090b1bf0f67a073eadb04782 and kvm-userspace.git >>>> d102d750f397b543fe620a3c77a7e5e42c483865. >>>> >>> I suspect 873c05fa7e6fea27090b1bf0f67a073eadb04782 itself, it's the >>> only >>> thing that has any chance of badness. >>> >>> Marcelo, any idea? Perhaps due to load, interrupts accumulate and >>> can't >>> be injected fast enough? >>> >>> These tests are run on a 2.6.22 host, which has a hacked >>> smp_call_function_single() in external-module-compat.h, which may >>> exaberate the problem. >>> >> >> Yeah, I suspect the commit too(I tried tip without that, and found >> mostly alright). In fact, I didn't use kvm_vcpu_kick() just because >> that I found this function may causing hang on my host... But I >> didn't do more investigate so I can't tell what's wrong, then I just >> chose way to keep it working... I am sorry for not clarify... >> > > I think smp_call_function_single() is miscompiled when using the > compatibility code. I took it out-of-line to be sure (it is now in > kernel/external-module-compat.c). > > No evidence, but... > Ok. __pit_timer_fn() is called from an interrupt, which then calls smp_call_function_single(), which calls spin_lock(). If we've already taken the lock, we hang. -- error compiling committee.c: too many arguments to function |
From: Glauber C. <gl...@gm...> - 2008-04-24 13:20:42
|
On Mon, Apr 21, 2008 at 4:15 AM, Gerd Hoffmann <kr...@re...> wrote: > Marcelo Tosatti wrote: > >> >From what me and marcelo discussed, I think there's a possibility that > >> it has marginally something to do with precision of clock calculation. > >> Gerd's patches address that issues. Can somebody test this with those > >> patches (both guest and host), while I'm off ? > > > > Haven't seen Gerd's guest patches ? > > I'm still busy cooking them up. I've mentioned them in a mail, but they > didn't ran over the list (yet). Stay tuned ;) > > cheers, > Gerd > Just saw Gerd's patches flying around. Can anyone that is able to reproduce this problem (a subgroup of human beings that does not include me) test it with them applied? If it still fails, please let me know asap -- Glauber Costa. "Free as in Freedom" http://glommer.net "The less confident you are, the more serious you have to act." |
From: Glauber C. <gl...@gm...> - 2008-04-24 13:16:41
|
> - > - delta = kvm_clock_read(); > - > native_write_msr(MSR_KVM_WALL_CLOCK, low, high); > - do { > - version = wall_clock.wc_version; > - rmb(); > - wc_sec = wall_clock.wc_sec; > - wc_nsec = wall_clock.wc_nsec; > - rmb(); > - } while ((wall_clock.wc_version != version) || (version & 1)); > - > - delta = kvm_clock_read() - delta; > - delta += wc_nsec; > - nsec = do_div(delta, NSEC_PER_SEC); > - set_normalized_timespec(&ts, wc_sec + delta, nsec); > - /* > - * Of all mechanisms of time adjustment I've tested, this one > - * was the champion! > - */ > - return ts.tv_sec + 1; > + > + vcpu_time = &get_cpu_var(hv_clock); > + pvclock_read_wallclock(&wall_clock, vcpu_time, &ts); > + put_cpu_var(hv_clock); > + > + return ts.tv_sec; > } Here it is. Despite some needed cleanups, patches look beautiful to me. -- Glauber Costa. "Free as in Freedom" http://glommer.net "The less confident you are, the more serious you have to act." |
From: Glauber C. <gl...@gm...> - 2008-04-24 13:11:00
|
> +void pvclock_read_wallclock(struct kvm_wall_clock *wall_clock, > + struct kvm_vcpu_time_info *vcpu_time, > + struct timespec *ts) > +{ > + u32 version; > + u64 delta; > + struct timespec now; > + > + /* get wallclock at system boot */ > + do { > + version = wall_clock->wc_version; > + rmb(); /* fetch version before time */ > + now.tv_sec = wall_clock->wc_sec; > + now.tv_nsec = wall_clock->wc_nsec; > + rmb(); /* fetch time before checking version */ > + } while ((wall_clock->wc_version & 1) || (version != wall_clock->wc_version)); > + > + delta = pvclock_clocksource_read(vcpu_time); /* time since system boot */ > + delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec; > + > + now.tv_nsec = do_div(delta, NSEC_PER_SEC); > + now.tv_sec = delta; > + > + set_normalized_timespec(ts, now.tv_sec, now.tv_nsec); > +} This is not exactly what kvm does. For us, wallclock read and system time reads are decoupled operations, controlled by different msrs. This function might exist, but in this case, it have to be wrapped around a kvm_read_wallclock(), that does the msr read. (I'm not sure whether or not you do it in your later patches, doing sequential reads :-) ) > ------------------------------------------------------------------------- > This SF.net email is sponsored by the 2008 JavaOne(SM) Conference > Don't miss this year's exciting event. There's still time to save $100. > Use priority code J8TL2D2. > http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ads! ads! -- Glauber Costa. "Free as in Freedom" http://glommer.net "The less confident you are, the more serious you have to act." |
From: Glauber C. <gc...@re...> - 2008-04-24 13:04:09
|
Gerd Hoffmann wrote: > Glauber Costa wrote: >> Gerd Hoffmann wrote: >>> Jeremy Fitzhardinge wrote: >>>> Xen could change the parameters in the instant after >>>> get_time_values(). That change could be as a result of >>>> suspend-resume, so the parameters >>>> and the tsc could be wildly different. >>> Ah, ok, forgot the rdtsc in the picture. With that in mind I fully >>> agree that the loop is needed. I think kvm guests can even hit that one >>> with the vcpu migrating to a different physical cpu, so we better handle >>> it correctly ;) >> It's probably not needed for kvm, since we update everything everytime >> we get scheduled in the host side, which would cover the case for >> migration between physical cpus. > > No, it wouldn't. The corner case we must catch is: guest reads time > info, kvm reschedules the guest to another pcpu, guest reads the tsc. > The time info used by the guest for the tsc delta is stale then, it > belongs to the previous pcpu. > > cheers, > Gerd > Agreed. |