From: Andrea A. <an...@qu...> - 2008-04-26 14:10:30
|
Hello,

after updating kvm-userland.git, kvm.git and linux-2.6-hg, and after a make distclean and a rebuild with a slightly reduced .config, I can't compile the external module anymore. Looking into it with V=1, $(src) expands to "" and including /external-module-compat.h clearly fails.

I fixed it like below, because it seems more consistent to enforce the ordering of the "special" includes in the same place. Notably, $(src)/include is already included as $LINUX at point 1 of the comment, so this looks like a cleanup of a superfluous line in Kconfig, besides fixing my compile by moving external-module-compat.h to the same place as the other includes, where `pwd` works, instead of $(src), which doesn't work anymore for whatever reason.

Signed-off-by: Andrea Arcangeli <an...@qu...>

diff --git a/kernel/Kbuild b/kernel/Kbuild
index cabfc75..d9245eb 100644
--- a/kernel/Kbuild
+++ b/kernel/Kbuild
@@ -1,4 +1,3 @@
-EXTRA_CFLAGS := -I$(src)/include -include $(src)/external-module-compat.h
 obj-m := kvm.o kvm-intel.o kvm-amd.o
 kvm-objs := kvm_main.o x86.o mmu.o x86_emulate.o anon_inodes.o irq.o i8259.o \
	lapic.o ioapic.o preempt.o i8254.o external-module-compat.o
diff --git a/kernel/Makefile b/kernel/Makefile
index 78ff923..e3fccbe 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -27,7 +27,8 @@ all::
 # include header priority 1) $LINUX 2) $KERNELDIR 3) include-compat
 	$(MAKE) -C $(KERNELDIR) M=`pwd` \
 		LINUXINCLUDE="-I`pwd`/include -Iinclude -I`pwd`/include-compat \
-		-include include/linux/autoconf.h" \
+		-include include/linux/autoconf.h \
+		-include `pwd`/external-module-compat.h"
 		"$$@"
 
 sync: header-sync source-sync
|
From: Andrea A. <an...@qu...> - 2008-04-26 14:04:05
|
On Sat, Apr 26, 2008 at 08:17:34AM -0500, Robin Holt wrote:
> Since this include and the one for mm_types.h both are build breakages
> for ia64, I think you need to apply your ia64_cpumask and the following
> (possibly as a single patch) first or in your patch 1. Without that,
> ia64 doing a git-bisect could hit a build failure.

Agreed, so it doesn't risk breaking ia64 compilation, thanks for the great XPMEM feedback!

Also note, I figured out that mmu_notifier_release can actually run concurrently against other mmu notifiers in case there's a vmtruncate (->release could already run concurrently if invoked by _unregister, the only guarantee is that ->release will be called one time and only one time and that no mmu notifier will ever run after _unregister returns).

In short I can't keep the list_del_init in _release and I need a list_del_init_rcu instead to fix this minor issue. So this won't really make much difference after all.

I'll release #v14 with all this after a bit of kvm testing with it...

diff --git a/include/linux/list.h b/include/linux/list.h
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -755,6 +755,14 @@ static inline void hlist_del_init(struct
 	}
 }
 
+static inline void hlist_del_init_rcu(struct hlist_node *n)
+{
+	if (!hlist_unhashed(n)) {
+		__hlist_del(n);
+		n->pprev = NULL;
+	}
+}
+
 /**
  * hlist_replace_rcu - replace old entry by new one
  * @old : the element to be replaced
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -22,7 +22,10 @@ struct mmu_notifier_ops {
 	/*
 	 * Called either by mmu_notifier_unregister or when the mm is
 	 * being destroyed by exit_mmap, always before all pages are
-	 * freed. It's mandatory to implement this method.
+	 * freed. It's mandatory to implement this method. This can
+	 * run concurrently to other mmu notifier methods and it
+	 * should teardown all secondary mmu mappings and freeze the
+	 * secondary mmu.
 	 */
 	void (*release)(struct mmu_notifier *mn,
 			struct mm_struct *mm);
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -19,12 +19,13 @@
 
 /*
  * This function can't run concurrently against mmu_notifier_register
- * or any other mmu notifier method. mmu_notifier_register can only
- * run with mm->mm_users > 0 (and exit_mmap runs only when mm_users is
- * zero). All other tasks of this mm already quit so they can't invoke
- * mmu notifiers anymore. This can run concurrently only against
- * mmu_notifier_unregister and it serializes against it with the
- * mmu_notifier_mm->lock in addition to RCU. struct mmu_notifier_mm
+ * because mm->mm_users > 0 during mmu_notifier_register and exit_mmap
+ * runs with mm_users == 0. Other tasks may still invoke mmu notifiers
+ * in parallel despite there's no task using this mm anymore, through
+ * the vmas outside of the exit_mmap context, like with
+ * vmtruncate. This serializes against mmu_notifier_unregister with
+ * the mmu_notifier_mm->lock in addition to SRCU and it serializes
+ * against the other mmu notifiers with SRCU. struct mmu_notifier_mm
  * can't go away from under us as exit_mmap holds a mm_count pin
  * itself.
  */
@@ -44,7 +45,7 @@ void __mmu_notifier_release(struct mm_st
 		 * to wait ->release to finish and
 		 * mmu_notifier_unregister to return.
 		 */
-		hlist_del_init(&mn->hlist);
+		hlist_del_init_rcu(&mn->hlist);
 		/*
 		 * SRCU here will block mmu_notifier_unregister until
 		 * ->release returns.
@@ -185,6 +186,8 @@ int mmu_notifier_register(struct mmu_not
 	 * side note: mmu_notifier_release can't run concurrently with
 	 * us because we hold the mm_users pin (either implicitly as
 	 * current->mm or explicitly with get_task_mm() or similar).
+	 * We can't race against any other mmu notifiers either thanks
+	 * to mm_lock().
 	 */
 	spin_lock(&mm->mmu_notifier_mm->lock);
 	hlist_add_head(&mn->hlist, &mm->mmu_notifier_mm->list);
|
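The difference between hlist_del_init and the hlist_del_init_rcu variant added in the patch above can be illustrated with a minimal userspace sketch. It re-implements the hlist helpers outside the kernel, runs single-threaded, and omits the memory barriers and SRCU grace periods a real reader would rely on; count_from() is a hypothetical helper that stands in for a reader that was already positioned on a node when it got unlinked.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copies of the kernel hlist types, for illustration only. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

static inline int hlist_unhashed(const struct hlist_node *n)
{
    return !n->pprev;
}

static inline void __hlist_del(struct hlist_node *n)
{
    struct hlist_node *next = n->next;
    struct hlist_node **pprev = n->pprev;

    *pprev = next;
    if (next)
        next->pprev = pprev;
}

/* Non-RCU variant: clears both pointers, so a reader paused on n loses its way. */
static inline void hlist_del_init(struct hlist_node *n)
{
    if (!hlist_unhashed(n)) {
        __hlist_del(n);
        n->next = NULL;
        n->pprev = NULL;
    }
}

/* RCU variant from the patch: n->next is left intact for concurrent readers. */
static inline void hlist_del_init_rcu(struct hlist_node *n)
{
    if (!hlist_unhashed(n)) {
        __hlist_del(n);
        n->pprev = NULL;
    }
}

/* Hypothetical reader: counts the nodes still reachable starting at pos. */
static inline int count_from(const struct hlist_node *pos)
{
    int n = 0;

    for (; pos; pos = pos->next)
        n++;
    return n;
}
```

With a list a -> b -> c, a reader standing on b still reaches c after hlist_del_init_rcu(&b), whereas after hlist_del_init(&b) its walk ends at b.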
From: Robin H. <ho...@sg...> - 2008-04-26 13:17:33
|
On Thu, Apr 24, 2008 at 07:41:45PM +0200, Andrea Arcangeli wrote:
> A full new update will soon become visible here:
>
> http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14-pre3/

I grabbed these and built them. Only change needed was another include. After that, everything built fine and xpmem regression tests ran through the first four sets. The fifth is the oversubscription test which trips my xpmem bug. This is as good as the v12 runs from before.

Since this include and the one for mm_types.h both are build breakages for ia64, I think you need to apply your ia64_cpumask and the following (possibly as a single patch) first or in your patch 1. Without that, ia64 doing a git-bisect could hit a build failure.

Index: mmu_v14_pre3_xpmem_v003_v1/include/linux/srcu.h
===================================================================
--- mmu_v14_pre3_xpmem_v003_v1.orig/include/linux/srcu.h	2008-04-26 06:41:54.000000000 -0500
+++ mmu_v14_pre3_xpmem_v003_v1/include/linux/srcu.h	2008-04-26 07:01:17.292071827 -0500
@@ -27,6 +27,8 @@
 #ifndef _LINUX_SRCU_H
 #define _LINUX_SRCU_H
 
+#include <linux/mutex.h>
+
 struct srcu_struct_array {
 	int c[2];
 };
|
From: LINkeR <li...@in...> - 2008-04-26 13:06:14
|
Hello,

Is there a way to assign a public IP address directly to a guest system? In my test setup the host system currently holds all the public addresses and I set up NAT to the guests. Some applications, like VoIP, are difficult to run behind NAT, so running this kind of application in guest systems is a problem.

If you have a public IP subnet, for example a /29, it is not a problem to assign one IP to the host and set up a route to the guest. But if you only have a few servers in server housing, nobody gives you a /29. Usually you have only a few IP addresses, each from a different subnet.

I don't know if this is a question for KVM or qemu.

Thanks for any hints
Tomas Rusnak

-- 
ICQ: 147137905 Skype: linker83
|
From: Michal L. <ml...@lo...> - 2008-04-26 12:12:13
|
Hi,

I've experienced a kernel Oops on 2.6.24 with kvm 66 on AMD in 64bit mode while starting up WinXP:

kvm: emulating exchange as write
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff883a7a5a>] :kvm:x86_emulate_insn+0x3fa/0x4240
PGD 7658d067 PUD 242a6067 PMD 0
Oops: 0002 [1] SMP
CPU 0
Modules linked in: bridge llc reiserfs tun kvm_amd kvm nfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs w83627ehf hwmon_vid autofs4 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device iptable_filter ip_tables ip6table_filter ip6_tables x_tables af_packet ipv6 fuse ext2 loop snd_hda_intel snd_pcm snd_timer k8temp i2c_nforce2 i2c_core hwmon sr_mod button cdrom snd soundcore snd_page_alloc forcedeth sg floppy linear sd_mod ehci_hcd ohci_hcd usbcore dm_snapshot edd dm_mod fan sata_nv pata_amd libata scsi_mod thermal processor
Pid: 3139, comm: qemu-system-x86 Not tainted 2.6.24-mludvig #1
RIP: 0010:[<ffffffff883a7a5a>]  [<ffffffff883a7a5a>] :kvm:x86_emulate_insn+0x3fa/0x4240
RSP: 0018:ffff8100609fdc18  EFLAGS: 00010246
RAX: 000000008001003b RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8100609fe000
RBP: ffff8100609ff320 R08: ffff8100609ff3c0 R09: 0000000000000006
R10: 0000000000000002 R11: 0000000000000000 R12: ffff8100609ff368
R13: ffff8100609ff3c0 R14: ffffffff883be600 R15: 0000000001971353
FS:  0000000040804950(0063) GS:ffffffff8053e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000006bb74000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 3139, threadinfo ffff8100609fc000, task ffff8100794d5680)
Stack:  00000000609fdc74 0000000012187318 000000000ea5f068 ffff8100609ff3c0
 ffff8100609fdc94 ffffffff8839c9e0 0000000000000000 ffff8100609fe000
 ffff8100609ff320 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8839c9e0>] :kvm:kvm_get_cs_db_l_bits+0x20/0x40
 [<ffffffff8839dd2f>] :kvm:emulate_instruction+0x1bf/0x340
 [<ffffffff883c1a22>] :kvm_amd:emulate_on_interception+0x12/0x60
 [<ffffffff883a11d9>] :kvm:kvm_arch_vcpu_ioctl_run+0x169/0x6d0
 [<ffffffff8839c14c>] :kvm:kvm_vcpu_ioctl+0x41c/0x440
 [<ffffffff802305f3>] __wake_up+0x43/0x70
 [<ffffffff803374c1>] __up_read+0x21/0xb0
 [<ffffffff802586ec>] futex_wake+0xcc/0xf0
 [<ffffffff80259559>] do_futex+0x129/0xbb0
 [<ffffffff8022e7bd>] __dequeue_entity+0x3d/0x50
 [<ffffffff8839b925>] :kvm:kvm_vm_ioctl+0x85/0x200
 [<ffffffff802ab10f>] do_ioctl+0x2f/0xa0
 [<ffffffff802ab3a0>] vfs_ioctl+0x220/0x2d0
 [<ffffffff802ab4e1>] sys_ioctl+0x91/0xb0
 [<ffffffff8020bcae>] system_call+0x7e/0x83

Code: 66 89 02 e9 ee fc ff ff 48 8b 95 88 00 00 00 48 8b 45 78 88
RIP  [<ffffffff883a7a5a>] :kvm:x86_emulate_insn+0x3fa/0x4240
 RSP <ffff8100609fdc18>
CR2: 0000000000000000
---[ end trace d358bab3f035112e ]---

The host is still alive but the XP guest is locked up in a boot screen.

Michal
|
From: Mohammed G. <m.g...@gm...> - 2008-04-26 09:28:56
|
On Sat, Apr 26, 2008 at 9:49 AM, Avi Kivity <av...@qu...> wrote:
> Anthony Liguori wrote:
>
> > The second stage is to use a loop of x86_emulate() to run all 16-bit code
> > (instead of using vm86 mode). This will allow us to support guests that use
> > big real mode.
> >
>
> Why do that unconditionally, instead of only when in a big-real-mode state?

Is big-real-mode the only state where we have problems?
|
From: Avi K. <av...@qu...> - 2008-04-26 08:09:20
|
Marcelo Tosatti wrote:
> On Wed, Apr 23, 2008 at 09:30:06AM +0300, Avi Kivity wrote:
>
>>> as I got no reply, I guess it is a bad setup on my part. If that might
>>> help, this happened while I was doing a "make -j" on webkit svn tree
>>> (ie. heavy c++ compilation workload) .
>>>
>> No this is not bad setup. No amount of bad setup should give this warning.
>>
>> You didn't get a reply because no one knows what to make of it, and
>> because it's much more fun to debate endianness or contemplate guests
>> with eighty thousand disks than to fix those impossible bugs. If you
>> can give clear instructions on how to reproduce this, we will try it
>> out. Please be sure to state OS name and versions for the guest as well
>> as the host.
>>
>
> It is valid to have more than PAGES_PER_HPAGE in the largepage's
> shadowed count. If the gpte read races with a pte-update-from-guest (and
> the pte update results in a different sp->role), it might account twice
> for a single gfn.
>
> Such "zombie" shadow pages should eventually be removed through
> recycling, allowing for instantiation of a large page, unless references
> can be leaked. Can't spot such leakage problem though.
>

That strikes me as unlikely (though a valid scenario). An alternative explanation is that we're seeing a nonpae guest, so each page can be shadowed in two different roles (two quadrants for a pte page) or even four (for a pgd page).

Thomas, are you running a 32-bit nonpae guest?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
|
From: Avi K. <av...@qu...> - 2008-04-26 07:17:01
|
Ryan Harper wrote:
> There is a race between when the vcpu thread issues a create ioctl and when
> apic_reset() gets called resulting in getting a badfd error.
>

The problem is indeed there, but the fix is wrong:

> main thread vcpu thread
> diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
> index 78127de..3513e8c 100644
> --- a/qemu/qemu-kvm.c
> +++ b/qemu/qemu-kvm.c
> @@ -31,7 +31,9 @@ extern int smp_cpus;
>  static int qemu_kvm_reset_requested;
>
>  pthread_mutex_t qemu_mutex = PTHREAD_MUTEX_INITIALIZER;
> +pthread_mutex_t vcpu_mutex = PTHREAD_MUTEX_INITIALIZER;
>  pthread_cond_t qemu_aio_cond = PTHREAD_COND_INITIALIZER;
> +pthread_cond_t qemu_vcpuup_cond = PTHREAD_COND_INITIALIZER;
>  __thread struct vcpu_info *vcpu;
>
>  struct qemu_kvm_signal_table {
> @@ -369,6 +371,11 @@ static void *ap_main_loop(void *_env)
>      sigfillset(&signals);
>      sigprocmask(SIG_BLOCK, &signals, NULL);
>      kvm_create_vcpu(kvm_context, env->cpu_index);
> +    /* block until cond_wait occurs */
> +    pthread_mutex_lock(&vcpu_mutex);
> +    /* now we can signal */
> +    pthread_cond_signal(&qemu_vcpuup_cond);
> +    pthread_mutex_unlock(&vcpu_mutex);
>      kvm_qemu_init_env(env);
>      kvm_main_loop_cpu(env);
>      return NULL;
> @@ -388,9 +395,10 @@ static void kvm_add_signal(struct qemu_kvm_signal_table *sigtab, int signum)
>
>  void kvm_init_new_ap(int cpu, CPUState *env)
>  {
> +    pthread_mutex_lock(&vcpu_mutex);
>      pthread_create(&vcpu_info[cpu].thread, NULL, ap_main_loop, env);
> -    /* FIXME: wait for thread to spin up */
> -    usleep(200);
> +    pthread_cond_wait(&qemu_vcpuup_cond, &vcpu_mutex);
>

pthread_cond_wait() is never correct outside a loop. The signal may arrive before wait is called. The usual idiom is

    while (condition is not fulfilled)
        pthread_cond_wait();

I see you have something there to ensure we block, but please use the right idiom.

> +    pthread_mutex_unlock(&vcpu_mutex);
>  }
>

Please reuse qemu_mutex for this, no need for a new one.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
|
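The "wait in a loop on a predicate" idiom from the review above can be sketched as follows. This is a minimal standalone illustration, not the qemu-kvm code: the names lock, vcpu_up_cond, vcpu_up, worker_main and start_worker_and_wait are all hypothetical. The shared flag is the condition; the condition variable only serves to wake the waiter so that it re-checks the flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names for illustration; none of these come from qemu-kvm. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t vcpu_up_cond = PTHREAD_COND_INITIALIZER;
static bool vcpu_up;                     /* the predicate, protected by lock */

static void *worker_main(void *arg)
{
    (void)arg;
    /* ... per-thread setup would happen here, before reporting readiness ... */
    pthread_mutex_lock(&lock);
    vcpu_up = true;                      /* record the state change first */
    pthread_cond_signal(&vcpu_up_cond);  /* then wake a waiter, if there is one */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Start the worker and return once it has reported that it is up (0 on success). */
int start_worker_and_wait(pthread_t *thread)
{
    int err;

    pthread_mutex_lock(&lock);
    vcpu_up = false;
    err = pthread_create(thread, NULL, worker_main, NULL);
    if (err == 0) {
        /* Re-check the predicate after every wakeup: spurious wakeups are
         * allowed, and a signal sent before we wait is not remembered. */
        while (!vcpu_up)
            pthread_cond_wait(&vcpu_up_cond, &lock);
    }
    pthread_mutex_unlock(&lock);
    return err;
}
```

Because the waiter tests vcpu_up before it ever calls pthread_cond_wait(), the result is the same whether the worker signals before or after the waiter blocks, so the code does not depend on lock ordering to avoid a lost wakeup.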
From: Avi K. <av...@qu...> - 2008-04-26 06:50:20
|
Anthony Liguori wrote:
> The second stage is to use a loop of x86_emulate() to run all 16-bit
> code (instead of using vm86 mode). This will allow us to support guests
> that use big real mode.
>

Why do that unconditionally, instead of only when in a big-real-mode state?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
|
From: Avi K. <av...@qu...> - 2008-04-26 06:47:36
|
David Abrahams wrote:
> If I suspend my host while running a Windows XP guest, the whole machine
> crashes, so I was hoping to automate hibernation of the guest OS and
> integrate that into my host's suspend process. Does anyone know how to
> do that?
>

It's doable (not sure how), but kvm ought not to crash when resuming.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
|
From: Avi K. <av...@qu...> - 2008-04-26 06:46:56
|
David Abrahams wrote:
> Cam Macdonell wrote:
>
>> David Abrahams wrote:
>>
>>> If I suspend my host while running a Windows XP guest, the whole machine
>>> crashes, so I was hoping to automate hibernation of the guest OS and
>>> integrate that into my host's suspend process. Does anyone know how to
>>> do that?
>>>
>> Hi Dave,
>>
>> What host OS are you running?
>>
>
> Ubuntu 8.04 Hardy Heron
>

It's certainly new enough to support suspend/resume with running VMs. What kernel does it have?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
|
From: Avi K. <av...@qu...> - 2008-04-26 06:45:37
|
David S. Ahern wrote:
> David S. Ahern wrote:
>
>> Avi Kivity wrote:
>>
>>> David S. Ahern wrote:
>>>
>>>> I added the traces and captured data over another apparent lockup of
>>>> the guest.
>>>> This seems to be representative of the sequence (pid/vcpu removed).
>>>>
>>>> (+  4776)  VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
>>>> (+     0)  PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>>>> (+  3632)  VMENTRY
>>>> (+  4552)  VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
>>>> (+     0)  PAGE_FAULT  [ errorcode = 0x0000000b, virt = 0x00000000 fffb61c8 ]
>>>> (+ 54928)  VMENTRY
>>>>
>>> Can you oprofile the host to see where the 54K cycles are spent?
>>>
>
> Most of the cycles (~80% of that 54k+) are spent in paging64_prefetch_page():
>
>     for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
>         gpa_t pte_gpa = gfn_to_gpa(sp->gfn);
>         pte_gpa += (i+offset) * sizeof(pt_element_t);
>
>         r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &pt,
>                                   sizeof(pt_element_t));
>         if (r || is_present_pte(pt))
>             sp->spt[i] = shadow_trap_nonpresent_pte;
>         else
>             sp->spt[i] = shadow_notrap_nonpresent_pte;
>     }
>
> This loop is run 512 times and takes a total of ~45k cycles, or ~88 cycles per
> loop.
>
> This function gets run >20,000/sec during some of the kscand loops.
>

We really ought to optimize it. That's second order however. The real fix is making sure it isn't called so often.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
|
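As a back-of-the-envelope check of the figures quoted above, the two helpers below (hypothetical names, plain arithmetic only) reproduce the per-iteration cost and show what the quoted call rate implies. The inputs are the approximate numbers from the message, so the results are estimates and not measurements.

```c
#include <assert.h>

/* Average cost of one loop iteration, in cycles. */
double cycles_per_iteration(double total_cycles, unsigned iterations)
{
    assert(iterations > 0);
    return total_cycles / iterations;
}

/* Cycles consumed per second if the whole loop runs calls_per_sec times. */
double cycles_per_second(double total_cycles, double calls_per_sec)
{
    return total_cycles * calls_per_sec;
}
```

cycles_per_iteration(45000.0, 512) comes out just under 88, matching the quoted figure, and cycles_per_second(45000.0, 20000.0) is 9e8, that is, roughly 0.9 billion cycles per second spent in this loop at the quoted call rate. That is why reducing how often the function is called matters more than shaving cycles off each iteration.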
From: Avi K. <av...@qu...> - 2008-04-26 06:43:42
|
David S. Ahern wrote:
> What is the rip (GUEST_RIP) value in the VMEXIT trace output? Is that the
> current instruction pointer for the guest?
>

Yes.

> I take it the virt in the PAGE_FAULT trace output is the virtual address the
> guest was referencing when the page fault occurred. What I don't understand (one
> of many things really) is what the 0xfffb63b0 corresponds to in the guest. Any
> ideas?
>

I'm pretty sure it is the kmap_atomic() pte. The guest wants to update a pte (call it pte1), which is in HIGHMEM, so it doesn't have a permanent mapping for it. It calls kmap_atomic() which sets up another pte (pte2, two writes), and then accesses pte1 through pte2.

> Also, the expensive page fault occurs on errorcode = 0x0000000b (PAGE_FAULT
> trace data). What does the 4th bit in 0xb mean? bit 0 set means
> PFERR_PRESENT_MASK is set, and bit 1 means PT_WRITABLE_MASK. What is bit 3?
>

Bit 3 is the reserved bit, which means the shadow pte has an illegal bit combination. kvm sets up vmx to forward non-present page faults (bit 0 clear) directly to the guest, so it needs some other pattern to get a trapping fault.

IOW, there are two types of non-present shadow ptes in kvm: trapping ones (where we don't know what the guest pte looks like) and nontrapping ones (where we know the guest pte is not present, so we forward the fault directly to the guest). The first type is encoded with the reserved bit and present bit set, the second with both of them clear.

You can disable this trickery using the bypass_guest_pf module parameter. It should be useful to try it, we'll see the forwarded faults as well.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
|
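The error codes in the trace can be decoded with a short sketch like the one below. The bit positions are the architectural x86 page-fault error code bits; the PF_* macro names, the enum and classify_fault() are illustrative and not taken from the kvm source. The classification follows the encoding described in the message above: a fault with the present bit clear is a genuinely non-present pte, and present plus reserved marks a trapping non-present shadow pte.

```c
#include <assert.h>
#include <stdint.h>

/* x86 page-fault error code bits (architectural); macro names are illustrative. */
#define PF_PRESENT  (1u << 0)   /* fault on a present page */
#define PF_WRITE    (1u << 1)   /* access was a write */
#define PF_USER     (1u << 2)   /* access came from user mode */
#define PF_RSVD     (1u << 3)   /* a reserved bit was set in a paging entry */

/* Hypothetical classification of a fault taken on a shadow page table. */
enum shadow_fault {
    FAULT_NOT_PRESENT,          /* present bit clear: forwarded to the guest */
    FAULT_TRAPPING_NONPRESENT,  /* present + reserved: trapping shadow pte */
    FAULT_PROTECTION            /* present, no reserved bit: e.g. write-protect */
};

enum shadow_fault classify_fault(uint32_t error_code)
{
    if (!(error_code & PF_PRESENT))
        return FAULT_NOT_PRESENT;
    if (error_code & PF_RSVD)
        return FAULT_TRAPPING_NONPRESENT;
    return FAULT_PROTECTION;
}
```

Under this reading, the expensive 0x0000000b faults in the trace (present, write, reserved) are hits on trapping non-present shadow ptes, while the 0x00000003 faults (present, write) are ordinary protection faults on a present mapping.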
From: Avi K. <av...@qu...> - 2008-04-26 06:21:14
|
David S. Ahern wrote:
> Avi Kivity wrote:
>
>> Ah! The flood detector is not seeing the access through the
>> kmap_atomic() pte, because that access has gone through the emulator.
>> last_updated_pte_accessed(vcpu) will never return true.
>>
>> Can you verify that last_updated_pte_accessed(vcpu) indeed always
>> returns false?
>>
>
> It returns both true and false. I added a tracer to kvm_mmu_pte_write() to dump
> the rc of last_updated_pte_accessed(vcpu). ie.,
>     pte_access = last_updated_pte_accessed(vcpu);
>     KVMTRACE_1D(PTE_ACCESS, vcpu, (u32) pte_access, handler);
>
> A sample:
>
> (+  4488)  VMEXIT       [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
> (+     0)  PAGE_FAULT   [ errorcode = 0x0000000b, virt = 0x00000000 fffb63b0 ]
> (+  2480)  PAGE_FAULT1  [ write_count = 0 ]
> (+   424)  PAGE_FAULT2  [ level = 2 metaphysical = 0 access 0x00000007 ]
> (+ 51672)  PAGE_FAULT3
> (+   472)  PAGE_FAULT4
> (+   704)  PAGE_FAULT5  [ shadow_ent = 0x80000001 2dfb5043 ]
> (+  1496)  VMENTRY
> (+  4568)  VMEXIT       [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
> (+     0)  PAGE_FAULT   [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
> (+  2352)  PAGE_FAULT1  [ write_count = 0 ]
> (+   728)  PAGE_FAULT5  [ shadow_ent = 0x00000001 91409041 ]
> (+     0)  PTE_WRITE    [ gpa = 0x00000000 00009db4 gpte = 0x00000000 41fb5363 ]
> (+     0)  PTE_ACCESS   [ pte_access = 1 ]
> (+  6864)  VMENTRY
> (+  3896)  VMEXIT       [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
> (+     0)  PAGE_FAULT   [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
> (+  2376)  PAGE_FAULT1  [ write_count = 1 ]
> (+   720)  PAGE_FAULT5  [ shadow_ent = 0x00000001 91409041 ]
> (+     0)  PTE_WRITE    [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ]
> (+     0)  PTE_ACCESS   [ pte_access = 0 ]
> (+ 12344)  VMENTRY
> (+  4688)  VMEXIT       [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
> (+     0)  PAGE_FAULT   [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
> (+  2416)  PAGE_FAULT1  [ write_count = 2 ]
> (+   792)  PAGE_FAULT5  [ shadow_ent = 0x00000001 91409043 ]
> (+  1128)  VMENTRY
> (+  4512)  VMEXIT       [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
> (+     0)  PAGE_FAULT   [ errorcode = 0x0000000b, virt = 0x00000000 fffb63b0 ]
> (+  2448)  PAGE_FAULT1  [ write_count = 0 ]
> (+   448)  PAGE_FAULT2  [ level = 2 metaphysical = 0 access 0x00000007 ]
> (+ 51520)  PAGE_FAULT3
> (+   432)  PAGE_FAULT4
> (+   696)  PAGE_FAULT5  [ shadow_ent = 0x80000001 2df5a043 ]
> (+  1480)  VMENTRY
>

Strange... there should be at least two pte_access = 0 traces in there before flooding can occur, according to my reading of the code. The counter needs to go up to 3 somehow.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
|
From: Ryan H. <ry...@us...> - 2008-04-26 05:33:23
|
* Ryan Harper <ry...@us...> [2008-04-26 00:27]:
> There is a race between when the vcpu thread issues a create ioctl and when
> apic_reset() gets called resulting in getting a badfd error.
>
> main thread vcpu thread

guilt refresh clipped my text short.

main thread                          vcpu thread
-----------                          -----------
qemu/hw/pc.c:pc_new_cpu()
cpu_init()
cpu_x86_init()
kvm_init_new_ap()                    ap_main_loop()
                                     *blocks*
usleep()
apic_init()
kvm_set_lapic()
kvm_ioctl with uninitialized context
badfd

To fix this, ensure we create the vcpu in the vcpu thread before returning from kvm_init_new_ap. Synchronize on a new mutex, vcpu_mutex, and wait for the vcpuup condition before signaling to ensure the main thread is waiting before we send the signal.

With this patch, I can launch 64 kvm guests, 1 second apart and not see any Bad File descriptor errors.

Signed-off-by: Ryan Harper <ry...@us...>

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ry...@us...
|
From: Ryan H. <ry...@us...> - 2008-04-26 05:27:19
|
There is a race between when the vcpu thread issues a create ioctl and when apic_reset() gets called resulting in getting a badfd error.

main thread vcpu thread

diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index 78127de..3513e8c 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -31,7 +31,9 @@ extern int smp_cpus;
 static int qemu_kvm_reset_requested;
 
 pthread_mutex_t qemu_mutex = PTHREAD_MUTEX_INITIALIZER;
+pthread_mutex_t vcpu_mutex = PTHREAD_MUTEX_INITIALIZER;
 pthread_cond_t qemu_aio_cond = PTHREAD_COND_INITIALIZER;
+pthread_cond_t qemu_vcpuup_cond = PTHREAD_COND_INITIALIZER;
 __thread struct vcpu_info *vcpu;
 
 struct qemu_kvm_signal_table {
@@ -369,6 +371,11 @@ static void *ap_main_loop(void *_env)
     sigfillset(&signals);
     sigprocmask(SIG_BLOCK, &signals, NULL);
     kvm_create_vcpu(kvm_context, env->cpu_index);
+    /* block until cond_wait occurs */
+    pthread_mutex_lock(&vcpu_mutex);
+    /* now we can signal */
+    pthread_cond_signal(&qemu_vcpuup_cond);
+    pthread_mutex_unlock(&vcpu_mutex);
     kvm_qemu_init_env(env);
     kvm_main_loop_cpu(env);
     return NULL;
@@ -388,9 +395,10 @@ static void kvm_add_signal(struct qemu_kvm_signal_table *sigtab, int signum)
 
 void kvm_init_new_ap(int cpu, CPUState *env)
 {
+    pthread_mutex_lock(&vcpu_mutex);
     pthread_create(&vcpu_info[cpu].thread, NULL, ap_main_loop, env);
-    /* FIXME: wait for thread to spin up */
-    usleep(200);
+    pthread_cond_wait(&qemu_vcpuup_cond, &vcpu_mutex);
+    pthread_mutex_unlock(&vcpu_mutex);
 }
 
 static void qemu_kvm_init_signal_tables(void)
|
From: Andrea A. <an...@qu...> - 2008-04-26 00:57:23
|
On Fri, Apr 25, 2008 at 02:25:32PM -0500, Robin Holt wrote:
> I think you still need mm_lock (unless I miss something). What happens
> when one callout is scanning mmu_notifier_invalidate_range_start() and
> you unlink. That list next pointer with LIST_POISON1 which is a really
> bad address for the processor to track.

Ok, _release list_del_init can't race with that because it happens in exit_mmap when no other mmu notifier can trigger anymore.

_unregister can run concurrently but it does list_del_rcu, that only overwrites the pprev pointer with LIST_POISON2. The mmu_notifier_invalidate_range_start won't crash on LIST_POISON1 thanks to srcu.

Actually I did more changes than necessary, for example I noticed the mmu_notifier_register can return a list_add_head instead of list_add_head_rcu.

_register can't race against _release thanks to the mm_users temporary or implicit pin. _register can't race against _unregister thanks to the mmu_notifier_mm->lock. And register can't race against all other mmu notifiers thanks to the mm_lock.

At this time I've no other pending patches on top of v14-pre3 other than the below micro-optimizing cleanup. It'd be great to have confirmation that v14-pre3 passes GRU/XPMEM regressions tests as well as my KVM testing already passed successfully on it. I'll forward v14-pre3 mmu-notifier-core plus the below to Andrew tomorrow, I'm trying to be optimistic here! ;)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -187,7 +187,7 @@ int mmu_notifier_register(struct mmu_not
 	 * current->mm or explicitly with get_task_mm() or similar).
 	 */
 	spin_lock(&mm->mmu_notifier_mm->lock);
-	hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_mm->list);
+	hlist_add_head(&mn->hlist, &mm->mmu_notifier_mm->list);
 	spin_unlock(&mm->mmu_notifier_mm->lock);
 out_unlock:
 	mm_unlock(mm, &data);
|
From: Marcelo T. <mto...@re...> - 2008-04-26 00:50:40
|
Valgrind caught this:

==11754== Conditional jump or move depends on uninitialised value(s)
==11754==    at 0x50C9BC: kvm_create_pit (libkvm-x86.c:153)
==11754==    by 0x50CA7F: kvm_arch_create (libkvm-x86.c:178)
==11754==    by 0x50AB31: kvm_create (libkvm.c:383)
==11754==    by 0x4EE691: kvm_qemu_create_context (qemu-kvm.c:616)
==11754==    by 0x412031: main (vl.c:9653)

Signed-off-by: Marcelo Tosatti <mto...@re...>

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 329f29f..adf09a5 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -249,6 +249,7 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
 	kvm->opaque = opaque;
 	kvm->dirty_pages_log_all = 0;
 	kvm->no_irqchip_creation = 0;
+	kvm->no_pit_creation = 0;
 
 	return kvm;
 out_close:
|
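The patch above fixes the warning by assigning the missing field explicitly. A different defensive pattern, shown here only as a sketch and not as what libkvm does, is to zero the whole allocation so that any flag added to the structure later starts out as 0 without a matching assignment. struct demo_context and demo_context_new() are hypothetical stand-ins; only the three flag names mirror the fields visible in the diff.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a context structure carrying option flags. */
struct demo_context {
    void *opaque;
    int dirty_pages_log_all;
    int no_irqchip_creation;
    int no_pit_creation;
};

/* calloc() zero-fills the allocation, so every integer flag reads as 0
 * even if the constructor never mentions it. */
struct demo_context *demo_context_new(void *opaque)
{
    struct demo_context *ctx = calloc(1, sizeof(*ctx));

    if (!ctx)
        return NULL;
    ctx->opaque = opaque;
    return ctx;
}
```

The trade-off is that zero-filling hides a forgotten initialization from tools like Valgrind, so it suits fields whose intended default really is 0.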
From: David A. <da...@bo...> - 2008-04-25 23:31:36
|
Cam Macdonell wrote:
> David Abrahams wrote:
>> If I suspend my host while running a Windows XP guest, the whole machine
>> crashes, so I was hoping to automate hibernation of the guest OS and
>> integrate that into my host's suspend process. Does anyone know how to
>> do that?
>>
>
> Hi Dave,
>
> What host OS are you running?

Ubuntu 8.04 Hardy Heron

-- 
Dave Abrahams
Boost Consulting
http://boost-consulting.com
|
From: Cam M. <ca...@cs...> - 2008-04-25 21:57:58
|
David Abrahams wrote:
> If I suspend my host while running a Windows XP guest, the whole machine
> crashes, so I was hoping to automate hibernation of the guest OS and
> integrate that into my host's suspend process. Does anyone know how to
> do that?
>

Hi Dave,

What host OS are you running?

Cam
|
From: Alexander G. <ag...@su...> - 2008-04-25 21:19:06
|
On Apr 25, 2008, at 10:39 PM, Marcelo Tosatti wrote: > On Fri, Apr 25, 2008 at 11:38:21AM +0200, Alexander Graf wrote: >> >> On Apr 25, 2008, at 3:01 AM, Marcelo Tosatti wrote: >> >>> >>> Add three PCI bridges to support 128 slots. Vendor and device_id >>> have >>> been stolen from my test box. >>> >>> I/O port addresses behind each bridge are statically allocated >>> starting >>> from 0x2000 with 0x1000 length. Once the bridge runs out of I/O >>> space >>> the guest (Linux at least) happily allocates outside of the region. >>> That >>> needs verification. >>> >>> I/O memory addresses are divided between 0xf0000000 -> APIC base. >>> >>> The PCI irq mapping function is also changed, there was the >>> assumption >>> that devices behind the bridge use the IRQ allocated to the bridge >>> device itself, which is weird. Apparently this is how the SPARC ABP >>> PCI >>> host works (only user of the bridge code at the moment). >> >> Is there any reason we're not using the _PIC function and give the OS >> a clue on which APIC pin the device is? Right now everything boils >> down to LNKA - LNKD which it does not have to. >> It might even be a good idea to connect each PCI device to a specific >> APIC pin, so we don't need to share too much, which might become a >> problem with a lot of PCI devices. As far as I know there is no >> limitation on how many pins an APIC may have. > > I was not aware of the _PIC function. Will take a look at it. > > The number of IRQ's certainly needs to increase. The Operating System can call _PIC to set IRQ routing on PIC mode (0) or APIC mode (1). Most DSDTs check for this and enable routing via "native" pins instead of going through the LNKx pseudo-devices. Unfortunately qemu was built around an ISA PC, so nearly everything is based on the assumption that we have two i8259 which can only handle 16 interrupts. This assumption even shows in parts where you wouldn't expect it (like piix_pci.c:228). At least I didn't ;-). 
I started messing with this today, to get interrupts working properly
in Darwin. So maybe we can join efforts here to get the IRQ routing to
work as it should.

In a perfect world (at least in mine) every PCI device would have a
separate interrupt lane which is handled by an IRQ > 16. It might be
worth considering MSI emulation for this too.

Alex
|
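To make the difference between the two routing schemes concrete, here is a minimal sketch in plain C. It is not qemu code: the function names, the modulo-4 link assignment and the choice of 16 as the first dedicated pin are all assumptions made for illustration. Only the two _PIC mode values (0 and 1) and the LNKA - LNKD links come from the discussion above.

```c
#include <assert.h>

/* Mode values the OS passes to _PIC, as described in the mail above. */
enum pic_mode { MODE_PIC = 0, MODE_APIC = 1 };

#define NUM_LINKS     4   /* LNKA..LNKD */
#define FIRST_PCI_GSI 16  /* assumption: first pin above the 16 ISA IRQs */

/* Hypothetical: shared link (0 = LNKA .. 3 = LNKD) used by a device pin. */
static int legacy_link(int slot, int pin)
{
    assert(slot >= 0 && pin >= 0 && pin < 4);
    return (slot + pin) % NUM_LINKS;
}

/* Hypothetical: one dedicated pin per (slot, INTx) pair in APIC mode. */
static int apic_gsi(int slot, int pin)
{
    assert(slot >= 0 && pin >= 0 && pin < 4);
    return FIRST_PCI_GSI + slot * 4 + pin;
}

/* Returns a link index in PIC mode and an interrupt number in APIC mode. */
static int route_irq(enum pic_mode mode, int slot, int pin)
{
    return mode == MODE_APIC ? apic_gsi(slot, pin) : legacy_link(slot, pin);
}
```

With only four links, slots 1 and 5 end up on the same link in PIC mode, while in APIC mode each gets its own number. That sharing is what becomes a problem once 128 slots are populated.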
From: Marcelo T. <mto...@re...> - 2008-04-25 20:36:31
|
On Fri, Apr 25, 2008 at 11:38:21AM +0200, Alexander Graf wrote:
>
> On Apr 25, 2008, at 3:01 AM, Marcelo Tosatti wrote:
>
> >
> >Add three PCI bridges to support 128 slots. Vendor and device_id have
> >been stolen from my test box.
> >
> >I/O port addresses behind each bridge are statically allocated starting
> >from 0x2000 with 0x1000 length. Once the bridge runs out of I/O space
> >the guest (Linux at least) happily allocates outside of the region. That
> >needs verification.
> >
> >I/O memory addresses are divided between 0xf0000000 -> APIC base.
> >
> >The PCI irq mapping function is also changed, there was the assumption
> >that devices behind the bridge use the IRQ allocated to the bridge
> >device itself, which is weird. Apparently this is how the SPARC ABP PCI
> >host works (only user of the bridge code at the moment).
>
> Is there any reason we're not using the _PIC function and give the OS
> a clue on which APIC pin the device is? Right now everything boils
> down to LNKA - LNKD which it does not have to.
> It might even be a good idea to connect each PCI device to a specific
> APIC pin, so we don't need to share too much, which might become a
> problem with a lot of PCI devices. As far as I know there is no
> limitation on how many pins an APIC may have.

I was not aware of the _PIC function. Will take a look at it.

The number of IRQ's certainly needs to increase.
|
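The static I/O port layout described in the quoted patch summary can be sketched as follows. This is an illustration only, not code from the patch. It assumes each of the three bridges gets one consecutive 0x1000-byte window starting at 0x2000, and the struct and function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BRIDGES   3        /* from the patch summary */
#define IO_BASE       0x2000u  /* from the patch summary */
#define IO_WINDOW_LEN 0x1000u  /* from the patch summary */

struct io_window {
    uint32_t base;   /* first port forwarded by the bridge */
    uint32_t limit;  /* last port forwarded by the bridge (inclusive) */
};

/* Assumed layout: bridge N owns [IO_BASE + N * LEN, IO_BASE + (N+1) * LEN - 1]. */
static struct io_window bridge_io_window(unsigned bridge)
{
    struct io_window w;

    assert(bridge < NUM_BRIDGES);
    w.base = IO_BASE + bridge * IO_WINDOW_LEN;
    w.limit = w.base + IO_WINDOW_LEN - 1;
    return w;
}

/* Returns 1 if the port falls inside the bridge's window, 0 otherwise. */
static int port_in_window(unsigned bridge, uint32_t port)
{
    struct io_window w = bridge_io_window(bridge);

    return port >= w.base && port <= w.limit;
}
```

A check like port_in_window() is one way to spot the case mentioned above, where the guest runs out of space behind a bridge and allocates ports outside its window.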
From: Jerone Y. <jy...@us...> - 2008-04-25 20:01:47
|
2 files changed, 68 insertions(+), 1 deletion(-)
arch/powerpc/platforms/44x/Makefile |    2 +-
arch/powerpc/platforms/44x/idle.c   |   67 +++++++++++++++++++++++++++++++++++

This patch has been accepted upstream and will be in 2.6.26. So it will
eventually need to be removed when we move to 2.6.26rc.

This patch lets the CPU enter the wait state while in the cpu_idle loop.
This helps virtualization solutions know when the guest Linux kernel is
idle. There are two idle modes.

Command line options:
	idle=spin <-- CPU will spin

By default the CPU will go into wait mode.

Signed-off-by: Jerone Young <jy...@us...>

diff --git a/arch/powerpc/platforms/44x/Makefile b/arch/powerpc/platforms/44x/Makefile
--- a/arch/powerpc/platforms/44x/Makefile
+++ b/arch/powerpc/platforms/44x/Makefile
@@ -1,4 +1,4 @@ obj-$(CONFIG_44x) := misc_44x.o
-obj-$(CONFIG_44x)	:= misc_44x.o
+obj-$(CONFIG_44x)	:= misc_44x.o idle.o
 obj-$(CONFIG_EBONY)	+= ebony.o
 obj-$(CONFIG_TAISHAN)	+= taishan.o
 obj-$(CONFIG_BAMBOO)	+= bamboo.o
diff --git a/arch/powerpc/platforms/44x/idle.c b/arch/powerpc/platforms/44x/idle.c
new file mode 100644
--- /dev/null
+++ b/arch/powerpc/platforms/44x/idle.c
@@ -0,0 +1,67 @@
+/*
+ * Copyright 2008 IBM Corp.
+ *
+ * Based on arch/powerpc/platforms/pasemi/idle.c:
+ * Copyright (C) 2006-2007 PA Semi, Inc
+ *
+ * Added by: Jerone Young <jy...@us...>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+
+#include <linux/of.h>
+#include <linux/kernel.h>
+#include <asm/machdep.h>
+
+static int mode_spin;
+
+static void ppc44x_idle(void)
+{
+	unsigned long msr_save;
+
+	msr_save = mfmsr();
+	/* set wait state MSR */
+	mtmsr(msr_save|MSR_WE|MSR_EE|MSR_CE|MSR_DE);
+	isync();
+	/* return to initial state */
+	mtmsr(msr_save);
+	isync();
+}
+
+int __init ppc44x_idle_init(void)
+{
+	if (!mode_spin) {
+		/* If we are not setting spin mode
+		   then we set to wait mode */
+		ppc_md.power_save = &ppc44x_idle;
+	}
+
+	return 0;
+}
+
+arch_initcall(ppc44x_idle_init);
+
+static int __init idle_param(char *p)
+{
+
+	if (!strcmp("spin", p)) {
+		mode_spin = 1;
+		ppc_md.power_save = NULL;
+	}
+
+	return 0;
+}
+
+early_param("idle", idle_param);
|
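The way the idle= option interacts with the init call can be modelled in ordinary user-space C. The sketch below is not kernel code: wait_idle(), power_save, idle_init() and idle_once() are stand-ins invented here for ppc44x_idle(), ppc_md.power_save, ppc44x_idle_init() and one pass of the generic idle loop. It assumes, as in the kernel, that the early parameter handler runs before the init call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*idle_fn)(void);

static int wait_calls;      /* how often the stand-in wait state was entered */
static int mode_spin;       /* set when "idle=spin" was given */
static idle_fn power_save;  /* stand-in for ppc_md.power_save */

/* Stand-in for ppc44x_idle(): the real one sets MSR_WE and friends. */
static void wait_idle(void)
{
    wait_calls++;
}

/* Mirrors idle_param(): "spin" disables the wait-state hook. */
static int idle_param(const char *p)
{
    assert(p != NULL);
    if (!strcmp("spin", p)) {
        mode_spin = 1;
        power_save = NULL;
    }
    return 0;
}

/* Mirrors ppc44x_idle_init(): install the hook unless spin was requested. */
static int idle_init(void)
{
    if (!mode_spin)
        power_save = wait_idle;
    return 0;
}

/* One pass of an idle loop: returns 1 if the hook ran, 0 if we just spin. */
static int idle_once(void)
{
    if (power_save) {
        power_save();
        return 1;
    }
    return 0;
}
```

Without the option, idle_init() installs the hook and every idle pass enters the wait state, which is what a hypervisor can observe. With idle_param("spin") called first, the hook stays NULL and the loop simply spins.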