From: David S. A. <da...@ci...> - 2008-04-16 00:15:35
I have been looking at RHEL3 based guests lately, and to say the least the
performance is horrible. Rather than write a long tome on what I've done and
observed, I'd like to find out if anyone has some insights or known problem
areas running 2.4 guests. The short of it is that % system time spikes from
time to time (e.g., on exec of a new process such as running /bin/true).

I do not see the problem running RHEL3 on ESX, and an equivalent VM running
RHEL4 runs fine. That suggests that the 2.4 kernel is doing something in a way
that is not handled efficiently by kvm.

Can someone shed some light on it?

thanks,
david
From: Avi K. <av...@qu...> - 2008-04-16 08:46:40
David S. Ahern wrote:
> I have been looking at RHEL3 based guests lately, and to say the least the
> performance is horrible. Rather than write a long tome on what I've done and
> observed, I'd like to find out if anyone has some insights or known problem
> areas running 2.4 guests. The short of it is that % system time spikes from
> time to time (e.g., on exec of a new process such as running /bin/true).
>
> I do not see the problem running RHEL3 on ESX, and an equivalent VM running
> RHEL4 runs fine. That suggests that the 2.4 kernel is doing something in a
> way that is not handled efficiently by kvm.
>
> Can someone shed some light on it?

It's not something that I test regularly. If you're running a 32-bit kernel,
I'd suspect kmap(), or perhaps false positives from the fork detector.

kvmtrace will probably give enough info to tell exactly what's going on;
'kvm_stat -1' while the badness is happening may also help.

--
error compiling committee.c: too many arguments to function
From: David S. A. <da...@ci...> - 2008-04-17 21:12:56
kvm_stat -1 is practically impossible to time correctly to get a good snippet.

kvmtrace is a fascinating tool. I captured trace data that encompassed one
intense period where the VM appeared to freeze (no terminal response for a
few seconds).

After converting to text I examined an arbitrary section in time (how do you
correlate tsc to unix epoch?), and it shows vcpu0 hammered with interrupts and
vcpu1 hammered with page faults. (I put the representative data below; I can
send the binary or text files if you really want to see them.) All told, over
about a 10-12 second time period the trace text files contain 8426221 lines
and 2051344 of them are PAGE_FAULTs (that's 24% of the text lines, which seems
really high).

david

---------------------------------
vcpu0 data:

0             (+     0) INTR     vcpu = 0x00000001 pid = 0x000011ea [ vector = 0x00 ]
9968400020536 (+  1712) VMENTRY  vcpu = 0x00000001 pid = 0x000011ea
9968400096784 (+ 76248) VMEXIT   vcpu = 0x00000001 pid = 0x000011ea [ exitcode = 0x00000001, rip = 0x00000000 c0154d7a ]
0             (+     0) INTR     vcpu = 0x00000001 pid = 0x000011ea [ vector = 0x00 ]
9968400098576 (+  1792) VMENTRY  vcpu = 0x00000001 pid = 0x000011ea
9968400114528 (+ 15952) VMEXIT   vcpu = 0x00000001 pid = 0x000011ea [ exitcode = 0x00000001, rip = 0x00000000 c0154d7a ]
0             (+     0) INTR     vcpu = 0x00000001 pid = 0x000011ea [ vector = 0x00 ]
9968400116328 (+  1800) VMENTRY  vcpu = 0x00000001 pid = 0x000011ea
9968400137216 (+ 20888) VMEXIT   vcpu = 0x00000001 pid = 0x000011ea [ exitcode = 0x00000001, rip = 0x00000000 c0154d7a ]
0             (+     0) INTR     vcpu = 0x00000001 pid = 0x000011ea [ vector = 0x00 ]
9968400138840 (+  1624) VMENTRY  vcpu = 0x00000001 pid = 0x000011ea
9968400209344 (+ 70504) VMEXIT   vcpu = 0x00000001 pid = 0x000011ea [ exitcode = 0x00000001, rip = 0x00000000 c0154d7c ]
0             (+     0) INTR     vcpu = 0x00000001 pid = 0x000011ea [ vector = 0x00 ]
9968400211056 (+  1712) VMENTRY  vcpu = 0x00000001 pid = 0x000011ea
9968400226312 (+ 15256) VMEXIT   vcpu = 0x00000001 pid = 0x000011ea [ exitcode = 0x00000001, rip = 0x00000000 c0154d7c ]
0             (+     0) INTR     vcpu = 0x00000001 pid = 0x000011ea [ vector = 0x00 ]
9968400228040 (+  1728) VMENTRY  vcpu = 0x00000001 pid = 0x000011ea
9968400248688 (+ 20648) VMEXIT   vcpu = 0x00000001 pid = 0x000011ea [ exitcode = 0x00000001, rip = 0x00000000 c0154d7c ]

vcpu1 data:

9968400002032 (+  3808) VMEXIT     vcpu = 0x00000000 pid = 0x000011ea [ exitcode = 0x00000000, rip = 0x00000000 c016127f ]
0             (+     0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
9968400005448 (+  3416) VMENTRY    vcpu = 0x00000000 pid = 0x000011ea
9968400009832 (+  4384) VMEXIT     vcpu = 0x00000000 pid = 0x000011ea [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
0             (+     0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x0000000b, virt = 0x00000000 fffb6f88 ]
9968400071584 (+ 61752) VMENTRY    vcpu = 0x00000000 pid = 0x000011ea
9968400075608 (+  4024) VMEXIT     vcpu = 0x00000000 pid = 0x000011ea [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
0             (+     0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
9968400083528 (+  7920) VMENTRY    vcpu = 0x00000000 pid = 0x000011ea
9968400087288 (+  3760) VMEXIT     vcpu = 0x00000000 pid = 0x000011ea [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
0             (+     0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
9968400097312 (+ 10024) VMENTRY    vcpu = 0x00000000 pid = 0x000011ea
9968400103064 (+  5752) VMEXIT     vcpu = 0x00000000 pid = 0x000011ea [ exitcode = 0x00000000, rip = 0x00000000 c0160f9c ]
0             (+     0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
9968400116624 (+ 13560) VMENTRY    vcpu = 0x00000000 pid = 0x000011ea
9968400120424 (+  3800) VMEXIT     vcpu = 0x00000000 pid = 0x000011ea [ exitcode = 0x00000000, rip = 0x00000000 c0160fa1 ]
0             (+     0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
9968400123856 (+  3432) VMENTRY    vcpu = 0x00000000 pid = 0x000011ea
9968400128208 (+  4352) VMEXIT     vcpu = 0x00000000 pid = 0x000011ea [ exitcode = 0x00000000, rip = 0x00000000 c0160dab ]
0             (+     0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000009, virt = 0x00000000 fffb6d28 ]
9968400183848 (+ 55640) VMENTRY    vcpu = 0x00000000 pid = 0x000011ea
9968400188232 (+  4384) VMEXIT     vcpu = 0x00000000 pid = 0x000011ea [ exitcode = 0x00000000, rip = 0x00000000 c0160e4d ]
0             (+     0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
9968400196160 (+  7928) VMENTRY    vcpu = 0x00000000 pid = 0x000011ea
9968400199928 (+  3768) VMEXIT     vcpu = 0x00000000 pid = 0x000011ea [ exitcode = 0x00000000, rip = 0x00000000 c0160e54 ]
0             (+     0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
9968400209864 (+  9936) VMENTRY    vcpu = 0x00000000 pid = 0x000011ea
9968400214984 (+  5120) VMEXIT     vcpu = 0x00000000 pid = 0x000011ea [ exitcode = 0x00000000, rip = 0x00000000 c0160f9c ]
0             (+     0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
9968400228232 (+ 13248) VMENTRY    vcpu = 0x00000000 pid = 0x000011ea
9968400232000 (+  3768) VMEXIT     vcpu = 0x00000000 pid = 0x000011ea [ exitcode = 0x00000000, rip = 0x00000000 c0160fa1 ]
0             (+     0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
9968400235424 (+  3424) VMENTRY    vcpu = 0x00000000 pid = 0x000011ea
9968400239816 (+  4392) VMEXIT     vcpu = 0x00000000 pid = 0x000011ea [ exitcode = 0x00000000, rip = 0x00000000 c0160dab ]
0             (+     0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000009, virt = 0x00000000 fffb6d30 ]

Avi Kivity wrote:
> David S. Ahern wrote:
>> [...]
>
> It's not something that I test regularly. If you're running a 32-bit
> kernel, I'd suspect kmap(), or perhaps false positives from the fork
> detector.
>
> kvmtrace will probably give enough info to tell exactly what's going on;
> 'kvm_stat -1' while the badness is happening may also help.
From: Avi K. <av...@qu...> - 2008-04-18 08:01:32
David S. Ahern wrote:
> kvm_stat -1 is practically impossible to time correctly to get a good
> snippet.
>
> kvmtrace is a fascinating tool. I captured trace data that encompassed one
> intense period where the VM appeared to freeze (no terminal response for a
> few seconds).
>
> After converting to text I examined an arbitrary section in time (how do you
> correlate tsc to unix epoch?), and it shows vcpu0 hammered with interrupts
> and vcpu1 hammered with page faults. (I put the representative data below; I
> can send the binary or text files if you really want to see them.) All told,
> over about a 10-12 second time period the trace text files contain 8426221
> lines and 2051344 of them are PAGE_FAULTs (that's 24% of the text lines,
> which seems really high).
>
> vcpu1 data:
>
> 0 (+ 0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
> 0 (+ 0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
> 0 (+ 0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
> 0 (+ 0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000009, virt = 0x00000000 fffb6d28 ]
> 0 (+ 0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
> 0 (+ 0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
> 0 (+ 0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
> 0 (+ 0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
> 0 (+ 0) PAGE_FAULT vcpu = 0x00000000 pid = 0x000011ea [ errorcode = 0x00000009, virt = 0x00000000 fffb6d30 ]

The pattern here is c0009db4, c0009db0, fffb6xxx, c0009db0. Setting a pte at
c0009db0, accessing the page mapped by the pte, unmapping the pte. Note that
c0009db0 (bits 3:11) == 0x1b6 == fffb6xxx (bits 12:20). That's a kmap_atomic()
+ access + kunmap_atomic() sequence.

The expensive accesses (~50K cycles) seem to be the ones at fffb6xxx. Now
these shouldn't show up at all -- kvm_mmu_pte_write() ought to have set up the
ptes correctly.

Can you add a trace at mmu_guess_page_from_pte_write(), right before "if
(is_present_pte(gpte))"? I'm interested in gpa and gpte. Also a trace at
kvm_mmu_pte_write(), where it sets flooded = 1 (hmm, try to increase the 3 to
4 in the line right above that, maybe the fork detector is misfiring).

> ---------------------------------
> vcpu0 data:
>
> 0             (+     0) INTR     vcpu = 0x00000001 pid = 0x000011ea [ vector = 0x00 ]
> 9968400020536 (+  1712) VMENTRY  vcpu = 0x00000001 pid = 0x000011ea
> 9968400096784 (+ 76248) VMEXIT   vcpu = 0x00000001 pid = 0x000011ea [ exitcode = 0x00000001, rip = 0x00000000 c0154d7a ]
> [...]

Those are probably IPIs due to the kmaps above.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
From: David S. A. <da...@ci...> - 2008-04-21 04:31:12
I added the traces and captured data over another apparent lockup of the
guest. This seems to be representative of the sequence (pid/vcpu removed).

(+  4776) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
(+     0) PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
(+  3632) VMENTRY
(+  4552) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
(+     0) PAGE_FAULT  [ errorcode = 0x0000000b, virt = 0x00000000 fffb61c8 ]
(+ 54928) VMENTRY
(+  4568) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
(+     0) PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
(+     0) PTE_WRITE   [ gpa = 0x00000000 00009db4 gpte = 0x00000000 41c5d363 ]
(+  8432) VMENTRY
(+  3936) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
(+     0) PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
(+     0) PTE_WRITE   [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ]
(+ 13832) VMENTRY

(+  5768) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
(+     0) PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
(+  3712) VMENTRY
(+  4576) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
(+     0) PAGE_FAULT  [ errorcode = 0x0000000b, virt = 0x00000000 fffb61d0 ]
(+     0) PTE_WRITE   [ gpa = 0x00000000 3d5981d0 gpte = 0x00000000 3d55d047 ]
(+ 65216) VMENTRY
(+  4232) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
(+     0) PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
(+     0) PTE_WRITE   [ gpa = 0x00000000 00009db4 gpte = 0x00000000 3d598363 ]
(+  8640) VMENTRY
(+  3936) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
(+     0) PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
(+     0) PTE_WRITE   [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ]
(+ 14160) VMENTRY

I can forward a more complete time snippet if you'd like. vcpu0 + corresponding
vcpu1 files have 85000 total lines and compressed the files total ~500k.

I did not see the FLOODED trace come out during this sample though I did bump
the count from 3 to 4 as you suggested.

Correlating rip addresses to the 2.4 kernel:

c0160d00-c0161290 = page_referenced

It looks like the event is kscand running through the pages. I suspected this
some time ago, and tried tweaking the kscand_work_percent sysctl variable. It
appeared to lower the peak of the spikes, but maybe I imagined it. I believe
lowering that value makes kscand wake up more often but do less work (page
scanning) each time it is awakened.

david

Avi Kivity wrote:
> Can you add a trace at mmu_guess_page_from_pte_write(), right before "if
> (is_present_pte(gpte))"? I'm interested in gpa and gpte. Also a trace at
> kvm_mmu_pte_write(), where it sets flooded = 1 (hmm, try to increase the 3
> to 4 in the line right above that, maybe the fork detector is misfiring).
From: Avi K. <av...@qu...> - 2008-04-21 09:20:21
David S. Ahern wrote:
> I added the traces and captured data over another apparent lockup of the
> guest. This seems to be representative of the sequence (pid/vcpu removed).
>
> (+  4776) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
> (+     0) PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
> (+  3632) VMENTRY
> (+  4552) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
> (+     0) PAGE_FAULT  [ errorcode = 0x0000000b, virt = 0x00000000 fffb61c8 ]
> (+ 54928) VMENTRY

Can you oprofile the host to see where the 54K cycles are spent?

> (+  4568) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
> (+     0) PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
> (+     0) PTE_WRITE   [ gpa = 0x00000000 00009db4 gpte = 0x00000000 41c5d363 ]
> (+  8432) VMENTRY
> (+  3936) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
> (+     0) PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
> (+     0) PTE_WRITE   [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ]
> (+ 13832) VMENTRY
>
> (+  5768) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
> (+     0) PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
> (+  3712) VMENTRY
> (+  4576) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
> (+     0) PAGE_FAULT  [ errorcode = 0x0000000b, virt = 0x00000000 fffb61d0 ]
> (+     0) PTE_WRITE   [ gpa = 0x00000000 3d5981d0 gpte = 0x00000000 3d55d047 ]

This indeed has the accessed bit clear.

> (+ 65216) VMENTRY
> (+  4232) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
> (+     0) PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
> (+     0) PTE_WRITE   [ gpa = 0x00000000 00009db4 gpte = 0x00000000 3d598363 ]

This has the accessed bit set and the user bit clear, and the pte pointing at
the previous pte_write gpa. Looks like a kmap_atomic().

> (+  8640) VMENTRY
> (+  3936) VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
> (+     0) PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
> (+     0) PTE_WRITE   [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ]
> (+ 14160) VMENTRY
>
> I can forward a more complete time snippet if you'd like. vcpu0 +
> corresponding vcpu1 files have 85000 total lines and compressed the files
> total ~500k.
>
> I did not see the FLOODED trace come out during this sample though I did
> bump the count from 3 to 4 as you suggested.

Bumping the count was supposed to remove the flooding...

> Correlating rip addresses to the 2.4 kernel:
>
> c0160d00-c0161290 = page_referenced
>
> It looks like the event is kscand running through the pages. I suspected
> this some time ago, and tried tweaking the kscand_work_percent sysctl
> variable. It appeared to lower the peak of the spikes, but maybe I imagined
> it. I believe lowering that value makes kscand wake up more often but do
> less work (page scanning) each time it is awakened.

What does 'top' in the guest show (perhaps sorted by total cpu time rather
than instantaneous usage)?

What host kernel are you running? How many host cpus?

--
error compiling committee.c: too many arguments to function
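A quick decode of those two gpte values, for readers who don't juggle x86 pte
flag bits daily (editor's illustration; the bit layout is the standard PAE pte
format, not anything specific to this thread):

    0x3d55d047 -> low flag bits 0x047: P=1 RW=1 US=1 A=0 D=1     (accessed clear)
    0x3d598363 -> low flag bits 0x363: P=1 RW=1 US=0 A=1 D=1 G=1 (accessed set, user clear)

    (pte flag bits: 0=Present, 1=Read/Write, 2=User/Supervisor, 5=Accessed,
     6=Dirty, 8=Global; the frame address starts at bit 12, so 0x3d598363
     maps frame 0x3d598000 -- the page containing the previous pte_write
     gpa 0x3d5981d0, which is how the "pointing at the previous pte_write
     gpa" observation is made.)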
From: David S. A. <da...@ci...> - 2008-04-21 17:07:37
host: 2.6.25-rc8, x86_64, kvm-66
      1 dual-core Xeon(R) CPU 3050 @ 2.13GHz, 6 GB RAM

(This behavior also occurs on a larger server with 2 dual-core Xeon(R) CPU
5140 @ 2.33GHz, 4 GB RAM. Same kernel and kvm versions.)

guest: RHEL3 U8 (2.4.21-47.ELsmp), 2 vcpus, 2 GB RAM

As usual, I waited for a guest "event" -- high system time in the guest which
appears to lock it up. Following the event, kscand was the top CPU user
(cumulative time) in the guest. During the event, 2 qemu threads are pegging
the host CPU at 100%.

Top samples from oprofile (oprofile was started after the freeze started and
stopped when guest response returned):

samples  %        image name    app name    symbol name
171716   35.1350  kvm-intel.ko  kvm_intel   vmx_vcpu_run
45836     9.3786  vmlinux       vmlinux     copy_user_generic_string
39417     8.0652  kvm.ko        kvm         kvm_read_guest_atomic
23604     4.8296  vmlinux       vmlinux     add_preempt_count
22878     4.6811  vmlinux       vmlinux     __smp_call_function_mask
16143     3.3030  kvm.ko        kvm         gfn_to_hva
14648     2.9971  vmlinux       vmlinux     sub_preempt_count
14589     2.9851  kvm.ko        kvm         __gfn_to_memslot
11666     2.3870  kvm.ko        kvm         unalias_gfn
10834     2.2168  kvm.ko        kvm         kvm_mmu_zap_page
10532     2.1550  kvm.ko        kvm         paging64_prefetch_page
6285      1.2860  kvm-intel.ko  kvm_intel   handle_exception
6066      1.2412  kvm.ko        kvm         kvm_arch_vcpu_ioctl_run
4741      0.9701  kvm.ko        kvm         kvm_add_trace
3801      0.7777  vmlinux       vmlinux     __copy_from_user_inatomic
3592      0.7350  vmlinux       vmlinux     follow_page
3326      0.6805  kvm.ko        kvm         mmu_memory_cache_alloc
3317      0.6787  kvm-intel.ko  kvm_intel   kvm_handle_exit
2971      0.6079  kvm.ko        kvm         paging64_page_fault
2777      0.5682  kvm.ko        kvm         paging64_walk_addr
2294      0.4694  kvm.ko        kvm         kvm_mmu_pte_write
2278      0.4661  kvm.ko        kvm         kvm_flush_remote_tlbs
2266      0.4636  kvm-intel.ko  kvm_intel   vmcs_writel
2086      0.4268  kvm.ko        kvm         mmu_set_spte
2041      0.4176  kvm.ko        kvm         kvm_read_guest
1615      0.3304  vmlinux       vmlinux     free_hot_cold_page

david

Avi Kivity wrote:
> Can you oprofile the host to see where the 54K cycles are spent?
> [...]
> What does 'top' in the guest show (perhaps sorted by total cpu time rather
> than instantaneous usage)?
>
> What host kernel are you running? How many host cpus?
From: David S. A. <da...@ci...> - 2008-04-22 20:23:08
I added tracers to kvm_mmu_page_fault() that include collecting tsc cycles:

1. before vcpu->arch.mmu.page_fault()
2. after vcpu->arch.mmu.page_fault()
3. after mmu_topup_memory_caches()
4. after emulate_instruction()

So the deltas in the trace reports show:
- cycles required for arch.mmu.page_fault()     (tracer 2)
- cycles required for mmu_topup_memory_caches() (tracer 3)
- cycles required for emulate_instruction()     (tracer 4)

I captured trace data for ~5 seconds during one of the usual events (again,
this time it was due to kscand in the guest). I ran the formatted trace data
through an awk script to summarize:

TSC cycles            tracer2   tracer3   tracer4
0       -  10,000:     295067    213251    115873
10,001  -  25,000:       7682      1004     98336
25,001  -  50,000:        201        15        36
50,001  - 100,000:     100655         0        10
        > 100,000:        117         0        15

This means vcpu->arch.mmu.page_fault() was called 403,722 times in the roughly
5-second interval: 295,067 times it took < 10,000 cycles, but 100,772 times it
took longer than 50,000 cycles. The page_fault function getting run is
paging64_page_fault.

mmu_topup_memory_caches() and emulate_instruction() were both run 214,270
times, most of them relatively quickly.

Note: I bumped the scheduling priority of the qemu threads to RR 1 so that few
host processes could interrupt it.

david

Avi Kivity wrote:
> Can you oprofile the host to see where the 54K cycles are spent?
> [...]
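For anyone wanting to reproduce this kind of measurement: the tracers David
describes amount to reading the TSC at each stage of kvm_mmu_page_fault() and
histogramming the deltas. A minimal sketch follows (editor's illustration
against a simplified kvm-66-era kvm_mmu_page_fault(); the real function has
fuller error/MMIO handling, and the trace emission itself is left as a
comment since the PAGE_FAULT1..4 event names are David's own additions):

    int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
    {
            int r;
            enum emulation_result er;
            u64 t0, t1, t2, t3;

            rdtscll(t0);
            r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code);
            rdtscll(t1);                    /* t1 - t0: "tracer 2" column */
            if (r < 0)
                    return r;
            if (!r)                         /* shadow fault fixed, no emulation */
                    return 1;

            r = mmu_topup_memory_caches(vcpu);
            rdtscll(t2);                    /* t2 - t1: "tracer 3" column */
            if (r)
                    return r;

            er = emulate_instruction(vcpu, vcpu->run, cr2, error_code, 0);
            rdtscll(t3);                    /* t3 - t2: "tracer 4" column */

            /* emit t1-t0, t2-t1, t3-t2 via KVMTRACE_*D events or kvm_stat
             * counters; the error paths above are simplified vs. mainline */
            return er == EMULATE_DONE;
    }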
From: Avi K. <av...@qu...> - 2008-04-23 08:08:28
David S. Ahern wrote:
> I added tracers to kvm_mmu_page_fault() that include collecting tsc cycles:
>
> 1. before vcpu->arch.mmu.page_fault()
> 2. after vcpu->arch.mmu.page_fault()
> 3. after mmu_topup_memory_caches()
> 4. after emulate_instruction()
>
> So the deltas in the trace reports show:
> - cycles required for arch.mmu.page_fault()     (tracer 2)
> - cycles required for mmu_topup_memory_caches() (tracer 3)
> - cycles required for emulate_instruction()     (tracer 4)
>
> I captured trace data for ~5 seconds during one of the usual events (again,
> this time it was due to kscand in the guest). I ran the formatted trace data
> through an awk script to summarize:
>
> TSC cycles            tracer2   tracer3   tracer4
> 0       -  10,000:     295067    213251    115873
> 10,001  -  25,000:       7682      1004     98336
> 25,001  -  50,000:        201        15        36
> 50,001  - 100,000:     100655         0        10
>         > 100,000:        117         0        15
>
> This means vcpu->arch.mmu.page_fault() was called 403,722 times in the
> roughly 5-second interval: 295,067 times it took < 10,000 cycles, but
> 100,772 times it took longer than 50,000 cycles. The page_fault function
> getting run is paging64_page_fault.

This does look like the fork detector. Once in every four faults, it triggers
and the fault becomes slow.

100K floods == 100K page tables == 200GB of virtual memory, which seems
excessive. Is this running a forked load like apache, with many processes?
How much memory is on the guest, and is there any memory pressure?

> mmu_topup_memory_caches() and emulate_instruction() were both run 214,270
> times, most of them relatively quickly.
>
> Note: I bumped the scheduling priority of the qemu threads to RR 1 so that
> few host processes could interrupt it.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
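The 200GB figure follows directly from 64-bit page-table geometry (editor's
arithmetic, assuming 4 KiB pages and 512 entries per page-table page):

    512 entries/table x 4 KiB mapped per entry  =  2 MiB of virtual memory per page table
    ~100,000 flooded page tables x 2 MiB       ~=  200 GB of virtual address space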
From: David S. A. <da...@ci...> - 2008-04-23 15:24:03
Avi Kivity wrote:
> Can you oprofile the host to see where the 54K cycles are spent?

I've continued drilling down with the tracers to answer that question. I have
done runs with tracers in paging64_page_fault and it showed the overhead is
with the fetch() function. On my last run the tracers are in paging64_fetch()
as follows:

1. after the is_present_pte() check
2. before kvm_mmu_get_page()
3. after kvm_mmu_get_page()
4. after if (!metaphysical) {}

The delta between 2 and 3 shows the cycles to run kvm_mmu_get_page(). The
delta between 3 and 4 shows the cycles to run kvm_read_guest_atomic(), if it
is run. Tracer 1 dumps vcpu->arch.last_pt_write_count (a carryover from when
the new tracers were in paging64_page_fault); tracer 2 dumps the level,
metaphysical and access variables; tracer 5 dumps the value in shadow_ent.

A representative trace sample is:

(+  4576) VMEXIT       [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
(+     0) PAGE_FAULT   [ errorcode = 0x0000000b, virt = 0x00000000 fffb6950 ]
(+  2664) PAGE_FAULT1  [ write_count = 0 ]
(+   472) PAGE_FAULT2  [ level = 2 metaphysical = 0 access 0x00000007 ]
(+ 50416) PAGE_FAULT3
(+   472) PAGE_FAULT4
(+   856) PAGE_FAULT5  [ shadow_ent_val = 0x80000000 9276d043 ]
(+  1528) VMENTRY

(+  4992) VMEXIT       [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
(+     0) PAGE_FAULT   [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
(+  2296) PAGE_FAULT1  [ write_count = 0 ]
(+   816) PAGE_FAULT5  [ shadow_ent_val = 0x00000000 8a809041 ]
(+     0) PTE_WRITE    [ gpa = 0x00000000 00009db4 gpte = 0x00000000 4176d363 ]
(+  6424) VMENTRY

(+  3864) VMEXIT       [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
(+     0) PAGE_FAULT   [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
(+  2496) PAGE_FAULT1  [ write_count = 1 ]
(+   856) PAGE_FAULT5  [ shadow_ent_val = 0x00000000 8a809041 ]
(+     0) PTE_WRITE    [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ]
(+ 10248) VMENTRY

(+  4744) VMEXIT       [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
(+     0) PAGE_FAULT   [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
(+  2408) PAGE_FAULT1  [ write_count = 2 ]
(+   760) PAGE_FAULT5  [ shadow_ent_val = 0x00000000 8a809043 ]
(+  1240) VMENTRY

(+  4624) VMEXIT       [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
(+     0) PAGE_FAULT   [ errorcode = 0x0000000b, virt = 0x00000000 fffb6950 ]
(+  2512) PAGE_FAULT1  [ write_count = 0 ]
(+   496) PAGE_FAULT2  [ level = 2 metaphysical = 0 access 0x00000007 ]
(+ 48664) PAGE_FAULT3
(+   472) PAGE_FAULT4
(+   856) PAGE_FAULT5  [ shadow_ent_val = 0x80000000 9272d043 ]
(+  1576) VMENTRY

So basically every 4th trip through the fetch function it runs
kvm_mmu_get_page(). A summary of the entire trace file shows this function
rarely executes in less than 50,000 cycles. Also, vcpu->arch.last_pt_write_count
is always 0 when the high cycles are hit.

More tidbits:

- The hugepage option seems to have no effect -- the system spikes and
  overhead occur with and without the hugepage option (the data above is with
  it).

- As the guest runs for hours, the intensity of the spikes drops, though they
  still occur regularly and kscand continues to be the primary suspect.
  qemu's RSS tends to the guest's memory allotment of 2GB. Internally, guest
  memory usage runs at ~1GB page cache, 57M buffers, 24M swap, ~800MB for
  processes.

- I have looked at process creation and do not see a strong correlation
  between system time spikes and the number of new processes. So far the only
  correlations seem to be kscand and the amount of memory used, i.e., stock
  RHEL3 with few processes shows tiny spikes, whereas in my tests with 90+
  processes using about 800M plus a continually updating page cache (i.e.,
  moderate IO levels) the spikes are strong and last for seconds.

- Time runs really fast in the guest, gaining several minutes in 24 hours.

I'll download your kvm_stat update and give it a try. When I started this
investigation I was using Christian's kvmstat script, which dumped stats to a
file. Plots of that data did not show a strong correlation with guest system
time.

david
From: David S. A. <da...@ci...> - 2008-04-25 17:34:34
David S. Ahern wrote:
> Avi Kivity wrote:
>> Can you oprofile the host to see where the 54K cycles are spent?

Most of the cycles (~80% of that 54k+) are spent in paging64_prefetch_page():

        for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
                gpa_t pte_gpa = gfn_to_gpa(sp->gfn);
                pte_gpa += (i+offset) * sizeof(pt_element_t);

                r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &pt,
                                          sizeof(pt_element_t));
                if (r || is_present_pte(pt))
                        sp->spt[i] = shadow_trap_nonpresent_pte;
                else
                        sp->spt[i] = shadow_notrap_nonpresent_pte;
        }

This loop is run 512 times and takes a total of ~45k cycles, or ~88 cycles per
loop.

This function gets run >20,000/sec during some of the kscand loops.

david
From: Avi K. <av...@qu...> - 2008-04-26 06:45:37
David S. Ahern wrote:
> Most of the cycles (~80% of that 54k+) are spent in paging64_prefetch_page():
>
>         for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
>                 gpa_t pte_gpa = gfn_to_gpa(sp->gfn);
>                 pte_gpa += (i+offset) * sizeof(pt_element_t);
>
>                 r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &pt,
>                                           sizeof(pt_element_t));
>                 if (r || is_present_pte(pt))
>                         sp->spt[i] = shadow_trap_nonpresent_pte;
>                 else
>                         sp->spt[i] = shadow_notrap_nonpresent_pte;
>         }
>
> This loop is run 512 times and takes a total of ~45k cycles, or ~88 cycles
> per loop.
>
> This function gets run >20,000/sec during some of the kscand loops.

We really ought to optimize it. That's second order however. The real fix is
making sure it isn't called so often.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
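On the "optimize it" side, the obvious direction is to replace the 512
individual kvm_read_guest_atomic() calls with a few large reads of the guest
page-table page, which amortizes the gfn_to_hva()/memslot lookups that also
showed high in the earlier oprofile. A rough, untested sketch of that idea
against the loop quoted above (editor's illustration only; 'sp', 'offset' and
the helpers are as in the original, and this is not the fix that was
eventually merged):

        pt_element_t chunk[64];         /* 64 entries per read */
        int i, j, r;

        for (i = 0; i < PT64_ENT_PER_PAGE; i += 64) {
                gpa_t pte_gpa = gfn_to_gpa(sp->gfn) +
                                (i + offset) * sizeof(pt_element_t);

                r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, chunk,
                                          sizeof(chunk));
                for (j = 0; j < 64; ++j)
                        if (r || is_present_pte(chunk[j]))
                                sp->spt[i + j] = shadow_trap_nonpresent_pte;
                        else
                                sp->spt[i + j] = shadow_notrap_nonpresent_pte;
        }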
From: Marcelo T. <mto...@re...> - 2008-04-28 18:12:57
On Fri, Apr 25, 2008 at 11:33:18AM -0600, David S. Ahern wrote:
> Most of the cycles (~80% of that 54k+) are spent in paging64_prefetch_page():
> [...]
> This loop is run 512 times and takes a total of ~45k cycles, or ~88 cycles
> per loop.
>
> This function gets run >20,000/sec during some of the kscand loops.

Hi David,

Do you see the mmu_recycled counter increase?
From: David S. A. <da...@ci...> - 2008-04-28 23:45:05
Hi Marcelo:

mmu_recycled is always 0 for this guest -- even after almost 4 hours of
uptime.

Here is a kvm_stat sample where guest time was very high and qemu had 2
processors at 100% on the host. I removed counters where both columns have 0
value for brevity.

exits               45937979  758051
fpu_reload           1416831      87
halt_exits            112911       0
halt_wakeup            31771       0
host_state_reload    2068602     263
insn_emulation      21601480  365493
io_exits             1827374    2705
irq_exits            8934818  285196
mmio_exits            421674     147
mmu_cache_miss       4817689   93680
mmu_flooded          4815273   93680
mmu_pde_zapped         51344       0
mmu_prefetch         4817625   93680
mmu_pte_updated     14803298  270104
mmu_pte_write       19859863  363785
mmu_shadow_zapped    4832106   93679
pf_fixed            32184355  468398
pf_guest              264138       0
remote_tlb_flush    10697762  280522
tlb_flush           10301338  176424

(NOTE: This is for a *5* second sample interval instead of 1 to allow me to
capture the data.)

Here's a sample when the guest is "well-behaved" (system time < 10%, though):

exits               51502194   97453
fpu_reload           1421736     227
halt_exits            138361    1927
halt_wakeup            33047     117
host_state_reload    2110190    3740
insn_emulation      24367441   47260
io_exits             1874075    2576
irq_exits           10224702   13333
mmio_exits            435154    1726
mmu_cache_miss       5414097   11258
mmu_flooded          5411548   11243
mmu_pde_zapped         52851      44
mmu_prefetch         5414031   11258
mmu_pte_updated     16854686   29901
mmu_pte_write       22526765   42285
mmu_shadow_zapped    5430025   11313
pf_fixed            36144578   67666
pf_guest              282794     430
remote_tlb_flush    12126268   14619
tlb_flush           11753162   21460

There is definitely a strong correlation between the mmu counters and high
system times in the guest. I am still trying to find out what in the guest is
stimulating it when running on RHEL3; I do not see this same behavior for an
equivalent setup running on RHEL4.

By the way, I added an mmu_prefetch stat in prefetch_page() to count the
number of times the for() loop is hit with PTTYPE == 64; i.e., the number of
times paging64_prefetch_page() is invoked. (I wanted an explicit counter for
this loop, though the info seems to duplicate other entries.) That counter is
listed above. As I mentioned in a prior post, when kscand kicks in the change
in the mmu_prefetch counter is 20,000+/sec, with each trip through that
function taking 45k+ cycles.

kscand is an instigator shortly after boot; however, kscand is *not* the
culprit once the system has been up for 30-45 minutes. I have started
instrumenting the RHEL3U8 kernel, and for the load I am running kscand does
not walk the active lists very often once the system is up.

So, to dig deeper on what in the guest is stimulating the mmu, I collected
kvmtrace data for about a 2-minute time interval which caught about a
30-second period where guest system time was steady in the 25-30% range.
Summarizing the number of times a RIP appears in a VMEXIT shows the following
high runners:

count   RIP         RHEL3 symbol
82549   0xc0140e42  follow_page      [kernel] c0140d90 offset b2
42532   0xc0144760  handle_mm_fault  [kernel] c01446d0 offset 90
36826   0xc013da4a  futex_wait       [kernel] c013d870 offset 1da
29987   0xc0145cd0  zap_pte_range    [kernel] c0145c10 offset c0
27451   0xc0144018  do_no_page       [kernel] c0143e20 offset 1f8

(The halt entry is removed from the list since that is the ideal scenario for
an exit.)

So the RIP correlates to follow_page() for a large percentage of the VMEXITs.

I wrote an awk script to summarize (histogram style) the TSC cycles between
VMEXIT and VMENTRY for an address. For the first rip, 0xc0140e42, 82,271 times
(i.e., almost 100% of the time) the trace shows a delta between 50k and 100k
cycles between the VMEXIT and the subsequent VMENTRY. Similarly for the second
one, 0xc0144760, 42,403 times (again almost 100% of the occurrences) the trace
shows a delta between 50k and 100k cycles between VMEXIT and VMENTRY. These
seem to correlate with the prefetch_page function in kvm, though I am not 100%
positive on that.

I am now investigating the kernel paths leading to those functions. Any
insights would definitely be appreciated.

david

Marcelo Tosatti wrote:
> Do you see the mmu_recycled counter increase?
From: David S. A. <da...@ci...> - 2008-04-30 04:18:21
Another tidbit for you guys as I make my way through various permutations:

I installed the RHEL3 hugemem kernel and the guest behavior is *much* better.
System time still has some regular hiccups that are higher than xen and esx
(e.g., 1-minute samples out of 5 show system time between 10 and 15%), but
overall guest behavior is good with the hugemem kernel.

One side effect I've noticed is that I cannot restart the RHEL3 guest running
the hugemem kernel in successive attempts. The guest has 2 vcpus and qemu
shows one thread at 100% cpu. If I recall correctly, kvm_stat shows a large
amount of tlb_flushes (like millions in a 5-second sample). The scenario is:

1. start the guest running the hugemem kernel,
2. shut it down,
3. restart the guest.

During step 3 it hangs, but at random points. Removing kvm/kvm-intel has no
effect -- the guest still hangs on the restart. Rebooting the host clears the
problem. Alternatively, during the hang on a restart I can kill the guest, and
then on restart choose the normal 32-bit smp kernel and the guest boots just
fine. At this point I can shut down the guest and restart with the hugemem
kernel and it boots just fine.

david

David S. Ahern wrote:
> mmu_recycled is always 0 for this guest -- even after almost 4 hours of
> uptime.
> [...]
> I am now investigating the kernel paths leading to those functions. Any
> insights would definitely be appreciated.
From: Avi K. <av...@qu...> - 2008-04-30 13:02:11
David S. Ahern wrote:
> Another tidbit for you guys as I make my way through various permutations:
>
> I installed the RHEL3 hugemem kernel and the guest behavior is *much*
> better. System time still has some regular hiccups that are higher than xen
> and esx (e.g., 1-minute samples out of 5 show system time between 10 and
> 15%), but overall guest behavior is good with the hugemem kernel.

Wait, the amount of info here is overwhelming. Let's stick with the current
kernel (32-bit, HIGHMEM4G, right?).

Did you get any traces with bypass_guest_pf=0? That may show more info.

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: David S. A. <da...@ci...> - 2008-04-30 13:45:22
Avi Kivity wrote:
> Wait, the amount of info here is overwhelming. Let's stick with the current
> kernel (32-bit, HIGHMEM4G, right?).
>
> Did you get any traces with bypass_guest_pf=0? That may show more info.

My preference is to stick with the "standard", 32-bit RHEL3 kernel in the
guest.

My point in the last email was that the hugemem kernel shows a remarkable
difference (it uses 3 levels of page tables, right?). I was hoping that would
ring a bell with someone.

Adding bypass_guest_pf=0 did not improve the situation. Did you want anything
particular with that setting -- like a RIP summary or a summary of exit-entry
cycles?

david
From: Avi K. <av...@qu...> - 2008-04-30 13:51:32
David S. Ahern wrote:
> My preference is to stick with the "standard", 32-bit RHEL3 kernel in the
> guest.

Me too. I would like to see all reasonable guests supported well, without
performance issues, and not have to tell the user which kernel to use.

> My point in the last email was that the hugemem kernel shows a remarkable
> difference (it uses 3 levels of page tables, right?). I was hoping that
> would ring a bell with someone.

From the traces I saw I think the standard kernel is pae as well. Can you
verify? I think it's CONFIG_HIGHMEM4G (instead of CONFIG_HIGHMEM64G) but that
option may be different for such an old kernel.

> Adding bypass_guest_pf=0 did not improve the situation. Did you want
> anything particular with that setting -- like a RIP summary or a summary of
> exit-entry cycles?

I asked for this thinking bypass_guest_pf may help show more information. But
thinking a bit more, it will not.

I think I do know what the problem is. I will try it out. Is there a free
clone (like centos) available somewhere?

--
Any sufficiently difficult bug is indistinguishable from a feature.
From: Avi K. <av...@qu...> - 2008-05-11 12:32:05
Attachments:
prevent-kscand-flooding.patch
|
Avi Kivity wrote: > > I asked for this thinking bypass_guest_pf may help show more > information. But thinking a bit more, it will not. > > I think I do know what the problem is. I will try it out. Is there a > free clone (like CentOS) available somewhere? This patch tracks down emulated accesses to speculated ptes and marks them as accessed, preventing the flooding on CentOS 3.1. Unfortunately it also causes a host oops midway through the boot process. I believe the oops is merely exposed by the patch, not caused by it. -- error compiling committee.c: too many arguments to function |
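For context, the "flooding" referred to here is KVM's shadow-MMU write-flood heuristic: when the guest keeps writing to a page that KVM is shadowing as a page table, KVM assumes the page has been recycled for data and unshadows ("zaps") it, and RHEL3's kscand thread, which sweeps page tables testing and clearing accessed bits, can look exactly like such a flood, so shadow pages get torn down and rebuilt continuously. The C program below is only a toy model of that heuristic to make the later "flood count" discussion easier to follow; the struct and field names are invented and do not come from KVM or from the attached patch, and the threshold of 3 simply mirrors the default flood count mentioned later in the thread.

/*
 * Toy model of a shadow-MMU write-flood heuristic, for illustration only.
 * Names are invented; this is not the attached patch and not KVM code.
 */
#include <stdio.h>

#define FLOOD_THRESHOLD 3   /* emulated writes tolerated before unshadowing */

struct shadow_page {
    unsigned long gfn;      /* guest frame shadowed as a page table */
    int write_count;        /* emulated writes since the page was last
                               used for translation */
    int zapped;             /* 1 once the shadow page has been torn down */
};

/* Called (in this model) for every emulated guest write that lands on a
 * shadowed page table page, e.g. kscand clearing accessed bits. */
static void emulate_pt_write(struct shadow_page *sp)
{
    if (sp->zapped)
        return;
    if (++sp->write_count >= FLOOD_THRESHOLD) {
        sp->zapped = 1;     /* unshadowed; rebuilding it later is expensive */
        printf("gfn %lu zapped after %d emulated writes\n",
               sp->gfn, sp->write_count);
    }
}

int main(void)
{
    struct shadow_page sp = { .gfn = 42 };

    /* One kscand-style pass over the ptes in this page table page is
     * enough to trip the heuristic. */
    for (int i = 0; i < 4; i++)
        emulate_pt_write(&sp);

    return 0;
}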
From: Avi K. <av...@qu...> - 2008-05-11 13:36:42
Attachments:
prevent-kscand-flooding.patch
|
Avi Kivity wrote: > Avi Kivity wrote: >> >> I asked for this thinking bypass_guest_pf may help show more >> information. But thinking a bit more, it will not. >> >> I think I do know what the problem is. I will try it out. Is there >> a free clone (like CentOS) available somewhere? > > This patch tracks down emulated accesses to speculated ptes and marks > them as accessed, preventing the flooding on CentOS 3.1. > Unfortunately it also causes a host oops midway through the boot process. > > I believe the oops is merely exposed by the patch, not caused by it. > It was caused by the patch; please try the updated one, attached. -- error compiling committee.c: too many arguments to function |
From: David S. A. <da...@ci...> - 2008-05-13 03:49:22
|
That does the trick with kscand. Do you have recommendations for clock source settings? For example, in my test case for this patch the guest gained 73 seconds (ahead of real time) after only 3 hours, 5 min of uptime. thanks, david Avi Kivity wrote: > Avi Kivity wrote: >> Avi Kivity wrote: >>> >>> I asked for this thinking bypass_guest_pf may help show more >>> information. But thinking a bit more, it will not. >>> >>> I think I do know what the problem is. I will try it out. Is there >>> a free clone (like CentOS) available somewhere? >> >> This patch tracks down emulated accesses to speculated ptes and marks >> them as accessed, preventing the flooding on CentOS 3.1. >> Unfortunately it also causes a host oops midway through the boot process. >> >> I believe the oops is merely exposed by the patch, not caused by it. >> > > It was caused by the patch; please try the updated one, attached. |
From: Avi K. <av...@qu...> - 2008-05-13 07:25:38
|
David S. Ahern wrote: > That does the trick with kscand. > > Not so fast... the patch updates the flood count to 5. Can you check if a lower value still works? Also, whether updating the flood count to 5 (without the rest of the patch) works? Unconditionally bumping the flood count to 5 will likely cause a performance regression on other guests. While I was able to see excessive flooding, I couldn't reproduce your kscand problem. Running /bin/true always returned immediately for me. > Do you have recommendations for clock source settings? For example, in my > test case for this patch the guest gained 73 seconds (ahead of real > time) after only 3 hours, 5 min of uptime. > The kernel is trying to correlate the TSC and the PIT, which isn't going to work. Try disabling the TSC: set edx.bit4=0 for cpuid.eax=1 in qemu-kvm-x86.c do_cpuid_ent(). -- error compiling committee.c: too many arguments to function |
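To make the CPUID suggestion concrete: the TSC feature flag is bit 4 of EDX in CPUID leaf 1, so hiding the TSC means clearing that bit wherever qemu-kvm builds the leaf-1 entry (do_cpuid_ent() in qemu-kvm-x86.c in that tree). The program below is only a stand-alone sketch of the masking; the cpuid_entry struct and the sample EDX value are made up for the demo and are not the real qemu-kvm types.

#include <stdint.h>
#include <stdio.h>

/* CPUID leaf 1 reports the TSC in EDX bit 4. */
#define CPUID_1_EDX_TSC (1u << 4)

/* Stand-in for the cpuid entry qemu-kvm fills in; the real type differs. */
struct cpuid_entry {
    uint32_t function;
    uint32_t eax, ebx, ecx, edx;
};

/* The tweak Avi describes, applied where the leaf-1 entry is constructed. */
static void hide_tsc(struct cpuid_entry *e)
{
    if (e->function == 1)
        e->edx &= ~CPUID_1_EDX_TSC;   /* guest stops seeing a TSC */
}

int main(void)
{
    /* Arbitrary example feature word with the TSC bit set. */
    struct cpuid_entry leaf1 = { .function = 1, .edx = 0xbfebfbffu };

    hide_tsc(&leaf1);
    printf("leaf 1 edx after masking: 0x%08x (TSC bit %s)\n",
           (unsigned)leaf1.edx,
           (leaf1.edx & CPUID_1_EDX_TSC) ? "set" : "clear");
    return 0;
}

With the bit clear, the 2.4 guest should fall back to PIT-only timekeeping instead of trying to discipline one clock against the other.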
From: David S. A. <da...@ci...> - 2008-05-14 20:36:13
|
Avi Kivity wrote: > Not so fast... the patch updates the flood count to 5. Can you check > if a lower value still works? Also, whether updating the flood count to > 5 (without the rest of the patch) works? > > Unconditionally bumping the flood count to 5 will likely cause a > performance regression on other guests. I put the flood count back to 3, and the RHEL3 guest performance is even better. > > While I was able to see excessive flooding, I couldn't reproduce your > kscand problem. Running /bin/true always returned immediately for me. That was a poor attempt at finding a simple, minimal re-create. The use case I am investigating has over 500 processes/threads with a base memory consumption around 1GB. I was finding it nearly impossible to put together a generic re-create of the problem for you to use in your investigations on CentOS. Thanks for the patch. david |
From: Avi K. <av...@qu...> - 2008-05-15 10:53:38
|
David S. Ahern wrote: > Avi Kivity wrote: > >> Not so fast... the patch updates the flood count to 5. Can you check >> if a lower value still works? Also, whether updating the flood count to >> 5 (without the rest of the patch) works? >> >> Unconditionally bumping the flood count to 5 will likely cause a >> performance regression on other guests. >> > > I put the flood count back to 3, and the RHEL3 guest performance is even > better. > > Okay, I committed the patch without the flood count == 5. -- error compiling committee.c: too many arguments to function |
From: David S. A. <da...@ci...> - 2008-05-17 04:31:29
Attachments:
kvm-stats-rhel3
|
Avi Kivity wrote: > > Okay, I committed the patch without the flood count == 5. > I've continued testing the RHEL3 guests with the flood count at 3, and I am right back to where I started. With the patch and the flood count at 3, I had 2 runs totaling around 24 hours that looked really good. Now, I am back to square one. I guess the short of it is that I am not sure if the patch resolves this issue or not. If you want to back it out, I can continue to apply it on my end as I continue testing. A snapshot of kvm_stat -f 'mmu*' -l is attached for two test runs with the patch (line wrap is horrible inline). I will work on creating an app that will stimulate kscand activity similar to what I am seeing. Also, in a prior e-mail I mentioned guest time advancing rapidly. I've noticed that with the -no-kvm-pit option the guest time is much better and typically stays within 3 seconds or so of the host, even through the high kscand activity, which is one case where I've noticed time jumps with the kernel PIT. Yes, this result has been repeatable through 6 or so runs. :-) david |
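As a starting point for that reproducer, a kscand stressor might look something like the sketch below: fork a batch of workers that each map some anonymous memory and keep touching every page, so the guest's kscand thread always has freshly set accessed bits to scan and clear. The process count and per-process size are arbitrary knobs chosen for illustration, not numbers taken from this thread.

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define NPROC 32                        /* worker processes to fork */
#define CHUNK (32UL * 1024 * 1024)      /* anonymous memory per worker */
#define PAGE  4096UL

static void worker(void)
{
    char *buf = mmap(NULL, CHUNK, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        _exit(1);

    for (;;) {                          /* touch every page, forever */
        for (unsigned long off = 0; off < CHUNK; off += PAGE)
            buf[off]++;
        usleep(100 * 1000);             /* give kscand time to sweep */
    }
}

int main(void)
{
    for (int i = 0; i < NPROC; i++) {
        pid_t pid = fork();
        if (pid == 0)
            worker();                   /* never returns */
        if (pid < 0)
            perror("fork");
    }
    while (wait(NULL) > 0)              /* parent just parks here */
        ;
    return 0;
}

Scaling NPROC and CHUNK toward the 500-process, roughly 1GB footprint described earlier should get the test closer to the real workload's page-table churn.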