From: david a. <da...@ci...> - 2008-02-21 18:34:14
I know this issue has been discussed on this list before, but I am still
experiencing network freezes in a guest that requires a restart to clear. When
the network freezes in the guest I no longer see the network interrupts counter
incrementing (i.e., the eth0 counter in /proc/interrupts in the guest). Using
the crash utility, I verified that the interrupt is still enabled on the guest
side and that no interrupts are pending. This suggests that the interrupts are
not getting delivered to the VM.

On the host side, I attached gdb to the qemu process. When the networking
freezes in the guest the irq_state for the rtl8139 device is:

    irq_state = {1, 0, 0, 0}

My current guess is that there is a race condition getting tripped where the
change in irq state is not getting pushed to the VM, or the race is in clearing
the interrupt. Once missed, nothing resets the irq_state to 0, so no more of
these interrupts are delivered to the VM.

From pci_set_irq() in qemu/hw/pci.c:

    change = level - pci_dev->irq_state[irq_num];
    if (!change)
        return;

I take this to mean that as long as irq_state is 1, any further requests to set
it to 1 are basically ignored, so the interrupt is not pushed to the VM. Also,
I have only seen the network interrupt in this state; the timer interrupts and
ide interrupts appear to be working fine even when the network interrupts stop.

In debugging the problem I noticed that it is easy for the guest to be
overwhelmed with packets such that the ring buffer fills in the rtl8139 device
in qemu. You see this in the fifo counter in the output of 'ethtool -S eth0' on
the guest side, and I confirmed it by modifying the qemu source for the nic to
print when packets are dropped due to no space in the ring buffer. I am seeing
as much as 100+ packets dropped per second. (The code for the existing message
"RTL8139: C+ Rx mode : descriptor XX is owned by host" was modified to print a
summary of the number of times this message was hit per second; without flow
control the number of messages was unwieldy.)

If I recompile the guest side 8139cp driver to use a higher "weight" (e.g., 48
or 64), then the number of dropped packets due to ring buffer overflow drops
dramatically (I have also run cases with the size of the ring buffer
increased). Incrementing this weight allows more packets to be pulled from the
device each time the cp_rx_poll() function is called in the guest kernel. With
the increased weight I still see network freezes, but I can usually run longer
before a freeze occurs (with the increase I can run for an hour or more under
network load, whereas without it I can trigger the freeze pretty quickly).

Host
----
Model:   PowerEdge 2950
CPUs:    two dual-core, Xeon(R) CPU 5140 @ 2.33GHz
OS:      RHEL 5.1, x86_64
kernel:  2.6.24.2 kernel, x86_64
Network: attached to gigabit network
kvm:     kvm-61

Guest
-----
VM:      2.5 GB RAM, 2 VCPUs, rtl8139 NIC
OS:      RHEL4 U4 32-bit
kernel:  U6 kernel recompiled to run at 250 HZ rather than 1000 HZ
NIC:     8139cp driver

Command line:
/usr/local/bin/qemu-system-x86_64 -localtime -m 2560 -smp 2 \
    -hda rootfs.img -hdb trace.img \
    -net nic,macaddr=00:16:3e:30:f0:32,model=rtl8139 -net tap \
    -monitor stdio -vnc :2

I am continuing to look into the irq processing on the kvm/qemu side. I'd like
to know if anyone has suggestions on what to look at. This is my first foray
into the kvm and qemu code, and it's a lot to take in all at once.

thanks,
david
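P.S. For anyone reading along, here is roughly how that check sits in
pci_set_irq() as I read qemu/hw/pci.c (paraphrased from memory; only the two
lines quoted above are exact, the rest of the function body is approximate):

    static void pci_set_irq(void *opaque, int irq_num, int level)
    {
        PCIDevice *pci_dev = (PCIDevice *)opaque;
        int change;

        /* if the requested level matches the cached irq_state, return
         * without ever telling the bus / ioapic about the assertion */
        change = level - pci_dev->irq_state[irq_num];
        if (!change)
            return;

        /* otherwise remember the new level and propagate it */
        pci_dev->irq_state[irq_num] = level;

        /* ... the bus set_irq callback is invoked here and eventually
         * reaches the (in-kernel or userspace) ioapic ... */
    }

So if irq_state[0] is stuck at 1 while the guest believes the line is clear,
every later attempt by the device model to raise the interrupt is swallowed at
this check, which matches the irq_state = {1, 0, 0, 0} I see in gdb.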
From: Avi K. <av...@qu...> - 2008-02-24 10:14:05
david ahern wrote:
> I know this issue has been discussed on this list before, but I am still
> experiencing network freezes in a guest that requires a restart to clear. When
> the network freezes in the guest I no longer see the network interrupts counter
> incrementing (i.e., the eth0 counter in /proc/interrupts in the guest). Using
> the crash utility, I verified that the interrupt is still enabled on the guest
> side and that no interrupts are pending. This suggests that the interrupts are
> not getting delivered to the VM.
>

[...]

> I am continuing to look into the irq processing on the kvm/qemu side. I'd like
> to know if anyone has suggestions on what to look at. This is my first foray
> into the kvm and qemu code, and it's a lot to take in all at once.
>

Standard procedure is to run with -no-kvm and -no-kvm-irqchip, to see if the
problem is in qemu proper, the in-kernel irq handling, or the rest of kvm.

-- 
error compiling committee.c: too many arguments to function
From: david a. <da...@ci...> - 2008-02-24 19:50:02
I've run a lot more tests:

- with the -no-kvm-irqchip option the vm eventually stops responding to
  network or console,

- with the -no-kvm option the performance is so bad I cannot get our app up
  and running, so the results are inconclusive,

- I've tried the e1000 and pcnet nic models and both showed the network
  lockup; with the ne2k_pci nic I did not see dropped packets and the network
  never locked up in 12+ hours, but system CPU time was 10% higher than when
  the rtl8139 nic was working,

- if I remove the "if (!change) return" optimization from pci_set_irq the
  rtl8139 nic worked fine for 16+ hours. I'm not recommending this as a fix,
  just confirming that the problem goes away.

- I tried adding a thread mutex to the rtl8139 device model around accesses to
  the RTL8139State data, but the network still locked up.

david

Avi Kivity wrote:
> david ahern wrote:
>> I know this issue has been discussed on this list before, but I am still
>> experiencing network freezes in a guest that requires a restart to clear. When
>> the network freezes in the guest I no longer see the network interrupts counter
>> incrementing (i.e., the eth0 counter in /proc/interrupts in the guest). Using
>> the crash utility, I verified that the interrupt is still enabled on the guest
>> side and that no interrupts are pending. This suggests that the interrupts are
>> not getting delivered to the VM.
>>
>
> [...]
>
>> I am continuing to look into the irq processing on the kvm/qemu side. I'd like
>> to know if anyone has suggestions on what to look at. This is my first foray
>> into the kvm and qemu code, and it's a lot to take in all at once.
>>
>
> Standard procedure is to run with -no-kvm and -no-kvm-irqchip, to see if
> the problem is in qemu proper, the in-kernel irq handling, or the rest
> of kvm.
>
From: Avi K. <av...@qu...> - 2008-02-25 08:57:27
david ahern wrote:
> I've run a lot more tests:
>
>
> - if I remove the "if (!change) return" optimization from pci_set_irq the
>   rtl8139 nic worked fine for 16+ hours. I'm not recommending this as a fix,
>   just confirming that the problem goes away.
>

Interesting. What can cause this to happen?

- some non-pci device shares the same irq (unlikely)

- the pci link sharing is broken. Is the eth0 irq shared? Please post
  /proc/interrupts.

- the in-kernel ioapic is buggy and needs the extra kicking the optimization
  prevents. Can be checked by re-adding the optimization to
  kvm_ioapic_set_irq() (keeping it removed in qemu). If it works, the problem
  is in userspace. If it fails, the problem is in the kernel.

Something like

    static int old_level[16];

    if (level == old_level[irq])
        return;
    old_level[irq] = level;

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to
panic.
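Spelled out, that check would go at the top of the in-kernel ioapic entry
point, roughly like this (a sketch only; the exact function signature in
kvm-61 may differ, and this is not a tested patch):

    void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
    {
        /* mirror the optimization removed from qemu's pci_set_irq():
         * ignore calls that do not actually change the line's level */
        static int old_level[16];

        if (level == old_level[irq])
            return;
        old_level[irq] = level;

        /* ... existing ioapic delivery logic continues unchanged ... */
    }

With the filter living here and removed from qemu, a surviving guest points at
userspace; a freeze points at the kernel side.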
From: david a. <da...@ci...> - 2008-02-25 16:11:34
Avi Kivity wrote:
> david ahern wrote:
>> I've run a lot more tests:
>>
>>
>> - if I remove the "if (!change) return" optimization from pci_set_irq the
>>   rtl8139 nic worked fine for 16+ hours. I'm not recommending this as a
>>   fix, just confirming that the problem goes away.
>>
>
> Interesting. What can cause this to happen?
>
> - some non-pci device shares the same irq (unlikely)
>
> - the pci link sharing is broken. Is the eth0 irq shared?

interrupt is not shared.

>
> Please post /proc/interrupts.

# cat /proc/interrupts
           CPU0       CPU1
  0:      10566      46468    IO-APIC-edge   timer
  1:          5          5    IO-APIC-edge   i8042
  8:          0          1    IO-APIC-edge   rtc
  9:          0          0   IO-APIC-level   acpi
 11:     243118       5656   IO-APIC-level   eth0
 12:        180         45    IO-APIC-edge   i8042
 14:       2021      12592    IO-APIC-edge   ide0
 15:         14         10    IO-APIC-edge   ide1
NMI:          0          0
LOC:      56947      56946
ERR:          0
MIS:         31

>
> - the in-kernel ioapic is buggy and needs the extra kicking the
>   optimization prevents. Can be checked by re-adding the optimization to
>   kvm_ioapic_set_irq() (keeping it removed in qemu). If it works, the
>   problem is in userspace. If it fails, the problem is in the kernel.
>
> Something like
>
>     static int old_level[16];
>
>     if (level == old_level[irq])
>         return;
>     old_level[irq] = level;
>

I'll give this a shot and let you know.

If you are interested, here's some more info on the -no-kvm-irqchip option:
qemu ends up spinning with 1 thread consuming 100% cpu.

Output from top (literally the top 11 lines) with 'show threads' and
individual cpu stats:

Tasks: 125 total,   2 running, 123 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  1.0%us,  0.0%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4046804k total,  4013480k used,    33324k free,    42512k buffers
Swap:  2096472k total,      120k used,  2096352k free,  1159892k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4441 root      20   0 2675m 2.5g 9808 R  100 65.0 499:34.09 qemu-system-x86
 4426 root      20   0 2675m 2.5g 9808 S    1 65.0  16:24.50 qemu-system-x86
 ...

Hooking up gdb shows it cycling with the following backtrace:

(gdb) bt
#0  0x00002ad97b5ee3e8 in do_sigtimedwait () from /lib64/libc.so.6
#1  0x00002ad97b5ee4ae in sigtimedwait () from /lib64/libc.so.6
#2  0x00000000004fb7df in kvm_eat_signal (env=0x2ade460, timeout=10)
    at /opt/kvm/kvm-61/qemu/qemu-kvm.c:156
#3  0x00000000004fb9e4 in kvm_eat_signals (env=0x2ade460, timeout=10)
    at /opt/kvm/kvm-61/qemu/qemu-kvm.c:192
#4  0x00000000004fba49 in kvm_main_loop_wait (env=0x2ade460, timeout=10)
    at /opt/kvm/kvm-61/qemu/qemu-kvm.c:211
#5  0x00000000004fc278 in kvm_main_loop_cpu (env=0x2ade460)
    at /opt/kvm/kvm-61/qemu/qemu-kvm.c:299
#6  0x000000000040ff2d in main (argc=<value optimized out>, argv=0x7fff304607b8)
    at /opt/kvm/kvm-61/qemu/vl.c:7856

I have a dump of CPUX86State *env if you want to see it.

david
From: david a. <da...@ci...> - 2008-02-25 17:06:18
david ahern wrote:
> Avi Kivity wrote:
>> - the in-kernel ioapic is buggy and needs the extra kicking the
>>   optimization prevents. Can be checked by re-adding the optimization to
>>   kvm_ioapic_set_irq() (keeping it removed in qemu). If it works, the
>>   problem is in userspace. If it fails, the problem is in the kernel.
>>
>> Something like
>>
>>     static int old_level[16];
>>
>>     if (level == old_level[irq])
>>         return;
>>     old_level[irq] = level;
>>

With the "if (!change) return;" taken out of pci_set_irq() and the above code
added to kvm_ioapic_set_irq() networking froze.

david
From: david a. <da...@ci...> - 2008-02-26 14:50:45
Usually within a few hours, sometimes within 30 minutes.

Average network load as computed by sysstat in the nightly sar files:

          rxpck/s   txpck/s    rxbyt/s    txbyt/s
eth0       975.18   1188.34   82044.06  171655.38

Interrupts come in at 1830/sec for eth0, 250/sec for the timer and 20/sec for
ide0.

Are you using a RHEL4 image?

david

Avi Kivity wrote:
> david ahern wrote:
>> Almost 7 hours and the uniprocessor case is still chugging along.
>>
>>
>
> How long does it usually take to hang?
>
> How do I go about reproducing this? apachebench (host) against httpd
> (guest) doesn't seem to trigger it.
>
From: Avi K. <av...@qu...> - 2008-02-26 15:06:56
david ahern wrote:
> Usually within a few hours, sometimes within 30 minutes.
>
> Average network load as computed by sysstat in the nightly sar files:
>
>           rxpck/s   txpck/s    rxbyt/s    txbyt/s
> eth0       975.18   1188.34   82044.06  171655.38
>
> Interrupts come in at 1830/sec for eth0, 250/sec for the timer and 20/sec
> for ide0.
>
> Are you using a RHEL4 image?
>

Nope, FC6 x86_64.

My apachebench command line is

    ab -c 150 -n 10000000 -k http://guest/

12K eth interrupts/sec, 20 ide interrupts/sec, 250Hz timer. The served file is
quite small, so many small packets.

-- 
error compiling committee.c: too many arguments to function
From: Avi K. <av...@qu...> - 2008-02-25 17:19:54
david ahern wrote:
> david ahern wrote:
>
>> Avi Kivity wrote:
>>
>>> - the in-kernel ioapic is buggy and needs the extra kicking the
>>>   optimization prevents. Can be checked by re-adding the optimization to
>>>   kvm_ioapic_set_irq() (keeping it removed in qemu). If it works, the
>>>   problem is in userspace. If it fails, the problem is in the kernel.
>>>
>>> Something like
>>>
>>>     static int old_level[16];
>>>
>>>     if (level == old_level[irq])
>>>         return;
>>>     old_level[irq] = level;
>>>
>
> With the "if (!change) return;" taken out of pci_set_irq() and the above code
> added to kvm_ioapic_set_irq() networking froze.
>

That points the finger at the kernel ioapic.

I saw from the /proc/interrupts dump that it's an smp guest. Does it freeze on
uniprocessor as well? Maybe it's bad locking in the kernel.

-- 
error compiling committee.c: too many arguments to function
From: david a. <da...@ci...> - 2008-02-26 04:57:06
Almost 7 hours and the uniprocessor case is still chugging along.

david

Avi Kivity wrote:
> david ahern wrote:
>> david ahern wrote:
>>
>>> Avi Kivity wrote:
>>>
>>>> - the in-kernel ioapic is buggy and needs the extra kicking the
>>>>   optimization prevents. Can be checked by re-adding the optimization to
>>>>   kvm_ioapic_set_irq() (keeping it removed in qemu). If it works, the
>>>>   problem is in userspace. If it fails, the problem is in the kernel.
>>>>
>>>> Something like
>>>>
>>>>     static int old_level[16];
>>>>
>>>>     if (level == old_level[irq])
>>>>         return;
>>>>     old_level[irq] = level;
>>>>
>> With the "if (!change) return;" taken out of pci_set_irq() and the above code
>> added to kvm_ioapic_set_irq() networking froze.
>>
>
> That points the finger at the kernel ioapic.
>
> I saw from the /proc/interrupts dump that it's an smp guest. Does it
> freeze on uniprocessor as well? Maybe it's bad locking in the kernel.
>
From: Avi K. <av...@qu...> - 2008-02-26 09:30:55
david ahern wrote:
> Almost 7 hours and the uniprocessor case is still chugging along.
>
>

How long does it usually take to hang?

How do I go about reproducing this? apachebench (host) against httpd (guest)
doesn't seem to trigger it.

-- 
error compiling committee.c: too many arguments to function
From: Avi K. <av...@qu...> - 2008-02-26 14:41:33
Avi Kivity wrote:
> david ahern wrote:
>> Almost 7 hours and the uniprocessor case is still chugging along.
>>
>>
>
> How long does it usually take to hang?
>
> How do I go about reproducing this? apachebench (host) against httpd
> (guest) doesn't seem to trigger it.
>

ab (on host) against httpd (on guest) reproduces within a few minutes.

uniprocessor guest:                    works
smp guest:                             fails
smp guest with one offlined cpu:       works
smp guest with httpd pinned to cpu 0:  works

so it's either a guest driver smp problem, or an ioapic problem.

-- 
error compiling committee.c: too many arguments to function
From: david a. <da...@ci...> - 2008-02-26 14:55:31
I noted in my original post that if I increase the weight parameter in the
driver, to have it pull more packets on each poll before taking a break, then
it takes longer to freeze. See the sketch below for what that change looks
like in the driver.

I have looked at a newer version of the 8139cp driver. Very few changes to the
poll function; most of them seem to be accommodating changes to the
netdevice/napi api. I'll take a closer look at it today.

david

Avi Kivity wrote:
> Avi Kivity wrote:
>> david ahern wrote:
>>> Almost 7 hours and the uniprocessor case is still chugging along.
>>>
>>>
>>
>> How long does it usually take to hang?
>>
>> How do I go about reproducing this? apachebench (host) against httpd
>> (guest) doesn't seem to trigger it.
>>
>
> ab (on host) against httpd (on guest) reproduces within a few minutes.
>
> uniprocessor guest:                    works
> smp guest:                             fails
> smp guest with one offlined cpu:       works
> smp guest with httpd pinned to cpu 0:  works
>
> so it's either a guest driver smp problem, or an ioapic problem.
>
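For reference, the guest-side weight change amounts to something like the
following (a sketch only; it assumes the RHEL4-era 8139cp driver still
registers its poll routine the old pre-2.6.24 NAPI way via dev->poll and
dev->weight, and the stock value is from memory):

    /* in the 8139cp probe path, where the net_device is set up */
    dev->poll   = cp_rx_poll;
    dev->weight = 64;    /* the stock driver uses a much smaller value;
                          * raising it lets cp_rx_poll() drain more Rx
                          * descriptors per poll before NAPI reschedules */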
From: Eckersid S. <es...@gu...> - 2008-03-04 18:10:19
david ahern <daahern <at> cisco.com> writes:

> I know this issue has been discussed on this list before, but I am still
> experiencing network freezes in a guest that requires a restart to clear. When
> the network freezes in the guest I no longer see the network interrupts counter
> incrementing (i.e., the eth0 counter in /proc/interrupts in the guest). Using
> the crash utility, I verified that the interrupt is still enabled on the guest
> side and that no interrupts are pending. This suggests that the interrupts are
> not getting delivered to the VM.

I just wanted to let the developers know that I'm having similar problems
concerning interrupts, with networking dying as well.

I'm running a stress test of kvm using an EnGarde Secure Linux 1.5 guest OS.
Under a heavy network email load, the guest OS networking gets knocked out
- unable to ping, ssh, etc. I can only get things started again by going into
vncviewer and restarting the networking services from there.

CPUs:        8 x Intel(R) Xeon(R) CPU E5335 @ 2.00GHz
KVM:         52-1
Host Kernel: 2.6.25-rc2
Kernel Arch: x86_64
Guest OS:    EnGarde Secure Linux 32bit i686, 2.4.31-1.5.60

Command Line:
/usr/bin/qemu-system -hda /root/images/bwimail01.img -boot c -m 384 -smp 4
    -std-vga -net nic,vlan=0,macaddr=52:54:00:12:34:6F -net
    tap,ifname=tap1,script=/etc/qemu-ifup -vnc 192.168.1.57:1 &

Please let me know if you need any more information and if I could be of any
assistance in providing information to have this issue resolved.
From: david a. <da...@ci...> - 2008-03-05 15:04:56
Try adding the noapic option to your guest kernel. I re-ran that test on
kvm-62 and my VM was able to run under load for more than 3-1/2 days (the
network never locked up; I stopped the test to try other variations).

One side effect of the noapic option is that irq balancing is disabled -- all
interrupts are delivered via CPU 0. I ran a few tests earlier this week
without the noapic option (hence with the apic) but with irq balancing
disabled and still had the lockups. It seems to be something specific to the
apic.

david

Eckersid Silapaswang wrote:
> david ahern <daahern <at> cisco.com> writes:
>
>> I know this issue has been discussed on this list before, but I am still
>> experiencing network freezes in a guest that requires a restart to clear. When
>> the network freezes in the guest I no longer see the network interrupts counter
>> incrementing (i.e., the eth0 counter in /proc/interrupts in the guest). Using
>> the crash utility, I verified that the interrupt is still enabled on the guest
>> side and that no interrupts are pending. This suggests that the interrupts are
>> not getting delivered to the VM.
>
> I just wanted to let the developers know that I'm having similar problems
> concerning interrupts, with networking dying as well.
>
> I'm running a stress test of kvm using an EnGarde Secure Linux 1.5 guest OS.
> Under a heavy network email load, the guest OS networking gets knocked out
> - unable to ping, ssh, etc. I can only get things started again by going
> into vncviewer and restarting the networking services from there.
>
> CPUs:        8 x Intel(R) Xeon(R) CPU E5335 @ 2.00GHz
> KVM:         52-1
> Host Kernel: 2.6.25-rc2
> Kernel Arch: x86_64
> Guest OS:    EnGarde Secure Linux 32bit i686, 2.4.31-1.5.60
>
> Command Line:
> /usr/bin/qemu-system -hda /root/images/bwimail01.img -boot c -m 384 -smp 4
>     -std-vga -net nic,vlan=0,macaddr=52:54:00:12:34:6F -net
>     tap,ifname=tap1,script=/etc/qemu-ifup -vnc 192.168.1.57:1 &
>
> Please let me know if you need any more information and if I could be of any
> assistance in providing information to have this issue resolved.
>
From: Avi K. <av...@qu...> - 2008-03-05 17:17:51
david ahern wrote:
> Try adding the noapic option to your guest kernel. I re-ran that test on
> kvm-62 and my VM was able to run under load for more than 3-1/2 days (the
> network never locked up; I stopped the test to try other variations).
>
> One side effect of the noapic option is that irq balancing is disabled -- all
> interrupts are delivered via CPU 0. I ran a few tests earlier this week
> without the noapic option (hence with the apic) but with irq balancing
> disabled and still had the lockups. It seems to be something specific to the
> apic.
>

I got good results with apic and e1000. Can you try it?

May be a guest driver bug.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to
panic.
From: david a. <da...@ci...> - 2008-03-05 18:34:01
I did not have any better luck with the e1000 or pcnet nics when running
kvm-61. I'll try again with kvm-63 and get back to you.

david

Avi Kivity wrote:
> david ahern wrote:
>> Try adding the noapic option to your guest kernel. I re-ran that test on
>> kvm-62 and my VM was able to run under load for more than 3-1/2 days (the
>> network never locked up; I stopped the test to try other variations).
>>
>> One side effect of the noapic option is that irq balancing is disabled
>> -- all interrupts are delivered via CPU 0. I ran a few tests earlier this
>> week without the noapic option (hence with the apic) but with irq balancing
>> disabled and still had the lockups. It seems to be something specific to
>> the apic.
>>
>
> I got good results with apic and e1000. Can you try it?
>
> May be a guest driver bug.
>