From: david a. <da...@ci...> - 2008-02-21 18:34:14
I know this issue has been discussed on this list before, but I am still
experiencing network freezes in a guest that requires a restart to clear. When
the network freezes in the guest I no longer see the network interrupts counter
incrementing (i.e., the eth0 counter in /proc/interrupts in the guest). Using
the crash utility, I verified that the interrupt is still enabled on the guest
side and that no interrupts are pending. This suggests that the interrupts are
not getting delivered to the VM.

On the host side, I attached gdb to the qemu process. When the networking
freezes in the guest the irq_state for the rtl8139 device is:

    irq_state = {1, 0, 0, 0}

My current guess is that there is a race condition getting tripped where the
change in irq state is not getting pushed to the VM, or the race is in clearing
the interrupt. Once missed, nothing resets the irq_state to 0, so no more of
these interrupts are delivered to the VM.

From pci_set_irq() in qemu/hw/pci.c:

    change = level - pci_dev->irq_state[irq_num];
    if (!change)
        return;

I take this to mean that as long as irq_state is 1, any further requests to set
it to 1 are basically ignored, so the interrupt is not pushed to the VM. Also,
I have only seen the network interrupt in this state; the timer interrupts and
ide interrupts appear to be working fine even when the network interrupts stop.

In debugging the problem I noticed that it is easy for the guest to be
overwhelmed with packets such that the ring buffer fills in the rtl8139 device
in qemu. You see this in the fifo counter in the output of 'ethtool -S eth0' on
the guest side, and I confirmed it by modifying the qemu source for the nic to
print when packets are dropped due to no space in the ring buffer. I am seeing
as much as 100+ packets dropped per second. (The code for the existing message
"RTL8139: C+ Rx mode : descriptor XX is owned by host" was modified to print a
summary of the number of times this message was hit per second; without flow
control the number of messages was unwieldy.)

If I recompile the guest side 8139cp driver to use a higher "weight" (e.g., 48
or 64), then the number of dropped packets due to ring buffer overflow drops
dramatically (I have also run cases with the size of the ring buffer
increased). Incrementing this weight allows more packets to be pulled from the
device each time the cp_rx_poll() function is called in the guest kernel. With
the increased weight I still see network freezes, but I can usually run longer
before a freeze occurs (with the increase I can run for an hour or more under
network load, whereas without it I can trigger the freeze pretty quickly).

Host
----
Model:   PowerEdge 2950
CPUs:    two dual-core, Xeon(R) CPU 5140 @ 2.33GHz
OS:      RHEL 5.1, x86_64
kernel:  2.6.24.2 kernel, x86_64
Network: attached to gigabit network
kvm:     kvm-61

Guest
-----
VM:      2.5 GB RAM, 2 VCPUs, rtl8139 NIC
OS:      RHEL4 U4 32-bit
kernel:  U6 kernel recompiled to run at 250 HZ rather than 1000 HZ
NIC:     8139cp driver

Command line:
/usr/local/bin/qemu-system-x86_64 -localtime -m 2560 -smp 2 \
    -hda rootfs.img -hdb trace.img \
    -net nic,macaddr=00:16:3e:30:f0:32,model=rtl8139 -net tap \
    -monitor stdio -vnc :2

I am continuing to look into the irq processing on the kvm/qemu side. I'd like
to know if anyone has suggestions on what to look at. This is my first foray
into the kvm and qemu code, and it's a lot to take in all at once.

thanks,
david
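P.S. For anyone reading along, here is roughly how that check sits in
pci_set_irq() as I read qemu/hw/pci.c (paraphrased from memory; only the two
lines quoted above are exact, the rest of the function body is approximate):

    static void pci_set_irq(void *opaque, int irq_num, int level)
    {
        PCIDevice *pci_dev = (PCIDevice *)opaque;
        int change;

        /* if the requested level matches the cached irq_state, return
         * without ever telling the bus / ioapic about the assertion */
        change = level - pci_dev->irq_state[irq_num];
        if (!change)
            return;

        /* otherwise remember the new level and propagate it */
        pci_dev->irq_state[irq_num] = level;

        /* ... the bus set_irq callback is invoked here and eventually
         * reaches the (in-kernel or userspace) ioapic ... */
    }

So if irq_state[0] is stuck at 1 while the guest believes the line is clear,
every later attempt by the device model to raise the interrupt is swallowed at
this check, which matches the irq_state = {1, 0, 0, 0} I see in gdb.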
From: Avi K. <av...@qu...> - 2008-02-24 10:14:05
david ahern wrote:
> I know this issue has been discussed on this list before, but I am still
> experiencing network freezes in a guest that requires a restart to clear. When
> the network freezes in the guest I no longer see the network interrupts counter
> incrementing (i.e., the eth0 counter in /proc/interrupts in the guest). Using
> the crash utility, I verified that the interrupt is still enabled on the guest
> side and that no interrupts are pending. This suggests that the interrupts are
> not getting delivered to the VM.
>

[...]

> I am continuing to look into the irq processing on the kvm/qemu side. I'd like
> to know if anyone has suggestions on what to look at. This is my first foray
> into the kvm and qemu code, and it's a lot to take in all at once.
>

Standard procedure is to run with -no-kvm and -no-kvm-irqchip, to see if the
problem is in qemu proper, the in-kernel irq handling, or the rest of kvm.

-- 
error compiling committee.c: too many arguments to function
From: david a. <da...@ci...> - 2008-02-24 19:50:02
I've run a lot more tests:

- with the -no-kvm-irqchip option the vm eventually stops responding to
  network or console,

- with the -no-kvm option the performance is so bad I cannot get our app up
  and running, so the results are inconclusive,

- I've tried the e1000 and pcnet nic models and both showed the network
  lockup; with the ne2k_pci nic I did not see dropped packets and the network
  never locked up in 12+ hours, but system CPU time was 10% higher than when
  the rtl8139 nic was working,

- if I remove the "if (!change) return" optimization from pci_set_irq the
  rtl8139 nic worked fine for 16+ hours. I'm not recommending this as a fix,
  just confirming that the problem goes away.

- I tried adding a thread mutex to the rtl8139 device model around accesses to
  the RTL8139State data, but the network still locked up.

david

Avi Kivity wrote:
> david ahern wrote:
>> I know this issue has been discussed on this list before, but I am still
>> experiencing network freezes in a guest that requires a restart to clear. When
>> the network freezes in the guest I no longer see the network interrupts counter
>> incrementing (i.e., the eth0 counter in /proc/interrupts in the guest). Using
>> the crash utility, I verified that the interrupt is still enabled on the guest
>> side and that no interrupts are pending. This suggests that the interrupts are
>> not getting delivered to the VM.
>>
>
> [...]
>
>> I am continuing to look into the irq processing on the kvm/qemu side. I'd like
>> to know if anyone has suggestions on what to look at. This is my first foray
>> into the kvm and qemu code, and it's a lot to take in all at once.
>>
>
> Standard procedure is to run with -no-kvm and -no-kvm-irqchip, to see if
> the problem is in qemu proper, the in-kernel irq handling, or the rest
> of kvm.
>
From: Avi K. <av...@qu...> - 2008-02-25 08:57:27
david ahern wrote:
> I've run a lot more tests:
>
>
> - if I remove the "if (!change) return" optimization from pci_set_irq the
>   rtl8139 nic worked fine for 16+ hours. I'm not recommending this as a fix,
>   just confirming that the problem goes away.
>

Interesting. What can cause this to happen?

- some non-pci device shares the same irq (unlikely)

- the pci link sharing is broken. Is the eth0 irq shared? Please post
  /proc/interrupts.

- the in-kernel ioapic is buggy and needs the extra kicking the optimization
  prevents. Can be checked by re-adding the optimization to
  kvm_ioapic_set_irq() (keeping it removed in qemu). If it works, the problem
  is in userspace. If it fails, the problem is in the kernel.

Something like

    static int old_level[16];

    if (level == old_level[irq])
        return;
    old_level[irq] = level;

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to
panic.
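Spelled out, that check would go at the top of the in-kernel ioapic entry
point, roughly like this (a sketch only; the exact function signature in
kvm-61 may differ, and this is not a tested patch):

    void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
    {
        /* mirror the optimization removed from qemu's pci_set_irq():
         * ignore calls that do not actually change the line's level */
        static int old_level[16];

        if (level == old_level[irq])
            return;
        old_level[irq] = level;

        /* ... existing ioapic delivery logic continues unchanged ... */
    }

With the filter living here and removed from qemu, a surviving guest points at
userspace; a freeze points at the kernel side.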
From: david a. <da...@ci...> - 2008-02-25 16:11:34
Avi Kivity wrote:
> david ahern wrote:
>> I've run a lot more tests:
>>
>>
>> - if I remove the "if (!change) return" optimization from pci_set_irq the
>>   rtl8139 nic worked fine for 16+ hours. I'm not recommending this as a
>>   fix, just confirming that the problem goes away.
>>
>
> Interesting. What can cause this to happen?
>
> - some non-pci device shares the same irq (unlikely)
>
> - the pci link sharing is broken. Is the eth0 irq shared?

interrupt is not shared.

>
> Please post /proc/interrupts.

# cat /proc/interrupts
           CPU0       CPU1
  0:      10566      46468    IO-APIC-edge   timer
  1:          5          5    IO-APIC-edge   i8042
  8:          0          1    IO-APIC-edge   rtc
  9:          0          0   IO-APIC-level   acpi
 11:     243118       5656   IO-APIC-level   eth0
 12:        180         45    IO-APIC-edge   i8042
 14:       2021      12592    IO-APIC-edge   ide0
 15:         14         10    IO-APIC-edge   ide1
NMI:          0          0
LOC:      56947      56946
ERR:          0
MIS:         31

>
> - the in-kernel ioapic is buggy and needs the extra kicking the
>   optimization prevents. Can be checked by re-adding the optimization to
>   kvm_ioapic_set_irq() (keeping it removed in qemu). If it works, the
>   problem is in userspace. If it fails, the problem is in the kernel.
>
> Something like
>
>     static int old_level[16];
>
>     if (level == old_level[irq])
>         return;
>     old_level[irq] = level;
>

I'll give this a shot and let you know.

If you are interested, here's some more info on the -no-kvm-irqchip option:
qemu ends up spinning with 1 thread consuming 100% cpu.

Output from top (literally the top 11 lines) with 'show threads' and
individual cpu stats:

Tasks: 125 total,   2 running, 123 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  1.0%us,  0.0%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4046804k total,  4013480k used,    33324k free,    42512k buffers
Swap:  2096472k total,      120k used,  2096352k free,  1159892k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4441 root      20   0 2675m 2.5g 9808 R  100 65.0 499:34.09 qemu-system-x86
 4426 root      20   0 2675m 2.5g 9808 S    1 65.0  16:24.50 qemu-system-x86
 ...

Hooking up gdb shows it cycling with the following backtrace:

(gdb) bt
#0  0x00002ad97b5ee3e8 in do_sigtimedwait () from /lib64/libc.so.6
#1  0x00002ad97b5ee4ae in sigtimedwait () from /lib64/libc.so.6
#2  0x00000000004fb7df in kvm_eat_signal (env=0x2ade460, timeout=10)
    at /opt/kvm/kvm-61/qemu/qemu-kvm.c:156
#3  0x00000000004fb9e4 in kvm_eat_signals (env=0x2ade460, timeout=10)
    at /opt/kvm/kvm-61/qemu/qemu-kvm.c:192
#4  0x00000000004fba49 in kvm_main_loop_wait (env=0x2ade460, timeout=10)
    at /opt/kvm/kvm-61/qemu/qemu-kvm.c:211
#5  0x00000000004fc278 in kvm_main_loop_cpu (env=0x2ade460)
    at /opt/kvm/kvm-61/qemu/qemu-kvm.c:299
#6  0x000000000040ff2d in main (argc=<value optimized out>, argv=0x7fff304607b8)
    at /opt/kvm/kvm-61/qemu/vl.c:7856

I have a dump of CPUX86State *env if you want to see it.

david
From: david a. <da...@ci...> - 2008-02-25 17:06:18
david ahern wrote:
> Avi Kivity wrote:
>> - the in-kernel ioapic is buggy and needs the extra kicking the
>>   optimization prevents. Can be checked by re-adding the optimization to
>>   kvm_ioapic_set_irq() (keeping it removed in qemu). If it works, the
>>   problem is in userspace. If it fails, the problem is in the kernel.
>>
>> Something like
>>
>>     static int old_level[16];
>>
>>     if (level == old_level[irq])
>>         return;
>>     old_level[irq] = level;
>>

With the "if (!change) return;" taken out of pci_set_irq() and the above code
added to kvm_ioapic_set_irq() networking froze.

david
From: david a. <da...@ci...> - 2008-02-26 14:50:45
Usually within a few hours, sometimes within 30 minutes.

Average network load as computed by sysstat in the nightly sar files:

          rxpck/s   txpck/s    rxbyt/s    txbyt/s
eth0       975.18   1188.34   82044.06  171655.38

Interrupts come in at 1830/sec for eth0, 250/sec for the timer and 20/sec for
ide0.

Are you using a RHEL4 image?

david

Avi Kivity wrote:
> david ahern wrote:
>> Almost 7 hours and the uniprocessor case is still chugging along.
>>
>>
>
> How long does it usually take to hang?
>
> How do I go about reproducing this? apachebench (host) against httpd
> (guest) doesn't seem to trigger it.
>
From: Avi K. <av...@qu...> - 2008-02-26 15:06:56
david ahern wrote:
> Usually within a few hours, sometimes within 30 minutes.
>
> Average network load as computed by sysstat in the nightly sar files:
>
>           rxpck/s   txpck/s    rxbyt/s    txbyt/s
> eth0       975.18   1188.34   82044.06  171655.38
>
> Interrupts come in at 1830/sec for eth0, 250/sec for the timer and 20/sec
> for ide0.
>
> Are you using a RHEL4 image?
>

Nope, FC6 x86_64.

My apachebench command line is

    ab -c 150 -n 10000000 -k http://guest/

12K eth interrupts/sec, 20 ide interrupts/sec, 250Hz timer. The served file is
quite small, so many small packets.

-- 
error compiling committee.c: too many arguments to function
From: Avi K. <av...@qu...> - 2008-02-25 17:19:54
david ahern wrote:
> david ahern wrote:
>
>> Avi Kivity wrote:
>>
>>> - the in-kernel ioapic is buggy and needs the extra kicking the
>>>   optimization prevents. Can be checked by re-adding the optimization to
>>>   kvm_ioapic_set_irq() (keeping it removed in qemu). If it works, the
>>>   problem is in userspace. If it fails, the problem is in the kernel.
>>>
>>> Something like
>>>
>>>     static int old_level[16];
>>>
>>>     if (level == old_level[irq])
>>>         return;
>>>     old_level[irq] = level;
>>>
>
> With the "if (!change) return;" taken out of pci_set_irq() and the above code
> added to kvm_ioapic_set_irq() networking froze.
>

That points the finger at the kernel ioapic.

I saw from the /proc/interrupts dump that it's an smp guest. Does it freeze on
uniprocessor as well? Maybe it's bad locking in the kernel.

-- 
error compiling committee.c: too many arguments to function
From: david a. <da...@ci...> - 2008-02-26 04:57:06
Almost 7 hours and the uniprocessor case is still chugging along.

david

Avi Kivity wrote:
> david ahern wrote:
>> david ahern wrote:
>>
>>> Avi Kivity wrote:
>>>
>>>> - the in-kernel ioapic is buggy and needs the extra kicking the
>>>>   optimization prevents. Can be checked by re-adding the optimization to
>>>>   kvm_ioapic_set_irq() (keeping it removed in qemu). If it works, the
>>>>   problem is in userspace. If it fails, the problem is in the kernel.
>>>>
>>>> Something like
>>>>
>>>>     static int old_level[16];
>>>>
>>>>     if (level == old_level[irq])
>>>>         return;
>>>>     old_level[irq] = level;
>>>>
>> With the "if (!change) return;" taken out of pci_set_irq() and the above code
>> added to kvm_ioapic_set_irq() networking froze.
>>
>
> That points the finger at the kernel ioapic.
>
> I saw from the /proc/interrupts dump that it's an smp guest. Does it
> freeze on uniprocessor as well? Maybe it's bad locking in the kernel.
>
From: Avi K. <av...@qu...> - 2008-02-26 09:30:55
david ahern wrote:
> Almost 7 hours and the uniprocessor case is still chugging along.
>
>

How long does it usually take to hang?

How do I go about reproducing this? apachebench (host) against httpd (guest)
doesn't seem to trigger it.

-- 
error compiling committee.c: too many arguments to function
From: Avi K. <av...@qu...> - 2008-02-26 14:41:33
Avi Kivity wrote:
> david ahern wrote:
>> Almost 7 hours and the uniprocessor case is still chugging along.
>>
>>
>
> How long does it usually take to hang?
>
> How do I go about reproducing this? apachebench (host) against httpd
> (guest) doesn't seem to trigger it.
>

ab (on host) against httpd (on guest) reproduces within a few minutes.

uniprocessor guest:                    works
smp guest:                             fails
smp guest with one offlined cpu:       works
smp guest with httpd pinned to cpu 0:  works

so it's either a guest driver smp problem, or an ioapic problem.

-- 
error compiling committee.c: too many arguments to function
From: david a. <da...@ci...> - 2008-02-26 14:55:31
I noted in my original post that if I increase the weight parameter in the
driver, to have it pull more packets on each poll before taking a break, then
it takes longer to freeze. See the sketch below for what that change looks
like in the driver.

I have looked at a newer version of the 8139cp driver. Very few changes to the
poll function; most of them seem to be accommodating changes to the
netdevice/napi api. I'll take a closer look at it today.

david

Avi Kivity wrote:
> Avi Kivity wrote:
>> david ahern wrote:
>>> Almost 7 hours and the uniprocessor case is still chugging along.
>>>
>>>
>>
>> How long does it usually take to hang?
>>
>> How do I go about reproducing this? apachebench (host) against httpd
>> (guest) doesn't seem to trigger it.
>>
>
> ab (on host) against httpd (on guest) reproduces within a few minutes.
>
> uniprocessor guest:                    works
> smp guest:                             fails
> smp guest with one offlined cpu:       works
> smp guest with httpd pinned to cpu 0:  works
>
> so it's either a guest driver smp problem, or an ioapic problem.
>
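For reference, the guest-side weight change amounts to something like the
following (a sketch only; it assumes the RHEL4-era 8139cp driver still
registers its poll routine the old pre-2.6.24 NAPI way via dev->poll and
dev->weight, and the stock value is from memory):

    /* in the 8139cp probe path, where the net_device is set up */
    dev->poll   = cp_rx_poll;
    dev->weight = 64;    /* the stock driver uses a much smaller value;
                          * raising it lets cp_rx_poll() drain more Rx
                          * descriptors per poll before NAPI reschedules */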
From: Eckersid S. <es...@gu...> - 2008-03-04 18:10:19
david ahern <daahern <at> cisco.com> writes:

> I know this issue has been discussed on this list before, but I am still
> experiencing network freezes in a guest that requires a restart to clear. When
> the network freezes in the guest I no longer see the network interrupts counter
> incrementing (i.e., the eth0 counter in /proc/interrupts in the guest). Using
> the crash utility, I verified that the interrupt is still enabled on the guest
> side and that no interrupts are pending. This suggests that the interrupts are
> not getting delivered to the VM.

I just wanted to let the developers know that I'm having similar problems
concerning interrupts, with networking dying as well.

I'm running a stress test of kvm using an EnGarde Secure Linux 1.5 guest OS.
Under a heavy network email load, the guest OS networking gets knocked out
- unable to ping, ssh, etc. I can only get things started again by going into
vncviewer and restarting the networking services from there.

CPUs:        8 x Intel(R) Xeon(R) CPU E5335 @ 2.00GHz
KVM:         52-1
Host Kernel: 2.6.25-rc2
Kernel Arch: x86_64
Guest OS:    EnGarde Secure Linux 32bit i686, 2.4.31-1.5.60

Command Line:
/usr/bin/qemu-system -hda /root/images/bwimail01.img -boot c -m 384 -smp 4
    -std-vga -net nic,vlan=0,macaddr=52:54:00:12:34:6F -net
    tap,ifname=tap1,script=/etc/qemu-ifup -vnc 192.168.1.57:1 &

Please let me know if you need any more information and if I could be of any
assistance in providing information to have this issue resolved.
From: david a. <da...@ci...> - 2008-03-05 15:04:56
Try adding the noapic option to your guest kernel. I re-ran that test on
kvm-62 and my VM was able to run under load for more than 3-1/2 days (the
network never locked up; I stopped the test to try other variations).

One side effect of the noapic option is that irq balancing is disabled -- all
interrupts are delivered via CPU 0. I ran a few tests earlier this week
without the noapic option (hence with the apic) but with irq balancing
disabled and still had the lockups. It seems to be something specific to the
apic.

david

Eckersid Silapaswang wrote:
> david ahern <daahern <at> cisco.com> writes:
>
>> I know this issue has been discussed on this list before, but I am still
>> experiencing network freezes in a guest that requires a restart to clear. When
>> the network freezes in the guest I no longer see the network interrupts counter
>> incrementing (i.e., the eth0 counter in /proc/interrupts in the guest). Using
>> the crash utility, I verified that the interrupt is still enabled on the guest
>> side and that no interrupts are pending. This suggests that the interrupts are
>> not getting delivered to the VM.
>
> I just wanted to let the developers know that I'm having similar problems
> concerning interrupts, with networking dying as well.
>
> I'm running a stress test of kvm using an EnGarde Secure Linux 1.5 guest OS.
> Under a heavy network email load, the guest OS networking gets knocked out
> - unable to ping, ssh, etc. I can only get things started again by going
> into vncviewer and restarting the networking services from there.
>
> CPUs:        8 x Intel(R) Xeon(R) CPU E5335 @ 2.00GHz
> KVM:         52-1
> Host Kernel: 2.6.25-rc2
> Kernel Arch: x86_64
> Guest OS:    EnGarde Secure Linux 32bit i686, 2.4.31-1.5.60
>
> Command Line:
> /usr/bin/qemu-system -hda /root/images/bwimail01.img -boot c -m 384 -smp 4
>     -std-vga -net nic,vlan=0,macaddr=52:54:00:12:34:6F -net
>     tap,ifname=tap1,script=/etc/qemu-ifup -vnc 192.168.1.57:1 &
>
> Please let me know if you need any more information and if I could be of any
> assistance in providing information to have this issue resolved.
>
From: Avi K. <av...@qu...> - 2008-03-05 17:17:51
david ahern wrote:
> Try adding the noapic option to your guest kernel. I re-ran that test on
> kvm-62 and my VM was able to run under load for more than 3-1/2 days (the
> network never locked up; I stopped the test to try other variations).
>
> One side effect of the noapic option is that irq balancing is disabled -- all
> interrupts are delivered via CPU 0. I ran a few tests earlier this week
> without the noapic option (hence with the apic) but with irq balancing
> disabled and still had the lockups. It seems to be something specific to the
> apic.
>

I got good results with apic and e1000. Can you try it?

May be a guest driver bug.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to
panic.
From: david a. <da...@ci...> - 2008-03-05 18:34:01
I did not have any better luck with the e1000 or pcnet nics when running
kvm-61. I'll try again with kvm-63 and get back to you.

david

Avi Kivity wrote:
> david ahern wrote:
>> Try adding the noapic option to your guest kernel. I re-ran that test on
>> kvm-62 and my VM was able to run under load for more than 3-1/2 days (the
>> network never locked up; I stopped the test to try other variations).
>>
>> One side effect of the noapic option is that irq balancing is disabled
>> -- all interrupts are delivered via CPU 0. I ran a few tests earlier this
>> week without the noapic option (hence with the apic) but with irq balancing
>> disabled and still had the lockups. It seems to be something specific to
>> the apic.
>>
>
> I got good results with apic and e1000. Can you try it?
>
> May be a guest driver bug.
>