#378 82579LM e1000e driver hardware hang

open
None
standalone_driver
1
3 days ago
2013-07-03
No

After installing Debian kernel 3.9.6 I get these messages and the card sometimes freezes:

Jul 2 16:39:01 zlatograd-1 snmpd[2870]: Connection from UDP: [127.0.0.1]:37457->[127.0.0.1]
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] TDH <f4e>
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] TDT <fb0>
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] next_to_use <fb0>
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] next_to_clean <f4e>
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] buffer_info[next_to_clean]:
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] time_stamp <100002f16>
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] next_to_watch <f4e>
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] jiffies <1000031a1>
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] next_to_watch.status <0>
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] MAC Status <40080083>
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] PHY Status <796d>
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] PHY 1000BASE-T Status <3800>
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] PHY Extended Status <3000>
Jul 2 16:39:42 zlatograd-1 kernel: [ 350.963514] PCI Status <10>
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] TDH <f4e>
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] TDT <fb0>
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] next_to_use <fb0>
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] next_to_clean <f4e>
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] buffer_info[next_to_clean]:
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] time_stamp <100002f16>
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] next_to_watch <f4e>
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] jiffies <100003395>
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] next_to_watch.status <0>
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] MAC Status <40080083>
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] PHY Status <796d>
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] PHY 1000BASE-T Status <3800>
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] PHY Extended Status <3000>
Jul 2 16:39:44 zlatograd-1 kernel: [ 352.963071] PCI Status <10>
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] TDH <f4e>
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] TDT <fb0>
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] next_to_use <fb0>
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] next_to_clean <f4e>
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] buffer_info[next_to_clean]:
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] time_stamp <100002f16>
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] next_to_watch <f4e>
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] jiffies <100003589>
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] next_to_watch.status <0>
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] MAC Status <40080083>
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] PHY Status <796d>
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] PHY 1000BASE-T Status <3800>
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] PHY Extended Status <3000>
Jul 2 16:39:46 zlatograd-1 kernel: [ 354.962701] PCI Status <10>
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965725] ------------[ cut here ]------------

Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965737] WARNING: at /build/linux-eHaf39/linux-3.9.6/net/sched/sch_generic.c:255 dev_watchdog+0xde/0x14b()
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965740] Hardware name: X9SCL/X9SCM
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965742] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965744] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_DSCP iptable_mangle xt_tcpudp ipt_REJECT iptable_filter ip_tables x_tables 8021q garp stp llc loop iTCO_wdt iTCO_vendor_support lpc_ich snd_pcm mfd_core snd_page_alloc snd_timer snd acpi_cpufreq mperf joydev i2c_i801 coretemp soundcore i2c_core kvm_intel kvm psmouse processor serio_raw evdev pcspkr crc32c_intel ghash_clmulni_intel button thermal_sys cryptd ext4 crc16 jbd2 mbcache dm_mod raid1 md_mod sg sd_mod crc_t10dif hid_generic usbhid hid ahci libahci ehci_pci ehci_hcd libata microcode scsi_mod usbcore usb_common e1000e ptp pps_core
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965803] Pid: 0, comm: swapper/0 Not tainted 3.9-1-amd64 #1 Debian 3.9.6-1
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965805] Call Trace:
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965807] <IRQ> [<ffffffff8103d2b8>] ? warn_slowpath_common+0x76/0x8c
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965818] [<ffffffff812dfcfe>] ? netif_tx_lock+0x7a/0x7a
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965821] [<ffffffff8103d367>] ? warn_slowpath_fmt+0x47/0x49
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965825] [<ffffffff812dfceb>] ? netif_tx_lock+0x67/0x7a
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965831] [<ffffffff812dfddc>] ? dev_watchdog+0xde/0x14b
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965836] [<ffffffff8104859f>] ? call_timer_fn+0x4b/0xf6
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965840] [<ffffffff812dfcfe>] ? netif_tx_lock+0x7a/0x7a
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965845] [<ffffffff81049a3e>] ? run_timer_softirq+0x18d/0x1d6
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965848] [<ffffffff81043b0e>] ? __do_softirq+0xea/0x205
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965852] [<ffffffff81043cf3>] ? irq_exit+0x3e/0x80
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965857] [<ffffffff81028678>] ? smp_apic_timer_interrupt+0x72/0x7e
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965862] [<ffffffff8138cd9d>] ? apic_timer_interrupt+0x6d/0x80
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965863] <EOI> [<ffffffff812a23c0>] ? arch_local_irq_enable+0x4/0x8
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965873] [<ffffffff812a2a44>] ? cpuidle_wrap_enter+0x47/0x7f
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965878] [<ffffffff812a277e>] ? cpuidle_enter_state+0xa/0x2f
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965882] [<ffffffff812a284c>] ? cpuidle_idle_call+0xa9/0xfb
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965887] [<ffffffff81014b0d>] ? cpu_idle+0x96/0xe0
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965891] [<ffffffff816afd25>] ? start_kernel+0x3d0/0x3db
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965895] [<ffffffff816af777>] ? repair_env_string+0x54/0x54
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965899] [<ffffffff816af598>] ? x86_64_start_kernel+0xf2/0xfd
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965901] ---[ end trace a211a6a196e912a9 ]---
Jul 2 16:39:47 zlatograd-1 kernel: [ 355.965918] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
Jul 2 16:39:51 zlatograd-1 kernel: [ 359.803514] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
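
The register dumps above can be read arithmetically. The following is a minimal Python sketch, not part of the driver, that takes the TDH, TDT, time_stamp and jiffies values from the first two dumps and derives how many transmit descriptors are still outstanding and how long the oldest one has been waiting. The ring size of 4096 and the helper names are assumptions made for illustration; the tick rate is estimated from the two dumps rather than read from the kernel configuration.

```python
import re

# Matches the hex fields of interest in a "Detected Hardware Unit Hang" dump.
FIELD_RE = re.compile(r"\b(TDH|TDT|time_stamp|jiffies)\s+<([0-9a-f]+)>")
# Matches the kernel uptime stamp, e.g. "[ 350.963514]".
UPTIME_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def parse_hang_dump(lines):
    """Return the uptime and the selected hex fields of one dump as numbers."""
    fields = {}
    for line in lines:
        uptime = UPTIME_RE.search(line)
        if uptime:
            fields["uptime"] = float(uptime.group(1))
        match = FIELD_RE.search(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def pending_descriptors(tdh, tdt, ring_size):
    """Descriptors handed to the hardware (tail) but not yet consumed (head)."""
    return (tdt - tdh) % ring_size


# Values copied from the first and second dump in the log above.
first = parse_hang_dump([
    "kernel: [ 350.963514] TDH <f4e>",
    "kernel: [ 350.963514] TDT <fb0>",
    "kernel: [ 350.963514] time_stamp <100002f16>",
    "kernel: [ 350.963514] jiffies <1000031a1>",
])
second = parse_hang_dump([
    "kernel: [ 352.963071] jiffies <100003395>",
])

# Assumption: the TX ring holds 4096 descriptors (TDH is too large for less).
pending = pending_descriptors(first["TDH"], first["TDT"], ring_size=4096)

# Estimate ticks per second from two dumps taken about two seconds apart.
hz_estimate = round((second["jiffies"] - first["jiffies"])
                    / (second["uptime"] - first["uptime"]))

stalled_jiffies = first["jiffies"] - first["time_stamp"]
stalled_seconds = stalled_jiffies / hz_estimate

print(f"outstanding TX descriptors: {pending}")
print(f"estimated HZ: {hz_estimate}")
print(f"oldest descriptor waiting: {stalled_jiffies} jiffies "
      f"(~{stalled_seconds:.1f} s)")
```

With the values above this gives 98 outstanding descriptors and a wait of 651 jiffies, roughly 2.6 s at the estimated 250 ticks per second. TDH and time_stamp stay the same across all three dumps while jiffies advances, which suggests the hardware head pointer is not moving.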

00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 05)
Subsystem: Super Micro Computer Inc Device 1502
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 45
Region 0: Memory at fbb00000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at fbb24000 (32-bit, non-prefetchable) [size=4K]
Region 2: I/O ports at f020 [size=32]
Capabilities: [c8] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee003d8 Data: 0000
Capabilities: [e0] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-
Kernel driver in use: e1000e

Discussion

  • Todd Fujinaka
    2013-12-03

    • assigned_to: dertman
     
  • Andre Tomt
    2014-01-11

    I am also seeing these problems with this adapter, both on the board in this bug report and on the Intel DQ77KB (BIOS KBQ7710H.86A.0052.2013.0708.1336 07/08/2013).

    Kernel is 3.12.7 in my case. It might be easier to reproduce if you are forwarding packets from one adapter to another (routing) - at least that is what it looks like here. With some traffic patterns this happens 1-2 times a minute, which gets REALLY annoying.

    Disabling TSO, GSO and GRO using ethtool (ethtool -K eth0 gso off gro off tso off) works around it for me, but with somewhat reduced performance.
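
A minimal Python sketch of scripting the workaround above, for example from a boot-time hook. It only wraps the ethtool -K invocation quoted in the comment; the function names and the default interface name are illustrative assumptions, and the setting has to be reapplied after a reboot.

```python
import subprocess


def offload_off_command(iface="eth0", features=("gso", "gro", "tso")):
    """Build the ethtool argument list that switches the given offloads off."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


def disable_offloads(iface="eth0"):
    """Run the command; needs root and the ethtool binary on the host."""
    subprocess.run(offload_off_command(iface), check=True)


if __name__ == "__main__":
    # Print instead of executing, so the sketch is safe to run anywhere.
    print(" ".join(offload_off_command("eth0")))
```

Calling disable_offloads("eth0") would execute the same command that the comment gives in parentheses. Passing features=("tso",) to offload_off_command limits the change to TSO only.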

     
  • Oliver Wagner
    2014-01-15

    I seem to be having the same issue on a machine with a Supermicro X9SCM-F board.

    The box is running Ubuntu 12.04.4. The e1000e driver is a manually compiled 2.5.4.

    Interestingly, the effect started only recently, when I upgraded from Ubuntu's kernel 3.5 to 3.8.0-35. It did not happen before with either the 3.5 or the 3.2 kernel.

    The frequency of the error is about 1-2 times a day.

    I have a second machine with the same board and kernel/driver combo which does not exhibit the behavior. One notable difference is that I've raised the ring buffer size on that box with ethtool -G eth0 rx 2048. The other is a different usage pattern:

    • Box with symptoms: Router/Firewall, packet forwarding between different VLANs on eth0 and eth1
    • Box without symptoms: Fileserver, eth0/eth1 bonded (VLANs used, but no forwarding)
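
One way to compare the two boxes is to read the ring sizes back with ethtool -g. The sketch below parses that output in Python; the sample text is an illustrative placeholder, not output captured from either machine, and the function name is hypothetical.

```python
def parse_ring_settings(text):
    """Split `ethtool -g` output into 'max' and 'current' ring sizes."""
    result = {"max": {}, "current": {}}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Pre-set maximums"):
            section = "max"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section and ":" in line:
            name, _, value = line.partition(":")
            value = value.strip()
            if value.isdigit():  # skip non-numeric entries such as "n/a"
                result[section][name.strip()] = int(value)
    return result


# Illustrative sample only: the values are placeholders, not from the thread.
SAMPLE = """\
Ring parameters for eth0:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             256
RX Mini:        0
RX Jumbo:       0
TX:             256
"""

rings = parse_ring_settings(SAMPLE)
headroom = rings["max"]["RX"] - rings["current"]["RX"]
print(f"RX ring: {rings['current']['RX']} of {rings['max']['RX']} entries")
print(f"TX ring: {rings['current']['TX']} of {rings['max']['TX']} entries")
```

Note that the command quoted above raises only the receive ring, while the hang dumps report the transmit head and tail (TDH/TDT), so the ring size difference may be incidental.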

    [195531.978835] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
    [195531.978835] TDH <18>
    [195531.978835] TDT <4a>
    [195531.978835] next_to_use <4a>
    [195531.978835] next_to_clean <16>
    [195531.978835] buffer_info[next_to_clean]:
    [195531.978835] time_stamp <102e9209a>
    [195531.978835] next_to_watch <18>
    [195531.978835] jiffies <102e921bb>
    [195531.978835] next_to_watch.status <0>
    [195531.978835] MAC Status <40080083>
    [195531.978835] PHY Status <796d>
    [195531.978835] PHY 1000BASE-T Status <38ff>
    [195531.978835] PHY Extended Status <3000>
    [195531.978835] PCI Status <10>
    [195533.977786] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
    [195533.977786] TDH <18>
    [195533.977786] TDT <4a>
    [195533.977786] next_to_use <4a>
    [195533.977786] next_to_clean <16>
    [195533.977786] buffer_info[next_to_clean]:
    [195533.977786] time_stamp <102e9209a>
    [195533.977786] next_to_watch <18>
    [195533.977786] jiffies <102e923af>
    [195533.977786] next_to_watch.status <0>
    [195533.977786] MAC Status <40080083>
    [195533.977786] PHY Status <796d>
    [195533.977786] PHY 1000BASE-T Status <3800>
    [195533.977786] PHY Extended Status <3000>
    [195533.977786] PCI Status <10>
    [195535.976760] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
    [195535.976760] TDH <18>
    [195535.976760] TDT <4a>
    [195535.976760] next_to_use <4a>
    [195535.976760] next_to_clean <16>
    [195535.976760] buffer_info[next_to_clean]:
    [195535.976760] time_stamp <102e9209a>
    [195535.976760] next_to_watch <18>
    [195535.976760] jiffies <102e925a3>
    [195535.976760] next_to_watch.status <0>
    [195535.976760] MAC Status <40080083>
    [195535.976760] PHY Status <796d>
    [195535.976760] PHY 1000BASE-T Status <3800>
    [195535.976760] PHY Extended Status <3000>
    [195535.976760] PCI Status <10>
    [195537.975811] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
    [195537.975811] TDH <18>
    [195537.975811] TDT <4a>
    [195537.975811] next_to_use <4a>
    [195537.975811] next_to_clean <16>
    [195537.975811] buffer_info[next_to_clean]:
    [195537.975811] time_stamp <102e9209a>
    [195537.975811] next_to_watch <18>
    [195537.975811] jiffies <102e92797>
    [195537.975811] next_to_watch.status <0>
    [195537.975811] MAC Status <40080083>
    [195537.975811] PHY Status <796d>
    [195537.975811] PHY 1000BASE-T Status <3800>
    [195537.975811] PHY Extended Status <3000>
    [195537.975811] PCI Status <10>
    [195538.986544] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
    [195542.786685] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

    uname -a:
    Linux gateway1 3.8.0-35-generic #50~precise1-Ubuntu SMP Wed Dec 4 17:25:51 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

    lspci -vv:
    00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 05)
    Subsystem: Super Micro Computer Inc Device 1502
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 40
    Region 0: Memory at dfa00000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at dfa25000 (32-bit, non-prefetchable) [size=4K]
    Region 2: I/O ports at f020 [size=32]
    Capabilities: [c8] Power Management version 2
    Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Address: 00000000fee0100c Data: 41e1
    Capabilities: [e0] PCI Advanced Features
    AFCap: TP+ FLR+
    AFCtrl: FLR-
    AFStatus: TP-
    Kernel driver in use: e1000e
    Kernel modules: e1000e

    ethtool -i eth0:
    driver: e1000e
    version: 2.5.4-NAPI
    firmware-version: 0.13-4
    bus-info: 0000:00:19.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: yes

     
  • Oliver Wagner
    2014-02-25

    Problem still happens with

    [ 0.781704] e1000e: Intel(R) PRO/1000 Network Driver - 3.0.4-NAPI

    on

    Linux gateway1 3.8.0-36-generic #52~precise1-Ubuntu SMP Mon Feb 3 21:54:46 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

    Disabling tso (ethtool -K eth0 tso off) works as a workaround.

    [142654.819785] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
    [142654.819785]   TDH                  <14>
    [142654.819785]   TDT                  <28>
    [142654.819785]   next_to_use          <28>
    [142654.819785]   next_to_clean        <14>
    [142654.819785] buffer_info[next_to_clean]:
    [142654.819785]   time_stamp           <1021f4b4f>
    [142654.819785]   next_to_watch        <14>
    [142654.819785]   jiffies              <1021f51ab>
    [142654.819785]   next_to_watch.status <0>
    [142654.819785] MAC Status             <40080083>
    [142654.819785] PHY Status             <796d>
    [142654.819785] PHY 1000BASE-T Status  <3800>
    [142654.819785] PHY Extended Status    <3000>
    [142654.819785] PCI Status             <10>
    [142656.818831] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
    [142656.818831]   TDH                  <14>
    [142656.818831]   TDT                  <28>
    [142656.818831]   next_to_use          <28>
    [142656.818831]   next_to_clean        <14>
    [142656.818831] buffer_info[next_to_clean]:
    [142656.818831]   time_stamp           <1021f4b4f>
    [142656.818831]   next_to_watch        <14>
    [142656.818831]   jiffies              <1021f539f>
    [142656.818831]   next_to_watch.status <0>
    [142656.818831] MAC Status             <40080083>
    [142656.818831] PHY Status             <796d>
    [142656.818831] PHY 1000BASE-T Status  <3800>
    [142656.818831] PHY Extended Status    <3000>
    [142656.818831] PCI Status             <10>
    [142656.884023] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
    [142660.781786] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
    
     
    Last edit: Oliver Wagner 2014-02-25
  • Oliver Wagner
    2014-05-26

    Problem still happens with

    e1000e: Intel(R) PRO/1000 Network Driver - 3.0.4.1-NAPI

    on

    Linux gateway1 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

    (that is: upgrading from Ubuntu 12.04 to 14.04, with the current driver manually compiled)

    [  370.708888] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
    [  370.708888]   TDH                  <8a>
    [  370.708888]   TDT                  <b1>
    [  370.708888]   next_to_use          <b1>
    [  370.708888]   next_to_clean        <86>
    [  370.708888] buffer_info[next_to_clean]:
    [  370.708888]   time_stamp           <100004309>
    [  370.708888]   next_to_watch        <8a>
    [  370.708888]   jiffies              <100004523>
    [  370.708888]   next_to_watch.status <0>
    [  370.708888] MAC Status             <40080083>
    [  370.708888] PHY Status             <796d>
    [  370.708888] PHY 1000BASE-T Status  <38ff>
    [  370.708888] PHY Extended Status    <3000>
    [  370.708888] PCI Status             <10>
    [  372.707753] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
    [  372.707753]   TDH                  <8a>
    [  372.707753]   TDT                  <b1>
    [  372.707753]   next_to_use          <b1>
    [  372.707753]   next_to_clean        <86>
    [  372.707753] buffer_info[next_to_clean]:
    [  372.707753]   time_stamp           <100004309>
    [  372.707753]   next_to_watch        <8a>
    [  372.707753]   jiffies              <100004717>
    [  372.707753]   next_to_watch.status <0>
    [  372.707753] MAC Status             <40080083>
    [  372.707753] PHY Status             <796d>
    [  372.707753] PHY 1000BASE-T Status  <3800>
    [  372.707753] PHY Extended Status    <3000>
    [  372.707753] PCI Status             <10>
    [  373.718453] ------------[ cut here ]------------
    [  373.718467] WARNING: CPU: 2 PID: 0 at /build/buildd/linux-3.13.0/net/sched/sch_generic.c:264 dev_watchdog+0x276/0x280()
    [  373.718470] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
    [  373.718472] Modules linked in: nf_conntrack_netlink nfnetlink cls_u32 sch_sfq sch_htb xt_LOG ipt_REJECT xt_conntrack xt_nat ipt_MASQUERADE xt_dscp xt_tcpudp xt_mark xt_connmark iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_mangle iptable_filter ip_tables x_tables nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack cuse rpcsec_gss_krb5 nfsv4 snd_usb_audio snd_usbmidi_lib 8021q garp stp mrp snd_hwdep llc snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel snd cp210x aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd usbserial joydev soundcore serio_raw cdc_acm binfmt_misc video mac_hid ipmi_si ipmi_devintf lp parport lpc_ich nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache hid_generic psmouse ahci libahci usbhid hid e1000e(OF)
    [  373.718590] CPU: 2 PID: 0 Comm: swapper/2 Tainted: GF          O 3.13.0-24-generic #47-Ubuntu
    [  373.718593] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0b 09/17/2012
    [  373.718595]  0000000000000009 ffff88022fd03d98 ffffffff81715ac4 ffff88022fd03de0
    [  373.718602]  ffff88022fd03dd0 ffffffff810676bd 0000000000000000 ffff8802203d0000
    [  373.718607]  ffff8802201ff680 0000000000000001 0000000000000002 ffff88022fd03e30
    [  373.718612] Call Trace:
    [  373.718615]  <IRQ>  [<ffffffff81715ac4>] dump_stack+0x45/0x56
    [  373.718629]  [<ffffffff810676bd>] warn_slowpath_common+0x7d/0xa0
    [  373.718633]  [<ffffffff8106772c>] warn_slowpath_fmt+0x4c/0x50
    [  373.718643]  [<ffffffff8163aed6>] dev_watchdog+0x276/0x280
    [  373.718648]  [<ffffffff8163ac60>] ? dev_graft_qdisc+0x80/0x80
    [  373.718654]  [<ffffffff81074226>] call_timer_fn+0x36/0x100
    [  373.718659]  [<ffffffff8163ac60>] ? dev_graft_qdisc+0x80/0x80
    [  373.718663]  [<ffffffff810751bf>] run_timer_softirq+0x1ef/0x2f0
    [  373.718670]  [<ffffffff8106caec>] __do_softirq+0xec/0x2c0
    [  373.718675]  [<ffffffff8106d035>] irq_exit+0x105/0x110
    [  373.718682]  [<ffffffff81728885>] smp_apic_timer_interrupt+0x45/0x60
    [  373.718688]  [<ffffffff8172721d>] apic_timer_interrupt+0x6d/0x80
    [  373.718690]  <EOI>  [<ffffffff815c95e2>] ? cpuidle_enter_state+0x52/0xc0
    [  373.718702]  [<ffffffff815c9709>] cpuidle_idle_call+0xb9/0x1f0
    [  373.718710]  [<ffffffff8101ceae>] arch_cpu_idle+0xe/0x30
    [  373.718716]  [<ffffffff810beb85>] cpu_startup_entry+0xc5/0x290
    [  373.718723]  [<ffffffff81040fc8>] start_secondary+0x218/0x2c0
    [  373.718726] ---[ end trace 86a85b89ac7d527c ]---
    [  373.718750] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
    [  377.561691] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
    
     
  • When this card is added to a bond with VLANs, it hangs. In a bond without VLANs the card works but flaps under high load. Disabling TSO, GSO ... does not help.

    root@r5:~# uname -a
    Linux r5 3.13.11-1-amd64-vyos #1 SMP Tue Aug 5 16:35:03 UTC 2014 x86_64 GNU/Linux

    [ 2030.600895] ------------[ cut here ]------------
    [ 2030.600907] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0xf2/0x14f()
    [ 2030.600911] NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
    [ 2030.600914] Modules linked in: 8021q mrp garp stp llc bonding xt_set ipt_REJECT xt_comment ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set iptable_nat nf_nat_ipv4 ip6table_filter ip6table_raw ip6_tables iptable_filter nf_conntrack_ipv4 nf_defrag_ipv4 xt_CT nfnetlink_cthelper nfnetlink iptable_raw nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_nat_sip nf_conntrack_sip nf_nat_proto_gre nf_nat_tftp nf_nat_ftp nf_nat nf_conntrack_tftp nf_conntrack_ftp nf_conntrack ipv6 cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand cpufreq_conservative fuse crc32c_intel coretemp lpc_ich evdev pcspkr ipmi_si mfd_core hid_generic i7core_edac microcode ipmi_msghandler i2c_i801 edac_core processor button thermal_sys battery usb_storage ohci_hcd squashfs loop overlayfs ext4 jbd2 crc16 raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod usbhid hid pata_acpi ata_generic ata_piix e1000e igb dca i2c_algo_bit i2c_core ptp pps_core
    [ 2030.601006] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.13.11-1-amd64-vyos #1
    [ 2030.601007] Hardware name: Intel Corporation S3420GP/S3420GP, BIOS S3420GP.86B.01.00.0052.051620141338 05/16/2014
    [ 2030.601009] 0000000000000000 ffffffff815493be ffffffff813e74de ffff88015fcc3de8
    [ 2030.601011] ffffffff8104a700 ffff88015aab5400 ffffffff8136f136 000001d8dea72df7
    [ 2030.601013] ffff8800372dc000 ffffffff8136f044 ffff8800372dc388 ffff88015ab3d828
    [ 2030.601016] Call Trace:
    [ 2030.601017] <IRQ> [<ffffffff813e74de>] ? dump_stack+0x41/0x53
    [ 2030.601024] [<ffffffff8104a700>] ? warn_slowpath_common+0x79/0x92
    [ 2030.601026] [<ffffffff8136f136>] ? dev_watchdog+0xf2/0x14f
    [ 2030.601029] [<ffffffff8136f044>] ? netif_tx_unlock+0x50/0x50
    [ 2030.601031] [<ffffffff8104a7b7>] ? warn_slowpath_fmt+0x45/0x4a
    [ 2030.601034] [<ffffffff813ea3e2>] ? _raw_spin_unlock+0x5/0x6
    [ 2030.601036] [<ffffffff8136efe1>] ? netif_tx_lock+0x67/0x7a
    [ 2030.601040] [<ffffffff8136f136>] ? dev_watchdog+0xf2/0x14f
    [ 2030.601043] [<ffffffff81054f34>] ? call_timer_fn+0x50/0xd2
    [ 2030.601045] [<ffffffff8105514d>] ? run_timer_softirq+0x197/0x1e2
    [ 2030.601047] [<ffffffff8136f044>] ? netif_tx_unlock+0x50/0x50
    [ 2030.601049] [<ffffffff8104e637>] ? __do_softirq+0x100/0x244
    [ 2030.601052] [<ffffffff8104e800>] ? irq_exit+0x40/0x9d
    [ 2030.601055] [<ffffffff8100fad3>] ? do_IRQ+0x94/0xaa
    [ 2030.601057] [<ffffffff813ea7ed>] ? common_interrupt+0x6d/0x6d
    [ 2030.601058] <EOI> [<ffffffff8133702d>] ? cpuidle_enter_state+0x3d/0xa8
    [ 2030.601063] [<ffffffff81337026>] ? cpuidle_enter_state+0x36/0xa8
    [ 2030.601065] [<ffffffff813371b5>] ? cpuidle_idle_call+0xd6/0x129
    [ 2030.601067] [<ffffffff81015fc4>] ? arch_cpu_idle+0x9/0x20
    [ 2030.601071] [<ffffffff81083b80>] ? cpu_startup_entry+0xe5/0x159
    [ 2030.601074] [<ffffffff81032761>] ? start_secondary+0x211/0x216
    [ 2030.601076] ---[ end trace bcf7618d8ca7f5f6 ]---

     
    Last edit: Adamanov Vyacheslav 2014-09-03
  • Interface in a bond without VLANs. The same symptoms appear on 3 routers (S3420GPLX, S3420GPV). The link flaps periodically, at a lower frequency:

    [14653.698613] e1000e 0000:00:19.0 eth1: Detected Hardware Unit Hang:
    [14653.698613] TDH <75d>
    [14653.698613] TDT <79e>
    [14653.698613] next_to_use <79e>
    [14653.698613] next_to_clean <75d>
    [14653.698613] buffer_info[next_to_clean]:
    [14653.698613] time_stamp <10015e5f2>
    [14653.698613] next_to_watch <75d>
    [14653.698613] jiffies <10015e82f>
    [14653.698613] next_to_watch.status <0>
    [14653.698613] MAC Status <40080083>
    [14653.698613] PHY Status <796d>
    [14653.698613] PHY 1000BASE-T Status <3800>
    [14653.698613] PHY Extended Status <2000>
    [14653.698613] PCI Status <10>
    [14654.450711] e1000e 0000:00:19.0 eth1: Reset adapter unexpectedly
    [14654.747871] bonding: bond0: link status definitely down for interface eth1, disabling it
    [14658.118322] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
    [14658.247071] bonding: bond0: link status definitely up for interface eth1, 1000 Mbps full duplex.
    [15274.556288] e1000e 0000:00:19.0 eth1: Detected Hardware Unit Hang:
    [15274.556288] TDH <73>
    [15274.556288] TDT <10b>
    [15274.556288] next_to_use <10b>
    [15274.556288] next_to_clean <72>
    [15274.556288] buffer_info[next_to_clean]:
    [15274.556288] time_stamp <10016da0b>
    [15274.556288] next_to_watch <74>
    [15274.556288] jiffies <10016dac3>
    [15274.556288] next_to_watch.status <0>
    [15274.556288] MAC Status <40080083>
    [15274.556288] PHY Status <796d>
    [15274.556288] PHY 1000BASE-T Status <3800>
    [15274.556288] PHY Extended Status <2000>
    [15274.556288] PCI Status <10>

    Board with the latest BIOS version:
    DMI: Intel Corporation S3420GP/S3420GP, BIOS S3420GP.86B.01.00.0052.051620141338 05/16/2014

    driver: e1000e
    version: 2.3.2-k
    firmware-version: 0.10-2
    bus-info: 0000:00:19.0

    00:19.0 Ethernet controller: Intel Corporation 82578DM Gigabit Network Connection (rev 05)
    Subsystem: Intel Corporation Device 34ec
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 62
    Region 0: Memory at b3200000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at b3224000 (32-bit, non-prefetchable) [size=4K]
    Region 2: I/O ports at 4020 [size=32]
    Capabilities: [c8] Power Management version 2
    Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Address: 00000000fee0f00c Data: 4179
    Capabilities: [e0] PCI Advanced Features
    AFCap: TP+ FLR+
    AFCtrl: FLR-
    AFStatus: TP-
    Kernel driver in use: e1000e

    root@r5:~# ethtool -d eth1
    MAC Registers


    0x00000: CTRL (Device control register) 0x00100240
    Endian mode (buffers): little
    Link reset: normal
    Set link up: 1
    Invert Loss-Of-Signal: no
    Receive flow control: disabled
    Transmit flow control: disabled
    VLAN mode: disabled
    Auto speed detect: disabled
    Speed select: 1000Mb/s
    Force speed: no
    Force duplex: no
    0x00008: STATUS (Device status register) 0x40080083
    Duplex: full
    Link up: link config
    TBI mode: disabled
    Link speed: 1000Mb/s
    Bus type: PCI
    Bus speed: 33MHz
    Bus width: 32-bit
    0x00100: RCTL (Receive control register) 0x00000000
    Receiver: disabled
    Store bad packets: disabled
    Unicast promiscuous: disabled
    Multicast promiscuous: disabled
    Long packet: disabled
    Descriptor minimum threshold size: 1/2
    Broadcast accept mode: ignore
    VLAN filter: disabled
    Canonical form indicator: disabled
    Discard pause frames: filtered
    Pass MAC control frames: don't pass
    Receive buffer size: 2048
    0x02808: RDLEN (Receive desc length) 0x00000000
    0x02810: RDH (Receive desc head) 0x00000000
    0x02818: RDT (Receive desc tail) 0x00000000
    0x02820: RDTR (Receive delay timer) 0x00000000
    0x00400: TCTL (Transmit ctrl register) 0x3003F0F8
    Transmitter: disabled
    Pad short packets: enabled
    Software XOFF Transmission: disabled
    Re-transmit on late collision: disabled
    0x03808: TDLEN (Transmit desc length) 0x00000000
    0x03810: TDH (Transmit desc head) 0x00000000
    0x03818: TDT (Transmit desc tail) 0x00000000
    0x03820: TIDV (Transmit delay timer) 0x00000000
    PHY type: unknown

     
  • NIC statistics:

    root@r3:~# ethtool -S eth1
    NIC statistics:
    rx_packets: 39062874
    tx_packets: 22426122
    rx_bytes: 41520470153
    tx_bytes: 11845821596
    rx_broadcast: 1134
    tx_broadcast: 0
    rx_multicast: 1504
    tx_multicast: 2131
    rx_errors: 0
    tx_errors: 0
    tx_dropped: 0
    multicast: 1504
    collisions: 0
    rx_length_errors: 0
    rx_over_errors: 0
    rx_crc_errors: 0
    rx_frame_errors: 0
    rx_no_buffer_count: 40
    rx_missed_errors: 93857
    tx_aborted_errors: 0
    tx_carrier_errors: 0
    tx_fifo_errors: 0
    tx_heartbeat_errors: 0
    tx_window_errors: 0
    tx_abort_late_coll: 0
    tx_deferred_ok: 0
    tx_single_coll_ok: 0
    tx_multi_coll_ok: 0
    tx_timeout_count: 17
    tx_restart_queue: 1
    rx_long_length_errors: 0
    rx_short_length_errors: 0
    rx_align_errors: 0
    tx_tcp_seg_good: 85446
    tx_tcp_seg_failed: 0
    rx_flow_control_xon: 0
    rx_flow_control_xoff: 0
    tx_flow_control_xon: 0
    tx_flow_control_xoff: 0
    rx_csum_offload_good: 38828383
    rx_csum_offload_errors: 1858
    rx_header_split: 0
    alloc_rx_buff_failed: 0
    tx_smbus: 0
    rx_smbus: 0
    dropped_smbus: 0
    rx_dma_failed: 0
    tx_dma_failed: 0
    rx_hwtstamp_cleared: 0
    uncorr_ecc_errors: 0
    corr_ecc_errors: 0

     
    Last edit: Adamanov Vyacheslav 2014-09-03
  • Todd Fujinaka
    2015-05-12

    • assigned_to: dertman --> Yanir Lubetkin
     
  • Oliver Wagner
    2015-06-25

    Problem still happens with

    e1000e: Intel(R) PRO/1000 Network Driver - 3.2.4.2-NAPI

    on

    Linux gateway1 3.13.0-55-generic #94-Ubuntu SMP Thu Jun 18 00:27:10 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

     
  • Todd Fujinaka
    2015-08-20

    • assigned_to: Yanir Lubetkin --> Raanan Avargil
     
  • Gftstf
    2015-09-27

    Problem persists with version 3.2.4.2-NAPI (manually compiled against Ubuntu 14.04 LTS vivid kernel). My setup is 82579LM (em1), which is NAT-ed to a dual port 82571EB adapter (p2p1, p2p2, bridged on br0). During heavy file transfer operations the adapters hang (syslog snippet attached).

    root@hostname:~# ethtool -i em1
    driver: e1000e
    version: 3.2.4.2-NAPI
    firmware-version: 0.13-4
    bus-info: 0000:00:19.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: yes
    supports-priv-flags: no
    
    root@hostname:~# ethtool -i p2p1
    driver: e1000e
    version: 3.2.4.2-NAPI
    firmware-version: 5.11-2
    bus-info: 0000:02:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: yes
    supports-priv-flags: no
    
    root@hostname:~# uname -a
    Linux hostname 3.19.0-28-generic #30~14.04.1-Ubuntu SMP Tue Sep 1 09:32:55 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    

    Disabling TSO made the hangups disappear.

     
  • Oliver Wagner
    3 days ago

    Problem persists with driver version 3.3.3 on

    Linux version 3.13.0-77-generic (buildd@lcy01-30) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #121-Ubuntu SMP Wed Jan 20 10:50:42 UTC 2016