#356 NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out

closed
nobody
None
standalone_driver
9
2015-02-09
2012-08-31
bugreporta
No

There is a severe issue that has been open for years, reported by several people in tickets #44, #53 and #55. It crashes the server and requires a hardware reboot. There is no solution so far. Can anyone solve this, or at least suggest a workaround?

Thanks.


ethtool -i eth0

driver: e1000e
version: 2.0.0.1-NAPI
firmware-version: 0.12-2
bus-info: 0000:00:19.0


/boot/grub# grep "pcie" grub.cfg
module /boot/vmlinuz-2.6.32-5-xen-amd64 placeholder root=UUID=a508a0f0-1f9f-411b-b0c8-5bb8d07a7e99 ro pcie_aspm=off quiet


cat /proc/version

Linux version 2.6.32-5-xen-amd64 (Debian 2.6.32-35) (dannf@debian.org) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Tue Jun 14 12:46:30 UTC 2011


lspci -vvv

...
06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
Subsystem: Fujitsu Technology Solutions Device 1192
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 2253
Region 0: Memory at de120000 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at 2000 [size=32]
Region 3: Memory at de100000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [c8] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee0400c Data: 41b1
Capabilities: [e0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Capabilities: [140 v1] Device Serial Number 00-40-d0-ff-ff-c0-76-14
Kernel driver in use: e1000e
...
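
Since the report relies on `pcie_aspm=off`, it can help to confirm from the `lspci -vvv` output that ASPM is actually disabled on each device. The following is only a minimal sketch, assuming the output format shown above; the function name `aspm_states` is hypothetical and not part of any tool. Note that the excerpt above covers 06:00.0, which the dmesg output attributes to eth1.

```python
import re

# A device header line starts with a PCI address such as "06:00.0"
# (optionally prefixed with a domain such as "0000:").
DEVICE_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)$")
# The link control line looks like "LnkCtl: ASPM Disabled; RCB 64 bytes ..."
ASPM_RE = re.compile(r"LnkCtl:\s+ASPM ([^;]+);")


def aspm_states(lspci_text):
    """Map each PCI address to the ASPM state reported on its LnkCtl line."""
    states = {}
    current = None
    for line in lspci_text.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            current = dev.group(1)
            continue
        m = ASPM_RE.search(line)
        if m and current is not None:
            states[current] = m.group(1).strip()
    return states


if __name__ == "__main__":
    sample = (
        "06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection\n"
        "\tLnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-\n"
    )
    print(aspm_states(sample))
```

Devices without a `LnkCtl` line are simply left out of the result.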


/etc/modprobe.d# cat e1000e.conf
options e1000e IntMode=1,1


dmesg | grep eth

[ 3.038619] e1000e 0000:00:19.0: eth0: (PCI Express:2.5GT/s:Width x1) 00:40:d0:c0:76:15
[ 3.038622] e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
[ 3.038662] e1000e 0000:00:19.0: eth0: MAC: 10, PHY: 9, PBA No: 313130-032
[ 3.168404] e1000e 0000:06:00.0: eth1: (PCI Express:2.5GT/s:Width x1) 00:40:d0:c0:76:14
[ 3.168407] e1000e 0000:06:00.0: eth1: Intel(R) PRO/1000 Network Connection
[ 3.168492] e1000e 0000:06:00.0: eth1: MAC: 4, PHY: 8, PBA No: 313132-030
[ 11.307611] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 11.771864] ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 12.791315] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[ 12.791322] e1000e 0000:00:19.0: eth0: 10/100 speed: disabling TSO
[ 12.792790] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 23.129823] eth0: no IPv6 routers present


Aug 31 05:00:54 stars28 kernel: [48313.917884] ------------[ cut here ]------------
Aug 31 05:00:54 stars28 kernel: [48313.917899] WARNING: at /build/buildd-linux-2.6_2.6.32-35-amd64-aZSlKL/linux-2.6-2.6.32/debian/build/source_amd64_xen/net/sched/sch_generic.c:261 dev_watchdog+0xe2/0x194()
Aug 31 05:00:54 stars28 kernel: [48313.917905] Hardware name: PRIMERGY RX100 S6
Aug 31 05:00:54 stars28 kernel: [48313.917910] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Aug 31 05:00:54 stars28 kernel: [48313.917914] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat tun xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev loop iptable_filter ip_tables x_tables bridge stp nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc dummy fuse dm_crypt drbd lru_cache cn xen_evtchn xenfs snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 evdev serio_raw pcspkr i2c_core container power_meter ac processor button acpi_processor ext3 jbd mbcache dm_mod raid1 md_mod usbhid hid sd_mod crc_t10dif ahci libata thermal ehci_hcd e1000e scsi_mod usbcore nls_base thermal_sys [last unloaded: scsi_wait_scan]
Aug 31 05:00:54 stars28 kernel: [48313.918038] Pid: 0, comm: swapper Not tainted 2.6.32-5-xen-amd64 #1
Aug 31 05:00:54 stars28 kernel: [48313.918042] Call Trace:
Aug 31 05:00:54 stars28 kernel: [48313.918046] <IRQ> [<ffffffff81272e42>] ? dev_watchdog+0xe2/0x194
Aug 31 05:00:54 stars28 kernel: [48313.918059] [<ffffffff81272e42>] ? dev_watchdog+0xe2/0x194
Aug 31 05:00:54 stars28 kernel: [48313.918068] [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
Aug 31 05:00:54 stars28 kernel: [48313.918076] [<ffffffff81067029>] ? run_posix_cpu_timers+0x25/0x6ea
Aug 31 05:00:54 stars28 kernel: [48313.918083] [<ffffffff81272d60>] ? dev_watchdog+0x0/0x194
Aug 31 05:00:54 stars28 kernel: [48313.918089] [<ffffffff8104ef88>] ? warn_slowpath_fmt+0x51/0x59
Aug 31 05:00:54 stars28 kernel: [48313.918100] [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
Aug 31 05:00:54 stars28 kernel: [48313.918106] [<ffffffff8104b41e>] ? try_to_wake_up+0x289/0x29b
Aug 31 05:00:54 stars28 kernel: [48313.918115] [<ffffffff8100ec52>] ? xen_vcpuop_set_next_event+0x4c/0x60
Aug 31 05:00:54 stars28 kernel: [48313.918122] [<ffffffff81272d34>] ? netif_tx_lock+0x3d/0x69
Aug 31 05:00:54 stars28 kernel: [48313.918131] [<ffffffff8125d7da>] ? netdev_drivername+0x3b/0x40
Aug 31 05:00:54 stars28 kernel: [48313.918137] [<ffffffff81272e42>] ? dev_watchdog+0xe2/0x194
Aug 31 05:00:54 stars28 kernel: [48313.918144] [<ffffffff8100ece2>] ? check_events+0x12/0x20
Aug 31 05:00:54 stars28 kernel: [48313.918151] [<ffffffff8100ec06>] ? xen_vcpuop_set_next_event+0x0/0x60
Aug 31 05:00:54 stars28 kernel: [48313.918161] [<ffffffff8105b5ef>] ? run_timer_softirq+0x1c9/0x268
Aug 31 05:00:54 stars28 kernel: [48313.918169] [<ffffffff81054c9b>] ? __do_softirq+0xdd/0x1a6
Aug 31 05:00:54 stars28 kernel: [48313.918176] [<ffffffff81012cac>] ? call_softirq+0x1c/0x30
Aug 31 05:00:54 stars28 kernel: [48313.918183] [<ffffffff8101422b>] ? do_softirq+0x3f/0x7c
Aug 31 05:00:54 stars28 kernel: [48313.918189] [<ffffffff81054b0b>] ? irq_exit+0x36/0x76
Aug 31 05:00:54 stars28 kernel: [48313.918198] [<ffffffff811f2461>] ? xen_evtchn_do_upcall+0x33/0x42
Aug 31 05:00:54 stars28 kernel: [48313.918204] [<ffffffff81012cfe>] ? xen_do_hypervisor_callback+0x1e/0x30
Aug 31 05:00:54 stars28 kernel: [48313.918208] <EOI> [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
Aug 31 05:00:54 stars28 kernel: [48313.918220] [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
Aug 31 05:00:54 stars28 kernel: [48313.918227] [<ffffffff8100e6a7>] ? xen_safe_halt+0xc/0x15
Aug 31 05:00:54 stars28 kernel: [48313.918233] [<ffffffff8100bfc7>] ? xen_idle+0x37/0x40
Aug 31 05:00:54 stars28 kernel: [48313.918239] [<ffffffff81010eb1>] ? cpu_idle+0xa2/0xda
Aug 31 05:00:54 stars28 kernel: [48313.918246] [<ffffffff81509cdd>] ? start_kernel+0x3dc/0x3e8
Aug 31 05:00:54 stars28 kernel: [48313.918253] [<ffffffff8150bc93>] ? xen_start_kernel+0x586/0x58a
Aug 31 05:00:54 stars28 kernel: [48313.918257] ---[ end trace 632ecf74ca434d46 ]---
Aug 31 05:00:54 stars28 kernel: [48313.918285] e1000e 0000:00:19.0: eth0: Reset adapter
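
To see how often the timeout occurs, the kernel log can be scanned for the watchdog message shown in the trace above. This is only a rough sketch, assuming the message format from this log; the helper name `find_watchdog_timeouts` is made up for illustration.

```python
import re

# Matches the kernel message from the trace above, e.g.
# "NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(lines):
    """Return one (iface, driver, queue) tuple per watchdog timeout line."""
    hits = []
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            hits.append((m.group("iface"), m.group("driver"), int(m.group("queue"))))
    return hits


if __name__ == "__main__":
    sample = [
        "Aug 31 05:00:54 stars28 kernel: [48313.917905] Hardware name: PRIMERGY RX100 S6",
        "Aug 31 05:00:54 stars28 kernel: [48313.917910] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out",
    ]
    print(find_watchdog_timeouts(sample))
```

The same function could be fed the lines of a syslog file or the output of `dmesg`; where those logs live depends on the system.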


Can anybody solve this issue, or will I have to switch the server to a non-Intel NIC?

Discussion

  • bugreporta

    bugreporta - 2012-08-31

    Bug #333 also seems to be the same problem.

     
  • Todd Fujinaka

    Todd Fujinaka - 2013-07-09
    • status: open --> closed
     
  • Todd Fujinaka

    Todd Fujinaka - 2013-07-09

    Closing due to age.

     
