From: <Martin_Zielinski@McAfee.com> - 2011-07-28 13:22:47
|
Thanks Jeff,

I don't think I have been in contact with Don before. Shall I send the requested information directly? I don't think a large attachment will go through to the list.

So as not to just spam the list, here is some information about the system:

Kernel: 2.6.32.36

ethtool -i eth4
driver: ixgbe
version: 2.0.44-k2
firmware-version: 1.8-0
bus-info: 0000:0a:00.0

The lspci -vvv output is very large, but in the error state no traffic is accepted at all, so the PCIe speed mentioned in the datasheet is not the limiting factor here.

0a:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit Network Connection (rev 01)
    Subsystem: Unknown device 1b6d:00a0
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 40
    Region 0: Memory at df5c0000 (64-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at ecc0 [size=32]
    Region 4: Memory at df5b8000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Address: 0000000000000000 Data: 0000
    Capabilities: [70] MSI-X: Enable+ Mask- TabSize=64
        Vector table: BAR=4 offset=00000000
        PBA: BAR=4 offset=00002000
    Capabilities: [a0] Express Endpoint IRQ 0
        Device: Supported: MaxPayload 512 bytes, PhantFunc 0, ExtTag-
        Device: Latency L0s <512ns, L1 <64us
        Device: AtnBtn- AtnInd- PwrInd-
        Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
        Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
        Device: MaxPayload 256 bytes, MaxReadReq 512 bytes
        Link: Supported Speed unknown, Width x8, ASPM L0s, Port 4
        Link: Latency L0s unlimited, L1 <32us
        Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
        Link: Speed unknown, Width x8
    Capabilities: [e0] Vital Product Data
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Device Serial Number 00-00-00-ff-ff-00-00-00
    Capabilities: [150] Unknown (14)
    Capabilities: [160] Unknown (16)

ethtool -d output, taken twice a few seconds apart (only the registers that changed, excluding TX):

0x00048: FRTIMER (Free Running Timer)            0x55AE1845  0x5613A01A
0x03FA0: mpc0 (Missed Packets Count 0)           0x0000511C  0x00005124
0x0405C: prc64 (Packets Received (64B) Count)    0x000172DF  0x000172E7
0x04078: bprc (Broadcast Packets Rx Count)       0x0000239E  0x000023A3
0x0407C: mprc (Multicast Packets Rx Count)       0x00015BAE  0x00015BB1
0x04088: gorcl (Good Octets Rx Count Low)        0x6882B90F  0x6882BB0F
0x040C0: torl (Total Octets Rx Count Low)        0x688B88FF  0x688B8AFF
0x040D0: tpr (Total Packets Received)            0x0D6326BB  0x0D6326C3

Neither the Receive Descriptor Head nor the Receive Descriptor Tail register changes.

dmesg: Nothing

Cheers,
Martin

-----Original Message-----
From: ta...@gm... [mailto:ta...@gm...] On Behalf Of Jeff Kirsher
Sent: Thursday, 28 July 2011 13:17
To: Zielinski, Martin; Don Skidmore
Cc: e10...@li...
Subject: Re: [E1000-devel] ixgbe: not accepting any packets - increasing rx_missed_errors

On Thu, Jul 28, 2011 at 01:41, <Mar...@mc...> wrote:
> Hello,
>
> With an 82599EB card, a customer often ends up in a situation where no packets can be received anymore until the network is restarted.
> The symptom is that the rx_missed_errors counter increments for each packet, but the kernel no longer sees any packets.
>
> We are using a 2.6.32 kernel with driver version 2.0.44-k2.
>
> I am aware that this is an old driver version, but please give me a chance to explain why I'm asking for information anyway:
>
> - The driver is part of the 2.6.32 stable branch.
> - It takes 2 - 10 days to reproduce it in the lab. So if we use a newer version, we cannot be sure that the problem is fixed just because we don't see it anymore.
> - According to the customer the issue started with an update that adds the memory boundary and disables packet split (errata #45).
>   The PSRTYPE register is not initialized in this version. Everything worked in the previous version (that is, with the even older driver).
> - It is a critical customer. If we provide a new version and it fails again, this will become a problem.
> - All reports about this issue end without a resolution or with the advice to update the driver. I really tried to extract an explanation or the exact changeset that fixes the issue, but I failed. So for documentation purposes it would be a good thing to make the solution googleable.
>
> Don Skidmore wrote in:
>
> http://sourceforge.net/mailarchive/forum.php?thread_name=29F4ED941D916B48B88B4D2A4F3D1B9C01D2E285AF%40orsmsx509.amr.corp.intel.com&forum_name=e1000-devel
>
> "Have you tried using the latest Source Forge driver (3.2.9). Including in it was a fix that corrected an erratum that sounds very similar to your issue."
>
> I'd greatly appreciate it if someone could point me in the right direction. What I'd like to understand is:
>

Don seems to have been working with you, so I will let him continue assisting you (since he is the ixgbe maintainer). There have been 15 more recent out-of-tree driver releases since the one you are using, so it is very possible that the issue you are seeing was fixed in one of those releases and the fix was not back-ported to the older 2.6.32 kernel.

If Don does not have the information already, any information that you can provide would help (i.e. kernel config, lspci -vvv output, dmesg log with the errors you are seeing). This information can help us greatly in determining which fixes implemented in later versions of the driver would have an effect on the issue you are seeing. Once we narrow down the fix(es) that resolve the issue, we can provide additional information on what the exact change is and why.
With some (not all) fixes, we should have test scenarios that consistently reproduce the issue, so that we can accurately determine whether the fix(es) resolved it.

I know that I am speaking in generalities and nothing specific; this is mainly because I do not know the exact issue you are having or the possible fixes that Don is aware of. I have added Don to this email thread and will let him work with you to get the specifics on the issue(s) you are seeing, so that we can work on getting a resolution to you, whether it be an updated driver or a patchset against your kernel.

Cheers,
Jeff

> - What change exactly is the fix for this issue?
> - How can I verify that I am seeing the same issue (some special register/memory dump/...)?
> - How can I verify that the issue is fixed?
>
> I know - I'm asking for support for a driver that is part of the stable kernel but very old in your development line.
> So I would be even happier if someone takes the time to answer my questions.
>
> Cheers,
> Martin
>
> Martin Zielinski
> Dipl. Inform
> Senior Engineer
>
> McAfee GmbH
>
> Registered office: Muenchen
> District court: AG Muenchen
> Commercial register: HRB 144340
> Managing directors: Emmet Russell, Keith Krzeminski, Douglas Rice
> Bank details: ABN-Amro Bank N.V. Account 671 211 9006
> VAT ID: DE168122444
>
> _______________________________________________
> E1000-devel mailing list
> E10...@li...
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
>

--
Cheers,
Jeff
|
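
The two ethtool -d snapshots quoted in this message can be compared mechanically. What follows is a minimal Python sketch, not part of the original thread, that computes the per-register deltas from the values in the dump. The names SNAPSHOTS, register_deltas and all_received_packets_missed are hypothetical, and masking the difference to 32 bits is an assumption made only to tolerate counter wraparound.

```python
# Before/after values copied from the two `ethtool -d` dumps in the message.
SNAPSHOTS = {
    "mpc0": (0x0000511C, 0x00005124),   # Missed Packets Count 0
    "prc64": (0x000172DF, 0x000172E7),  # Packets Received (64B) Count
    "bprc": (0x0000239E, 0x000023A3),   # Broadcast Packets Rx Count
    "mprc": (0x00015BAE, 0x00015BB1),   # Multicast Packets Rx Count
    "gorcl": (0x6882B90F, 0x6882BB0F),  # Good Octets Rx Count Low
    "torl": (0x688B88FF, 0x688B8AFF),   # Total Octets Rx Count Low
    "tpr": (0x0D6326BB, 0x0D6326C3),    # Total Packets Received
}


def register_deltas(snapshots, width_bits=32):
    """Return after - before for each register, modulo 2**width_bits."""
    mask = (1 << width_bits) - 1
    return {
        name: (after - before) & mask
        for name, (before, after) in snapshots.items()
    }


def all_received_packets_missed(deltas):
    """True if packets arrived and every one of them was counted as missed."""
    return deltas["tpr"] > 0 and deltas["mpc0"] == deltas["tpr"]


if __name__ == "__main__":
    deltas = register_deltas(SNAPSHOTS)
    for name, delta in deltas.items():
        print(f"{name:6s} +{delta}")
    print("all received packets missed:", all_received_packets_missed(deltas))
```

For the values quoted in the message, mpc0, prc64 and tpr each advance by 8, bprc and mprc advance by 5 and 3, and gorcl and torl advance by 512 (8 x 64 bytes). That is consistent with the reported symptom that every packet arriving in the interval is counted as missed.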