#336 1.1.2 -- loses connectivity

closed
e1000e (107)
standalone_driver
5
2013-07-08
2009-12-04
James Bellinger
No

Howdy. The built-in drivers in Ubuntu 9.10 have this problem, and so does the 1.1.2 driver from Intel's site, which appears to be the same as the 1.1.2 posted here. I also tried upgrading to the Ubuntu 10.04 beta since it has a newer kernel, and its drivers do it too.

What will happen is that the network connectivity will suddenly cease to work, after which ifconfig will start racking up literally billions or trillions of errors on RX, TX, collisions, etc. There does not seem to be any pattern to it, and it doesn't require an extreme amount of traffic either -- last time I had received 1.5MB and sent 5MB before it died.
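One way to catch the moment the link dies, rather than finding the huge error counts later in ifconfig, is to poll the same counters periodically. The sketch below assumes a Linux system that exposes them under /sys/class/net/<iface>/statistics/; the helper names, the interface name and the alert threshold are illustrative assumptions, not part of any Intel tool.

```python
import os
import time

# Counters behind ifconfig's RX/TX error, collision and byte figures;
# on Linux each one is a file under /sys/class/net/<iface>/statistics/.
COUNTERS = ("rx_errors", "tx_errors", "collisions", "rx_bytes", "tx_bytes")


def read_counters(stats_dir):
    """Return {counter name: integer value} read from a statistics directory."""
    values = {}
    for name in COUNTERS:
        with open(os.path.join(stats_dir, name)) as f:
            values[name] = int(f.read().strip())
    return values


def deltas(before, after):
    """Per-counter change between two snapshots."""
    return {name: after[name] - before[name] for name in before}


def watch(iface="eth0", interval=5.0, error_threshold=1000):
    """Poll forever; print a timestamped line when errors jump past a (hypothetical) threshold."""
    stats_dir = f"/sys/class/net/{iface}/statistics"
    prev = read_counters(stats_dir)
    while True:
        time.sleep(interval)
        cur = read_counters(stats_dir)
        d = deltas(prev, cur)
        errors = d["rx_errors"] + d["tx_errors"] + d["collisions"]
        if errors >= error_threshold:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), iface, "error burst:", d)
        prev = cur
```

For example, `watch("eth1", interval=1.0)` would print a line each time the error counters jump, which can then be lined up with `dmesg` output from the same moment.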

The NIC is Intel 82574L on Supermicro X8SIE-F (two NICs integrated onto the board).

What can I do to diagnose or fix this? This is an essentially unusable level of stability.

Discussion

(Page 3 of 3)
  • Emil Tantilov
    2010-04-01

    Michael,

    With the stock FC12 kernel (2.6.31.5-127.fc12) I was able to reproduce the issue almost immediately. One thing I noticed on this kernel: ASPM is enabled on the 82574 interfaces even though it is disabled in the BIOS. With ASPM disabled from the kernel command line (pcie_aspm=off), my test has been running without issues so far. I will leave it running for an extended period of time just to make sure.

    Could you please attach the output from lspci -vvv from your working and failing kernels?

    You mentioned that e1000e disabled ASPM; this is only partially true. The current e1000e driver disables only L1 ASPM, while on the 82574 it should disable both L0s and L1 when MTU=1500. We have a patch that should fix this, and it will hopefully be out soon. That is why I suggested making sure ASPM is disabled on those devices.

    It's interesting, though, because ASPM is disabled when I boot into the 2.6.33.1 kernel, at least with my config. Since you mentioned that you saw a failure on 2.6.33.1, could you post your kernel configuration file?
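Checking which ASPM states the kernel actually left enabled on each port comes down to reading the LnkCtl line that lspci -vvv prints for every PCI Express device. A rough sketch, assuming the usual pciutils output format (for example `LnkCtl: ASPM L0s L1 Enabled; ...` or `LnkCtl: ASPM Disabled; ...`); lspci generally needs root to show capability details, and the function names here are hypothetical.

```python
import re
import subprocess

# Device header lines start in column 0, e.g.
# "02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection"
DEVICE_RE = re.compile(r"^(\S+) (.*)$")
# Link Control line inside the Express capability, e.g.
# "\t\tLnkCtl:\tASPM L0s L1 Enabled; RCB 64 bytes ..."
# Requiring the literal "LnkCtl:" keeps LnkCap: and LnkCtl2: lines from matching.
LNKCTL_RE = re.compile(r"LnkCtl:\s+ASPM ([^;]+);")


def aspm_states(lspci_text):
    """Map PCI address -> (device description, ASPM state) from `lspci -vvv` text."""
    states = {}
    addr, desc = None, None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Start of a new device block: remember its address and description.
            m = DEVICE_RE.match(line)
            addr, desc = (m.group(1), m.group(2)) if m else (None, None)
            continue
        m = LNKCTL_RE.search(line)
        if m and addr is not None:
            states[addr] = (desc, m.group(1).strip())
    return states


def run_lspci(device=None):
    """Capture `lspci -vvv` output, optionally for a single device selected with -s."""
    cmd = ["lspci", "-vvv"]
    if device:
        cmd += ["-s", device]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Filtering the result on `"82574" in desc` would narrow it to the two onboard NICs, which makes a working kernel and a failing kernel easy to compare side by side.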


  • Anonymous
    2010-04-01

    emiltan, adding pcie_aspm=off did the trick for both the .33 and .32 kernels. I successfully transferred 4 GB of data on each kernel with the 1.0.2 and 1.1.2 drivers. Thanks for the tip.

    I'm curious as to why ASPM must be disabled for this chipset. Is ASPM not a feature of this chipset, or is the driver broken and disabling it a workaround? If the driver is broken, will we see a fix in the future? I'd like to have as much power savings as possible. :)
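To confirm that the pcie_aspm=off workaround is actually in effect after a reboot, the running kernel's command line can be read back from /proc/cmdline. A minimal sketch with hypothetical helper names; it splits on whitespace, so quoted parameter values containing spaces are not handled.

```python
def kernel_params(cmdline):
    """Split a kernel command line into {key: value}; bare flags such as 'ro' map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def aspm_forced_off(cmdline=None):
    """True if pcie_aspm=off appears on the given (or the running kernel's) command line."""
    if cmdline is None:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    return kernel_params(cmdline).get("pcie_aspm") == "off"
```

Calling `aspm_forced_off()` with no argument checks the live system; passing a string is handy for checking a boot loader entry before rebooting.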

  • Emil Tantilov
    2010-04-02

    Michael, thanks for the feedback!

    Unfortunately, the ASPM issue is a known hardware erratum for this part, so this is not a driver bug. Specifically, on the 82574, L0s should be disabled when using standard frames, so you can still have L1 enabled in this configuration. We are working on a patch to disable the correct ASPM states for all affected devices, and hopefully it will be out soon.
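The L0s/L1 distinction maps onto two bits: in the PCI Express Link Control register (offset 0x10 in the device's PCI Express capability), the ASPM Control field is bits [1:0], with bit 0 enabling L0s and bit 1 enabling L1. The sketch below shows only the arithmetic for clearing L0s while keeping L1 under the 82574 standard-frame rule described above. It is not the driver's implementation, the helper names are hypothetical, and it does not touch hardware.

```python
# ASPM Control field: bits [1:0] of the PCIe Link Control register.
ASPM_L0S = 0x1
ASPM_L1 = 0x2
ASPM_MASK = ASPM_L0S | ASPM_L1


def decode_aspm(lnkctl):
    """Describe the ASPM Control field using the same wording lspci prints."""
    return {0: "Disabled", 1: "L0s Enabled", 2: "L1 Enabled", 3: "L0s L1 Enabled"}[lnkctl & ASPM_MASK]


def errata_lnkctl(lnkctl, mtu):
    """Hypothetical helper applying the 82574 rule from this thread.

    With standard frames (MTU <= 1500), clear L0s and leave L1 and all
    other bits untouched. The thread gives no rule for jumbo frames, so
    those values are returned unchanged.
    """
    if mtu <= 1500:
        return lnkctl & ~ASPM_L0S
    return lnkctl
```

For instance, a Link Control value of 0x0043 decodes as "L0s L1 Enabled"; after `errata_lnkctl(0x0043, 1500)` it becomes 0x0042, which decodes as "L1 Enabled".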

  • Todd Fujinaka
    2013-07-08

    Closing due to inactivity.

  • Todd Fujinaka
    2013-07-08

    • status: open --> closed