#100 e1000e intermittent freeze-until-reboot in 2.6.36+

closed
e1000e (108)
in-kernel_driver
6
2013-07-09
2011-02-02
Nix
No

This is possibly ASPM-related: diagnostics to determine it are going on now.

Described in full at http://sourceforge.net/mailarchive/forum.php?thread_name=87k4kfq1at.fsf%40spindle.srvr.nix&forum_name=e1000-devel. In brief, after the hang, a register dump looks like this:

Offset Values
-------- -----
000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
030: 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060: 06 88 00 00 06 88 00 00 00 00 00 00 00 00 00 00
070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
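
One way to read the dump above: the first three rows are entirely 0xff. A quick script can flag such rows automatically when comparing dumps taken before and after a hang. This is only a sketch; the helper name `all_ones_rows` is hypothetical, and the idea that all-ones rows mark registers the device no longer answers for is an assumption, not something the report states.

```python
def all_ones_rows(dump_text):
    """Return the offsets of dump rows whose bytes are all 0xff."""
    offsets = []
    for line in dump_text.splitlines():
        head, sep, tail = line.partition(":")
        if not sep:
            continue  # header or separator line, no "offset:" prefix
        try:
            offset = int(head.strip(), 16)
            values = [int(byte, 16) for byte in tail.split()]
        except ValueError:
            continue  # not a hex row
        if values and all(v == 0xFF for v in values):
            offsets.append(offset)
    return offsets


# The dump from this report, pasted verbatim.
DUMP = """\
Offset Values
-------- -----
000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
030: 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060: 06 88 00 00 06 88 00 00 00 00 00 00 00 00 00 00
070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""

if __name__ == "__main__":
    print([f"{offset:03x}" for offset in all_ones_rows(DUMP)])
    # prints ['000', '010', '020']
```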

Keeping the adapter either totally idle or persistently active (via ping flooding, and apparently even ping -s 1) prevents the hang.

Discussion

  • Nix

    Nix - 2011-03-22

    That does look plausible, doesn't it? Unfortunately, the failure mode is the same with that tree applied, with or without the patch: ASPM remains enabled, so there is no change. (The set of ASPM-related messages gains an extra line:

    Unable to assume _OSC PCIe control. Disabling ASPM

    which appears somewhat inaccurate, alas.)

    I'll do some more debugging this weekend... (My current job finishes the Thursday after that, so I can then spend a lot more time on this.)

     
  • Nix

    Nix - 2011-03-22

    Still no change, I'm afraid.

     
  • Bruce Allan

    Bruce Allan - 2011-03-23

    ASPM still enabled, eh? Hmm, this has gone beyond my expertise, I'm afraid, and when using the pci-2.6 tree with the most recent patches I am not able to reproduce the problem on any of my systems. You should consider taking the issue of ASPM L0s not getting disabled on the adapter in your system to the PCI experts on the linux-pci@vger.kernel.org mailing list. I'll continue to monitor the situation, but the PCI maintainers are far more knowledgeable about the PCI code than I am, and you have a much better chance of getting this resolved through them.

     
  • Todd Fujinaka

    Todd Fujinaka - 2013-07-09

    Closing due to inactivity.

     
  • Todd Fujinaka

    Todd Fujinaka - 2013-07-09
    • status: open --> closed
     