#321 e100 hang at hibernate, firmware related

david graham
e100 (15)
Ivan Kalvatchev

I use an Intel PRO/100 as the second LAN card in my fairly old machine.
Since kernel 2.6.29 I have had problems suspending and resuming (using native kernel hibernation) whenever the card is enabled.
If the interface is brought down and the module removed, hibernation succeeds.
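
For reference, the manual workaround can be sketched roughly as below. This is only an illustration: the interface name passed in (eth1 in the usage line) is an assumption, the helper names run and hibernate_without_e100 are made up for this sketch, and DRY_RUN=1 only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the manual workaround: take e100 out of the picture before
# hibernating, then bring it back after resume.
# Helper names are hypothetical; the interface name is an assumption.

run() {
    # With DRY_RUN=1 just print the command, otherwise execute it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

hibernate_without_e100() {
    iface="$1"                                  # interface driven by e100
    run ip link set "$iface" down || return 1   # bring the interface down
    run modprobe -r e100 || return 1            # remove the module
    run sh -c 'echo disk > /sys/power/state'    # hibernate; returns after resume
    run modprobe e100                           # reload the driver
    run ip link set "$iface" up                 # bring the interface back up
}
```

Usage (as root): hibernate_without_e100 eth1, or DRY_RUN=1 hibernate_without_e100 eth1 to see the commands first.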

At first, suspending for hibernation takes forever. Some timeouts do seem to expire, and the process appears to move on to the next stage: for example, at some point SysRq stops working, but after a few minutes it works again. However, the suspend never finishes and the machine never powers down.
Sometimes the hibernation code manages to write the image to swap, so on restart the kernel tries to resume the session and hangs forever.

I attach to this report the log from a boot followed by a single suspend using the "pm_test=devices" debug mechanism.
This test passes, but a second "pm_test=devices" attempt hangs the system. The system also hangs if I run rmmod e100 after one "pm_test=devices" cycle.
I loaded the e100 module with the option debug=10, so I hope the log has all the needed debug info.
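
A rough sketch of that test cycle, for anyone who wants to repeat it. The function name pm_test_devices is made up for this sketch, and the optional directory argument exists only so the steps can be tried against a scratch directory; on a real system it defaults to /sys/power and needs root.

```shell
#!/bin/sh
# Sketch of one "pm_test=devices" cycle (function name is hypothetical).
# The driver was loaded beforehand with:  modprobe e100 debug=10

pm_test_devices() {
    power_dir="${1:-/sys/power}"                     # real sysfs location by default
    echo devices > "$power_dir/pm_test" || return 1  # stop the test after the device stage
    echo disk > "$power_dir/state"                   # start the hibernation test cycle
}
```

Writing "none" to pm_test afterwards switches back to a normal, full hibernation.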

While checking the logs I noticed that firmware loading fails. It seems that udev is called and tries to run the firmware.sh script, which is supposed to feed the firmware image to the kernel. I changed the script to display more information, in case the script is being given wrong data:
err /sys$DEVPATH/loading $FIRMWARE
err ls /sys/class/firmware
err ls /sys/devices/pci0000:00/0000:00:09.0/
err ls -R /sys/devices/pci0000:00/0000:00:09.0/firmware/

After a number of tries I came to the conclusion that the resuming module tries to load the firmware, but the userland process hangs (probably because all of userland is still frozen). The module times out and the firmware load fails. Later, when userland is running again, the firmware.sh script is run, but the directory and files it has to work with have already been removed.
I attach a log file taken with /sys/class/firmware/timeout set to 300; this produces verbose hang-check backtraces.
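
To make the race easier to follow, here is a simplified sketch of the handshake a userspace firmware helper performs through sysfs. It is not the firmware.sh shipped with udev: the function name feed_firmware and its return codes are made up for this illustration. The first branch is the case seen after resume, where the kernel has already given up and removed the directory.

```shell
#!/bin/sh
# Simplified sketch of the sysfs firmware-loading handshake.
# Function name and return codes are hypothetical.

feed_firmware() {
    fw_sysfs="$1"   # e.g. "/sys$DEVPATH" as handed over by udev
    image="$2"      # path to the firmware image file

    if [ ! -e "$fw_sysfs/loading" ]; then
        # The request already timed out and the kernel removed the directory.
        echo "firmware sysfs directory is gone: $fw_sysfs" >&2
        return 2
    fi
    if [ ! -r "$image" ]; then
        echo -1 > "$fw_sysfs/loading"   # tell the kernel to abort the request
        return 1
    fi
    echo 1 > "$fw_sysfs/loading"        # announce that data is coming
    cat "$image" > "$fw_sysfs/data"     # hand over the image
    echo 0 > "$fw_sysfs/loading"        # signal that the transfer is complete
}
```

If the helper only gets to run after the driver's request has timed out, it ends up in the first branch, which matches the behaviour described above.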

While the firmware loading design does have serious flaws, this doesn't explain the hang at suspend(!!), or the fact that e100 always reports success on resume even when e100_open() fails because the firmware was not found.


  • david graham

    Hi Ivan, thanks for such a clear problem description.

    I had no trouble reproducing this result on a 2.6.31 kernel, using the commands from Documentation/power/basic-pm-debugging.txt:

    echo none >/sys/power/pm_test
    echo reboot >/sys/power/disk
    echo disk >/sys/power/state

    I see exactly what you reported, but do not yet understand why the firmware file will not load on resume. (You have already gone much deeper into this than I have.) I also don't yet know why we see the hang on suspend, but I do know that it is also related to the firmware: when I hacked the driver to remove the request_firmware call, I saw neither a hang at suspend nor any issue on resume.

    I'll discuss this with colleagues tomorrow, and we may well end up sending the issue out to lkml for comment. I'll let you know.

  • david graham

    After discussions with David Woodhouse, the author of the series of commits (5658c7, 4d2acf, d172e7, 88ecf8) that introduced the external firmware loading mechanism in 2.6.27, I have created a workaround for the e100 driver, and it is headed upstream. I'll include the commit id here when it is known. In the meantime, I include the patch in its current state so that you can (if you wish) test it independently.

  • david graham

    The fix has been accepted into the net-next-2.6 tree, which means that it should appear in the 2.6.33 kernel. The commit id is 7e15b0c, and the fix is titled "e100: Fix to allow systems with FW based cards to resume from STD".