#353 ixgbe FDIR reinit error message when running stress test

closed
nobody
None
in-kernel_driver
9
2014-04-13
2012-08-16
Bhushan
No

Hi,

When running stress tests, we came across some error messages in the kern.log file. The error report is below. The error is seen on the 3.2.18 kernel.

Jun 8 15:44:41 dl380g8-6 kernel: [ 4054.222717] ixgbe 0000:04:00.0: eth7:
failed to finish FDIR re-initialization, ignored adding FDIR ATR filters
Jun 8 16:50:07 dl380g8-6 kernel: [ 7978.420491] ixgbe 0000:04:00.1: eth6:
failed to finish FDIR re-initialization, ignored adding FDIR ATR filters

The function that prints this diagnostic message is executed via the worker thread function ixgbe_service_event_schedule. One case is when the NIC hangs while doing TX. There are other events that trigger ixgbe_service_event_schedule, but I could not understand the macro checks. Can you please look into this?

Regards
Shashi

Discussion

  • Bhushan
    2012-08-17

    • priority: 1 --> 9
     
  • Emil Tantilov
    2012-08-17

    Could you please provide the following information:
    1. Driver and kernel versions.
    2. The type/model of the network adapter in use (lspci -vvv/lspci -n).
    3. What exactly do you mean by stress test? Can you provide a script that can replicate it?
    4. Any driver settings different from the default.
    5. Kernel config file (unless it is a stock distro kernel).
    6. The full output from dmesg after the issue occurs, as an attachment.

     
  • Bhushan
    2012-08-22

    Hi Emil,

    Sorry for the delay; I was busy with other issues. Below are the details you requested:

    1) Kernel version: 3.2; driver version: 3.9.17-NAPI
    2) firmware-version: 0x800003d6, 1.180.0
    3) NIC type/model: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
    4) No driver settings or configuration have been changed
    5) I have attached the .config file
    6) The issue was reported around 2 months ago, so I no longer have access to the kernel log files. The only log information provided by the testing team is below:

    Jun 8 15:44:41 dl380g8-6 kernel: [ 4054.222717] ixgbe 0000:04:00.0: eth7:
    failed to finish FDIR re-initialization, ignored adding FDIR ATR filters
    Jun 8 16:50:07 dl380g8-6 kernel: [ 7978.420491] ixgbe 0000:04:00.1: eth6:
    failed to finish FDIR re-initialization, ignored adding FDIR ATR filters

    Regards
    Shashidhara

     
  • Emil Tantilov
    2012-08-23

    Thanks for the information, but you did not specify the nature of the stress test you were running.

     
  • Bhushan
    2012-08-28

    The test cases are designed to find defects by stressing the system and causing system failures. The test tool is a collection of programs designed to stress specific areas of the system, including CPU, memory, graphics, network, power management, and storage.
    The test tool is suitable for system stress testing, including CHO (Continuous Hours of Operation) testing and peripheral stress testing (especially channel mix and load tests), and for providing a system load for QE testing.

     
    Last edit: Bhushan 2012-08-28
  • Bhushan
    2012-09-10

    Hi Emil,
    Could you please look into this?
    Let me know if you need any more data.

    Thanks

     
  • Emil Tantilov
    2012-09-11

    Aside from the message, are there any issues that you see during the stress test?

    The message basically means that the HW was not able to initialize the Flow Director for some of the flows. It can occur in a stress test (I am still not sure what the test was in this case) and it should not lead to any issues.

    You can see in ixgbe_reinit_fdir_tables_82599() that there are 2 loops that poll for FDIRCMD.CMD and FDIRCTRL.INIT_DONE. You can try increasing the poll time or, if you can reliably reproduce this issue, try building the driver with make CFLAGS_EXTRA=-DDBG, or just change hw_dbg to printk in the 2 instances to show which of the 2 polls times out.
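
For illustration only, here is a rough user-space sketch of the kind of bounded poll described above, and of how naming each poll in its message shows which one timed out. This is not the driver's code: sim_reg, sim_read, poll_for_bit, run_poll and SIM_INIT_DONE are all hypothetical names, the bit position is made up, and the hardware is simulated by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit, for illustration only; not the real register layout. */
#define SIM_INIT_DONE (1u << 3)

/* Simulated device: the "done" bit appears after ready_after register reads. */
struct sim_reg {
    uint32_t value;
    unsigned reads;
    unsigned ready_after;
};

/* Simulated register read; each call counts as one poll of the hardware. */
static uint32_t sim_read(struct sim_reg *r)
{
    if (++r->reads >= r->ready_after)
        r->value |= SIM_INIT_DONE;
    return r->value;
}

/*
 * Poll until `mask` is set or `max_polls` reads have been made.
 * Returns the number of reads used on success, or -1 on timeout.
 */
static int poll_for_bit(struct sim_reg *r, uint32_t mask, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        if (sim_read(r) & mask)
            return i + 1;
        /* A real driver would insert a short delay between reads here. */
    }
    return -1;
}

/* Run one named poll and report the result, so a timeout can be attributed. */
static bool run_poll(const char *name, struct sim_reg *r, int max_polls)
{
    int used = poll_for_bit(r, SIM_INIT_DONE, max_polls);

    if (used < 0) {
        printf("%s: poll time exceeded after %d reads\n", name, max_polls);
        return false;
    }
    printf("%s: done after %d reads\n", name, used);
    return true;
}
```

With a simulated device that needs 50 reads, run_poll("init-done poll", &reg, 10) reports a timeout, while the same call with a budget of 100 succeeds. That is the effect of "increasing the time to poll" in this toy model.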

     
  • Bhushan
    2012-09-11

    Hi Emil,

    Thanks for your reply. I will perform the steps you mentioned and provide all the output.

    Thanks!

     
  • Emil Tantilov
    2012-09-18

    Just a quick update. I was able to reproduce the issue, and the timeout is caused by the poll loop in the re-init code. We'll release a patch for this issue once validation is completed.

     
  • Bhushan
    2012-10-03

    Thanks for the update. Will this patch be available soon?

     
  • Todd Fujinaka
    2013-07-09

    • status: open --> closed
     
  • Todd Fujinaka
    2013-07-09

    Closing due to age.