#430 PCIe link lost, device now detached

closed
None
standalone_driver
1
2015-03-11
2014-08-15
No

igb version 5.2.9.4

Before the update, with the in-kernel driver (kernel 3.12 LTSI), I got "Reset adapter" (and "transmit queue timed out") errors.

TBH, I cannot rule out hardware issues, because:
- I'm seeing these errors at a slowly increasing rate; today I had to reboot 4 times...
- "PCIe link lost" sounds like something related to the PCIe bus, and possibly not to this driver
- the board's IPMI shows the +3V voltage constantly moving between 2.6V and 3V, which flags it as critical at times and then back to OK; since the board itself seems to work OK for now, though, this might be a sensor issue?

==> If the +3V rail is the one used for the PCIe bus... then I'd say that's the problem. If not, maybe it's something else?
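To put the IPMI numbers in perspective, the sketch below compares them with a nominal rail voltage. It assumes the "+3V" sensor watches the usual 3.3 V rail and applies the ±5% band commonly used for that rail in ATX power-supply guidelines; neither value comes from the board's documentation, so treat the result as a rough sanity check rather than a diagnosis.

```python
# Assumptions (not from the board documentation): the "+3V" IPMI sensor
# monitors a nominal 3.3 V rail, and the acceptable band is +/-5%.
NOMINAL_V = 3.3
TOLERANCE = 0.05


def deviation_pct(reading_v: float, nominal_v: float = NOMINAL_V) -> float:
    """Signed deviation of a reading from the nominal voltage, in percent."""
    return (reading_v - nominal_v) / nominal_v * 100.0


def in_tolerance(reading_v: float, nominal_v: float = NOMINAL_V,
                 tolerance: float = TOLERANCE) -> bool:
    """True if the reading lies within nominal +/- tolerance."""
    low = nominal_v * (1 - tolerance)
    high = nominal_v * (1 + tolerance)
    return low <= reading_v <= high


if __name__ == "__main__":
    # The two extremes reported by the board's IPMI.
    for reading in (2.6, 3.0):
        verdict = "OK" if in_tolerance(reading) else "out of tolerance"
        print(f"{reading:.1f} V: {deviation_pct(reading):+.1f}% -> {verdict}")
```

With those assumptions, 3.0 V is about 9% low and 2.6 V about 21% low, both outside the band, so either the rail really is sagging or the sensor is unreliable; the numbers alone can't tell which.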

Though the board seems the same as the 2750 (only the CPU differs), there are no BIOS upgrades for this board...

Do you guys want more info?

Discussion

  • Maarten Vanraes

    Maarten Vanraes - 2014-08-16

    OK, some more info: today I got a hang with no log output at all, just a hang, and I had to reboot manually.

    I also had a failure on only 1 of the 2 devices, with "transmit queue timed out" plus "Detected Tx Unit Hang" (afterwards the link came up again, but nothing I did got it working again...):

    Aug 16 17:15:35 localhost kernel: igb 0000:08:00.0: Detected Tx Unit Hang
    Tx Queue <0>
    TDH <fe>
    TDT <3>
    next_to_use <3>
    next_to_clean <fe>
    buffer_info[next_to_clean]
    time_stamp <10002c03c>
    next_to_watch <ffff88007a9bdfe0>
    jiffies <10002c118>
    desc.status <1098000>
    Aug 16 17:15:37 localhost kernel: igb 0000:08:00.0: Detected Tx Unit Hang
    Tx Queue <0>
    TDH <fe>
    TDT <3>
    next_to_use <3>
    next_to_clean <fe>
    buffer_info[next_to_clean]
    time_stamp <10002c03c>
    next_to_watch <ffff88007a9bdfe0>
    jiffies <10002c1e0>
    desc.status <1098000>
    Aug 16 17:15:39 localhost kernel: igb 0000:08:00.0: Detected Tx Unit Hang
    Tx Queue <0>
    TDH <fe>
    TDT <3>
    next_to_use <3>
    next_to_clean <fe>
    buffer_info[next_to_clean]
    time_stamp <10002c03c>
    next_to_watch <ffff88007a9bdfe0>
    jiffies <10002c2a8>
    desc.status <1098000>
    Aug 16 17:15:41 localhost kernel: igb 0000:08:00.0: Detected Tx Unit Hang
    Tx Queue <0>
    TDH <fe>
    TDT <3>
    next_to_use <3>
    next_to_clean <fe>
    buffer_info[next_to_clean]
    time_stamp <10002c03c>
    next_to_watch <ffff88007a9bdfe0>
    jiffies <10002c370>
    desc.status <1098000>
    Aug 16 17:15:42 localhost kernel: ------------[ cut here ]------------
    Aug 16 17:15:42 localhost kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x24e/0x260()
    Aug 16 17:15:42 localhost kernel: NETDEV WATCHDOG: enp8s0 (igb): transmit queue 0 timed out
    Aug 16 17:15:42 localhost kernel: Modules linked in: ipt_IFWLOG ipt_psd cls_basic cls_flow cls_fw cls_u32 sch_fq_codel sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq xt_CHECKSUM ipt_rpfilter xt_statistic xt_CT xt_connlimit xt_realm xt_addrtype ip_set_hash_ip xt_comment xt_recent xt_nat ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set ip_set nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY xt_time xt_TCPMSS xt_tcpmss
    Aug 16 17:15:42 localhost kernel: xt_sctp xt_policy xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_connmark xt_CLASSIFY xt_AUDIT xt_state iptable_raw iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 iptable_mangle nfnetlink iptable_filter ip_tables tun xt_multiport nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack xt_tcpudp ip6t_REJECT xt_LOG ip6table_filter ip6_tables x_tables af_packet sit tunnel4 ip_tunnel bridge stp llc hid_generic iTCO_wdt gpio_ich iTCO_vendor_support nls_utf8 nls_cp437 vfat fat coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ast ttm drm_kms_helper drm i2c_algo_bit usbhid hid aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd
    Aug 16 17:15:42 localhost kernel: microcode igb(O) serio_raw i2c_i801 lpc_ich dca i2c_ismt i2c_core shpchp usb_storage ipmi_si button cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq processor evdev ipmi_devintf ipmi_msghandler nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc ipv6 autofs4 ehci_pci ehci_hcd usbcore usb_common
    Aug 16 17:15:42 localhost kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.12.25-server-3.mga4 #1
    Aug 16 17:15:42 localhost kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C2550D4I, BIOS P1.40 01/14/2014
    Aug 16 17:15:42 localhost kernel: 0000000000000009 ffff88027fc03d70 ffffffff8161a014 ffff88027fc03db8
    Aug 16 17:15:42 localhost kernel: ffff88027fc03da8 ffffffff8106153d 0000000000000000 ffff880270f4a000
    Aug 16 17:15:42 localhost kernel: 0000000000000010 0000000000000000 ffff880270f4a000 ffff88027fc03e08
    Aug 16 17:15:42 localhost kernel: Call Trace:
    Aug 16 17:15:42 localhost kernel: <IRQ> [<ffffffff8161a014>] dump_stack+0x45/0x56
    Aug 16 17:15:42 localhost kernel: [<ffffffff8106153d>] warn_slowpath_common+0x7d/0xa0
    Aug 16 17:15:42 localhost kernel: [<ffffffff810615ac>] warn_slowpath_fmt+0x4c/0x50
    Aug 16 17:15:42 localhost kernel: [<ffffffff8107a87b>] ? queue_work+0x12b/0x310
    Aug 16 17:15:42 localhost kernel: [<ffffffff8157e5be>] dev_watchdog+0x24e/0x260
    Aug 16 17:15:42 localhost kernel: [<ffffffff8157e370>] ? dev_graft_qdisc+0x80/0x80
    Aug 16 17:15:42 localhost kernel: [<ffffffff8106dce6>] call_timer_fn+0x36/0x110
    Aug 16 17:15:42 localhost kernel: [<ffffffff8157e370>] ? dev_graft_qdisc+0x80/0x80
    Aug 16 17:15:42 localhost kernel: [<ffffffff8106e379>] run_timer_softirq+0x1e9/0x290
    Aug 16 17:15:42 localhost kernel: [<ffffffff810669f7>] __do_softirq+0xf7/0x240
    Aug 16 17:15:42 localhost kernel: [<ffffffff8162ac5c>] call_softirq+0x1c/0x30
    Aug 16 17:15:42 localhost kernel: [<ffffffff81015a85>] do_softirq+0x55/0x90
    Aug 16 17:15:42 localhost kernel: [<ffffffff81066cb5>] irq_exit+0xa5/0xb0
    Aug 16 17:15:42 localhost kernel: [<ffffffff81042ab5>] smp_apic_timer_interrupt+0x45/0x60
    Aug 16 17:15:42 localhost kernel: [<ffffffff81629f9d>] apic_timer_interrupt+0x6d/0x80
    Aug 16 17:15:42 localhost kernel: <EOI> [<ffffffff8101b545>] ? native_sched_clock+0x15/0x80
    Aug 16 17:15:42 localhost kernel: [<ffffffff81521872>] ? cpuidle_enter_state+0x52/0xc0
    Aug 16 17:15:42 localhost kernel: [<ffffffff81521868>] ? cpuidle_enter_state+0x48/0xc0
    Aug 16 17:15:42 localhost kernel: [<ffffffff815219a7>] cpuidle_idle_call+0xc7/0x210
    Aug 16 17:15:42 localhost kernel: [<ffffffff8101cd7e>] arch_cpu_idle+0xe/0x30
    Aug 16 17:15:42 localhost kernel: [<ffffffff810b40c5>] cpu_startup_entry+0xe5/0x280
    Aug 16 17:15:42 localhost kernel: [<ffffffff8160a8d7>] rest_init+0x77/0x80
    Aug 16 17:15:42 localhost kernel: [<ffffffff81cf3f27>] start_kernel+0x42a/0x436
    Aug 16 17:15:42 localhost kernel: [<ffffffff81cf3906>] ? repair_env_string+0x5c/0x5c
    Aug 16 17:15:42 localhost kernel: [<ffffffff81cf3120>] ? early_idt_handlers+0x120/0x120
    Aug 16 17:15:42 localhost kernel: [<ffffffff81cf35ee>] x86_64_start_reservations+0x2a/0x2c
    Aug 16 17:15:42 localhost kernel: [<ffffffff81cf36f8>] x86_64_start_kernel+0x108/0x117
    Aug 16 17:15:42 localhost kernel: ---[ end trace ecf370eefc6ae3f5 ]---
    Aug 16 17:15:42 localhost ifplugd(enp8s0)[1080]: Link beat lost.
    Aug 16 17:15:45 localhost ifplugd(enp8s0)[1080]: Link beat detected.
    Aug 16 17:15:46 localhost kernel: igb 0000:08:00.0 enp8s0: igb: enp8s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    [root@localhost ~]# ethtool --test enp8s0
    The test result is FAIL
    The test extra info:
    Register test (offline) 8552
    Eeprom test (offline) 0
    Interrupt test (offline) 4
    Loopback test (offline) 13
    Link test (on/offline) 1
    [root@localhost ~]# ethtool enp8s0
    Settings for enp8s0:
    Supported ports: [ TP ]
    Supported link modes: 10baseT/Half 10baseT/Full
    100baseT/Half 100baseT/Full
    1000baseT/Full
    Supported pause frame use: Symmetric
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full
    100baseT/Half 100baseT/Full
    1000baseT/Full
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: off (auto)
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x0000ffff (65535)
    drv probe link timer ifdown ifup rx_err tx_err tx_queued intr tx_done rx_status pktdata hw wol 0x8000
    Link detected: yes
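    For anyone reading the dump above: the sketch below pulls the hex fields out of each "Detected Tx Unit Hang" report and interprets the ethtool self-test lines. It assumes a 256-descriptor TX ring (igb's usual default; the actual size isn't in the log) and the usual ethtool convention that a self-test result of 0 means pass and anything else means failure. The helper names are hypothetical, and SAMPLE_HANG is a trimmed copy of the first two reports.

```python
import re

# Hex fields printed in each igb "Detected Tx Unit Hang" report.
HEX_FIELDS = ("TDH", "TDT", "next_to_use", "next_to_clean",
              "time_stamp", "jiffies")

# Assumption: 256 TX descriptors per ring (igb's usual default);
# the real ring size is not shown in the log.
RING_SIZE = 256

# Trimmed copy of the first two hang reports from the log.
SAMPLE_HANG = """\
igb 0000:08:00.0: Detected Tx Unit Hang
Tx Queue <0>
TDH <fe>
TDT <3>
next_to_use <3>
next_to_clean <fe>
buffer_info[next_to_clean]
time_stamp <10002c03c>
next_to_watch <ffff88007a9bdfe0>
jiffies <10002c118>
desc.status <1098000>
igb 0000:08:00.0: Detected Tx Unit Hang
Tx Queue <0>
TDH <fe>
TDT <3>
next_to_use <3>
next_to_clean <fe>
buffer_info[next_to_clean]
time_stamp <10002c03c>
next_to_watch <ffff88007a9bdfe0>
jiffies <10002c1e0>
desc.status <1098000>
"""

SAMPLE_SELFTEST = """\
The test result is FAIL
The test extra info:
Register test (offline) 8552
Eeprom test (offline) 0
Interrupt test (offline) 4
Loopback test (offline) 13
Link test (on/offline) 1
"""


def parse_hang_dump(text: str) -> list[dict]:
    """Return one dict of field -> integer value per hang report."""
    reports = []
    for chunk in text.split("Detected Tx Unit Hang")[1:]:
        fields = {}
        for name in HEX_FIELDS:
            # Matches e.g. "TDH <fe>" but not "buffer_info[next_to_clean]".
            m = re.search(rf"\b{name}\s*<([0-9a-fA-F]+)>", chunk)
            if m:
                fields[name] = int(m.group(1), 16)
        reports.append(fields)
    return reports


def stalled_jiffies(report: dict) -> int:
    """Jiffies elapsed since the buffer at next_to_clean was queued."""
    return report["jiffies"] - report["time_stamp"]


def pending_descriptors(report: dict, ring_size: int = RING_SIZE) -> int:
    """Descriptors queued between hardware head (TDH) and tail (TDT)."""
    return (report["TDT"] - report["TDH"]) % ring_size


def failed_selftests(text: str) -> list[str]:
    """Self-tests whose result is non-zero (0 conventionally means pass)."""
    pattern = r"^\s*(.+?) \((?:on/)?offline\)\s+(\d+)\s*$"
    return [m.group(1) for m in re.finditer(pattern, text, re.M)
            if int(m.group(2)) != 0]


if __name__ == "__main__":
    for r in parse_hang_dump(SAMPLE_HANG):
        print(f"TDH={r['TDH']:#x} TDT={r['TDT']:#x} "
              f"pending={pending_descriptors(r)} "
              f"stalled={stalled_jiffies(r)} jiffies")
    print("failed self-tests:", failed_selftests(SAMPLE_SELFTEST))
```

    Across all four reports in the log, TDH (the hardware head) stays at 0xfe and next_to_clean matches it, while jiffies advances by 200 between reports logged about 2 s apart (consistent with HZ=100, though the syslog timestamps only have 1 s resolution). A head pointer that never moves means the NIC stopped processing the queued descriptors (5 of them, assuming a 256-entry ring); the self-test then shows the register, interrupt, loopback and link tests failing, with only the EEPROM test passing.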

     
  • Maarten Vanraes

    Maarten Vanraes - 2014-08-16

    TBH, I'm probably gonna send this one back soon... too many things are going wrong. Even though I've now configured auto-reboot for these failures, rebooting every couple of hours is not really fun...

     
  • Todd Fujinaka

    Todd Fujinaka - 2014-08-18

    Let me know when you decide.

    Thanks.

     
  • Maarten Vanraes

    Maarten Vanraes - 2014-08-19

    Sometimes I seem to get failures where the power seems off-ish, and then the IPMI board isn't working either...

    I'm sending it back, though I presume I'll get an identical board in return.

    Which is why I asked if you guys needed some extra info, or could perhaps tell me for sure that "PCIe link lost" means something went wrong on the mainboard...
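    One way to check whether the device itself has stopped answering is to read its PCI config space after the error. As far as I know, igb prints "PCIe link lost" when register reads come back as all ones (0xFFFFFFFF), which is what the host gets when a PCIe device doesn't respond; a config-space vendor ID of 0xFFFF is the equivalent symptom. The sketch assumes Linux sysfs and the 0000:08:00.0 address from the log above; the function names are hypothetical.

```python
import struct

# Assumption: the NIC's PCI address, taken from the "igb 0000:08:00.0"
# prefix in the kernel log; change it for the other port.
PCI_ADDR = "0000:08:00.0"


def ids_from_config(raw: bytes) -> tuple[int, int]:
    """Vendor and device ID (little-endian) from the start of config space."""
    vendor, device = struct.unpack_from("<HH", raw, 0)
    return vendor, device


def looks_unreachable(raw: bytes) -> bool:
    """An all-ones vendor ID means config reads are not reaching the device."""
    vendor, _ = ids_from_config(raw)
    return vendor == 0xFFFF


def read_config(pci_addr: str = PCI_ADDR) -> bytes:
    """Read the first 4 bytes of config space via Linux sysfs.

    The sysfs "config" file is read from the device on each access, and
    non-root users may read the first 64 bytes, enough for the IDs.
    """
    with open(f"/sys/bus/pci/devices/{pci_addr}/config", "rb") as f:
        return f.read(4)


if __name__ == "__main__":
    try:
        raw = read_config()
    except OSError as exc:
        print(f"{PCI_ADDR}: cannot read config space ({exc})")
    else:
        vendor, device = ids_from_config(raw)
        state = "NOT responding" if looks_unreachable(raw) else "responding"
        print(f"{PCI_ADDR}: vendor={vendor:#06x} device={device:#06x} ({state})")
```

    An all-ones vendor ID while the rest of the system keeps running would point at the link, the NIC, or its power rather than at the driver; a normal vendor ID only shows that config reads work at the moment you run it.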

     
  • Mike

    Mike - 2014-10-01

    Maarten

    I also have the ASRock 2750, and I'm seeing exactly the same issues you are.

    For me the 5.2.9.3 drivers have been the most stable to date.

    Please let me know how your replacement board gets on, as this has been driving me insane for some time.

    I might add, however, that I tried Windows Server 2008 on this board with the drivers from the ASRock site, and it was completely stable. I just don't want to use Windows.

    Regards

    Mike

     
    Last edit: Mike 2014-10-01
  • Maarten Vanraes

    Maarten Vanraes - 2014-10-02

    I notice the original post wasn't that clear... I actually have the 2550 board; I had posted in the 2750 bug thread and was asked to create a new bug report.

    The replacement board has been shipped back; I only need to pick it up at the post office now, which isn't open much :-( . Since it came sooner than I expected, I'm gonna test the 3V line first to make sure it's actually fixed...

     
  • Mike

    Mike - 2014-10-02

    Yeah, I mean it takes longer, but it always goes eventually.

    If I stream from it, I can get it to last a few weeks. If I hammer it with a disk copy, sometimes I can copy 1 TB, other times much less.

    And it's only the NIC that goes; the system is still responsive.

     
  • Maarten Vanraes

    Maarten Vanraes - 2014-10-02

    Well, yeah, but you still have to reboot to get it working again...

    Since this is my firewall, this is insanely annoying...

     
  • Todd Fujinaka

    Todd Fujinaka - 2015-03-11
    • status: open --> closed
     
