From: Jon K. <jo...@nu...> - 2024-08-23 18:06:29
|
Hi e1000/igb folks,

Reaching out about a compile-time error with the IGB driver, kernel 6.6, and trying to compile with -Werror.

In the Intel side of the IGB source, there is a flush_scheduled_work() in igb_remove, which according to git blame from the GitHub side has been there since 1.0.1:
https://github.com/intel/ethernet-linux-igb/blame/d4658bb8811ea60deb8d3398e8682b64dc0e1f07/src/igb_main.c#L3429

This same flush is not present in the mainline driver:
https://github.com/torvalds/linux/blame/master/drivers/net/ethernet/intel/igb/igb_main.c#L3860

This flush now produces a compile-time warning, which turns into a failure with -Werror. There was an announcement on LKML about this for in-tree users a while back, here:
https://lore.kernel.org/all/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp/T/#u

Compiling without -Werror and using the driver is just fine, but I wanted to see if this issue has been raised before and if there was any harm in simply removing this call (as the mainline driver appears to be working just fine without it)?

Thanks,
Jon

In file included from ./include/linux/srcu.h:21,
                 from ./include/linux/notifier.h:16,
                 from ./arch/x86/include/asm/uprobes.h:13,
                 from ./include/linux/uprobes.h:49,
                 from ./include/linux/mm_types.h:16,
                 from ./include/linux/buildid.h:5,
                 from ./include/linux/module.h:14,
                 from /builddir/build/BUILD/igb-5.16.9/src/igb_main.c:4:
/builddir/build/BUILD/igb-5.16.9/src/igb_main.c: In function 'igb_remove':
./include/linux/workqueue.h:639:2: error: call to '__warn_flushing_systemwide_wq' declared with attribute warning: Please avoid flushing system-wide workqueues. [-Werror]
  __warn_flushing_systemwide_wq(); \
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/builddir/build/BUILD/igb-5.16.9/src/igb_main.c:3431:2: note: in expansion of macro 'flush_scheduled_work'
  flush_scheduled_work();
  ^~~~~~~~~~~~~~~~~~~~
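For comparison, the general shape of what the mainline igb driver does in igb_remove() instead of flushing the system-wide workqueue is to stop its own timers and cancel its own work items. A minimal sketch follows; the field names (watchdog_timer, phy_info_timer, reset_task, watchdog_task) are taken from the upstream driver and are assumptions for the out-of-tree source, so treat this as an illustration rather than a drop-in replacement for the flush_scheduled_work() call.

```
/*
 * Sketch of upstream-style teardown in igb_remove(): stop the timers
 * that schedule work, then cancel this driver's own work items rather
 * than flushing the system-wide workqueue.  Field names follow the
 * mainline igb driver and are assumed for the out-of-tree source.
 */
set_bit(__IGB_DOWN, &adapter->state);

del_timer_sync(&adapter->watchdog_timer);
del_timer_sync(&adapter->phy_info_timer);

cancel_work_sync(&adapter->reset_task);
cancel_work_sync(&adapter->watchdog_task);
```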
From: Rustad, M. D <mar...@in...> - 2024-06-14 17:24:02
|
> On May 20, 2024, at 3:33 AM, Alexander Kokorin <zuk...@gm...> wrote:
>
> We have noticed that when we receive TCP packets with the wrong
> checksum from the internet, on the receiving node it goes through,
> NIC compares the checksum, lets packet further and increases the
> kernel counter for RX ERR. It doesn't make sense as nothing can be
> done with such packets.

Don't be so sure. Years ago, when I worked for another company, another team had a product that handled networking traffic. Their product dropped packets that had bad TCP checksums. There were certain features in some software that simply would not work. They spent months trying to figure out why the features failed only when traffic went through their product. Eventually they realized that there were always some TCP checksum errors when that software was used. They stopped dropping the packets with bad checksums, and then it worked fine. Yes, there is (or at least was?) software that abused the TCP checksum to pass some other data through the connection.

The point of the hardware checksum check is to allow software to not have to do it when the checksum is good - to optimize the normal case. It is not to drop the bad packets at a lower level, hiding them from TCP.

--
Mark Rustad (he/him), Ethernet Products Group, Intel Corporation
From: Alexander K. <zuk...@gm...> - 2024-05-20 10:41:45
|
Hello,

We are using E810 NICs in our work, and mostly they are used to pass through network traffic using QinQ. We have noticed that when we receive TCP packets with the wrong checksum from the internet, on the receiving node the packet goes through: the NIC compares the checksum, lets the packet further, and increases the kernel counter for RX ERR. It doesn't make sense, as nothing can be done with such packets.

During the debugging process we found a place in the code where it is not working correctly; we think it goes straight to checksum_fail:

```
if (ipv4 && (rx_status0 & (BIT(ICE_RX_FLEX_DESC_STATUS0_XSUM_IPE_S))))
	goto checksum_fail;
if (ipv6 && (rx_status0 & (BIT(ICE_RX_FLEX_DESC_STATUS0_IPV6EXADD_S))))
	goto checksum_fail;

/* check for L4 errors and handle packets that were not able to be
 * checksummed due to arrival speed
 */
if (rx_status0 & BIT(ICE_RX_FLEX_DESC_STATUS0_XSUM_L4E_S))
	goto checksum_fail;

/* check for outer UDP checksum error in tunneled packets */
if ((rx_status1 & BIT(ICE_RX_FLEX_DESC_STATUS1_NAT_S)) &&
    (rx_status0 & BIT(ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S)))
	goto checksum_fail;
```

instead of going to the next section, where the checksum for tunneled packets is marked unnecessary and shouldn't increase the counter:

```
/* Only report checksum unnecessary for TCP, UDP, or SCTP */
switch (decoded.inner_prot) {
case ICE_RX_PTYPE_INNER_PROT_TCP:
case ICE_RX_PTYPE_INNER_PROT_UDP:
case ICE_RX_PTYPE_INNER_PROT_SCTP:
	skb->ip_summed = CHECKSUM_UNNECESSARY;
```

One way to deal with it is to disable rx checksumming in ethtool, but we don't want to lose the monitoring for "normal" L2 packets.

Here is an example of such a packet (the only difference is the src and dst IPs and the checksum):

Frame 2088: 70 bytes on wire (560 bits), 70 bytes captured (560 bits) Encapsulation type: Ethernet (1) UTC Arrival Time: May 13, 2024 13:12:17.815209000 UTC [Time shift for this packet: 0.000000000 seconds] [Time delta from previous captured frame: 0.054763000 seconds] [Time delta from previous displayed frame: 3.864796000 seconds] [Time since reference or first frame: 5.132255000 seconds] Frame Number: 2088 Frame Length: 70 bytes (560 bits) Capture Length: 70 bytes (560 bits) [Frame is marked: False] [Frame is ignored: False] [Protocols in frame: eth:ethertype:vlan:ethertype:ip:tcp] [Coloring Rule Name: Checksum Errors] [Coloring Rule String [truncated]: eth.fcs.status=="Bad" ip.checksum.status=="Bad" tcp.checksum.status=="Bad" udp.checksum.status=="Bad" sctp.checksum.status=="Bad" mstp.checksum.status=="Bad" cdp.checksum.status=="Bad" ||]

Ethernet II, Src: , Dst: Destination: Source: Type: 802.1Q Virtual LAN (0x8100)

802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 258 000. .... .... .... = Priority: Best Effort (default) (0) ...0 .... .... .... = DEI: Ineligible .... 0001 0000 0010 = ID: 258 Type: IPv4 (0x0800)

Internet Protocol Version 4, Src: , Dst: 0100 .... = Version: 4 .... 0101 = Header Length: 20 bytes (5) Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) Total Length: 52 Identification: 0x1763 (5987) 000. .... = Flags: 0x0 ...0 0000 0000 0000 = Fragment Offset: 0 Time to Live: 119 Protocol: TCP (6) Header Checksum: 0x42c9 [correct] [Header checksum status: Good] [Calculated Checksum: 0x42c9] Source Address: Destination Address:

Transmission Control Protocol, Src Port: 64455, Dst Port: 22, Seq: 0, Len: 0 Source Port: 64455 Destination Port: 22 [Stream index: 2] [Conversation completeness: Incomplete, SYN_SENT (1)] [TCP Segment Len: 0] Sequence Number: 0 (relative sequence number) Sequence Number (raw): 4100199712 [Next Sequence Number: 1 (relative sequence number)] Acknowledgment Number: 0 Acknowledgment number (raw): 0 1000 .... = Header Length: 32 bytes (8) Flags: 0x002 (SYN) Window: 64240 [Calculated window size: 64240] Checksum: 0x9bb8 incorrect, should be 0x8524 (maybe caused by "TCP checksum offload"?) [Expert Info (Error/Checksum): Bad checksum [should be 0x8524]] [Bad checksum [should be 0x8524]] [Severity level: Error] [Group: Checksum] [Calculated Checksum: 0x8524] [Checksum Status: Bad] Urgent Pointer: 0

Options: (12 bytes), Maximum segment size, No-Operation (NOP), Window scale, No-Operation (NOP), No-Operation (NOP), SACK permitted TCP Option - Maximum segment size: 1460 bytes TCP Option - No-Operation (NOP) TCP Option - Window scale: 8 (multiply by 256) TCP Option - No-Operation (NOP) TCP Option - No-Operation (NOP) TCP Option - SACK permitted

[Timestamps] [Time since first frame in this TCP stream: 0.000000000 seconds] [Time since previous frame in this TCP stream: 0.000000000 seconds]

Looking forward to your reply on that matter.

--
Kind regards
Alexander Kokorin
From: Ross V. <ro...@ka...> - 2024-03-26 21:48:34
|
Hello,

Is the patch at [1] relevant to the out-of-tree ice driver? It was merged into upstream linux 6.1, but doesn't seem to have been applied out of tree. I checked 1.11.14 and 1.13.7.

That patch came up while investigating an error from the out-of-tree 1.11.14:

[ 2.731841] ice 0000:05:00.0: ice_init_interrupt_scheme failed: -34
[ 2.730619] ice 0000:05:00.0: not enough device MSI-X vectors. requested = 44, available = 1

Thanks,
Ross

[1] - https://lore.kernel.org/netdev/202...@in.../
From: Billie A. (balsup) <ba...@ci...> - 2024-02-13 16:44:03
|
> In the latest i40e 2.24.6, there is inconsistent usage of conditionals leading to compilation errors.

I have resolved the i40e compilation problem for the 6.6.9 kernel with the attached patch. I basically swapped the order of CONFIG_PCI_IOV and CONFIG_DCB, and moved two functions outside of the CONFIG_DCB conditional.

The separate stub of pci_disable_pcie_error_reporting is because this function was deleted in 6.6. I'm not positive that this is the correct solution. I am just guessing that the functionality was added in the pci bus path somewhere. I had to do similar with the ixgbe driver.
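The attached patch itself is not in the archive, so the following is only a hedged sketch of the kind of compatibility stub described above: on kernels where the PCIe AER helpers were removed (6.6, per the message), an out-of-tree driver can supply no-op replacements with the old signatures, since the PCI core handles error-reporting enablement itself on those kernels. The version boundary, placement, and the decision to stub both the enable and disable calls (the enable variant is the one the ixgbe report later in this archive also trips over) are assumptions, not the actual fix.

```
/*
 * Hedged compatibility-stub sketch, not the attached patch.  On kernels
 * that no longer export the PCIe AER helpers, define no-op versions
 * with the old int-returning signatures so existing call sites compile.
 */
#include <linux/version.h>
#include <linux/pci.h>

#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 6, 0)	/* boundary per the message; assumption */
static inline int pci_enable_pcie_error_reporting(struct pci_dev *dev)
{
	return 0;	/* AER reporting is managed by the PCI core here */
}

static inline int pci_disable_pcie_error_reporting(struct pci_dev *dev)
{
	return 0;
}
#endif
```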
From: Billie A. (balsup) <ba...@ci...> - 2024-02-13 01:01:06
|
In the latest i40e 2.24.6, there is inconsistent usage of conditionals leading to compilation errors. I am porting to the 6.6.9 kernel.

In source src/i40e_virtchnl_pf.c, the function i40e_set_link_state is defined/implemented under three conditions, which must all be met (line 6663):

#ifdef HAVE_NDO_SET_VF_LINK_STATE
#ifdef CONFIG_DCB
#ifdef CONFIG_PCI_IOV

However, it is subsequently used by function i40e_set_vf_enable under the single conditional at line 7595:

#ifdef CONFIG_PCI_IOV

Similarly, it is set in the i40e_vfd_ops table under the same conditional at line 9490:

#ifdef CONFIG_PCI_IOV
	.get_link_state = i40e_get_link_state,
	.set_link_state = i40e_set_link_state,
#endif

The inconsistency leads to errors during compilation if CONFIG_PCI_IOV is defined and either HAVE_NDO_SET_VF_LINK_STATE or CONFIG_DCB is not. In my case, CONFIG_DCB is not enabled, and I would prefer not to enable it in my kernel config.

What is the recommended solution? It is not clear to me why CONFIG_DCB is necessary. Should the two usages under a single #ifdef CONFIG_PCI_IOV be changed to require all three? e.g.

#if defined(HAVE_NDO_SET_VF_LINK_STATE) && defined(CONFIG_DCB) && defined(CONFIG_PCI_IOV)

Or perhaps can the two functions i40e_set_link_state and i40e_configure_vf_link be moved outside of the CONFIG_DCB check? Or is there another recommended solution?
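For what it's worth, the first option asked about above just means repeating the combined guard at the two call sites. A minimal sketch of the i40e_vfd_ops entry with that guard, assembled from the fragments quoted in this message and untested against the 2.24.6 tree, would be:

```
/*
 * Sketch of the "require all three" option: the call sites use the
 * same combined guard as the definition of i40e_set_link_state.
 * Assembled from the fragments quoted above; untested.
 */
#if defined(HAVE_NDO_SET_VF_LINK_STATE) && defined(CONFIG_DCB) && \
    defined(CONFIG_PCI_IOV)
	.get_link_state = i40e_get_link_state,
	.set_link_state = i40e_set_link_state,
#endif
```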
From: Skyler M. <sm+...@sk...> - 2024-01-18 21:26:25
|
Hi there,

As we can see here https://github.com/samipsolutions/vyos-build/actions/runs/7564805403/job/20599526418#step:20:117 I'm unable to compile the driver due to it complaining about `pci_disable_pcie_error_reporting` and the enable version of that too. This is with ixgbe 5.19.9 source, and 5.19.6 at least used to compile.

I think it's the result of `-Werror=implicit-function-declaration`, but I'm not sure where it gets that, as I'm using the vyos build container for this.

Any ideas as to what to try to fix it would be much appreciated.

Skyler
From: Jesse B. <jes...@in...> - 2024-01-18 20:32:58
|
On 1/11/2024 4:21 AM, Kum...@sw... wrote:
> Hi,
>
> Unlike the previous releases for the drivers, we don’t see the column for version anymore for the Intel(R) Gigabit Ethernet Network Driver in SUSE Linux (SLE15-SP4) when doing modinfo igb.
>
> Is this something expected and if yes, is there any other way to get the igb driver versions? For older SUSE installations we have, we see 5.6.0-k or something like that for igb driver versions.

Hi Mohit,

The Intel out-of-tree (OOT) drivers (like the one you download from sourceforge) have a version number in them, but in the upstream, version numbers were removed by the kernel community, and the version is equivalent to the kernel the driver was released with. If there was some reason you thought you needed a driver version, please let us know.

The reason the kernel community removed the driver versions from upstream (and therefore from consumers of upstream, like the SLES distro you mention) is that the version numbers were misleading, wrong, or not kept up to date. Basically, the idea is that comparing in-kernel to OOT using a version number is not a good idea, as the drivers are not the same; they're two different products released at different times, with differing functionality.

If you need the specific upstream commit that igb was updated to in the SLE15 SP4 release, please contact SuSE.

Hope this helps!
Jesse
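The difference Jesse describes shows up in the module metadata: the out-of-tree source exports an explicit version string, while the in-tree driver no longer does, so modinfo prints no version line and the module is effectively identified by the kernel release it shipped with. A minimal illustration (not the actual igb source; the version string is a placeholder) is:

```
#include <linux/module.h>

/* Out-of-tree style: an explicit version string becomes the
 * "version:" line shown by `modinfo igb`. */
#define DRV_VERSION "x.y.z"		/* placeholder, not a real release */
MODULE_VERSION(DRV_VERSION);

/* In-tree style: MODULE_VERSION() is omitted entirely, so modinfo
 * shows no "version:" line and the driver is identified by the
 * kernel it shipped with (the vermagic field). */
```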
From: <Kum...@sw...> - 2024-01-11 12:34:32
|
Hi,

Unlike the previous releases for the drivers, we don’t see the column for version anymore for the Intel(R) Gigabit Ethernet Network Driver in SUSE Linux (SLE15-SP4) when doing modinfo igb.

Is this something expected, and if yes, is there any other way to get the igb driver versions? For older SUSE installations we have, we see 5.6.0-k or something like that for igb driver versions.

Br,
Mohit
From: Pierre S. <psa...@ex...> - 2024-01-08 14:57:39
|
Hi,

The attached patch fixes the issue below with 5.19.9.

Thanks,
Pierre

From: Pierre Sangouard
Sent: Thursday, August 31, 2023 17:53
To: e10...@li...
Subject: ixgbe driver version 5.19.6 build error

Hi,

Building ixgbe driver version 5.19.6 for kernel 4.9.337 with CONFIG_I40E_DISABLE_PACKET_SPLIT=1, I get the following failure:

env -u KERNELRELEASE make -C ixgbe-5.19.6/src KSRC=my_linux_directory EXTRA_CFLAGS=-DCONFIG_IXGBE_DISABLE_PACKET_SPLIT=1 INSTALL_MOD_DIR=extra || exit 1;
make[1]: Entering directory 'my_driver_directory'
filtering include/linux/dev_printk.h out
filtering include/net/flow_keys.h out
filtering include/net/flow_offload.h out
all files (for given query) filtered out
filtering include/linux/device/class.h out
all files (for given query) filtered out
filtering include/linux/gnss.h out
all files (for given query) filtered out
filtering include/linux/jump_label_type.h out
filtering include/linux/jump_label_type.h out
make[2]: Entering directory 'my_linux_directory'
  CC [M]  my_driver_directory/ixgbe_main.o
my_driver_directory/ixgbe_main.c: In function 'ixgbe_configure_rx_ring':
my_driver_directory/ixgbe_main.c:4423:20: error: implicit declaration of function 'ixgbe_rx_offset'; did you mean 'ixgbe_rx_bufsz'? [-Werror=implicit-function-declaration]
  ring->rx_offset = ixgbe_rx_offset(ring);
                    ^~~~~~~~~~~~~~~
                    ixgbe_rx_bufsz
cc1: some warnings being treated as errors
scripts/Makefile.build:307: recipe for target 'my_driver_directory/ixgbe_main.o' failed
make[3]: *** [my_driver_directory/ixgbe_main.o] Error 1
Makefile:1544: recipe for target '_module_my_driver_directory' failed
make[2]: *** [_module_my_driver_directory] Error 2
make[2]: Leaving directory 'my_linux_directory'
Makefile:100: recipe for target 'default' failed
make[1]: *** [default] Error 2
make[1]: Leaving directory 'my_driver_directory'

Any idea how to fix it?

Thanks,
Pierre

Pierre Sangouard
Integration Manager / Extreme Networks
psa...@ex...
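The attached patch is not reproduced in the archive, so the following is only a guess at the shape of the fix: ixgbe_rx_offset() appears to be built only when the packet-split/build-skb receive path is compiled in, so with CONFIG_IXGBE_DISABLE_PACKET_SPLIT the call site in ixgbe_configure_rx_ring() has to be guarded (or a trivial fallback supplied). Both the guard and the zero fallback below are assumptions, not the actual patch.

```
/*
 * Hedged sketch only -- not the attached patch.  Guard the call site
 * flagged by the compiler so the legacy (packet-split-disabled) build
 * does not reference ixgbe_rx_offset().
 */
#ifndef CONFIG_IXGBE_DISABLE_PACKET_SPLIT
	ring->rx_offset = ixgbe_rx_offset(ring);
#else
	ring->rx_offset = 0;	/* assumption: no headroom offset on the legacy path */
#endif
```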
From: stefanx <st...@lr...> - 2023-12-19 17:09:10
|
Hello,

one of our servers crashes regularly, apparently during heavy network load. The log files are then full of this message:

kernel: [514257.305733] i40e 0000:02:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0020 address=0x79ea8113f60 flags=0x0000]

This is the driver:

i40e: Intel(R) Ethernet Connection XL710 Network Driver
i40e: Copyright (c) 2013 - 2019 Intel Corporation.
i40e 0000:02:00.0: fw 8.5.67516 api 1.15 nvm 8.50 0x8000be1e 1.3295.0 [8086:15ff] [15d9:1c76]
i40e 0000:02:00.0: MAC address: 7c:c2:55:9d:d2:78
i40e 0000:02:00.0: FW LLDP is enabled
i40e 0000:02:00.0 eth0: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
i40e 0000:02:00.0: PCI-Express: Speed 8.0GT/s Width x4
i40e 0000:02:00.0: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
i40e 0000:02:00.0: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.
i40e 0000:02:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 119 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
i40e 0000:02:00.1: fw 8.5.67516 api 1.15 nvm 8.50 0x8000be1e 1.3295.0 [8086:15ff] [15d9:1c76]
i40e 0000:02:00.1: MAC address: 7c:c2:55:9d:d2:79
i40e 0000:02:00.1: FW LLDP is enabled
i40e 0000:02:00.1: PCI-Express: Speed 8.0GT/s Width x4
i40e 0000:02:00.1: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
i40e 0000:02:00.1: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.
i40e 0000:02:00.1: Features: PF-id[1] VFs: 64 VSIs: 66 QP: 119 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
i40e 0000:02:00.0 enp2s0f0: renamed from eth0
i40e 0000:02:00.1 enp2s0f1: renamed from eth1
i40e 0000:02:00.0: entering allmulti mode.

These messages stand out there:

i40e 0000:02:00.1: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
i40e 0000:02:00.1: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.

Does somebody have any idea? GRUB_CMDLINE_LINUX_DEFAULT="iommu=soft" is sometimes recommended in similar cases with IO_PAGE_FAULT. Maybe I should lower the speed from 10 Gbit/s to 1 Gbit/s as a test?

Thanks
Stefan
From: Assaf A. <as...@qw...> - 2023-12-10 10:21:01
|
Thank you, Donald, Jesse, and all, for helping us. We'll go through the responses you gave us and let you know if we have any questions or findings. Assaf, On Thu, Dec 7, 2023 at 9:56 PM Brandeburg, Jesse <jes...@in...> wrote: > Hi Assaf, and thanks Don for mentioning the Cisco link. > > > > I had a further look at the stats and see this: > > mac_local_faults.nic: 0 > > mac_remote_faults.nic: 1 > > > > on both the sender and receiver stats. Remote fault means the switch RX > PCS failed to maintain locked state (far end of the cable away from our > adapter). This might help you switch team or cisco figure out what is going > on. > > > > In this case I don’t think it’s the driver or the local end firmware, but > I would strongly suggest that you update the firmware to a newer version on > (some of) your cards, and you can get the updated firmware from Cisco. > > > > So, I’d be asking, why is the switch cycling or dropping the link? Hope > this helps! > > > > Jesse > > > > *From:* Buchholz, Donald <don...@in...> > *Sent:* Thursday, December 7, 2023 11:05 AM > *To:* Assaf Albo <as...@qw...> > *Cc:* Brandeburg, Jesse <jes...@in...>; > e10...@li...; Matan Levy <ma...@qw...>; Itamar > Maron <it...@qw...> > *Subject:* RE: [e1000-devel] Intel E810 100Gb goes down sporadically > > > > Hi Assaf, > > > > Thank you for the data. I see from the data files you included that > > you are working with a Cisco-branded E810-CQDA2 NIC. > > > > As this is a Cisco supported NIC, have you consulted Cisco support > > and configured your system with Cisco-approved firmware/vendor > > versions? > > > > I do not support the Cisco products, but I see immediately that the > > NIC FW is revision 2.25. The ice driver v1.9.11 was developed at > > Intel for use with 4.xx firmware. > > > > Please contact Cisco. If it is a problem that they cannot resolve the > matter, they will reach out to the appropriate Intel support team > > for this product. > > > > Best regards, > > - Don > > > > > > *From:* Assaf Albo <as...@qw...> > *Sent:* Wednesday, December 6, 2023 3:34 AM > *To:* Buchholz, Donald <don...@in...> > *Cc:* Brandeburg, Jesse <jes...@in...>; > e10...@li...; Matan Levy <ma...@qw...>; Itamar > Maron <it...@qw...> > *Subject:* Re: [e1000-devel] Intel E810 100Gb goes down sporadically > > > > Hey guys, > > Firstly, I'd like to thank you all for helping us out. > > Attached to this mail are two files with all the statistics (client > machine + server machine). > > > > > > > > > > *"The passthrough device shouldn't be any problem but I do recommend that > if you're passing through the device to a VM, you try to match the > destination PCIe function number to the origination ID to prevent odd > issues. like if your host device is: 01:00.1 then (I'm not sure you can do > this) I'd hope the VM device is 00:06.1, and not 00:06.0"* > > Exactly what we are doing, we are matching. > You can see in the attached files that one of the machines is working with > eth0 00:06.0 and the other eth1 00:06.1 > > > > *"Also, do you see any stats or events on the switch side when link is > lost?"* > > We use Cisco Nexus switches, and our network engineer said that he > sees events of link down from the ports. > > > > On Wed, Dec 6, 2023 at 6:42 AM Buchholz, Donald <don...@in...> > wrote: > > Hi Assaf, > > In addition to the commands listed by Jesse, > please also provide "ethtool -i <eth#>" output. > This will assist us in identifying the NIC and > Firmware revision you are using. 
> > - Don > > > > -----Original Message----- > > From: Jesse Brandeburg <jes...@in...> > > Sent: Tuesday, December 5, 2023 10:47 AM > > To: Assaf Albo <as...@qw...>; e10...@li...; > Matan > > Levy <ma...@qw...> > > Subject: Re: [e1000-devel] Intel E810 100Gb goes down sporadically > > > > On 12/3/2023 1:26 AM, Assaf Albo via E1000-devel wrote: > > > Hello guys, > > > > > > We are having constant network issues in production in that the link > goes > > > down, waits *exactly* 7-8 seconds, and goes up again. > > > This can happen zero to a few times a day on all our servers; they are > not > > > in the same location and are connected to different network devices. > > > > > > Each server runs as a KVM virtual machine with 60 CPUs (Pinning) and > 224Gi > > > (Huge pages) - overall performance is excellent. > > > The NIC is PCI passed through to the KVM machine AS IS. > > > OS Rocky Linux 8.5, kernel 4.18.0-348.23.1.el8_5.x86_64 with Intel ice > > > 1.9.11 built and installed using rpm. > > > We have a traffic generator between two servers (our app: > client+server) > > > that is reaching 94Gb and can replicate this issue. > > > > > > The dmesg once the issue occur: > > > Nov 28 16:01:27 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is Down > > > Nov 28 16:01:35 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is up > 100 > > > Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, > Autoneg > > > Advertised: Off, Autoneg Negotiated: False, Flow Control: None > > > > Hi Assaf, sorry hear you're having problems. > > > > w.r.t. the link down events we need to determine if it is a local down > > or remote. > > > > Please gather the 'ethtool -S eth0' statistics for a system that has had > > some problems, and send to the list as text. > > > > also, 'ethtool -m eth0' > > > > The passthrough device shouldn't be any problem but I do recommend that > > if you're passing through the device to a VM, you try to match the > > destination PCIe function number to the origination ID to prevent odd > > issues. > > > > like if your host device is: > > 01:00.1 then (I'm not sure you can do this) I'd hope the VM device is > > 00:06.1, and not 00:06.0 > > > > So I guess with that statement I'd ask do you ever see the problem on > > systems with > > 3b:00.0 (ice PF PCIe in host) > > 00:06.0 (ice PF in VM) > > > > having the link down issues? > > > > Please include output from devlink dev info, and if you know it, what > > switch you're connected to. > > > > Also, do you see any stats or events on the switch side when link is > lost? > > > > - Jesse > > > > > > _______________________________________________ > > E1000-devel mailing list > > E10...@li... > > https://lists.sourceforge.net/lists/listinfo/e1000-devel > > To learn more about Intel Ethernet, visit > > https://community.intel.com/t5/Ethernet-Products/bd-p/ethernet-products > > |
From: Brandeburg, J. <jes...@in...> - 2023-12-07 19:56:22
|
Hi Assaf, and thanks Don for mentioning the Cisco link. I had a further look at the stats and see this: mac_local_faults.nic: 0 mac_remote_faults.nic: 1 on both the sender and receiver stats. Remote fault means the switch RX PCS failed to maintain locked state (far end of the cable away from our adapter). This might help you switch team or cisco figure out what is going on. In this case I don’t think it’s the driver or the local end firmware, but I would strongly suggest that you update the firmware to a newer version on (some of) your cards, and you can get the updated firmware from Cisco. So, I’d be asking, why is the switch cycling or dropping the link? Hope this helps! Jesse From: Buchholz, Donald <don...@in...> Sent: Thursday, December 7, 2023 11:05 AM To: Assaf Albo <as...@qw...> Cc: Brandeburg, Jesse <jes...@in...>; e10...@li...; Matan Levy <ma...@qw...>; Itamar Maron <it...@qw...> Subject: RE: [e1000-devel] Intel E810 100Gb goes down sporadically Hi Assaf, Thank you for the data. I see from the data files you included that you are working with a Cisco-branded E810-CQDA2 NIC. As this is a Cisco supported NIC, have you consulted Cisco support and configured your system with Cisco-approved firmware/vendor versions? I do not support the Cisco products, but I see immediately that the NIC FW is revision 2.25. The ice driver v1.9.11 was developed at Intel for use with 4.xx firmware. Please contact Cisco. If it is a problem that they cannot resolve the matter, they will reach out to the appropriate Intel support team for this product. Best regards, - Don From: Assaf Albo <as...@qw...<mailto:as...@qw...>> Sent: Wednesday, December 6, 2023 3:34 AM To: Buchholz, Donald <don...@in...<mailto:don...@in...>> Cc: Brandeburg, Jesse <jes...@in...<mailto:jes...@in...>>; e10...@li...<mailto:e10...@li...>; Matan Levy <ma...@qw...<mailto:ma...@qw...>>; Itamar Maron <it...@qw...<mailto:it...@qw...>> Subject: Re: [e1000-devel] Intel E810 100Gb goes down sporadically Hey guys, Firstly, I'd like to thank you all for helping us out. Attached to this mail are two files with all the statistics (client machine + server machine). "The passthrough device shouldn't be any problem but I do recommend that if you're passing through the device to a VM, you try to match the destination PCIe function number to the origination ID to prevent odd issues. like if your host device is: 01:00.1 then (I'm not sure you can do this) I'd hope the VM device is 00:06.1, and not 00:06.0" Exactly what we are doing, we are matching. You can see in the attached files that one of the machines is working with eth0 00:06.0 and the other eth1 00:06.1 "Also, do you see any stats or events on the switch side when link is lost?" We use Cisco Nexus switches, and our network engineer said that he sees events of link down from the ports. On Wed, Dec 6, 2023 at 6:42 AM Buchholz, Donald <don...@in...<mailto:don...@in...>> wrote: Hi Assaf, In addition to the commands listed by Jesse, please also provide "ethtool -i <eth#>" output. This will assist us in identifying the NIC and Firmware revision you are using. 
- Don > -----Original Message----- > From: Jesse Brandeburg <jes...@in...<mailto:jes...@in...>> > Sent: Tuesday, December 5, 2023 10:47 AM > To: Assaf Albo <as...@qw...<mailto:as...@qw...>>; e10...@li...<mailto:e10...@li...>; Matan > Levy <ma...@qw...<mailto:ma...@qw...>> > Subject: Re: [e1000-devel] Intel E810 100Gb goes down sporadically > > On 12/3/2023 1:26 AM, Assaf Albo via E1000-devel wrote: > > Hello guys, > > > > We are having constant network issues in production in that the link goes > > down, waits *exactly* 7-8 seconds, and goes up again. > > This can happen zero to a few times a day on all our servers; they are not > > in the same location and are connected to different network devices. > > > > Each server runs as a KVM virtual machine with 60 CPUs (Pinning) and 224Gi > > (Huge pages) - overall performance is excellent. > > The NIC is PCI passed through to the KVM machine AS IS. > > OS Rocky Linux 8.5, kernel 4.18.0-348.23.1.el8_5.x86_64 with Intel ice > > 1.9.11 built and installed using rpm. > > We have a traffic generator between two servers (our app: client+server) > > that is reaching 94Gb and can replicate this issue. > > > > The dmesg once the issue occur: > > Nov 28 16:01:27 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is Down > > Nov 28 16:01:35 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is up 100 > > Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg > > Advertised: Off, Autoneg Negotiated: False, Flow Control: None > > Hi Assaf, sorry hear you're having problems. > > w.r.t. the link down events we need to determine if it is a local down > or remote. > > Please gather the 'ethtool -S eth0' statistics for a system that has had > some problems, and send to the list as text. > > also, 'ethtool -m eth0' > > The passthrough device shouldn't be any problem but I do recommend that > if you're passing through the device to a VM, you try to match the > destination PCIe function number to the origination ID to prevent odd > issues. > > like if your host device is: > 01:00.1 then (I'm not sure you can do this) I'd hope the VM device is > 00:06.1, and not 00:06.0 > > So I guess with that statement I'd ask do you ever see the problem on > systems with > 3b:00.0 (ice PF PCIe in host) > 00:06.0 (ice PF in VM) > > having the link down issues? > > Please include output from devlink dev info, and if you know it, what > switch you're connected to. > > Also, do you see any stats or events on the switch side when link is lost? > > - Jesse > > > _______________________________________________ > E1000-devel mailing list > E10...@li...<mailto:E10...@li...> > https://lists.sourceforge.net/lists/listinfo/e1000-devel > To learn more about Intel Ethernet, visit > https://community.intel.com/t5/Ethernet-Products/bd-p/ethernet-products |
From: Buchholz, D. <don...@in...> - 2023-12-07 19:07:48
|
Hi Assaf, Thank you for the data. I see from the data files you included that you are working with a Cisco-branded E810-CQDA2 NIC. As this is a Cisco supported NIC, have you consulted Cisco support and configured your system with Cisco-approved firmware/vendor versions? I do not support the Cisco products, but I see immediately that the NIC FW is revision 2.25. The ice driver v1.9.11 was developed at Intel for use with 4.xx firmware. Please contact Cisco. If it is a problem that they cannot resolve the matter, they will reach out to the appropriate Intel support team for this product. Best regards, - Don From: Assaf Albo <as...@qw...> Sent: Wednesday, December 6, 2023 3:34 AM To: Buchholz, Donald <don...@in...> Cc: Brandeburg, Jesse <jes...@in...>; e10...@li...; Matan Levy <ma...@qw...>; Itamar Maron <it...@qw...> Subject: Re: [e1000-devel] Intel E810 100Gb goes down sporadically Hey guys, Firstly, I'd like to thank you all for helping us out. Attached to this mail are two files with all the statistics (client machine + server machine). "The passthrough device shouldn't be any problem but I do recommend that if you're passing through the device to a VM, you try to match the destination PCIe function number to the origination ID to prevent odd issues. like if your host device is: 01:00.1 then (I'm not sure you can do this) I'd hope the VM device is 00:06.1, and not 00:06.0" Exactly what we are doing, we are matching. You can see in the attached files that one of the machines is working with eth0 00:06.0 and the other eth1 00:06.1 "Also, do you see any stats or events on the switch side when link is lost?" We use Cisco Nexus switches, and our network engineer said that he sees events of link down from the ports. On Wed, Dec 6, 2023 at 6:42 AM Buchholz, Donald <don...@in...<mailto:don...@in...>> wrote: Hi Assaf, In addition to the commands listed by Jesse, please also provide "ethtool -i <eth#>" output. This will assist us in identifying the NIC and Firmware revision you are using. - Don > -----Original Message----- > From: Jesse Brandeburg <jes...@in...<mailto:jes...@in...>> > Sent: Tuesday, December 5, 2023 10:47 AM > To: Assaf Albo <as...@qw...<mailto:as...@qw...>>; e10...@li...<mailto:e10...@li...>; Matan > Levy <ma...@qw...<mailto:ma...@qw...>> > Subject: Re: [e1000-devel] Intel E810 100Gb goes down sporadically > > On 12/3/2023 1:26 AM, Assaf Albo via E1000-devel wrote: > > Hello guys, > > > > We are having constant network issues in production in that the link goes > > down, waits *exactly* 7-8 seconds, and goes up again. > > This can happen zero to a few times a day on all our servers; they are not > > in the same location and are connected to different network devices. > > > > Each server runs as a KVM virtual machine with 60 CPUs (Pinning) and 224Gi > > (Huge pages) - overall performance is excellent. > > The NIC is PCI passed through to the KVM machine AS IS. > > OS Rocky Linux 8.5, kernel 4.18.0-348.23.1.el8_5.x86_64 with Intel ice > > 1.9.11 built and installed using rpm. > > We have a traffic generator between two servers (our app: client+server) > > that is reaching 94Gb and can replicate this issue. > > > > The dmesg once the issue occur: > > Nov 28 16:01:27 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is Down > > Nov 28 16:01:35 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is up 100 > > Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg > > Advertised: Off, Autoneg Negotiated: False, Flow Control: None > > Hi Assaf, sorry hear you're having problems. 
> > w.r.t. the link down events we need to determine if it is a local down > or remote. > > Please gather the 'ethtool -S eth0' statistics for a system that has had > some problems, and send to the list as text. > > also, 'ethtool -m eth0' > > The passthrough device shouldn't be any problem but I do recommend that > if you're passing through the device to a VM, you try to match the > destination PCIe function number to the origination ID to prevent odd > issues. > > like if your host device is: > 01:00.1 then (I'm not sure you can do this) I'd hope the VM device is > 00:06.1, and not 00:06.0 > > So I guess with that statement I'd ask do you ever see the problem on > systems with > 3b:00.0 (ice PF PCIe in host) > 00:06.0 (ice PF in VM) > > having the link down issues? > > Please include output from devlink dev info, and if you know it, what > switch you're connected to. > > Also, do you see any stats or events on the switch side when link is lost? > > - Jesse > > > _______________________________________________ > E1000-devel mailing list > E10...@li...<mailto:E10...@li...> > https://lists.sourceforge.net/lists/listinfo/e1000-devel > To learn more about Intel Ethernet, visit > https://community.intel.com/t5/Ethernet-Products/bd-p/ethernet-products |
From: Assaf A. <as...@qw...> - 2023-12-06 12:05:25
|
Hey guys, Firstly, I'd like to thank you all for helping us out. Attached to this mail are two files with all the statistics (client machine + server machine). *"The passthrough device shouldn't be any problem but I do recommend thatif you're passing through the device to a VM, you try to match thedestination PCIe function number to the origination ID to prevent oddissues.like if your host device is:01:00.1 then (I'm not sure you can do this) I'd hope the VM device is00:06.1, and not 00:06.0"* Exactly what we are doing, we are matching. You can see in the attached files that one of the machines is working with eth0 00:06.0 and the other eth1 00:06.1 *"Also, do you see any stats or events on the switch side when link is lost?"* We use Cisco Nexus switches, and our network engineer said that he sees events of link down from the ports. On Wed, Dec 6, 2023 at 6:42 AM Buchholz, Donald <don...@in...> wrote: > Hi Assaf, > > In addition to the commands listed by Jesse, > please also provide "ethtool -i <eth#>" output. > This will assist us in identifying the NIC and > Firmware revision you are using. > > - Don > > > > -----Original Message----- > > From: Jesse Brandeburg <jes...@in...> > > Sent: Tuesday, December 5, 2023 10:47 AM > > To: Assaf Albo <as...@qw...>; e10...@li...; > Matan > > Levy <ma...@qw...> > > Subject: Re: [e1000-devel] Intel E810 100Gb goes down sporadically > > > > On 12/3/2023 1:26 AM, Assaf Albo via E1000-devel wrote: > > > Hello guys, > > > > > > We are having constant network issues in production in that the link > goes > > > down, waits *exactly* 7-8 seconds, and goes up again. > > > This can happen zero to a few times a day on all our servers; they are > not > > > in the same location and are connected to different network devices. > > > > > > Each server runs as a KVM virtual machine with 60 CPUs (Pinning) and > 224Gi > > > (Huge pages) - overall performance is excellent. > > > The NIC is PCI passed through to the KVM machine AS IS. > > > OS Rocky Linux 8.5, kernel 4.18.0-348.23.1.el8_5.x86_64 with Intel ice > > > 1.9.11 built and installed using rpm. > > > We have a traffic generator between two servers (our app: > client+server) > > > that is reaching 94Gb and can replicate this issue. > > > > > > The dmesg once the issue occur: > > > Nov 28 16:01:27 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is Down > > > Nov 28 16:01:35 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is up > 100 > > > Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, > Autoneg > > > Advertised: Off, Autoneg Negotiated: False, Flow Control: None > > > > Hi Assaf, sorry hear you're having problems. > > > > w.r.t. the link down events we need to determine if it is a local down > > or remote. > > > > Please gather the 'ethtool -S eth0' statistics for a system that has had > > some problems, and send to the list as text. > > > > also, 'ethtool -m eth0' > > > > The passthrough device shouldn't be any problem but I do recommend that > > if you're passing through the device to a VM, you try to match the > > destination PCIe function number to the origination ID to prevent odd > > issues. > > > > like if your host device is: > > 01:00.1 then (I'm not sure you can do this) I'd hope the VM device is > > 00:06.1, and not 00:06.0 > > > > So I guess with that statement I'd ask do you ever see the problem on > > systems with > > 3b:00.0 (ice PF PCIe in host) > > 00:06.0 (ice PF in VM) > > > > having the link down issues? 
> > > > Please include output from devlink dev info, and if you know it, what > > switch you're connected to. > > > > Also, do you see any stats or events on the switch side when link is > lost? > > > > - Jesse > > > > > > _______________________________________________ > > E1000-devel mailing list > > E10...@li... > > https://lists.sourceforge.net/lists/listinfo/e1000-devel > > To learn more about Intel Ethernet, visit > > https://community.intel.com/t5/Ethernet-Products/bd-p/ethernet-products > |
From: Buchholz, D. <don...@in...> - 2023-12-06 04:43:17
|
Hi Assaf, In addition to the commands listed by Jesse, please also provide "ethtool -i <eth#>" output. This will assist us in identifying the NIC and Firmware revision you are using. - Don > -----Original Message----- > From: Jesse Brandeburg <jes...@in...> > Sent: Tuesday, December 5, 2023 10:47 AM > To: Assaf Albo <as...@qw...>; e10...@li...; Matan > Levy <ma...@qw...> > Subject: Re: [e1000-devel] Intel E810 100Gb goes down sporadically > > On 12/3/2023 1:26 AM, Assaf Albo via E1000-devel wrote: > > Hello guys, > > > > We are having constant network issues in production in that the link goes > > down, waits *exactly* 7-8 seconds, and goes up again. > > This can happen zero to a few times a day on all our servers; they are not > > in the same location and are connected to different network devices. > > > > Each server runs as a KVM virtual machine with 60 CPUs (Pinning) and 224Gi > > (Huge pages) - overall performance is excellent. > > The NIC is PCI passed through to the KVM machine AS IS. > > OS Rocky Linux 8.5, kernel 4.18.0-348.23.1.el8_5.x86_64 with Intel ice > > 1.9.11 built and installed using rpm. > > We have a traffic generator between two servers (our app: client+server) > > that is reaching 94Gb and can replicate this issue. > > > > The dmesg once the issue occur: > > Nov 28 16:01:27 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is Down > > Nov 28 16:01:35 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is up 100 > > Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg > > Advertised: Off, Autoneg Negotiated: False, Flow Control: None > > Hi Assaf, sorry hear you're having problems. > > w.r.t. the link down events we need to determine if it is a local down > or remote. > > Please gather the 'ethtool -S eth0' statistics for a system that has had > some problems, and send to the list as text. > > also, 'ethtool -m eth0' > > The passthrough device shouldn't be any problem but I do recommend that > if you're passing through the device to a VM, you try to match the > destination PCIe function number to the origination ID to prevent odd > issues. > > like if your host device is: > 01:00.1 then (I'm not sure you can do this) I'd hope the VM device is > 00:06.1, and not 00:06.0 > > So I guess with that statement I'd ask do you ever see the problem on > systems with > 3b:00.0 (ice PF PCIe in host) > 00:06.0 (ice PF in VM) > > having the link down issues? > > Please include output from devlink dev info, and if you know it, what > switch you're connected to. > > Also, do you see any stats or events on the switch side when link is lost? > > - Jesse > > > _______________________________________________ > E1000-devel mailing list > E10...@li... > https://lists.sourceforge.net/lists/listinfo/e1000-devel > To learn more about Intel Ethernet, visit > https://community.intel.com/t5/Ethernet-Products/bd-p/ethernet-products |
From: Jesse B. <jes...@in...> - 2023-12-05 18:47:11
|
On 12/3/2023 1:26 AM, Assaf Albo via E1000-devel wrote:
> Hello guys,
>
> We are having constant network issues in production in that the link goes
> down, waits *exactly* 7-8 seconds, and goes up again.
> This can happen zero to a few times a day on all our servers; they are not
> in the same location and are connected to different network devices.
>
> Each server runs as a KVM virtual machine with 60 CPUs (Pinning) and 224Gi
> (Huge pages) - overall performance is excellent.
> The NIC is PCI passed through to the KVM machine AS IS.
> OS Rocky Linux 8.5, kernel 4.18.0-348.23.1.el8_5.x86_64 with Intel ice
> 1.9.11 built and installed using rpm.
> We have a traffic generator between two servers (our app: client+server)
> that is reaching 94Gb and can replicate this issue.
>
> The dmesg once the issue occur:
> Nov 28 16:01:27 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is Down
> Nov 28 16:01:35 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is up 100
> Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg
> Advertised: Off, Autoneg Negotiated: False, Flow Control: None

Hi Assaf, sorry to hear you're having problems.

W.r.t. the link down events, we need to determine if it is a local down or remote.

Please gather the 'ethtool -S eth0' statistics for a system that has had some problems, and send them to the list as text. Also, 'ethtool -m eth0'.

The passthrough device shouldn't be any problem, but I do recommend that if you're passing through the device to a VM, you try to match the destination PCIe function number to the origination ID to prevent odd issues. Like if your host device is 01:00.1, then (I'm not sure you can do this) I'd hope the VM device is 00:06.1, and not 00:06.0.

So I guess with that statement I'd ask: do you ever see the problem on systems with
3b:00.0 (ice PF PCIe in host)
00:06.0 (ice PF in VM)
having the link down issues?

Please include output from devlink dev info, and if you know it, what switch you're connected to.

Also, do you see any stats or events on the switch side when link is lost?

- Jesse
From: Assaf A. <as...@qw...> - 2023-12-03 10:26:14
|
Hello guys,

We are having constant network issues in production in that the link goes down, waits *exactly* 7-8 seconds, and goes up again. This can happen zero to a few times a day on all our servers; they are not in the same location and are connected to different network devices.

Each server runs as a KVM virtual machine with 60 CPUs (pinning) and 224Gi (huge pages) - overall performance is excellent. The NIC is PCI passed through to the KVM machine AS IS. OS Rocky Linux 8.5, kernel 4.18.0-348.23.1.el8_5.x86_64 with Intel ice 1.9.11 built and installed using rpm. We have a traffic generator between two servers (our app: client+server) that is reaching 94Gb and can replicate this issue.

The dmesg once the issue occurs:

Nov 28 16:01:27 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is Down
Nov 28 16:01:35 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is up 100 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg Advertised: Off, Autoneg Negotiated: False, Flow Control: None

Thanks,
Assaf
From: Eduardo E <ee...@gm...> - 2023-11-08 20:57:20
|
Hi,

I have a board based on an Intel C3000 (Denverton) processor which has an embedded X553 controller; attached to uPC MDIO there's a PHY that I have to access from user space. Using the in-tree ixgbe driver I can access it, although the ioctl call takes ~40ms to complete (as below), which is too slow.

In-tree driver ioctl performance:

$ sudo strace -T --trace=ioctl ./phytool read eno1/0/2
ioctl(3, SIOCGMIIREG, 0x7fffc81f42c0) = 0 <0.039746>
0x0141

Using a test program which calls PHY read 1000 times, the program takes 40s to complete although it uses 22ms of user + system time.

$ time sudo ./mdio_ioctl_performance 1000
IOCTL Mode reads
real 0m40.050s
user 0m0.011s
sys 0m0.011s

Changing to the out-of-tree driver (5.19.6), external MDIO access does not work at all even though the ethernet ports work. I've noticed that using the out-of-tree driver the mdio bus disappears from "/sys/class/mdio_bus/ixgbe-mdio-0000:05:00.0".

I've tried kernels 5.4, 5.15 and 6.5 and the result is the same both using the in-tree and out-of-tree driver.

Any idea on how to enable external MDIO access in the out-of-tree driver or improve performance using the in-tree driver?
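For context, below is a minimal user-space sketch of the SIOCGMIIREG path that phytool exercises in the strace output above. It illustrates the standard MII ioctl interface only, not the ixgbe driver internals; the ~40 ms per call reported in the message is spent on the kernel/driver side of this ioctl, and the interface name, PHY address and register in main() are just illustrative values.

```
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/mii.h>
#include <linux/sockios.h>

/* Read one MDIO register via SIOCGMIIREG (the same ioctl phytool uses). */
static int mdio_read(int sock, const char *ifname, int phy_addr, int reg)
{
	struct ifreq ifr;
	/* The MII request is carried inside the ifreq union, mii-tool style. */
	struct mii_ioctl_data *mii = (struct mii_ioctl_data *)&ifr.ifr_data;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	mii->phy_id = phy_addr;
	mii->reg_num = reg;

	if (ioctl(sock, SIOCGMIIREG, &ifr) < 0)
		return -1;

	return mii->val_out;
}

int main(int argc, char **argv)
{
	const char *ifname = argc > 1 ? argv[1] : "eno1";	/* illustrative */
	int sock = socket(AF_INET, SOCK_DGRAM, 0);
	int val;

	if (sock < 0) {
		perror("socket");
		return 1;
	}

	val = mdio_read(sock, ifname, 0, 2);	/* PHY 0, register 2, as in the example above */
	if (val < 0)
		perror("SIOCGMIIREG");
	else
		printf("0x%04x\n", val);

	close(sock);
	return val < 0;
}
```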
From: adelio A. <aa...@ac...> - 2023-11-08 15:39:59
|
Good morning,

With the driver provided with the Ubuntu 22.04.3 LTS kernel, the I210 interface continually resets, even with the latest kernel update; same with Ubuntu version 23.10.1.

I retrieved the igb-5.14.16 driver from the site http://sourceforge.net/projects/e1000, the compilation went smoothly, and my I210 interface no longer resets. My current version is:

Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0-88-generic x86_64)
filename: /lib/modules/5.15.0-88-generic/updates/drivers/net/ethernet/intel/igb/igb.ko
version: 5.14.16

My previous email is in the attached file.

Best regards,

Adelio ALVES
Technicien Hotline: 0 892 700 131
aa...@ac...
www.actn.fr | 3 impasse Denis Papin - Z.I. de Pahin - BP 10016 - 31170 Tournefeuille

-----Original Message-----
From: Jesse Brandeburg <jes...@in...>
Sent: Wednesday, November 8, 2023 00:16
To: adelio ALVES <aa...@ac...>; e10...@li...
Subject: Re: [e1000-devel] idg driver compilation error on Ubuntu

On 10/30/2023 3:27 AM, adelio ALVES wrote:

Thanks for your report! Something happened to the content of your message when I released it to the mailing list. Please use the driver included in your kernel (igb.ko.xz or the like) and let us know if you have any problems. Was there a reason you wanted to run the out-of-tree igb-5.7.2 driver? Kernel version 5.15.0-97-generic should already have a working igb driver.

Thanks,
Jesse
From: Jesse B. <jes...@in...> - 2023-11-07 23:31:44
|
On 10/30/2023 3:27 AM, adelio ALVES wrote: Thanks for your report! Something happened to the content of your message when I released it to the mailing list. Please use the driver included in your kernel (igb.ko.xz or the like) and let us know if you have any problems. Was there a reason you wanted to run the out-of-tree igb-5.7.2 driver? Kernel version 5.15.0-97-generic should already have a working igb driver. Thanks, Jesse |
From: pkz <pen...@gm...> - 2023-11-01 07:29:09
|
Hello, I need help with my network card, which is experiencing a network interruption issue. The log shows: "NETDEV WATCHDOG: ens238f1 (i40e): transmit queue 86 timed out."

------
Oct 27 16:40:05 C2-82-172 kernel: [3806344.657561] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x8 flags=0x0000]
Oct 27 16:40:05 C2-82-172 kernel: [3806344.657645] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x1008 flags=0x0000]
Oct 27 16:40:05 C2-82-172 kernel: [3806344.657711] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x2008 flags=0x0000]
Oct 27 16:40:05 C2-82-172 kernel: [3806344.657776] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x2500 flags=0x0000]
Oct 27 16:40:05 C2-82-172 kernel: [3806344.657841] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x3008 flags=0x0000]
Oct 27 16:40:05 C2-82-172 kernel: [3806344.657905] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x3a08 flags=0x0000]
Oct 27 16:40:05 C2-82-172 kernel: [3806344.657969] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x4008 flags=0x0000]
Oct 27 16:40:05 C2-82-172 kernel: [3806344.658034] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x4608 flags=0x0000]
Oct 27 16:40:05 C2-82-172 kernel: [3806344.658099] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x4c08 flags=0x0000]
Oct 27 16:40:05 C2-82-172 kernel: [3806344.658164] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x5008 flags=0x0000]
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537473] ------------[ cut here ]------------
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537503] NETDEV WATCHDOG: ens238f1 (i40e): transmit queue 86 timed out
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537526] WARNING: CPU: 56 PID: 0 at net/sched/sch_generic.c:472 dev_watchdog+0x258/0x260
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537530] Modules linked in: ipmi_poweroff ipmi_watchdog xt_CT xt_tcpudp ipt_rpfilter xt_multiport iptable_raw ip_set_hash_ip ip_set_hash_net ipip tunnel4 ip_tunnel wireguard ip6_udp_tunnel udp_tunnel nf_tables veth ip6table_mangle iptable_mangle ip6table_filter xt_conntrack xt_MASQUERADE xt_mark xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy xt_comment iptable_filter ip6table_nat ip6_tables iptable_nat nf_nat bpfilter binfmt_misc aufs ip_set nfnetlink overlay bonding dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua amd64_edac_mod edac_mce_amd kvm_amd kvm ipmi_ssif snd_hda_codec_hdmi joydev cdc_ether input_leds usbnet mii ccp snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid nvidia_uvm(OE) sch_fq_codel msr ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537615] llc sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nvidia_drm(POE) nvidia_modeset(POE) ast drm_vram_helper crct10dif_pclmul crc32_pclmul i2c_algo_bit ghash_clmulni_intel aesni_intel nvidia(POE) ttm hid_generic crypto_simd cryptd glue_helper drm_kms_helper syscopyarea mpt3sas sysfillrect sysimgblt ahci usbhid raid_class fb_sys_fops i40e hid libahci scsi_transport_sas drm i2c_piix4
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537668] CPU: 56 PID: 0 Comm: swapper/56 Tainted: P OE 5.4.0-81-generic #91-Ubuntu
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537669] Hardware name: ASUSTeK COMPUTER INC. ESC8000A-E11/KMPG-D32 Series, BIOS 0103 04/18/2022
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537672] RIP: 0010:dev_watchdog+0x258/0x260
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537677] Code: 85 c0 75 e5 eb 9f 4c 89 ff c6 05 20 e3 ec 00 01 e8 fd c1 fa ff 44 89 e9 4c 89 fe 48 c7 c7 d8 b6 03 9c 48 89 c2 e8 ea d8 13 00 <0f> 0b eb 80 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 d7
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537682] RSP: 0018:ffffb05a99da8e30 EFLAGS: 00010286
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537684] RAX: 0000000000000000 RBX: ffff9115f89b9ec0 RCX: 0000000000000000
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537685] RDX: ffff91163f227740 RSI: ffff91163f2178c8 RDI: 0000000000000300
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537686] RBP: ffffb05a99da8e60 R08: ffff91163f2178c8 R09: 0000000000000004
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537687] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000080
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537688] R13: 0000000000000056 R14: ffff9115f98a2480 R15: ffff9115f98a2000
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537690] FS: 0000000000000000(0000) GS:ffff91163f200000(0000) knlGS:0000000000000000
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537696] CR2: 000000000307f6e0 CR3: 0000010df0c0a000 CR4: 0000000000340ee0
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537697] Call Trace:
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537699] <IRQ>
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537704] ? pfifo_fast_enqueue+0x150/0x150
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537710] call_timer_fn+0x32/0x130
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537721] __run_timers.part.0+0x180/0x280
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537725] ? tick_sched_handle+0x33/0x60
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537729] ? tick_sched_timer+0x3d/0x80
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537734] ? ktime_get+0x3e/0xa0
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537736] run_timer_softirq+0x2a/0x50
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537741] __do_softirq+0xe1/0x2d6
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537746] ? hrtimer_interrupt+0x136/0x220
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537752] irq_exit+0xae/0xb0
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537756] smp_apic_timer_interrupt+0x7b/0x140
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537758] apic_timer_interrupt+0xf/0x20
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537762] </IRQ>
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537770] RIP: 0010:cpuidle_enter_state+0xc5/0x450
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537777] Code: ff e8 0f f2 84 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 65 03 00 00 31 ff e8 72 f6 8a ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8f 02 00 00 49 63 cd 4c 8b 7d d0 4c 2b 7d c8 48 8d
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537781] RSP: 0018:ffffb05a8068fe38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537787] RAX: ffff91163f22adc0 RBX: ffffffff9c369380 RCX: 000000000000001f
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537788] RDX: 0000000000000000 RSI: 000000002c388bcf RDI: 0000000000000000
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537789] RBP: ffffb05a8068fe78 R08: 000d85db4747addb R09: 0000000000000000
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537790] R10: ffff91193ff55328 R11: 0000000000000000 R12: ffff91160001bc00
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537794] R13: 0000000000000002 R14: 0000000000000002 R15: ffff91160001bc00
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537798] ? cpuidle_enter_state+0xa1/0x450
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537802] cpuidle_enter+0x2e/0x40
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537807] call_cpuidle+0x23/0x40
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537811] do_idle+0x1dd/0x270
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537814] cpu_startup_entry+0x20/0x30
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537820] start_secondary+0x167/0x1c0
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537824] secondary_startup_64+0xa4/0xb0
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537828] ---[ end trace ab47ea50b75ed1a1 ]---
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537852] i40e 0000:c2:00.1 ens238f1: tx_timeout: VSI_seid: 391, Q 86, NTC: 0xc8, HWB: 0xc8, NTU: 0xf1, TAIL: 0xf1, INT: 0x1
Oct 27 16:40:12 C2-82-172 kernel: [3806351.537859] i40e 0000:c2:00.1 ens238f1: tx_timeout recovery level 1, hung_queue 86
Oct 27 16:40:12 C2-82-172 kernel: [3806351.538666] i40e 0000:c2:00.1: VSI seid 391 Tx ring 0 disable timeout
Oct 27 16:40:13 C2-82-172 kernel: [3806352.148680] i40e 0000:c2:00.1: VSI seid 393 Tx ring 128 disable timeout
Oct 27 16:40:13 C2-82-172 kernel: [3806352.209797] bond0: (slave ens238f1): link status definitely down, disabling slave
Oct 27 16:40:13 C2-82-172 kernel: [3806352.380340] i40e 0000:c2:00.0: VSI seid 390 Tx ring 0 disable timeout
Oct 27 16:40:13 C2-82-172 kernel: [3806352.550050] i40e 0000:c2:00.0: VSI seid 392 Tx ring 128 disable timeout
Oct 27 16:40:13 C2-82-172 kernel: [3806352.609619] bond0: (slave ens238f0): link status definitely down, disabling slave
Oct 27 16:40:16 C2-82-172 kernel: [3806355.499723] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x5208 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.499780] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x6008 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.499818] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x8008 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.499856] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x7008 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.499894] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0x9008 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.499932] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0xa008 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.499968] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0xac08 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500005] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0xb008 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500042] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0xbc08 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500080] i40e 0000:c2:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0067 address=0xc008 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500121] amd_iommu_report_page_fault: 20 callbacks suppressed
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500122] AMD-Vi: Event logged [IO_PAGE_FAULT device=c2:00.1 domain=0x0067 address=0xcc08 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500159] AMD-Vi: Event logged [IO_PAGE_FAULT device=c2:00.1 domain=0x0067 address=0xd008 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500196] AMD-Vi: Event logged [IO_PAGE_FAULT device=c2:00.1 domain=0x0067 address=0xdc08 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500233] AMD-Vi: Event logged [IO_PAGE_FAULT device=c2:00.1 domain=0x0067 address=0xe008 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500271] AMD-Vi: Event logged [IO_PAGE_FAULT device=c2:00.1 domain=0x0067 address=0xec08 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500306] AMD-Vi: Event logged [IO_PAGE_FAULT device=c2:00.1 domain=0x0067 address=0xf008 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500342] AMD-Vi: Event logged [IO_PAGE_FAULT device=c2:00.1 domain=0x0067 address=0xfc08 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500378] AMD-Vi: Event logged [IO_PAGE_FAULT device=c2:00.1 domain=0x0067 address=0x10008 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500413] AMD-Vi: Event logged [IO_PAGE_FAULT device=c2:00.1 domain=0x0067 address=0x10e08 flags=0x0000]
Oct 27 16:40:16 C2-82-172 kernel: [3806355.500449] AMD-Vi: Event logged [IO_PAGE_FAULT device=c2:00.1 domain=0x0067 address=0x11008 flags=0x0000]
Oct 27 16:40:17 C2-82-172 kernel: [3806356.241885] bond0: (slave ens238f1): link status definitely up, 10000 Mbps full duplex
Oct 27 16:40:17 C2-82-172 kernel: [3806356.241905] bond0: active interface up!
Oct 27 16:40:17 C2-82-172 kernel: [3806356.242036] bond0: (slave ens238f0): link status definitely up, 10000 Mbps full duplex
Oct 27 16:40:22 C2-82-172 kernel: [3806361.521403] i40e 0000:c2:00.0 ens238f0: tx_timeout: VSI_seid: 390, Q 47, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
Oct 27 16:40:22 C2-82-172 kernel: [3806361.521433] i40e 0000:c2:00.0 ens238f0: tx_timeout recovery level 1, hung_queue 47
Oct 27 16:40:22 C2-82-172 kernel: [3806361.521465] i40e 0000:c2:00.1 ens238f1: tx_timeout: VSI_seid: 391, Q 47, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
Oct 27 16:40:22 C2-82-172 kernel: [3806361.521469] i40e 0000:c2:00.1 ens238f1: tx_timeout recovery level 2, hung_queue 47
Oct 27 16:40:22 C2-82-172 kernel: [3806361.522227] i40e 0000:c2:00.0: VSI seid 390 Tx ring 0 disable timeout
Oct 27 16:40:22 C2-82-172 kernel: [3806361.689171] i40e 0000:c2:00.0: VSI seid 393 Tx ring 128 disable timeout
Oct 27 16:40:22 C2-82-172 kernel: [3806361.742281] i40e 0000:c2:00.1: VSI seid 391 Tx ring 0 disable timeout
Oct 27 16:40:23 C2-82-172 kernel: [3806362.002685] i40e 0000:c2:00.1: VSI seid 392 Tx ring 128 disable timeout
Oct 27 16:40:23 C2-82-172 kernel: [3806362.061900] bond0: (slave ens238f1): link status definitely down, disabling slave
Oct 27 16:40:23 C2-82-172 kernel: [3806362.061911] bond0: now running without any active interface!
Oct 27 16:40:23 C2-82-172 kernel: [3806362.062180] bond0: (slave ens238f0): link status definitely down, disabling slave
Oct 27 16:40:26 C2-82-172 kernel: [3806365.405761] bond0: (slave ens238f1): link status definitely up, 10000 Mbps full duplex
Oct 27 16:40:26 C2-82-172 kernel: [3806365.405771] bond0: active interface up!
Oct 27 16:40:26 C2-82-172 kernel: [3806365.405894] bond0: (slave ens238f0): link status definitely up, 10000 Mbps full duplex
Oct 27 16:40:31 C2-82-172 kernel: [3806370.737324] i40e 0000:c2:00.0 ens238f0: tx_timeout: VSI_seid: 390, Q 47, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
Oct 27 16:40:31 C2-82-172 kernel: [3806370.737349] i40e 0000:c2:00.0 ens238f0: tx_timeout recovery level 2, hung_queue 47
Oct 27 16:40:31 C2-82-172 kernel: [3806370.737464] i40e 0000:c2:00.1 ens238f1: tx_timeout: VSI_seid: 391, Q 22, NTC: 0x0, HWB: 0x0, NTU: 0x2, TAIL: 0x2, INT: 0x1
Oct 27 16:40:31 C2-82-172 kernel: [3806370.737472] i40e 0000:c2:00.1 ens238f1: tx_timeout recovery level 3, hung_queue 22
Oct 27 16:40:31 C2-82-172 kernel: [3806370.738263] i40e 0000:c2:00.1: VSI seid 391 Tx ring 0 disable timeout
Oct 27 16:40:32 C2-82-172 kernel: [3806371.006836] i40e 0000:c2:00.1: VSI seid 392 Tx ring 128 disable timeout
Oct 27 16:40:32 C2-82-172 kernel: [3806371.058110] i40e 0000:c2:00.0: VSI seid 390 Tx ring 0 disable timeout
Oct 27 16:40:32 C2-82-172 kernel: [3806371.235049] i40e 0000:c2:00.0: VSI seid 393 Tx ring 128 disable timeout
Oct 27 16:40:32 C2-82-172 kernel: [3806371.301567] bond0: (slave ens238f1): link status definitely down, disabling slave
Oct 27 16:40:32 C2-82-172 kernel: [3806371.301580] bond0: now running without any active interface!
Oct 27 16:40:32 C2-82-172 kernel: [3806371.301790] bond0: (slave ens238f0): link status definitely down, disabling slave
Oct 27 16:40:38 C2-82-172 kernel: [3806377.827814] i40e 0000:c2:00.0 ens238f0: NIC Link is Down
Oct 27 16:40:39 C2-82-172 kernel: [3806378.302982] i40e 0000:c2:00.1 ens238f1: NIC Link is Down
Oct 27 16:40:39 C2-82-172 kernel: [3806378.601264] i40e 0000:c2:00.0 ens238f0: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Oct 27 16:40:39 C2-82-172 kernel: [3806378.611104] i40e 0000:c2:00.1 ens238f1: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Oct 27 16:40:39 C2-82-172 kernel: [3806378.641449] bond0: (slave ens238f1): link status definitely up, 10000 Mbps full duplex
Oct 27 16:40:39 C2-82-172 kernel: [3806378.641464] bond0: active interface up!
Oct 27 16:40:39 C2-82-172 kernel: [3806378.641590] bond0: (slave ens238f0): link status definitely up, 10000 Mbps full duplex
Oct 27 16:40:44 C2-82-172 kernel: [3806383.793205] i40e 0000:c2:00.1 ens238f1: tx_timeout: VSI_seid: 391, Q 47, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
Oct 27 16:40:44 C2-82-172 kernel: [3806383.793229] i40e 0000:c2:00.1 ens238f1: tx_timeout recovery level 4, hung_queue 47
Oct 27 16:40:44 C2-82-172 kernel: [3806383.793231] i40e 0000:c2:00.1 ens238f1: tx_timeout recovery unsuccessful
Oct 27 16:40:44 C2-82-172 kernel: [3806383.793316] i40e 0000:c2:00.0 ens238f0: tx_timeout: VSI_seid: 390, Q 47, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
Oct 27 16:40:44 C2-82-172 kernel: [3806383.793324] i40e 0000:c2:00.0 ens238f0: tx_timeout recovery level 3, hung_queue 47
Oct 27 16:40:44 C2-82-172 kernel: [3806383.794875] i40e 0000:c2:00.1: VSI seid 391 Tx ring 0 disable timeout
Oct 27 16:40:45 C2-82-172 kernel: [3806383.984483] i40e 0000:c2:00.1: VSI seid 393 Tx ring 128 disable timeout
Oct 27 16:40:45 C2-82-172 kernel: [3806384.035239] i40e 0000:c2:00.0: VSI seid 390 Tx ring 0 disable timeout
Oct 27 16:40:45 C2-82-172 kernel: [3806384.279796] i40e 0000:c2:00.0: VSI seid 392 Tx ring 128 disable timeout
Oct 27 16:40:45 C2-82-172 kernel: [3806384.341668] bond0: (slave ens238f1): link status definitely down, disabling slave
Oct 27 16:40:45 C2-82-172 kernel: [3806384.341679] bond0: now running without any active interface!
Oct 27 16:40:45 C2-82-172 kernel: [3806384.341945] bond0: (slave ens238f0): link status definitely down, disabling slave
Oct 27 16:40:48 C2-82-172 kernel: [3806387.019142] i40e 0000:c2:00.0 ens238f0: NIC Link is Down
Oct 27 16:40:48 C2-82-172 kernel: [3806387.442791] i40e 0000:c2:00.1 ens238f1: NIC Link is Down
Oct 27 16:40:48 C2-82-172 kernel: [3806387.737179] i40e 0000:c2:00.0 ens238f0: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Oct 27 16:40:48 C2-82-172 kernel: [3806387.747166] i40e 0000:c2:00.1 ens238f1: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Oct 27 16:40:48 C2-82-172 kernel: [3806387.773306] bond0: (slave ens238f1): link status definitely up, 10000 Mbps full duplex
Oct 27 16:40:48 C2-82-172 kernel: [3806387.773315] bond0: active interface up!
Oct 27 16:40:48 C2-82-172 kernel: [3806387.773438] bond0: (slave ens238f0): link status definitely up, 10000 Mbps full duplex
Oct 27 16:40:58 C2-82-172 kernel: [3806397.873092] i40e 0000:c2:00.0 ens238f0: tx_timeout: VSI_seid: 390, Q 47, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
Oct 27 16:40:58 C2-82-172 kernel: [3806397.873096] i40e 0000:c2:00.0 ens238f0: tx_timeout recovery level 4, hung_queue 47
Oct 27 16:40:58 C2-82-172 kernel: [3806397.873097] i40e 0000:c2:00.0 ens238f0: tx_timeout recovery unsuccessful
Oct 27 16:40:58 C2-82-172 kernel: [3806397.873181] i40e 0000:c2:00.1 ens238f1: tx_timeout: VSI_seid: 391, Q 47, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
Oct 27 16:40:58 C2-82-172 kernel: [3806397.873183] i40e 0000:c2:00.1 ens238f1: tx_timeout recovery level 5, hung_queue 47
Oct 27 16:40:58 C2-82-172 kernel: [3806397.873184] i40e 0000:c2:00.1 ens238f1: tx_timeout recovery unsuccessful
|
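[Editor's note] For readers triaging similar reports, the sketch below is a minimal, hypothetical Python 3 helper for summarizing a dump like the one above; it assumes the log has been saved to a plain-text file (the name i40e.log is chosen only for illustration), and the regexes are derived solely from the line formats visible in the message, not from any official i40e or IOMMU tooling.

    #!/usr/bin/env python3
    """Summarize AMD-Vi IO_PAGE_FAULT events and i40e tx_timeout recoveries
    from a saved kernel log (assumed format matches the dump above)."""
    import re
    import sys
    from collections import Counter

    # Matches e.g. "... [IO_PAGE_FAULT domain=0x0067 address=0x8 flags=0x0000]"
    PAGE_FAULT = re.compile(r"IO_PAGE_FAULT .*?address=(0x[0-9a-fA-F]+)")
    # Matches e.g. "i40e 0000:c2:00.1 ens238f1: tx_timeout recovery level 1, hung_queue 86"
    TX_TIMEOUT = re.compile(r"(\S+): tx_timeout recovery level (\d+), hung_queue (\d+)")

    def summarize(path: str) -> None:
        faults = []            # DMA addresses reported by the IOMMU
        timeouts = Counter()   # (interface, hung queue) -> number of recovery attempts
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                m = PAGE_FAULT.search(line)
                if m:
                    faults.append(int(m.group(1), 16))
                m = TX_TIMEOUT.search(line)
                if m:
                    timeouts[(m.group(1), m.group(3))] += 1
        if faults:
            print(f"{len(faults)} IO_PAGE_FAULT events, "
                  f"addresses 0x{min(faults):x}..0x{max(faults):x}")
        else:
            print("no IO_PAGE_FAULT events found")
        for (ifname, queue), count in timeouts.most_common():
            print(f"{ifname}: queue {queue} hit tx_timeout recovery {count} time(s)")

    if __name__ == "__main__":
        summarize(sys.argv[1] if len(sys.argv) > 1 else "i40e.log")

Run as "python3 summarize_log.py i40e.log" (script name hypothetical); the output makes it easier to see that the page faults all hit low DMA addresses on device c2:00.1 and that both bonded ports cycle through escalating tx_timeout recovery levels.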
From: adelio A. <aa...@ac...> - 2023-10-30 15:01:34
|