[Linuxptp-users] Support for latest igb driver?
From: Rich S. <sch...@gm...> - 2016-12-21 21:26:25
I've been testing linuxptp for about a year (now version 1.8) and am still seeing the following failure always after 8 or more days of successful operation:

port 1: send delay request failed
port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)

It then goes into an endless loop of attempting to clear, getting some good master offsets, then faulting again. *Killing and restarting ptp4l does not fix the problem. It requires a power cycle of the server (power cycle of the NIC).* That makes me suspect the igb driver itself. In the latest test linuxptp ran for *14.7 days* before FAULTing, leading me to wonder if there is some memory leak in the igb driver?

Is igb driver 5.3.5.4 supported by linuxptp? Should I be using an earlier version?

Host: Cisco C240M4
Red Hat Enterprise Linux 7: 3.10.0-327.28.2.el7.x86_64
NIC: i350

ethtool -T enp1s0f0
Time stamping parameters for enp1s0f0:
Capabilities:
        hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
        software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
        hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
        software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
        software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
        hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
        off                   (HWTSTAMP_TX_OFF)
        on                    (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
        none                  (HWTSTAMP_FILTER_NONE)
        all                   (HWTSTAMP_FILTER_ALL)

ethtool -i enp1s0f0
driver: igb
version: 5.3.5.4
firmware-version: 1.63, 0x80000c25, 0.384.130
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

Valgrind finds no memory leak in ptp4l:

==3544== HEAP SUMMARY:
==3544== in use at exit: 0 bytes in 0 blocks
==3544== total heap usage: 200 allocs, 200 frees, 40,202 bytes allocated
==3544==
==3544== All heap blocks were freed -- no leaks are possible
==3544==
==3544== For counts of detected and suppressed errors, rerun with: -v
==3544== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)

Example output running fine then generating FAULT:

ptp4l[4906535.703]: linreg: points 16 slope 0.999999322 intercept 15 err 29
ptp4l[4906535.703]: master offset -48 s2 freq +663 path delay 1647
ptp4l[4906536.703]: linreg: points 16 slope 0.999999323 intercept 17 err 30
ptp4l[4906536.703]: master offset -72 s2 freq +660 path delay 1647
ptp4l[4906537.703]: linreg: points 16 slope 0.999999323 intercept -5 err 30
ptp4l[4906537.703]: master offset 28 s2 freq +682 path delay 1647
ptp4l[4906538.703]: linreg: points 16 slope 0.999999325 intercept 10 err 29
ptp4l[4906538.703]: master offset -14 s2 freq +666 path delay 1647
ptp4l[4906539.478]: port 1: delay timeout
ptp4l[4906539.503]: delay filtered 1650 raw 1667
ptp4l[4906539.703]: linreg: points 16 slope 0.999999325 intercept 3 err 29
ptp4l[4906539.703]: master offset -3 s2 freq +671 path delay 1650
ptp4l[4906540.701]: port 1: delay timeout
ptp4l[4906540.704]: linreg: points 16 slope 0.999999327 intercept 15 err 30
ptp4l[4906540.704]: master offset -75 s2 freq +658 path delay 1650
ptp4l[4906540.738]: delay filtered 1653 raw 15548
ptp4l[4906541.703]: linreg: points 16 slope 0.999999328 intercept 7 err 30
ptp4l[4906541.703]: master offset -17 s2 freq +666 path delay 1653
ptp4l[4906542.703]: linreg: points 16 slope 0.999999330 intercept 19 err 30
ptp4l[4906542.703]: master offset -43 s2 freq +651 path delay 1653
ptp4l[4906543.703]: linreg: points 16 slope 0.999999331 intercept 4 err 30
ptp4l[4906543.703]: master offset -14 s2 freq +666 path delay 1653
ptp4l[4906544.301]: port 1: delay timeout
ptp4l[4906545.303]: timed out while polling for tx timestamp
ptp4l[4906545.303]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4906545.303]: port 1: send delay request failed
ptp4l[4906545.303]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4906545.303]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[4906561.303]: clearing fault on port 1
ptp4l[4906561.303]: config item enp1s0f0.logMinDelayReqInterval is 2
ptp4l[4906561.303]: config item enp1s0f0.logAnnounceInterval is 0
ptp4l[4906561.303]: config item enp1s0f0.announceReceiptTimeout is 4
ptp4l[4906561.303]: config item enp1s0f0.syncReceiptTimeout is 0
ptp4l[4906561.303]: config item enp1s0f0.transportSpecific is 0
ptp4l[4906561.303]: config item enp1s0f0.logSyncInterval is 0
ptp4l[4906561.303]: config item enp1s0f0.logMinPdelayReqInterval is 2
ptp4l[4906561.303]: config item enp1s0f0.neighborPropDelayThresh is 20000000
ptp4l[4906561.303]: config item enp1s0f0.min_neighbor_prop_delay is -20000000
ptp4l[4906561.303]: config item enp1s0f0.udp_ttl is 1
ptp4l[4906561.305]: driver changed our HWTSTAMP options
ptp4l[4906561.305]: tx_type 1 not 1
ptp4l[4906561.305]: rx_filter 1 not 12
ptp4l[4906561.305]: config item (null).dscp_event is 0
ptp4l[4906561.305]: config item (null).dscp_general is 0
ptp4l[4906561.305]: port 1: FAULTY to LISTENING on FAULT_CLEARED
ptp4l[4906561.703]: port 1: setting asCapable
ptp4l[4906561.713]: port 1: new foreign master 0019dd.fffe.00085c-1
ptp4l[4906563.713]: selected best master clock 0019dd.fffe.00085c
ptp4l[4906563.713]: foreign master not using PTP timescale
ptp4l[4906563.713]: running in a temporal vortex
ptp4l[4906563.713]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[4906564.704]: linreg: points 8 slope 0.999999339 intercept 123 err 31
ptp4l[4906564.704]: master offset -120 s2 freq +538 path delay 1653
ptp4l[4906564.704]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[4906565.704]: linreg: points 8 slope 0.999999340 intercept 29 err 31
ptp4l[4906565.704]: master offset -59 s2 freq +631 path delay 1653
ptp4l[4906566.704]: linreg: points 8 slope 0.999999340 intercept 3 err 31
ptp4l[4906566.704]: master offset -10 s2 freq +656 path delay 1653
ptp4l[4906567.704]: linreg: points 8 slope 0.999999339 intercept -16 err 31
ptp4l[4906567.704]: master offset 53 s2 freq +676 path delay 1653
ptp4l[4906567.720]: port 1: delay timeout
ptp4l[4906568.721]: timed out while polling for tx timestamp
ptp4l[4906568.721]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4906568.721]: port 1: send delay request failed
ptp4l[4906568.721]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4906568.721]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[4906584.722]: clearing fault on port 1
ptp4l[4906584.722]: config item enp1s0f0.logMinDelayReqInterval is 2
ptp4l[4906584.722]: config item enp1s0f0.logAnnounceInterval is 0
ptp4l[4906584.722]: config item enp1s0f0.announceReceiptTimeout is 4
ptp4l[4906584.722]: config item enp1s0f0.syncReceiptTimeout is 0
ptp4l[4906584.722]: config item enp1s0f0.transportSpecific is 0
ptp4l[4906584.722]: config item enp1s0f0.logSyncInterval is 0
ptp4l[4906584.722]: config item enp1s0f0.logMinPdelayReqInterval is 2
ptp4l[4906584.722]: config item enp1s0f0.neighborPropDelayThresh is 20000000
ptp4l[4906584.722]: config item enp1s0f0.min_neighbor_prop_delay is -20000000
ptp4l[4906584.722]: config item enp1s0f0.udp_ttl is 1
ptp4l[4906584.722]: driver changed our HWTSTAMP options
ptp4l[4906584.722]: tx_type 1 not 1
ptp4l[4906584.722]: rx_filter 1 not 12
ptp4l[4906584.722]: config item (null).dscp_event is 0
ptp4l[4906584.722]: config item (null).dscp_general is 0
ptp4l[4906584.722]: port 1: FAULTY to LISTENING on FAULT_CLEARED
ptp4l[4906585.704]: port 1: setting asCapable
ptp4l[4906585.714]: port 1: new foreign master 0019dd.fffe.00085c-1
ptp4l[4906587.714]: selected best master clock 0019dd.fffe.00085c
ptp4l[4906587.714]: foreign master not using PTP timescale
ptp4l[4906587.714]: running in a temporal vortex
ptp4l[4906587.714]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[4906588.705]: linreg: points 8 slope 0.999999345 intercept 506 err 37
ptp4l[4906588.705]: master offset -645 s2 freq +149 path delay 1653
ptp4l[4906588.705]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[4906589.705]: linreg: points 8 slope 0.999999347 intercept 81 err 40
ptp4l[4906589.705]: master offset -194 s2 freq +572 path delay 1653
ptp4l[4906590.705]: linreg: points 8 slope 0.999999351 intercept 67 err 43
ptp4l[4906590.705]: master offset -166 s2 freq +582 path delay 1653
ptp4l[4906591.705]: linreg: points 8 slope 0.999999356 intercept 62 err 43
ptp4l[4906591.705]: master offset -68 s2 freq +582 path delay 1653
ptp4l[4906592.417]: port 1: delay timeout
ptp4l[4906593.418]: timed out while polling for tx timestamp
ptp4l[4906593.418]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4906593.418]: port 1: send delay request failed
ptp4l[4906593.418]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4906593.418]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[4906609.419]: clearing fault on port 1

Rich Schmidt, CTR, USNO
--
"If you want to build a ship, don’t drum up people to collect wood and don’t assign them tasks and work, but rather teach them to long for the endless immensity of the sea." - *Antoine de Saint-Exupéry*
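
To quantify how quickly the port re-faults after each clear, the FAULT_DETECTED lines in a ptp4l log like the one in this message can be scanned for their timestamps. The following is a minimal Python sketch and not part of linuxptp; the names `fault_times` and `fault_intervals`, and the regular expression, are assumptions based only on the log format shown here.

```python
import re

# Assumed line format, taken from the log in this message, e.g.:
#   ptp4l[4906545.303]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
FAULT_RE = re.compile(
    r"ptp4l\[(\d+\.\d+)\]: port (\d+): \w+ to FAULTY on FAULT_DETECTED"
)


def fault_times(lines):
    """Return a list of (port, timestamp) for each transition into FAULTY."""
    found = []
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            found.append((int(match.group(2)), float(match.group(1))))
    return found


def fault_intervals(lines):
    """Return the seconds elapsed between consecutive faults, rounded to ms."""
    times = [t for _, t in fault_times(lines)]
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


# A few lines copied from the log in this message.
SAMPLE = """\
ptp4l[4906545.303]: port 1: send delay request failed
ptp4l[4906545.303]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4906561.305]: port 1: FAULTY to LISTENING on FAULT_CLEARED
ptp4l[4906568.721]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4906584.722]: port 1: FAULTY to LISTENING on FAULT_CLEARED
ptp4l[4906593.418]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
"""

if __name__ == "__main__":
    for port, stamp in fault_times(SAMPLE.splitlines()):
        print(f"port {port} faulted at {stamp:.3f}")
    print("seconds between faults:", fault_intervals(SAMPLE.splitlines()))
```

On the excerpt above, the gaps between faults come out at roughly 23 to 25 seconds, of which 16 seconds is the fault-clear wait reported by ptp4l. Running the same scan over a full log would show when the first fault occurred relative to startup.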