Re: [Linuxptp-users] Support for latest igb driver?
From: Rich S. <sch...@gm...> - 2016-12-30 16:39:25
I am sorry to report that the proposed fix for the problem "SLAVE to
FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)", shown below, did not
resolve the issue.

Red Hat Linux, with source kernel 4.9.0
Intel igb driver: 5.4.0-k

Prior to compiling the kernel:

    cd /usr/src/linux-4.9/drivers/net/ethernet/intel/igb

Edit igb_main.c and comment out at line 5715:

    /* wr32(E1000_TSICR, ack); */

It ran fine for a while, then failed as shown below. I was able to
restore operation by killing ptp4l, running "rmmod igb; modprobe igb",
and restarting ptp4l.

Here is the ptp4l log after running successfully for 26.65 hours:

ptp4l[101975.294]: master offset -58 s2 freq +831 path delay 1632
ptp4l[101976.294]: linreg: points 8 slope 0.999999144 intercept 3 err 25
ptp4l[101976.294]: master offset -10 s2 freq +853 path delay 1632
ptp4l[101976.900]: port 1: delay timeout
ptp4l[101976.910]: timed out while polling for tx timestamp
ptp4l[101976.910]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[101976.910]: port 1: send delay request failed
ptp4l[101976.910]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[101976.910]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[101992.911]: clearing fault on port 1
ptp4l[101992.911]: config item enp1s0f0.logMinDelayReqInterval is 2
ptp4l[101992.911]: config item enp1s0f0.logAnnounceInterval is 0
ptp4l[101992.911]: config item enp1s0f0.announceReceiptTimeout is 4
ptp4l[101992.911]: config item enp1s0f0.syncReceiptTimeout is 0
ptp4l[101992.911]: config item enp1s0f0.transportSpecific is 0
ptp4l[101992.911]: config item enp1s0f0.logSyncInterval is 0
ptp4l[101992.911]: config item enp1s0f0.logMinPdelayReqInterval is 2
ptp4l[101992.911]: config item enp1s0f0.neighborPropDelayThresh is 20000000
ptp4l[101992.911]: config item enp1s0f0.min_neighbor_prop_delay is -20000000
ptp4l[101992.911]: config item enp1s0f0.udp_ttl is 1
ptp4l[101992.915]: driver changed our HWTSTAMP options
ptp4l[101992.915]: tx_type 1 not 1
ptp4l[101992.915]: rx_filter 1 not 12
ptp4l[101992.915]: config item (null).dscp_event is 0
ptp4l[101992.915]: config item (null).dscp_general is 0
ptp4l[101992.915]: port 1: FAULTY to LISTENING on FAULT_CLEARED
ptp4l[101993.294]: port 1: setting asCapable
ptp4l[101993.299]: port 1: new foreign master 0019dd.fffe.00085c-1
ptp4l[101995.299]: selected best master clock 0019dd.fffe.00085c
ptp4l[101995.299]: foreign master not using PTP timescale
ptp4l[101995.299]: running in a temporal vortex
ptp4l[101995.299]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[101996.295]: linreg: points 8 slope 0.999999153 intercept 142 err 29
ptp4l[101996.295]: master offset -150 s2 freq +705 path delay 1632
ptp4l[101996.295]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[101996.635]: port 1: delay timeout
ptp4l[101996.645]: timed out while polling for tx timestamp
ptp4l[101996.645]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[101996.645]: port 1: send delay request failed
ptp4l[101996.645]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[101996.645]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[102012.645]: clearing fault on port 1
. . .

Richard Schmidt, CTR
Time Service Dept.
US Naval Observatory

On Wed, Dec 21, 2016 at 4:53 PM, Richard Cochran <ric...@gm...> wrote:

> On Wed, Dec 21, 2016 at 04:26:16PM -0500, Rich Schmidt wrote:
> > I've been testing linuxptp for about a year (now version 1.8) and am
> > still seeing the following failure always after 8 or more days of
> > successful operation:
> >
> > ptp4l[4906544.301]: port 1: delay timeout
> > ptp4l[4906545.303]: timed out while polling for tx timestamp
> > ptp4l[4906545.303]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
> > ptp4l[4906545.303]: port 1: send delay request failed
>
> I don't recall seeing this myself, but still this is the second
> such igb failure report I have received recently.
>
> I wonder whether the incorrect double TSICR acknowledge is the root
> cause. In igb_main.c we have:
>
> static void igb_tsync_interrupt(struct igb_adapter *adapter)
> {
>     struct e1000_hw *hw = &adapter->hw;
>     struct ptp_clock_event event;
>     struct timespec64 ts;
>     u32 ack = 0, tsauxc, sec, nsec, tsicr = rd32(E1000_TSICR);
>
>     ...
>
>     /* acknowledge the interrupts */
>     wr32(E1000_TSICR, ack);
> }
>
> According to the datasheet, the first rd32() should already
> acknowledge the interrupts, but the 82580 (iirc) has a bug that
> requires the additional wr32().
>
> Try removing that last line, and see if things improve...
>
> Thanks,
> Richard
>

-- 
"If you want to build a ship, don’t drum up people to collect wood and
don’t assign them tasks and work, but rather teach them to long for the
endless immensity of the sea."
- *Antoine de Saint-Exupéry*
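
For reference, the option named in the ptp4l warning is set in the
[global] section of the ptp4l configuration file and is given in
milliseconds. The fragment below is only a sketch: the value 100 is an
arbitrary illustration, not a tested recommendation. In the log above
the fault recurs almost immediately after the port returns to SLAVE, so
a longer timeout may well not help if the driver never delivers the
timestamp.

```
[global]
# Illustrative value only (milliseconds); tune for your own setup.
tx_timestamp_timeout 100
```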
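
Since recovery so far has been manual (kill ptp4l, reload igb, restart
ptp4l), a small log scanner might help spot the failure signature
automatically. The following is a minimal sketch, not part of linuxptp:
all function names are hypothetical, the matching relies only on the
message strings seen in the log above, and the fault-clear calculation
assumes the exponent printed in "waiting 2^{N} seconds" lies between 0
and 30.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract SECONDS from a "ptp4l[SECONDS]: ..." line.
 * Returns 1 and stores the value in *ts on success, 0 otherwise. */
int ptp4l_timestamp(const char *line, double *ts)
{
    return sscanf(line, "ptp4l[%lf]:", ts) == 1;
}

/* Hypothetical helper: does the line report the tx timestamp poll timeout? */
int is_tx_timestamp_timeout(const char *line)
{
    return strstr(line, "timed out while polling for tx timestamp") != NULL;
}

/* Hypothetical helper: does the line report a port entering FAULTY? */
int is_fault_transition(const char *line)
{
    return strstr(line, "to FAULTY on FAULT_DETECTED") != NULL;
}

/* Expected time at which the fault is cleared, given the time of the
 * fault and the exponent N from "waiting 2^{N} seconds to clear fault".
 * Assumption: 0 <= N <= 30; other values are not handled in this sketch. */
double fault_clear_time(double fault_ts, unsigned log2_seconds)
{
    return fault_ts + (double)(1UL << log2_seconds);
}
```

Fed the fault line at 101976.910 with N = 4, fault_clear_time() gives
101992.910, which agrees with the "clearing fault on port 1" entry at
101992.911 in the log. A watchdog could call these helpers on each line
read from the ptp4l output and trigger the driver reload once the
timeout and FAULTY lines repeat.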