linuxptp-users Mailing List for linuxptp (Page 108)
PTP IEEE 1588 stack for Linux
Brought to you by:
rcochran
From: Ian T. <Ian...@pg...> - 2017-04-05 14:14:06
|
All,

Here's a log with phc2sys output. This board ran for 11 hours without an error before this happened …

Apr 5 09:18:21 localhost user.info phc2sys: [38796.187] rms 28 max 65 freq -242 +/- 1 delay 1099 +/- 11
Apr 5 09:20:21 localhost user.info phc2sys: [38916.211] rms 29 max 73 freq -242 +/- 3 delay 1100 +/- 12
Apr 5 09:20:40 localhost user.info ptp4l: [38934.698] rms 29 max 76 freq -242 +/- 6 delay 2029 +/- 4
Apr 5 09:20:52 localhost user.err ptp4l: [38946.961] timed out while polling for tx timestamp
Apr 5 09:20:52 localhost user.err ptp4l: [38946.961] increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
Apr 5 09:20:52 localhost user.err ptp4l: [38946.962] Failure in transport_send(to)()
Apr 5 09:20:52 localhost user.err ptp4l: [38946.962] port 1: send delay request failed
Apr 5 09:20:52 localhost user.notice ptp4l: [38946.962] port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
Apr 5 09:20:52 localhost user.info phc2sys: [38947.222] port 2cee26.fffe.000189-1 changed state
Apr 5 09:20:52 localhost user.info phc2sys: [38947.222] reconfiguring after port state change
Apr 5 09:20:52 localhost user.info phc2sys: [38947.222] selecting eth0 for synchronization
Apr 5 09:20:52 localhost user.info phc2sys: [38947.222] nothing to synchronize
Apr 5 09:21:08 localhost user.notice ptp4l: [38962.962] port 1: FAULTY to LISTENING on INIT_COMPLETE
Apr 5 09:21:08 localhost user.warn ptp4l: [38963.198] clockcheck: clock jumped backward or running slower than expected!
Apr 5 09:21:10 localhost user.notice ptp4l: [38964.698] port 1: new foreign master 000cec.fffe.0a0f8d-1
Apr 5 09:21:14 localhost user.notice ptp4l: [38968.698] selected best master clock 000cec.fffe.0a0f8d
Apr 5 09:21:14 localhost user.notice ptp4l: [38968.698] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
Apr 5 09:21:14 localhost user.info phc2sys: [38969.224] port 2cee26.fffe.000189-1 changed state
Apr 5 09:21:14 localhost user.info phc2sys: [38969.224] reconfiguring after port state change
Apr 5 09:21:14 localhost user.info phc2sys: [38969.224] master clock not ready, waiting...
Apr 5 09:21:21 localhost user.notice ptp4l: [38975.698] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
Apr 5 09:21:21 localhost user.info phc2sys: [38976.229] port 2cee26.fffe.000189-1 changed state
Apr 5 09:21:21 localhost user.info phc2sys: [38976.230] reconfiguring after port state change
Apr 5 09:21:21 localhost user.info phc2sys: [38976.230] selecting CLOCK_REALTIME for synchronization
Apr 5 09:21:21 localhost user.info phc2sys: [38976.230] selecting eth0 as the master clock
Apr 5 09:22:50 localhost user.info phc2sys: [39065.260] rms 34 max 88 freq -241 +/- 5 delay 1100 +/- 10
Apr 5 09:24:50 localhost user.info phc2sys: [39185.288] rms 28 max 67 freq -242 +/- 1 delay 1099 +/- 11
Apr 5 09:25:23 localhost user.info ptp4l: [39217.698] rms 3270369174 max 37000003713 freq -238 +/- 95 delay 2030 +/- 8
Apr 5 09:26:50 localhost user.info phc2sys: [39305.330] rms 29 max 70 freq -241 +/- 2 delay 1099 +/- 10
Apr 5 09:28:50 localhost user.info phc2sys: [39425.376] rms 29 max 71 freq -242 +/- 1 delay 1100 +/- 11
Apr 5 09:29:39 localhost user.info ptp4l: [39473.698] rms 29 max 75 freq -243 +/- 2 delay 2023 +/- 9

Ian T.
From: David M. <dav...@gm...> - 2017-04-04 23:18:17
|
Hi,

The device is a:
00:14.0 Ethernet controller: Intel Corporation Ethernet Connection I354 (rev 03)

Using:
bash-4.3# ethtool -i ma2
driver: igb
version: 5.3.0-k
firmware-version: 0.0.0
expansion-rom-version:
bus-info: 0000:00:14.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

And:
# ethtool -T ma2
Time stamping parameters for ma2:
Capabilities:
        hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
        software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
        hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
        software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
        software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
        hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 1
Hardware Transmit Timestamp Modes:
        off                   (HWTSTAMP_TX_OFF)
        on                    (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
        none                  (HWTSTAMP_FILTER_NONE)
        all                   (HWTSTAMP_FILTER_ALL)

The config is:
[global]
slaveOnly 1
summary_interval 6
priority1 255
[ma2]

And running as:
/usr/sbin/ptp4l -f /etc/ptp4l.conf
/usr/sbin/phc2sys -a -r -u 64 -n 5

We are running version 1.8, downloaded from the SourceForge mirror. It's built with OpenEmbedded/BitBake, and their recipe defines some extra CFLAGS. I can look into why these were deemed necessary and whether they could affect anything:

EXTRA_OEMAKE = "'CFLAGS=-D_GNU_SOURCE -DHAVE_CLOCK_ADJTIME -DHAVE_POSIX_SPAWN -DHAVE_ONESTEP_SYNC'"

I will look into obtaining more verbose logs. For what it's worth, this exact same setup works elsewhere; it is just this one physical setup that exhibits the problem, although it is unclear whether the cause is a physical fault or something about the network/master outside.

Additionally, since Ian brought it up:
a) We do sometimes see tx timestamp timeouts too
b) We also occasionally see UNEXPECTED_SYSWRAP messages from igb

My understanding is that b) is an Intel bug (bad per-device assumptions made in code regarding the default state of the PPS IRQ) on this HW, and it seems to be generally treated as benign. I do have a slight suspicion that a) and b) may be somehow related (backing out of the unexpected wrap IRQ 'forgets' to notice that a tx timestamp is ready?), but I have some digging to do on that front.

I currently expect (although happy to be proven wrong) that both a) and b) are unrelated to the clockcheck jumps, since a) and b) happen readily and don't affect sync *too* badly, whereas constant clockcheck aborts happen only in one place and are apparently disastrous to sync quality.

Cheers, and thanks for your replies,
Dave
From: Ian T. <Ian...@pg...> - 2017-04-04 16:07:41
|
Possibly following on from David's post.

We have a system with 18 boards in a rack. Each board has an Altera SoC with the STM Ethernet MAC, connected via gigabit Ethernet to an Arista PTP-aware switch and then a Spectracom GrandMaster.
The boards are running Linux kernel 3.15.0.

They lock quickly after boot and can remain locked for several hours, but usually any one of the boards may do the following …

Apr 4 13:42:04 localhost user.info ptp4l: [537.164] rms 123 max 599 freq +255 +/- 39 delay 7362 +/- 48
Apr 4 13:42:29 localhost user.err ptp4l: [561.387] timed out while polling for tx timestamp
Apr 4 13:42:29 localhost user.err ptp4l: [561.387] increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
Apr 4 13:42:29 localhost user.err ptp4l: [561.387] port 1: send delay request failed
Apr 4 13:42:29 localhost user.notice ptp4l: [561.387] port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
Apr 4 13:42:45 localhost user.notice ptp4l: [577.388] port 1: FAULTY to LISTENING on FAULT_CLEARED
Apr 4 13:42:45 localhost user.warn ptp4l: [577.414] clockcheck: clock jumped backward or running slower than expected!
Apr 4 13:42:45 localhost user.notice ptp4l: [577.414] port 1: new foreign master 000cec.fffe.0a085d-1
Apr 4 13:42:47 localhost user.notice ptp4l: [579.414] selected best master clock 000cec.fffe.0a085d
Apr 4 13:42:47 localhost user.notice ptp4l: [579.414] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
Apr 4 13:42:54 localhost user.notice ptp4l: [587.164] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
Apr 4 13:46:46 localhost user.info ptp4l: [818.414] rms 2312500092 max 37000001557 freq +246 +/- 250 delay 7358 +/- 46
Apr 4 13:51:02 localhost user.info ptp4l: [1074.413] rms 116 max 681 freq +256 +/- 48 delay 7373 +/- 88

Does this imply that one lost delay request can do this, or is there a retry mechanism?
Notice that the system recovers, but we can't afford the large timing glitch that gets introduced.
We have a lot of traffic leaving the boards but only PTP traffic coming in. As we increase the off-board transfer rates, the problem seems to occur more often.

Thanks for any help,
Ian T.
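A note on reading the two large summary lines in this thread: if the "rms" field is a plain root-mean-square over N offset samples and a single sample of magnitude M dominates, then rms is roughly M / sqrt(N), so (max / rms)^2 estimates N. The sketch below applies that to the two lines quoted from the logs. It is only an illustration under that assumption; the function name is made up here and nothing in it is part of linuxptp.

```python
import re

# Matches the "rms <n> max <n>" part of a ptp4l/phc2sys summary line.
SUMMARY = re.compile(r"rms\s+(\d+)\s+max\s+(\d+)")


def implied_sample_count(line):
    """Return (rms, max, N) with N = round((max / rms)^2), or None.

    N is only meaningful if one outlier dominates the interval
    (assumption stated in the text above).
    """
    match = SUMMARY.search(line)
    if match is None:
        return None
    rms, peak = int(match.group(1)), int(match.group(2))
    return rms, peak, round((peak / rms) ** 2)


lines = [
    "ptp4l: [818.414] rms 2312500092 max 37000001557 freq +246 +/- 250 delay 7358 +/- 46",
    "ptp4l: [39217.698] rms 3270369174 max 37000003713 freq -238 +/- 95 delay 2030 +/- 8",
]

for line in lines:
    rms, peak, n = implied_sample_count(line)
    print(f"max = {peak / 1e9:.9f} s, rms = {rms / 1e9:.3f} s, implied N = {n}")
```

Both lines are consistent with a single bad sample of about 37 s inside an otherwise normal interval (256 and 128 samples respectively), rather than a sustained loss of lock. An offset that close to 37 s happens to equal the TAI-UTC difference in effect in 2017; whether that is the cause here is not established in the thread.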
From: Miroslav L. <mli...@re...> - 2017-04-04 10:58:02
|
On Tue, Apr 04, 2017 at 06:52:21PM +1000, David Mirabito wrote:
> Is it safe to assume, that given a crappy PTP network ptp4l would just
> degrade to ntp-like performance at worst? Or are there some other
> thresholds or sanity checks which would cause it to throw in the towel in
> situations where NTP would keep trucking, being designed for such
> situations and necessarily more robust?

No, NTP would generally perform better in non-ideal conditions (e.g. busy network using switches without PTP support), but PTP should still work. What you describe looks like a bug to me.

> Mar 19 09:06:54 user.info phc2sys: [1337052.328] reconfiguring after port
> state change
> Mar 19 09:06:54 user.info phc2sys: [1337052.328] selecting CLOCK_REALTIME
> for synchronization
> Mar 19 09:06:54 user.info phc2sys: [1337052.328] selecting eth2 as the
> master clock
> Mar 19 09:06:56 user.warning ptp4l: [1337054.329] clockcheck: clock jumped
> backward or running slower than expected!
> Mar 19 09:06:56 user.notice ptp4l: [1337054.331] port 1: SLAVE to
> UNCALIBRATED on SYNCHRONIZATION_FAULT
> Mar 19 09:06:56 user.warning ptp4l: [1337054.562] clockcheck: clock jumped
> forward or running faster than expected!

> Is this a possible failure mode of a network / master that is just too poor
> to survive?

A broken master shouldn't trigger the clockcheck messages on slaves. I suspect the phc2sys process is adjusting/stepping the PHC when it shouldn't, possibly triggering some feedback loop between ptp4l and phc2sys.

What HW/driver and linuxptp version is this? Can you please post your ptp4l config and command-line options used for ptp4l and phc2sys? Having full logs with measurements (-l 6) might help too.

--
Miroslav Lichvar
From: David M. <dav...@gm...> - 2017-04-04 08:52:29
|
Hello List,

I understand PTP in general needs stable/symmetrical round-trip times to be at its best and that things like non-PTP-aware switches will introduce noise. It makes sense that as the network gets less deterministic and symmetrical, PTP will necessarily perform worse, probably both in terms of the self-reported rms/max offsets and as measured by some external "correct" reference.

Is it safe to assume that, given a crappy PTP network, ptp4l would just degrade to NTP-like performance at worst? Or are there some other thresholds or sanity checks which would cause it to throw in the towel in situations where NTP would keep trucking, being designed for such situations and necessarily more robust?

I have one machine which is admittedly on a not-ideal PTP network, but with some bizarre logs I'm having difficulty understanding:

Mar 19 09:06:50 user.notice ptp4l: [1337049.193] selected best master clock 94bc40
Mar 19 09:06:50 user.warning ptp4l: [1337049.193] running in a temporal vortex
Mar 19 09:06:50 user.notice ptp4l: [1337049.193] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
Mar 19 09:06:51 user.info phc2sys: [1337049.328] port d24202-1 changed state
Mar 19 09:06:51 user.info phc2sys: [1337049.328] reconfiguring after port state change
Mar 19 09:06:51 user.info phc2sys: [1337049.328] master clock not ready, waiting...
Mar 19 09:06:52 user.warning ptp4l: [1337050.328] clockcheck: clock jumped backward or running slower than expected!
Mar 19 09:06:52 user.warning ptp4l: [1337050.520] clockcheck: clock jumped forward or running faster than expected!
Mar 19 09:06:53 user.notice ptp4l: [1337051.884] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
Mar 19 09:06:54 user.info phc2sys: [1337052.328] port d24202-1 changed state
Mar 19 09:06:54 user.info phc2sys: [1337052.328] reconfiguring after port state change
Mar 19 09:06:54 user.info phc2sys: [1337052.328] selecting CLOCK_REALTIME for synchronization
Mar 19 09:06:54 user.info phc2sys: [1337052.328] selecting eth2 as the master clock
Mar 19 09:06:56 user.warning ptp4l: [1337054.329] clockcheck: clock jumped backward or running slower than expected!
Mar 19 09:06:56 user.notice ptp4l: [1337054.331] port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT
Mar 19 09:06:56 user.warning ptp4l: [1337054.562] clockcheck: clock jumped forward or running faster than expected!
Mar 19 09:06:57 user.info phc2sys: [1337055.329] port d24202-1 changed state
Mar 19 09:06:57 user.info phc2sys: [1337055.329] reconfiguring after port state change
Mar 19 09:06:57 user.info phc2sys: [1337055.329] master clock not ready, waiting...
Mar 19 09:06:57 user.notice ptp4l: [1337055.887] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
Mar 19 09:06:58 user.info phc2sys: [1337056.329] port d24202-1 changed state
Mar 19 09:06:58 user.info phc2sys: [1337056.329] reconfiguring after port state change
Mar 19 09:06:58 user.info phc2sys: [1337056.329] selecting CLOCK_REALTIME for synchronization
Mar 19 09:06:58 user.info phc2sys: [1337056.329] selecting eth2 as the master clock

Is this a possible failure mode of a network / master that is just too poor to survive? In particular the clockcheck differences: swinging by +-20% from the system time *within the same second*?

Given we are also running phc2sys, I am wondering if we've hit some case where the system clock is lagging the PHC and there's some resonance between this lag and the swings of the PHC. The system clock would speed up to try to catch the PHC and be at its fastest just as the PHC has turned around and is at its slowest, so the system clock then slows down. I have an inkling this could be additive over time, with the swings getting larger and larger until each clock is around 10% away from its "true" frequency, which puts them ~20% from one another, and clockcheck bites.

Is this a reasonable explanation? Alternatively, are there other things which could cause near-simultaneous clockcheck faster and slower complaints?

The config is pretty vanilla, just UDP in slave-only mode on a single port. The network is not ideal for PTP, but I was hoping it would at least truck along, even if not with record-breaking accuracy.

Analysing ~5000 status log messages from ptp4l (in between state machine resets) shows:
* freq: mean 42714 stddev 406
* delay: mean 3577 stddev 40
* err_rms: mean 1005 stddev 10058
(nb: the mean/stddev of an already-RMS term might not be meaningful, and this is super-spiky, probably due to always hitting fault mode and restarting)

I am happy to provide more info or run tests if there are any ideas or suggestions.

Thanks,
David
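The "10% each way gives ~20% apart" arithmetic in the message above can be checked with a small sketch. It compares how far one clock advanced against how far a reference clock advanced over the same stretch and flags the result when the relative rate exceeds a limit. This is a simplified stand-in, not ptp4l's actual clockcheck code (which, as far as I understand it, also takes the applied frequency corrections into account); the 20% limit is the figure quoted in the message and is assumed here, and the function names are invented for the example.

```python
def frequency_offset(clock_interval_ns, reference_interval_ns):
    """Fractional rate of the checked clock relative to the reference clock."""
    return clock_interval_ns / reference_interval_ns - 1.0


def sanity_check(clock_interval_ns, reference_interval_ns, limit=0.20):
    """Classify the checked clock against an assumed +-20% limit."""
    offset = frequency_offset(clock_interval_ns, reference_interval_ns)
    if offset > limit:
        return "jumped forward or running faster than expected"
    if offset < -limit:
        return "jumped backward or running slower than expected"
    return "ok"


# Over one "true" second: PHC running 10% fast, system clock 10% slow.
print(sanity_check(1.1e9, 0.9e9))    # relative offset is about +22%
# The opposite phase of the swing: PHC 10% slow, system clock 10% fast.
print(sanity_check(0.9e9, 1.1e9))    # relative offset is about -18%
# Slightly larger swings trip the check in the other direction as well.
print(sanity_check(0.88e9, 1.12e9))  # relative offset is about -21%
```

Under these assumptions the relative offset is not symmetric: +10%/-10% gives about +22% one way and about -18% the other, so the swings would need to be a little over 10% each before both "faster" and "slower" warnings appear.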
From: Šimon W. <sim...@se...> - 2017-03-25 11:37:52
|
Thank you for the replies,

On Fri, 24 Mar 2017 12:17:23 -0700 "Gary E. Miller" <ge...@re...> wrote:
> On Fri, 24 Mar 2017 20:09:29 +0100
> Richard Cochran <ric...@gm...> wrote:
> > Wireless PTP is not going to work very well. I recommend using NTP
> > instead.
>
> Let us not make this a PTP vs. NTP thing. There is ongoing work to
> use the low level PTP mechanisms with NTP. They can work together,
> and in the near future they will work even better together.
>
> But I agree, you will get much better results with the RasPi ethernet
> over its WiFi. Maybe this user does not have the wired option.

I am working on a project where the Raspberry Pis are built into racing slot cars and an access point, so I can't be extremely strict about the level of accuracy. PTP still sounds like the better option, so I will test it using a PPS signal and an oscilloscope and tweak the servo and filter parameters.

Simon Wernisch
From: Gary E. M. <ge...@re...> - 2017-03-24 19:32:09
|
Yo Richard!

On Fri, 24 Mar 2017 20:09:29 +0100 Richard Cochran <ric...@gm...> wrote:
> On Fri, Mar 24, 2017 at 06:44:30PM +0100, Šimon Wernisch wrote:
> > Hello,
> > I am trying to use ptp4l with software timestamping on my raspberry
> > pi with a RT5370 wireless adapter. The driver is rt2800usb found in
> > drivers/net/wireless/ralink/rt2x00 of the raspberry kernel and it
> > doesn't support software timestamp transmitting.
>
> Wireless PTP is not going to work very well. I recommend using NTP
> instead.

Let us not make this a PTP vs. NTP thing. There is ongoing work to use the low level PTP mechanisms with NTP. They can work together, and in the near future they will work even better together.

But I agree, you will get much better results with the RasPi ethernet over its WiFi. Maybe this user does not have the wired option.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
ge...@re... Tel:+1 541 382 8588

Veritas liberabit vos. -- Quid est veritas?
"If you can't measure it, you can't improve it." - Lord Kelvin
From: Richard C. <ric...@gm...> - 2017-03-24 19:09:51
|
On Fri, Mar 24, 2017 at 06:44:30PM +0100, Šimon Wernisch wrote:
> Hello,
> I am trying to use ptp4l with software timestamping on my raspberry pi
> with a RT5370 wireless adapter. The driver is rt2800usb found in
> drivers/net/wireless/ralink/rt2x00 of the raspberry kernel and it
> doesn't support software timestamp transmitting.

Wireless PTP is not going to work very well. I recommend using NTP instead.

> I have added skb_tx_timestamp(skb) into the transmit path and a
> ethtool_ops struct. Ethtool still reports no software transmitting.

It is not enough just to add the ethtool structure...

> +static const struct ethtool_ops rt2x00_ethtool_ops = {
> +	.get_ts_info = ethtool_op_get_ts_info,
> +	.get_link = ethtool_op_get_link
> +};

You also must use it in the driver's net_device instance, for example:

	ndev->ethtool_ops = &rt2x00_ethtool_ops;

HTH,
Richard
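One way to confirm whether a driver change like the one discussed above took effect is to look at the capability flags printed by `ethtool -T`. The sketch below parses that text and checks for the three software flags (TX, RX and system clock) that, as far as I know, ptp4l wants for software timestamping. The helper names and the second sample are made up for illustration; the first sample is trimmed from the `ethtool -T` output posted elsewhere in this archive for a different NIC.

```python
import re

# Flags assumed to be needed for software timestamping (see note above).
SOFTWARE_FLAGS = {
    "SOF_TIMESTAMPING_TX_SOFTWARE",
    "SOF_TIMESTAMPING_RX_SOFTWARE",
    "SOF_TIMESTAMPING_SOFTWARE",
}


def timestamping_capabilities(ethtool_output):
    """Return the set of SOF_TIMESTAMPING_* flags found in `ethtool -T` text."""
    return set(re.findall(r"\((SOF_TIMESTAMPING_\w+)\)", ethtool_output))


def supports_software_timestamping(ethtool_output):
    """True if all three software timestamping flags are reported."""
    return SOFTWARE_FLAGS <= timestamping_capabilities(ethtool_output)


full_sample = """Time stamping parameters for ma2:
Capabilities:
        hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
        software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
        hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
        software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
        software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
        hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
"""

# Hypothetical output for a driver that lacks software transmit timestamps.
no_tx_sample = """Time stamping parameters for wlan0:
Capabilities:
        software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
        software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
"""

print(supports_software_timestamping(full_sample))
print(supports_software_timestamping(no_tx_sample))
```

On a real system the text would come from running `ethtool -T <interface>`; the parsing is kept separate from the command so the check can be tried on saved output.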
From: Šimon W. <sim...@se...> - 2017-03-24 18:12:05
|
Hello,

I am trying to use ptp4l with software timestamping on my Raspberry Pi with an RT5370 wireless adapter. The driver is rt2800usb, found in drivers/net/wireless/ralink/rt2x00 of the Raspberry Pi kernel, and it doesn't support software transmit timestamping.

I have added skb_tx_timestamp(skb) to the transmit path and an ethtool_ops struct. Ethtool still reports no software transmit timestamping. Could I ask for help with this driver, or for general instructions?

Thanks,
Simon Wernisch
From: Richard C. <ric...@gm...> - 2017-03-24 14:15:06
|
On Fri, Mar 24, 2017 at 10:44:20AM +0800, Hardik Gohil wrote:
> root@phycore-am335x-1:~# ptp4l -i eth0 -m -P -s

Try adding -2 (layer2 transport) to the command line.

It looks like the CPTS does not recognize UDP P2P packets. You should ask TI about this...

Thanks,
Richard
From: Richard C. <ric...@gm...> - 2017-03-24 13:59:42
|
On Fri, Mar 24, 2017 at 07:48:41AM +0100, Richard Cochran wrote:
> Please try to reproduce the issue using a mainline kernel.

I can reproduce this on a BBB on mainline 3.14.33.

I honestly can't remember whether P2P ever worked on the CPTS, but I think it did, IIRC. I'll see if I can track this down...

Thanks,
Richard
From: Richard C. <ric...@gm...> - 2017-03-24 06:49:06
|
On Fri, Mar 24, 2017 at 02:40:33PM +0800, Hardik Gohil wrote:
> It is a vendor kernel from PHYTEC.

Please try to reproduce the issue using a mainline kernel.

Thanks,
Richard
From: Hardik G. <har...@gm...> - 2017-03-24 06:40:44
|
Hello,

It is a vendor kernel from PHYTEC.

Regards,
Hardik A Gohil

On Fri, Mar 24, 2017 at 2:27 PM, Richard Cochran <ric...@gm...> wrote:
> On Fri, Mar 24, 2017 at 10:44:20AM +0800, Hardik Gohil wrote:
> > I am working on Linux 3.12 running on TI AM335x.
>
> Is this a mainline kernel, or a vendor kernel?
>
> Thanks,
> Richard
From: Richard C. <ric...@gm...> - 2017-03-24 06:27:59
|
On Fri, Mar 24, 2017 at 10:44:20AM +0800, Hardik Gohil wrote:
> I am working on Linux 3.12 running on TI AM335x.

Is this a mainline kernel, or a vendor kernel?

Thanks,
Richard
From: Hardik G. <har...@gm...> - 2017-03-24 02:44:27
|
Hello,

I am working on Linux 3.12 running on a TI AM335x. The test system is a GPS connected to the CPU over Ethernet, and the GPS is configured in peer-to-peer (P2P) mode.

Testing in E2E mode works; there were no issues.

root@phycore-am335x-1:~# ptp4l -i eth0 -m -P -s
ptp4l[2103.623]: selected /dev/ptp0 as PTP clock
ptp4l[2103.653]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[2103.657]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[2103.724]: port 1: new foreign master 001d7f.fffe.8002f7-1
ptp4l[2103.849]: port 1: received PDELAY_REQ without timestamp
ptp4l[2104.658]: timed out while polling for tx timestamp
ptp4l[2104.659]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[2104.659]: port 1: send peer delay request failed
ptp4l[2104.659]: port 1: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[2120.707]: port 1: FAULTY to LISTENING on FAULT_CLEARED
ptp4l[2120.865]: port 1: received PDELAY_REQ without timestamp
ptp4l[2121.714]: timed out while polling for tx timestamp
ptp4l[2121.715]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[2121.715]: port 1: send peer delay request failed
ptp4l[2121.715]: port 1: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)

Regards,
Hardik A Gohil
From: Joerg C. (LWE) <Chr...@li...> - 2017-03-09 07:31:10
|
> Sent: Tuesday, March 07, 2017 9:37 AM
>
> On Tue, Mar 07, 2017 at 07:57:46AM +0000, Joerg Christian (LWE) wrote:
> > When I stepped the master clock I had this result on the slave, no
> > matter what value I use for fault_reset_interval. I have always the s0 for 16
> > seconds.
>
> It's not related to the fault reset interval. In the s0 state the PI servo is measuring
> frequency of the clock and the length of the interval is inversely proportional to
> the I constant. If you need to reduce the interval, you can use a larger I constant,
> or use a different servo.

OK, that makes sense. I have now tested the servos pi, linreg, ntpshm and nullf. What I need is a mix of nullf and pi/linreg: when the deviation is smaller than step_threshold, adjust the clock frequency; when the deviation is larger than step_threshold, go immediately to s1 and step the clock.

Are there any servos available other than those mentioned above?
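The behaviour asked for in the message above can be written down as a small decision function. This is only a sketch of the request, not an existing linuxptp servo: the function name, the nanosecond units and the sign convention (offset = local clock minus master) are all assumptions made for the example.

```python
def servo_action(offset_ns, step_threshold_ns):
    """Pick between stepping and slewing, as described in the request.

    offset_ns: measured offset of the local clock from the master
               (assumed convention: local minus master).
    step_threshold_ns: largest offset that is still corrected by
               adjusting the frequency only.
    """
    if abs(offset_ns) > step_threshold_ns:
        # Large deviation: jump the clock by the full offset right away.
        return ("step", -offset_ns)
    # Small deviation: leave the correction to the frequency servo.
    return ("slew", None)


print(servo_action(5_000_000, 1_000_000))  # 5 ms off with a 1 ms threshold
print(servo_action(500, 1_000_000))        # 500 ns off with a 1 ms threshold
```

The difference from the observed behaviour is that no frequency-measurement phase (the 16 seconds in s0) is run before the step; the trade-off is that a step taken on a single sample trusts that one measurement completely.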
From: Richard C. <ric...@gm...> - 2017-03-09 06:02:05
|
On Wed, Mar 08, 2017 at 11:23:51PM +0000, Keller, Jacob E wrote:
> I'm not sure. I've seen this message a few times and haven't been able to reproduce it reliably enough. It shouldn't impact the functionality, and as long as it's only intermittent drops it's not a big deal.

I have also seen this occasionally on the i210. I *think* the i210 should never drop a time stamp, in theory, but there obviously is some issue. It is rare, and so it hasn't bothered me.

Other HW drop time stamps by design!

Thanks,
Richard |
From: Keller, J. E <jac...@in...> - 2017-03-08 23:23:59
|
> -----Original Message-----
> From: Tino Mettler [mailto:tin...@al...]
> Sent: Tuesday, March 07, 2017 1:24 AM
> To: 'linuxptp-users' <lin...@li...>
> Subject: [Linuxptp-users] Received packets without timestamp
>
> Hi,
>
> I get the following messages a few times per day:
>
> port 1: received DELAY_REQ without timestamp
> port 1: received SYNC without timestamp
>
> According to the source, this means that the kernel driver did not
> supply a timestamp from the PHC for those packets. The NIC is an Intel
> I219-LM. I did not find anything useful in Google regarding this. Is
> there anything I can do to avoid this?
>
> Regards,
> Tino
>

I'm not sure. I've seen this message a few times and haven't been able to reproduce it reliably enough. It shouldn't impact the functionality, and as long as it's only intermittent drops it's not a big deal.

Thanks,
Jake |
From: Tino M. <tin...@al...> - 2017-03-07 09:24:03
|
Hi,

I get the following messages a few times per day:

port 1: received DELAY_REQ without timestamp
port 1: received SYNC without timestamp

According to the source, this means that the kernel driver did not supply a timestamp from the PHC for those packets. The NIC is an Intel I219-LM. I did not find anything useful in Google regarding this. Is there anything I can do to avoid this?

Regards,
Tino |
From: Miroslav L. <mli...@re...> - 2017-03-07 08:37:18
|
On Tue, Mar 07, 2017 at 07:57:46AM +0000, Joerg Christian (LWE) wrote:
> When I stepped the master clock I had this result on the slave, no matter what value I use for
> fault_reset_interval. I have always the s0 for 16 seconds.

It's not related to the fault reset interval. In the s0 state the PI servo is measuring frequency of the clock and the length of the interval is inversely proportional to the I constant. If you need to reduce the interval, you can use a larger I constant, or use a different servo.

--
Miroslav Lichvar |
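To get a feel for the inverse proportionality described above, the observed 16 seconds can be rescaled to other I constants. This is a rough sketch only: the function name `s0_interval` is hypothetical, and the baseline I constant of 0.001 is an assumption about the default used with software time stamping, so check ptp4l(8) for your version (the I constant is, as far as I know, set with the `pi_integral_const` option).

```python
def s0_interval(ki: float,
                observed_interval_s: float = 16.0,
                observed_ki: float = 0.001) -> float:
    """Estimate the s0 duration for a given I constant.

    Assumes interval * ki is constant (inverse proportionality) and that
    the 16 s from the log was observed with ki = 0.001 (an assumption).
    """
    if ki <= 0:
        raise ValueError("ki must be positive")
    return observed_interval_s * observed_ki / ki


# Larger I constants shorten the frequency-measurement interval.
for ki in (0.001, 0.004, 0.016):
    print(f"ki={ki}: about {s0_interval(ki):.1f} s in s0")
```

A larger I constant also makes the servo react more aggressively once locked, so shortening s0 this way trades convergence time against stability.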
From: Joerg C. (LWE) <Chr...@li...> - 2017-03-07 07:57:56
|
I have implemented time synchronization with ptp4l using software timestamping. My requirement is to sync the slaves very fast after the master clock steps. How can I make the slaves step immediately after they recognize a step of the master clock?

I tried this config file for the slave:

[global]
verbose 1
time_stamping software
slaveOnly 1
step_threshold 0.001
first_step_threshold 0.00001
fault_reset_interval -128
[eth0.2]

When I stepped the master clock I had this result on the slave, no matter what value I use for fault_reset_interval. I always get s0 for 16 seconds.

ptp4l[20016.779]: master offset -838788 s2 freq -77766 path delay 37419
ptp4l[20017.778]: master offset -1251433 s0 freq -77766 path delay 37419
ptp4l[20017.779]: port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT
ptp4l[20018.778]: master offset -1669199 s0 freq -77766 path delay 37454
ptp4l[20019.778]: master offset -2119251 s0 freq -77766 path delay 73117
ptp4l[20020.777]: master offset -2544222 s0 freq -77766 path delay 81722
ptp4l[20021.777]: master offset -2959934 s0 freq -77766 path delay 86377
ptp4l[20022.777]: master offset -3370656 s0 freq -77766 path delay 86377
ptp4l[20023.776]: master offset -3827576 s0 freq -77766 path delay 126573
ptp4l[20024.776]: master offset -4256688 s0 freq -77766 path delay 174966
ptp4l[20025.776]: master offset -4702737 s0 freq -77766 path delay 174966
ptp4l[20026.775]: master offset -5128022 s0 freq -77766 path delay 180869
ptp4l[20027.775]: master offset -5539745 s0 freq -77766 path delay 180869
ptp4l[20028.775]: master offset -5926680 s0 freq -77766 path delay 150756
ptp4l[20029.774]: master offset -6337803 s0 freq -77766 path delay 148828
ptp4l[20030.774]: master offset -6754509 s0 freq -77766 path delay 148828
ptp4l[20031.774]: master offset -7167231 s0 freq -77766 path delay 148828
ptp4l[20032.773]: master offset -7584545 s0 freq -77766 path delay 150756
ptp4l[20033.773]: master offset -7977596 s0 freq -77766 path delay 150756
ptp4l[20034.773]: master offset -8410324 s1 freq -415350 path delay 150756
ptp4l[20036.773]: master offset -158698 s2 freq -431379 path delay 145125
ptp4l[20036.773]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[20037.773]: master offset -217699 s2 freq -437496 path delay 145125
ptp4l[20038.773]: master offset -242286 s2 freq -440197 path delay 118919

Regards
Christian
--- b2bemail4liebherr |
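When comparing settings, it can help to measure the time spent in s0 directly from the ptp4l output instead of reading it off by eye. The following is a minimal sketch that assumes the "master offset ... sN freq ... path delay ..." line layout shown in the log above; the helper names `parse_offsets` and `unlocked_duration` are hypothetical.

```python
import re

# Matches lines such as:
# ptp4l[20017.778]: master offset -1251433 s0 freq -77766 path delay 37419
LINE_RE = re.compile(
    r"ptp4l\[(?P<ts>\d+\.\d+)\]:\s+master offset\s+(?P<offset>-?\d+)"
    r"\s+s(?P<state>\d)\s+freq\s+(?P<freq>[+-]?\d+)"
    r"\s+path delay\s+(?P<delay>-?\d+)"
)


def parse_offsets(log_text):
    """Return (timestamp_s, offset_ns, servo_state) for each offset line."""
    return [
        (float(m["ts"]), int(m["offset"]), int(m["state"]))
        for m in LINE_RE.finditer(log_text)
    ]


def unlocked_duration(samples):
    """Seconds from the first s0 sample to the next non-s0 sample, or None."""
    start = None
    for ts, _offset, state in samples:
        if state == 0 and start is None:
            start = ts
        elif state != 0 and start is not None:
            return ts - start
    return None


sample_log = """\
ptp4l[20016.779]: master offset -838788 s2 freq -77766 path delay 37419
ptp4l[20017.778]: master offset -1251433 s0 freq -77766 path delay 37419
ptp4l[20017.779]: port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT
ptp4l[20033.773]: master offset -7977596 s0 freq -77766 path delay 150756
ptp4l[20034.773]: master offset -8410324 s1 freq -415350 path delay 150756
"""

print(unlocked_duration(parse_offsets(sample_log)))
```

For the log above, the gap between the first s0 line and the s1 line is roughly 17 seconds, and the master offset grows to more than 8 ms before the clock is stepped.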
From: Jan D. <jan...@gm...> - 2017-03-02 08:54:27
|
2017-03-02 6:36 GMT+01:00 Richard Cochran <ric...@gm...>:
> On Wed, Mar 01, 2017 at 01:04:48PM -0000, Hugh Reynolds wrote:
> > The hardware timestamp counter seems to be stuck.
> >
> > We can set it using phc_ctl, but it doesn't count.
>
> Sounds like a missing peripheral clock enable.
>
> I have never used or tested the imx6 ptp driver, and so I don't have
> any specific wisdom to offer. In general I find the freescale^Wnxp
> kernel SW to be of dubious quality, and so my only advice is to go
> through every last register bit and make sure their code is correct.
>

I am currently working with two i.MX6 UL Eval Boards. I tried with one of the boards as master and the other one as slave, as well as with my workstation as master and the boards as slaves. It works, but I didn't do any comparison with other setups.

I had more problems with the drivers for the onboard network interface in my workstation. Now I use the Intel i210.

I cannot say much about the quality of the NXP/Freescale driver code, though. To date, I have only skimmed some code to get a rough understanding of how their PTP support is implemented.

Best
Jan |
From: Richard C. <ric...@gm...> - 2017-03-02 05:37:06
|
On Wed, Mar 01, 2017 at 01:04:48PM -0000, Hugh Reynolds wrote:
> The hardware timestamp counter seems to be stuck.
>
> We can set it using phc_ctl, but it doesn't count.

Sounds like a missing peripheral clock enable.

I have never used or tested the imx6 ptp driver, and so I don't have any specific wisdom to offer. In general I find the freescale^Wnxp kernel SW to be of dubious quality, and so my only advice is to go through every last register bit and make sure their code is correct.

<rant>
Their vendor kernels are particularly bad. On one project a few years back, the Ethernet port stopped working with freescale's newer kernel. Turns out that they unconditionally enabled the PTP clock output on one of the MII input lines. This was done with a one-liner buried deep within the board setup code under arch/arm/mach. By the time I figured this out, the PHY was toast.

But hey, the code worked on their development kit, so who can complain?
</rant>

Thanks,
Richard |
From: Hugh R. <hu...@co...> - 2017-03-01 13:05:00
|
Perhaps this is not the correct list to post this question, but we are trying to use ptp4l on an i.MX6.

The hardware timestamp counter seems to be stuck.

We can set it using phc_ctl, but it doesn't count.

On the i.MX6 configuration:

The GPR1 bits are being set in init imx6q_1588_init(void) from the mach-imx6q.c file. We've added debug output, so we know that line of code is being executed.

Our dtsi file includes the line

MX6QDL_PAD_GPIO_16__ENET_REF_CLK 0x4001b0b1

What else do I need to do?

Any help would be greatly appreciated.

Regards
Hugh |
From: Norbert L. <nol...@gm...> - 2017-03-01 09:06:08
|
Good, any estimate when this ends up in Linux 4.9?

Also, is the other issue [1] being looked at? I just want to know whether I need to poke somewhere or someone; the mailing list seems to be more of a patch tracker.

Kind Regards,
Norbert Lange

[1] - http://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20170220/008171.html

2017-02-28 22:07 GMT+01:00 Keller, Jacob E <jac...@in...>:
>> -----Original Message-----
>> From: Richard Cochran [mailto:ric...@gm...]
>> Sent: Sunday, February 26, 2017 2:53 AM
>> To: Norbert Lange <nol...@gm...>
>> Cc: Keller, Jacob E <jac...@in...>; linuxptp-
>> us...@li...
>> Subject: Re: [Linuxptp-users] PTP Clock completely wrong on an Intel 82579L
>>
>> On Thu, Feb 23, 2017 at 11:20:26AM +0100, Norbert Lange wrote:
>> > Yeah, I seen that matrix, and both NICs are on that list. Still have
>> > some rather serious issues, one is completely useless (82579LM, e1000e
>> > driver) and the other (i354, igb driver) apparently wraps the timer
>> > very 18 minutes, losing their master role.
>>
>> Looks like you aren't the only one with this issue:
>>
>> https://www.spinics.net/lists/kernel/msg2446767.html
>>
>> So it does look like a bug where driver the driver selects the wrong
>> clock values.
>>
>> Cheers,
>> Richard
>
> Yep that's what it sounded like. Glad that appears resolved now.
>
> Regards,
> Jake |