Thread: [Linuxptp-users] Observing below issue with ptp4l running in BC mode
From: ramesh t <ram...@ya...> - 2022-01-05 19:34:20
|
hi,

Running ptp4l in BC mode, with clock_type set to BC and boundary_clock_jbod set to 1. One NIC port is connected to the BC/GM and another NIC port to the testing unit (TU). The ptp4l command is as follows:

./ptp4l -2 -A -w -i <port connected to BC/GM> -i <port connected to TU> -f /etc/ptp4l_bc.c

In BC mode, ptp4l gets its clock from the BC/GM and provides clock to the testing unit.

Also running the phc2sys commands below:

phc2sys -s <port connected to BC/GM> -c CLOCK_REALTIME -w -r -n 24
phc2sys -s <port connected to BC/GM> -c <port connected to TU> -r -w -n 24

On resetting the testing unit, the PTP port state of the interface connected to the BC/GM temporarily goes to PS_FAULTY, though it shouldn't be impacted by a reset of the testing unit. Verified that the same behavior is seen with the latest linuxptp code.

Please suggest.

regards,
Ramesh
|
From: Richard C. <ric...@gm...> - 2022-01-05 20:01:47
|
On Wed, Jan 05, 2022 at 07:34:07PM +0000, ramesh t wrote:
> hi,
>
> Running ptp4l in BC mode, with clock_type set to BC and boundary_clock_jbod set to 1. Have connected one NIC port to BC/GM and another NIC port to testing unit (TU). Following is the ptp4l command.
>
> ./ptp4l -2 -A -w -i <port connected to BC/GM> -i <port connected to TU> -f /etc/ptp4l_bc.c
>
> In BC mode, ptp4l getting clock from BC/GM and is providing clock to testing unit.
>
> Also running below phc2sys command:
> phc2sys -s <port connected to BC/GM> -c CLOCK_REALTIME -w -r -n 24
> phc2sys -s <port connected to BC/GM> -c <port connected to TU> -r -w -n 24

You should use the phc2sys automatic mode (-a) when using boundary_clock_jbod.

Thanks,
Richard
|
From: ramesh t <ram...@ya...> - 2022-01-06 01:40:43
|
hi Richard,

Using option -a will automatically time sync all the PHC devices in the system. Is there a way to sync only those interfaces which are part of the BC configuration (port connected to Testing Unit)?

Will using the -a option help to solve the issue seen, given that ptp4l is going into PS_FAULTY?

Please suggest.

regards,
Ramesh
|
From: ramesh t <ram...@ya...> - 2022-01-06 02:12:34
|
hi Richard,

>> Using option -a will automatically time sync all the PHC devices in the system.
>> Is there a way to sync only those interface which are part of BC configuration (port connected to Testing Unit)??

Please ignore the above question.

Repeated the testing with phc2sys automatic mode (-a); the same issue is still seen. (Command used: phc2sys -a -r -n 24)

Please suggest.

regards,
Ramesh
|
From: Richard C. <ric...@gm...> - 2022-01-06 02:33:11
|
On Wed, Jan 05, 2022 at 07:34:07PM +0000, ramesh t wrote:
> On resetting of the testing unit, observing ptp port state of the
> interface connected to BC/GM going to PS_FAULTY temporarily though
> it shouldn't be impacted due to reset of testing unit.

Why does BC/GM port go to PS_FAULTY? Link down?

Thanks,
Richard
|
From: ramesh t <ram...@ya...> - 2022-02-01 19:04:44
|
hi Miroslav Lichvar,

You are correct: "the timestamp checked in the clockcheck code should always be a timestamp of the clock synchronized by phc2sys".

In this case, we have connected the testUnit to interface A. Interface A is providing clock to the testUnit (with ptp4l in BC mode), and time is synchronized on the PHC of interface A using phc2sys.

Resetting the testUnit triggers interface A to go down. This in turn causes the PHC to run at a different frequency for a short duration until the interface is up again. Though this is a driver issue on interface A, my question is: why is the below code required?

> interval = (int64_t)ts - cc->last_ts;
> if (interval >= 0 && interval < CHECK_MIN_INTERVAL)
>         return ret;

Can't we just depend on the CLOCK_MONOTONIC clock check?

regards,
Ramesh
|
From: Miroslav L. <mli...@re...> - 2022-02-02 08:55:38
|
On Tue, Feb 01, 2022 at 07:04:29PM +0000, ramesh t wrote:
> Resetting of testUnit is triggering interface A to go down. This in turns is triggering PHC run at different frequency for small duration till interface is up. Though this is a driver issue on interface A, my question why is the below code required?
> > interval = (int64_t)ts - cc->last_ts;
> > if (interval >= 0 && interval < CHECK_MIN_INTERVAL)
> >         return ret;
>
> Can't we just depend on CLOCK_MONOTONIC clock check?

This code is needed to avoid checking timestamps that are too close to each other, as that amplifies the noise in the calculated frequency offset between the two clocks and increases the rate of false positives.

The interval could be calculated from the system monotonic clock, if that's what you are asking. The clock would have to be read in each call of the function instead of just once per the minimum interval.

--
Miroslav Lichvar
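[Editor's illustration] The point about closely spaced timestamps can be made with simple arithmetic: a frequency offset estimated from two timestamps carries an error of roughly the timestamp noise divided by the interval between them. The Python sketch below is only an illustration, not linuxptp code. The function name and the 1 us noise figure are assumptions chosen for the example.

```python
def freq_offset_error_ppb(timestamp_noise_ns: float, interval_ns: float) -> float:
    """Approximate error (in ppb) of a frequency offset estimated from
    two timestamps that are interval_ns apart, when each measurement
    may be off by about timestamp_noise_ns."""
    return 1e9 * timestamp_noise_ns / interval_ns


# Assumption for illustration only: about 1 microsecond of timestamp noise.
NOISE_NS = 1_000.0

for interval_ns in (1_000_000, 100_000_000, 1_000_000_000):
    err = freq_offset_error_ppb(NOISE_NS, interval_ns)
    print(f"interval {interval_ns / 1e6:8.1f} ms -> error about {err:.0f} ppb")
```

The shorter the interval, the larger the apparent frequency error for the same noise, which is why a check on very closely spaced samples would produce more false positives.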
|
From: ramesh t <ram...@ya...> - 2022-02-07 16:27:09
|
hi Richard,

> Feb 7 09:35:20 ptp4l: [610991.536] port 2: send sync failed
> Feb 7 09:35:20 ptp4l: [610991.536] port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
> Feb 7 09:35:20 ptp4l: [610991.557] port 2: link down
> Feb 7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN
>
> Why do you keep ignoring this message?

No, the above error occurs because the remote Testunit is reset/rebooted. Hence the interface goes down, resulting in the above logs.

regards,
Ramesh
|
From: Richard C. <ric...@gm...> - 2022-02-07 18:01:50
|
On Mon, Feb 07, 2022 at 04:26:45PM +0000, ramesh t wrote:
> hi Richard,
>
> > Feb 7 09:35:20 ptp4l: [610991.536] port 2: send sync failed
> > Feb 7 09:35:20 ptp4l: [610991.536] port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
> > Feb 7 09:35:20 ptp4l: [610991.557] port 2: link down
> > Feb 7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN
> > Why do you keep ignoring this message?
>
> No, the above error is because remote Testunit is resetted/rebooted. Hence the interface goes down resulting in above logs.

So now you know the reason for "send sync failed".

Richard
|
From: ramesh t <ram...@ya...> - 2022-01-06 19:08:40
|
hi Richard,

>> Why does BC/GM port go to PS_FAULTY? Link down?

Sorry, this could be an issue with the changes on my side.

Tried looking into the code to find out whether there is any variable in the port or clock struct that tells if the interface is acting as the client (slave) or server side of PTP.

Please suggest.

regards,
Ramesh
|
From: ramesh t <ram...@ya...> - 2022-01-07 07:35:15
|
hi Richard,

In BC mode, Port 2 is connected to the TestingUnit and Port 1 is connected to the BC/GM unit.

Observing the behavior below on reboot of the TestingUnit: Port 1 is going from SLAVE to UNCALIBRATED, which is not correct. Below are the ptp4l and phc2sys commands used.

ptp4l -2 -A -i enp175s0f1 -i enp95s0f1 -f /etc/ptp4l_bc.conf
phc2sys_slave -a -r -n 24

ptp4l: [325741.182] rms 6 max 15 freq +648 +/- 10 delay 373 +/- 1
phc2sya: [325741.509] CLOCK_REALTIME phc offset 19 s2 freq -81940 delay 1150
phc2sys: [325741.510] enp95s0f1 phc offset 4 s2 freq -2830 delay 3808
ptp4l: [325742.182] rms 3 max 7 freq +647 +/- 5 delay 375 +/- 0
phc2sys: [325742.510] CLOCK_REALTIME phc offset 2 s2 freq -81952 delay 1164
phc2sys: [325742.510] enp95s0f1 phc offset -2 s2 freq -2835 delay 3820
ptp4l: [325743.182] rms 5 max 11 freq +646 +/- 9 delay 374 +/- 1
phc2sys: [325743.510] CLOCK_REALTIME phc offset -9 s2 freq -81962 delay 1166
phc2sys: [325743.510] enp95s0f1 phc offset 0 s2 freq -2834 delay 3817
kernel: [324954.668590] i40e 0000:5f:00.1 enp95s0f1: NIC Link is Down
ptp4l: [325744.218] timed out while polling for tx timestamp
ptp4l: [325744.218] increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l: [325744.218] port 2: send sync failed
ptp4l: [325744.218] port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l: [325744.259] clockcheck: clock jumped backward or running slower than expected!
ptp4l: [325744.259] port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT
ptp4l: [325744.259] PS_SLAVE: port_e2e_transition
ptp4l: [325744.259] port 2: link down
ptp4l: [325744.259] port 2: NEC received link status notification DOWN
ptp4l: [325744.259] selected best master clock 000580.fffe.07cc8a
ptp4l: [325744.259] updating UTC offset to 37
ptp4l: [325744.259] rms 5 max 13 freq +643 +/- 6 delay 374 +/- 1
ptp4l: [325744.259] port 2: NEC received link status notification DOWN
ptp4l: [325744.366] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l: [325744.366] PS_SLAVE: port_e2e_transition
ptp4l: [325744.430] clockcheck: clock jumped forward or running faster than expected!
ptp4l: [325744.430] port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT
ptp4l: [325744.430] PS_SLAVE: port_e2e_transition
phc2sys: [325744.510] port 40a6b7.fffe.0da261-2 changed state
phc2sys: [325744.510] port 40a6b7.fffe.0da261-1 changed state
phc2sys: message repeated 2 times: [ [325744.510] port 40a6b7.fffe.0da261-1 changed state]
phc2sys: [325744.510] reconfiguring after port state change
phc2sys: [325744.510] selecting enp95s0f1 for synchronization
phc2sys: [325744.510] master clock not ready, waiting...
ptp4l: [325744.678] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l: [325744.678] PS_SLAVE: port_e2e_transition
ptp4l: [325745.182] rms 16 max 31 freq +646 +/- 44 delay 374 +/- 1
phc2sys: [325745.510] port 40a6b7.fffe.0da261-1 changed state
phc2sys: [325745.510] reconfiguring after port state change
phc2sys: [325745.510] selecting enp95s0f1 for synchronization
phc2sys: [325745.510] selecting CLOCK_REALTIME for synchronization
phc2sys: [325745.510] selecting enp175s0f1 as the master clock
phc2sys: [325745.510] CLOCK_REALTIME phc offset -11 s2 freq -81967 delay 1149
phc2sys: [325745.510] clockcheck: clock jumped backward or running slower than expected!
phc2sys: [325745.510] enp95s0f1 phc offset -1106433060 s0 freq -2834 delay 194

Please suggest.

regards,
Ramesh
|
From: Richard C. <ric...@gm...> - 2022-01-07 15:23:17
|
On Fri, Jan 07, 2022 at 07:32:39AM +0000, ramesh t wrote:
> ptp4l: [325744.430] clockcheck: clock jumped forward or running faster than expected!
> ptp4l: [325744.430] port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT

See this? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> Please suggest.

I suggest reading the log messages to see the clearly identified source of the fault.
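[Editor's illustration] When reading long ptp4l logs, it can help to pull out only the port state transitions together with the event that caused them. The following is a minimal Python sketch, not part of linuxptp. The regular expression and the function name find_transitions are assumptions based only on the log format quoted in this thread.

```python
import re

# Matches lines such as:
#   "port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT"
# The pattern is inferred from the logs quoted in this thread.
TRANSITION = re.compile(
    r"port (?P<port>\d+): (?P<old>[A-Z_]+) to (?P<new>[A-Z_]+) on (?P<event>[A-Z_]+)"
)


def find_transitions(lines):
    """Return (port, old_state, new_state, event) for each transition line."""
    found = []
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            found.append((int(m["port"]), m["old"], m["new"], m["event"]))
    return found


# Sample lines copied from the log earlier in this thread.
LOG = [
    "ptp4l: [325744.218] port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)",
    "ptp4l: [325744.259] clockcheck: clock jumped backward or running slower than expected!",
    "ptp4l: [325744.259] port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT",
    "ptp4l: [325744.259] port 2: link down",
    "ptp4l: [325744.366] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED",
]

for port, old, new, event in find_transitions(LOG):
    print(f"port {port}: {old} -> {new} ({event})")
```

On the sample above, port 1 leaves SLAVE because of SYNCHRONIZATION_FAULT, which is the event raised right after the clockcheck warning, not because of a link event on port 1.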
|
From: ramesh t <ram...@ya...> - 2022-01-17 08:25:18
|
hi Richard,

Issue: Running ptp4l in BC mode, with one NIC port connected to the BC/GM and another NIC port connected to the testing unit (TU). On resetting the testing unit, ptp4l temporarily goes into the FAULTY state, though it shouldn't be impacted by the reset of the testing unit.

I debugged this issue further by adding debug prints of mono_interval in clockcheck_sample. Below is my understanding.

In the normal working case, clockcheck_sample is called once every 120-128 milliseconds:

ptp4l: [523771.617] mono 127658899
ptp4l: [523771.737] mono 120000322

On resetting the testing unit, whenever a send sync failure is observed, clockcheck_sample is called only after more than 200 milliseconds:

ptp4l: [523772.664] timed out while polling for tx timestamp
ptp4l: [523772.664] PID 6071 increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l: [523772.664] port 2: send sync failed
ptp4l: [523772.700] mono 210350673
ptp4l: [523772.700] clockcheck port 1: clock jumped backward or running slower than expected!
ptp4l: [523772.700] port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT

Since the mono interval is used to calculate the min and max frequency offset, clockcheck reports an error and the servo is reset. As I understand it, ptp4l runs in a single thread, and this seems to have an impact whenever a send failure occurs.

Please suggest.

regards,
Ramesh
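[Editor's illustration] The effect described above can be reproduced with a simplified model of a frequency sanity check: compare how far the synchronized clock advanced with how far the monotonic clock advanced, and flag the sample if the ratio is outside the limit. The Python sketch below is not the linuxptp implementation; it deliberately ignores any frequency correction already applied by the servo. The function names and the 125 ms clock interval (two Sync intervals of 62.5 ms, assuming the received timestamps stay evenly spaced while the process is blocked) are assumptions. The limit is the default sanity_freq_limit quoted elsewhere in this thread.

```python
# Default sanity_freq_limit quoted in this thread: 200000000 ppb (20%).
SANITY_FREQ_LIMIT_PPB = 200_000_000


def freq_offset_ppb(clock_interval_ns: float, mono_interval_ns: float) -> float:
    """Apparent frequency offset of the checked clock relative to the
    monotonic clock, in parts per billion."""
    return 1e9 * (clock_interval_ns / mono_interval_ns - 1.0)


def check(clock_interval_ns: float, mono_interval_ns: float,
          limit_ppb: float = SANITY_FREQ_LIMIT_PPB) -> str:
    """Classify one sample as 'ok', 'faster' or 'slower'."""
    offset = freq_offset_ppb(clock_interval_ns, mono_interval_ns)
    if offset > limit_ppb:
        return "faster"
    if offset < -limit_ppb:
        return "slower"
    return "ok"


# Assumption: the checked timestamps are about 125 ms apart in both cases.
ASSUMED_CLOCK_INTERVAL_NS = 125_000_000

# mono values taken from the debug output in the message above.
for mono_ns in (127_658_899, 210_350_673):
    offset = freq_offset_ppb(ASSUMED_CLOCK_INTERVAL_NS, mono_ns)
    print(f"mono {mono_ns} ns: offset {offset / 1e7:+.1f}% -> "
          f"{check(ASSUMED_CLOCK_INTERVAL_NS, mono_ns)}")
```

Under these assumptions the 210 ms monotonic interval makes the clock appear roughly 40% slow, which exceeds the 20% limit and would match the "running slower than expected" message.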
|
From: Richard C. <ric...@gm...> - 2022-01-17 16:06:04
|
On Mon, Jan 17, 2022 at 08:25:06AM +0000, ramesh t wrote:
> Please suggest.
1. Make sure your ptp4l has these five patches:
a082bcd clockcheck: Increase minimum interval.
6df8425 port: Don't renew raw transport.
e117e37 port: Don't check timestamps from non-slave ports.
262a49b clock: Reset clock check on best clock/port change.
7e8eba5 clock: Reset state when switching port with same best clock.
If that doesn't fix the problem, then:
2. Disable clock sanity check.
sanity_freq_limit
The maximum allowed frequency offset between uncorrected clock
and the system monotonic clock in parts per billion (ppb). This
is used as a sanity check of the synchronized clock. When a
larger offset is measured, a warning message will be printed and
the servo will be reset. When set to 0, the sanity check is
disabled. The default is 200000000 (20%).
|
|
From: ramesh t <ram...@ya...> - 2022-01-18 17:38:29
|
hi Richard,

With Step 1, it seems to be working fine; will try a few more variations and check. Thank you for your help.

But I have a few questions:

The patches provided don't seem to have any relation to the error "timed out while polling for tx timestamp", which occurred while transmitting Sync packets on the master side. I'm not observing this error, any reason why?

After recovery/reboot of the remote system, the phc offset of the interface connected to the TestUnit/remote system seems to be stuck at an improper value.

[root@satdd-nec-worker0 ~]# /usr/sbin/phc_ctl enp95s0f1 cmp
phc_ctl[645780.131]: offset from CLOCK_REALTIME is 31335266008ns
phc2sys: [645792.375] clock jumped backward or running slower than expected!
phc2sys: [645792.375] enp95s0f1 phc offset -68335289644 s0 freq -900000000 delay 3779
phc2sys: [645793.375] CLOCK_REALTIME phc offset -11 s2 freq -79884 delay 1143
phc2sys: [645793.375] clock jumped backward or running slower than expected!
phc2sys: [645793.375] enp95s0f1 phc offset -68335291579 s0 freq -900000000 delay 3767
phc2sys: [645794.375] CLOCK_REALTIME phc offset 8 s2 freq -79868 delay 1156
phc2sys: [645794.376] clock jumped backward or running slower than expected!
phc2sys: [645794.376] enp95s0f1 phc offset -68335293498 s0 freq -900000000 delay 3798
phc2sys: [645795.376] CLOCK_REALTIME phc offset 4 s2 freq -79870 delay 1158

Please suggest.

regards,
Ramesh
|
From: ramesh t <ram...@ya...> - 2022-01-20 18:12:35
|
hi Richard,

In the phc2sys code, with the default phc_interval config, update_clock is called once a second based on a CLOCK_MONOTONIC timer. With the sanity check enabled, clockcheck_sample is also called.

In the clockcheck_sample function, we should depend on CLOCK_MONOTONIC to decide whether it is being called more frequently than once a second. But we also check the remote time:

interval = (int64_t)ts - cc->last_ts;
if (interval >= 0 && interval < CHECK_MIN_INTERVAL)
        return ret;

This may not be correct, as the remote phc time could have drifted. Hence, when we call clockcheck_sample again, mono_interval may be a higher value, resulting in variation in the min and max freq_offset calculation.

Can you please check this and suggest.

regards,
Ramesh
|
From: Miroslav L. <mli...@re...> - 2022-01-24 08:04:35
|
On Thu, Jan 20, 2022 at 06:11:57PM +0000, ramesh t via Linuxptp-users wrote:
> In clockcheck_sample function, we should depend on CLOCK_MONOTONIC to decide if its getting called more frequency than a second. But we also check on remote time:
>
> interval = (int64_t)ts - cc->last_ts;
> if (interval >= 0 && interval < CHECK_MIN_INTERVAL)
>         return ret;
>
> This may not be correct as remote phc time could have drifted. Hence when we call clockcheck_sample again mono_interval may be a higher value, resulting in variation of min and max freq_offset calculation.

The timestamp checked in the clockcheck code should always be a timestamp of the clock synchronized by phc2sys, not the clock to which it is synchronized (what I assume you mean by "remote"). Can you point to the code path where this is not the case?

--
Miroslav Lichvar
|
From: ramesh t <ram...@ya...> - 2022-02-07 10:48:08
|
hi Richard,
Did a few more iterations of testing (ptp4l in BC mode) by resetting the TestUnit. Still observing "send sync error" with a tx timeout of 100 ms.
Question:
1) ptp4l is running in BC mode, providing clock to other connected TestUnits and syncing its clock from another BC/GM.
But in the below case, it would be blocked for almost 100 ms (1/10 sec). Wouldn't this impact PTP packets to the TestUnit or the handling of PTP packets from the BC/GM?
res = poll(&pfd, 1, sk_tx_timeout);
if (res < 1) {
Sync and Announce interval values are as below:
logAnnounceInterval -3
logSyncInterval -4
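[Editor's illustration] PTP log-interval settings are base-2 logarithms of the interval in seconds, so the two values above can be converted and compared with the 100 ms timeout as in this small Python sketch. The function name is an assumption for the example. The 100 ms figure is the timeout mentioned in this message.

```python
def log_interval_to_seconds(log_interval: int) -> float:
    """Convert a PTP log2 interval setting to seconds."""
    return 2.0 ** log_interval


sync_s = log_interval_to_seconds(-4)      # logSyncInterval -4
announce_s = log_interval_to_seconds(-3)  # logAnnounceInterval -3
tx_timeout_s = 0.100                      # 100 ms timeout from this message

print(f"Sync interval:     {sync_s * 1000:.1f} ms")
print(f"Announce interval: {announce_s * 1000:.1f} ms")
print(f"Timeout exceeds one Sync interval: {tx_timeout_s > sync_s}")
```

With these settings a single full timeout lasts longer than one Sync interval, so at least one Sync transmission slot passes while the process waits.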
Feb 7 09:35:18 phc2sys: [610989.748] CLOCK_REALTIME phc offset -16 s2 freq -81596 delay 566
Feb 7 09:35:18 phc2sys: [610989.748] enp95s0f1 phc offset -1 s2 freq -2464 delay 3725
Feb 7 09:35:18 ptp4l: [610989.818] rms 5 max 12 freq +649 +/- 8 delay 370 +/- 1
Feb 7 09:35:19 phc2sys: [610990.748] CLOCK_REALTIME phc offset 0 s2 freq -81585 delay 576
Feb 7 09:35:19 phc2sys: [610990.748] enp95s0f1 phc offset -1 s2 freq -2465 delay 3740
Feb 7 09:35:19 ptp4l: [610990.818] rms 5 max 10 freq +648 +/- 9 delay 369 +/- 0
Feb 7 09:35:20 ptp4l: [610991.536] timed out while polling for tx timestamp revent 0 event 2
Feb 7 09:35:20 ptp4l: [610991.536] increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
Feb 7 09:35:20 ptp4l: [610991.536] port 2: send sync failed
Feb 7 09:35:20 ptp4l: [610991.536] port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
Feb 7 09:35:20 ptp4l: [610991.557] port 2: link down
Feb 7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN
Feb 7 09:35:20 ptp4l: [610991.557] selected best master clock 000580.fffe.07cc8a
Feb 7 09:35:20 ptp4l: [610991.557] updating UTC offset to 37
Feb 7 09:35:20 ptp4l: [610991.557] Not client interface for BC mode enp95s0f1
Feb 7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN
Feb 7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN
Feb 7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN
Feb 7 09:35:20 phc2sys: [610991.748] port 40a6b7.fffe.0da261-2 changed state
Feb 7 09:35:20 phc2sys: [610991.748] reconfiguring after port state change
Feb 7 09:35:20 phc2sys: [610991.748] selecting enp95s0f1 for synchronization
Feb 7 09:35:20 phc2sys: [610991.748] selecting CLOCK_REALTIME for synchronization
Feb 7 09:35:20 phc2sys: [610991.748] selecting enp175s0f1 as the master clock
Feb 7 09:35:20 phc2sys: [610991.748] CLOCK_REALTIME phc offset 22 s2 freq -81563 delay 574
Feb 7 09:35:20 phc2sys: [610991.748] clock jumped backward or running slower than expected!
Feb 7 09:35:20 phc2sys: [610991.748] enp95s0f1 phc offset -270490592 s0 freq -2465 delay 1856
Feb 7 09:35:20 ptp4l: [610991.818] rms 4 max 8 freq +651 +/- 7 delay 370 +/- 0
Feb 7 09:35:21 ptp4l: [610992.492] port 2: received link status notification DOWN
Feb 7 09:35:21 phc2sys: [610992.748] CLOCK_REALTIME phc offset -5 s2 freq -81583 delay 574
Feb 7 09:35:21 phc2sys: [610992.748] clock jumped backward or running slower than expected!
Feb 7 09:35:21 phc2sys: [610992.748] enp95s0f1 phc offset -1040559513 s0 freq -2465 delay 1878
Feb 7 09:35:21 ptp4l: [610992.818] rms 5 max 9 freq +649 +/- 8 delay 369 +/- 1
Feb 7 09:35:22 phc2sys: [610993.748] CLOCK_REALTIME phc offset -8 s2 freq -81587 delay 573
Feb 7 09:35:22 phc2sys: [610993.748] clock jumped backward or running slower than expected!
Please suggest.
regards,
Ramesh
|
|
From: Richard C. <ric...@gm...> - 2022-02-07 14:51:00
|
On Mon, Feb 07, 2022 at 10:47:51AM +0000, ramesh t wrote:
> Did few more iterations of testing (ptp4l in BC mode) by resetting TestUnit. Still observing "send sync error" with txtimeout of 100ms.
>
> Question:
> 1) ptp4l is running in BC mode providing clock to other connected TestUnits and syncing clock from another BC/GM.
> But in the below case, it would be blocked for almost 100ms (1/10 sec) Wouldn't this impact ptp packets to testunit or handling ptp packets from BC/GM?

Yes, it might if the long timeout is, in fact, needed.

If you want to run logSyncInterval at 62.5 milliseconds, then you will need to design your system to handle that high rate. This might entail a different choice of MAC.

> Feb 7 09:35:20 ptp4l: [610991.536] port 2: send sync failed
> Feb 7 09:35:20 ptp4l: [610991.536] port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
> Feb 7 09:35:20 ptp4l: [610991.557] port 2: link down
> Feb 7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN

Why do you keep ignoring this message?

Richard