Thread: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

PTP IEEE 1588 stack for Linux

Brought to you by: rcochran

linuxptp-users

[Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: ramesh t <ram...@ya...> - 2022-01-05 19:34:20

hi,

I am running ptp4l in BC mode, with clock_type set to BC and boundary_clock_jbod set to 1. One NIC port is connected to a BC/GM and another NIC port to a testing unit (TU). This is the ptp4l command:

./ptp4l -2 -A -w -i <port connected to BC/GM> -i <port connected to TU> -f /etc/ptp4l_bc.conf

In BC mode, ptp4l gets its time from the BC/GM and provides time to the testing unit.

I am also running the following phc2sys commands:
phc2sys -s <port connected to BC/GM> -c CLOCK_REALTIME -w -r -n 24
phc2sys -s <port connected to BC/GM> -c <port connected to TU> -r -w -n 24

When the testing unit is reset, the PTP port state of the interface connected to the BC/GM temporarily goes to PS_FAULTY, even though that port should not be affected by a reset of the testing unit.

I verified that the same behavior is seen with the latest linuxptp code.

Please suggest.

regards,
Ramesh

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: Richard C. <ric...@gm...> - 2022-01-05 20:01:47

On Wed, Jan 05, 2022 at 07:34:07PM +0000, ramesh t wrote:
> hi,
> 
> Running ptp4l in BC mode, with clock_type set to BC and boundary_clock_jbod set to 1. Have connected one NIC port to BC/GM and another NIC port to testing unit (TU). Following is the ptp4l command.
> 
> ./ptp4l -2 -A -w -i <port connected to BC/GM> -i <port connected to TU> -f /etc/ptp4l_bc.c
> 
> In BC mode, ptp4l getting clock from BC/GM and is providing clock to testing unit.
> 
> Also running below phc2sys command:
> phc2sys -s <port to connected to BC/GM> -c CLOCK_REALTIME -w -r -n 24
> phc2sys -s <port to connected to BC/GM> -c <port connected to TU> -r -w -n 24

You should use the phc2sys automatic mode (-a) when using boundary_clock_jbod.

Thanks,
Richard

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: ramesh t <ram...@ya...> - 2022-01-06 01:40:43

hi Richard,

Using option -a will automatically synchronize all the PHC devices in the system.
Is there a way to synchronize only the interfaces that are part of the BC configuration (the port connected to the testing unit)?

Will the -a option help solve this issue, given that ptp4l itself is going into PS_FAULTY?
Please suggest. 

regards,
Ramesh

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: ramesh t <ram...@ya...> - 2022-01-06 02:12:34

hi Richard,

>>Using option -a will automatically time sync all the PHC devices in the system.
>>Is there a way to sync only those interface which are part of BC configuration (port connected to Testing Unit)??

Please ignore the question above.

I repeated the test with phc2sys in automatic mode (-a), and the same issue is still seen.
(Command used: phc2sys -a -r -n 24)

Please suggest.

regards,
Ramesh

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: Richard C. <ric...@gm...> - 2022-01-06 02:33:11

On Wed, Jan 05, 2022 at 07:34:07PM +0000, ramesh t wrote:

> On resetting of the testing unit, observing ptp port state of the
> interface connected to BC/GM going to PS_FAULTY temporarily though
> it shouldn't be impacted due to reset of testing unit.

Why does BC/GM port go to PS_FAULTY?  Link down?

Thanks,
Richard

Re: [Linuxptp-users] [Linuxptp-devel] Observing below issue with ptp4l running in BC mode

From: ramesh t <ram...@ya...> - 2022-02-01 19:04:44

hi Miroslav Lichvar,

You are correct: "the timestamp checked in the clockcheck code should always be a timestamp of the clock synchronized by phc2sys".

In this case, the test unit is connected to interface A. Interface A provides time to the test unit (with ptp4l in BC mode), and the PHC of interface A is synchronized using phc2sys.

Resetting the test unit causes interface A to go down. This in turn causes the PHC to run at a different frequency for a short time, until the interface is back up. Although this is a driver issue on interface A, my question is: why is the code below required?
>         interval = (int64_t)ts - cc->last_ts;
>         if (interval >= 0 && interval < CHECK_MIN_INTERVAL)
>               return ret;

Can't we rely only on the CLOCK_MONOTONIC check?

regards,
Ramesh

Re: [Linuxptp-users] [Linuxptp-devel] Observing below issue with ptp4l running in BC mode

From: Miroslav L. <mli...@re...> - 2022-02-02 08:55:38

On Tue, Feb 01, 2022 at 07:04:29PM +0000, ramesh t wrote:
> Resetting of testUnit is triggering interface A to go down. This in turns is triggering PHC run at different frequency for small duration till interface is up. Though this is a driver issue on interface A, my question why is the below code required?
> >         interval = (int64_t)ts - cc->last_ts;
> >         if (interval >= 0 && interval < CHECK_MIN_INTERVAL)
> >               return ret;
> 
> Can't we just depend on CLOCK_MONOTONIC clock check?

This code is needed to avoid checking timestamps that are too close to each
other, as that amplifies the noise in the calculated frequency offset
between the two clocks and increases the rate of false positives.

The interval could be calculated from the system monotonic clock, if
that's what you are asking. The clock would have to be read in each
call of the function instead of just once per the minimum interval.
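A minimal sketch of the alternative described above: gating samples on the system monotonic clock instead of on the synchronized clock's own timestamps. The struct, the function names and the 100 ms minimum interval are hypothetical stand-ins, not linuxptp's clockcheck API. The monotonic reading is passed in so the logic can be exercised directly, and a small wrapper shows where clock_gettime(CLOCK_MONOTONIC) would have to be called on every invocation.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical minimum spacing between checked samples (assumed 100 ms). */
#define SKETCH_MIN_INTERVAL_NS 100000000LL

/* Hypothetical gate state; 0 means "no sample accepted yet". */
struct mono_gate {
	int64_t last_mono_ns;
};

/* Accept a sample taken at mono_ns only if it is at least the minimum
 * interval after the previously accepted one. Only the monotonic clock is
 * consulted, so drift or steps of the synchronized clock cannot affect it. */
static bool mono_gate_accept(struct mono_gate *g, int64_t mono_ns)
{
	if (g->last_mono_ns && mono_ns - g->last_mono_ns < SKETCH_MIN_INTERVAL_NS)
		return false; /* too close: skipped, last accepted time unchanged */
	g->last_mono_ns = mono_ns;
	return true;
}

/* Reads CLOCK_MONOTONIC on every call: the extra cost mentioned above. */
static bool mono_gate_accept_now(struct mono_gate *g)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return mono_gate_accept(g, (int64_t)now.tv_sec * 1000000000LL + now.tv_nsec);
}
```

The trade-off is the one stated above: the gate no longer depends on the checked clock, but the monotonic clock has to be read for every sample, including the ones that end up skipped.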

-- 
Miroslav Lichvar

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: ramesh t <ram...@ya...> - 2022-02-07 16:27:09

hi Richard,

> > Feb  7 09:35:20 ptp4l: [610991.536] port 2: send sync failed
> > Feb  7 09:35:20 ptp4l: [610991.536] port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
> > Feb  7 09:35:20 ptp4l: [610991.557] port 2: link down
> > Feb  7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN
> Why do you keep ignoring this message?

No, the error above occurs because the remote test unit was reset/rebooted. The interface therefore goes down, which results in the logs above.

regards,
Ramesh

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: Richard C. <ric...@gm...> - 2022-02-07 18:01:50

On Mon, Feb 07, 2022 at 04:26:45PM +0000, ramesh t wrote:
> hi Richard,
> 
> > Feb  7 09:35:20 ptp4l: [610991.536] port 2: send sync failed
> > Feb  7 09:35:20 ptp4l: [610991.536] port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
> > Feb  7 09:35:20 ptp4l: [610991.557] port 2: link down
> > Feb  7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN
> Why do you keep ignoring this message?
> 
> No, the above error is because remote Testunit is resetted/rebooted. Hence the interface goes down resulting in above logs.

So now you know the reason for "send sync failed".

Richard

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: ramesh t <ram...@ya...> - 2022-01-06 19:08:40

hi Richard,

>>Why does BC/GM port go to PS_FAULTY?  Link down?
Sorry, this could be caused by my own changes.

I looked through the code to find out whether any variable in the port or clock struct tells whether the interface is acting as the client (slave) or server (master) side of PTP.
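One way to approach the question above: the client/server role of a ptp4l port follows from its port state. linuxptp keeps this as enum port_state (fsm.h), and a port's current state can be read with port_state() from port.h. The sketch below mirrors that enum (check it against the fsm.h in your tree) and classifies a state with a hypothetical helper, port_role(), which is not part of linuxptp.

```c
/* Mirrors enum port_state from linuxptp's fsm.h; verify against your tree. */
enum port_state {
	PS_INITIALIZING = 1,
	PS_FAULTY,
	PS_DISABLED,
	PS_LISTENING,
	PS_PRE_MASTER,
	PS_MASTER,
	PS_PASSIVE,
	PS_UNCALIBRATED,
	PS_SLAVE,
	PS_GRAND_MASTER,
};

/* Hypothetical role classification; not a linuxptp type. */
enum port_role { ROLE_CLIENT, ROLE_SERVER, ROLE_OTHER };

static enum port_role port_role(enum port_state st)
{
	switch (st) {
	case PS_UNCALIBRATED:
	case PS_SLAVE:
		return ROLE_CLIENT;   /* receiving time from an upstream master */
	case PS_MASTER:
	case PS_GRAND_MASTER:
		return ROLE_SERVER;   /* sending Sync to downstream clocks */
	default:
		return ROLE_OTHER;    /* faulty, disabled, listening, passive, ... */
	}
}
```

The role is not fixed: the BMCA can move a port between these states at run time, so the state should be re-read when needed rather than cached.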
Please suggest. 

regards,
Ramesh

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: ramesh t <ram...@ya...> - 2022-01-07 07:35:15

hi Richard,

In BC mode, port 2 is connected to the testing unit and port 1 is connected to the BC/GM.
When the testing unit reboots, I observe the behavior below: port 1 goes from SLAVE to UNCALIBRATED, which is not correct.

Below are the ptp4l and phc2sys commands used.
ptp4l -2 -A -i enp175s0f1 -i enp95s0f1 -f /etc/ptp4l_bc.conf
phc2sys_slave -a -r -n 24

ptp4l: [325741.182] rms    6 max   15 freq   +648 +/-  10 delay   373 +/-   1
phc2sys: [325741.509] CLOCK_REALTIME phc offset        19 s2 freq  -81940 delay   1150
phc2sys: [325741.510] enp95s0f1 phc offset         4 s2 freq   -2830 delay   3808
ptp4l: [325742.182] rms    3 max    7 freq   +647 +/-   5 delay   375 +/-   0
phc2sys: [325742.510] CLOCK_REALTIME phc offset         2 s2 freq  -81952 delay   1164
phc2sys: [325742.510] enp95s0f1 phc offset        -2 s2 freq   -2835 delay   3820
ptp4l: [325743.182] rms    5 max   11 freq   +646 +/-   9 delay   374 +/-   1
phc2sys: [325743.510] CLOCK_REALTIME phc offset        -9 s2 freq  -81962 delay   1166
phc2sys: [325743.510] enp95s0f1 phc offset         0 s2 freq   -2834 delay   3817
kernel: [324954.668590] i40e 0000:5f:00.1 enp95s0f1: NIC Link is Down
ptp4l: [325744.218] timed out while polling for tx timestamp
ptp4l: [325744.218] increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l: [325744.218] port 2: send sync failed
ptp4l: [325744.218] port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l: [325744.259] clockcheck: clock jumped backward or running slower than expected!
ptp4l: [325744.259] port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT
ptp4l: [325744.259] PS_SLAVE: port_e2e_transition
ptp4l: [325744.259] port 2: link down
ptp4l: [325744.259] port 2: NEC received link status notification DOWN
ptp4l: [325744.259] selected best master clock 000580.fffe.07cc8a
ptp4l: [325744.259] updating UTC offset to 37
ptp4l: [325744.259] rms    5 max   13 freq   +643 +/-   6 delay   374 +/-   1
ptp4l: [325744.259] port 2: NEC received link status notification DOWN
ptp4l: [325744.366] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l: [325744.366] PS_SLAVE: port_e2e_transition
ptp4l: [325744.430] clockcheck: clock jumped forward or running faster than expected!
ptp4l: [325744.430] port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT
ptp4l: [325744.430] PS_SLAVE: port_e2e_transition
phc2sys: [325744.510] port 40a6b7.fffe.0da261-2 changed state
phc2sys: [325744.510] port 40a6b7.fffe.0da261-1 changed state
phc2sys: message repeated 2 times: [ [325744.510] port 40a6b7.fffe.0da261-1 changed state]
phc2sys: [325744.510] reconfiguring after port state change
phc2sys: [325744.510] selecting enp95s0f1 for synchronization
phc2sys: [325744.510] master clock not ready, waiting...
ptp4l: [325744.678] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l: [325744.678] PS_SLAVE: port_e2e_transition
ptp4l: [325745.182] rms   16 max   31 freq   +646 +/-  44 delay   374 +/-   1
phc2sys: [325745.510] port 40a6b7.fffe.0da261-1 changed state
phc2sys: [325745.510] reconfiguring after port state change
phc2sys: [325745.510] selecting enp95s0f1 for synchronization
phc2sys: [325745.510] selecting CLOCK_REALTIME for synchronization
phc2sys: [325745.510] selecting enp175s0f1 as the master clock
phc2sys: [325745.510] CLOCK_REALTIME phc offset       -11 s2 freq  -81967 delay   1149
phc2sys: [325745.510] clockcheck: clock jumped backward or running slower than expected!
phc2sys: [325745.510] enp95s0f1 phc offset -1106433060 s0 freq   -2834 delay    194

Please suggest.

regards,
Ramesh

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: Richard C. <ric...@gm...> - 2022-01-07 15:23:17

On Fri, Jan 07, 2022 at 07:32:39AM +0000, ramesh t wrote:

> ptp4l: [325744.430] clockcheck: clock jumped forward or running faster than expected!
> ptp4l: [325744.430] port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT

See this? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> Please suggest.

I suggest reading the log messages to see the clearly identified
source of the fault.

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: ramesh t <ram...@ya...> - 2022-01-17 08:25:18

hi Richard,

Issue:
ptp4l is running in BC mode, with one NIC port connected to a BC/GM and another NIC port connected to a testing unit (TU).
When the testing unit is reset, ptp4l temporarily goes into the FAULTY state, even though it should not be affected by the reset.

I debugged this issue further.
After adding debug prints of mono_interval in clockcheck_sample, this is my understanding.
In the normal case:
clockcheck_sample is called once every 120-128 milliseconds.
ptp4l: [523771.617] mono 127658899
ptp4l: [523771.737] mono 120000322

After the testing unit is reset, whenever a send sync failure is observed, clockcheck_sample is called only after more than 200 milliseconds.
ptp4l: [523772.664] timed out while polling for tx timestamp
ptp4l: [523772.664] PID 6071 increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l: [523772.664] port 2: send sync failed
ptp4l: [523772.700] mono 210350673
ptp4l: [523772.700] clockcheck port 1: clock jumped backward or running slower than expected!
ptp4l: [523772.700] port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT

Since the monotonic interval is used to calculate the minimum and maximum frequency offsets, clockcheck reports an error and the servo is reset.
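The effect described above can be reproduced with a simplified version of the frequency estimate. This sketch is modelled loosely on the clockcheck approach (apparent rate of the checked clock against CLOCK_MONOTONIC, corrected for the frequency adjustment applied to it) and is not copied from clockcheck.c; a single correction value stands in for the min/max corrections linuxptp tracks. It also assumes the sample timestamp is the hardware receive timestamp of a Sync message, so its interval stays at the Sync spacing (assumed here to be 125 ms, two Sync intervals at logSyncInterval -4), while the monotonic interval is read at processing time and grows when the loop is blocked.

```c
#include <stdint.h>

/* Hypothetical simplification: apparent frequency offset (ppb) of the
 * checked clock relative to CLOCK_MONOTONIC, after removing the frequency
 * correction freq_ppb that was applied to it. */
static double apparent_freq_offset_ppb(int64_t ts_interval_ns,
				       int64_t mono_interval_ns,
				       double freq_ppb)
{
	return 1e9 * ((double)ts_interval_ns / (1.0 + freq_ppb / 1e9) /
		      (double)mono_interval_ns - 1.0);
}
```

With a 125 ms timestamp interval, the delayed sample above (mono interval 210350673 ns) gives about -406 million ppb (roughly -41%), well past the default sanity limit of 200000000 ppb, which matches the "jumped backward or running slower" warning. The normal spacing (127658899 ns) gives only about -21 million ppb.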

As I understand it, ptp4l runs in a single thread, and this seems to have an impact whenever a send failure occurs.
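To illustrate why a missing transmit timestamp stalls a single-threaded loop: the sketch below waits for POLLPRI on a socket with a timeout, the same kind of call as the poll(&pfd, 1, sk_tx_timeout) quoted later in this thread, and measures how long it blocks when nothing ever arrives. A UNIX socketpair stands in for the PTP socket; the helper names and the timeout used in the usage note are assumptions for the demonstration.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

static int64_t mono_ns(void)
{
	struct timespec t;

	clock_gettime(CLOCK_MONOTONIC, &t);
	return (int64_t)t.tv_sec * 1000000000LL + t.tv_nsec;
}

/* Hypothetical helper: wait up to timeout_ms for POLLPRI on fd, as a
 * transmit-timestamp wait would. Returns poll()'s result (0 on timeout)
 * and stores how long the caller was blocked. */
static int wait_for_tx_timestamp(int fd, int timeout_ms, int64_t *blocked_ns)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI };
	int64_t start = mono_ns();
	int res = poll(&pfd, 1, timeout_ms);

	*blocked_ns = mono_ns() - start;
	return res;
}
```

For example, on a fresh socketpair(AF_UNIX, SOCK_DGRAM, 0, sv), wait_for_tx_timestamp(sv[0], 20, &blocked) returns 0 only after at least about 20 ms. During that time nothing else in the same loop runs, including receive processing on the other ports.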

Please suggest.

regards,
Ramesh

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: Richard C. <ric...@gm...> - 2022-01-17 16:06:04

On Mon, Jan 17, 2022 at 08:25:06AM +0000, ramesh t wrote:
> Please suggest.

1. Make sure your ptp4l has these five patches:

   a082bcd clockcheck: Increase minimum interval.
   6df8425 port: Don't renew raw transport.
   e117e37 port: Don't check timestamps from non-slave ports.
   262a49b clock: Reset clock check on best clock/port change.
   7e8eba5 clock: Reset state when switching port with same best clock.

   If that doesn't fix the problem, then:

2. Disable clock sanity check.

       sanity_freq_limit
              The  maximum  allowed frequency offset between uncorrected clock
              and the system monotonic clock in parts per billion (ppb).  This
              is  used  as  a  sanity  check of the synchronized clock. When a
              larger offset is measured, a warning message will be printed and
              the servo will be reset. When set to 0, the sanity check is
              disabled. The default is 200000000 (20%).
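The man page rule above can be restated as a small check. The limit is in ppb, so 200000000 ppb is 20%, and a value of 0 disables the check. The functions below are a hypothetical restatement for illustration, not linuxptp code.

```c
#include <stdbool.h>

/* Hypothetical restatement of sanity_freq_limit: a measured frequency
 * offset (ppb) fails when its magnitude exceeds the limit, unless the
 * limit is 0, which disables the check entirely. */
static bool sanity_check_fails(double foffset_ppb, int limit_ppb)
{
	if (limit_ppb == 0)
		return false;
	return foffset_ppb > limit_ppb || foffset_ppb < -limit_ppb;
}

/* Converts parts per billion to percent: 200000000 ppb -> 20 %. */
static double ppb_to_percent(double ppb)
{
	return ppb / 1e7;
}
```

An offset of about -406 million ppb fails against the default limit but passes once sanity_freq_limit is set to 0 in the configuration file. Disabling the check also removes protection against genuinely misbehaving clocks.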

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: ramesh t <ram...@ya...> - 2022-01-18 17:38:29

hi Richard,

With step 1, it seems to be working fine; I will try a few more variations and check.
Thank you for your help.

But have few questions:
The patches you listed do not seem to be related to the error "timed out while polling for tx timestamp", which occurred while transmitting Sync messages on the master side. Yet I no longer observe this error; is there a reason why?

After recovery/reboot of the remote system, the PHC offset of the interface connected to the test unit (remote system) seems to be stuck at an incorrect value.
[root@satdd-nec-worker0 ~]# /usr/sbin/phc_ctl enp95s0f1 cmp
phc_ctl[645780.131]: offset from CLOCK_REALTIME is 31335266008ns

phc2sys: [645792.375] clock jumped backward or running slower than expected!
phc2sys: [645792.375] enp95s0f1 phc offset -68335289644 s0 freq -900000000 delay 3779
phc2sys: [645793.375] CLOCK_REALTIME phc offset -11 s2 freq -79884 delay 1143
phc2sys: [645793.375] clock jumped backward or running slower than expected!
phc2sys: [645793.375] enp95s0f1 phc offset -68335291579 s0 freq -900000000 delay 3767
phc2sys: [645794.375] CLOCK_REALTIME phc offset 8 s2 freq -79868 delay 1156
phc2sys: [645794.376] clock jumped backward or running slower than expected!
phc2sys: [645794.376] enp95s0f1 phc offset -68335293498 s0 freq -900000000 delay 3798
phc2sys: [645795.376] CLOCK_REALTIME phc offset 4 s2 freq -79870 delay 1158
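A quick arithmetic check of the numbers above. The values are copied from the output; the interpretation is only a hypothesis and is not confirmed in the thread. The magnitudes of the phc_ctl offset (against CLOCK_REALTIME) and the phc2sys offset (against enp175s0f1) differ by almost exactly 37 s, the UTC offset ptp4l logged. That would fit PHCs keeping TAI while CLOCK_REALTIME keeps UTC. Meanwhile the phc2sys offset changes by only about -1.9 microseconds per sample.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Values copied from the phc_ctl and phc2sys output above. */
static const int64_t phc_ctl_offset_ns = 31335266008LL;  /* vs CLOCK_REALTIME */
static const int64_t phc2sys_offset_ns[] = {              /* vs enp175s0f1 */
	-68335289644LL, -68335291579LL, -68335293498LL,
};

/* Difference between the two offset magnitudes, in seconds. */
static double magnitude_gap_s(void)
{
	return (double)(llabs(phc2sys_offset_ns[0]) - llabs(phc_ctl_offset_ns)) / 1e9;
}

/* Average change of the phc2sys offset per sample (samples ~1 s apart), ns. */
static double offset_change_per_sample_ns(void)
{
	size_t n = sizeof(phc2sys_offset_ns) / sizeof(phc2sys_offset_ns[0]);

	return (double)(phc2sys_offset_ns[n - 1] - phc2sys_offset_ns[0]) / (double)(n - 1);
}
```

magnitude_gap_s() comes out at about 37.00002 s and offset_change_per_sample_ns() at -1927 ns, roughly -1.9 ppm. The PHC is therefore not running anywhere near the -900000000 ppb (-90%) frequency printed, and the offset stays around -68.3 s while phc2sys remains in state s0.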

Please suggest.

regards,
Ramesh

Re: [Linuxptp-users] [Linuxptp-devel] Observing below issue with ptp4l running in BC mode

From: ramesh t <ram...@ya...> - 2022-01-20 18:12:35

hi Richard,

In the phc2sys code, with the default phc_interval, update_clock is called once per second based on a CLOCK_MONOTONIC timer. With the sanity check enabled, clockcheck_sample is also called.

In the clockcheck_sample function, we should rely on CLOCK_MONOTONIC to decide whether it is being called more often than once a second. But we also check the remote time:

        interval = (int64_t)ts - cc->last_ts;
        if (interval >= 0 && interval < CHECK_MIN_INTERVAL)
              return ret;

This may not be correct, as the remote PHC time could have drifted. Hence, when clockcheck_sample is called again, mono_interval may be larger, resulting in variation in the calculated minimum and maximum frequency offsets.
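To make the quoted condition concrete: a sample is skipped when its timestamp is less than CHECK_MIN_INTERVAL after the last checked sample, and a negative interval (the clock stepped backward) is never skipped. The sketch reproduces only that condition. The struct is a hypothetical stand-in, and the 100 ms value is an assumption (commit a082bcd, "clockcheck: Increase minimum interval", changes it, so check your tree).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value; a082bcd ("clockcheck: Increase minimum interval") changes it. */
#define CHECK_MIN_INTERVAL 100000000LL

/* Hypothetical stand-in for the relevant clockcheck state. */
struct ts_gate {
	uint64_t last_ts; /* timestamp (ns) of the last checked sample */
};

/* Mirrors the quoted condition: skip a sample whose timestamp is less than
 * CHECK_MIN_INTERVAL after the last checked one. A negative interval is not
 * skipped, so a backward step is still checked. */
static bool ts_gate_should_check(struct ts_gate *g, uint64_t ts)
{
	int64_t interval = (int64_t)ts - (int64_t)g->last_ts;

	if (interval >= 0 && interval < CHECK_MIN_INTERVAL)
		return false;
	g->last_ts = ts;
	return true;
}
```

If samples arrive every 62.5 ms (logSyncInterval -4, as configured later in the thread), every other one is skipped. Eight consecutive samples yield four checks about 125 ms apart, which is consistent with the 120-128 ms spacing in the debug output earlier in the thread.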

Can you please check this and suggest.

regards,
Ramesh

Re: [Linuxptp-users] [Linuxptp-devel] Observing below issue with ptp4l running in BC mode

From: Miroslav L. <mli...@re...> - 2022-01-24 08:04:35

On Thu, Jan 20, 2022 at 06:11:57PM +0000, ramesh t via Linuxptp-users wrote:
> In clockcheck_sample function, we should depend on CLOCK_MONOTONIC to decide if its getting called more frequency than a second. But we also check on remote time:
> 
>         interval = (int64_t)ts - cc->last_ts;
>         if (interval >= 0 && interval < CHECK_MIN_INTERVAL)
>               return ret;
> 
> This may not be correct as remote phc time could have drifted. Hence when we call clockcheck_sample again mono_interval may be a higher value, resulting in variation of min and max freq_offset calculation.

The timestamp checked in the clockcheck code should always be a
timestamp of the clock synchronized by phc2sys, not the clock to which
it is synchronized (what I assume you mean by "remote"). Can you point
to the code path where this is not the case?

-- 
Miroslav Lichvar

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: ramesh t <ram...@ya...> - 2022-02-07 10:48:08

hi Richard,

I did a few more iterations of testing (ptp4l in BC mode) by resetting the test unit. I still observe the "send sync failed" error with tx_timestamp_timeout set to 100 ms.

Question:
1) ptp4l is running in BC mode, providing time to other connected test units and synchronizing to another BC/GM.
    In the case below, however, it can block for up to 100 ms (1/10 s). Wouldn't this affect PTP messages sent to the test unit, or the handling of PTP messages from the BC/GM?
                res = poll(&pfd, 1, sk_tx_timeout);
                if (res < 1) {
The Sync and Announce intervals are set as follows:
logAnnounceInterval     -3 
logSyncInterval         -4
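For reference, PTP message intervals are base-2 logarithms of seconds. logSyncInterval -4 means one Sync every 2^-4 s = 62.5 ms, the figure used in the reply to this message, and logAnnounceInterval -3 means one Announce every 125 ms. The sketch compares those periods with a 100 ms blocking wait; the function names are hypothetical.

```c
/* PTP log intervals are log2(seconds); convert to milliseconds.
 * Valid for small magnitudes (|log_interval| < 32). */
static double log_interval_ms(int log_interval)
{
	double seconds = log_interval >= 0
		? (double)(1u << log_interval)
		: 1.0 / (double)(1u << -log_interval);

	return 1000.0 * seconds;
}

/* How many message periods fit into one blocking wait of timeout_ms. */
static double periods_blocked(int log_interval, double timeout_ms)
{
	return timeout_ms / log_interval_ms(log_interval);
}
```

With a 100 ms timeout, a single failed wait spans 1.6 Sync periods at logSyncInterval -4. So at least one scheduled Sync can be delayed while the loop is blocked.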


Feb  7 09:35:18 phc2sys: [610989.748] CLOCK_REALTIME phc offset       -16 s2 freq  -81596 delay    566
Feb  7 09:35:18 phc2sys: [610989.748] enp95s0f1 phc offset        -1 s2 freq   -2464 delay   3725
Feb  7 09:35:18 ptp4l: [610989.818] rms    5 max   12 freq   +649 +/-   8 delay   370 +/-   1
Feb  7 09:35:19 phc2sys: [610990.748] CLOCK_REALTIME phc offset         0 s2 freq  -81585 delay    576
Feb  7 09:35:19 phc2sys: [610990.748] enp95s0f1 phc offset        -1 s2 freq   -2465 delay   3740
Feb  7 09:35:19 ptp4l: [610990.818] rms    5 max   10 freq   +648 +/-   9 delay   369 +/-   0
Feb  7 09:35:20 ptp4l: [610991.536] timed out while polling for tx timestamp revent 0 event 2
Feb  7 09:35:20 ptp4l: [610991.536] increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
Feb  7 09:35:20 ptp4l: [610991.536] port 2: send sync failed
Feb  7 09:35:20 ptp4l: [610991.536] port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
Feb  7 09:35:20 ptp4l: [610991.557] port 2: link down
Feb  7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN
Feb  7 09:35:20 ptp4l: [610991.557] selected best master clock 000580.fffe.07cc8a
Feb  7 09:35:20 ptp4l: [610991.557] updating UTC offset to 37
Feb  7 09:35:20 ptp4l: [610991.557] Not client interface for BC mode enp95s0f1
Feb  7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN
Feb  7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN
Feb  7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN
Feb  7 09:35:20 phc2sys: [610991.748] port 40a6b7.fffe.0da261-2 changed state
Feb  7 09:35:20 phc2sys: [610991.748] reconfiguring after port state change
Feb  7 09:35:20 phc2sys: [610991.748] selecting enp95s0f1 for synchronization
Feb  7 09:35:20 phc2sys: [610991.748] selecting CLOCK_REALTIME for synchronization
Feb  7 09:35:20 phc2sys: [610991.748] selecting enp175s0f1 as the master clock
Feb  7 09:35:20 phc2sys: [610991.748] CLOCK_REALTIME phc offset        22 s2 freq  -81563 delay    574
Feb  7 09:35:20 phc2sys: [610991.748] clock jumped backward or running slower than expected!
Feb  7 09:35:20 phc2sys: [610991.748] enp95s0f1 phc offset -270490592 s0 freq   -2465 delay   1856
Feb  7 09:35:20 ptp4l: [610991.818] rms    4 max    8 freq   +651 +/-   7 delay   370 +/-   0
Feb  7 09:35:21 ptp4l: [610992.492] port 2: received link status notification DOWN
Feb  7 09:35:21 phc2sys: [610992.748] CLOCK_REALTIME phc offset        -5 s2 freq  -81583 delay    574
Feb  7 09:35:21 phc2sys: [610992.748] clock jumped backward or running slower than expected!
Feb  7 09:35:21 phc2sys: [610992.748] enp95s0f1 phc offset -1040559513 s0 freq   -2465 delay   1878
Feb  7 09:35:21 ptp4l: [610992.818] rms    5 max    9 freq   +649 +/-   8 delay   369 +/-   1
Feb  7 09:35:22 phc2sys: [610993.748] CLOCK_REALTIME phc offset        -8 s2 freq  -81587 delay    573
Feb  7 09:35:22 phc2sys: [610993.748] clock jumped backward or running slower than expected!

Please suggest.

regards,
Ramesh

Re: [Linuxptp-users] Observing below issue with ptp4l running in BC mode

From: Richard C. <ric...@gm...> - 2022-02-07 14:51:00

On Mon, Feb 07, 2022 at 10:47:51AM +0000, ramesh t wrote:
> Did few more iterations of testing (ptp4l in BC mode) by resetting TestUnit. Still observing "send sync error" with txtimeout of 100ms.
> 
> Question:
> 1) ptp4l is running in BC mode providing clock to other connected TestUnits and syncing clock from another BC/GM.
>     But in the below case, it would be blocked for almost 100ms (1/10 sec) Wouldn't this impact ptp packets to testunit or handling ptp packets from BC/GM? 

Yes, it might if the long timeout is, in fact, needed.

If you want to run logSyncInterval at 62.5 milliseconds, then you will
need to design your system to handle that high rate.  This might
entail a different choice of MAC.

> Feb  7 09:35:20 ptp4l: [610991.536] port 2: send sync failed
> Feb  7 09:35:20 ptp4l: [610991.536] port 2: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
> Feb  7 09:35:20 ptp4l: [610991.557] port 2: link down
> Feb  7 09:35:20 ptp4l: [610991.557] port 2: received link status notification DOWN

Why do you keep ignoring this message?

Richard