Thread: [Linuxptp-devel] [PATCH 0/3] Fixes for sanity clock check

PTP IEEE 1588 stack for Linux

Brought to you by: rcochran

linuxptp-devel

[Linuxptp-devel] [PATCH 0/3] Fixes for sanity clock check

From: Miroslav L. <mli...@re...> - 2021-05-25 12:28:31

These patches should make the clock check more reliable, e.g. when the
clock is controlled externally by another process, or when there are
multiple clocks in the jbod mode as was discussed recently on this list.

The first patch is not strictly related to the clock check, e.g. it also
fixes the path delay after switching the slave port.

Miroslav Lichvar (3):
  clock: Reset state when switching port with same best clock.
  clock: Reset clock check on port state change.
  port: Don't check timestamps from non-slave ports.

 clock.c      | 3 ++-
 clockcheck.c | 9 ++++++++-
 clockcheck.h | 6 ++++++
 port.c       | 5 ++++-
 4 files changed, 20 insertions(+), 3 deletions(-)

-- 
2.26.3

[Linuxptp-devel] [PATCH 1/3] clock: Reset state when switching port with same best clock.

From: Miroslav L. <mli...@re...> - 2021-05-25 12:28:32

When the best port is changed, but the ID of the best clock doesn't
change (e.g. a passive port is activated on link failure), reset the
current delay and other master/link-specific state to avoid the switch
throwing the clock off.

Signed-off-by: Miroslav Lichvar <mli...@re...>
---
 clock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clock.c b/clock.c
index e545a9b..a073575 100644
--- a/clock.c
+++ b/clock.c
@@ -1947,7 +1947,7 @@ static void handle_state_decision_event(struct clock *c)
 			  cid2str(&best_id));
 	}
 
-	if (!cid_eq(&best_id, &c->best_id)) {
+	if (!cid_eq(&best_id, &c->best_id) || best != c->best) {
 		clock_freq_est_reset(c);
 		tsproc_reset(c->tsproc, 1);
 		if (!tmv_is_zero(c->initial_delay))
-- 
2.26.3

Re: [Linuxptp-devel] [PATCH 1/3] clock: Reset state when switching port with same best clock.

From: Keller, J. E <jac...@in...> - 2021-05-25 20:34:21


> -----Original Message-----
> From: Miroslav Lichvar <mli...@re...>
> Sent: Tuesday, May 25, 2021 5:28 AM
> To: lin...@li...
> Subject: [Linuxptp-devel] [PATCH 1/3] clock: Reset state when switching port with
> same best clock.
> 
> When the best port is changed, but the ID of the best clock doesn't
> change (e.g. a passive port is activated on link failure), reset the
> current delay and other master/link-specific state to avoid the switch
> throwing the clock off.
> 

Right. Since we are on a new port, we are going to have different characteristics, even if the clock is still the same.

Makes sense.

Reviewed-by: Jacob Keller <jac...@in...>

> Signed-off-by: Miroslav Lichvar <mli...@re...>
> ---
>  clock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/clock.c b/clock.c
> index e545a9b..a073575 100644
> --- a/clock.c
> +++ b/clock.c
> @@ -1947,7 +1947,7 @@ static void handle_state_decision_event(struct clock
> *c)
>  			  cid2str(&best_id));
>  	}
> 
> -	if (!cid_eq(&best_id, &c->best_id)) {
> +	if (!cid_eq(&best_id, &c->best_id) || best != c->best) {
>  		clock_freq_est_reset(c);
>  		tsproc_reset(c->tsproc, 1);
>  		if (!tmv_is_zero(c->initial_delay))
> --
> 2.26.3
> 
> 
> 
> _______________________________________________
> Linuxptp-devel mailing list
> Lin...@li...
> https://lists.sourceforge.net/lists/listinfo/linuxptp-devel

[Linuxptp-devel] [PATCH 3/3] port: Don't check timestamps from non-slave ports.

From: Miroslav L. <mli...@re...> - 2021-05-25 12:28:35

Don't perform the sanity check on receive timestamps from ports in
non-slave states to avoid false positives in the jbod mode, where
the timestamps can be generated by different clocks.

Reported-by: Amar Subramanyam <asu...@al...>
Signed-off-by: Miroslav Lichvar <mli...@re...>
---
 port.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/port.c b/port.c
index 10bb9e1..fb420fb 100644
--- a/port.c
+++ b/port.c
@@ -2744,7 +2744,10 @@ static enum fsm_event bc_event(struct port *p, int fd_index)
 	}
 	if (msg_sots_valid(msg)) {
 		ts_add(&msg->hwts.ts, -p->rx_timestamp_offset);
-		clock_check_ts(p->clock, tmv_to_nanoseconds(msg->hwts.ts));
+		if (p->state == PS_SLAVE) {
+			clock_check_ts(p->clock,
+				       tmv_to_nanoseconds(msg->hwts.ts));
+		}
 	}
 
 	switch (msg_type(msg)) {
-- 
2.26.3

[Linuxptp-devel] [PATCH 2/3] clock: Reset clock check on port state change.

From: Miroslav L. <mli...@re...> - 2021-05-25 12:28:37

Reset the clock check to avoid false positives when switching between
slave and non-slave state and the clock is controlled by an external
process (e.g. phc2sys -rr).

Signed-off-by: Miroslav Lichvar <mli...@re...>
---
 clock.c      | 1 +
 clockcheck.c | 9 ++++++++-
 clockcheck.h | 6 ++++++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/clock.c b/clock.c
index a073575..69f2b66 100644
--- a/clock.c
+++ b/clock.c
@@ -1949,6 +1949,7 @@ static void handle_state_decision_event(struct clock *c)
 
 	if (!cid_eq(&best_id, &c->best_id) || best != c->best) {
 		clock_freq_est_reset(c);
+		clockcheck_reset(c->sanity_check);
 		tsproc_reset(c->tsproc, 1);
 		if (!tmv_is_zero(c->initial_delay))
 			tsproc_set_delay(c->tsproc, c->initial_delay);
diff --git a/clockcheck.c b/clockcheck.c
index d48a578..d0b4714 100644
--- a/clockcheck.c
+++ b/clockcheck.c
@@ -47,9 +47,16 @@ struct clockcheck *clockcheck_create(int freq_limit)
 	if (!cc)
 		return NULL;
 	cc->freq_limit = freq_limit;
+	clockcheck_reset(cc);
+	return cc;
+}
+
+void clockcheck_reset(struct clockcheck *cc)
+{
+	cc->freq_known = 0;
 	cc->max_freq = -CHECK_MAX_FREQ;
 	cc->min_freq = CHECK_MAX_FREQ;
-	return cc;
+	cc->last_ts = 0;
 }
 
 int clockcheck_sample(struct clockcheck *cc, uint64_t ts)
diff --git a/clockcheck.h b/clockcheck.h
index 78aca48..1ff86eb 100644
--- a/clockcheck.h
+++ b/clockcheck.h
@@ -33,6 +33,12 @@ struct clockcheck;
  */
 struct clockcheck *clockcheck_create(int freq_limit);
 
+/**
+ * Reset a clock check.
+ * @param cc Pointer to a clock check obtained via @ref clockcheck_create().
+ */
+void clockcheck_reset(struct clockcheck *cc);
+
 /**
  * Perform the sanity check on a time stamp.
  * @param cc Pointer to a clock check obtained via @ref clockcheck_create().
-- 
2.26.3

Re: [Linuxptp-devel] [PATCH 2/3] clock: Reset clock check on port state change.

From: Keller, J. E <jac...@in...> - 2021-05-25 20:35:50


> -----Original Message-----
> From: Miroslav Lichvar <mli...@re...>
> Sent: Tuesday, May 25, 2021 5:28 AM
> To: lin...@li...
> Subject: [Linuxptp-devel] [PATCH 2/3] clock: Reset clock check on port state
> change.
> 
> Reset the clock check to avoid false positives when switching between
> slave and non-slave state and the clock is controlled by an external
> process (e.g. phc2sys -rr).
> 


The way this is worded, I expected clockcheck_reset to only be called if the clock is controlled by an external process, but we seem to call this every time.

> Signed-off-by: Miroslav Lichvar <mli...@re...>
> ---
>  clock.c      | 1 +
>  clockcheck.c | 9 ++++++++-
>  clockcheck.h | 6 ++++++
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/clock.c b/clock.c
> index a073575..69f2b66 100644
> --- a/clock.c
> +++ b/clock.c
> @@ -1949,6 +1949,7 @@ static void handle_state_decision_event(struct clock
> *c)
> 
>  	if (!cid_eq(&best_id, &c->best_id) || best != c->best) {
>  		clock_freq_est_reset(c);
> +		clockcheck_reset(c->sanity_check);
>  		tsproc_reset(c->tsproc, 1);
>  		if (!tmv_is_zero(c->initial_delay))
>  			tsproc_set_delay(c->tsproc, c->initial_delay);
> diff --git a/clockcheck.c b/clockcheck.c
> index d48a578..d0b4714 100644
> --- a/clockcheck.c
> +++ b/clockcheck.c
> @@ -47,9 +47,16 @@ struct clockcheck *clockcheck_create(int freq_limit)
>  	if (!cc)
>  		return NULL;
>  	cc->freq_limit = freq_limit;
> +	clockcheck_reset(cc);
> +	return cc;
> +}
> +
> +void clockcheck_reset(struct clockcheck *cc)
> +{
> +	cc->freq_known = 0;
>  	cc->max_freq = -CHECK_MAX_FREQ;
>  	cc->min_freq = CHECK_MAX_FREQ;
> -	return cc;
> +	cc->last_ts = 0;
>  }
> 
>  int clockcheck_sample(struct clockcheck *cc, uint64_t ts)
> diff --git a/clockcheck.h b/clockcheck.h
> index 78aca48..1ff86eb 100644
> --- a/clockcheck.h
> +++ b/clockcheck.h
> @@ -33,6 +33,12 @@ struct clockcheck;
>   */
>  struct clockcheck *clockcheck_create(int freq_limit);
> 
> +/**
> + * Reset a clock check.
> + * @param cc Pointer to a clock check obtained via @ref clockcheck_create().
> + */
> +void clockcheck_reset(struct clockcheck *cc);
> +
>  /**
>   * Perform the sanity check on a time stamp.
>   * @param cc Pointer to a clock check obtained via @ref clockcheck_create().
> --
> 2.26.3
> 
> 
> 
> _______________________________________________
> Linuxptp-devel mailing list
> Lin...@li...
> https://lists.sourceforge.net/lists/listinfo/linuxptp-devel

Re: [Linuxptp-devel] [PATCH 2/3] clock: Reset clock check on port state change.

From: Miroslav L. <mli...@re...> - 2021-05-26 07:46:12

On Tue, May 25, 2021 at 08:35:35PM +0000, Keller, Jacob E wrote:
> > Reset the clock check to avoid false positives when switching between
> > slave and non-slave state and the clock is controlled by an external
> > process (e.g. phc2sys -rr).
> > 
> 
> 
> The way this is worded, I expected clockcheck_reset to only be called if the clock is controlled by an external process, but we seem to call this every time.

Detecting that would be difficult and unreliable. I'll reword the
message to me it more clear.

> >  	if (!cid_eq(&best_id, &c->best_id) || best != c->best) {
> >  		clock_freq_est_reset(c);
> > +		clockcheck_reset(c->sanity_check);

And here is a missing check for NULL in case it is disabled.

Thanks,

-- 
Miroslav Lichvar

[Linuxptp-devel] [PATCH v2 0/3] Fixes for sanity clock check

From: Miroslav L. <mli...@re...> - 2021-05-26 08:23:52

v2
- improved commit message
- added missing NULL check

These patches should make the clock check more reliable, e.g. when the
clock is controlled externally by another process, or when there are
multiple clocks in the jbod mode as was discussed recently on this list.

The first patch is not strictly related to the clock check, e.g. it also
fixes the path delay after switching the slave port.

Miroslav Lichvar (3):
  clock: Reset state when switching port with same best clock.
  clock: Reset clock check on best clock/port change.
  port: Don't check timestamps from non-slave ports.

 clock.c      | 4 +++-
 clockcheck.c | 9 ++++++++-
 clockcheck.h | 6 ++++++
 port.c       | 5 ++++-
 4 files changed, 21 insertions(+), 3 deletions(-)

-- 
2.26.3

[Linuxptp-devel] [PATCH v2 1/3] clock: Reset state when switching port with same best clock.

From: Miroslav L. <mli...@re...> - 2021-05-26 08:23:49

When the best port is changed, but the ID of the best clock doesn't
change (e.g. a passive port is activated on link failure), reset the
current delay and other master/link-specific state to avoid the switch
throwing the clock off.

Reviewed-by: Jacob Keller <jac...@in...>
Signed-off-by: Miroslav Lichvar <mli...@re...>
---
 clock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clock.c b/clock.c
index e545a9b..a073575 100644
--- a/clock.c
+++ b/clock.c
@@ -1947,7 +1947,7 @@ static void handle_state_decision_event(struct clock *c)
 			  cid2str(&best_id));
 	}
 
-	if (!cid_eq(&best_id, &c->best_id)) {
+	if (!cid_eq(&best_id, &c->best_id) || best != c->best) {
 		clock_freq_est_reset(c);
 		tsproc_reset(c->tsproc, 1);
 		if (!tmv_is_zero(c->initial_delay))
-- 
2.26.3

[Linuxptp-devel] [PATCH v2 3/3] port: Don't check timestamps from non-slave ports.

From: Miroslav L. <mli...@re...> - 2021-05-26 08:23:50

Don't perform the sanity check on receive timestamps from ports in
non-slave states to avoid false positives in the jbod mode, where
the timestamps can be generated by different clocks.

Reported-by: Amar Subramanyam <asu...@al...>
Signed-off-by: Miroslav Lichvar <mli...@re...>
---
 port.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/port.c b/port.c
index 10bb9e1..fb420fb 100644
--- a/port.c
+++ b/port.c
@@ -2744,7 +2744,10 @@ static enum fsm_event bc_event(struct port *p, int fd_index)
 	}
 	if (msg_sots_valid(msg)) {
 		ts_add(&msg->hwts.ts, -p->rx_timestamp_offset);
-		clock_check_ts(p->clock, tmv_to_nanoseconds(msg->hwts.ts));
+		if (p->state == PS_SLAVE) {
+			clock_check_ts(p->clock,
+				       tmv_to_nanoseconds(msg->hwts.ts));
+		}
 	}
 
 	switch (msg_type(msg)) {
-- 
2.26.3

Re: [Linuxptp-devel] [PATCH v2 3/3] port: Don't check timestamps from non-slave ports.

From: Keller, J. E <jac...@in...> - 2021-05-26 08:53:40


> -----Original Message-----
> From: Miroslav Lichvar <mli...@re...>
> Sent: Wednesday, May 26, 2021 1:24 AM
> To: lin...@li...
> Subject: [Linuxptp-devel] [PATCH v2 3/3] port: Don't check timestamps from non-
> slave ports.
> 
> Don't perform the sanity check on receive timestamps from ports in
> non-slave states to avoid false positives in the jbod mode, where
> the timestamps can be generated by different clocks.
> 

Makes sense.

Reviewed-by: Jacob Keller <jac...@in...>

> Reported-by: Amar Subramanyam <asu...@al...>
> Signed-off-by: Miroslav Lichvar <mli...@re...>
> ---
>  port.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/port.c b/port.c
> index 10bb9e1..fb420fb 100644
> --- a/port.c
> +++ b/port.c
> @@ -2744,7 +2744,10 @@ static enum fsm_event bc_event(struct port *p, int
> fd_index)
>  	}
>  	if (msg_sots_valid(msg)) {
>  		ts_add(&msg->hwts.ts, -p->rx_timestamp_offset);
> -		clock_check_ts(p->clock, tmv_to_nanoseconds(msg->hwts.ts));
> +		if (p->state == PS_SLAVE) {
> +			clock_check_ts(p->clock,
> +				       tmv_to_nanoseconds(msg->hwts.ts));
> +		}
>  	}
> 
>  	switch (msg_type(msg)) {
> --
> 2.26.3
> 
> 
> 
> _______________________________________________
> Linuxptp-devel mailing list
> Lin...@li...
> https://lists.sourceforge.net/lists/listinfo/linuxptp-devel

[Linuxptp-devel] [PATCH v2 2/3] clock: Reset clock check on best clock/port change.

From: Miroslav L. <mli...@re...> - 2021-05-26 08:23:54

Reset the clock check when the best clock or port changes, together with
the other state like current estimated delay and frequency. This avoids
false positives if the clock is controlled by an external process when
not synchronized by PTP (e.g. phc2sys -rr).

Signed-off-by: Miroslav Lichvar <mli...@re...>
---
 clock.c      | 2 ++
 clockcheck.c | 9 ++++++++-
 clockcheck.h | 6 ++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/clock.c b/clock.c
index a073575..2dd7ef9 100644
--- a/clock.c
+++ b/clock.c
@@ -1949,6 +1949,8 @@ static void handle_state_decision_event(struct clock *c)
 
 	if (!cid_eq(&best_id, &c->best_id) || best != c->best) {
 		clock_freq_est_reset(c);
+		if (c->sanity_check)
+			clockcheck_reset(c->sanity_check);
 		tsproc_reset(c->tsproc, 1);
 		if (!tmv_is_zero(c->initial_delay))
 			tsproc_set_delay(c->tsproc, c->initial_delay);
diff --git a/clockcheck.c b/clockcheck.c
index d48a578..d0b4714 100644
--- a/clockcheck.c
+++ b/clockcheck.c
@@ -47,9 +47,16 @@ struct clockcheck *clockcheck_create(int freq_limit)
 	if (!cc)
 		return NULL;
 	cc->freq_limit = freq_limit;
+	clockcheck_reset(cc);
+	return cc;
+}
+
+void clockcheck_reset(struct clockcheck *cc)
+{
+	cc->freq_known = 0;
 	cc->max_freq = -CHECK_MAX_FREQ;
 	cc->min_freq = CHECK_MAX_FREQ;
-	return cc;
+	cc->last_ts = 0;
 }
 
 int clockcheck_sample(struct clockcheck *cc, uint64_t ts)
diff --git a/clockcheck.h b/clockcheck.h
index 78aca48..1ff86eb 100644
--- a/clockcheck.h
+++ b/clockcheck.h
@@ -33,6 +33,12 @@ struct clockcheck;
  */
 struct clockcheck *clockcheck_create(int freq_limit);
 
+/**
+ * Reset a clock check.
+ * @param cc Pointer to a clock check obtained via @ref clockcheck_create().
+ */
+void clockcheck_reset(struct clockcheck *cc);
+
 /**
  * Perform the sanity check on a time stamp.
  * @param cc Pointer to a clock check obtained via @ref clockcheck_create().
-- 
2.26.3

Re: [Linuxptp-devel] [PATCH v2 2/3] clock: Reset clock check on best clock/port change.

From: Keller, J. E <jac...@in...> - 2021-05-26 08:53:20


> -----Original Message-----
> From: Miroslav Lichvar <mli...@re...>
> Sent: Wednesday, May 26, 2021 1:24 AM
> To: lin...@li...
> Subject: [Linuxptp-devel] [PATCH v2 2/3] clock: Reset clock check on best
> clock/port change.
> 
> Reset the clock check when the best clock or port changes, together with
> the other state like current estimated delay and frequency. This avoids
> false positives if the clock is controlled by an external process when
> not synchronized by PTP (e.g. phc2sys -rr).
> 

Thanks! The intent is more clear now!

Reviewed-by: Jacob Keller <jac...@in...>

> Signed-off-by: Miroslav Lichvar <mli...@re...>
> ---
>  clock.c      | 2 ++
>  clockcheck.c | 9 ++++++++-
>  clockcheck.h | 6 ++++++
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/clock.c b/clock.c
> index a073575..2dd7ef9 100644
> --- a/clock.c
> +++ b/clock.c
> @@ -1949,6 +1949,8 @@ static void handle_state_decision_event(struct clock
> *c)
> 
>  	if (!cid_eq(&best_id, &c->best_id) || best != c->best) {
>  		clock_freq_est_reset(c);
> +		if (c->sanity_check)
> +			clockcheck_reset(c->sanity_check);
>  		tsproc_reset(c->tsproc, 1);
>  		if (!tmv_is_zero(c->initial_delay))
>  			tsproc_set_delay(c->tsproc, c->initial_delay);
> diff --git a/clockcheck.c b/clockcheck.c
> index d48a578..d0b4714 100644
> --- a/clockcheck.c
> +++ b/clockcheck.c
> @@ -47,9 +47,16 @@ struct clockcheck *clockcheck_create(int freq_limit)
>  	if (!cc)
>  		return NULL;
>  	cc->freq_limit = freq_limit;
> +	clockcheck_reset(cc);
> +	return cc;
> +}
> +
> +void clockcheck_reset(struct clockcheck *cc)
> +{
> +	cc->freq_known = 0;
>  	cc->max_freq = -CHECK_MAX_FREQ;
>  	cc->min_freq = CHECK_MAX_FREQ;
> -	return cc;
> +	cc->last_ts = 0;
>  }
> 
>  int clockcheck_sample(struct clockcheck *cc, uint64_t ts)
> diff --git a/clockcheck.h b/clockcheck.h
> index 78aca48..1ff86eb 100644
> --- a/clockcheck.h
> +++ b/clockcheck.h
> @@ -33,6 +33,12 @@ struct clockcheck;
>   */
>  struct clockcheck *clockcheck_create(int freq_limit);
> 
> +/**
> + * Reset a clock check.
> + * @param cc Pointer to a clock check obtained via @ref clockcheck_create().
> + */
> +void clockcheck_reset(struct clockcheck *cc);
> +
>  /**
>   * Perform the sanity check on a time stamp.
>   * @param cc Pointer to a clock check obtained via @ref clockcheck_create().
> --
> 2.26.3
> 
> 
> 
> _______________________________________________
> Linuxptp-devel mailing list
> Lin...@li...
> https://lists.sourceforge.net/lists/listinfo/linuxptp-devel

Re: [Linuxptp-devel] [PATCH v2 3/3] port: Don't check timestamps from non-slave ports.

From: Amar S. <asu...@al...> - 2021-05-26 14:05:35

Hi Miroslav,

I am afraid, the patches that was intended to fix the reported issue doesn't do so. The issue that we reported was not due to timestamps from non-slave port being compared in jbod mode.  But due to mono_interval (CLOCK_MONOTIC interval b/w two successive calls to clock_check_sample()) becoming high due to the execution BMCA algorithm between two successive calls to clock_check_sample() function. We believe that this is due to execution time being higher, and solution is to increase the default value of sanity_freq_limit accordingly.
Please find the logs we collected with your patches applied.
Here we can see that mono_interval (132256969) increases when compared to the interval between hardware timestamp of successive sync packets (102726564).
Please note that port2(sriov1) is the current active port, and clock check is never done on port1.

Log:
ptp4l[13295.936]: selected best master clock 001122.fffe.33f801
ptp4l[13295.936]: updating UTC offset to 37
ptp4l[13295.936]: port :: port 2 (sriov1)
ptp4l[13295.936]: Mono Interval :: 132256969
ptp4l[13295.936]: Interval :: 102726564
ptp4l[13295.936]: clockcheck: clock jumped backward or running slower than expected!
ptp4l[13295.936]: port 2 (sriov1): SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT
ptp4l[13296.202]: selected best master clock 001122.fffe.33f801
ptp4l[13296.202]: updating UTC offset to 37
ptp4l[13296.319]: selected best master clock 001122.fffe.33f801
ptp4l[13296.319]: updating UTC offset to 37
ptp4l[13296.515]: selected best master clock 001122.fffe.33f801
ptp4l[13296.515]: updating UTC offset to 37
ptp4l[13296.725]: selected best master clock 001122.fffe.33f801
ptp4l[13296.725]: updating UTC offset to 37
ptp4l[13296.873]: port 2 (sriov1): UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[13296.963]: port :: port 2 (sriov1)
ptp4l[13296.964]: Mono Interval :: 1027848806
ptp4l[13296.964]: Interval :: 1057330889
ptp4l[13297.013]: selected best master clock 001122.fffe.33f801
ptp4l[13297.013]: updating UTC offset to 37
ptp4l[13297.059]: port :: port 2 (sriov1)
ptp4l[13297.062]: rms    8 max   14 freq  -2600 +/-   9 delay   729 +/-   0

Commands used : 
ptp4l -f ~/config_ptp_8275_1.conf -H -m  -s --boundary_clock_jbod=1 -i sriov0 -i sriov1
phc2sys -a -r -m -R 16 -n 24

Thanks,
Amar B S

> -----Original Message-----
> From: Miroslav Lichvar [mailto:mli...@re...]
> Sent: 26 May 2021 13:54
> To: lin...@li...
> Cc: Amar Subramanyam <asu...@al...>
> Subject: [PATCH v2 3/3] port: Don't check timestamps from non-slave ports.
> 
> CAUTION: This email originated from outside of Altiostar. Do not click on links
> or open attachments unless you recognize the sender and you are sure the
> content is safe. You will never be asked to reset your Altiostar password via
> email.
> 
> 
> Don't perform the sanity check on receive timestamps from ports in non-slave
> states to avoid false positives in the jbod mode, where the timestamps can be
> generated by different clocks.
> 
> Reported-by: Amar Subramanyam <asu...@al...>
> Signed-off-by: Miroslav Lichvar <mli...@re...>
> ---
>  port.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/port.c b/port.c
> index 10bb9e1..fb420fb 100644
> --- a/port.c
> +++ b/port.c
> @@ -2744,7 +2744,10 @@ static enum fsm_event bc_event(struct port *p, int
> fd_index)
>         }
>         if (msg_sots_valid(msg)) {
>                 ts_add(&msg->hwts.ts, -p->rx_timestamp_offset);
> -               clock_check_ts(p->clock, tmv_to_nanoseconds(msg->hwts.ts));
> +               if (p->state == PS_SLAVE) {
> +                       clock_check_ts(p->clock,
> +                                      tmv_to_nanoseconds(msg->hwts.ts));
> +               }
>         }
> 
>         switch (msg_type(msg)) {
> --
> 2.26.3

Re: [Linuxptp-devel] [PATCH v2 3/3] port: Don't check timestamps from non-slave ports.

From: Miroslav L. <mli...@re...> - 2021-05-26 14:32:37

On Wed, May 26, 2021 at 02:05:01PM +0000, Amar Subramanyam wrote:
> Hi Miroslav,
> 
> I am afraid, the patches that was intended to fix the reported issue doesn't do so. The issue that we reported was not due to timestamps from non-slave port being compared in jbod mode.  But due to mono_interval (CLOCK_MONOTIC interval b/w two successive calls to clock_check_sample()) becoming high due to the execution BMCA algorithm between two successive calls to clock_check_sample() function. We believe that this is due to execution time being higher, and solution is to increase the default value of sanity_freq_limit accordingly.

The execution of the BMCA took 30 milliseconds? In that case the fix
would be to make the minimum interval longer. Can you post a strace -tt
log containing the last second leading to the fault?

I suspect it's rather the PHC interval that's wrong. There might be an
unexpected interaction with phc2sys. Can you reproduce the issue with
no phc2sys running?

-- 
Miroslav Lichvar

Re: [Linuxptp-devel] [PATCH v2 3/3] port: Don't check timestamps from non-slave ports.

From: Miroslav L. <mli...@re...> - 2021-05-27 09:44:35

On Wed, May 26, 2021 at 05:45:29PM +0000, Amar Subramanyam wrote:
> Hi Miroslav,
> 
> We were able to reproduce the issue even without running phc2sys.
> Please find the attached strace and ptp4l logs when this issue is seen.

Ok, thanks. That's very helpful.

> From the ptp4l log, we can see that BMCA took 148 msec to run(Mono Interval :: 148445967).
> The same can be observed from strace logs. In the attached strace log, BMCA is executed between the timestamps 00:07:27.047830 (line number 53) to 00:07:27.196275(line number 102).

I don't see that in the log.

> 00:07:27.142942 close(4)                = 0
> 00:07:27.153733 close(15)               = 0
> 00:07:27.167783 socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP) = 4
> 00:07:27.167829 ioctl(4, SIOCGIFHWADDR, {ifr_name="sriov0", ifr_hwaddr={sa_family=ARPHRD_ETHER, sa_data=64:4c:36:12:55:e0}}) = 0
> 00:07:27.167878 close(4)                = 0
> 00:07:27.167964 socket(AF_PACKET, SOCK_RAW, 768) = 4
> 00:07:27.168023 ioctl(4, SIOCGIFINDEX, {ifr_name="sriov0", }) = 0
> 00:07:27.168095 bind(4, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_ALL), sll_ifindex=if_nametoindex("sriov0"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0
> 00:07:27.178781 setsockopt(4, SOL_SOCKET, SO_BINDTODEVICE, "sriov0", 6) = 0
> 00:07:27.178830 setsockopt(4, SOL_SOCKET, SO_ATTACH_FILTER, {len=12, filter=0x635ae0}, 16) = 0
> 00:07:27.178875 setsockopt(4, SOL_PACKET, PACKET_ADD_MEMBERSHIP, {mr_ifindex=if_nametoindex("sriov0"), mr_type=PACKET_MR_MULTICAST, mr_alen=6, mr_address=011b19000000}, 16) = 0
> 00:07:27.179151 setsockopt(4, SOL_PACKET, PACKET_ADD_MEMBERSHIP, {mr_ifindex=if_nametoindex("sriov0"), mr_type=PACKET_MR_MULTICAST, mr_alen=6, mr_address=0180c200000e}, 16) = 0
> 00:07:27.179415 socket(AF_PACKET, SOCK_RAW, 768) = 15
> 00:07:27.179482 ioctl(15, SIOCGIFINDEX, {ifr_name="sriov0", }) = 0
> 00:07:27.179518 bind(15, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_ALL), sll_ifindex=if_nametoindex("sriov0"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0
> 00:07:27.193807 setsockopt(15, SOL_SOCKET, SO_BINDTODEVICE, "sriov0", 6) = 0

What I see is that it's the closing and binding of the raw sockets
that is slowing down ptp4l so much that a received message waits for
up to ~40 milliseconds before the clock check gets its timestamp.

ptp4l is renewing the transport on announce/sync timeout, which
according to the git log is needed in the client-only mode to not get
multicast sockets stuck when the link goes down. I think the fix here
is to avoid renewing the raw transport. It shouldn't be necessary if I
understand it correctly.

Can you please verify that the following change fixes the problem for
you?

--- a/port.c
+++ b/port.c
@@ -1806,6 +1806,12 @@ static int port_renew_transport(struct port *p)
        if (!port_is_enabled(p)) {
                return 0;
        }
+
+       /* Closing and binding of raw sockets is too slow and unnecessary */
+       if (transport_type(p->trp) == TRANS_IEEE_802_3) {
+               return 0;
+       }
+
        transport_close(p->trp, &p->fda);
        port_clear_fda(p, FD_FIRST_TIMER);
        res = transport_open(p->trp, p->iface, &p->fda, p->timestamping);

-- 
Miroslav Lichvar

Re: [Linuxptp-devel] [PATCH v2 3/3] port: Don't check timestamps from non-slave ports.

From: Amar S. <asu...@al...> - 2021-05-28 09:26:36

Hi Miroslav,

Please find our response inline.

Thanks,
Amar B S

> -----Original Message-----
> From: Miroslav Lichvar [mailto:mli...@re...]
> Sent: 27 May 2021 15:14
> To: Amar Subramanyam <asu...@al...>
> Cc: lin...@li...
> Subject: Re: [PATCH v2 3/3] port: Don't check timestamps from non-slave ports.
> 
> CAUTION: This email originated from outside of Altiostar. Do not click on links
> or open attachments unless you recognize the sender and you are sure the
> content is safe. You will never be asked to reset your Altiostar password via
> email.
> 
> 
> On Wed, May 26, 2021 at 05:45:29PM +0000, Amar Subramanyam wrote:
> > Hi Miroslav,
> >
> > We were able to reproduce the issue even without running phc2sys.
> > Please find the attached strace and ptp4l logs when this issue is seen.
> 
> Ok, thanks. That's very helpful.
> 
> > From the ptp4l log, we can see that BMCA took 148 msec to run(Mono Interval
> :: 148445967).
> > The same can be observed from strace logs. In the attached strace log, BMCA
> is executed between the timestamps 00:07:27.047830 (line number 53) to
> 00:07:27.196275(line number 102).
> 
> I don't see that in the log.
> 
> > 00:07:27.142942 close(4)                = 0
> > 00:07:27.153733 close(15)               = 0
> > 00:07:27.167783 socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP) = 4
> > 00:07:27.167829 ioctl(4, SIOCGIFHWADDR, {ifr_name="sriov0",
> ifr_hwaddr={sa_family=ARPHRD_ETHER, sa_data=64:4c:36:12:55:e0}}) = 0
> > 00:07:27.167878 close(4)                = 0
> > 00:07:27.167964 socket(AF_PACKET, SOCK_RAW, 768) = 4
> > 00:07:27.168023 ioctl(4, SIOCGIFINDEX, {ifr_name="sriov0", }) = 0
> > 00:07:27.168095 bind(4, {sa_family=AF_PACKET,
> > sll_protocol=htons(ETH_P_ALL), sll_ifindex=if_nametoindex("sriov0"),
> > sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) =
> > 0
> > 00:07:27.178781 setsockopt(4, SOL_SOCKET, SO_BINDTODEVICE, "sriov0",
> > 6) = 0
> > 00:07:27.178830 setsockopt(4, SOL_SOCKET, SO_ATTACH_FILTER, {len=12,
> > filter=0x635ae0}, 16) = 0
> > 00:07:27.178875 setsockopt(4, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
> > {mr_ifindex=if_nametoindex("sriov0"), mr_type=PACKET_MR_MULTICAST,
> > mr_alen=6, mr_address=011b19000000}, 16) = 0
> > 00:07:27.179151 setsockopt(4, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
> > {mr_ifindex=if_nametoindex("sriov0"), mr_type=PACKET_MR_MULTICAST,
> > mr_alen=6, mr_address=0180c200000e}, 16) = 0
> > 00:07:27.179415 socket(AF_PACKET, SOCK_RAW, 768) = 15
> > 00:07:27.179482 ioctl(15, SIOCGIFINDEX, {ifr_name="sriov0", }) = 0
> > 00:07:27.179518 bind(15, {sa_family=AF_PACKET,
> > sll_protocol=htons(ETH_P_ALL), sll_ifindex=if_nametoindex("sriov0"),
> > sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) =
> > 0
> > 00:07:27.193807 setsockopt(15, SOL_SOCKET, SO_BINDTODEVICE, "sriov0",
> > 6) = 0
> 
> What I see is that it's the closing and binding of the raw sockets that is slowing
> down ptp4l so much that a received message waits for up to ~40 milliseconds
> before the clock check gets its timestamp.
> 
> ptp4l is renewing the transport on announce/sync timeout, which according to
> the git log is needed in the client-only mode to not get multicast sockets stuck
> when the link goes down. I think the fix here is to avoid renewing the raw
> transport. It shouldn't be necessary if I understand it correctly.
> 
> Can you please verify that the following change fixes the problem for you?
> 
> --- a/port.c
> +++ b/port.c
> @@ -1806,6 +1806,12 @@ static int port_renew_transport(struct port *p)
>         if (!port_is_enabled(p)) {
>                 return 0;
>         }
> +
> +       /* Closing and binding of raw sockets is too slow and unnecessary */
> +       if (transport_type(p->trp) == TRANS_IEEE_802_3) {
> +               return 0;
> +       }
> +
>         transport_close(p->trp, &p->fda);
>         port_clear_fda(p, FD_FIRST_TIMER);
>         res = transport_open(p->trp, p->iface, &p->fda, p->timestamping);
> 

Yes, With this patch applied, the problem seems to be fixed. We even did a cable removal test and link status was correctly detected.
We could see that ptp4l already has RTNL functionality and hence this port_renew_transport() seems to be unnecessary redundancy.
Should bypassing the function entirely not just for 802_3 but also for UDPv4 and UDPv6 transports be the right solution?
Let us know your comments.

> --
> Miroslav Lichvar

Re: [Linuxptp-devel] [PATCH v2 3/3] port: Don't check timestamps from non-slave ports.

From: Richard C. <ric...@gm...> - 2021-05-28 16:51:05

On Fri, May 28, 2021 at 09:26:07AM +0000, Amar Subramanyam via Linuxptp-devel wrote:
> We could see that ptp4l already has RTNL functionality and hence this port_renew_transport() seems to be unnecessary redundancy.

The link notifications from the kernel are best effort and are not
guaranteed.  So renewing the transport is the safe thing to do.

Thanks,
Richard

[Linuxptp-devel] [PATCH v3 0/5] Fixes for sanity clock check

From: Miroslav L. <mli...@re...> - 2021-05-31 09:08:17

v3
- added patch to avoid slow renewal of raw sockets
- added patch to increase the minimum check interval

v2
- improved commit message
- added missing NULL check

These patches make the clock check more reliable in several different
cases.

The first patch is not strictly related to the clock check, e.g. it also
fixes the path delay after switching the slave port.

Miroslav Lichvar (5):
  clock: Reset state when switching port with same best clock.
  clock: Reset clock check on best clock/port change.
  port: Don't check timestamps from non-slave ports.
  port: Don't renew raw transport.
  clockcheck: Increase minimum interval.

 clock.c      |  4 +++-
 clockcheck.c | 11 +++++++++--
 clockcheck.h |  6 ++++++
 port.c       | 11 ++++++++++-
 4 files changed, 28 insertions(+), 4 deletions(-)

-- 
2.26.3

[Linuxptp-devel] [PATCH v3 4/5] port: Don't renew raw transport.

From: Miroslav L. <mli...@re...> - 2021-05-31 09:08:17

Renewing of the transport on announce/sync timeout is needed in the
client-only mode to avoid getting stuck with a broken multicast socket
when the link goes down.

This shouldn't be necessary with the raw transport. Closing and binding
of raw sockets can apparently be so slow that it triggers a false
positive in the clock check.

Reported-by: Amar Subramanyam <asu...@al...>
Signed-off-by: Miroslav Lichvar <mli...@re...>
---
 port.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/port.c b/port.c
index fb420fb..6bf0684 100644
--- a/port.c
+++ b/port.c
@@ -1806,6 +1806,12 @@ static int port_renew_transport(struct port *p)
 	if (!port_is_enabled(p)) {
 		return 0;
 	}
+
+	/* Closing and binding of raw sockets is too slow and unnecessary */
+	if (transport_type(p->trp) == TRANS_IEEE_802_3) {
+		return 0;
+	}
+
 	transport_close(p->trp, &p->fda);
 	port_clear_fda(p, FD_FIRST_TIMER);
 	res = transport_open(p->trp, p->iface, &p->fda, p->timestamping);
-- 
2.26.3

Re: [Linuxptp-devel] [PATCH v3 4/5] port: Don't renew raw transport.

From: Keller, J. E <jac...@in...> - 2021-06-02 01:07:45


> -----Original Message-----
> From: Miroslav Lichvar <mli...@re...>
> Sent: Monday, May 31, 2021 2:08 AM
> To: lin...@li...
> Subject: [Linuxptp-devel] [PATCH v3 4/5] port: Don't renew raw transport.
> 
> Renewing of the transport on announce/sync timeout is needed in the
> client-only mode to avoid getting stuck with a broken multicast socket
> when the link goes down.
> 
> This shouldn't be necessary with the raw transport. Closing and binding
> of raw sockets can apparently be so slow that it triggers a false
> positive in the clock check.
> 
> Reported-by: Amar Subramanyam <asu...@al...>
> Signed-off-by: Miroslav Lichvar <mli...@re...>
> ---

Makes sense.

Reviewed-by: Jacob Keller <jac...@in...>

>  port.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/port.c b/port.c
> index fb420fb..6bf0684 100644
> --- a/port.c
> +++ b/port.c
> @@ -1806,6 +1806,12 @@ static int port_renew_transport(struct port *p)
>  	if (!port_is_enabled(p)) {
>  		return 0;
>  	}
> +
> +	/* Closing and binding of raw sockets is too slow and unnecessary */
> +	if (transport_type(p->trp) == TRANS_IEEE_802_3) {
> +		return 0;
> +	}
> +
>  	transport_close(p->trp, &p->fda);
>  	port_clear_fda(p, FD_FIRST_TIMER);
>  	res = transport_open(p->trp, p->iface, &p->fda, p->timestamping);
> --
> 2.26.3
> 
> 
> 
> _______________________________________________
> Linuxptp-devel mailing list
> Lin...@li...
> https://lists.sourceforge.net/lists/listinfo/linuxptp-devel

[Linuxptp-devel] [PATCH v3 3/5] port: Don't check timestamps from non-slave ports.

From: Miroslav L. <mli...@re...> - 2021-05-31 09:08:15

Don't perform the sanity check on receive timestamps from ports in
non-slave states to avoid false positives in the jbod mode, where
the timestamps can be generated by different clocks.

Reviewed-by: Jacob Keller <jac...@in...>
Signed-off-by: Miroslav Lichvar <mli...@re...>
---
 port.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/port.c b/port.c
index 10bb9e1..fb420fb 100644
--- a/port.c
+++ b/port.c
@@ -2744,7 +2744,10 @@ static enum fsm_event bc_event(struct port *p, int fd_index)
 	}
 	if (msg_sots_valid(msg)) {
 		ts_add(&msg->hwts.ts, -p->rx_timestamp_offset);
-		clock_check_ts(p->clock, tmv_to_nanoseconds(msg->hwts.ts));
+		if (p->state == PS_SLAVE) {
+			clock_check_ts(p->clock,
+				       tmv_to_nanoseconds(msg->hwts.ts));
+		}
 	}
 
 	switch (msg_type(msg)) {
-- 
2.26.3

[Linuxptp-devel] [PATCH v3 2/5] clock: Reset clock check on best clock/port change.

From: Miroslav L. <mli...@re...> - 2021-05-31 09:08:15

Reset the clock check when the best clock or port changes, together with
the other state like current estimated delay and frequency. This avoids
false positives if the clock is controlled by an external process when
not synchronized by PTP (e.g. phc2sys -rr).

Reviewed-by: Jacob Keller <jac...@in...>
Signed-off-by: Miroslav Lichvar <mli...@re...>
---
 clock.c      | 2 ++
 clockcheck.c | 9 ++++++++-
 clockcheck.h | 6 ++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/clock.c b/clock.c
index a073575..2dd7ef9 100644
--- a/clock.c
+++ b/clock.c
@@ -1949,6 +1949,8 @@ static void handle_state_decision_event(struct clock *c)
 
 	if (!cid_eq(&best_id, &c->best_id) || best != c->best) {
 		clock_freq_est_reset(c);
+		if (c->sanity_check)
+			clockcheck_reset(c->sanity_check);
 		tsproc_reset(c->tsproc, 1);
 		if (!tmv_is_zero(c->initial_delay))
 			tsproc_set_delay(c->tsproc, c->initial_delay);
diff --git a/clockcheck.c b/clockcheck.c
index d48a578..d0b4714 100644
--- a/clockcheck.c
+++ b/clockcheck.c
@@ -47,9 +47,16 @@ struct clockcheck *clockcheck_create(int freq_limit)
 	if (!cc)
 		return NULL;
 	cc->freq_limit = freq_limit;
+	clockcheck_reset(cc);
+	return cc;
+}
+
+void clockcheck_reset(struct clockcheck *cc)
+{
+	cc->freq_known = 0;
 	cc->max_freq = -CHECK_MAX_FREQ;
 	cc->min_freq = CHECK_MAX_FREQ;
-	return cc;
+	cc->last_ts = 0;
 }
 
 int clockcheck_sample(struct clockcheck *cc, uint64_t ts)
diff --git a/clockcheck.h b/clockcheck.h
index 78aca48..1ff86eb 100644
--- a/clockcheck.h
+++ b/clockcheck.h
@@ -33,6 +33,12 @@ struct clockcheck;
  */
 struct clockcheck *clockcheck_create(int freq_limit);
 
+/**
+ * Reset a clock check.
+ * @param cc Pointer to a clock check obtained via @ref clockcheck_create().
+ */
+void clockcheck_reset(struct clockcheck *cc);
+
 /**
  * Perform the sanity check on a time stamp.
  * @param cc Pointer to a clock check obtained via @ref clockcheck_create().
-- 
2.26.3

[Linuxptp-devel] [PATCH v3 1/5] clock: Reset state when switching port with same best clock.

From: Miroslav L. <mli...@re...> - 2021-05-31 09:08:17

When the best port is changed, but the ID of the best clock doesn't
change (e.g. a passive port is activated on link failure), reset the
current delay and other master/link-specific state to avoid the switch
throwing the clock off.

Reviewed-by: Jacob Keller <jac...@in...>
Signed-off-by: Miroslav Lichvar <mli...@re...>
---
 clock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clock.c b/clock.c
index e545a9b..a073575 100644
--- a/clock.c
+++ b/clock.c
@@ -1947,7 +1947,7 @@ static void handle_state_decision_event(struct clock *c)
 			  cid2str(&best_id));
 	}
 
-	if (!cid_eq(&best_id, &c->best_id)) {
+	if (!cid_eq(&best_id, &c->best_id) || best != c->best) {
 		clock_freq_est_reset(c);
 		tsproc_reset(c->tsproc, 1);
 		if (!tmv_is_zero(c->initial_delay))
-- 
2.26.3

[Linuxptp-devel] [PATCH v3 5/5] clockcheck: Increase minimum interval.

From: Miroslav L. <mli...@re...> - 2021-05-31 09:08:23

Increase the minimum check interval to 1 second to measure the frequency
offset more accurately and with default configuration make false
positives less likely due to a heavily overloaded system.

Signed-off-by: Miroslav Lichvar <mli...@re...>
---
 clockcheck.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clockcheck.c b/clockcheck.c
index d0b4714..f0141be 100644
--- a/clockcheck.c
+++ b/clockcheck.c
@@ -23,7 +23,7 @@
 #include "clockcheck.h"
 #include "print.h"
 
-#define CHECK_MIN_INTERVAL 100000000
+#define CHECK_MIN_INTERVAL 1000000000
 #define CHECK_MAX_FREQ 900000000
 
 struct clockcheck {
-- 
2.26.3

1 2 > >> (Page 1 of 2)