[Linuxptp-devel] phc2sys, timemaster: stale/bad receive time /w ptp4l using 2 ethernet interfaces

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Hello linuxptp community,

thank you for your work.

We are experiencing a problem using the timemaster application with PTP
synchronizing to two ethernet ports. In detail I suspect the problem to
be connected to the phc2sys autoconfiguration mode.

We are not sure if this is caused by
- wrong command line parameters / config files on our side
- an inherent disablity of the respective applications for this use case
- a bug
The levels of likeliness are likely to decrease from top to bottom.

#########################
### setup description ###
#########################

Our setup is as follows:

The hardware is a custom board with a NXP P2020 PowerPC CPU. The CPU
provides 3 MACs and one PTP Clock. The CPUs MACs provide hardware
timestamping functionality.
The device shall act as a PTP slave only.
For redundacy reasons, one PTP Grandmaster Clock Master Clock is
attached to our device's eth1, another PTP Grandmaster Clock Master
Clock is attached to our device's eth2.
The ntp server (a regular server pc) is currently connected to eth0.

The timemaster application basically spawns the applications ptp4l,
phc2sys and ntpd (instead of cronyd in our case).
The effect seen is the same, whether ptp4l, phc2sys and ntpd are called
manually or via timemaster.
To make this issue more understandable, I will only display the console
dumps for the manual calling of the applications here. The issue is not
attributed to the timemaster application itself.

##########################################
### problem description / console logs ###
##########################################

We are calling the respective applications in the following order (in
different ssh sessions):

1. ptp4l:
/usr/bin/ptp4l -l 7 -f /var/run/timemaster/ptp4l.0.conf.bak -H -i eth1
-i eth2 -m

Config file:
# cat /var/run/timemaster/ptp4l.0.conf.bak
[global]
slaveOnly 1
domainNumber 0
uds_address /var/run/timemaster/ptp4l.0.socket
message_tag [0:eth1+eth2]

It is very rudimentary and a copy of the config file created by the
timemaster when we first ran into this problem.

ptp4l correctly syncs /dev/ptp0 to one of the grandmaster clocks:

root@cp61850:/var/log# pmc -u -b 0 'GET PORT_DATA_SET'
sending: GET PORT_DATA_SET
        00d093.fffe.274a5e-1 seq 0 RESPONSE MANAGEMENT PORT_DATA_SET
                portIdentity            00d093.fffe.274a5e-1
                portState               SLAVE
                logMinDelayReqInterval  0
                peerMeanPathDelay       0
                logAnnounceInterval     1
                announceReceiptTimeout  3
                logSyncInterval         0
                delayMechanism          1
                logMinPdelayReqInterval 0
                versionNumber           2
        00d093.fffe.274a5e-2 seq 0 RESPONSE MANAGEMENT PORT_DATA_SET
                portIdentity            00d093.fffe.274a5e-2
                portState               LISTENING
                logMinDelayReqInterval  0
                peerMeanPathDelay       0
                logAnnounceInterval     1
                announceReceiptTimeout  3
                logSyncInterval         0
                delayMechanism          1
                logMinPdelayReqInterval 0
                versionNumber           2

Debug prints of ptp4l are attached.

2. phc2sys:
/usr/bin/phc2sys -l 7 -a -r -R 1.00 -m -z
/var/run/timemaster/ptp4l.0.socket -t [0:eth1+eth2] -n 0 -E ntpshm -M 0

resulting in following debug output:
phc2sys[28965.849]: [0:eth1+eth2] config item (null).uds_address is
'/var/run/timemaster/ptp4l.0.socket'
phc2sys[28965.850]: [0:eth1+eth2] config item (null).ntpshm_segment is 0
phc2sys[28965.850]: [0:eth1+eth2] config item (null).step_threshold is
0.000000
phc2sys[28965.850]: [0:eth1+eth2] config item
(null).first_step_threshold is 0.000020
phc2sys[28965.850]: [0:eth1+eth2] config item (null).max_frequency is
900000000
phc2sys[28965.851]: [0:eth1+eth2] config item (null).ntpshm_segment is 0
phc2sys[28965.851]: [0:eth1+eth2] config item (null).step_threshold is
0.000000
phc2sys[28965.851]: [0:eth1+eth2] config item
(null).first_step_threshold is 0.000020
phc2sys[28965.851]: [0:eth1+eth2] config item (null).max_frequency is
900000000
phc2sys[28965.851]: [0:eth1+eth2] config item (null).ntpshm_segment is 0
phc2sys[28965.851]: [0:eth1+eth2] config item (null).step_threshold is
0.000000
phc2sys[28965.851]: [0:eth1+eth2] config item
(null).first_step_threshold is 0.000020
phc2sys[28965.851]: [0:eth1+eth2] config item (null).max_frequency is
900000000
phc2sys[28966.852]: [0:eth1+eth2] reconfiguring after port state change
phc2sys[28966.852]: [0:eth1+eth2] selecting eth2 for synchronization
phc2sys[28966.852]: [0:eth1+eth2] selecting CLOCK_REALTIME for
synchronization
phc2sys[28966.853]: [0:eth1+eth2] selecting eth1 as the master clock
phc2sys[28966.853]: [0:eth1+eth2] phc offset  -6283178 s0 freq      +0
delay    680
phc2sys[28966.853]: [0:eth1+eth2] phc offset         5 s0 freq      +0
delay   2130
phc2sys[28967.853]: [0:eth1+eth2] phc offset  -6285375 s0 freq      +0
delay    680
phc2sys[28967.853]: [0:eth1+eth2] phc offset         5 s0 freq      +0
delay   2170
phc2sys[28968.853]: [0:eth1+eth2] phc offset  -6287640 s0 freq      +0
delay    680
phc2sys[28968.854]: [0:eth1+eth2] phc offset         5 s0 freq      +0
delay   2130
...

3. ntpd
/usr/sbin/ntpd -u ntp:ntp -g -n -c /var/run/timemaster/ntp.conf.bak

Config file:
# cat /var/run/timemaster/ntp.conf.bak
includefile /etc/ntp.conf
server 192.168.11.100 minpoll 6 maxpoll 10
server 127.127.28.0 minpoll 2 maxpoll 2 mode 1
fudge 127.127.28.0 refid PTP0

It is very rudimentary and a copy of the config file created by the
timemaster when we first ran into this problem.

resulting in following debug output:
20 Mar 15:48:15 ntpd[12373]: ntpd 4.2.8p8@1.3265 Wed Jan 25 14:36:50 UTC
2017 (2): Starting
20 Mar 15:48:15 ntpd[12373]: Command line: /usr/sbin/ntpd -u ntp:ntp -g
-n -c /var/run/timemaster/ntp.conf.bak
20 Mar 15:48:15 ntpd[12373]: proto: precision = 0.780 usec (-20)
20 Mar 15:48:15 ntpd[12373]: minpoll: provided value (2) is out of range
[3-255])
20 Mar 15:48:15 ntpd[12373]: Listen and drop on 0 v6wildcard [::]:123
20 Mar 15:48:15 ntpd[12373]: Listen and drop on 1 v4wildcard 0.0.0.0:123
20 Mar 15:48:15 ntpd[12373]: Listen normally on 2 lo 127.0.0.1:123
20 Mar 15:48:15 ntpd[12373]: Listen normally on 3 eth0 192.168.11.247:123
20 Mar 15:48:15 ntpd[12373]: Listen normally on 4 eth1 192.168.1.236:123
20 Mar 15:48:15 ntpd[12373]: Listen normally on 5 eth2 192.168.91.220:123
20 Mar 15:48:15 ntpd[12373]: Listen normally on 6 lo [::1]:123
20 Mar 15:48:15 ntpd[12373]: Listen normally on 7 eth0
[fe80::2d0:93ff:fe27:4a5d%3]:123
20 Mar 15:48:15 ntpd[12373]: Listen normally on 8 eth1
[fe80::2d0:93ff:fe27:4a5e%4]:123
20 Mar 15:48:15 ntpd[12373]: Listen normally on 9 eth2
[fe80::2d0:93ff:fe27:4a5f%5]:123
20 Mar 15:48:15 ntpd[12373]: Listening on routing socket on fd #26 for
interface updates
20 Mar 15:48:16 ntpd[12373]: SHM: stale/bad receive time, delay=-37s
20 Mar 15:48:17 ntpd[12373]: SHM: stale/bad receive time, delay=-37s
20 Mar 15:48:18 ntpd[12373]: SHM: stale/bad receive time, delay=-37s
(continuing..)

Obviously there is a problem with the leap seconds.
In my understanding, phc2sys should "translate" the PHC's TAI into UTC
and provide this to the ntpd via the SHM.
"In hardware time stamping mode, ptp4l announces use of PTP time scale
and PHC is used for the stamps.  That means PHC must follow PTP time
scale while system clock follows UTC.  Time offset between these two is
maintained by phc2sys."
[phy2sys manpage]

The ntpd only synchronises to the ntp server 192.168.11.100, not the PTP
clock:

root@cp61850:~# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset
jitter
==============================================================================
*192.168.11.100  131.188.3.221    2 u    5   64    1    0.384    1.938
0.001
 SHM(0)          .PTP0.           0 l    -    8    0    0.000    0.000
0.000

However the grand master clock and ntp server are set correctly to the
correct UTC resp. TAI times.

The above behaviour is the same if eth1 or eth2 are unplugged before or
after the calling of the applications.

For investigation, further tests were performed:

#########################################################
### comparative test 1: ptp4l using only eth1 or eth2 ###
#########################################################

For a test, the basic setup (### problem description ###) is repeated
with ptp4l using only one ethernet interface:

/usr/bin/ptp4l -l 7 -f /var/run/timemaster/ptp4l.0.conf.bak -H -i eth1 -m

Order and syntax of the other applications' calling are the same as above.

The ntpd error message
SHM: stale/bad receive time, delay=-37s
does not appear.

ntpd correctly syncs to the PTP Clock:

root@cp61850:~# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset
jitter
==============================================================================
+192.168.11.100  131.188.3.222    2 u    4   64    1    0.185    1.010
0.001
*SHM(0)          .PTP0.           0 l    3    8    1    0.000    6.986
0.001

The behaviour is the same if eth2 instead of eth1 is used by ptp4l.

##############################################################
### comparative test 2: call phc2sys with -w instead of -a ###
##############################################################

For a test, the basic setup (### problem description ###) is repeated
with phc2sys flags -w instead of -a (autoconfiguration disabled).

# /usr/bin/phc2sys -l 7 -w -r -R 1.00 -m -z
/var/run/timemaster/ptp4l.0.socket -t [0:eth1+eth2] -n 0 -E ntpshm -M 0
-s /dev/ptp0

Order and syntax of the other applications' calling are the same as above.

The ntpd error message
SHM: stale/bad receive time, delay=-37s
does not appear.

ntpd correctly syncs to the PTP Clock:

root@cp61850:~# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset
jitter
==============================================================================
+192.168.11.100  131.188.3.222    2 u    4   64    1    0.185    1.010
0.001
*SHM(0)          .PTP0.           0 l    3    8    1    0.000    6.986
0.001

##################
### discussion ###
##################

It seems there is a problem with the autoconfiguration featue of phc2sys
in combination with ptp4l working with two ethernet interfaces.

Can anyone confirm this behaviour?

Did we make mistakes with the application flags and/or config files,
respectively is there a solution by changing these?

Best regards,
Stefan

-- 

Stefan Lange
Entwicklung

___________________________________________________________
GateWare Communications GmbH
!ACHTUNG, NEUE ADRESSE!
Friggastr. 2, D-90461 Nürnberg
Germany

Tel: +49 (0) 911 / 946016-190
Fax: +49 (0) 911 / 946016-80

Eingetragene Rechtsform : GmbH
Sitz und Registergericht: Amtsgericht Nürnberg HRB 18334
Geschäftsführer         : Wolfgang Haubner

Web: http://www.gateware.de
E-Mail: s....@ga...
___________________________________________________________

Diese E-mail kann vertrauliche Informationen enthalten. Falls Sie diese
E-Mail irrtümlich erhalten haben, informieren Sie bitte unverzüglich den
Absender und löschen Sie diese E-Mail von jedem Rechner, auch von den
Mailservern. Jede Verbreitung des Inhalts, auch die teilweise Verbreitung,
ist in diesem Fall untersagt. Außer bei Vorsatz oder grober Fahrlässigkeit
schließen wir jegliche Haftung für Verluste oder Schäden aus, die durch
Viren befallene Software oder E-Mails verursacht werden.

This e-mail may contain confidential information. If you received this
e-mail in error, please contact the sender and delete this e-mail from your
computer, including your mailservers. Any dissemination, even partly, is
prohibited. Except in case of gross negligence or wilful misconduct we
accept no liability for any loss or damage caused by software or e-mail
viruses.

[Linuxptp-devel] phc2sys, timemaster: stale/bad receive time /w ptp4l using 2 ethernet interfaces

PTP IEEE 1588 stack for Linux

[Linuxptp-devel] phc2sys, timemaster: stale/bad receive time /w ptp4l using 2 ethernet interfaces