Thread: [Linuxptp-users] How to use PTP for accurate relative time accuracy (when doing local timestamping)
From: C. D. <cd...@ou...> - 2019-03-03 02:10:48
Hello,

I have several servers doing measurements. Each server logs its timestamped measurements independently. About every few seconds, all the measurements are grouped together in a database. Within this cluster of servers, it is very important to accurately reconcile the timestamped measurements to have a consistent time series.

Until now, a few milliseconds were acceptable so I was using NTP. Now, the sampling frequency has increased, so I need to be much more accurate. I am trying to do that with PTP.

This seems to be a use case very similar to https://sourceforge.net/p/linuxptp/mailman/message/35665802/ : "Relative (device to device) time accuracy is important but absolute world-time only needs to match roughly (couple of milliseconds)".

To be honest, absolute world time accuracy is a very distant concern for now: it is preferable to improve the relative accuracy by several microseconds, even if the trade-off for doing that would degrade the absolute time accuracy by several milliseconds (!!). I haven't found many people with the same concerns.

The servers are in a datacenter where I can't add anything like a GPS grandmaster, but I otherwise have full control of the bare-metal servers: to help with other issues related to database latency jitter, I now have each server directly connected to every other in pairs, using simple crossover RJ45 cables between extra NICs. All the NICs are Intel e1000e.

For example, for a cluster of 3 servers:

    Serv1->switch->internet : eth0
    Serv1->Serv2 : eth2
    Serv1->Serv3 : eth3

    Serv2->switch->internet : eth0
    Serv2->Serv1 : eth1
    Serv2->Serv3 : eth3

    Serv3->switch->internet : eth0
    Serv3->Serv1 : eth1
    Serv3->Serv2 : eth2

Each link is configured with IPv4 and IPv6, and everything works (ping, ping6, arping...).
Using arping, to give an idea of the jitter on the LAN, sending 10 packets:

- from server1 to server2: rtt min/avg/max/std-dev = 0.111/0.135/0.181/0.018 ms
- from server2 to server3 (link with the lowest load): rtt min/avg/max/std-dev = 0.081/0.102/0.126/0.018 ms
- from server1 to server3 (link with the highest load): rtt min/avg/max/std-dev = 0.122/0.311/0.846/0.235 ms

I would now like to take advantage of these extra NICs to increase the relative time accuracy by as much as I can, but I am an absolute beginner with PTP.

I have read everything I could, then I started to use linuxptp timemaster with the NTP servers from the datacenter and my NICs simply configured for ptp4l with a 150 us tolerance given the rtt results, i.e. on server1:

    [ptp_domain 1]
    interfaces eth3
    ptp4l_option clock_servo linreg
    # 150 microseconds
    delay 150e-6

(likewise for eth2, and eth0 even if there's a switch which may add delays)

Then 'systemctl status timemaster' shows everything running normally:

    CGroup: /system.slice/timemaster.service
            ├─53151 /usr/sbin/timemaster -f /etc/linuxptp/timemaster.conf
            ├─53153 /usr/sbin/chronyd -n -f /var/run/timemaster/chrony.conf
            ├─53156 /usr/sbin/ptp4l -l 5 -f /var/run/timemaster/ptp4l.0.conf -H -i eth0
            ├─53157 /usr/sbin/phc2sys -E linreg -a -r -R 1.00 -z /var/run/timemaster/ptp4l.0.socket -t [0:eth0] -n 0 -E ntpshm -M 0
            ├─53158 /usr/sbin/ptp4l -l 5 -f /var/run/timemaster/ptp4l.1.conf -H -i eth2
            ├─53160 /usr/sbin/phc2sys -E linreg -a -r -R 1.00 -z /var/run/timemaster/ptp4l.1.socket -t [1:eth2] -n 1 -E ntpshm -M 1
            ├─53161 /usr/sbin/ptp4l -l 5 -f /var/run/timemaster/ptp4l.2.conf -H -i eth3
            └─53163 /usr/sbin/phc2sys -E linreg -a -r -R 1.00 -z /var/run/timemaster/ptp4l.2.socket -t [1:eth3] -n 1 -E ntpshm -M 2

However, PTP doesn't work unless I manually start ptp4l on the other servers, using specifically created systemd scripts to manage the separate interfaces and master/slave options.
'chronyc sources' shows:

    210 Number of sources = 6
    MS Name/IP address         Stratum Poll Reach LastRx Last sample
    ===============================================================================
    #? PTP0                          0   2     0     -     +0ns[   +0ns] +/-    0ns
    #? PTP1                          0   2     0     -     +0ns[   +0ns] +/-    0ns
    #? PTP2                          0   2     0     -     +0ns[   +0ns] +/-    0ns
    (...)

Based on my understanding of timemaster, this is because it hardcodes "slaveOnly 1" in the files it creates in /var/run/timemaster/ptp4l.?.conf, and adding SlaveOnly 0 to the [ptp4l.conf] section of /etc/linuxptp/timemaster.conf doesn't help (both SlaveOnly 0 and SlaveOnly 1 are then present in the configuration file). With the default configuration, all the servers are slave-only and nobody ever becomes a master.

To fix that, I created custom systemd scripts instead of using timemaster. For example, on server3 I use:

    ExecStart=/usr/sbin/phc2sys -w -z /var/run/ptp4l.%i.socket -s %i
    (...)
    ExecStart=/usr/sbin/ptp4l -f /etc/linuxptp/ptp4l.conf --uds_address=/var/run/ptp4l.%i.socket -i %i

ptp4l.conf is the vanilla Debian configuration file with just "clock_servo linreg" added.

Then on server1, I can get chrony sources to at least see the PTP from eth3 using a very vanilla timemaster.conf:

    [ptp_domain 0]
    interfaces eth0
    delay 150e-6

    [ptp_domain 1]
    interfaces eth2
    ptp4l_option clock_servo linreg
    delay 150e-6

    [ptp_domain 1]
    interfaces eth3
    ptp4l_option clock_servo linreg
    delay 150e-6

    [timemaster]
    ntp_program chronyd

    [chrony.conf]
    include /etc/chrony.conf

    [ntp.conf]
    includefile /etc/ntp.conf

    [ptp4l.conf]

    [chronyd]
    path /usr/sbin/chronyd

    [ntpd]
    path /usr/sbin/ntpd
    options -u ntp:ntp -g

    [phc2sys]
    path /usr/sbin/phc2sys
    options -E linreg

    [ptp4l]
    path /usr/sbin/ptp4l

I just need to start the services manually on server3:

    systemctl start ptp4l@eth1
    systemctl start phc2sys@eth1

Yet if I do that, the "#x" indicates that chronyc thinks the PTP source is a falseticker:

    #x PTP2 0 2 77 3 +97.9s[ +97.9s] +/- 44ms

I can't fault it for thinking that, since there is a 98 second offset (!!)
If I wait a bit, the accuracy improves, but not the offset:

    #x PTP2 0 2 377 6 +97.9s[ +97.9s] +/- 5442us
    (...)
    #x PTP2 0 2 377 4 +97.9s[ +97.9s] +/- 379us
    (...)
    #x PTP2 0 2 177 5 +98.0s[ +98.0s] +/- 82us

I believe I am doing several things wrong. For example, I should set the PHC using the system time instead of doing the opposite, but I have not found how to do that with timemaster. I did some more reading, but I could not find anything close to what I need except maybe https://github.com/not1337/pps-stuff which uses a GPS to serve the time by PTP. Based on that, instead of the systemd phc2sys@eth1 service, I run a "sys2phc" script on server3:

    phc2sys -s CLOCK_REALTIME -c eth1 -O 0 -R 10 -N 2 -E linreg -L 50000000 -n 0 -q -m

Then server1 has time jumps back and forth on the PTP source, for example:

    phc2sys[4656203.109]: [1:eth3] eth3 sys offset 224 s2 freq -5704 delay 12840
    phc2sys[4656203.209]: [1:eth3] eth3 sys offset -38178389 s2 freq -24180511 delay 5408
    phc2sys[4656203.309]: [1:eth3] clockcheck: clock jumped backward or running slower than expected!
    phc2sys[4656203.309]: [1:eth3] eth3 sys offset -35789106 s0 freq -24180511 delay 12946
    phc2sys[4656203.410]: [1:eth3] eth3 sys offset -33367774 s0 freq -24180511 delay 13232
    phc2sys[4656203.510]: [1:eth3] eth3 sys offset -30946905 s0 freq -24180511 delay 13396
    phc2sys[4656203.610]: [1:eth3] eth3 sys offset -28525849 s2 freq -285263527 delay 13355
    phc2sys[4656203.710]: [1:eth3] eth3 sys offset 36575 s2 freq +360178 delay 16503
    phc2sys[4656203.810]: [1:eth3] eth3 sys offset 4576 s2 freq +22734 delay 12835
    phc2sys[4656203.910]: [1:eth3] eth3 sys offset 1830 s2 freq +18307 delay 12880
    phc2sys[4656204.010]: [1:eth3] eth3 sys offset -523 s2 freq +12767 delay 12760

I tried killing the phc2sys and ptp4l started by timemaster on server1 to replace them with similar scripts, but it doesn't help.

At this point, I am stuck. I can't get PTP to work. Could someone please help me with the configuration to at least have PTP working in UDP?
I would also be interested in replacing UDPv4 with L2 since it may help with the accuracy, and in any other tuning that could help the relative time accuracy inside the cluster (and the time it takes to synchronize to the cluster in case of a reboot).

Thanks
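The phc2sys log lines quoted in this message can be scanned mechanically for jumps. The following is only a minimal sketch: the regular expression is written against the lines as they appear in this thread, the helper names `parse_line` and `find_jumps` are hypothetical, and the 1 ms threshold is an arbitrary choice for illustration. It assumes the offset and delay are in nanoseconds and the frequency in ppb.

```python
import re

# Written against lines as quoted in this thread, e.g.:
#   phc2sys[4656203.209]: [1:eth3] eth3 sys offset -38178389 s2 freq -24180511 delay 5408
LINE_RE = re.compile(
    r"offset\s+(?P<offset>-?\d+)\s+s(?P<state>\d)\s+"
    r"freq\s+(?P<freq>[+-]?\d+)\s+delay\s+(?P<delay>\d+)"
)


def parse_line(line):
    """Return (offset_ns, servo_state, freq_ppb, delay_ns), or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return int(m["offset"]), int(m["state"]), int(m["freq"]), int(m["delay"])


def find_jumps(lines, threshold_ns=1_000_000):
    """Yield parsed samples whose absolute offset exceeds threshold_ns (assumed 1 ms)."""
    for line in lines:
        sample = parse_line(line)
        if sample is not None and abs(sample[0]) > threshold_ns:
            yield sample
```

Feeding it the excerpt above would flag the samples between the first large negative offset and the recovery, while the "clockcheck" line is skipped because it carries no offset field.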
From: Miroslav L. <mli...@re...> - 2019-03-04 09:39:40
On Sun, Mar 03, 2019 at 02:10:27AM +0000, C. Devereaux wrote:
> #x PTP2 0 2 177 5 +98.0s[ +98.0s] +/- 82us
>
> I believe I am doing several thing wrong - for example, I should set the PHC using the system time instead of doing the opposite, but I have not found how to do that with timemaster. I did some more reading, but I could not find anything close to what I need except maybe https://github.com/not1337/pps-stuff which use a GPS to serve the time by PTP. Based on that, I run instead of systemd phc2sys@eth1 a "sys2phc" script on server3: phc2sys -s CLOCK_REALTIME -c eth1 -O 0 -R 10 -N 2 -E linreg -L 50000000 -n 0 -q -m
>
> Then server1 has time jumps back and forth on the PTP source, for ex:

I think that means there are two processes (phc2sys and ptp4l) trying to control the same clock at the same time.

It's not possible to use timemaster to run a PTP master. The PTP clock will not be synchronized to the system clock. The NTPSHM servo can work only in one direction. With timemaster and ntpd/chrony it's always PHC->system clock.

My suggestion would be to dedicate a node in the cluster to work as a PTP master for the other nodes, configured manually with phc2sys and ptp4l. The PTP clock on the master node can be synchronized to the system clock using a weak PI servo, which can be synchronized with NTP.

--
Miroslav Lichvar
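The "two processes controlling the same clock" diagnosis can be turned into a simple sanity check on a hand-written inventory of which process adjusts which clock. This is a sketch under stated assumptions: the inventory is something you would write yourself from your own unit files, nothing here is read from linuxptp, and the function name `find_conflicts` is hypothetical.

```python
from collections import defaultdict


def find_conflicts(writers):
    """Group (process_name, clock_name) pairs by clock.

    Returns {clock_name: [process_name, ...]} for every clock that has
    more than one process adjusting it.
    """
    by_clock = defaultdict(list)
    for process, clock in writers:
        by_clock[clock].append(process)
    return {clock: procs for clock, procs in by_clock.items() if len(procs) > 1}


# Hypothetical inventory for one host: both ptp4l and phc2sys steer the eth3 PHC.
inventory = [
    ("ptp4l", "eth3"),
    ("phc2sys", "eth3"),
    ("chronyd", "CLOCK_REALTIME"),
]
print(find_conflicts(inventory))
```

An empty result means every clock in the inventory has a single writer, which is the condition described in the message above.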
From: C. D. <cd...@ou...> - 2019-03-04 10:26:05
Hello

> It's not possible to use timemaster to run a PTP master. The PTP clock will not be synchronized to the system clock. The NTPSHM servo can work only in one direction. With timemaster and ntpd/chrony it's always PHC->system clock.

That much I had almost deduced 😊 Timemaster seems too rigid.

> My suggestion would be to dedicate a node in the cluster to work as a PTP master for the other nodes, configured manually with phc2sys and ptp4l. The PTP clock on the master node can be synchronized to the system clock using a weak PI servo, which can be synchronized with NTP

That's the plan! I would run that on server3. However, even after my manual configuration, the PHC clocks are about 90 seconds off. I need to be roughly within absolute time, but timestamps 90 s off may be too much.

Could you please suggest the syntax of the phc2sys and ptp4l commands to run on server3 (time master) and server1? I will edit my systemd services to match. Server3 talks to server1 via eth1, which is called eth3 on the other end, on server1.

I ask because what a weak PI servo is, and which constants I should use, are not exactly self-evident for a PTP beginner, and likewise for which algorithms will give the most accuracy. Based on my readings, I assume a stable jitter around 50 ns should be possible.

Thanks
From: Miroslav L. <mli...@re...> - 2019-03-04 10:37:22
On Mon, Mar 04, 2019 at 10:25:53AM +0000, C. Devereaux wrote:
> Could you please suggest the syntax of the phc2sys and ptp4l commands to run on server3 (time master) and server1? I will edit my systemd services to match. Server3 talks to server1 via eth1, which is called eth3 on the other end, on server1.

On the PTP master you could try this:

    ptp4l -m -i $ETH --clockClass 6
    phc2sys -a -r -r -P 1e-4 -I 1e-8

The system clock should be synchronized, e.g. by NTP. The slaves can be configured by timemaster using both NTP and PTP.

> Because weak PI servo, or the constants I should use are not exactly self evident for a PTP beginner – and likewise for which algorithms will give the most accuracy. Based on my readings, I assume a stable jitter around 50 ns should be possible.

With good switches with PTP support, jitter of 50 nanoseconds might be possible.

--
Miroslav Lichvar
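To get a feel for what "weak" PI constants such as -P 1e-4 -I 1e-8 imply, here is a toy proportional-integral loop. It is a sketch only and not linuxptp's servo code: it assumes an ideal clock with no noise, one update per second, an offset in nanoseconds and a correction in ppb (so 1 ppb removes 1 ns per second). The function name `simulate_pi` and the stronger constants 0.7 and 0.3 are chosen purely for contrast.

```python
def simulate_pi(kp, ki, offset_ns=1000.0, steps=100, dt=1.0):
    """Toy PI loop: return the remaining offset (ns) after `steps` updates.

    Each update turns the measured offset into a frequency correction (ppb),
    which then removes correction * dt nanoseconds from the offset.
    """
    integral = 0.0
    for _ in range(steps):
        integral += ki * offset_ns
        correction_ppb = kp * offset_ns + integral
        offset_ns -= correction_ppb * dt
    return offset_ns


# Weak constants barely move a 1000 ns offset in 100 updates;
# strong constants remove it almost entirely.
print(simulate_pi(1e-4, 1e-8))
print(simulate_pi(0.7, 0.3))
```

In this toy model the weak constants correct roughly 1e-4 of the offset per second, so the clock follows its reference over hours rather than seconds, which is the behaviour wanted when the reference (an NTP-disciplined system clock) is itself noisy.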
From: C. D. <cd...@ou...> - 2019-03-05 08:25:31
Hello Miroslav,

First, thanks a lot for the suggestions: it is working now! But I had to make some changes, and the jitter remains strong.

I would like to have server2 and 3 match the timestamping of server1, to avoid introducing inconsistencies in the time series due to measurement issues. The way I see it, if, due to clock jitter, the same event occurring at t0 is seen at t0 on server1, t0+32 us on server2, and t0-32 us on server3 (worst case scenario with the worst measured values now), then the aggregated measurements from the cluster have at best a temporal resolution of about 64 us, and anything not separated by at least 64 us risks being seen in the wrong order.

> On the PTP master you could try this:

The NIC clocks were off by quite a lot, even after synchronizing the system clock by NTP and checking with ntpdate -q.

With the suggested commands, I got:

    phc2sys[12119]: [4853480.133] eth0 sys offset -36998785228 s0 freq -26003 delay 12680

(with the offset slowly adjusting)

That's about 37 seconds. I suppose it's due to the TAI-UTC difference, with the RTC being on UTC. After reading more about that, instead of hardcoding an offset of 37 (and keeping track of when it must be changed) I decided to hardcode 0 instead, and to run the ptp4l server in UTC, through either legacy hardware timestamping or software timestamping.

Also, it takes some time to reach a smaller offset. I would like to "start from scratch" and have the offset jump, i.e. make the NICs match the RTC immediately, before starting to broadcast PTP messages.
However, the -F option to force a step sync on start seems to be ignored, regardless of the units I use:

    /usr/sbin/phc2sys -s CLOCK_REALTIME -c %i -r -r -P 1e-4 -I 1e-8 -O0 -F99999999999
    phc2sys[4859094.548]: eth2 sys offset -45059123402 s0 freq -5389409 delay 12788
    phc2sys[4859209.606]: eth2 sys offset -44439496394 s0 freq -5389409 delay 12426
    (…)
    phc2sys[4859247.349]: eth2 sys offset -44236236081 s0 freq -5389409 delay 12748

Also, the legacy hardware timestamping doesn't work on the e1000e NIC:

    /usr/sbin/ptp4l --clockClass 6 --free_running 1 --uds_address /var/run/ptp4l.%i.socket -L -i %i -m
    ptp4l[4861645.037]: interface 'eth3' does not support requested timestamping mode

For now, I'm using linreg in phc2sys (even if I can't start from scratch, at least it is adjusting faster), and software timestamping (in ptp4l, to get UTC values, i.e. -O0 offset):

    /usr/sbin/phc2sys -s CLOCK_REALTIME -c eth3 -r -r -P 1e-4 -I 1e-8 -O0 -m -F 09999 -E linreg
    phc2sys[4859656.487]: eth3 sys offset -48759179009 s0 freq -6251529 delay 12759
    phc2sys[4859675.492]: eth3 sys offset -39138078779 s2 freq -599999999 delay 20288
    (…)
    phc2sys[4860253.139]: eth3 sys offset -43 s2 freq -5230 delay 12800

    /usr/sbin/ptp4l --clockClass 6 -S -i eth2 -i eth3 --uds_address /var/run/ptp4l.socket -m

With this, on server3 and server2, the jitter is still far from the ns scale, but more tolerable.

Server3:

    $ chronyc sources
    (…)
    #* PTP0 0 2 377 2 +78ns[+1392ns] +/- 32us
    (…)
    $ chronyc tracking
    Reference ID    : 50545030 (PTP0)
    Stratum         : 1
    Ref time (UTC)  : Tue Mar 05 07:43:03 2019
    System time     : 0.000003320 seconds fast of NTP time
    Last offset     : +0.000016404 seconds
    RMS offset      : 0.000030792 seconds
    Frequency       : 15.905 ppm fast
    Residual freq   : +0.678 ppm
    Skew            : 11.099 ppm
    Root delay      : 0.000010 seconds
    Root dispersion : 0.000111 seconds
    Update interval : 4.0 seconds
    Leap status     : Normal

Server 2:

    #* PTP0 0 2 377 5 +76ns[ +258ns] +/- 5226ns

    Reference ID    : 50545030 (PTP0)
    Stratum         : 1
    Ref time (UTC)  : Tue Mar 05 08:09:37 2019
    System time     : 0.000000016 seconds fast of NTP time
    Last offset     : +0.000000043 seconds
    RMS offset      : 0.000000128 seconds
    Frequency       : 21.068 ppm fast
    Residual freq   : +0.000 ppm
    Skew            : 0.007 ppm
    Root delay      : 0.000010 seconds
    Root dispersion : 0.000006 seconds
    Update interval : 4.0 seconds
    Leap status     : Normal

Would you have any idea on how to improve that?

> With good switches with PTP support jitter of 50 nanosecond might be possible.

There is no switch anywhere in my setup. All the NICs are connected directly to each other by crossover cables to minimize such issues. I just can't have a proper PTP grandmaster in the DC, so I'm trying to find workarounds: server 1 in master mode, servers 2 and 3 running timemaster (slave mode).

Is there anything I could do to take advantage of the direct connection between the NICs to further reduce jitter on server 2 and 3? Is there any interest at all in using L2? Or in using the direct connection between server 2 and server 3?

Because another way to look at them is that the servers are hooked together in a triangle:

    server1 (eth2) <-- server 2 (eth1)
    server 1 (eth3)             (eth3)
          |                       ^
          V                       |
    Server 3(eth1)                |
    Server 3 (eth2) --------------/

Thanks!
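The 64 us figure from the worst-case reasoning in this message (t0 on server1, t0+32 us on server2, t0-32 us on server3) is the spread between the largest and smallest clock errors. A minimal sketch of that arithmetic follows; the function names are hypothetical and the per-server errors are the worst-case values quoted in the message, not measurements made by this code.

```python
def ordering_window_us(clock_errors_us):
    """Width of the interval over which one event may be stamped by different servers."""
    return max(clock_errors_us) - min(clock_errors_us)


def order_is_reliable(t_a_us, t_b_us, clock_errors_us):
    """True if two timestamps from different servers are far enough apart to be ordered."""
    return abs(t_a_us - t_b_us) > ordering_window_us(clock_errors_us)


# Worst-case errors quoted in the message: server1 0, server2 +32 us, server3 -32 us.
errors = [0, 32, -32]
print(ordering_window_us(errors))          # 64
print(order_is_reliable(0, 50, errors))    # False: closer than the 64 us window
print(order_is_reliable(0, 100, errors))   # True
```

Shrinking the largest per-server error is therefore what improves the usable temporal resolution of the merged time series.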
From: Miroslav L. <mli...@re...> - 2019-03-05 09:00:58
On Tue, Mar 05, 2019 at 08:25:17AM +0000, C. Devereaux wrote:
> After reading more about that, instead of hardcoding an offset of 37 (and keeping track of the when it must be changed) I decided to hardcode 0 instead, and to run the ptp4l server in UTC, through either legacy hardware timestamping or software timestamping.

Software timestamping is much less accurate and legacy timestamping is not supported in the mainline driver AFAIK. The UTC offset can be specified in the configuration, if for some reason 37 doesn't work for you.

> Also, it takes some time to reach a smaller offset. I would like to “start from scratch” and have the offset jump., ie make the NICs match the RTC immediately, before starting to broadcast PTP messages.
>
> However, the -F to force step sync on start to do that seems to be ignored, regardless of the units I use:

The step will happen only in the s1 state. With a small I constant it takes up to 1000 seconds to reach the s1 state, because it needs to measure the frequency offset very accurately.

> For now, I’m using linreg in phc2sys (even if I can’t start from scratch, at least it is adjusting faster), and software timestamping (in ptp4l, to get UTC values ie -O0 offset)

Yeah, that won't work very well. You need HW timestamping and a very slow servo.

> There is no switch anywhere in my setup. All the NICs are connected directly to eachother by crossover cables to minimize such issues. I just can’t have a proper PTP grandmaster In the DC, so I’m trying to find work arounds: server 1 in master mode, servers 2 and running running timemaster (slave mode)

Why do you need PTP? If there are no switches, you could use NTP with HW timestamping and get a similar performance, except the configuration would be much simpler (no need for mixing PTP with NTP).

> Is there anything I could do to take advantage the direct connection between the NICs to further reduce jitter on server2 and 3? Is there any interest at all in using L2? Or on using the direct connection between server 2 and server 3

I don't think L2 will help, at least not with e1000e. The connection between server 2 and 3 could be useful if server 1 will not always be the grandmaster. But this would complicate the configuration quite a bit (it's not possible with timemaster).

> Because another way to look at them is that the servers are hooked together in a triangle
> :
> server1 (eth2) <-- server 2 (eth1)
> server 1 (eth3) (eth3)
> | ^
> V |
> Server 3(eth1) |
> Server 3 (eth2) ----/

Ok, so there are two interfaces on server 1. I missed that. I think you need one of them to be synchronized to the system clock (using very small PI constants) as I suggested before. The other interface needs to be synchronized to the first interface using default PI constants or linreg. ptp4l needs to be configured with the boundary_clock_jbod option.

HTH,

--
Miroslav Lichvar
From: C. D. <cd...@ou...> - 2019-03-05 13:16:37
Hello,

> Software timestamping is much less accurate and legacy timestamping is not supported in the mainline driver AFAIK. The UTC offset can be specified in the configuration, if for some reason 37 doesn't work for you.

How do you specify the UTC offset? (I'm afraid 37 will change to 38, etc.)

> Yeah, that won't work very well. You need HW timestamping and a very slow servo.

Ok, so -H and no linreg.

> Why do you need PTP? If there are no switches, you could use NTP with HW timestamping and get a similar performance, except the configuration would be much simpler (no need for mixing PTP with NTP).

I'm not sure I need it. I just want all the servers' clocks to be as accurate as possible, as explained above for the time series consistency. After some brief reading, I concluded NTP would never give me more precision than a few ms, which is even worse than what I achieve now at the usec level.

> I don't think L2 will help, at least not with e1000e. The connection between server 2 and 3 could be useful if server 1 will not always be the grandmaster. But this would complicate the configuration quite a bit (it's not possible with timemaster).

I can kick out timemaster. At this point I may as well write all my configs manually. There are 3 servers because sometimes one needs to be rebooted or maintained, while the measurements go on. I was also hoping this side link between server 2 and 3 could create another parameter in the regression to better take into account the natural drift of the server1 RTC clock one way or the other.

> I think you need one of them to be synchronized to the system clock (using very small PI constants) as I suggested before. The other interface needs to be synchronized to the first interface using default PI constants or linreg. ptp4l needs to be configured with the boundary_clock_jbod option.

So do you mean, on server1, for both eth2 (server 1 to server 2) and eth3 (server 1 to server 3):

    /usr/sbin/phc2sys -s CLOCK_REALTIME -c %i -r -r -P 1e-4 -I 1e-8 -O37

On server 3 and server 2, direct link to server1:

    /usr/sbin/phc2sys -s eth1 -c %i -r -r -E linreg

On server 3 and server 2, direct link to each other:

    /usr/sbin/phc2sys -s CLOCK_REALTIME -c %i -r -r -P 1e-4 -I 1e-8 --boundary_clock_jbod 1

Any suggestion is welcome!
From: Miroslav L. <mli...@re...> - 2019-03-05 15:18:57
On Tue, Mar 05, 2019 at 01:16:27PM +0000, C. Devereaux wrote:
> How do you specify the UTC offset? (I’m afraid 37 will change to 38, etc)

There is a "utc_offset" option.

> > I think you need one of them to be synchronized to the system clock
> > (using very small PI constants) as I suggested before. The other
> > interface needs to be synchronized to the first interface using
> > default PI constants or linreg. ptp4l needs to be configured with the
> > boundary_clock_jbod option.

> So do you mean, on server1 for both on eth2 (server1 to server 2) and eth3 (server 1 to server 3)
> /usr/sbin/phc2sys -s CLOCK_REALTIME -c %i -r -r -P 1e-4 -I 1e-8 -O37

In this case only one interface should be synchronized directly to the system clock (synchronized by NTP). -r doesn't do anything without -a.

    phc2sys -s CLOCK_REALTIME -c eth1 -P 1e-4 -I 1e-8 -O 37
    phc2sys -s eth1 -c eth2 -O 0

> On server 3 and server 2 direct link to server1:
> /usr/sbin/phc2sys -s eth1 -c %i -r -r E lingreg
> On server 3 and server 2 direct link to eachother:
> /usr/sbin/phc2sys -s CLOCK_REALTIME -c %i -r -r-P 1e-4 -I 1e-8 --boundary_clock_jbod 1

I'm not sure what this is supposed to do. The jbod option belongs to ptp4l.

There can be only one process synchronizing each clock. A single phc2sys -a using one or two -r should be enough. When server 1 goes down, the other servers will synchronize to one of their interfaces (or the system clock with -r -r). When server 1 is back, they will quickly resynchronize with it. That may or may not be what you want.

--
Miroslav Lichvar
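The -O 37 above corresponds to the TAI-UTC difference discussed earlier in the thread, where phc2sys reported an offset of -36998785228 ns between the PHC and the system clock. As a rough check, that raw offset can be split into a whole number of seconds and a residual. This is only a sketch: the helper name `split_offset` is hypothetical, and it assumes the logged offset is in nanoseconds.

```python
NS_PER_S = 1_000_000_000


def split_offset(offset_ns):
    """Split an offset in nanoseconds into (whole seconds, residual nanoseconds)."""
    whole_s = round(offset_ns / NS_PER_S)
    return whole_s, offset_ns - whole_s * NS_PER_S


whole_s, residual_ns = split_offset(-36998785228)
print(whole_s, residual_ns)  # -37 1214772
```

The whole-second part matches a 37 s TAI-UTC difference, and the residual of about 1.2 ms is what the servo actually has to remove once the 37 s are accounted for with -O.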
From: C. D. <cd...@ou...> - 2019-03-26 12:17:18
> I'm not sure what this is supposed to do. The jbod option belongs to ptp4l.

Oops, wrong copy paste! Here is what I currently run on the master.

For the NTP layer, I use chrony with an input of about 10 stratum1 servers that are 5 to 8 ms away, as measured by ping and confirmed by chronyc sources, with "minpoll 4 maxpoll 4 iburst" for each, and then:

    stratumweight 0
    hwtimestamp *
    lock_all
    sched_priority 1
    logdir /var/log/chrony
    log tracking measurements statistics
    maxupdateskew 100.0
    hwclockfile /etc/adjtime
    rtcsync
    makestep 1 3

Since available_clocksource shows "tsc hpet acpi_pm", the clocksource is set to hpet by the rc.local:

    echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

chrony starts fine:

    Mar 26 06:42:12 d1 systemd[1]: Starting chrony, an NTP client/server...
    Mar 26 06:42:13 d1 chronyd[2634]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 -DEBUG)
    Mar 26 06:42:13 d1 chronyd[2634]: Enabled HW timestamping on eth0
    Mar 26 06:42:13 d1 chronyd[2634]: Enabled HW timestamping on eth3
    Mar 26 06:42:13 d1 chronyd[2634]: Enabled HW timestamping on eth2
    Mar 26 06:42:13 d1 chronyd[2634]: Frequency 10.106 +/- 0.004 ppm read from /var/lib/chrony/chrony.drift
    Mar 26 06:42:13 d1 chronyd[2634]: Loaded seccomp filter
    Mar 26 06:42:13 d1 systemd[1]: Started chrony, an NTP client/server.
I am considering correcting each stratum1 server by its average offset, since they seem quite stable on repeated measurements, as you can see on this plot of column 5 of /var/log/chrony/statistics:

[image: image003.png]

But there are some correlated discrete jumps that I can't explain, as the spread often increases but then remains linear before sometimes exploding again:

[image: image006.png]

I notice the exact same phenomenon on a spare server that's just monitoring these 10 stratum1 servers, so I suppose this is just some NTP protocol woes, but just to be safe I am also logging this data on this spare with a regular ntp, for later analysis.

I'm not sure how correcting by the offset will help. I have written scripts to compute the average offsets and put them in chrony.conf. It seems to slightly improve the situation, but I thought chrony already used these averages? Anyway, I will keep doing that until I can no longer see lines, and just get noise:

[image: image008.png]

Then for the PTP layer, I use on the master the options you suggested:

    # hardware timestamping with no offset
    /usr/sbin/ptp4l --clockClass 6 -H -i eth2 -i eth3 --uds_address /var/run/ptp4l.socket -m --boundary_clock_jbod=1 --utc_offset=0
    # no -a because can't mix autoconfiguration and manual conf
    /usr/sbin/phc2sys -s CLOCK_REALTIME -c eth3 -r -r -P 1e-4 -I 1e-8 -O0

And on the slaves, through timemaster:

    /usr/sbin/timemaster -f /etc/linuxptp/timemaster.conf
    /usr/sbin/chronyd -n -f /var/run/timemaster/chrony.conf
    /usr/sbin/ptp4l -l 5 -f /var/run/timemaster/ptp4l.0.conf -H -i eth1
    /usr/sbin/phc2sys -E linreg -a -r -R 1.00 -z /var/run/timemaster/ptp4l.0.socket -t [0:eth1] -n 0 -E ntpshm -M 0

I am not using the direct connection between the slaves at the moment, as explained below:

> There can be only one process synchronizing each clock. A single phc2sys -a using one or two -r should be enough. When the server 1 goes down, the other servers will synchronize to one of their interfaces (or system clock with -r -r). When server 1 is back, they will quickly resynchronize with it. That may or may not be what you want.

This is what I want if it happens, but not quickly: very slowly instead, as I care more about keeping the internal time consistency of the local cluster, even if it diverges from external sources.

So after a reboot, I want the NTP sync to happen very quickly even if it causes discrete jumps, then I want the whole cluster to remain internally consistent and slowly adjust to the ideal NTP time, but not at the cost of changes bigger than, say, 100 ns. I think this is why you said to use low PI constants and no linreg on the PTP master.