Thread: [Linuxptp-users] SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT
PTP IEEE 1588 stack for Linux
From: Daniel Le <dan...@ex...> - 2015-04-01 22:18:35
Hello,

My PTP slave clock appears to lose sync with a grandmaster clock when under heavy load and, worse, it can't recover. The sync is good when there is low or no other traffic.

This slave clock uses software timestamping to adjust the host system time. The PTP transmit and receive packets are time stamped by a non-1588 aware NIC's FPGA clock which is sync'd to the host system clock, i.e. the NIC regularly gets host system time to step/slew to it.

The log shows:
- port <port#>: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT

and the following repetitive messages:
- clockcheck: clock jumped forward or running faster than expected!
- clockcheck: clock jumped backward or running slower than expected!

I would appreciate information to debug this, as well as an explanation of what may be happening.

Thanks,
Daniel
From: Richard C. <ric...@gm...> - 2015-04-02 06:32:13
On Wed, Apr 01, 2015 at 10:18:27PM +0000, Daniel Le wrote:
> My PTP slave clock appears to lose sync with a grandmaster clock
> when under heavy load and worse it can't recover. The sync is good
> when there is low or no other traffic. This slave clock uses
> software timestamping to adjust the host system time. The PTP
> transmit and receive packets are time stamped by a non-1588 aware
> NIC's FPGA clock which is sync'd to the host system clock, i.e. the
> NIC regularly gets host system time to step/slew to it.

This sounds fishy to me. You say your slave uses SW time stamping, but that the FPGA provides time stamps. That is HW time stamping!

Also, since the Linux system time is purely software, how do you get its time into the FPGA? By using phc2sys?

> The log shows:
> - port <port#>: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT
>
> and the following repetitive messages:
> - clockcheck: clock jumped forward or running faster than expected!
> - clockcheck: clock jumped backward or running slower than expected!
>
> I would appreciate information to debug this, as well as an explanation of what may be happening.

That message comes from the function clockcheck_sample() in clockcheck.c. It does the following:

	/* Check the sanity of the synchronized clock by comparing its
	   uncorrected frequency with the system monotonic clock. If
	   the synchronized clock is the system clock, the measured
	   frequency offset will be the current frequency correction of
	   the system clock. */

This is a sanity check against CLOCK_MONOTONIC. Probably there is a bug in your custom HW design or in the system/FPGA synchronization method.

Thanks,
Richard
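To make "a sanity check against CLOCK_MONOTONIC" concrete, here is a minimal sketch of taking paired readings of CLOCK_REALTIME and CLOCK_MONOTONIC and measuring how far they drifted apart between two samples. The names clock_now_ns(), struct clock_pair, sample_clocks() and drift_between() are hypothetical helpers made up for this illustration. They are not part of linuxptp; only clock_gettime() and the two clock IDs are real POSIX interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read a POSIX clock and return it in nanoseconds. */
static int64_t clock_now_ns(clockid_t id)
{
	struct timespec ts;
	int err = clock_gettime(id, &ts);

	assert(err == 0);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* One paired reading of the two clocks (hypothetical type). */
struct clock_pair {
	int64_t realtime_ns;  /* CLOCK_REALTIME: wall time, can be stepped */
	int64_t monotonic_ns; /* CLOCK_MONOTONIC: never stepped, but still
				 affected by frequency adjustments */
};

static struct clock_pair sample_clocks(void)
{
	struct clock_pair p;

	p.realtime_ns = clock_now_ns(CLOCK_REALTIME);
	p.monotonic_ns = clock_now_ns(CLOCK_MONOTONIC);
	return p;
}

/*
 * How much further CLOCK_REALTIME advanced than CLOCK_MONOTONIC between
 * sample a and sample b. A large positive or negative value means the
 * wall clock was stepped (or slewed hard) in between.
 */
static int64_t drift_between(struct clock_pair a, struct clock_pair b)
{
	return (b.realtime_ns - a.realtime_ns) -
	       (b.monotonic_ns - a.monotonic_ns);
}
```

Calling sample_clocks() once per second and logging drift_between() for consecutive samples is one way to see whether something other than ptp4l is stepping the system clock.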
From: Daniel Le <dan...@ex...> - 2015-04-02 15:38:46
Please see inline.

-----Original Message-----
From: Richard Cochran [mailto:ric...@gm...]
Sent: Thursday, April 02, 2015 2:32 AM
To: Daniel Le
Cc: lin...@li...
Subject: Re: [Linuxptp-users] SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT

On Wed, Apr 01, 2015 at 10:18:27PM +0000, Daniel Le wrote:
> My PTP slave clock appears to lose sync with a grandmaster clock when
> under heavy load and worse it can't recover. The sync is good when
> there is low or no other traffic. This slave clock uses software
> timestamping to adjust the host system time. The PTP transmit and
> receive packets are time stamped by a non-1588 aware NIC's FPGA clock
> which is sync'd to the host system clock, i.e. the NIC regularly gets
> host system time to step/slew to it.

This sounds fishy to me. You say your slave uses SW time stamping, but that the FPGA provides time stamps. That is HW time stamping!

Also, since the Linux system time is purely software, how do you get its time into the FPGA? By using phc2sys?

[DL] The FPGA has its own clock and a proprietary slewing mechanism to sync to a time source. It does not use phc2sys because my embedded system doesn't have a 3.x Linux kernel.

[DL] In the case of a PTP time source, the FPGA engine on the NIC periodically reads the kernel system time (do_gettimeofday) in order to step/slew to the system time, which is synchronized to PTP grandmaster time.

[DL] The ptp4l program is run with the -S option. However, when sending/receiving packets (for example via IPv4 transport in udp_send() and udp_recv()), a timestamping pipe is used to get the FPGA hardware timestamps of the packets, instead of the functions sendto() and sk_receive().

> The log shows:
> - port <port#>: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT
>
> and the following repetitive messages:
> - clockcheck: clock jumped forward or running faster than expected!
> - clockcheck: clock jumped backward or running slower than expected!
>
> I would appreciate information to debug this, as well as an explanation of what may be happening.

That message comes from the function clockcheck_sample() in clockcheck.c. It does the following:

	/* Check the sanity of the synchronized clock by comparing its
	   uncorrected frequency with the system monotonic clock. If
	   the synchronized clock is the system clock, the measured
	   frequency offset will be the current frequency correction of
	   the system clock. */

This is a sanity check against CLOCK_MONOTONIC. Probably there is a bug in your custom HW design or in the system/FPGA synchronization method.

[DL] Could you elaborate further on this clockcheck_sample functionality (such as uncorrected frequency)? Is my understanding of the following correct?
- The synchronized clock is the PTP clock and is maintained by PTP packet TX/RX timestamps per the 1588 standard.
- The system monotonic clock (CLOCK_MONOTONIC) is the Linux kernel system clock.

[DL] What is the threshold to determine that the clock jumped forward/backward too much?

[DL] Upon a system boot-up or restart, how does the PTP slave clock set the system clock initially? Is CLOCK_REALTIME involved?

Thank you.
Daniel
From: Richard C. <ric...@gm...> - 2015-04-02 16:18:48
On Thu, Apr 02, 2015 at 03:38:38PM +0000, Daniel Le wrote:
> [DL] The FPGA has its own clock and a proprietary slewing mechanism to sync to a time source. It does not use phc2sys
> because my embedded system doesn't have 3.x Linux kernel.
>
> [DL] In the case of PTP time source, the FPGA engine on the NIC periodically reads the kernel system time
> (do_gettimeofday) in order to step/slew to the system time which is synchronized to PTP grandmaster time.

(IOW, software time stamping)

> [DL] The ptp4l program is run with -S option, however, for example when sending/receiving packets via IPv4 transport
> in udp_send() and udp_recv(), a timestamping pipe is used to get the FPGA hardware timestamps of the packets,
> instead of the functions sendto() and sk_receive().

(IOW, hardware time stamping)

This design seems wrong to me. Why not let the FPGA have its own clock, and then synchronize the Linux system time to it? That is how all the other devices do it.

In any case, you have some elaborate custom kernel and ptp4l modifications. I really can't help you with those, sorry.

> [DL] Could you further elaborate this clockcheck_sample functionality (such as uncorrected frequency)?

Please take a look at the code. It is not all that complicated.

> Is my understanding of the following correct?
> - The synchronized clock is the PTP clock and is maintained by PTP packet TX/RX timestamps per 1588 standard.

Yes.

> - The system monotonic clock (CLOCK_MONOTONIC) is the Linux kernel system clock.

No. See 'man 3 clock_gettime'.

> [DL] What is the threshold to determine that clock jumped forward/backward too much?

The code does this:

	Check the sanity of the synchronized clock by comparing its
	uncorrected frequency with the system monotonic clock. When the
	measured frequency offset is larger than the value of the
	sanity_freq_limit option (20% by default), a warning message will
	be printed and the servo will be reset. Setting the option to
	zero disables the check. This is useful to detect when the clock
	is broken or adjusted by another program.

> [DL]Upon a system boot-up or restart, how does PTP slave clock sets the system clock initially? Is CLOCK_REALTIME involved?

First of all, the ptp4l program will not start at boot-up or restart unless you arrange for that to happen.

Secondly, the ptp4l program sets its target clock (either the PHC in the case of HW time stamping, or CLOCK_REALTIME for SW time stamping) to match that of the remote master, according to the 'first_step_threshold' configuration option.

HTH,
Richard
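The threshold test described above can be sketched as a small pure function. This is a simplified illustration, not the code from clockcheck.c: the real clockcheck_sample() keeps more state between calls, and the names sanity_check() and enum check_result are invented for this example. It assumes the limit is expressed in parts per billion, so the 20% default corresponds to 200000000.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical verdicts for this sketch; not linuxptp identifiers. */
enum check_result {
	CHECK_TOO_SLOW = -1,
	CHECK_OK = 0,
	CHECK_TOO_FAST = 1
};

/*
 * interval_ns:      time elapsed according to the synchronized clock
 * mono_interval_ns: time elapsed according to CLOCK_MONOTONIC
 * applied_freq_ppb: frequency correction that was applied to the
 *                   synchronized clock during the interval
 * limit_ppb:        sanity limit (200000000 is 20%); 0 disables the check
 */
static enum check_result sanity_check(int64_t interval_ns,
				      int64_t mono_interval_ns,
				      double applied_freq_ppb,
				      double limit_ppb)
{
	double foffset_ppb;

	assert(mono_interval_ns > 0);

	if (limit_ppb == 0.0)
		return CHECK_OK;

	/*
	 * Undo the applied correction to get the uncorrected rate of the
	 * synchronized clock, then express its rate relative to the
	 * monotonic clock as an offset in ppb.
	 */
	foffset_ppb = 1e9 * ((double)interval_ns /
			     (1.0 + applied_freq_ppb / 1e9) /
			     (double)mono_interval_ns - 1.0);

	if (foffset_ppb > limit_ppb)
		return CHECK_TOO_FAST; /* "jumped forward or running faster" */
	if (foffset_ppb < -limit_ppb)
		return CHECK_TOO_SLOW; /* "jumped backward or running slower" */
	return CHECK_OK;
}
```

With a limit of 200000000 ppb, a synchronized clock that advances 1.5 s while CLOCK_MONOTONIC advances 1 s is flagged as too fast, and one that advances only 0.5 s is flagged as too slow. If some other agent steps the clock that ptp4l is disciplining, both verdicts can alternate, which would match the repeating pair of warnings in the log.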
From: Daniel Le <dan...@ex...> - 2015-04-02 23:40:36
Thanks much for your help. I'll review the design. Meanwhile I have a couple more questions if you don't mind.

(1) Could you point me to the place in the code where the ptp4l slave clock sets the kernel CLOCK_REALTIME for the first time for SW time stamping? I guess it does that right after transitioning into Slave state from Uncalibrated. I'm thinking the FPGA clock time should be stepped at that moment.

(2) I tried to revert to sendto() and sk_receive() and ran into the following error messages:

timed out while polling for tx timestamp
increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
port <portN>: send delay request failed

Does that mean sendto() of the original program needs customization?

Daniel
From: Richard C. <ric...@gm...> - 2015-04-03 09:56:01
On Thu, Apr 02, 2015 at 11:40:28PM +0000, Daniel Le wrote:
> Thanks much for your help. I'll review the design. Meanwhile I have a couple more questions if you don't mind.

Regarding the design, it would be better to expose the FPGA clock to ptp4l directly. That means letting ptp4l set and adjust the FPGA clock. You don't have the PHC subsystem in your pre-3.0 kernel, but you can emulate the interface in the FPGA driver using ioctls. For the time stamps, you can hack your MAC driver to obtain the time stamps and deliver them via SO_TIMESTAMPING.

> (1) Could you point me in the code where ptp4l slave clock sets for the first time the kernel CLOCK_REALTIME for SW time stamping? And I guess it does that right after transitioning into Slave state from Uncalibrated. I'm thinking the FPGA clock time should be stepped at that moment.

That happens in the function clock_synchronize() in clock.c,

	switch (state) {
	case SERVO_UNLOCKED:
		break;
	case SERVO_JUMP:
		clockadj_set_freq(c->clkid, -adj);
		clockadj_step(c->clkid, -tmv_to_nanoseconds(c->master_offset));
	here. ----------^
		...
	case SERVO_LOCKED:
		...
	}

> (2) I tried to revert to sendto() and sk_receive() and ran into the following error messages:
>
> timed out while polling for tx timestamp
> increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
> port <portN>: send delay request failed
>
> Does that mean sendto() of the original program needs customization?

I am not sure what you have done to the kernel and user space code. Without seeing the code, it is not possible for me to answer this question.

Sorry,
Richard
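As a rough illustration of the SERVO_JUMP case quoted above, the sketch below shows one way the step decision and the step amount could be expressed. It is not the servo code from linuxptp. The function names should_step() and step_correction_ns() are hypothetical, and the logic assumes the documented meaning of first_step_threshold: a value in seconds, applied only on the first update, where 0.0 means never step on start.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether the first servo update should step
 * the clock instead of slewing it.
 *
 * offset_ns:              measured offset from the master, in nanoseconds
 * first_step_threshold_s: largest offset (in seconds) that is still
 *                         corrected by slewing; 0.0 disables stepping
 * first_update:           non-zero only for the first servo update
 */
static int should_step(double offset_ns, double first_step_threshold_s,
		       int first_update)
{
	double magnitude = offset_ns < 0.0 ? -offset_ns : offset_ns;

	assert(first_step_threshold_s >= 0.0);

	if (!first_update || first_step_threshold_s == 0.0)
		return 0;
	return magnitude > first_step_threshold_s * 1e9;
}

/*
 * Hypothetical helper: the amount to add to the clock when stepping.
 * Mirrors the sign convention of the quoted snippet, where the clock is
 * stepped by the negated master offset.
 */
static int64_t step_correction_ns(int step, int64_t master_offset_ns)
{
	return step ? -master_offset_ns : 0;
}
```

For example, with a threshold of 0.00002 s (20 microseconds), a first measured offset of 50000 ns would trigger a step of -50000 ns, while an offset of 10000 ns would be left to the frequency adjustment. In a design where the FPGA clock follows the system time, that step is the moment when the FPGA clock would also need to be stepped.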