ptpd occasionally takes many minutes to sync?

Help
2010-03-05
2019-11-06
  • Jeremy Friesner

    Jeremy Friesner - 2010-03-05

    Hi all,

    I have a problem with ptpd: usually it works well, but sometimes when we start up our servers (multiple Intel64-based Linux boxes on the same gigabit Ethernet LAN, each running ptpd) it takes a really long time (tens of minutes or even several hours) to sync the clocks together.  During this period, we can see the clocks converging at a rate of about 73 ms per minute.  Once the clocks are synchronized, they stay synchronized, so this is not simply clock drift with ptpd not working at all. But of course we need the systems to be synchronized more or less immediately at startup; having to wait an hour after startup isn't practical for our application.

    Does anyone have an idea as to what might cause this really slow synchronization to occur?

    Thanks,
    Jeremy

     
  • Todd Newton

    Todd Newton - 2013-04-30

    We have seen similar behavior as well with the v2.2.2 and SVN repository code. A colleague has mentioned that v2.1.0 did not have this type of issue. Systems would sync to within 1ms very quickly. The latest code being used appears to synchronize the clocks better/tighter, but the convergence may take several minutes (15 to 20+ minutes) before the clocks are within 1ms of the master (as reported by the offset from master).

    Any ideas?

    Thanks.

    --Todd

     
  • Jan Breuer

    Jan Breuer - 2013-05-01

    There are at least two problems in PTPd

    1) If your clock is, for example, 999 ms from the master, PTPd will try to slew the clock rather than step it. Slewing the clock from this worst case will take more than 30 min.

    999 000 000 ns / ADJ_FREQ_MAX = 1951 s ~ 32.52 min

    You can't easily increase ADJ_FREQ_MAX, but you can lower the step threshold to e.g. 10 ms.
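
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the value 512000 for ADJ_FREQ_MAX (parts per billion, i.e. ns of correction per second) is inferred from the figures in the post, and the function name is made up for illustration.

```python
# Assumed value: 512 ppm expressed in parts per billion (ns of slew per second).
ADJ_FREQ_MAX_PPB = 512_000


def slew_time_seconds(offset_ns, max_rate_ppb=ADJ_FREQ_MAX_PPB):
    """Best-case time to remove offset_ns when slewing constantly at the maximum rate."""
    return abs(offset_ns) / max_rate_ppb


for offset_ms in (999, 100, 10):
    seconds = slew_time_seconds(offset_ms * 1_000_000)
    print(f"{offset_ms:>4} ms offset: {seconds:7.1f} s ({seconds / 60:.2f} min)")
```

Under the same assumption, an offset at a 10 ms threshold would take roughly 20 s to slew out in the best case, which is why lowering the threshold shortens the worst-case convergence time.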

    2) The implementation of the function servo_perform_clock_step is inappropriate, so the clock step is inaccurate. The resulting error is below 10 ms, so I think you won't notice it.

    now it is:
    - get time
    - subtract offset
    - do something that takes a long time (e.g. print a debug message or send something to syslog)
    - set time

    it should be:
    - get time
    - subtract offset
    - set time
    - do something that takes a long time

    or better:
    - perform a kernel clock step (using adjtimex or adjtime)
    - do something that takes a long time
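
The effect of the two orderings can be illustrated with a toy simulated clock. This is a sketch, not ptpd's servo_perform_clock_step: the SimClock class, the function names, and the 2 ms logging cost are all hypothetical.

```python
class SimClock:
    """Toy clock: local time = true time + offset. True time only moves via advance()."""

    def __init__(self, offset_ns):
        self.true_ns = 0
        self.offset_ns = offset_ns

    def advance(self, ns):
        self.true_ns += ns

    def get(self):
        return self.true_ns + self.offset_ns

    def set(self, local_ns):
        # Setting the local time redefines the offset relative to true time.
        self.offset_ns = local_ns - self.true_ns


def step_with_logging_before_set(clock, offset_ns, log_cost_ns):
    target = clock.get() - offset_ns   # get time, subtract offset
    clock.advance(log_cost_ns)         # slow work: time passes before the step
    clock.set(target)                  # set time (already stale)


def step_with_logging_after_set(clock, offset_ns, log_cost_ns):
    target = clock.get() - offset_ns   # get time, subtract offset
    clock.set(target)                  # set time immediately
    clock.advance(log_cost_ns)         # slow work no longer affects the step


for step in (step_with_logging_before_set, step_with_logging_after_set):
    clock = SimClock(offset_ns=5_000_000)
    step(clock, offset_ns=5_000_000, log_cost_ns=2_000_000)
    print(step.__name__, "residual offset:", clock.offset_ns, "ns")
```

In this model the residual error of the first ordering equals the time spent on the slow work, while the second ordering leaves no residual.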

    --Jan

     
  • Wojciech Owczarek

    Some more comments from me - possible causes of the issue in the light of ptpd's current behaviour:

    1. Currently ptpd resets the observed drift (base clock frequency offset) by default whenever you restart it and whenever it changes state. This means that if you restart your slave, all of its stabilisation work is lost. A fix for this has been in svn for a while but not in the last release: the -F option / drift recovery. -F 1 preserves the kernel frequency shift between restarts, and -F 2 saves it to a file on shutdown and reads it from the file on startup. This makes a tremendous difference. The next ptpd version will default to the "correct" behaviour of preserving the frequency offset. I don't think resetting it makes sense at all: the servo is quick enough to adjust the observed drift when needed.

    2. The servo PI controller's P and I components are not optimal: when slewing from a large offset, the offset will oscillate horribly before it stabilises. This will hopefully be fixed in the near future. In fact, this simply looks embarrassing right now.

    3. Currently ptpd does not control the kernel's tick value (which to an extent is just another mechanism for applying frequency offsets - look it up). If NTPd has been running on your system and has left the tick value at a non-default setting, ptpd's stabilisation time will be even longer!

    4. The effective slew rate, currently limited to ADJ_FREQ_MAX, can be extended to well over 512 us/s by controlling tick as well as the frequency offset. Simply put, a tick change of 1 equals a 100 ppm frequency shift (100 us/s), on top of the frequency offset. I have code that controls tick and allows the slew rate to be extended beyond 512 ppm: it switches from tick to tick + frequency when exceeding 512 ppm, but also keeps tick at 0 when the offset is below 512 ppm. The switch from tick to frequency offset is not linear - you can see a change in the offset curve when it happens - but it allows slewing even more quickly.
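
As a rough illustration of point 4, the split between a tick change and a frequency offset could look like the sketch below. This is not the code referred to in the post. The function name and the rounding strategy are assumptions; only the 100 ppm per tick unit and the 512 ppm limit come from the thread.

```python
MAX_FREQ_PPM = 512   # frequency-offset limit quoted in the thread
PPM_PER_TICK = 100   # "a tick change of 1 equals 100ppm"


def split_adjustment(total_ppm):
    """Split a desired rate correction into (tick_delta, residual_freq_ppm).

    Small corrections use the frequency offset alone (tick_delta == 0).
    Larger ones take whole tick units first and leave the remainder
    to the frequency offset.
    """
    if abs(total_ppm) <= MAX_FREQ_PPM:
        return 0, total_ppm
    tick_delta = round(total_ppm / PPM_PER_TICK)
    return tick_delta, total_ppm - tick_delta * PPM_PER_TICK


for ppm in (300, 1217, -1217):
    print(ppm, "ppm ->", split_adjustment(ppm))
```

With this split the residual frequency offset never exceeds half a tick unit (50 ppm) once the tick is in use, so it stays well inside the 512 ppm limit.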

     

    Last edit: Wojciech Owczarek 2013-06-23
  • Wojciech Owczarek

    Jan,

    It would have been 1951 seconds if ptpd kept the freq offset constant at 512ppm until it gets closer to zero - unfortunately the PI controller will start lowering it and it will oscillate quite a bit.
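
    A simplified model of a clamped PI servo makes it possible to explore this behaviour. It is a sketch under stated assumptions, not ptpd's servo code: the gains 0.1 and 0.001 are the defaults quoted later in this thread, the 512000 ppb clamp and the one-second sync interval are assumed, and the function name is hypothetical.

```python
def simulate_pi_slew(offset_ns, steps, kp=0.1, ki=0.001,
                     max_ppb=512_000, interval_s=1.0):
    """Return the offset (ns) after each sync interval of a clamped PI servo."""

    def clamp(value):
        return max(-max_ppb, min(max_ppb, value))

    drift_ppb = 0.0   # integral term ("observed drift"), starts from zero
    history = []
    for _ in range(steps):
        drift_ppb = clamp(drift_ppb + ki * offset_ns)   # integrate, then limit
        adj_ppb = clamp(kp * offset_ns + drift_ppb)     # P + I, limited to max slew
        offset_ns = offset_ns - adj_ppb * interval_s    # ppb * s = ns corrected
        history.append(offset_ns)
    return history


if __name__ == "__main__":
    trace = simulate_pi_slew(999_000_000, steps=3600)
    for second in (0, 600, 1200, 1800, 2400, 3000, 3599):
        print(f"t={second + 1:>4} s  offset={trace[second] / 1e6:10.3f} ms")
```

    Plotting or printing the returned trace for different gains shows how long the model stays at the slew limit and how it behaves once the proportional term drops below it.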

    I think the ideal behaviour to quickly stabilise the clock is:

    1. Set the frequency offset to the last known observed drift
    2. Immediately step the clock

    Basically - the current behaviour, just without setting the observed drift to zero. This should make a huge difference.

    I think the case you're describing where we're doing I/O stuff like writing logs etc, before applying the new time, has some effect, but very little. Remember that we also reset the servo when stepping the clock, so offset will be incorrect until one-way delay is stable.

     
    • Jan Breuer

      Jan Breuer - 2013-06-24

      Wojciech,
      you are right. That is the time with optimal regulation, not the real behaviour.

      Btw, the PI controller is currently tuned for 1 sync per 2 seconds and not for 1 sync per second, so there are larger oscillations now.

      The problem I mentioned can be seen on slower systems with an E2E Transparent Clock. Synchronization was faster when PTPd did not reset the drift and a TC switch with a correction field was used, which effectively reduces the path delay to a few nanoseconds.

       
  • Wojciech Owczarek

    Jan,

    I changed the clock stepping functions a little bit - clock is stepped immediately - messages are displayed after the fact. Two messages "stepping...." "stepped..." were not useful anyway - also drift is not reset.

    This works really great if:
    - You have a last observed drift saved in the drift file
    - you've stopped ptpd for a long time

    Then you just start it and send it a SIGUSR1. It will step the clock and restore the previous observed drift. Should stabilise very quickly.
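
    The save-and-restore idea behind the drift file can be sketched as follows. This is only an illustration: the file format (a single number in ppb), the function names, and the fallback to zero are assumptions and do not describe ptpd's actual drift file handling.

```python
import os
import tempfile


def save_drift(path, drift_ppb):
    """Write the observed drift atomically so a crash cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{drift_ppb}\n")
    os.replace(tmp_path, path)


def load_drift(path, default_ppb=0.0):
    """Return the saved drift, or default_ppb if the file is missing or unreadable."""
    try:
        with open(path) as handle:
            return float(handle.read().strip())
    except (OSError, ValueError):
        return default_ppb


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        drift_file = os.path.join(workdir, "drift")   # hypothetical location
        print("before first save:", load_drift(drift_file))
        save_drift(drift_file, 12345.5)               # e.g. on shutdown
        print("restored on start:", load_drift(drift_file))
```

    On startup the restored value would be applied as the initial frequency offset before the clock is stepped, so the servo does not have to rediscover the drift from zero.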

     
  • Guilherme Costa G Fernandes

    Hi all,

    I'm having this same issue: the clocks take a long time (in the order of ten minutes) to synchronize when starting with an offset in the order of 100 ms. I'm running a system that cannot be kept powered up between uses.

    I see that some features @wowczarek mentioned, such as preserving the kernel frequency, have been implemented in the newest version.

    However, I'm interested in what @Jan Breuer mentioned: how to lower the threshold for stepping the clock from 1 second to 10 ms, for example? I'm currently running PTPd from the command line, having installed it from Ubuntu's Synaptic Package Manager, and could not find a way to do that using the provided options.

    I tried using the flag "--clock:step_startup_force 1", which does perform a reset at the beginning, but that does not affect the subsequent master-slave offset.

    A secondary point: has any progress been made on optimizing the Kp and Ki gains of the servo? The default values remain 0.1 and 0.001.

    Thanks in advance!

     
