
sync lost, with log: ptpd2: updateDelay aborted, delay -xxxx is negative

Help
2013-01-28
2013-03-21
  • Matt Garman

    Matt Garman - 2013-01-28

    Hi,

    I'm running ptpd 2.2.0 on CentOS 5.x. The command line is "ptpd2 -g -d -b ethX". Occasionally, a few PTP slaves get out of sync with the master. When this happens, the system log file (/var/log/messages) fills with entries like this:

    Jan 27 21:36:16 lnxsvr53 ptpd2: updateDelay aborted, delay -2118000 is negative

    This happens fairly rarely, maybe once every week or two, and it seems to be limited to a handful of machines. I can't see anything obviously wrong with these machines, and simply restarting the ptpd daemon fixes the problem (at least until it happens again).

    Note that the above message is fairly common on all servers, even ones that do not have this occasional out-of-sync problem. But the magnitude of the delay value seems to determine whether or not the machine's clock is actually out of sync. The example above was from a machine whose clock was out of sync. Here is an example from a server that does not have the sync issue:

    Jan 28 08:58:07 lnxsvr91 ptpd2: updateDelay aborted, delay -17000 is negative

    I don't have enough data to determine exactly what the threshold is for when that syslog message is benign, and when it's indicative of a real problem.
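
    One way to collect that data would be to pull the delay values out of the syslog and summarise them per host. The following is only a minimal sketch, assuming every message has the layout of the two examples above (month, day, time, host, then the ptpd2 text). The helper name delay_summary is hypothetical and not part of ptpd. It prints one line per host: the host name, the number of messages, and the largest delay magnitude seen, exactly as logged.

```shell
#!/bin/sh
# Hypothetical helper (not part of ptpd): summarise
# "updateDelay aborted" messages per host.
# Assumes the syslog layout shown in the examples above:
#   <month> <day> <time> <host> ptpd2: updateDelay aborted, delay <N> is negative
# Reads the files given as arguments, or stdin if none are given.
delay_summary() {
    awk '
        /ptpd2: updateDelay aborted, delay -?[0-9]+ is negative/ {
            host = $4
            for (i = 1; i <= NF; i++) {
                # The value sits between the words "delay" and "is".
                if ($i == "delay" && $(i + 2) == "is") {
                    v = $(i + 1) + 0
                    if (v < 0) v = -v          # keep the magnitude only
                    count[host]++
                    if (v > max[host]) max[host] = v
                }
            }
        }
        END {
            for (h in count) printf "%s %d %d\n", h, count[h], max[h]
        }
    ' "$@" | sort
}
```

    Running "delay_summary /var/log/messages" on each machine, or on collected logs, would show whether the out-of-sync hosts stand apart from the healthy ones by magnitude.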

    Hoping someone here can offer some ideas/suggestions.

    Thanks!

     

    Last edit: Matt Garman 2013-01-28
  • Wojciech Owczarek

    Matt,

    I remember looking into something similar. I can't recall the exact issue, but please try version 2.2.2 or, better, the current SVN trunk, and see if this changes anything.

    Thanks
    Woj

     
    • Matt Garman

      Matt Garman - 2013-03-21

      Any chance you were able to recall the exact issue and whether it was addressed in 2.2.2 or SVN? I did some limited testing with SVN, and it appears to resolve the issue, although I'm hesitant to do a large-scale rollout with a non-production (i.e. SVN) release. If you were pretty confident that the issue was addressed in 2.2.2, then I'd give that a go. In other words, if you can remember, it might save me some time and effort. :)

      I ask because today I am seeing a similar issue with 2.2.0: the clock on a PTP client appears to be stable, but it is "stuck" at an offset of about 125 microseconds from the PTP master. By comparison, on another identical client on the same subnet, the offset is consistently within a few microseconds.

      I am checking the offset on each of the two clients with something like:
      # while [ 1 ]; do ntpdate -q master ; sleep 5s ; done
      Here "master" is the PTP master, which is also an NTP client. I know from the other thread that ntpdate -q is at best a crude way to determine the offset, but it is consistent with other internal metrics we have (e.g. timestamp-based packet transfer times).
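
      To compare two clients over time, the loop above could log one timestamped offset per sample instead of the raw ntpdate output. This is a rough sketch only. It assumes ntpdate -q prints a line of the form "server <addr>, stratum <n>, offset <seconds>, delay <seconds>", and offset_us is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ntpdate -q` output on stdin and print the
# offset of the first "server ..." line in whole microseconds.
# Assumed input format (not verified against every ntpdate version):
#   server 10.0.0.1, stratum 2, offset -0.000125, delay 0.02574
offset_us() {
    awk '
        $1 == "server" {
            for (i = 1; i < NF; i++) {
                if ($i == "offset") {
                    v = $(i + 1)
                    sub(/,$/, "", v)                 # drop the trailing comma
                    printf "%.0f\n", v * 1000000     # seconds -> microseconds
                    exit
                }
            }
        }
    '
}

# Example use (commented out; needs ntpdate and a reachable "master"):
# while true; do
#     echo "$(date +%s) $(ntpdate -q master | offset_us)"
#     sleep 5
# done
```

      Each sample then becomes a single "epoch-seconds offset-in-microseconds" line, which is easier to diff or plot across machines.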

      Based on my experience with the original problem that I described in the first post, I suspect that a simple restart of the PTP daemon on the client will resolve the issue. But if you have any further insight into the issue, I'd be interested in hearing it!

      Thanks again,
      Matt

       
