Trouble with ptpd 2.1.0 on Raspberry Pi

Help
Seltsam
2015-02-16
2015-09-14
  • Seltsam

    Seltsam - 2015-02-16

    Hello!

    I am having some trouble with ptpd 2.1.0 running on a bunch of Raspberry Pis that I am desperately trying to sync. One of the Raspis is my master clock source (running ntp and ptpd together) and the others are running ptpd only. It works just fine for the first couple of hours, but after a while the slaves drift away from the master clock.

    I know that there are newer versions of ptpd available, but when I compiled the latest release for the Raspberry, the network became unavailable as soon as both server and clients were up and running. I am not smart enough to edit the source code directly, so I decided to go with the version distributed by the Raspberry guys when I apt-get ptpd, which is version 2.1.0. My systems run wheezy with kernel 3.18.5.

    Maybe it's just a minor thing I have overlooked. I'm starting the server instance of ptpd with the options "-b eth0 -t", since the server is on an Ethernet wire. The clients are launched with "-b wlan0 -g -D -f /home/pi/ptpd.txt"; they are obviously on Wifi. Since the synchronisation works for a while with great precision, I do not assume that the Wifi card is the problem (correct me if I'm wrong; I'm using the Edimax EW-7612UAn Wireless-LAN USB adapter).

    Rebooting the master does not help, the clients are still on the same offset as before. Rebooting the clients, however, gets them to synch correctly (for a while).

    Any help would be greatly appreciated!

    Best,
    Seltsam

     
  • Wojciech Owczarek

    Hi,

    1. Can you clarify "network is not available anymore" when running 2.3.0?
    2. Can you clarify "slaves are drifting away from master clock"? Are they free-running, or do they start drifting aggressively, like 0.5ms per second?
    3. Can you run a ping between a master and a slave while this is happening and see what happens around the moment when the slaves start drifting and when network has issues?
    4. Is the same issue happening when you use the wired connection?

    The latest version should work without issues. Maybe your network has a problem with multicast. Are there any errors in the logs? We don't really support 2.1.0 anymore and the answer usually is to upgrade to 2.3.

    One thing I must add is that the way you are trying to sync them is guaranteed not to be very effective for PTP, for the following reasons:

    1. USB
    2. Wireless
    3. USB wireless

    It is not uncommon for packets to be randomly delayed, which is absolutely detrimental for time sync. The RPi is not an ideal platform for time sync either: even if you use the wired connection, it's USB anyway, because the network controller sits behind a USB hub.

    Regards,
    Wojciech

     
  • Seltsam

    Seltsam - 2015-02-16

    Wojciech,

    thanks a lot for your quick reply!

    To be a bit more specific:

    1. While running 2.3, as soon as the connection between a client and at least one server is established, both RPis stop responding when I try to talk to them via SSH or SFTP, while everything is fine when I use a directly connected keyboard. So basically no "normal" network communication is possible anymore.
    2. With "slaves drifting away" I mean that the offset between master and slaves is growing or shrinking (and sometimes gets better after a while).
      To measure this I wrote my own multicast-based method, which regularly sends the current timestamp from the server to all clients; each client compares the received timestamp with its own current time.
      This way I have seen time differences from -200 to +200 ms so far. It usually starts with something close to 0 and then "drifts" away to the larger values. I am sending 5 timestamps per second but display only the median of 20 received timestamps. ptpd itself actually shows rather jumpy behaviour:
      2015-02-16 16:10:03:474467, slv, b827ebfffec33640/01, 0.000000000, -0.127077006, 0.000000000, 0.056168000, -512000
      2015-02-16 16:10:04:534863, slv, b827ebfffec33640/01, 0.000000000, -0.098252876, 0.000000000, 0.005330000, -512000
      2015-02-16 16:10:05:522495, slv, b827ebfffec33640/01, 0.000000000, -0.077982359, 0.000000000, 0.008135000, -512000
      2015-02-16 16:10:06:448718, slv, b827ebfffec33640/01, 0.000000000, -0.120282842, 0.000000000, 0.082439000, -512000
      2015-02-16 16:10:07:468084, slv, b827ebfffec33640/01, 0.000000000, -0.135863001, 0.000000000, 0.062545000, -512000
      2015-02-16 16:10:08:511831, slv, b827ebfffec33640/01, 0.000000000, -0.092851660, 0.000000000, 0.019666000, -512000
      2015-02-16 16:10:09:516072, slv, b827ebfffec33640/01, 0.000000000, -0.072697152, 0.000000000, 0.014563000, -512000
    3. My ping time is around 6-7 ms between server and wireless clients.
    4. Yes. The same is happening with RPis connected via Ethernet cable.
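The measurement method described in point 2 can be sketched roughly as follows. This is not Seltsam's actual code; the class name `OffsetMonitor` is hypothetical, and a rolling window is assumed (the post does not say whether the median is taken over a rolling window or over consecutive blocks of 20). Only the median step is shown, not the multicast transport.

```python
from collections import deque
from statistics import median


class OffsetMonitor:
    """Rolling median of (local receive time - master send time), in seconds."""

    def __init__(self, window=20):
        # Keep only the most recent `window` samples.
        self.samples = deque(maxlen=window)

    def add(self, master_ts, local_ts):
        """Record one received timestamp; return the median once the window is full."""
        self.samples.append(local_ts - master_ts)
        if len(self.samples) < self.samples.maxlen:
            return None  # not enough samples yet
        return median(self.samples)
```

Note that a one-way measurement like this includes the network transit time, so even perfectly synchronised clocks would show a small positive offset, roughly half the ping time on a symmetric path.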

    You are right, my setup is not the best solution for nanosecond synchronisation :) and I'm certainly not expecting to synch much better than 10-20 ms which would be good enough for my application. I'm intending to run the RPis as wireless speakers and as long as the deviation between the devices stays under 20 ms I should be fine.

    It would be great to know whether anyone has successfully run 2.3 on Raspbian/wheezy.

    Best
    Seltsam

     
  • Wojciech Owczarek

    Hi,

    I have been running, developing and building 2.3.0 and 2.3.1rcX on my RPis running Raspbian Wheezy for a while and never experienced the issues you mention here.

    I think this may be a more fundamental issue than with PTPd itself. More questions:
    - Does this problem not happen at all when you're not running ptpd?
    - Are you sure you have no loops in your network?
    - Can you test multicast with another tool like mtools: https://code.google.com/p/open-mtools/downloads/list or iperf and see if you get the same problem? Send and subscribe to group 224.0.1.129.
    - Are you getting the same issue when you connect two RPis directly to each other with an Ethernet cable, with the USB dongles disconnected?
    - Can you try using unicast mode instead?
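If neither mtools nor iperf is at hand, a multicast check along the same lines can be sketched with plain sockets. This is only a minimal sketch: the port number is a hypothetical choice made to stay clear of a running ptpd, the function names are made up for illustration, and the default interface is used (on a machine with both eth0 and wlan0 the interface address would have to be passed explicitly).

```python
import socket

PTP_GROUP = "224.0.1.129"  # multicast group named in the post
TEST_PORT = 5007           # hypothetical test port, not a PTP port


def membership_request(group, iface_ip="0.0.0.0"):
    """Build the 8-byte ip_mreq value: group address + local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def listen(group=PTP_GROUP, port=TEST_PORT, count=10):
    """Join the group and print the size and sender of `count` datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    try:
        for _ in range(count):
            data, sender = sock.recvfrom(1024)
            print(f"{len(data)} bytes from {sender[0]}")
    finally:
        sock.close()


def send(message=b"ping", group=PTP_GROUP, port=TEST_PORT, ttl=1):
    """Send one datagram to the group; TTL 1 keeps it on the local segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    try:
        sock.sendto(message, (group, port))
    finally:
        sock.close()
```

Running `listen()` on one Pi and calling `send()` repeatedly from another, while watching whether SSH stays responsive, would show whether multicast traffic alone triggers the problem.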

    Finally, check your CPU usage - does the box become unresponsive when you're running PTPd AND your sound streaming, or on its own? It's very easy to choke a RPi. I'm using one for an Asterisk call server for home and if you're running codecs that are not specifically ARM optimised, the performance is crap. And I mean very crap.

    Regarding the jumpy behaviour, you're not going to get much better than this over wireless. Standard WLAN exhibits very high packet delay variation. I mean, 1 millisecond of variation is very, very, very high for PTP. The filters have nothing firm to grab onto. What PTPd is doing is mercilessly dragging the clock up and down along with the jitter. 2.3.1, which should be released soon, should show a much improved response to noise like this, but we can't work wonders, I'm afraid.

    Regards,
    Wojciech

     

    Last edit: Wojciech Owczarek 2015-02-16
  • Seltsam

    Seltsam - 2015-02-17

    Wojciech,

    thanks for your good suggestions. You encouraged me to try compiling PTPd 2.3.1-rc3 again and test it on absolutely clean and fresh Raspbian installations. I don't know what I did wrong the last time - now I don't experience the weird network craziness anymore. I must have done something wrong in the compilation process or while editing the config file.

    The clients synch pretty well over wired Ethernet with very good precision (at least for my case: around 1 ms). Wifi, as you said, does not deliver the same accuracy, but it looks like it is good enough for my purpose; it seems to stay below 50 ms.

    Now we'll see how this develops over the day. I think with 2.1.0 the issue was that PTPd displayed the correct offset to the server but did not adjust the clock accordingly. I will let you know how this develops over the next day or two.

    Thanks a lot for your support so far and your good work on PTPd!

    Best
    Seltsam

     
  • Seltsam

    Seltsam - 2015-02-18

    OK, here's my update after running PTPd 2.3.1-rc3 for a night.

    I have started server and three clients around noon yesterday. The deviation between client and servers was around 10-50 ms in the beginning. At midnight they reached a value of 280-300 ms. This value increased steadily over the course of 12 hours. Since my Raspberries reboot automatically at midnight I don't know how much the deviation would have been after another 8 hours.

    Any idea how this might be possible?

    Best
    Seltsam

     
  • Niels Hornung

    Niels Hornung - 2015-09-11

    I tested 2.3.1 on my two Pis (one is a Model B+, the other a Pi2 B+) and it is working nicely with a 50 us offset so far, after 2 hours of testing.

     

    Last edit: Niels Hornung 2015-09-11
  • Wojciech Owczarek

    Niels,

    This looks like a good result for an RPi. You could probably get even better numbers with filtering, unless you are already doing this. Check the ptpd2.conf man page: a combination of a statistical filter and an outlier filter can be quite effective. You can also increase the Sync and Delay Request message rates to something like 16/sec. When using filtering, higher message rates help, as there are more samples left when some are thrown away.
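PTP expresses message intervals as a base-2 logarithm of the interval in seconds, so a rate like 16/sec has to be converted before it goes into the configuration. The helper below is a small sketch of that arithmetic; the function name is hypothetical, and the setting names mentioned in the comment are quoted from memory and should be checked against the ptpd2.conf man page.

```python
import math


def log_message_interval(messages_per_second):
    """Convert a message rate to the PTP log2 interval value.

    The result is log2(interval in seconds). It is believed to be the kind of
    value expected by settings such as ptpengine:log_sync_interval and
    ptpengine:log_delayreq_interval (names are an assumption, verify them).
    """
    exponent = math.log2(1.0 / messages_per_second)
    if exponent != int(exponent):
        raise ValueError("rate must be a power of two (e.g. 1, 2, 4, 8, 16)")
    return int(exponent)
```

For example, `log_message_interval(16)` gives -4, and one message every two seconds (`0.5` per second) gives 1.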

    Cheers,
    Wojciech

     
  • Niels Hornung

    Niels Hornung - 2015-09-14

    Which filters should be used? Delay and sync simultaneously? How should I determine which filter is needed, by testing the options?
    I think autotune is a good option for the outlier filter, but I don't know anything about the threshold.

     

    Last edit: Niels Hornung 2015-09-14
  • Wojciech Owczarek

    Hi Niels,

    It's best to use filters for both Sync and Delay. You need to experiment a little to see which settings bring the best results. There are many settings for the outlier filter but the defaults are usually OK.

    Seeing that you edited your reply I take it that you found the actual man page. The built-in long help (-H) is complete but sometimes brief. The man pages (man ptpd2.conf and man ptpd2) are the best sources of information.

    As to determining which options work best, you simply have to observe. The best combination is the one that gives you the most stable (and preferably lowest) offset from master. You can watch the statistics log, but you can also observe the status file (set global:status_file=/some/path, for example /var/run/ptpd2.status). You can do this with the watch command, for example watch -n 1 cat /var/run/ptpd2.status.

    The lower the mean and standard deviation of the offset from master, the better. The lower the standard deviation of the clock correction (observed drift), the more stable the clock.
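When comparing filter settings by hand, the two numbers mentioned above can also be computed from a list of logged offsets. The following is a minimal sketch with a hypothetical function name; the sample values are the second numeric column of the 2.1.0 log lines quoted earlier in this thread, assumed here to be the offset from master in seconds.

```python
from statistics import mean, stdev


def summarize(offsets_s):
    """Return (mean, sample standard deviation) of offsets given in seconds."""
    return mean(offsets_s), stdev(offsets_s)


# Values copied from the log excerpt earlier in the thread
# (assumed to be the offset-from-master column).
sample = [
    -0.127077006, -0.098252876, -0.077982359, -0.120282842,
    -0.135863001, -0.092851660, -0.072697152,
]

avg, sd = summarize(sample)
print(f"mean offset: {avg * 1000:.1f} ms, std dev: {sd * 1000:.1f} ms")
```

Running the same summary over logs captured with different filter settings gives a simple basis for choosing between them: prefer the run with the smaller absolute mean and the smaller standard deviation.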

     
