This patch tries to improve stability on a Linux system with two ptp daemons running in hybrid mode. The server has one physical network card with two VLANs:
Servers:
ptpd2 -c ptpd2.conf -i eth0.11 --masterslave -d 0
ptpd2 -c ptpd2.conf -i eth0.22 --masteronly -d 1
Clients:
ptpd2 -c ptpd2.conf -i eth0.22 --slaveonly -d 1
Without the USE_BINDTODEVICE code in ./src/dep/net.c, the clients stop updating the "Offset from Master : 0.000xxxxxx s" value after a few hours of uptime (they keep writing the same offset value to the status log file).
An alternative solution is setting "ptpengine:ip_mode" to "multicast".
Hi,
This was changed in 2.3.1, because this was causing issues on some systems. Your patch reverts it back to the original line from 2.3.0.
Are you sure the issues you are seeing are being caused by this? What's the OS and kernel version you're on? Something like this happening after a couple of hours to me is more likely a sign of a kernel issue.
Arguably SO_BINDTODEVICE only controls sending packets, so it should only matter for sending multicast - which we don't send in hybrid mode. It is there only to make sure we send multicast out of the interface on which ptpd is running.
Let me run some longer tests in hybrid mode to verify this.
If we find a solution for this, we may release a minor release with this and some other minor fixes.
Thanks,
Wojciech
Last edit: Wojciech Owczarek 2015-07-06
OK, I think I replied before I looked into this in more detail. The problem with this code most likely exists only on the master, not on the slave.
Will run some more tests.
Regards,
Wojciech
One thing worth mentioning is that as of PTPd 2.3.1 you can also run in unicast mode with multiple slaves:
Master:
ptpengine:ip_mode=unicast
ptpengine:unicast_negotiation=y
ptpengine:disable_bmca=y
ptpengine:preset=masteronly
Slave:
ptpengine:ip_mode=unicast
ptpengine:unicast_negotiation=y
ptpengine:preset=slaveonly
ptpengine:unicast_destinations="ip1,ip2,..."
The slaves will then request unicast transmission from the GM.
This effectively enables the Telecom profile, which may be easier to deploy where multicast routing is not possible or difficult to run.
In your scenario you have PTPd running on two tagged subinterfaces over the same physical interface, so the clock ID will be the same because both share the same MAC. You should probably avoid that by setting different port numbers for both instances:
ptpengine:port_number=1 - first instance
ptpengine:port_number=2 - second instance, etc.
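To illustrate why the port number matters here, the following is a minimal Python sketch, not PTPd's actual code. It assumes the common IEEE 1588 mapping of a 6-octet MAC to an 8-octet clock identity by inserting FF:FE in the middle. The MAC address and the function names are hypothetical.

```python
def clock_identity_from_mac(mac: str) -> bytes:
    """Map a 6-octet MAC to an 8-octet clock identity (assumed FF:FE mapping)."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 6-octet MAC address")
    return octets[:3] + b"\xff\xfe" + octets[3:]


def port_identity(mac: str, port_number: int) -> str:
    """Format clockIdentity plus portNumber as 'hex/port'."""
    if not 1 <= port_number <= 0xFFFE:
        raise ValueError("port number must be in 1..65534")
    return f"{clock_identity_from_mac(mac).hex()}/{port_number}"


# Hypothetical MAC shared by eth0.11 and eth0.22 (same physical card).
mac = "00:11:22:33:44:55"
print(port_identity(mac, 1))  # 001122fffe334455/1
print(port_identity(mac, 2))  # 001122fffe334455/2
```

Both instances get the same clock identity, so only the port number tells them apart.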
Regards,
Wojciech
Hi,
thank you for your response! The server is a custom embedded Linux system with 2.6.27 on a PPC (big endian) and the clients run embedded Linux 2.6.20 on a MIPS (little endian). The server does not know the client IP addresses (network configuration is done with avahi).
The ptpd hybrid mode looks very useful for this scenario: the server broadcasts to all clients and the clients reply with unicast. The idea is to use eth0.11 to connect all servers and let the one server define the system time (masterslave, domain 0).
A server can have one or more clients on eth0.22 (server is always master, clients always slave on domain 1).
I ran tests with hybrid mode on several systems for 2-3 weeks with 2.3.1-rc3. After system startup, ptpd could successfully synchronize the slave with the master. Sometimes the "Offset from master" value in the status log on the client stopped being updated after a few hours (a restart of ptpd on all clients and servers was necessary to recover); sometimes it worked flawlessly for days.
Sending "kill -USR2" to the ptpd on the server showed high "domainMismatchErrors" values in the logs. After switching to multicast mode (or applying the patch), the domain mismatch errors disappeared.
Regards,
Christian
Christian,
I suggest running final 2.3.1 and repeating the tests. Between RC3 and final, a lot of fixes have been put in place.
Domain mismatch errors are seen when a device sees packets for a different domain than it is configured for. This would make sense when you're running two hybrid masters on the same host without bindtodevice: traffic from both masters probably goes out via the same interface.
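As a rough illustration of what such a counter measures, here is a small Python sketch. It is not PTPd's implementation. It relies only on the PTPv2 common header layout (34 octets, domainNumber at octet 4). The helper that builds fake headers and all names are hypothetical.

```python
PTP_HEADER_LEN = 34   # length of the PTPv2 common header in octets
DOMAIN_OFFSET = 4     # domainNumber is the fifth octet of the header


def domain_of(packet: bytes) -> int:
    """Return the domainNumber field of a PTPv2 message."""
    if len(packet) < PTP_HEADER_LEN:
        raise ValueError("packet shorter than a PTP header")
    return packet[DOMAIN_OFFSET]


def count_domain_mismatches(packets, configured_domain: int) -> int:
    """Count packets whose domain differs from the configured one."""
    return sum(1 for p in packets if domain_of(p) != configured_domain)


def fake_header(domain: int) -> bytes:
    """Hypothetical helper: an otherwise empty header with a given domain."""
    header = bytearray(PTP_HEADER_LEN)
    header[1] = 2                  # versionPTP = 2
    header[DOMAIN_OFFSET] = domain
    return bytes(header)


# A domain 1 instance that also sees traffic from the domain 0 instance.
packets = [fake_header(1), fake_header(0), fake_header(1), fake_header(0)]
print(count_domain_mismatches(packets, configured_domain=1))  # 2
```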
This is going to be very difficult to resolve, because SO_BINDTODEVICE affects the way packets are looped back to PTPd - which is necessary for Delay Request packets in hybrid mode.
What I'm saying is that your change will fix things on the master side, but break things on the slave side. When using the masterslave preset, we cannot switch the IP options on the fly. In general, the network transport should be untouched once started.
Does your host support libpcap? That would be an easy solution here. Otherwise full multicast - or you can run your master with a modified PTPd and slave unmodified.
Finally, do the IP addresses of your masters change by any chance? If they do, PTPd currently has no way of knowing about this when running as master. A slave will reset its network transport and pick up interface changes, but a master has no way to know this: it does not need to receive any data to be master, so it has nothing to check.
Multicast may be your safest bet here - not only does it fix your issue, but "hybrid" is also not yet standard - this is going to be the operation of the Enterprise Profile, which has not been ratified yet. If you're planning to interoperate with other vendors' PTP, multicast may also be a good choice.
What could potentially be done is in hybrid mode, to set bindtodevice for masteronly, and not set it for slaveonly. For masterslave this is tough.
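The idea in the paragraph above could be sketched roughly like this in Python. This is only an illustration of the decision, not a patch for net.c. The function name and the use of None for "no good answer" are assumptions made for the example.

```python
from typing import Optional


def hybrid_bindtodevice(preset: str) -> Optional[bool]:
    """Decide whether a hybrid-mode instance should set SO_BINDTODEVICE.

    Returns True or False when the preset gives a clear answer, and None
    for masterslave, where the two requirements conflict.
    """
    if preset == "masteronly":
        return True    # sends multicast Sync, must pin the interface
    if preset == "slaveonly":
        return False   # sends unicast Delay Request, needs the loopback copy
    if preset == "masterslave":
        return None    # may be either; the transport cannot change on the fly
    raise ValueError(f"unknown preset: {preset}")


for preset in ("masteronly", "slaveonly", "masterslave"):
    print(preset, hybrid_bindtodevice(preset))
```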
In general, while it is convenient, I wouldn't recommend the full master+slave operation anyway. The industry is turning away from that, and unless you absolutely want this, it may lead to unwanted topology changes. It's better to run "boundary" - master on one port and slave on another - than to flip between master and slave.
Regards,
Wojciech
Last edit: Wojciech Owczarek 2015-07-06
Christian,
Just to be precise, here's the background and what happens on some systems in testing (ignore this if this is obvious to you):
PTPd, when running with no support for hardware timestamping or transmit timestamping, must receive its own packets to estimate transmission time. This is needed for Sync in master state, and for Delay Request in slave state. The key in hybrid mode is that Sync is multicast but DelayReq is unicast.
For multicast packets, the socket option IP_MULTICAST_LOOP automatically feeds the transmitted packets back into the socket.
For unicast packets, we must manually send the packets to ourselves - those packets come in via the loopback interface. This is where the problem starts.
To make sure multicast packets always go out via the specified interface, we use SO_BINDTODEVICE. This allows you to run multiple multicast streams for the same group over different interfaces.
When you set SO_BINDTODEVICE, the manual packet loopback does not work anymore on some kernels: the packets sent to self on the loopback interface are no longer received.
So as you see:
1. Any mode that sends multicast event messages needs SO_BINDTODEVICE to support multiple interfaces.
2. Any mode that sends unicast event messages cannot use SO_BINDTODEVICE.
This is a problem because 1. describes a hybrid master and 2. describes a hybrid slave. A decision had to be made in the code, and that decision was made in favour of the slave.
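For reference, the two socket options discussed above can be set as in the following Python sketch. It is a Linux-only illustration and not PTPd's code. The function name is hypothetical, and the fallback value 25 for SO_BINDTODEVICE is the Linux constant, used in case the Python build does not export it. Binding to a device may require elevated privileges on some kernels, so the example run leaves it off.

```python
import socket

# Linux value of SO_BINDTODEVICE, used if the socket module lacks the name.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def open_event_socket(interface: str, bind_to_device: bool) -> socket.socket:
    """Open a UDP socket with multicast loopback, optionally pinned to a device."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Feed our own multicast transmissions back to the local host.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
    if bind_to_device:
        # Pin the socket to one interface (the hybrid master case).
        sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE,
                        interface.encode() + b"\0")
    return sock


# Hybrid slave case: no device binding, so packets sent to self can arrive.
sock = open_event_socket("eth0.22", bind_to_device=False)
print(sock.getsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP))
sock.close()
```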