[Linuxptp-users] One-step sync and P2P mode issue
PTP IEEE 1588 stack for Linux
Brought to you by:
rcochran
From: Sriharsha B. <Sri...@Em...> - 2014-08-12 06:47:17
|
Hi folks, I'm running into this problem in P2P mode with our PTP capable NIC. I'd appreciate if you could take a look at the details below and confirm if this is an issue in ptp4l. The linuxptp version is: 1.4-00060-g93b7807. After starting ptp4l, it fails to send sync after a few iterations right at the beginning. The error is "No message of desired type" (ENOMSG - 42). After a timeout of 16 seconds (fault clear timeout) it restarts and encounters the same error and this repeats. Here's the ptp4l command/args used: ptp4l -f ptp2.cfg -HPm2 -l7 -i ptp2 Note that I've configured one-step sync that seems to be triggering this error condition. ptp4l[352396.491]: selected /dev/ptp4 as PTP clock ptp4l[352396.493]: PI servo: sync interval 1.000 kp 0.700 ki 0.300000 ptp4l[352396.514]: port 1: INITIALIZING to LISTENING on INITIALIZE ptp4l[352397.514]: port 1: delay timeout ptp4l[352397.514]: port 1: sending pdelay req seq(0) ptp4l[352398.514]: port 1: delay timeout ptp4l[352398.514]: port 1: sending pdelay req seq(1) ptp4l[352398.544]: port 1: setting asCapable ptp4l[352399.514]: port 1: delay timeout ptp4l[352399.514]: port 1: sending pdelay req seq(2) ptp4l[352400.514]: port 1: delay timeout ptp4l[352400.514]: port 1: sending pdelay req seq(3) ptp4l[352401.514]: port 1: delay timeout ptp4l[352401.514]: port 1: sending pdelay req seq(4) ptp4l[352402.514]: port 1: delay timeout ptp4l[352402.515]: port 1: sending pdelay req seq(5) ptp4l[352403.160]: port 1: announce timeout ptp4l[352403.160]: port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES ptp4l[352403.161]: selected best master clock 0090fa.fffe.6c024e ptp4l[352403.161]: assuming the grand master role ptp4l[352403.162]: port 1: master tx announce timeout ptp4l[352403.515]: port 1: delay timeout ptp4l[352403.515]: port 1: sending pdelay req seq(6) ptp4l[352404.161]: port 1: master sync timeout ptp4l[352404.161]: p->timestamping (3) ptp4l[352404.515]: port 1: delay timeout ptp4l[352404.515]: port 1: sending pdelay req seq(7) ptp4l[352405.161]: port 1: master sync timeout ptp4l[352405.161]: p->timestamping (3) ptp4l[352405.162]: port 1: master tx announce timeout ptp4l[352405.515]: port 1: delay timeout ptp4l[352405.515]: port 1: sending pdelay req seq(8) ptp4l[352406.161]: port 1: master sync timeout ptp4l[352406.161]: send failed: 42 No message of desired type ptp4l[352406.161]: port 1: send sync failed ptp4l[352406.161]: port 1: MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED) ptp4l[352406.183]: waiting 2^{4} seconds to clear fault on port 1 ptp4l[352422.183]: clearing fault on port 1 Based on further debugging, the sequence of events leading to the error is shown below: - Driver/HW receives pdelay request and sync msgs alternately like this: - pdelay-req - sync - pdelay-req - sync <----- hits ENOMSG error - Messages are sent and tx timestamps generated for each (all are event msgs). - Driver makes a copy of skb with timestamp info into socket error queue. - The first sync message in the error queue is not consumed by ptp4l. - This is because port_tx_sync() calls transport_send() with event set to TRANS_ONESTEP. The transport send routines check the event type to know whether to call sk_receive() after sending the message to get a copy. They do sk_receive() only if event == TRANS_EVENT. - In this case since event is TRANS_ONESTEP, there's no call to sk_receive(). - This leaves the sync msg in socket error queue. - Next pdelay-req is sent; driver clones this skb into error queue. - ptp4l invokes sk_receive() to get a copy of this pdelay-req. - The kernel function sock_recv_errqueue() or ip_recv_error() is invoked. - There's a check in those functions to see if there's more skbs in err queue. - The kernel function marks the socket in error(sets sk->sk_err) before return. - The next sync message is sent down by ptp4l. - Stack calls sock_alloc_send_pskb() to get a skb. - There, it checks if the socket is in error and returns failure. - The second sync send fails. Note that the driver decides whether to make a copy into the error queue based on SKBTX_HW_TSTAMP. This gets set for packets that are sent down the stack on the event socket; i.e, the socket on which setsockopt(SO_TIMESTAMPING) has been done. So the driver seems to be correctly handling the packets. It looks like a change in ptp4l might be needed to address this. The transport send functions (raw_send(), udp_send()...) should be modified to consume the error queue skb also for the case when event is TRANS_ONESTEP. They might just have to throw away the skb if the timestamp is not of interest since it is one-step. Thanks, -Harsha |