[Linuxptp-devel] [PATCH 2/3] clock: Dump unexpected packets received on the error queues of sockets
PTP IEEE 1588 stack for Linux
Brought to you by:
rcochran
From: Vladimir O. <ol...@gm...> - 2019-12-16 23:10:57
|
In the current design of the SO_SELECT_ERR_QUEUE socket option (which is enabled by sk_timestamping_init on the event fd), it is a bug to only check revents & POLLIN, but not also POLLERR. Normally the error queue events that the application expects (i.e. TX timestamps) are received, within a given timeout, in-band with the transmission of the timestampable message itself (for example in raw_send). For messages that the application does not / no longer expects, such as TX timestamps delivered late, duplicate TX timestamps, general exceptional messages enqueued by the kernel in the socket error queue etc, ptp4l will be taken by surprise in clock_poll() by these, and will think that there is data, since POLLIN is set (but in fact in the socket error queue etc, ptp4l will be taken by surprise in clock_poll() by these, and will think that there is data, since POLLIN is set (but in fact POLLERR is also set, and this has an entirely different meaning). A very, very simple reproducer is to take a DSA switch and run: tcpdump -i eth0 -j adapter_unsynced on its DSA master net device. The above command will enable timestamping on that net device, and if both the DSA switch and the master support PTP, this will make the kernel send duplicate TX timestamps for every sent event packet, which will completely kill ptp4l until a reboot, with no indication whatsoever of what's going on. Since the messages on the error queue are unexpected, we have no need for them. And they can be in theory anything, so simply hex dumping their content and moving along sounds like a good idea. Printing them to the user is optional (and helpful), but reading them is not. With this patch, even with extraneous data delivered by a buggy kernel (which the application now loudly complains about), the synchronization keeps chugging along. Otherwise the application starts reordering packets in recvmsg() due to misinterpreting which socket queue has data available. Signed-off-by: Vladimir Oltean <ol...@gm...> --- clock.c | 11 +++++++++++ msg.c | 12 ++++++++++++ msg.h | 7 +++++++ 3 files changed, 30 insertions(+) diff --git a/clock.c b/clock.c index 146576ac589c..768bbb49513d 100644 --- a/clock.c +++ b/clock.c @@ -1508,6 +1508,17 @@ int clock_poll(struct clock *c) LIST_FOREACH(p, &c->ports, list) { /* Let the ports handle their events. */ for (i = 0; i < N_POLLFD; i++) { + if (cur[i].revents & POLLERR) { + unsigned char pkt[1600]; + + cnt = recv(cur[i].fd, pkt, sizeof(pkt), + MSG_ERRQUEUE); + pr_err("Unexpected data on socket err queue:"); + msg_raw_dump(pkt, cnt); + + continue; + } + if (cur[i].revents & (POLLIN|POLLPRI)) { event = port_event(p, i); if (EV_STATE_DECISION_EVENT == event) { diff --git a/msg.c b/msg.c index d1619d4973f1..aae6e141f2be 100644 --- a/msg.c +++ b/msg.c @@ -601,3 +601,15 @@ int msg_sots_missing(struct ptp_message *m) } return msg_sots_valid(m) ? 0 : 1; } + +void msg_raw_dump(unsigned char *pkt, int cnt) +{ + int k; + + for (k = 0; k < cnt; k++) { + if (k % 16 == 0) + printf("\n%04x ", k); + printf("%02x ", pkt[k]); + } + printf("\n"); +} diff --git a/msg.h b/msg.h index c1294c23f6c3..e85fbb916aef 100644 --- a/msg.h +++ b/msg.h @@ -394,6 +394,13 @@ void msg_put(struct ptp_message *m); */ int msg_sots_missing(struct ptp_message *m); +/** + * Print a wireshark-compatible hex dump of a message buffer. + * @param pkt Message buffer to print + * @param cnt Length of message buffer + */ +void msg_raw_dump(unsigned char *pkt, int cnt); + /** * Test whether a message has a valid SO_TIMESTAMPING time stamp. * @param m Message to test. -- 2.17.1 |