Thread: [Linuxptp-devel] [PATCH 0/3] More strict checking against kernel bugs
PTP IEEE 1588 stack for Linux
Brought to you by:
rcochran
From: Vladimir O. <ol...@gm...> - 2019-12-16 23:10:55
|
The reordering issue reported by me initially on the linuxptp-devel list [0] with the sja1105 DSA driver turned out not to be a reordering issue at all, in fact. Due to a kernel bug described in this patch [1], the DSA driver was in race with the master Ethernet driver and would occasionally (very rarely) deliver 2 TX timestamps to ptp4l on the event socket. The first TX timestamp is consumed in-band in the raw_send function. The second is caught by the main poll() syscall in the main ptp4l event loop - clock_poll(). When poll() sees the second TX-timestamped skb, it returns with revents == (POLLIN | POLLERR). But the main loop only checks for POLLIN, and says "yay, there's data!". So it proceeds to call recvmsg() with flags=0 (instead of MSG_ERRQUEUE), so it doesn't see any data in sk->sk_receive_queue. So, surprise, false alarm, the data that woke it up was in sk->sk_error_queue. The ptp4l process goes to sleep waiting for data. It sleeps for a whole sync interval. When it wakes up, it wakes up with the next sync, even though the previous sync's follow-up may have arrived in the meantime. Apparent reordering. Ptp4l does not print anything, it just appears to freeze. So this patch set aims to improves the error reporting in ptp4l, such that tracing back to the root cause is easier next time, and the problem does not blow up into other, completely unrelated things. [0]: https://sourceforge.net/p/linuxptp/mailman/message/36773629/ [1]: https://patchwork.ozlabs.org/patch/1210871/ Vladimir Oltean (3): ptp4l: Call recvmsg() with the MSG_DONTWAIT flag clock: Dump unexpected packets received on the error queues of sockets port: Signal sync/follow-up mismatch events loudly clock.c | 11 +++++++++++ msg.c | 12 ++++++++++++ msg.h | 7 +++++++ port.c | 21 +++++++++++++++++++++ raw.c | 2 +- udp.c | 2 +- udp6.c | 2 +- 7 files changed, 54 insertions(+), 3 deletions(-) -- 2.17.1 |
From: Vladimir O. <ol...@gm...> - 2019-12-16 23:10:56
|
The application's main event loop (clock_poll) is woken up by poll() and dispatches the socket receive queue events to the corresponding ports as needed. So it is a bug if poll() wakes up the process for data availability on a socket's receive queue, and then recvmsg(), called immediately afterwards, goes to sleep trying to retrieve it. This patch will generate an error that will be propagated to the user if this condition happens. Can it happen? As of this patch, ptp4l uses the SO_SELECT_ERR_QUEUE socket option, which means that poll() will wake the process up, with revents == (POLLIN | POLLERR), if data is available in the error queue. But clock_poll() does not check POLLERR, just POLLIN, and draws the wrong conclusion that there is data available in the receive queue (when it is in fact available in the error queue). When the above condition happens, recvmsg() will sleep typically for a whole sync interval waiting for data on the event socket, and will be woken up when the new real frame arrives. It will not dequeue follow-up messages during this time (which are sent to the general message socket) and when it does, it will already be late for them (their seqid will be out of order). So it will drop them and everything that comes after. The synchronization process will fail. The above condition shouldn't typically happen, but exceptional kernel events will trigger it. It helps to be strict in ptp4l in order for those events to not blow up in even stranger symptoms unrelated to the root cause of the problem. Signed-off-by: Vladimir Oltean <ol...@gm...> --- raw.c | 2 +- udp.c | 2 +- udp6.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/raw.c b/raw.c index f1c92b9f8d90..c0a8cd80855f 100644 --- a/raw.c +++ b/raw.c @@ -278,7 +278,7 @@ static int raw_recv(struct transport *t, int fd, void *buf, int buflen, buflen += hlen; hdr = (struct eth_hdr *) ptr; - cnt = sk_receive(fd, ptr, buflen, addr, hwts, 0); + cnt = sk_receive(fd, ptr, buflen, addr, hwts, MSG_DONTWAIT); if (cnt >= 0) cnt -= hlen; diff --git a/udp.c b/udp.c index 48af482b4526..eb1617872f37 100644 --- a/udp.c +++ b/udp.c @@ -210,7 +210,7 @@ no_event: static int udp_recv(struct transport *t, int fd, void *buf, int buflen, struct address *addr, struct hw_timestamp *hwts) { - return sk_receive(fd, buf, buflen, addr, hwts, 0); + return sk_receive(fd, buf, buflen, addr, hwts, MSG_DONTWAIT); } static int udp_send(struct transport *t, struct fdarray *fda, diff --git a/udp6.c b/udp6.c index 74ebc7f0cf09..06c6fad2160f 100644 --- a/udp6.c +++ b/udp6.c @@ -227,7 +227,7 @@ no_event: static int udp6_recv(struct transport *t, int fd, void *buf, int buflen, struct address *addr, struct hw_timestamp *hwts) { - return sk_receive(fd, buf, buflen, addr, hwts, 0); + return sk_receive(fd, buf, buflen, addr, hwts, MSG_DONTWAIT); } static int udp6_send(struct transport *t, struct fdarray *fda, -- 2.17.1 |
From: Keller, J. E <jac...@in...> - 2019-12-17 20:00:07
|
> -----Original Message----- > From: Vladimir Oltean <ol...@gm...> > Sent: Monday, December 16, 2019 3:10 PM > To: ric...@gm... > Cc: lin...@li... > Subject: [Linuxptp-devel] [PATCH 1/3] ptp4l: Call recvmsg() with the > MSG_DONTWAIT flag > > The application's main event loop (clock_poll) is woken up by poll() and > dispatches the socket receive queue events to the corresponding ports as > needed. > > So it is a bug if poll() wakes up the process for data availability on a > socket's receive queue, and then recvmsg(), called immediately > afterwards, goes to sleep trying to retrieve it. This patch will > generate an error that will be propagated to the user if this condition > happens. > > Can it happen? > > As of this patch, ptp4l uses the SO_SELECT_ERR_QUEUE socket option, > which means that poll() will wake the process up, with revents == > (POLLIN | POLLERR), if data is available in the error queue. But > clock_poll() does not check POLLERR, just POLLIN, and draws the wrong > conclusion that there is data available in the receive queue (when it is > in fact available in the error queue). > > When the above condition happens, recvmsg() will sleep typically for a > whole sync interval waiting for data on the event socket, and will be > woken up when the new real frame arrives. It will not dequeue follow-up > messages during this time (which are sent to the general message socket) > and when it does, it will already be late for them (their seqid will be > out of order). So it will drop them and everything that comes after. The > synchronization process will fail. > > The above condition shouldn't typically happen, but exceptional kernel > events will trigger it. It helps to be strict in ptp4l in order for > those events to not blow up in even stranger symptoms unrelated to the > root cause of the problem. > Yep, makes sense. > Signed-off-by: Vladimir Oltean <ol...@gm...> > --- > raw.c | 2 +- > udp.c | 2 +- > udp6.c | 2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/raw.c b/raw.c > index f1c92b9f8d90..c0a8cd80855f 100644 > --- a/raw.c > +++ b/raw.c > @@ -278,7 +278,7 @@ static int raw_recv(struct transport *t, int fd, void *buf, int > buflen, > buflen += hlen; > hdr = (struct eth_hdr *) ptr; > > - cnt = sk_receive(fd, ptr, buflen, addr, hwts, 0); > + cnt = sk_receive(fd, ptr, buflen, addr, hwts, MSG_DONTWAIT); > So basically we just indicate that we do not want to sleep. Good. This makes sense to me. Thanks, Jake > if (cnt >= 0) > cnt -= hlen; > diff --git a/udp.c b/udp.c > index 48af482b4526..eb1617872f37 100644 > --- a/udp.c > +++ b/udp.c > @@ -210,7 +210,7 @@ no_event: > static int udp_recv(struct transport *t, int fd, void *buf, int buflen, > struct address *addr, struct hw_timestamp *hwts) > { > - return sk_receive(fd, buf, buflen, addr, hwts, 0); > + return sk_receive(fd, buf, buflen, addr, hwts, MSG_DONTWAIT); > } > > static int udp_send(struct transport *t, struct fdarray *fda, > diff --git a/udp6.c b/udp6.c > index 74ebc7f0cf09..06c6fad2160f 100644 > --- a/udp6.c > +++ b/udp6.c > @@ -227,7 +227,7 @@ no_event: > static int udp6_recv(struct transport *t, int fd, void *buf, int buflen, > struct address *addr, struct hw_timestamp *hwts) > { > - return sk_receive(fd, buf, buflen, addr, hwts, 0); > + return sk_receive(fd, buf, buflen, addr, hwts, MSG_DONTWAIT); > } > > static int udp6_send(struct transport *t, struct fdarray *fda, > -- > 2.17.1 > > > > _______________________________________________ > Linuxptp-devel mailing list > Lin...@li... > https://lists.sourceforge.net/lists/listinfo/linuxptp-devel |
From: Vladimir O. <ol...@gm...> - 2019-12-16 23:10:57
|
Ptp4l is too silent when receiving, for whatever reason, out of order messages. If the reordering is persistent (which is either a broken network, or a broken kernel), the behavior looks like a complete synchronization stall, since the application is designed to never attempt to recover from such a condition. At least save some people some debugging hours and be loud when the application reaches this code path. Signed-off-by: Vladimir Oltean <ol...@gm...> --- port.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/port.c b/port.c index 58fbe669af14..58559531f7e7 100644 --- a/port.c +++ b/port.c @@ -1191,6 +1191,23 @@ static void port_synchronize(struct port *p, } } +static void port_syfufsm_print_mismatch(struct port *p, enum syfu_event event, + struct ptp_message *m) +{ + int expected_msgtype; + + if (event == SYNC_MISMATCH) + expected_msgtype = FOLLOW_UP; + else + expected_msgtype = SYNC; + + pr_err("port %hu: have %s %d, expecting %s but got %s %d, dropping", + portnum(p), msg_type_string(msg_type(p->last_syncfup)), + p->last_syncfup->header.sequenceId, + msg_type_string(expected_msgtype), + msg_type_string(msg_type(m)), m->header.sequenceId); +} + /* * Handle out of order packets. The network stack might * provide the follow up _before_ the sync message. After all, @@ -1225,6 +1242,7 @@ static void port_syfufsm(struct port *p, enum syfu_event event, case SF_HAVE_SYNC: switch (event) { case SYNC_MISMATCH: + port_syfufsm_print_mismatch(p, event, m); msg_put(p->last_syncfup); msg_get(m); p->last_syncfup = m; @@ -1232,6 +1250,7 @@ static void port_syfufsm(struct port *p, enum syfu_event event, case SYNC_MATCH: break; case FUP_MISMATCH: + port_syfufsm_print_mismatch(p, event, m); msg_put(p->last_syncfup); msg_get(m); p->last_syncfup = m; @@ -1252,6 +1271,7 @@ static void port_syfufsm(struct port *p, enum syfu_event event, case SF_HAVE_FUP: switch (event) { case SYNC_MISMATCH: + port_syfufsm_print_mismatch(p, event, m); msg_put(p->last_syncfup); msg_get(m); p->last_syncfup = m; @@ -1267,6 +1287,7 @@ static void port_syfufsm(struct port *p, enum syfu_event event, p->syfu = SF_EMPTY; break; case FUP_MISMATCH: + port_syfufsm_print_mismatch(p, event, m); msg_put(p->last_syncfup); msg_get(m); p->last_syncfup = m; -- 2.17.1 |
From: Keller, J. E <jac...@in...> - 2019-12-17 20:05:45
|
> -----Original Message----- > From: Vladimir Oltean <ol...@gm...> > Sent: Monday, December 16, 2019 3:11 PM > To: ric...@gm... > Cc: lin...@li... > Subject: [Linuxptp-devel] [PATCH 3/3] port: Signal sync/follow-up mismatch > events loudly > > Ptp4l is too silent when receiving, for whatever reason, out of order > messages. If the reordering is persistent (which is either a broken > network, or a broken kernel), the behavior looks like a complete > synchronization stall, since the application is designed to never > attempt to recover from such a condition. > > At least save some people some debugging hours and be loud when the > application reaches this code path. > Excellent! This one makes sense to me. Thanks, Jake |
From: Vladimir O. <ol...@gm...> - 2019-12-16 23:10:57
|
In the current design of the SO_SELECT_ERR_QUEUE socket option (which is enabled by sk_timestamping_init on the event fd), it is a bug to only check revents & POLLIN, but not also POLLERR. Normally the error queue events that the application expects (i.e. TX timestamps) are received, within a given timeout, in-band with the transmission of the timestampable message itself (for example in raw_send). For messages that the application does not / no longer expects, such as TX timestamps delivered late, duplicate TX timestamps, general exceptional messages enqueued by the kernel in the socket error queue etc, ptp4l will be taken by surprise in clock_poll() by these, and will think that there is data, since POLLIN is set (but in fact in the socket error queue etc, ptp4l will be taken by surprise in clock_poll() by these, and will think that there is data, since POLLIN is set (but in fact POLLERR is also set, and this has an entirely different meaning). A very, very simple reproducer is to take a DSA switch and run: tcpdump -i eth0 -j adapter_unsynced on its DSA master net device. The above command will enable timestamping on that net device, and if both the DSA switch and the master support PTP, this will make the kernel send duplicate TX timestamps for every sent event packet, which will completely kill ptp4l until a reboot, with no indication whatsoever of what's going on. Since the messages on the error queue are unexpected, we have no need for them. And they can be in theory anything, so simply hex dumping their content and moving along sounds like a good idea. Printing them to the user is optional (and helpful), but reading them is not. With this patch, even with extraneous data delivered by a buggy kernel (which the application now loudly complains about), the synchronization keeps chugging along. Otherwise the application starts reordering packets in recvmsg() due to misinterpreting which socket queue has data available. Signed-off-by: Vladimir Oltean <ol...@gm...> --- clock.c | 11 +++++++++++ msg.c | 12 ++++++++++++ msg.h | 7 +++++++ 3 files changed, 30 insertions(+) diff --git a/clock.c b/clock.c index 146576ac589c..768bbb49513d 100644 --- a/clock.c +++ b/clock.c @@ -1508,6 +1508,17 @@ int clock_poll(struct clock *c) LIST_FOREACH(p, &c->ports, list) { /* Let the ports handle their events. */ for (i = 0; i < N_POLLFD; i++) { + if (cur[i].revents & POLLERR) { + unsigned char pkt[1600]; + + cnt = recv(cur[i].fd, pkt, sizeof(pkt), + MSG_ERRQUEUE); + pr_err("Unexpected data on socket err queue:"); + msg_raw_dump(pkt, cnt); + + continue; + } + if (cur[i].revents & (POLLIN|POLLPRI)) { event = port_event(p, i); if (EV_STATE_DECISION_EVENT == event) { diff --git a/msg.c b/msg.c index d1619d4973f1..aae6e141f2be 100644 --- a/msg.c +++ b/msg.c @@ -601,3 +601,15 @@ int msg_sots_missing(struct ptp_message *m) } return msg_sots_valid(m) ? 0 : 1; } + +void msg_raw_dump(unsigned char *pkt, int cnt) +{ + int k; + + for (k = 0; k < cnt; k++) { + if (k % 16 == 0) + printf("\n%04x ", k); + printf("%02x ", pkt[k]); + } + printf("\n"); +} diff --git a/msg.h b/msg.h index c1294c23f6c3..e85fbb916aef 100644 --- a/msg.h +++ b/msg.h @@ -394,6 +394,13 @@ void msg_put(struct ptp_message *m); */ int msg_sots_missing(struct ptp_message *m); +/** + * Print a wireshark-compatible hex dump of a message buffer. + * @param pkt Message buffer to print + * @param cnt Length of message buffer + */ +void msg_raw_dump(unsigned char *pkt, int cnt); + /** * Test whether a message has a valid SO_TIMESTAMPING time stamp. * @param m Message to test. -- 2.17.1 |
From: Keller, J. E <jac...@in...> - 2019-12-17 19:58:27
|
> -----Original Message----- > From: Vladimir Oltean <ol...@gm...> > Sent: Monday, December 16, 2019 3:10 PM > To: ric...@gm... > Cc: lin...@li... > Subject: [Linuxptp-devel] [PATCH 0/3] More strict checking against kernel bugs > > The reordering issue reported by me initially on the linuxptp-devel > list [0] with the sja1105 DSA driver turned out not to be a reordering > issue at all, in fact. > > Due to a kernel bug described in this patch [1], the DSA driver was in > race with the master Ethernet driver and would occasionally (very > rarely) deliver 2 TX timestamps to ptp4l on the event socket. > > The first TX timestamp is consumed in-band in the raw_send function. > The second is caught by the main poll() syscall in the main ptp4l event > loop - clock_poll(). > > When poll() sees the second TX-timestamped skb, it returns with revents > == (POLLIN | POLLERR). But the main loop only checks for POLLIN, and > says "yay, there's data!". So it proceeds to call recvmsg() with flags=0 > (instead of MSG_ERRQUEUE), so it doesn't see any data in > sk->sk_receive_queue. So, surprise, false alarm, the data that woke it > up was in sk->sk_error_queue. The ptp4l process goes to sleep waiting > for data. > > It sleeps for a whole sync interval. > > When it wakes up, it wakes up with the next sync, even though the > previous sync's follow-up may have arrived in the meantime. > > Apparent reordering. > > Ptp4l does not print anything, it just appears to freeze. > > So this patch set aims to improves the error reporting in ptp4l, such > that tracing back to the root cause is easier next time, and the problem > does not blow up into other, completely unrelated things. > Nice analysis. Thanks, Jake > [0]: https://sourceforge.net/p/linuxptp/mailman/message/36773629/ > [1]: https://patchwork.ozlabs.org/patch/1210871/ > > Vladimir Oltean (3): > ptp4l: Call recvmsg() with the MSG_DONTWAIT flag > clock: Dump unexpected packets received on the error queues of sockets > port: Signal sync/follow-up mismatch events loudly > > clock.c | 11 +++++++++++ > msg.c | 12 ++++++++++++ > msg.h | 7 +++++++ > port.c | 21 +++++++++++++++++++++ > raw.c | 2 +- > udp.c | 2 +- > udp6.c | 2 +- > 7 files changed, 54 insertions(+), 3 deletions(-) > > -- > 2.17.1 > > > > _______________________________________________ > Linuxptp-devel mailing list > Lin...@li... > https://lists.sourceforge.net/lists/listinfo/linuxptp-devel |
From: Keller, J. E <jac...@in...> - 2019-12-17 20:04:45
|
> -----Original Message----- > From: Vladimir Oltean <ol...@gm...> > Sent: Monday, December 16, 2019 3:11 PM > To: ric...@gm... > Cc: lin...@li... > Subject: [Linuxptp-devel] [PATCH 2/3] clock: Dump unexpected packets received > on the error queues of sockets > > In the current design of the SO_SELECT_ERR_QUEUE socket option (which is > enabled by sk_timestamping_init on the event fd), it is a bug to only > check revents & POLLIN, but not also POLLERR. > > Normally the error queue events that the application expects (i.e. TX > timestamps) are received, within a given timeout, in-band with the > transmission of the timestampable message itself (for example in > raw_send). > > For messages that the application does not / no longer expects, such as > TX timestamps delivered late, duplicate TX timestamps, general > exceptional messages enqueued by the kernel in the socket error queue > etc, ptp4l will be taken by surprise in clock_poll() by these, and will > think that there is data, since POLLIN is set (but in fact in the socket > error queue etc, ptp4l will be taken by surprise in clock_poll() by > these, and will think that there is data, since POLLIN is set (but in > fact POLLERR is also set, and this has an entirely different meaning). This double parenthetical is a bit hard to parse. > > A very, very simple reproducer is to take a DSA switch and run: > > tcpdump -i eth0 -j adapter_unsynced > > on its DSA master net device. The above command will enable timestamping > on that net device, and if both the DSA switch and the master support > PTP, this will make the kernel send duplicate TX timestamps for every > sent event packet, which will completely kill ptp4l until a reboot, with > no indication whatsoever of what's going on. > > Since the messages on the error queue are unexpected, we have no need > for them. And they can be in theory anything, so simply hex dumping > their content and moving along sounds like a good idea. > > Printing them to the user is optional (and helpful), but reading them is > not. With this patch, even with extraneous data delivered by a buggy > kernel (which the application now loudly complains about), the > synchronization keeps chugging along. Otherwise the application starts > reordering packets in recvmsg() due to misinterpreting which socket > queue has data available. > > Signed-off-by: Vladimir Oltean <ol...@gm...> > --- > clock.c | 11 +++++++++++ > msg.c | 12 ++++++++++++ > msg.h | 7 +++++++ > 3 files changed, 30 insertions(+) > > diff --git a/clock.c b/clock.c > index 146576ac589c..768bbb49513d 100644 > --- a/clock.c > +++ b/clock.c > @@ -1508,6 +1508,17 @@ int clock_poll(struct clock *c) > LIST_FOREACH(p, &c->ports, list) { > /* Let the ports handle their events. */ > for (i = 0; i < N_POLLFD; i++) { > + if (cur[i].revents & POLLERR) { > + unsigned char pkt[1600]; > + > + cnt = recv(cur[i].fd, pkt, sizeof(pkt), > + MSG_ERRQUEUE); > + pr_err("Unexpected data on socket err queue:"); > + msg_raw_dump(pkt, cnt); > + > + continue; > + } > + Does it makes sense to always print these at pr_err level? I suppose we almost never expect messages besides Tx timestamps, so warning loudly is better than only warning if you enable the higher debug printing.. However.. with this change it almost feels like these are now ignored/handled properly and thus not an issue so printing them at like pr_debug might make more sense? Hmm. I think I'm fine with this either way. Thanks, Jake > if (cur[i].revents & (POLLIN|POLLPRI)) { > event = port_event(p, i); > if (EV_STATE_DECISION_EVENT == event) { > diff --git a/msg.c b/msg.c > index d1619d4973f1..aae6e141f2be 100644 > --- a/msg.c > +++ b/msg.c > @@ -601,3 +601,15 @@ int msg_sots_missing(struct ptp_message *m) > } > return msg_sots_valid(m) ? 0 : 1; > } > + > +void msg_raw_dump(unsigned char *pkt, int cnt) > +{ > + int k; > + > + for (k = 0; k < cnt; k++) { > + if (k % 16 == 0) > + printf("\n%04x ", k); > + printf("%02x ", pkt[k]); > + } > + printf("\n"); > +} > diff --git a/msg.h b/msg.h > index c1294c23f6c3..e85fbb916aef 100644 > --- a/msg.h > +++ b/msg.h > @@ -394,6 +394,13 @@ void msg_put(struct ptp_message *m); > */ > int msg_sots_missing(struct ptp_message *m); > > +/** > + * Print a wireshark-compatible hex dump of a message buffer. > + * @param pkt Message buffer to print > + * @param cnt Length of message buffer > + */ > +void msg_raw_dump(unsigned char *pkt, int cnt); > + > /** > * Test whether a message has a valid SO_TIMESTAMPING time stamp. > * @param m Message to test. > -- > 2.17.1 > > > > _______________________________________________ > Linuxptp-devel mailing list > Lin...@li... > https://lists.sourceforge.net/lists/listinfo/linuxptp-devel |
From: Vladimir O. <ol...@gm...> - 2019-12-17 20:34:03
|
On Tue, 17 Dec 2019 at 22:04, Keller, Jacob E <jac...@in...> wrote: > > -----Original Message----- > > From: Vladimir Oltean <ol...@gm...> > > Sent: Monday, December 16, 2019 3:11 PM > > To: ric...@gm... > > Cc: lin...@li... > > Subject: [Linuxptp-devel] [PATCH 2/3] clock: Dump unexpected packets received > > on the error queues of sockets > > For messages that the application does not / no longer expects, such as > > TX timestamps delivered late, duplicate TX timestamps, general > > exceptional messages enqueued by the kernel in the socket error queue > > etc, ptp4l will be taken by surprise in clock_poll() by these, and will > > think that there is data, since POLLIN is set (but in fact in the socket > > error queue etc, ptp4l will be taken by surprise in clock_poll() by > > these, and will think that there is data, since POLLIN is set (but in > > fact POLLERR is also set, and this has an entirely different meaning). > > This double parenthetical is a bit hard to parse. > Sorry, this is hard for me to understand too. The paragraph should have been: For messages that the application does not / no longer expects, such as TX timestamps delivered late, duplicate TX timestamps, general exceptional messages enqueued by the kernel in the socket error queue etc, ptp4l will be taken by surprise in clock_poll() by these, and will think that there is data, since POLLIN is set (but in fact POLLERR is also set, and this has an entirely different meaning), and will attempt to dequeue them from the wrong queue, which is empty. > > > > A very, very simple reproducer is to take a DSA switch and run: > > > > tcpdump -i eth0 -j adapter_unsynced > > > > on its DSA master net device. The above command will enable timestamping > > on that net device, and if both the DSA switch and the master support > > PTP, this will make the kernel send duplicate TX timestamps for every > > sent event packet, which will completely kill ptp4l until a reboot, with > > no indication whatsoever of what's going on. > > > > Since the messages on the error queue are unexpected, we have no need > > for them. And they can be in theory anything, so simply hex dumping > > their content and moving along sounds like a good idea. > > > > Printing them to the user is optional (and helpful), but reading them is > > not. With this patch, even with extraneous data delivered by a buggy > > kernel (which the application now loudly complains about), the > > synchronization keeps chugging along. Otherwise the application starts > > reordering packets in recvmsg() due to misinterpreting which socket > > queue has data available. > > > > Signed-off-by: Vladimir Oltean <ol...@gm...> > > --- > > clock.c | 11 +++++++++++ > > msg.c | 12 ++++++++++++ > > msg.h | 7 +++++++ > > 3 files changed, 30 insertions(+) > > > > diff --git a/clock.c b/clock.c > > index 146576ac589c..768bbb49513d 100644 > > --- a/clock.c > > +++ b/clock.c > > @@ -1508,6 +1508,17 @@ int clock_poll(struct clock *c) > > LIST_FOREACH(p, &c->ports, list) { > > /* Let the ports handle their events. */ > > for (i = 0; i < N_POLLFD; i++) { > > + if (cur[i].revents & POLLERR) { > > + unsigned char pkt[1600]; > > + > > + cnt = recv(cur[i].fd, pkt, sizeof(pkt), > > + MSG_ERRQUEUE); > > + pr_err("Unexpected data on socket err queue:"); > > + msg_raw_dump(pkt, cnt); > > + > > + continue; > > + } > > + > > Does it makes sense to always print these at pr_err level? I suppose we almost never expect messages besides Tx timestamps, so warning loudly is better than only warning if you enable the higher debug printing.. However.. with this change it almost feels like these are now ignored/handled properly and thus not an issue so printing them at like pr_debug might make more sense? For my particular case, this is dead code because of the kernel fix. The other "[PATCH 3/3] port: Signal sync/follow-up mismatch events loudly" is dead code because of this one. For anybody else, I suppose that this is the beginning of the end of what ptp4l will tell them anyway. So I'm not too concerned that it's too loud. It isn't guarded by anything else, it is the other way around. First you poll on the fd, then you recvmsg and process it to see what it is. > > Hmm. I think I'm fine with this either way. > > Thanks, > Jake Thanks, -Vladimir |
From: Keller, J. E <jac...@in...> - 2019-12-17 22:29:40
|
> -----Original Message----- > From: Vladimir Oltean <ol...@gm...> > Sent: Tuesday, December 17, 2019 12:34 PM > To: Keller, Jacob E <jac...@in...> > Cc: ric...@gm...; lin...@li... > Subject: Re: [Linuxptp-devel] [PATCH 2/3] clock: Dump unexpected packets > received on the error queues of sockets > > On Tue, 17 Dec 2019 at 22:04, Keller, Jacob E <jac...@in...> wrote: > > > > -----Original Message----- > > > From: Vladimir Oltean <ol...@gm...> > > > Sent: Monday, December 16, 2019 3:11 PM > > > To: ric...@gm... > > > Cc: lin...@li... > > > Subject: [Linuxptp-devel] [PATCH 2/3] clock: Dump unexpected packets > received > > > on the error queues of sockets > > > For messages that the application does not / no longer expects, such as > > > TX timestamps delivered late, duplicate TX timestamps, general > > > exceptional messages enqueued by the kernel in the socket error queue > > > etc, ptp4l will be taken by surprise in clock_poll() by these, and will > > > think that there is data, since POLLIN is set (but in fact in the socket > > > error queue etc, ptp4l will be taken by surprise in clock_poll() by > > > these, and will think that there is data, since POLLIN is set (but in > > > fact POLLERR is also set, and this has an entirely different meaning). > > > > This double parenthetical is a bit hard to parse. > > > > Sorry, this is hard for me to understand too. The paragraph should have been: > > For messages that the application does not / no longer expects, such as > TX timestamps delivered late, duplicate TX timestamps, general > exceptional messages enqueued by the kernel in the socket error queue > etc, ptp4l will be taken by surprise in clock_poll() by these, and will > think that there is data, since POLLIN is set (but in fact POLLERR is > also set, and this has an entirely different meaning), and will attempt > to dequeue them from the wrong queue, which is empty. > That sounds better. It still feels a bit like a run-on sentence. Maybe something like: For messages that the application does not / no longer expects, such as Tx timestamps delivered late, duplicate Tx timestamps, or general exceptional messages enqueued by the kernel in the socket error queue, ptp4l will be taken by surprise in clock_poll(). It will think there is data, since POLLIN is set. In fact, POLLERR is also set which gives an entirely different meaning. Because of this, ptp4l will attempt to dequeue messages from the wrong (empty) queue. > > > > > > > A very, very simple reproducer is to take a DSA switch and run: > > > > > > tcpdump -i eth0 -j adapter_unsynced > > > > > > on its DSA master net device. The above command will enable timestamping > > > on that net device, and if both the DSA switch and the master support > > > PTP, this will make the kernel send duplicate TX timestamps for every > > > sent event packet, which will completely kill ptp4l until a reboot, with > > > no indication whatsoever of what's going on. > > > > > > Since the messages on the error queue are unexpected, we have no need > > > for them. And they can be in theory anything, so simply hex dumping > > > their content and moving along sounds like a good idea. > > > > > > Printing them to the user is optional (and helpful), but reading them is > > > not. With this patch, even with extraneous data delivered by a buggy > > > kernel (which the application now loudly complains about), the > > > synchronization keeps chugging along. Otherwise the application starts > > > reordering packets in recvmsg() due to misinterpreting which socket > > > queue has data available. > > > > > > Signed-off-by: Vladimir Oltean <ol...@gm...> > > > --- > > > clock.c | 11 +++++++++++ > > > msg.c | 12 ++++++++++++ > > > msg.h | 7 +++++++ > > > 3 files changed, 30 insertions(+) > > > > > > diff --git a/clock.c b/clock.c > > > index 146576ac589c..768bbb49513d 100644 > > > --- a/clock.c > > > +++ b/clock.c > > > @@ -1508,6 +1508,17 @@ int clock_poll(struct clock *c) > > > LIST_FOREACH(p, &c->ports, list) { > > > /* Let the ports handle their events. */ > > > for (i = 0; i < N_POLLFD; i++) { > > > + if (cur[i].revents & POLLERR) { > > > + unsigned char pkt[1600]; > > > + > > > + cnt = recv(cur[i].fd, pkt, sizeof(pkt), > > > + MSG_ERRQUEUE); > > > + pr_err("Unexpected data on socket err queue:"); > > > + msg_raw_dump(pkt, cnt); > > > + > > > + continue; > > > + } > > > + > > > > Does it makes sense to always print these at pr_err level? I suppose we almost > never expect messages besides Tx timestamps, so warning loudly is better than > only warning if you enable the higher debug printing.. However.. with this change > it almost feels like these are now ignored/handled properly and thus not an issue > so printing them at like pr_debug might make more sense? > > For my particular case, this is dead code because of the kernel fix. > The other "[PATCH 3/3] port: Signal sync/follow-up mismatch events > loudly" is dead code because of this one. Right. This only matters in the case where we *do* get extra messages. > > For anybody else, I suppose that this is the beginning of the end of > what ptp4l will tell them anyway. So I'm not too concerned that it's > too loud. It isn't guarded by anything else, it is the other way > around. First you poll on the fd, then you recvmsg and process it to > see what it is. I'm just wanting to make sure that the messages aren't going to be something where "things work as expected, and now the user is confused by the additional messages". It sounds like this should be rare enough that always dumping and loudly complaining is the right thing to do. Thanks, Jake > > > > > Hmm. I think I'm fine with this either way. > > > > Thanks, > > Jake > > Thanks, > -Vladimir |
From: Vladimir O. <ol...@gm...> - 2020-06-15 15:23:45
|
The reordering issue reported by me initially on the linuxptp-devel list [0] with the sja1105 DSA driver turned out not to be a reordering issue at all, in fact. Due to a kernel bug described in this patch [1], the DSA driver was in race with the master Ethernet driver and would occasionally (very rarely) deliver 2 TX timestamps to ptp4l on the event socket. The first TX timestamp is consumed in-band in the raw_send function. The second is caught by the main poll() syscall in the main ptp4l event loop - clock_poll(). When poll() sees the second TX-timestamped skb, it returns with revents == (POLLIN | POLLERR). But the main loop only checks for POLLIN, and says "yay, there's data!". So it proceeds to call recvmsg() with flags=0 (instead of MSG_ERRQUEUE), so it doesn't see any data in sk->sk_receive_queue. So, surprise, false alarm, the data that woke it up was in sk->sk_error_queue. The ptp4l process goes to sleep waiting for data. It sleeps for a whole sync interval. When it wakes up, it wakes up with the next sync, even though the previous sync's follow-up may have arrived in the meantime. Apparent reordering. Ptp4l does not print anything, it just appears to freeze. So this patch set aims to improves the error reporting in ptp4l, such that tracing back to the root cause is easier next time, and the problem does not blow up into other, completely unrelated things. [0]: https://sourceforge.net/p/linuxptp/mailman/message/36773629/ [1]: https://patchwork.ozlabs.org/project/netdev/cover/201...@gm... Vladimir Oltean (3): ptp4l: call recvmsg() with the MSG_DONTWAIT flag clock: dump unexpected packets received on the error queues of sockets port: print sync/follow-up mismatch events clock.c | 11 +++++++++++ msg.c | 12 ++++++++++++ msg.h | 8 ++++++++ port.c | 21 +++++++++++++++++++++ raw.c | 2 +- udp.c | 2 +- udp6.c | 2 +- 7 files changed, 55 insertions(+), 3 deletions(-) -- 2.25.1 |
From: Vladimir O. <ol...@gm...> - 2020-06-15 15:23:39
|
The application's main event loop (clock_poll) is woken up by poll() and dispatches the socket receive queue events to the corresponding ports as needed. So it is a bug if poll() wakes up the process for data availability on a socket's receive queue, and then recvmsg(), called immediately afterwards, goes to sleep trying to retrieve it. This patch will generate an error that will be propagated to the user if this condition happens. Can it happen? As of this patch, ptp4l uses the SO_SELECT_ERR_QUEUE socket option, which means that poll() will wake the process up, with revents == (POLLIN | POLLERR), if data is available in the error queue. But clock_poll() does not check POLLERR, just POLLIN, and draws the wrong conclusion that there is data available in the receive queue (when it is in fact available in the error queue). When the above condition happens, recvmsg() will sleep typically for a whole sync interval waiting for data on the event socket, and will be woken up when the new real frame arrives. It will not dequeue follow-up messages during this time (which are sent to the general message socket) and when it does, it will already be late for them (their seqid will be out of order). So it will drop them and everything that comes after. The synchronization process will fail. The above condition shouldn't typically happen, but exceptional kernel events will trigger it. It helps to be strict in ptp4l in order for those events to not blow up in even stranger symptoms unrelated to the root cause of the problem. Signed-off-by: Vladimir Oltean <ol...@gm...> --- Changes in v2: None. raw.c | 2 +- udp.c | 2 +- udp6.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/raw.c b/raw.c index 15c97561066e..0bd15b08e0a2 100644 --- a/raw.c +++ b/raw.c @@ -279,7 +279,7 @@ static int raw_recv(struct transport *t, int fd, void *buf, int buflen, buflen += hlen; hdr = (struct eth_hdr *) ptr; - cnt = sk_receive(fd, ptr, buflen, addr, hwts, 0); + cnt = sk_receive(fd, ptr, buflen, addr, hwts, MSG_DONTWAIT); if (cnt >= 0) cnt -= hlen; diff --git a/udp.c b/udp.c index 36802fb67b74..826bd124deef 100644 --- a/udp.c +++ b/udp.c @@ -210,7 +210,7 @@ no_event: static int udp_recv(struct transport *t, int fd, void *buf, int buflen, struct address *addr, struct hw_timestamp *hwts) { - return sk_receive(fd, buf, buflen, addr, hwts, 0); + return sk_receive(fd, buf, buflen, addr, hwts, MSG_DONTWAIT); } static int udp_send(struct transport *t, struct fdarray *fda, diff --git a/udp6.c b/udp6.c index 744a5bc8adcb..ba5482e3d4c9 100644 --- a/udp6.c +++ b/udp6.c @@ -227,7 +227,7 @@ no_event: static int udp6_recv(struct transport *t, int fd, void *buf, int buflen, struct address *addr, struct hw_timestamp *hwts) { - return sk_receive(fd, buf, buflen, addr, hwts, 0); + return sk_receive(fd, buf, buflen, addr, hwts, MSG_DONTWAIT); } static int udp6_send(struct transport *t, struct fdarray *fda, -- 2.25.1 |
From: Vladimir O. <ol...@gm...> - 2020-06-19 12:41:29
|
On Fri, 19 Jun 2020 at 15:36, Richard Cochran <ric...@gm...> wrote: > > On Fri, Jun 19, 2020 at 03:26:48PM +0300, Vladimir Oltean wrote: > > if (flags == MSG_ERRQUEUE) { > > struct pollfd pfd = { fd, sk_events, 0 }; > > res = poll(&pfd, 1, sk_tx_timeout); > > if (res < 1) { > > pr_err(res ? "poll for tx timestamp failed: %m" : > > "timed out while polling for tx timestamp"); > > pr_err("increasing tx_timestamp_timeout may correct " > > "this issue, but it is likely caused by a driver bug"); > > return -errno; > > > > What business has ptp4l to say that TX timestamps not being delivered > > within 1 ms to the application are indicative of a driver bug? You and > > I know full well that there's a lot of hardware which simply can't > > deliver timestamps that fast, and that there's nothing wrong with it. > > Correct, and the timeout is configurable beyond the 1 ms default. > > > No need to shame here, really. > > This is printing a helpful message to let the user set an appropriate > timeout value for their system. > > > Why would the network stack need additional validation via software > > timestamps? I mean, it's not ptp4l's fault that timestamps were not > > delivered in the right order. I thought the linuxptp suite is trying > > to be helpful instead of pedantic, I'm starting to change my opinion. > > You are certainly entitled to your opinions! > > Thanks, > Richard Ok, and this patch does not have a place in linuxptp because? Thanks, -Vladimir |
From: Richard C. <ric...@gm...> - 2020-06-19 12:47:20
|
On Fri, Jun 19, 2020 at 03:41:09PM +0300, Vladimir Oltean wrote: > Ok, and this patch does not have a place in linuxptp because? It is a matter of style, as I already said. Thanks, |
From: Richard C. <ric...@gm...> - 2020-06-24 17:15:52
|
On Mon, Jun 15, 2020 at 06:23:19PM +0300, Vladimir Oltean wrote: > The application's main event loop (clock_poll) is woken up by poll() and > dispatches the socket receive queue events to the corresponding ports as > needed. > > So it is a bug if poll() wakes up the process for data availability on a > socket's receive queue, and then recvmsg(), called immediately > afterwards, goes to sleep trying to retrieve it. This patch will > generate an error that will be propagated to the user if this condition > happens. > > Can it happen? > > As of this patch, ptp4l uses the SO_SELECT_ERR_QUEUE socket option, > which means that poll() will wake the process up, with revents == > (POLLIN | POLLERR), if data is available in the error queue. But > clock_poll() does not check POLLERR, just POLLIN, and draws the wrong > conclusion that there is data available in the receive queue (when it is > in fact available in the error queue). > > When the above condition happens, recvmsg() will sleep typically for a > whole sync interval waiting for data on the event socket, and will be > woken up when the new real frame arrives. It will not dequeue follow-up > messages during this time (which are sent to the general message socket) > and when it does, it will already be late for them (their seqid will be > out of order). So it will drop them and everything that comes after. The > synchronization process will fail. > > The above condition shouldn't typically happen, but exceptional kernel > events will trigger it. It helps to be strict in ptp4l in order for > those events to not blow up in even stranger symptoms unrelated to the > root cause of the problem. > > Signed-off-by: Vladimir Oltean <ol...@gm...> Applied. Thanks, Richard |
From: Vladimir O. <ol...@gm...> - 2020-06-15 15:23:40
|
In the current design of the SO_SELECT_ERR_QUEUE socket option (which is enabled by sk_timestamping_init on the event fd), it is a bug to only check revents & POLLIN, but not also POLLERR. Normally the error queue events that the application expects (i.e. TX timestamps) are received, within a given timeout, in-band with the transmission of the timestampable message itself (for example in raw_send). For messages that the application does not / no longer expects, such as TX timestamps delivered late, duplicate TX timestamps, general exceptional messages enqueued by the kernel in the socket error queue etc, ptp4l will be taken by surprise in clock_poll() by these, and will think that there is data, since POLLIN is set (but in fact in the socket error queue etc, ptp4l will be taken by surprise in clock_poll() by these, and will think that there is data, since POLLIN is set (but in fact POLLERR is also set, and this has an entirely different meaning). A very, very simple reproducer is to take a DSA switch and run: tcpdump -i eth0 -j adapter_unsynced on its DSA master net device. The above command will enable timestamping on that net device, and if both the DSA switch and the master support PTP, this will make the kernel send duplicate TX timestamps for every sent event packet, which will completely kill ptp4l until a reboot, with no indication whatsoever of what's going on. Since the messages on the error queue are unexpected, we have no need for them. And they can be in theory anything, so simply hex dumping their content and moving along sounds like a good idea. Printing them to the user is optional (and helpful), but reading them is not. With this patch, even with extraneous data delivered by a buggy kernel (which the application now loudly complains about), the synchronization keeps chugging along. Otherwise the application starts reordering packets in recvmsg() due to misinterpreting which socket queue has data available. Signed-off-by: Vladimir Oltean <ol...@gm...> --- Changes in v2: Make msg_raw_dump print to stderr instead of stdout, matching the "Unexpected data on socket err queue:" text. clock.c | 11 +++++++++++ msg.c | 12 ++++++++++++ msg.h | 8 ++++++++ 3 files changed, 31 insertions(+) diff --git a/clock.c b/clock.c index f43cc2afd49e..d61881c5db7a 100644 --- a/clock.c +++ b/clock.c @@ -1559,6 +1559,17 @@ int clock_poll(struct clock *c) LIST_FOREACH(p, &c->ports, list) { /* Let the ports handle their events. */ for (i = 0; i < N_POLLFD; i++) { + if (cur[i].revents & POLLERR) { + unsigned char pkt[1600]; + + cnt = recv(cur[i].fd, pkt, sizeof(pkt), + MSG_ERRQUEUE); + pr_err("Unexpected data on socket err queue:"); + msg_raw_dump(pkt, cnt, stderr); + + continue; + } + if (cur[i].revents & (POLLIN|POLLPRI)) { event = port_event(p, i); if (EV_STATE_DECISION_EVENT == event) { diff --git a/msg.c b/msg.c index d1619d4973f1..59bb55d6cd89 100644 --- a/msg.c +++ b/msg.c @@ -601,3 +601,15 @@ int msg_sots_missing(struct ptp_message *m) } return msg_sots_valid(m) ? 0 : 1; } + +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp) +{ + int k; + + for (k = 0; k < cnt; k++) { + if (k % 16 == 0) + fprintf(fp, "\n%04x ", k); + fprintf(fp, "%02x ", pkt[k]); + } + fprintf(fp, "\n"); +} diff --git a/msg.h b/msg.h index e71d3ceec4b9..f2165084395a 100644 --- a/msg.h +++ b/msg.h @@ -395,6 +395,14 @@ void msg_put(struct ptp_message *m); */ int msg_sots_missing(struct ptp_message *m); +/** + * Print a wireshark-compatible hex dump of a message buffer. + * @param pkt Message buffer to print + * @param cnt Length of message buffer + * @param fp An open file pointer. + */ +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp); + /** * Test whether a message has a valid SO_TIMESTAMPING time stamp. * @param m Message to test. -- 2.25.1 |
From: Vladimir O. <ol...@gm...> - 2020-06-19 19:45:41
|
On Fri, 19 Jun 2020 at 21:20, Richard Cochran <ric...@gm...> wrote: > > On Fri, Jun 19, 2020 at 07:33:03PM +0300, Vladimir Oltean wrote: > > Then it is not correct to "continue" the loop, since it will keep > > iterating through the pollfd array for sockets that were closed in the > > meantime. > > And don't we want to set up a fault timer, to clear it eventually? > > Ah! Right you are ... > > diff --git a/clock.c b/clock.c > index f43cc2a..a66d189 100644 > --- a/clock.c > +++ b/clock.c > @@ -1559,8 +1559,14 @@ int clock_poll(struct clock *c) > LIST_FOREACH(p, &c->ports, list) { > /* Let the ports handle their events. */ > for (i = 0; i < N_POLLFD; i++) { > - if (cur[i].revents & (POLLIN|POLLPRI)) { > - event = port_event(p, i); > + if (cur[i].revents & (POLLIN|POLLPRI|POLLERR)) { > + if (cur[i].revents & POLLERR) { > + pr_err("port %d: unexpected socket error", > + port_number(p)); > + event = EV_FAULT_DETECTED; > + } else { > + event = port_event(p, i); > + } > if (EV_STATE_DECISION_EVENT == event) { > c->sde = 1; > } Yes, much better. ptp4l[12509.967]: selected /dev/ptp3 as PTP clock transport_open: 37 ptp4l[12510.004]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE transport_open: 37 ptp4l[12510.005]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[12510.157]: port 1: unexpected socket error ptp4l[12510.157]: port 1: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED) transport_open: 37 ptp4l[12526.220]: port 1: FAULTY to LISTENING on INIT_COMPLETE ptp4l[12527.158]: port 1: unexpected socket error ptp4l[12527.158]: port 1: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED) transport_open: 37 ptp4l[12543.220]: port 1: FAULTY to LISTENING on INIT_COMPLETE ptp4l[12544.162]: port 1: unexpected socket error ptp4l[12544.162]: port 1: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED) Thanks, -Vladimir |
From: Richard C. <ric...@gm...> - 2020-06-24 16:27:13
|
On Fri, Jun 19, 2020 at 10:45:23PM +0300, Vladimir Oltean wrote: > Yes, much better. May I add your tag to the commit message, like this? Catch unexpected socket polling errors. The poll(2) system call may set POLLERR in the returned events. Normally no errors are returned unless specifically requested by setting an appropriate socket option. Nevertheless, the poll(2) API is quite generic, and there is no guarantee that the kernel networking stack might push an error event one day. This patch adds defensive code in order to catch any unexpected error condition. Suggested-by: Vladimir Oltean <ol...@gm...> Signed-off-by: Richard Cochran <ric...@gm...> |
From: Vladimir O. <ol...@gm...> - 2020-06-15 15:23:47
|
ptp4l is too silent when receiving, for whatever reason, out of order messages. If the reordering is persistent (which is either a broken network, or a broken kernel), the behavior looks like a complete synchronization stall, since the application is designed to never attempt to recover from such a condition. At least save some people some debugging hours and print when the application reaches this code path. Since it's a debugging tool, we don't want to create false alarms when the occasional packet gets reordered in a production system, but have this information readily available when the program's log level is set to debug, instead of having users fish for it with code instrumentation. Signed-off-by: Vladimir Oltean <ol...@gm...> --- Changes in v2: Drop log level to debug. port.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/port.c b/port.c index 729cbd784e3b..8c372e2a1256 100644 --- a/port.c +++ b/port.c @@ -1215,6 +1215,23 @@ static void port_synchronize(struct port *p, } } +static void port_syfufsm_print_mismatch(struct port *p, enum syfu_event event, + struct ptp_message *m) +{ + int expected_msgtype; + + if (event == SYNC_MISMATCH) + expected_msgtype = FOLLOW_UP; + else + expected_msgtype = SYNC; + + pr_debug("port %hu: have %s %d, expecting %s but got %s %d, dropping", + portnum(p), msg_type_string(msg_type(p->last_syncfup)), + p->last_syncfup->header.sequenceId, + msg_type_string(expected_msgtype), + msg_type_string(msg_type(m)), m->header.sequenceId); +} + /* * Handle out of order packets. The network stack might * provide the follow up _before_ the sync message. After all, @@ -1249,6 +1266,7 @@ static void port_syfufsm(struct port *p, enum syfu_event event, case SF_HAVE_SYNC: switch (event) { case SYNC_MISMATCH: + port_syfufsm_print_mismatch(p, event, m); msg_put(p->last_syncfup); msg_get(m); p->last_syncfup = m; @@ -1256,6 +1274,7 @@ static void port_syfufsm(struct port *p, enum syfu_event event, case SYNC_MATCH: break; case FUP_MISMATCH: + port_syfufsm_print_mismatch(p, event, m); msg_put(p->last_syncfup); msg_get(m); p->last_syncfup = m; @@ -1277,6 +1296,7 @@ static void port_syfufsm(struct port *p, enum syfu_event event, case SF_HAVE_FUP: switch (event) { case SYNC_MISMATCH: + port_syfufsm_print_mismatch(p, event, m); msg_put(p->last_syncfup); msg_get(m); p->last_syncfup = m; @@ -1293,6 +1313,7 @@ static void port_syfufsm(struct port *p, enum syfu_event event, p->syfu = SF_EMPTY; break; case FUP_MISMATCH: + port_syfufsm_print_mismatch(p, event, m); msg_put(p->last_syncfup); msg_get(m); p->last_syncfup = m; -- 2.25.1 |
From: Richard C. <ric...@gm...> - 2020-06-24 17:16:21
|
On Mon, Jun 15, 2020 at 06:23:21PM +0300, Vladimir Oltean wrote: > ptp4l is too silent when receiving, for whatever reason, out of order > messages. If the reordering is persistent (which is either a broken > network, or a broken kernel), the behavior looks like a complete > synchronization stall, since the application is designed to never > attempt to recover from such a condition. > > At least save some people some debugging hours and print when the > application reaches this code path. Since it's a debugging tool, we > don't want to create false alarms when the occasional packet gets > reordered in a production system, but have this information readily > available when the program's log level is set to debug, instead of > having users fish for it with code instrumentation. > > Signed-off-by: Vladimir Oltean <ol...@gm...> Applied. Thanks, Richard |
From: Jacob K. <jac...@in...> - 2020-06-18 19:43:07
|
On 6/15/2020 8:23 AM, Vladimir Oltean wrote: > The application's main event loop (clock_poll) is woken up by poll() and > dispatches the socket receive queue events to the corresponding ports as > needed. > > So it is a bug if poll() wakes up the process for data availability on a > socket's receive queue, and then recvmsg(), called immediately > afterwards, goes to sleep trying to retrieve it. This patch will > generate an error that will be propagated to the user if this condition > happens. > > Can it happen? > > As of this patch, ptp4l uses the SO_SELECT_ERR_QUEUE socket option, > which means that poll() will wake the process up, with revents == > (POLLIN | POLLERR), if data is available in the error queue. But > clock_poll() does not check POLLERR, just POLLIN, and draws the wrong > conclusion that there is data available in the receive queue (when it is > in fact available in the error queue). > > When the above condition happens, recvmsg() will sleep typically for a > whole sync interval waiting for data on the event socket, and will be > woken up when the new real frame arrives. It will not dequeue follow-up > messages during this time (which are sent to the general message socket) > and when it does, it will already be late for them (their seqid will be > out of order). So it will drop them and everything that comes after. The > synchronization process will fail. > > The above condition shouldn't typically happen, but exceptional kernel > events will trigger it. It helps to be strict in ptp4l in order for > those events to not blow up in even stranger symptoms unrelated to the > root cause of the problem. > Makes sense. Using MSG_DONTWAIT seems reasonable since we already know there should be data available. > Signed-off-by: Vladimir Oltean <ol...@gm...> > --- > Changes in v2: > None. > > raw.c | 2 +- > udp.c | 2 +- > udp6.c | 2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/raw.c b/raw.c > index 15c97561066e..0bd15b08e0a2 100644 > --- a/raw.c > +++ b/raw.c > @@ -279,7 +279,7 @@ static int raw_recv(struct transport *t, int fd, void *buf, int buflen, > buflen += hlen; > hdr = (struct eth_hdr *) ptr; > > - cnt = sk_receive(fd, ptr, buflen, addr, hwts, 0); > + cnt = sk_receive(fd, ptr, buflen, addr, hwts, MSG_DONTWAIT); > > if (cnt >= 0) > cnt -= hlen; > diff --git a/udp.c b/udp.c > index 36802fb67b74..826bd124deef 100644 > --- a/udp.c > +++ b/udp.c > @@ -210,7 +210,7 @@ no_event: > static int udp_recv(struct transport *t, int fd, void *buf, int buflen, > struct address *addr, struct hw_timestamp *hwts) > { > - return sk_receive(fd, buf, buflen, addr, hwts, 0); > + return sk_receive(fd, buf, buflen, addr, hwts, MSG_DONTWAIT); > } > > static int udp_send(struct transport *t, struct fdarray *fda, > diff --git a/udp6.c b/udp6.c > index 744a5bc8adcb..ba5482e3d4c9 100644 > --- a/udp6.c > +++ b/udp6.c > @@ -227,7 +227,7 @@ no_event: > static int udp6_recv(struct transport *t, int fd, void *buf, int buflen, > struct address *addr, struct hw_timestamp *hwts) > { > - return sk_receive(fd, buf, buflen, addr, hwts, 0); > + return sk_receive(fd, buf, buflen, addr, hwts, MSG_DONTWAIT); > } > > static int udp6_send(struct transport *t, struct fdarray *fda, > |
From: Jacob K. <jac...@in...> - 2020-06-18 19:44:50
|
On 6/15/2020 8:23 AM, Vladimir Oltean wrote: > In the current design of the SO_SELECT_ERR_QUEUE socket option (which is > enabled by sk_timestamping_init on the event fd), it is a bug to only > check revents & POLLIN, but not also POLLERR. > > Normally the error queue events that the application expects (i.e. TX > timestamps) are received, within a given timeout, in-band with the > transmission of the timestampable message itself (for example in > raw_send). > > For messages that the application does not / no longer expects, such as > TX timestamps delivered late, duplicate TX timestamps, general > exceptional messages enqueued by the kernel in the socket error queue > etc, ptp4l will be taken by surprise in clock_poll() by these, and will > think that there is data, since POLLIN is set (but in fact in the socket > error queue etc, ptp4l will be taken by surprise in clock_poll() by > these, and will think that there is data, since POLLIN is set (but in > fact POLLERR is also set, and this has an entirely different meaning). > > A very, very simple reproducer is to take a DSA switch and run: > > tcpdump -i eth0 -j adapter_unsynced > > on its DSA master net device. The above command will enable timestamping > on that net device, and if both the DSA switch and the master support > PTP, this will make the kernel send duplicate TX timestamps for every > sent event packet, which will completely kill ptp4l until a reboot, with > no indication whatsoever of what's going on. > > Since the messages on the error queue are unexpected, we have no need > for them. And they can be in theory anything, so simply hex dumping > their content and moving along sounds like a good idea. > > Printing them to the user is optional (and helpful), but reading them is > not. With this patch, even with extraneous data delivered by a buggy > kernel (which the application now loudly complains about), the > synchronization keeps chugging along. Otherwise the application starts > reordering packets in recvmsg() due to misinterpreting which socket > queue has data available. > > Signed-off-by: Vladimir Oltean <ol...@gm...> > --- > Changes in v2: > Make msg_raw_dump print to stderr instead of stdout, matching the > "Unexpected data on socket err queue:" text. > > clock.c | 11 +++++++++++ > msg.c | 12 ++++++++++++ > msg.h | 8 ++++++++ > 3 files changed, 31 insertions(+) > > diff --git a/clock.c b/clock.c > index f43cc2afd49e..d61881c5db7a 100644 > --- a/clock.c > +++ b/clock.c > @@ -1559,6 +1559,17 @@ int clock_poll(struct clock *c) > LIST_FOREACH(p, &c->ports, list) { > /* Let the ports handle their events. */ > for (i = 0; i < N_POLLFD; i++) { > + if (cur[i].revents & POLLERR) { > + unsigned char pkt[1600]; > + > + cnt = recv(cur[i].fd, pkt, sizeof(pkt), > + MSG_ERRQUEUE); > + pr_err("Unexpected data on socket err queue:"); > + msg_raw_dump(pkt, cnt, stderr); > + > + continue; > + } > + > if (cur[i].revents & (POLLIN|POLLPRI)) { > event = port_event(p, i); > if (EV_STATE_DECISION_EVENT == event) { > diff --git a/msg.c b/msg.c > index d1619d4973f1..59bb55d6cd89 100644 > --- a/msg.c > +++ b/msg.c > @@ -601,3 +601,15 @@ int msg_sots_missing(struct ptp_message *m) > } > return msg_sots_valid(m) ? 0 : 1; > } > + > +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp) > +{ > + int k; > + > + for (k = 0; k < cnt; k++) { > + if (k % 16 == 0) > + fprintf(fp, "\n%04x ", k); > + fprintf(fp, "%02x ", pkt[k]); > + } > + fprintf(fp, "\n"); > +} Could this use the pr facility, so that it honors the printing options to log to syslog and manages the level? I'm not sure what I would mark these as: debug because they no longer impact the syncing process, or true error because they are unexpected? Thanks, Jake > diff --git a/msg.h b/msg.h > index e71d3ceec4b9..f2165084395a 100644 > --- a/msg.h > +++ b/msg.h > @@ -395,6 +395,14 @@ void msg_put(struct ptp_message *m); > */ > int msg_sots_missing(struct ptp_message *m); > > +/** > + * Print a wireshark-compatible hex dump of a message buffer. > + * @param pkt Message buffer to print > + * @param cnt Length of message buffer > + * @param fp An open file pointer. > + */ > +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp); > + > /** > * Test whether a message has a valid SO_TIMESTAMPING time stamp. > * @param m Message to test. > |
From: Vladimir O. <ol...@gm...> - 2020-06-18 20:01:08
|
On Thu, 18 Jun 2020 at 22:44, Jacob Keller <jac...@in...> wrote: > > > I'm not sure what I would mark these as: debug because they no longer > impact the syncing process, or true error because they are unexpected? > > Thanks, > Jake > I don't think I would qualify "unexpected packets received on a socket's error queue" as anything other than an error. Now, whose error is it, that's another story, but somebody should signal it nonetheless. -Vladimir |
From: Vladimir O. <ol...@gm...> - 2020-06-18 19:57:15
|
Hi Jacob, On Thu, 18 Jun 2020 at 22:44, Jacob Keller <jac...@in...> wrote: > > > > On 6/15/2020 8:23 AM, Vladimir Oltean wrote: > > In the current design of the SO_SELECT_ERR_QUEUE socket option (which is > > enabled by sk_timestamping_init on the event fd), it is a bug to only > > check revents & POLLIN, but not also POLLERR. > > > > Normally the error queue events that the application expects (i.e. TX > > timestamps) are received, within a given timeout, in-band with the > > transmission of the timestampable message itself (for example in > > raw_send). > > > > For messages that the application does not / no longer expects, such as > > TX timestamps delivered late, duplicate TX timestamps, general > > exceptional messages enqueued by the kernel in the socket error queue > > etc, ptp4l will be taken by surprise in clock_poll() by these, and will > > think that there is data, since POLLIN is set (but in fact in the socket > > error queue etc, ptp4l will be taken by surprise in clock_poll() by > > these, and will think that there is data, since POLLIN is set (but in > > fact POLLERR is also set, and this has an entirely different meaning). > > > > A very, very simple reproducer is to take a DSA switch and run: > > > > tcpdump -i eth0 -j adapter_unsynced > > > > on its DSA master net device. The above command will enable timestamping > > on that net device, and if both the DSA switch and the master support > > PTP, this will make the kernel send duplicate TX timestamps for every > > sent event packet, which will completely kill ptp4l until a reboot, with > > no indication whatsoever of what's going on. > > > > Since the messages on the error queue are unexpected, we have no need > > for them. And they can be in theory anything, so simply hex dumping > > their content and moving along sounds like a good idea. > > > > Printing them to the user is optional (and helpful), but reading them is > > not. With this patch, even with extraneous data delivered by a buggy > > kernel (which the application now loudly complains about), the > > synchronization keeps chugging along. Otherwise the application starts > > reordering packets in recvmsg() due to misinterpreting which socket > > queue has data available. > > > > Signed-off-by: Vladimir Oltean <ol...@gm...> > > --- > > Changes in v2: > > Make msg_raw_dump print to stderr instead of stdout, matching the > > "Unexpected data on socket err queue:" text. > > > > clock.c | 11 +++++++++++ > > msg.c | 12 ++++++++++++ > > msg.h | 8 ++++++++ > > 3 files changed, 31 insertions(+) > > > > diff --git a/clock.c b/clock.c > > index f43cc2afd49e..d61881c5db7a 100644 > > --- a/clock.c > > +++ b/clock.c > > @@ -1559,6 +1559,17 @@ int clock_poll(struct clock *c) > > LIST_FOREACH(p, &c->ports, list) { > > /* Let the ports handle their events. */ > > for (i = 0; i < N_POLLFD; i++) { > > + if (cur[i].revents & POLLERR) { > > + unsigned char pkt[1600]; > > + > > + cnt = recv(cur[i].fd, pkt, sizeof(pkt), > > + MSG_ERRQUEUE); > > + pr_err("Unexpected data on socket err queue:"); > > + msg_raw_dump(pkt, cnt, stderr); > > + > > + continue; > > + } > > + > > if (cur[i].revents & (POLLIN|POLLPRI)) { > > event = port_event(p, i); > > if (EV_STATE_DECISION_EVENT == event) { > > diff --git a/msg.c b/msg.c > > index d1619d4973f1..59bb55d6cd89 100644 > > --- a/msg.c > > +++ b/msg.c > > @@ -601,3 +601,15 @@ int msg_sots_missing(struct ptp_message *m) > > } > > return msg_sots_valid(m) ? 0 : 1; > > } > > + > > +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp) > > +{ > > + int k; > > + > > + for (k = 0; k < cnt; k++) { > > + if (k % 16 == 0) > > + fprintf(fp, "\n%04x ", k); > > + fprintf(fp, "%02x ", pkt[k]); > > + } > > + fprintf(fp, "\n"); > > +} > > Could this use the pr facility, so that it honors the printing options > to log to syslog and manages the level? > > I'm not sure what I would mark these as: debug because they no longer > impact the syncing process, or true error because they are unexpected? > > Thanks, > Jake > Thanks a lot for looking at this. Good point about redirecting to syslog, I did not think about that. The reason why I did not use pr_err was due to the automatic line ending, which would have unnecessarily complicated this function by having me print each line of the packet to a temporary buffer first. But now that I see there's no way around it, I guess I'll have to do that. > > diff --git a/msg.h b/msg.h > > index e71d3ceec4b9..f2165084395a 100644 > > --- a/msg.h > > +++ b/msg.h > > @@ -395,6 +395,14 @@ void msg_put(struct ptp_message *m); > > */ > > int msg_sots_missing(struct ptp_message *m); > > > > +/** > > + * Print a wireshark-compatible hex dump of a message buffer. > > + * @param pkt Message buffer to print > > + * @param cnt Length of message buffer > > + * @param fp An open file pointer. > > + */ > > +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp); > > + > > /** > > * Test whether a message has a valid SO_TIMESTAMPING time stamp. > > * @param m Message to test. > > Cheers, -Vladimir |
From: Jacob K. <jac...@in...> - 2020-06-18 20:02:48
|
On 6/18/2020 12:56 PM, Vladimir Oltean wrote: >> Could this use the pr facility, so that it honors the printing options >> to log to syslog and manages the level? >> >> I'm not sure what I would mark these as: debug because they no longer >> impact the syncing process, or true error because they are unexpected? >> >> Thanks, >> Jake >> > > Thanks a lot for looking at this. > Good point about redirecting to syslog, I did not think about that. > The reason why I did not use pr_err was due to the automatic line > ending, which would have unnecessarily complicated this function by > having me print each line of the packet to a temporary buffer first. > But now that I see there's no way around it, I guess I'll have to do > that. What about implementing this as part of pr.c? Thanks, Jake |