Thread: Re: [Linuxptp-devel] [PATCH v2 2/3] clock: dump unexpected packets received on the error queues of (Page 3)
PTP IEEE 1588 stack for Linux
Brought to you by:
rcochran
From: Richard C. <ric...@gm...> - 2020-06-24 18:12:37
|
On Wed, Jun 24, 2020 at 09:01:33PM +0300, Vladimir Oltean wrote: > I think it boils down to something very simple. > The kernel is waking up a user process with revents = POLLIN | > POLLERR. That has a certain meaning. > Then the process says "Oh, yay, I have revents = POLLIN. Let me > process that!". That has an entirely different meaning. I agree. In this specific case, if POLLERR appears, it means that some unknown error occurred, and the application can't do anything about it beyond throwing a generic fault. This is what I would call "defensive" programming. If and when the kernel introduces new POLLERR events, we will of course implement the appropriate handlers. But as of today there are no such events to handle. Thanks, Richard |
From: Vladimir O. <ol...@gm...> - 2020-06-19 16:33:29
|
On Fri, 19 Jun 2020 at 18:47, Richard Cochran <ric...@gm...> wrote: > > On Fri, Jun 19, 2020 at 05:58:17PM +0300, Vladimir Oltean wrote: > > You need to recv() the data. Otherwise, it will remain forever there, > > ptp4l will be toast, since this will keep dispatching the fault ad > > infinitum. > > The fault handling path closes all sockets and opens new ones. > > Thanks, > Richard Then it is not correct to "continue" the loop, since it will keep iterating through the pollfd array for sockets that were closed in the meantime. And don't we want to set up a fault timer, to clear it eventually? -Vladimir |
From: Richard C. <ric...@gm...> - 2020-06-19 18:20:33
|
On Fri, Jun 19, 2020 at 07:33:03PM +0300, Vladimir Oltean wrote: > Then it is not correct to "continue" the loop, since it will keep > iterating through the pollfd array for sockets that were closed in the > meantime. > And don't we want to set up a fault timer, to clear it eventually? Ah! Right you are ... diff --git a/clock.c b/clock.c index f43cc2a..a66d189 100644 --- a/clock.c +++ b/clock.c @@ -1559,8 +1559,14 @@ int clock_poll(struct clock *c) LIST_FOREACH(p, &c->ports, list) { /* Let the ports handle their events. */ for (i = 0; i < N_POLLFD; i++) { - if (cur[i].revents & (POLLIN|POLLPRI)) { - event = port_event(p, i); + if (cur[i].revents & (POLLIN|POLLPRI|POLLERR)) { + if (cur[i].revents & POLLERR) { + pr_err("port %d: unexpected socket error", + port_number(p)); + event = EV_FAULT_DETECTED; + } else { + event = port_event(p, i); + } if (EV_STATE_DECISION_EVENT == event) { c->sde = 1; } |
From: Vladimir O. <ol...@gm...> - 2020-06-24 18:17:01
|
On Wed, 24 Jun 2020 at 21:12, Richard Cochran <ric...@gm...> wrote: > > On Wed, Jun 24, 2020 at 09:01:33PM +0300, Vladimir Oltean wrote: > > I think it boils down to something very simple. > > The kernel is waking up a user process with revents = POLLIN | > > POLLERR. That has a certain meaning. > > Then the process says "Oh, yay, I have revents = POLLIN. Let me > > process that!". That has an entirely different meaning. > > I agree. > > In this specific case, if POLLERR appears, it means that some unknown > error occurred, and the application can't do anything about it beyond > throwing a generic fault. This is what I would call "defensive" > programming. > Treating it in a meaningful way for what it is, or for what it can be in the future (an event on an error/control socket) is defensive programming. Treating it as an event on a data queue is a bug. > If and when the kernel introduces new POLLERR events, we will of > course implement the appropriate handlers. But as of today there are > no such events to handle. > > Thanks, > Richard > Thanks, -Vladimir |
From: Richard C. <ric...@gm...> - 2020-06-24 18:23:12
|
On Wed, Jun 24, 2020 at 09:16:42PM +0300, Vladimir Oltean wrote: > Treating it as an event on a data queue is a bug. Yeah, I guess we have just been lucky all this time. |
From: Vladimir O. <ol...@gm...> - 2020-06-24 18:39:12
|
On Wed, 24 Jun 2020 at 21:22, Richard Cochran <ric...@gm...> wrote: > > On Wed, Jun 24, 2020 at 09:16:42PM +0300, Vladimir Oltean wrote: > > Treating it as an event on a data queue is a bug. > > Yeah, I guess we have just been lucky all this time. I can sense some irony here, but I don't see your point though? I mean, take any board which has a DSA master that supports PTP timestamping and a DSA switch that also supports PTP timestamping, and run that "tcpdump -j adapter_unsynced" command on the DSA master. If you don't have my patch that prevents that from happening: https://patchwork.ozlabs.org/project/netdev/patch/201...@gm.../ then what'll happen is exactly what happened to me. Although in my case it was due to a driver bug, and in the tcpdump case it is due to a systematic limitation of the SO_TIMESTAMPING API. Not only that, but when you're coming from the angle that "tcpdump broke ptp4l", then you at least have some sort of rope to cling on to (if you want to, that is). And I _guarantee_ to you that somebody out there ran into that oddity at least once before, said "huh, weird, what a buggy system" and went on with their life. So, not really worth it to investigate, when there's something you can do to avoid the problem. But just blowing off the fact that it _is_ easy to reproduce even without "odd" driver bugs is pushing it a bit, IMO. So, I guess we _have_ been lucky all this time, in the "ignorance is bliss" sense. The ptp4l program does the same thing in both cases, and it isn't something that is correct given its options. Thanks, -Vladimir |
From: Richard C. <ric...@gm...> - 2020-06-19 19:58:33
|
On Mon, Jun 15, 2020 at 06:23:19PM +0300, Vladimir Oltean wrote: > > The above condition shouldn't typically happen, but exceptional kernel > events will trigger it. It helps to be strict in ptp4l in order for > those events to not blow up in even stranger symptoms unrelated to the > root cause of the problem. Actually, after all that discussion and ranting, I think I'll this patch after all. It won't hurt, and it might help the next person with their kernel/driver woes. Thanks, Richard |
From: Vladimir O. <ol...@gm...> - 2020-06-19 20:29:21
|
On Fri, 19 Jun 2020 at 22:58, Richard Cochran <ric...@gm...> wrote: > > On Mon, Jun 15, 2020 at 06:23:19PM +0300, Vladimir Oltean wrote: > > > > The above condition shouldn't typically happen, but exceptional kernel > > events will trigger it. It helps to be strict in ptp4l in order for > > those events to not blow up in even stranger symptoms unrelated to the > > root cause of the problem. > > Actually, after all that discussion and ranting, I think I'll this > patch after all. It won't hurt, and it might help the next person > with their kernel/driver woes. > > Thanks, > Richard Wow, that was a hell of a fight that I didn't enjoy. But, at least glad that we can agree on what 'trying to be helpful' entails. Thanks, -Vladimir |