Thread: Re: [Linuxptp-devel] [PATCH v2 2/3] clock: dump unexpected packets received on the error queues of (Page 3)

PTP IEEE 1588 stack for Linux

Brought to you by: rcochran

linuxptp-devel

Re: [Linuxptp-devel] [PATCH v2 2/3] clock: dump unexpected packets received on the error queues of sockets

From: Richard C. <ric...@gm...> - 2020-06-24 18:12:37

On Wed, Jun 24, 2020 at 09:01:33PM +0300, Vladimir Oltean wrote:
> I think it boils down to something very simple.
> The kernel is waking up a user process with revents = POLLIN |
> POLLERR. That has a certain meaning.
> Then the process says "Oh, yay, I have revents = POLLIN. Let me
> process that!". That has an entirely different meaning.

I agree.

In this specific case, if POLLERR appears, it means that some unknown
error occurred, and the application can't do anything about it beyond
throwing a generic fault.  This is what I would call "defensive"
programming.

If and when the kernel introduces new POLLERR events, we will of
course implement the appropriate handlers.  But as of today there are
no such events to handle.

Thanks,
Richard

Re: [Linuxptp-devel] [PATCH v2 2/3] clock: dump unexpected packets received on the error queues of sockets

From: Vladimir O. <ol...@gm...> - 2020-06-19 16:33:29

On Fri, 19 Jun 2020 at 18:47, Richard Cochran <ric...@gm...> wrote:
>
> On Fri, Jun 19, 2020 at 05:58:17PM +0300, Vladimir Oltean wrote:
> > You need to recv() the data. Otherwise, it will remain forever there,
> > ptp4l will be toast, since this will keep dispatching the fault ad
> > infinitum.
>
> The fault handling path closes all sockets and opens new ones.
>
> Thanks,
> Richard

Then it is not correct to "continue" the loop, since it will keep
iterating through the pollfd array for sockets that were closed in the
meantime.
And don't we want to set up a fault timer, to clear it eventually?

-Vladimir

Re: [Linuxptp-devel] [PATCH v2 2/3] clock: dump unexpected packets received on the error queues of sockets

From: Richard C. <ric...@gm...> - 2020-06-19 18:20:33

On Fri, Jun 19, 2020 at 07:33:03PM +0300, Vladimir Oltean wrote:
> Then it is not correct to "continue" the loop, since it will keep
> iterating through the pollfd array for sockets that were closed in the
> meantime.
> And don't we want to set up a fault timer, to clear it eventually?

Ah!  Right you are ...

diff --git a/clock.c b/clock.c
index f43cc2a..a66d189 100644
--- a/clock.c
+++ b/clock.c
@@ -1559,8 +1559,14 @@ int clock_poll(struct clock *c)
 	LIST_FOREACH(p, &c->ports, list) {
 		/* Let the ports handle their events. */
 		for (i = 0; i < N_POLLFD; i++) {
-			if (cur[i].revents & (POLLIN|POLLPRI)) {
-				event = port_event(p, i);
+			if (cur[i].revents & (POLLIN|POLLPRI|POLLERR)) {
+				if (cur[i].revents & POLLERR) {
+					pr_err("port %d: unexpected socket error",
+					       port_number(p));
+					event = EV_FAULT_DETECTED;
+				} else {
+					event = port_event(p, i);
+				}
 				if (EV_STATE_DECISION_EVENT == event) {
 					c->sde = 1;
 				}

Re: [Linuxptp-devel] [PATCH v2 2/3] clock: dump unexpected packets received on the error queues of sockets

From: Vladimir O. <ol...@gm...> - 2020-06-24 18:17:01

On Wed, 24 Jun 2020 at 21:12, Richard Cochran <ric...@gm...> wrote:
>
> On Wed, Jun 24, 2020 at 09:01:33PM +0300, Vladimir Oltean wrote:
> > I think it boils down to something very simple.
> > The kernel is waking up a user process with revents = POLLIN |
> > POLLERR. That has a certain meaning.
> > Then the process says "Oh, yay, I have revents = POLLIN. Let me
> > process that!". That has an entirely different meaning.
>
> I agree.
>
> In this specific case, if POLLERR appears, it means that some unknown
> error occurred, and the application can't do anything about it beyond
> throwing a generic fault.  This is what I would call "defensive"
> programming.
>

Treating it in a meaningful way for what it is, or for what it can be
in the future (an event on an error/control socket) is defensive
programming.
Treating it as an event on a data queue is a bug.

> If and when the kernel introduces new POLLERR events, we will of
> course implement the appropriate handlers.  But as of today there are
> no such events to handle.
>
> Thanks,
> Richard
>

Thanks,
-Vladimir

Re: [Linuxptp-devel] [PATCH v2 2/3] clock: dump unexpected packets received on the error queues of sockets

From: Richard C. <ric...@gm...> - 2020-06-24 18:23:12

On Wed, Jun 24, 2020 at 09:16:42PM +0300, Vladimir Oltean wrote:
> Treating it as an event on a data queue is a bug.

Yeah, I guess we have just been lucky all this time.

Re: [Linuxptp-devel] [PATCH v2 2/3] clock: dump unexpected packets received on the error queues of sockets

From: Vladimir O. <ol...@gm...> - 2020-06-24 18:39:12

On Wed, 24 Jun 2020 at 21:22, Richard Cochran <ric...@gm...> wrote:
>
> On Wed, Jun 24, 2020 at 09:16:42PM +0300, Vladimir Oltean wrote:
> > Treating it as an event on a data queue is a bug.
>
> Yeah, I guess we have just been lucky all this time.

I can sense some irony here, but I don't see your point though?
I mean, take any board which has a DSA master that supports PTP
timestamping and a DSA switch that also supports PTP timestamping, and
run that "tcpdump -j adapter_unsynced" command on the DSA master. If
you don't have my patch that prevents that from happening:

https://patchwork.ozlabs.org/project/netdev/patch/201...@gm.../

then what'll happen is exactly what happened to me.
Although in my case it was due to a driver bug, and in the tcpdump
case it is due to a systematic limitation of the SO_TIMESTAMPING API.
Not only that, but when you're coming from the angle that "tcpdump
broke ptp4l", then you at least have some sort of rope to cling on to
(if you want to, that is). And I _guarantee_ to you that somebody out
there ran into that oddity at least once before, said "huh, weird,
what a buggy system" and went on with their life. So, not really worth
it to investigate, when there's something you can do to avoid the
problem. But just blowing off the fact that it _is_ easy to reproduce
even without "odd" driver bugs is pushing it a bit, IMO.
So, I guess we _have_ been lucky all this time, in the "ignorance is
bliss" sense. The ptp4l program does the same thing in both cases, and
it isn't something that is correct given its options.

Thanks,
-Vladimir

Re: [Linuxptp-devel] [PATCH v2 1/3] ptp4l: call recvmsg() with the MSG_DONTWAIT flag

From: Richard C. <ric...@gm...> - 2020-06-19 19:58:33

On Mon, Jun 15, 2020 at 06:23:19PM +0300, Vladimir Oltean wrote:
> 
> The above condition shouldn't typically happen, but exceptional kernel
> events will trigger it. It helps to be strict in ptp4l in order for
> those events to not blow up in even stranger symptoms unrelated to the
> root cause of the problem.

Actually, after all that discussion and ranting, I think I'll this
patch after all.  It won't hurt, and it might help the next person
with their kernel/driver woes.

Thanks,
Richard

Re: [Linuxptp-devel] [PATCH v2 1/3] ptp4l: call recvmsg() with the MSG_DONTWAIT flag

From: Vladimir O. <ol...@gm...> - 2020-06-19 20:29:21

On Fri, 19 Jun 2020 at 22:58, Richard Cochran <ric...@gm...> wrote:
>
> On Mon, Jun 15, 2020 at 06:23:19PM +0300, Vladimir Oltean wrote:
> >
> > The above condition shouldn't typically happen, but exceptional kernel
> > events will trigger it. It helps to be strict in ptp4l in order for
> > those events to not blow up in even stranger symptoms unrelated to the
> > root cause of the problem.
>
> Actually, after all that discussion and ranting, I think I'll this
> patch after all.  It won't hurt, and it might help the next person
> with their kernel/driver woes.
>
> Thanks,
> Richard

Wow, that was a hell of a fight that I didn't enjoy. But, at least
glad that we can agree on what 'trying to be helpful' entails.

Thanks,
-Vladimir

<< < 1 2 3 (Page 3 of 3)