Re: [Linuxptp-users] Possible missing fault
From: Keller, J. E <jac...@in...> - 2012-05-07 19:29:09
> -----Original Message-----
> From: Richard Cochran [mailto:ric...@gm...]
> Sent: Monday, May 07, 2012 12:17 PM
> To: Keller, Jacob E
> Cc: lin...@li...
> Subject: Re: [Linuxptp-users] Possible missing fault
>
> On Mon, May 07, 2012 at 06:34:51PM +0000, Keller, Jacob E wrote:
>> This is using layer2 ethernet, and the peer delay mechanism. The
>> following is the output, with a large chunk of similar output
>> snipped out of the middle.
>>
>> ptp4l[9130]: master offset 0 s2 adj -0 path delay 2246
>> ptp4l[9130]: master offset 0 s2 adj -0 path delay 2246
>> ptp4l[9130]: master offset 0 s2 adj -0 path delay 2246
>> ptp4l[9130]: master offset 0 s2 adj -0 path delay 2246
>
> BTW - Wow, those numbers look great.
>
> ...snip...
>
>> ptp4l[9130]: master offset 0 s2 adj -1 path delay 2246
>> ptp4l[9130]: recvmsg tx timestamp failed: Resource temporarily unavailable
>> ---
>> Here, recvmsg doesn't throw a fault? It just keeps going, but
>> clearly something is wrong.
>> ---
>> ptp4l[9130]: port 1: send peer delay response failed
>
> There is a check missing in port.c for this case. This should fix it.
>
> diff --git a/port.c b/port.c
> index 4fe32d0..5edc772 100644
> --- a/port.c
> +++ b/port.c
> @@ -1297,7 +1297,8 @@ enum fsm_event port_event(struct port *p, int fd_index)
>  		process_delay_req(p, msg);
>  		break;
>  	case PDELAY_REQ:
> -		process_pdelay_req(p, msg);
> +		if (process_pdelay_req(p, msg))
> +			event = EV_FAULT_DETECTED;
>  		break;
>  	case PDELAY_RESP:
>  		if (process_pdelay_resp(p, msg))
>
>> ptp4l[9130]: master offset 63794 s2 adj +34723 path delay 2314
>> ptp4l[9130]: master offset 29063 s2 adj +19130 path delay 2314
>> ptp4l[9130]: master offset 9993 s2 adj +8779 path delay 2250
>> ptp4l[9130]: master offset 1210 s2 adj +2994 path delay 2250
>> ptp4l[9130]: master offset -1782 s2 adj +365 path delay 2247
>> ptp4l[9130]: master offset -2148 s2 adj -535 path delay 2246
>> ptp4l[9130]: master offset -1612 s2 adj -644 path delay 2246
>> ptp4l[9130]: master offset -971 s2 adj -486 path delay 2246
>> ptp4l[9130]: master offset -483 s2 adj -290 path delay 2245
>
> Here it has started to recover, obviously.
>
>> Much later on a different error throws a fault and suddenly
>> everything is better. Is that behavior in the middle possibly caused
>> by some faulty state that wasn't cleared? I am not sure, but those
>> path delays and values seem incredibly wrong.
>
> The peer path delay is a moving average of ten values, so even if you
> get one or a few very wrong values, the bad effect should soon
> disappear, but in your log the error persists much longer.
>
>> I think it's because somehow one of the sequence numbers got out of
>> order and messed up. I am going to try and take a look at that code
>> and see if I can find the issue, but I am wondering if you've seen
>> behavior like this.
>
> I haven't seen that, and it does look like either the messages are
> wrong (unlikely, but check with wireshark) or that the HW time stamps
> are getting associated with the wrong messages.
>
> Good luck,
> Richard

I am not sure exactly what the error is considering this occurred
after 3 days. I think adding the port fault check should be good
though, as after the fault the major screwup appears to go away.

- Jake

PS: I sent a patch that adds those fault checks :)
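
For readers following the thread: Richard's remark that the peer path delay is a moving average of ten values can be illustrated with a small sketch. This is not the linuxptp implementation. The type `struct moving_avg` and the functions `moving_avg_init` and `moving_avg_update` are hypothetical names invented for this example, and the window is assumed to be a plain ring buffer with an arithmetic mean.

```c
#include <stddef.h>
#include <stdint.h>

#define AVG_LEN 10 /* window length; ten matches the description in the thread */

/* Hypothetical helper type for illustration, not taken from linuxptp. */
struct moving_avg {
	int64_t sample[AVG_LEN]; /* ring buffer of the most recent samples */
	int64_t sum;             /* running sum of the valid samples */
	size_t count;            /* number of valid samples, at most AVG_LEN */
	size_t next;             /* slot that the next sample overwrites */
};

static void moving_avg_init(struct moving_avg *m)
{
	size_t i;

	for (i = 0; i < AVG_LEN; i++)
		m->sample[i] = 0;
	m->sum = 0;
	m->count = 0;
	m->next = 0;
}

/* Add one sample and return the mean of the last AVG_LEN samples. */
static int64_t moving_avg_update(struct moving_avg *m, int64_t value)
{
	if (m->count == AVG_LEN)
		m->sum -= m->sample[m->next]; /* drop the oldest sample */
	else
		m->count++;

	m->sample[m->next] = value;
	m->sum += value;
	m->next = (m->next + 1) % AVG_LEN;

	return m->sum / (int64_t)m->count;
}
```

With a full window of steady samples, a single bad sample shifts the reported mean for exactly ten updates (the one that adds it and the nine that follow) and is gone on the next. That is the reasoning behind the observation that a persistent error points at something other than one bad measurement.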