Re: [Linuxptp-users] Possible missing fault
From: Richard C. <ric...@gm...> - 2012-05-07 19:17:23
On Mon, May 07, 2012 at 06:34:51PM +0000, Keller, Jacob E wrote:
> This is using layer2 ethernet, and the peer delay mechanism. The following output
> With a large chunk of similar output snipped out of the middle.
>
> ptp4l[9130]: master offset 0 s2 adj -0 path delay 2246
> ptp4l[9130]: master offset 0 s2 adj -0 path delay 2246
> ptp4l[9130]: master offset 0 s2 adj -0 path delay 2246
> ptp4l[9130]: master offset 0 s2 adj -0 path delay 2246

BTW - Wow, those numbers look great.

...snip...

> ptp4l[9130]: master offset 0 s2 adj -1 path delay 2246
> ptp4l[9130]: recvmsg tx timestamp failed: Resource temporarily unavailable
> ---
> Here, recvmsg doesn't throw a fault? It just keeps going, but clearly
> Something is wrong.
> ---
> ptp4l[9130]: port 1: send peer delay response failed

There is a check missing in port.c for this case: the return value of
process_pdelay_req() is ignored, so a failure there never raises a fault.
This should fix it.

diff --git a/port.c b/port.c
index 4fe32d0..5edc772 100644
--- a/port.c
+++ b/port.c
@@ -1297,7 +1297,8 @@ enum fsm_event port_event(struct port *p, int fd_index)
 		process_delay_req(p, msg);
 		break;
 	case PDELAY_REQ:
-		process_pdelay_req(p, msg);
+		if (process_pdelay_req(p, msg))
+			event = EV_FAULT_DETECTED;
 		break;
 	case PDELAY_RESP:
 		if (process_pdelay_resp(p, msg))

> ptp4l[9130]: master offset 63794 s2 adj +34723 path delay 2314
> ptp4l[9130]: master offset 29063 s2 adj +19130 path delay 2314
> ptp4l[9130]: master offset 9993 s2 adj +8779 path delay 2250
> ptp4l[9130]: master offset 1210 s2 adj +2994 path delay 2250
> ptp4l[9130]: master offset -1782 s2 adj +365 path delay 2247
> ptp4l[9130]: master offset -2148 s2 adj -535 path delay 2246
> ptp4l[9130]: master offset -1612 s2 adj -644 path delay 2246
> ptp4l[9130]: master offset -971 s2 adj -486 path delay 2246
> ptp4l[9130]: master offset -483 s2 adj -290 path delay 2245

Here it has started to recover, obviously.

> Much later on a different error throws a fault and suddenly everything is better. Is that behavior in the
> Middle possibly caused by some faulty state that wasn't cleared? I am not sure, but those path delays
> And values seem incredibly wrong.

The peer path delay is a moving average of ten values, so even if you
get one or a few very wrong values, the bad effect should soon
disappear. In your log, however, the error persists much longer.

> I think it's because somehow one of the sequence numbers got
> Out of order and messed up. I am going to try and take a look at that code and see if I can find the issue,
> But I am wondering if you've seen behavior like this.

I haven't seen that. It does look like either the messages are wrong
(unlikely, but check with Wireshark) or the HW time stamps are getting
associated with the wrong messages.

Good luck,
Richard
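
The remark about the ten-value moving average can be illustrated with a small sketch. This is not the filter code from linuxptp; the names struct mave, mave_init(), mave_add() and MAVE_LEN are hypothetical and chosen only for illustration. The only detail taken from the message is the window length of ten. It shows why a single bad path delay sample should stop affecting the average after ten further samples.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Window length. Ten matches the figure quoted in the message. */
#define MAVE_LEN 10

/* Hypothetical moving-average filter, NOT linuxptp's real implementation. */
struct mave {
	int64_t val[MAVE_LEN]; /* ring buffer of the most recent samples */
	int64_t sum;           /* running sum of the stored samples */
	int index;             /* slot that the next sample overwrites */
	int count;             /* number of valid samples, at most MAVE_LEN */
};

/* Reset the filter to the empty state. */
void mave_init(struct mave *m)
{
	assert(m != NULL);
	memset(m, 0, sizeof(*m));
}

/* Add one sample and return the mean of the last (up to) MAVE_LEN samples. */
int64_t mave_add(struct mave *m, int64_t sample)
{
	assert(m != NULL);
	m->sum -= m->val[m->index]; /* the slot holds zero until the window fills */
	m->val[m->index] = sample;
	m->sum += sample;
	m->index = (m->index + 1) % MAVE_LEN;
	if (m->count < MAVE_LEN)
		m->count++;
	return m->sum / m->count;
}
```

With a full window of 2246 ns samples, one outlier of 102246 ns raises the average to 12246 ns. The average stays there for the next nine good samples and returns to 2246 ns on the tenth, when the outlier leaves the window. An error that lasts much longer than that, as in the quoted log, therefore points at many bad samples in a row rather than a single glitch.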