Re: [Linuxptp-users] clockcheck - need to filter large spurious phase jumps?
From: Keller, J. E <jac...@in...> - 2013-11-05 22:02:32
Hi Rich,

On Tue, 2013-11-05 at 16:26 -0500, Rich Schmidt wrote:
> This is Rich Schmidt, linuxptp newbie.
>
> I am testing linuxptp on this system at the US Naval Observatory:
>
> Supermicro SYS-5015A-EHF-D525 (Atom)
>
> Intel 82547L NICs driver: e1000e version: 2.5.4-NAPI
> firmware-version: 1.9-0
> Debian with kernel 3.12.0-rc
>
> Running:
> Sync PHC to USNO Master Clock via Zyfer Gsync PTP GrandMaster:
> ptp4l -i eth1 -l 7 -s -p /dev/ptp1
>
> Sync CLOCK_REALTIME to PHC:
> phc2sys -s /dev/ptp1 -L 100000000 -l 7 -R 0.25 -O 0
>
> Things seem to work fine for a while, then I get a single large phase
> offset detected by ptp4l. The -L freq limit was an attempt to
> control these offsets, but did not help.
>
> Are these large phase jumps filtered out by ptp4l? It seems not,
> because phc2sys sees them. Or is this some unreliability in the Intel
> 82547L NICs? Is the PHC read failing? Thank you for your thoughts.
>
> Here is a sample. The clock is not being steered by NTP or any other
> program.

Are you sure? I can't think of anything else controlling the clock, but
something is obviously controlling it as seen in the logs.

> Nov 5 18:12:27 pluto ptp4l: [354666.428] master offset 57 s2 freq +34356 path delay 6086
> Nov 5 18:12:29 pluto ptp4l: [354668.428] master offset -139 s2 freq +34266 path delay 6092
> Nov 5 18:12:30 pluto phc2sys: [354669.993] phc offset 4529 s2 freq +8805 delay 4715
> Nov 5 18:12:31 pluto ptp4l: [354670.428] master offset -32 s2 freq +34299 path delay 6092
> Nov 5 18:12:33 pluto ptp4l: [354672.428] master offset 20 s2 freq +34320 path delay 6092
> Nov 5 18:12:34 pluto phc2sys: [354673.993] phc offset 470 s2 freq +7931 delay 4705
> Nov 5 18:12:35 pluto ptp4l: [354674.428] master offset 54 s2 freq +34340 path delay 6095
> Nov 5 18:12:37 pluto ptp4l: [354676.428] master offset -15 s2 freq +34314 path delay 6095
> Nov 5 18:12:38 pluto phc2sys: [354677.993] phc offset -6992 s2 freq +3968 delay 4870
> Nov 5 18:12:39 pluto ptp4l: [354678.428] master offset -19 s2 freq +34309 path delay 6096
> Nov 5 18:12:41 pluto ptp4l: [354680.429] master offset 55 s2 freq +34344 path delay 6096
> Nov 5 18:12:42 pluto phc2sys: [354681.994] phc offset 11326 s2 freq +11945 delay 4715
> Nov 5 18:12:43 pluto ptp4l: [354682.428] master offset -90 s2 freq +34279 path delay 6096
> Nov 5 18:12:45 pluto ptp4l: [354684.429] master offset -49 s2 freq +34286 path delay 6096
> Nov 5 18:12:46 pluto phc2sys: [354685.994] phc offset -70368744182111 s2 freq -500000 delay 4715
> Nov 5 18:12:47 pluto ptp4l: [354686.428] clockcheck: clock jumped forward or running faster than expected!

This pretty much has to be caused by something managing the clock and
making it jump. Possibly your grand master on the other end is doing
something? I can't think of any other reason this would occur. Do you
have the ability to monitor the grand master state and see whether it
jumped?

Since you're doing hardware timestamping, nothing should control the
clock on the device except ptp4l, so even NTP running shouldn't cause an
issue (other than phc2sys trying to interfere with it, but that wouldn't
be in the ptp4l logs).

My gut says the driver is somehow resetting the clock to 0 by accident.
Which driver version are you using? The Debian in-kernel e1000e driver?
Could you try this against the one available on sourceforge.net from our
e1000 project?

This could theoretically be caused by a bug in the driver. Since I am not
part of the e1000e team, I don't know the specifics of that driver. Maybe
it has some logic that resets the register values incorrectly.

You could also check the output of the clock directly by using the ptp
test program provided in the Documentation folder in the kernel source.
You might be able to kill ptp4l in time and check what the ptp device
clock says it is at that point.

Could you show us some of the dmesg output as well? That might indicate
some other issue occurring. I'm not really sure.

Regards,
Jake
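
One detail in the quoted log may help narrow this down: the outlier phc offset is within a few microseconds of a power of two nanoseconds, which would fit a counter or register artifact better than a genuine time step. The following is a minimal Python sketch, not part of linuxptp, that scans phc2sys/ptp4l log lines for large offsets and reports the nearest power of two. The function names, the regular expression, and the 1 ms threshold are all assumptions made for illustration, and the power-of-two match is an observation about the number, not a confirmed diagnosis.

```python
import re

# phc2sys lines copied from the log quoted above.
LOG = """\
Nov 5 18:12:30 pluto phc2sys: [354669.993] phc offset 4529 s2 freq +8805 delay 4715
Nov 5 18:12:34 pluto phc2sys: [354673.993] phc offset 470 s2 freq +7931 delay 4705
Nov 5 18:12:38 pluto phc2sys: [354677.993] phc offset -6992 s2 freq +3968 delay 4870
Nov 5 18:12:42 pluto phc2sys: [354681.994] phc offset 11326 s2 freq +11945 delay 4715
Nov 5 18:12:46 pluto phc2sys: [354685.994] phc offset -70368744182111 s2 freq -500000 delay 4715
"""

# Matches "[<monotonic seconds>] phc offset <ns>" and "[...] master offset <ns>".
OFFSET_RE = re.compile(r"\[(\d+\.\d+)\] (?:phc|master) offset\s+(-?\d+)")


def parse_offsets(text):
    """Return (timestamp_s, offset_ns) pairs found in the log text."""
    return [(float(t), int(o)) for t, o in OFFSET_RE.findall(text)]


def find_jumps(samples, threshold_ns=1_000_000):
    """Return samples whose |offset| exceeds threshold_ns.

    The 1 ms default is an arbitrary illustrative value, not a linuxptp setting.
    """
    return [(t, o) for t, o in samples if abs(o) > threshold_ns]


def nearest_power_of_two(offset_ns):
    """Return (k, residual) such that offset_ns == sign * 2**k + residual,
    where 2**k is the power of two closest to |offset_ns|."""
    mag = abs(offset_ns)
    if mag == 0:
        raise ValueError("offset must be non-zero")
    k = mag.bit_length() - 1          # largest k with 2**k <= mag
    if mag - (1 << k) > (1 << (k + 1)) - mag:
        k += 1                        # the next power of two is closer
    sign = -1 if offset_ns < 0 else 1
    return k, offset_ns - sign * (1 << k)


if __name__ == "__main__":
    for t, offset in find_jumps(parse_offsets(LOG)):
        k, residual = nearest_power_of_two(offset)
        sign = "-" if offset < 0 else "+"
        print(f"[{t}] offset {offset} ns = {sign}2**{k} ns {residual:+d} ns")
```

For the sample above, the only flagged line is the one at 354685.994, and its offset decomposes as -2**46 ns minus 4447 ns. The residual is the same size as the ordinary phc offsets around it, so it may be worth checking whether the driver's handling of the time registers could produce a discontinuity of that size.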