Re: [Linuxptp-users] clockcheck - need to filter large spurious phase jumps?

On Tue, 2013-11-05 at 18:50 -0500, Rich Schmidt wrote:
> Thanks, Jake, 
> Well, the last example you note is a jump of 19.55 hours.
> We would notice that on the GrandMaster. These jumps happen four or
> five times per hour at random times. I am using a newer e1000e driver
> (2.5.4-NAPI, from the Intel site) than the SourceForge driver (uh-oh).
> I will try the SourceForge version next.
> 
> # testptp -c -d /dev/ptp1
> capabilities:
>   599999999 maximum frequency adjustment (ppb)
>   0 programmable alarms
>   0 external time stamp channels
>   0 programmable periodic signals
>   0 pulse per second
> 
> # testptp -g -d /dev/ptp1;date
> clock time: 1383695290.199580197 or Tue Nov  5 23:48:10 2013
> Tue Nov  5 23:48:10 UTC 2013
> # ethtool -T eth1
> 
> Time stamping parameters for eth1:
> Capabilities:
>     hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
>     software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
>     hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
>     software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
>     software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
>     hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
> PTP Hardware Clock: 1
> Hardware Transmit Timestamp Modes:
>     off                   (HWTSTAMP_TX_OFF)
>     on                    (HWTSTAMP_TX_ON)
> Hardware Receive Filter Modes:
>     none                  (HWTSTAMP_FILTER_NONE)
>     all                   (HWTSTAMP_FILTER_ALL)
>     ptpv1-l4-sync         (HWTSTAMP_FILTER_PTP_V1_L4_SYNC)
>     ptpv1-l4-delay-req    (HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ)
>     ptpv2-l4-sync         (HWTSTAMP_FILTER_PTP_V2_L4_SYNC)
>     ptpv2-l4-delay-req    (HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ)
>     ptpv2-l2-sync         (HWTSTAMP_FILTER_PTP_V2_L2_SYNC)
>     ptpv2-l2-delay-req    (HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ)
>     ptpv2-event           (HWTSTAMP_FILTER_PTP_V2_EVENT)
>     ptpv2-sync            (HWTSTAMP_FILTER_PTP_V2_SYNC)
>     ptpv2-delay-req       (HWTSTAMP_FILTER_PTP_V2_DELAY_REQ)
> 
> 
> My real question is: can the clock_sanity check in linuxptp filter out
> very large offsets, say those more than 3 s.d. from the mean?
> 
> 

If the Grand Master itself is having these jumps, there's no way to
filter them: the remote clock is simply being stepped, and what we do in
that case is reset the servo and then relock on the new target. The log
looks like it contains an outlier that should be filtered out, but in
reality the values you see are the time offset from the master, not the
current time.
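
To make the question concrete, a filter of the kind Rich describes
(rejecting offsets more than 3 standard deviations from the recent mean)
could look roughly like the sketch below. This is not linuxptp code:
filter_offsets, the window size, and the threshold are hypothetical names
and values chosen for illustration. The input is the set of phc2sys
offsets from the quoted log.

```python
from collections import deque
from statistics import mean, stdev


def filter_offsets(offsets, window=8, n_sigma=3.0):
    """Yield (offset, accepted) pairs.

    Hypothetical helper for illustration only; not part of linuxptp.
    """
    history = deque(maxlen=window)
    for off in offsets:
        accepted = True
        # Only judge a sample once a few accepted samples exist.
        if len(history) >= 3:
            mu = mean(history)
            sigma = stdev(history)
            # Reject samples more than n_sigma deviations from the recent mean.
            if sigma > 0 and abs(off - mu) > n_sigma * sigma:
                accepted = False
        if accepted:
            history.append(off)
        yield off, accepted


# phc2sys "phc offset" values (ns) taken from the quoted log
samples = [4529, 470, -6992, 11326, -70368744182111]
results = list(filter_offsets(samples))
for off, accepted in results:
    print(off, "accepted" if accepted else "rejected")
```

A filter like this only discards the reported sample. If the clock really
was stepped, every later offset is equally large, so the servo still has
to reset and relock.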

That is, the ptp4l log output doesn't do a good job of showing the
current time on the clock, so it isn't obvious whether the clock is
actually moving forward or not. Does this make sense?
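
One way to see whether the clock itself is moving forward is to read the
PHC directly rather than rely on the offsets in the log. The sketch below
assumes Python 3.7 or later on Linux and uses the kernel's dynamic POSIX
clock convention to turn an open /dev/ptpN file descriptor into a clock
id. The helper names (fd_to_clockid, find_jumps, sample_phc), the device
path, and the idea that read-only access is enough for reading the clock
are assumptions for illustration.

```python
import os
import time

CLOCKFD = 3  # low bits used by the kernel for fd-based dynamic clocks


def fd_to_clockid(fd):
    """Map an open /dev/ptpN file descriptor to a dynamic clock id."""
    return (~fd << 3) | CLOCKFD


def find_jumps(samples_ns, max_step_ns):
    """Return indices where a reading goes backward or steps too far forward."""
    return [
        i
        for i in range(1, len(samples_ns))
        if not 0 <= samples_ns[i] - samples_ns[i - 1] <= max_step_ns
    ]


def sample_phc(path="/dev/ptp1", count=10, interval_s=1.0):
    """Read the PHC `count` times, `interval_s` apart; returns nanoseconds."""
    fd = os.open(path, os.O_RDONLY)
    try:
        clk = fd_to_clockid(fd)
        readings = []
        for _ in range(count):
            readings.append(time.clock_gettime_ns(clk))
            time.sleep(interval_s)
        return readings
    finally:
        os.close(fd)
```

For example, find_jumps(sample_phc(), max_step_ns=1_500_000_000) would
list the readings that moved backward or advanced by more than 1.5 s
between one-second samples.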

If the Grand Master is *not* jumping and you are just seeing your local
clock reset, that's a bug.
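
For reference, the arithmetic behind the "19.55 hours" figure can be
checked from the phc2sys offset in the quoted log. The short calculation
below also shows that the magnitude of that offset is within a few
microseconds of 2^46 ns. That is only an arithmetic observation; whether
it points to a counter or register handling problem in the driver is an
assumption that nothing in this thread confirms.

```python
# phc2sys "phc offset" from the quoted log, in nanoseconds
offset_ns = -70368744182111

# Convert the magnitude to hours
hours = abs(offset_ns) / 1e9 / 3600
print(f"{hours:.2f} hours")  # 19.55 hours

# Distance of the magnitude from 2**46 ns
remainder_ns = abs(offset_ns) - 2**46
print(remainder_ns)  # 4447
```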

Thanks,
Jake

> 
> On Tue, Nov 5, 2013 at 5:02 PM, Keller, Jacob E
> <jac...@in...> wrote:
>         Hi Rich,
>         
>         On Tue, 2013-11-05 at 16:26 -0500, Rich Schmidt wrote:
>         > This is Rich Schmidt, linuxptp newbie.
>         >
>         > I am testing linuxptp on this system at the US Naval Observatory:
>         >
>         > Supermicro SYS-5015A-EHF-D525 (Atom)
>         >
>         > Intel 82547L  NICs   driver: e1000e version: 2.5.4-NAPI
>         > firmware-version: 1.9-0
>         > Debian with kernel 3.12.0-rc
>         >
>         >
>         > Running:
>         > Sync PHC to USNO Master Clock via Zyfer Gsync PTP GrandMaster:
>         > ptp4l -i eth1 -l 7 -s -p /dev/ptp1
>         >
>         > Sync CLOCK_REALTIME to PHC:
>         > phc2sys -s /dev/ptp1 -L 100000000 -l 7 -R 0.25 -O 0
>         >
>         >
>         >
>         > Things seem to work fine for a while, then I get a single large
>         > phase offset detected by ptp4l.  The -L freq limit was an attempt
>         > to control these offsets, but did not help.
>         >
>         >
>         > Are these large phase jumps filtered out by ptp4l?  It seems not,
>         > because phc2sys sees them. Or is this some unreliability in the
>         > Intel 82547L NICs?  Is the PHC read failing?   Thank you for your
>         > thoughts.
>         >
>         >
>         >
>         > Here is a sample.  The clock is not being steered by NTP or any
>         > other program.
>         >
>         
>         
>         Are you sure? I can't think of anything else controlling the clock,
>         but something is obviously controlling it, as seen in the logs.
>         
>         > Nov  5 18:12:27 pluto ptp4l: [354666.428] master offset         57 s2 freq  +34356 path delay      6086
>         > Nov  5 18:12:29 pluto ptp4l: [354668.428] master offset       -139 s2 freq  +34266 path delay      6092
>         > Nov  5 18:12:30 pluto phc2sys: [354669.993] phc offset      4529 s2 freq   +8805 delay   4715
>         > Nov  5 18:12:31 pluto ptp4l: [354670.428] master offset        -32 s2 freq  +34299 path delay      6092
>         > Nov  5 18:12:33 pluto ptp4l: [354672.428] master offset         20 s2 freq  +34320 path delay      6092
>         > Nov  5 18:12:34 pluto phc2sys: [354673.993] phc offset       470 s2 freq   +7931 delay   4705
>         > Nov  5 18:12:35 pluto ptp4l: [354674.428] master offset         54 s2 freq  +34340 path delay      6095
>         > Nov  5 18:12:37 pluto ptp4l: [354676.428] master offset        -15 s2 freq  +34314 path delay      6095
>         > Nov  5 18:12:38 pluto phc2sys: [354677.993] phc offset     -6992 s2 freq   +3968 delay   4870
>         > Nov  5 18:12:39 pluto ptp4l: [354678.428] master offset        -19 s2 freq  +34309 path delay      6096
>         > Nov  5 18:12:41 pluto ptp4l: [354680.429] master offset         55 s2 freq  +34344 path delay      6096
>         > Nov  5 18:12:42 pluto phc2sys: [354681.994] phc offset     11326 s2 freq  +11945 delay   4715
>         > Nov  5 18:12:43 pluto ptp4l: [354682.428] master offset        -90 s2 freq  +34279 path delay      6096
>         > Nov  5 18:12:45 pluto ptp4l: [354684.429] master offset        -49 s2 freq  +34286 path delay      6096
>         > Nov  5 18:12:46 pluto phc2sys: [354685.994] phc offset -70368744182111 s2 freq -500000 delay   4715
>         > Nov  5 18:12:47 pluto ptp4l: [354686.428] clockcheck: clock jumped forward or running faster than expected!
>         
>         
>         This should pretty much be caused by something managing the clock
>         and causing a jump. Possibly your grand master on the other end is
>         doing something? I can't think of any other reason this would
>         occur... Do you have the ability to monitor the grand master state
>         and see if it was jumped?
>         
>         Since you're doing hardware timestamping, nothing would control the
>         clock on the device except ptp4l, so even NTP running shouldn't
>         cause an issue (other than phc2sys trying to interfere with it, but
>         that wouldn't be in the ptp4l logs).
>         
>         My gut says the driver is somehow resetting the clock to 0 by
>         accident...
>         
>         What about the driver, what version are you using? The Debian
>         in-kernel e1000e driver? Could you try this against the one
>         available on sourceforge.net from our e1000 project? This could
>         theoretically be caused by a bug in the driver.
>         
>         Since I am not part of the e1000e team, I don't know the specifics
>         of that driver... maybe they have some logic that is resetting the
>         register values incorrectly.
>         
>         You could also check the output of the clock directly by using the
>         ptp test program provided in the Documentation folder in the kernel
>         source. You might be able to kill ptp4l in time and check what the
>         ptp device clock says its value is at that point...
>         
>         Could you show us some of the dmesg output as well? That might
>         indicate some other issue occurring. I'm not really sure.
>         
>         Regards,
>         Jake
>         
>         
> 
> 
