Re: [Linuxptp-users] Need help debugging failed clock synchronization
From: John H. <jhu...@no...> - 2016-03-16 17:20:43
On 03/16/2016 09:45 AM, Ledda William EXT wrote:
> What happens if you use SW time stamping instead of the HW one?

After changing the 'time_stamping' option in /etc/ptp4l.conf from hardware to software and restarting ptp4l, I now see much better behavior. Below is the log output after giving it a little while to settle. (This is still under the 4.5 kernel with the included e1000e driver.)

Mar 16 09:53:50 statler ptp4l[13014]: [6140.391] master offset -5072 s2 freq -21299 path delay 55357
Mar 16 09:53:51 statler ptp4l[13014]: [6141.393] master offset 7692 s2 freq -20015 path delay 55357
Mar 16 09:53:52 statler ptp4l[13014]: [6142.394] master offset -1163 s2 freq -20902 path delay 55357
Mar 16 09:53:53 statler ptp4l[13014]: [6143.396] master offset 5369 s2 freq -20243 path delay 55357
Mar 16 09:53:54 statler ptp4l[13014]: [6144.396] master offset -12270 s2 freq -22019 path delay 55357
Mar 16 09:53:55 statler ptp4l[13014]: [6145.398] master offset -18745 s2 freq -22685 path delay 55357
Mar 16 09:53:56 statler ptp4l[13014]: [6146.399] master offset 7707 s2 freq -20033 path delay 55357
Mar 16 09:53:57 statler ptp4l[13014]: [6147.401] master offset 7230 s2 freq -20073 path delay 55459
Mar 16 09:53:58 statler ptp4l[13014]: [6148.401] master offset 7093 s2 freq -20080 path delay 55459
Mar 16 09:53:59 statler ptp4l[13014]: [6149.403] master offset -1826 s2 freq -20973 path delay 55459
Mar 16 09:54:00 statler ptp4l[13014]: [6150.404] master offset 6597 s2 freq -20124 path delay 55459
Mar 16 09:54:01 statler ptp4l[13014]: [6151.405] master offset 5667 s2 freq -20212 path delay 55459
Mar 16 09:54:02 statler ptp4l[13014]: [6152.406] master offset -14483 s2 freq -22241 path delay 55459

> Can you try compiling and installing manually the driver from Intel?

I believe that I did try the Intel driver but didn't see any success. I found version 3.3.3 of the driver at [3] and followed the instructions in the readme. At the time I was running the 3.10.0-327.10.1 kernel.
The timestamp (see below) on e1000e.ko matches up with when I performed the build, and the file is way bigger (6M as compared to ~780K) than the ko for the older 3.10 and the newer 4.5 kernels. I did an rmmod (which hung my SSH session) and then rebooted the machine (which I assume loaded the new driver). After having done all of that I saw the same "clock jumped forward" messages, ever-growing master offset, and negative path delay. I then moved on to the new kernel.

-rw-r--r-- 1 root root 6.0M Mar 16 07:40 /usr/lib/modules/3.10.0-327.10.1.el7.x86_64/updates/drivers/net/ethernet/intel/e1000e/e1000e.ko
-rw-r--r--. 1 root root 381K Nov 19 15:52 /usr/lib/modules/3.10.0-327.el7.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
-rwxr--r-- 1 root root 377K Mar 14 08:37 /usr/lib/modules/4.5.0-1.el7.elrepo.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko

[3] https://downloadcenter.intel.com/download/15817

>
> William
>
> -----Original Message-----
> From: John Hubbard [mailto:jhu...@no...]
> Sent: 16 March 2016 16:39
> To: lin...@li...
> Subject: Re: [Linuxptp-users] Need help debugging failed clock synchronization
>
> On 03/16/2016 03:50 AM, Richard Cochran wrote:
>> On Tue, Mar 15, 2016 at 04:14:32PM -0700, John Hubbard wrote:
>>> Apologies if this has already been asked and answered. I tried to
>>> look for solutions to my problem in the mailing list archive, but
>>> when I click the list archive link on the mailman page, I get a
>>> sourceforge page telling me Error 403 "Read access required".
>> Yes, SF is mostly broken. Please use gmane for the archives.
>>
>> http://news.gmane.org/gmane.comp.linux.ptp.user
>>
>> http://news.gmane.org/gmane.comp.linux.ptp.devel
> Thanks for the hint. Looking through the archive, it looks like my problem might be similar to Daniel Le's January thread "Master offsets don't converge".
> However, it doesn't look like he ever resolved things, and it also looks like he was using SW time-stamping, whereas I believe my NIC should be capable of HW time-stamping.
>
>>> I'm trying to configure a machine running CentOS 7 (3.10 kernel) with
>>> an Intel 82574L NIC to use PTP as its time source.
>> There are two Linux kernel driver workarounds for that unlucky card:
>>
>> 5e7ff97004 v3.16-rc1 e1000e: 82574/82583 TimeSync errata for SYSTIM read
>> 37b12910dd v4.3-rc1 e1000e: Fix tight loop implementation of systime read algorithm
>>
>> You should try a newer kernel (4.3+) or use the Intel out of tree
>> drivers from SF.
> Thanks for the suggestions. I followed the instructions at [1] and I'm now running with a 4.5 kernel.
>
> [jhubbard@statler:~]$ uname -a
> Linux statler 4.5.0-1.el7.elrepo.x86_64 #1 SMP Mon Mar 14 10:24:58 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> I've disabled phc2sys for now. I tried restarting ptp4l and the log [2] still shows the same "clock jumped forward" errors.
>
> [2]
> [jhubbard@statler:~]$ journalctl -u ptp4l -f
> -- Logs begin at Wed 2016-03-16 07:48:05 MST. --
> Mar 16 08:15:33 statler ptp4l[12591]: [242.851] port 0: INITIALIZING to LISTENING on INITIALIZE
> Mar 16 08:15:32 statler systemd[1]: Stopping Precision Time Protocol (PTP) service...
> Mar 16 08:15:33 statler systemd[1]: Started Precision Time Protocol (PTP) service.
> Mar 16 08:15:33 statler systemd[1]: Starting Precision Time Protocol (PTP) service...
> Mar 16 08:15:33 statler ptp4l[12591]: [243.204] port 1: new foreign master 000cec.fffe.080c09-1
> Mar 16 08:15:37 statler ptp4l[12591]: [247.209] selected best master clock 000cec.fffe.080c09
> Mar 16 08:15:37 statler ptp4l[12591]: [247.209] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
> Mar 16 08:15:37 statler ptp4l[12591]: [247.279] port 1: minimum delay request interval 2^4
> Mar 16 08:15:39 statler ptp4l[12591]: [249.211] master offset -16769399087 s0 freq +23999998 path delay -1116866908
> Mar 16 08:15:40 statler ptp4l[12591]: [250.213] master offset -13924642727 s1 freq +23999999 path delay -1116866908
> Mar 16 08:15:41 statler ptp4l[12591]: [251.214] master offset 2750049109 s2 freq +23999999 path delay -1116866908
> Mar 16 08:15:41 statler ptp4l[12591]: [251.214] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
> Mar 16 08:15:42 statler ptp4l[12591]: [252.215] clockcheck: clock jumped forward or running faster than expected!
> Mar 16 08:15:42 statler ptp4l[12591]: [252.215] master offset 5502378494 s0 freq +23999999 path delay -1116866908
>
> Messages continue with alternating "clockcheck: clock jumped" and "master offset" messages. The freq is fixed, the master offset counts slowly upwards, and the path delay remains negative with the occasional small fluctuations.
>
> [1]
> http://linuxg.net/install-kernel-4-x-on-enterprise-linux-7-centos-7-and-rhel-7/
>
>

--
-john

To be or not to be, that is the question
2b || !2b
(0b10)*(0b1100010) || !(0b10)*(0b1100010)
0b11000100 || !0b11000100
0b11000100 || 0b00111011
0b11111111
255, that is the answer.
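
To compare the software and hardware time-stamping runs quoted in this message by numbers rather than by eye, the "master offset" lines can be parsed with a short script. What follows is only a sketch in Python. The names parse_ptp4l and summarize are made up for illustration and are not part of linuxptp. The regular expression assumes the "master offset ... s<N> freq ... path delay ..." layout seen in the logs above, and the unit suffixes (offset and path delay in nanoseconds, freq in ppb) are an assumption about how ptp4l reports these values.

```python
import re
import statistics

# Assumed layout of a ptp4l servo line, taken from the logs in this thread:
#   "... master offset -5072 s2 freq -21299 path delay 55357"
LINE_RE = re.compile(
    r"master offset\s+(-?\d+)\s+s(\d)\s+freq\s+([+-]?\d+)\s+path delay\s+(-?\d+)"
)


def parse_ptp4l(lines):
    """Return (offset_ns, servo_state, freq_ppb, path_delay_ns) per servo line.

    Lines that do not match (port state changes, clockcheck warnings,
    systemd messages) are skipped.
    """
    samples = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            offset, state, freq, delay = m.groups()
            samples.append((int(offset), int(state), int(freq), int(delay)))
    return samples


def summarize(samples):
    """Summarize parsed samples; a negative path delay is flagged as suspicious."""
    if not samples:
        return {"count": 0}
    offsets = [s[0] for s in samples]
    delays = [s[3] for s in samples]
    return {
        "count": len(samples),
        "max_abs_offset_ns": max(abs(o) for o in offsets),
        "mean_offset_ns": statistics.mean(offsets),
        "negative_path_delay": any(d < 0 for d in delays),
    }


# Two lines from the software time-stamping run and two from the hardware run.
sw_log = [
    "Mar 16 09:53:50 statler ptp4l[13014]: [6140.391] master offset -5072 s2 freq -21299 path delay 55357",
    "Mar 16 09:53:51 statler ptp4l[13014]: [6141.393] master offset 7692 s2 freq -20015 path delay 55357",
]
hw_log = [
    "Mar 16 08:15:39 statler ptp4l[12591]: [249.211] master offset -16769399087 s0 freq +23999998 path delay -1116866908",
    "Mar 16 08:15:42 statler ptp4l[12591]: [252.215] clockcheck: clock jumped forward or running faster than expected!",
]

print("software:", summarize(parse_ptp4l(sw_log)))
print("hardware:", summarize(parse_ptp4l(hw_log)))
```

Feeding it the full output of journalctl -u ptp4l instead of the hand-copied lines should work the same way, since non-matching lines are ignored.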