Thread: [Linuxptp-users] Need help debugging failed clock synchronization
PTP IEEE 1588 stack for Linux
From: John H. <jhu...@no...> - 2016-03-15 23:14:40
|
Apologies if this has already been asked and answered. I tried to look for solutions to my problem in the mailing list archive, but when I click the list archive link on the mailman page, I get a SourceForge page telling me Error 403 "Read access required".

I'm trying to configure a machine running CentOS 7 (3.10 kernel) with an Intel 82574L NIC to use PTP as its time source. I was able to do this successfully with another CentOS 7 machine (Intel i350 NIC), but I'm having problems with this new system. In both cases the PTP master is a Spectracom SecureSync PTP grandmaster.

I've followed Red Hat's directions [1] for configuring PTP. My ptp4l options are "-f /etc/ptp4l.conf -i eno1 -A" and my phc2sys options are "-a -r -u 60". My ptp4l.conf file is the CentOS 7 default and is the same on both systems. I can supply it if you think it'll be useful.

The master is connected to the problem machine through a non-boundary switch, specifically an HP ProCurve 2910al-24g. The other machine is connected through that same switch plus a non-boundary Cisco switch and at least two or three more switches of unknown manufacturers.

My log shows two repeating ptp4l log messages [2], with the master offset counting slowly upwards. The path delay is fairly stable but always negative. What does a negative path delay mean? The message about the clock jump: is it saying that the PTP master clock has jumped forward or is running fast, or is it referring to the system clock or a hardware clock?

Overall, does anyone have any suggestions for what might be wrong? FWIW, [3] shows the phc2sys log messages.

Thanks in advance

[1] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/ch-Configuring_PTP_Using_ptp4l.html

[2]
Mar 15 15:35:47 statler ptp4l[2628]: [2582.823] clockcheck: clock jumped forward or running faster than expected!
Mar 15 15:37:37 statler ptp4l[2628]: [2693.041] master offset 993697857563 s0 freq +23999999 path delay -713598018

[3]
Mar 15 15:31:22 statler systemd[1]: Started Synchronize system clock or PTP hardware clock (PHC).
Mar 15 15:31:33 statler phc2sys[773]: [2332.991] port 002590.fffe.a1f6a1-1 changed state
Mar 15 15:31:33 statler phc2sys[773]: [2332.991] reconfiguring after port state change
Mar 15 15:31:33 statler phc2sys[773]: [2332.991] selecting CLOCK_REALTIME for synchronization
Mar 15 15:31:33 statler phc2sys[773]: [2332.991] selecting eno1 as the master clock
Mar 15 15:31:38 statler phc2sys[773]: [2333.991] port 002590.fffe.a1f6a1-1 changed state
Mar 15 15:31:38 statler phc2sys[773]: [2333.991] reconfiguring after port state change
Mar 15 15:31:38 statler phc2sys[773]: [2333.991] master clock not ready, waiting...

--
-john

To be or not to be, that is the question
2b || !2b
(0b10)*(0b1100010) || !(0b10)*(0b1100010)
0b11000100 || !0b11000100
0b11000100 || 0b00111011
0b11111111
255, that is the answer. |
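On the question of what a negative path delay means, a small worked example may help. The sketch below uses the textbook delay request-response arithmetic from IEEE 1588; it is an illustration only, not ptp4l's actual code (ptp4l also filters the delay), and all timestamps are made-up values. It shows that a physically impossible negative delay falls out of the formula as soon as the slave-side timestamps are inconsistent, for instance if the slave clock leaps forward between receiving Sync and sending Delay_Req.

```python
def e2e_delay_and_offset(t1, t2, t3, t4):
    """Return (mean_path_delay, offset_from_master) in nanoseconds.

    t1: master sends Sync          (master clock)
    t2: slave receives Sync        (slave clock)
    t3: slave sends Delay_Req      (slave clock)
    t4: master receives Delay_Req  (master clock)
    """
    master_to_slave = t2 - t1
    slave_to_master = t4 - t3
    delay = (master_to_slave + slave_to_master) // 2
    offset = (master_to_slave - slave_to_master) // 2
    return delay, offset


# Hypothetical healthy exchange: 50 us one-way delay, slave 1 us ahead.
print(e2e_delay_and_offset(0, 51_000, 101_000, 150_000))
# -> (50000, 1000)

# Same exchange, but the slave clock leaps 2 s forward between t2 and t3.
jump = 2_000_000_000
print(e2e_delay_and_offset(0, 51_000, 101_000 + jump, 150_000))
# -> (-999950000, 1000001000)
```

A large negative delay together with a large offset therefore points at the timestamps themselves (clock or driver misbehaviour) rather than at the network path.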
From: Ledda W. E. <Wil...@it...> - 2016-03-16 08:20:47
|
Hello,

I remember that there was some discussion about the "clock jumped forward or running faster than expected" message. It could be a problem in the driver. AFAIK the Intel i350 uses the igb driver, while the 82574L uses e1000.

So, which driver are you using (run "ethtool -i <interface>" as root)? Have you tried running ptp4l without phc2sys?

William |
From: Richard C. <ric...@gm...> - 2016-03-16 10:53:22
|
On Wed, Mar 16, 2016 at 08:20:33AM +0000, Ledda William EXT wrote:
> Hello,
> I remember that there were some discussion on the “clock jumped
> forward or running faster than expected”. It could be a problem in
> the driver. AFAIK Intel i350 uses igb driver, while 82574L e1000.

The 82574 needs the e1000e driver.

> So… Which driver are you using (run “ethtool -i <interface>” as
> root)? Have you tried to run ptp4l without phc2sys?

Right, phc2sys depends on ptp4l working properly. So, to debug your new HW, first run ptp4l all by itself.

Thanks,
Richard |
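The driver check discussed above can be scripted. The following is a minimal sketch that parses the "key: value" lines printed by "ethtool -i"; the helper names are hypothetical, and the sample text is illustrative (the driver name and version match what is reported later in the thread for the 4.5 kernel, the bus address is invented).

```python
import subprocess


def parse_ethtool_info(text):
    """Turn the 'key: value' lines of `ethtool -i` output into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as bus addresses contain colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(interface):
    """Run `ethtool -i <interface>` and parse its output (needs ethtool installed)."""
    result = subprocess.run(
        ["ethtool", "-i", interface], capture_output=True, text=True, check=True
    )
    return parse_ethtool_info(result.stdout)


# Illustrative sample output, not captured from the machine in this thread.
SAMPLE = """\
driver: e1000e
version: 3.2.6-k
bus-info: 0000:02:00.0
"""

print(parse_ethtool_info(SAMPLE)["driver"])
# -> e1000e
```

On a real system, driver_info("eno1")["driver"] would be expected to report e1000e for an 82574L.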
From: Richard C. <ric...@gm...> - 2016-03-16 10:50:31
|
On Tue, Mar 15, 2016 at 04:14:32PM -0700, John Hubbard wrote:
> Apologies if this has already been asked and answered. I tried to look for
> solutions to my problem in the mailing list archive, but when I click the
> list archive link on the mailman page, I get a sourceforge page telling me
> Error 403 "Read access required".

Yes, SF is mostly broken. Please use gmane for the archives.

http://news.gmane.org/gmane.comp.linux.ptp.user
http://news.gmane.org/gmane.comp.linux.ptp.devel

> I'm trying to configure a machine running CentOS 7 (3.10 kernel) with an
> Intel 82574L NIC to use PTP as its time source.

There are two Linux kernel driver workarounds for that unlucky card:

5e7ff97004 v3.16-rc1 e1000e: 82574/82583 TimeSync errata for SYSTIM read
37b12910dd v4.3-rc1 e1000e: Fix tight loop implementation of systime read algorithm

You should try a newer kernel (4.3+) or use the Intel out of tree drivers from SF.

Thanks,
Richard |
From: John H. <jhu...@no...> - 2016-03-16 15:38:55
|
On 03/16/2016 03:50 AM, Richard Cochran wrote:
> On Tue, Mar 15, 2016 at 04:14:32PM -0700, John Hubbard wrote:
>> Apologies if this has already been asked and answered. I tried to look for
>> solutions to my problem in the mailing list archive, but when I click the
>> list archive link on the mailman page, I get a sourceforge page telling me
>> Error 403 "Read access required".
> Yes, SF is mostly broken. Please use gmane for the archives.
>
> http://news.gmane.org/gmane.comp.linux.ptp.user
>
> http://news.gmane.org/gmane.comp.linux.ptp.devel

Thanks for the hint. Looking through the archive, it looks like my problem might be similar to Daniel Le's January thread "Master offsets don't converge". However, it doesn't look like he ever resolved things, and it also looks like he was using SW time-stamping, whereas I believe my NIC should be capable of HW time-stamping.

>> I'm trying to configure a machine running CentOS 7 (3.10 kernel) with an
>> Intel 82574L NIC to use PTP as its time source.
> There are two Linux kernel driver workarounds for that unlucky card:
>
> 5e7ff97004 v3.16-rc1 e1000e: 82574/82583 TimeSync errata for SYSTIM read
> 37b12910dd v4.3-rc1 e1000e: Fix tight loop implementation of systime read algorithm
>
> You should try a newer kernel (4.3+) or use the Intel out of tree
> drivers from SF.

Thanks for the suggestions. I followed the instructions at [1] and I'm now running a 4.5 kernel.

[jhubbard@statler:~]$ uname -a
Linux statler 4.5.0-1.el7.elrepo.x86_64 #1 SMP Mon Mar 14 10:24:58 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

I've disabled phc2sys for now. I tried restarting ptp4l, and the log [2] still shows the same "clock jumped forward" errors.

[2]
[jhubbard@statler:~]$ journalctl -u ptp4l -f
-- Logs begin at Wed 2016-03-16 07:48:05 MST. --
Mar 16 08:15:33 statler ptp4l[12591]: [242.851] port 0: INITIALIZING to LISTENING on INITIALIZE
Mar 16 08:15:32 statler systemd[1]: Stopping Precision Time Protocol (PTP) service...
Mar 16 08:15:33 statler systemd[1]: Started Precision Time Protocol (PTP) service.
Mar 16 08:15:33 statler systemd[1]: Starting Precision Time Protocol (PTP) service...
Mar 16 08:15:33 statler ptp4l[12591]: [243.204] port 1: new foreign master 000cec.fffe.080c09-1
Mar 16 08:15:37 statler ptp4l[12591]: [247.209] selected best master clock 000cec.fffe.080c09
Mar 16 08:15:37 statler ptp4l[12591]: [247.209] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
Mar 16 08:15:37 statler ptp4l[12591]: [247.279] port 1: minimum delay request interval 2^4
Mar 16 08:15:39 statler ptp4l[12591]: [249.211] master offset -16769399087 s0 freq +23999998 path delay -1116866908
Mar 16 08:15:40 statler ptp4l[12591]: [250.213] master offset -13924642727 s1 freq +23999999 path delay -1116866908
Mar 16 08:15:41 statler ptp4l[12591]: [251.214] master offset 2750049109 s2 freq +23999999 path delay -1116866908
Mar 16 08:15:41 statler ptp4l[12591]: [251.214] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
Mar 16 08:15:42 statler ptp4l[12591]: [252.215] clockcheck: clock jumped forward or running faster than expected!
Mar 16 08:15:42 statler ptp4l[12591]: [252.215] master offset 5502378494 s0 freq +23999999 path delay -1116866908

Messages continue, alternating between "clockcheck: clock jumped" and "master offset". The freq is fixed, the master offset counts slowly upwards, and the path delay remains negative with occasional small fluctuations.

[1] http://linuxg.net/install-kernel-4-x-on-enterprise-linux-7-centos-7-and-rhel-7/

--
-john |
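For anyone sifting through long ptp4l logs like the one above, the "master offset" lines are regular enough to parse mechanically. This is a rough sketch, assuming the line layout seen in this thread (offset in ns, servo state s0/s1/s2, frequency adjustment in ppb, path delay in ns); the function and field names are invented for illustration.

```python
import re

# Matches e.g. "master offset 5502378494 s0 freq +23999999 path delay -1116866908"
OFFSET_RE = re.compile(
    r"master offset\s+(-?\d+)\s+(s\d)\s+freq\s+([+-]?\d+)\s+path delay\s+(-?\d+)"
)


def parse_offset_line(line):
    """Return the fields of a ptp4l 'master offset' line, or None for other lines."""
    match = OFFSET_RE.search(line)
    if match is None:
        return None
    return {
        "offset_ns": int(match.group(1)),
        "servo_state": match.group(2),
        "freq_ppb": int(match.group(3)),
        "path_delay_ns": int(match.group(4)),
    }


def looks_broken(sample):
    """Flag samples that cannot be physically right: a negative path delay."""
    return sample["path_delay_ns"] < 0


line = ("Mar 16 08:15:42 statler ptp4l[12591]: [252.215] master offset "
        "5502378494 s0 freq +23999999 path delay -1116866908")
sample = parse_offset_line(line)
print(sample["offset_ns"], sample["servo_state"], looks_broken(sample))
# -> 5502378494 s0 True
```

Feeding every line of the journal through parse_offset_line and discarding the None results gives a series that can be plotted or summarized.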
From: Ledda W. E. <Wil...@it...> - 2016-03-16 16:46:11
|
What happens if you use SW time stamping instead of HW time stamping? Can you try manually compiling and installing the driver from Intel?

William |
From: John H. <jhu...@no...> - 2016-03-16 17:20:43
|
On 03/16/2016 09:45 AM, Ledda William EXT wrote:
> What happens if you use SW time stamping instead of the HW one?

After changing the 'time_stamping' option in /etc/ptp4l.conf from hardware to software and restarting ptp4l, I now see much better behavior. Below is the log output after giving it a little while to settle. (This is still under the 4.5 kernel with the included e1000e driver.)

Mar 16 09:53:50 statler ptp4l[13014]: [6140.391] master offset -5072 s2 freq -21299 path delay 55357
Mar 16 09:53:51 statler ptp4l[13014]: [6141.393] master offset 7692 s2 freq -20015 path delay 55357
Mar 16 09:53:52 statler ptp4l[13014]: [6142.394] master offset -1163 s2 freq -20902 path delay 55357
Mar 16 09:53:53 statler ptp4l[13014]: [6143.396] master offset 5369 s2 freq -20243 path delay 55357
Mar 16 09:53:54 statler ptp4l[13014]: [6144.396] master offset -12270 s2 freq -22019 path delay 55357
Mar 16 09:53:55 statler ptp4l[13014]: [6145.398] master offset -18745 s2 freq -22685 path delay 55357
Mar 16 09:53:56 statler ptp4l[13014]: [6146.399] master offset 7707 s2 freq -20033 path delay 55357
Mar 16 09:53:57 statler ptp4l[13014]: [6147.401] master offset 7230 s2 freq -20073 path delay 55459
Mar 16 09:53:58 statler ptp4l[13014]: [6148.401] master offset 7093 s2 freq -20080 path delay 55459
Mar 16 09:53:59 statler ptp4l[13014]: [6149.403] master offset -1826 s2 freq -20973 path delay 55459
Mar 16 09:54:00 statler ptp4l[13014]: [6150.404] master offset 6597 s2 freq -20124 path delay 55459
Mar 16 09:54:01 statler ptp4l[13014]: [6151.405] master offset 5667 s2 freq -20212 path delay 55459
Mar 16 09:54:02 statler ptp4l[13014]: [6152.406] master offset -14483 s2 freq -22241 path delay 55459

> Can you try compiling and installing manually the driver from Intel?

I believe that I did try the Intel driver but didn't see any success. I found version 3.3.3 of the driver at [3] and followed the instructions in the readme. At the time I was running the 3.10.0-327.10.1 kernel. The timestamp (see below) on e1000e.ko matches up with when I performed the build, and the file is way bigger (6M as compared to ~780K) than the ko for the older 3.10 and the newer 4.5 kernels. I did an rmmod (which hung my SSH session) and then rebooted the machine (which I assume loaded the new driver). After having done all of that, I saw the same "clock jumped forward" messages, ever-growing master offset, and negative path delay. I then moved on to the new kernel.

-rw-r--r-- 1 root root 6.0M Mar 16 07:40 /usr/lib/modules/3.10.0-327.10.1.el7.x86_64/updates/drivers/net/ethernet/intel/e1000e/e1000e.ko
-rw-r--r--. 1 root root 381K Nov 19 15:52 /usr/lib/modules/3.10.0-327.el7.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
-rwxr--r-- 1 root root 377K Mar 14 08:37 /usr/lib/modules/4.5.0-1.el7.elrepo.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko

[3] https://downloadcenter.intel.com/download/15817

--
-john |
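To put a number on how the software time stamping run above behaves, the thirteen "master offset" values from that log can be summarized directly. This is only a rough sketch over a very short sample; the summary fields chosen here are an editorial assumption, not something ptp4l reports.

```python
import statistics

# Master offsets (ns) copied from the software time stamping log in this thread.
offsets_ns = [
    -5072, 7692, -1163, 5369, -12270, -18745, 7707,
    7230, 7093, -1826, 6597, 5667, -14483,
]


def summarize(offsets):
    """Basic spread statistics for a series of master offsets in nanoseconds."""
    return {
        "mean": statistics.mean(offsets),
        "stdev": statistics.stdev(offsets),
        "peak_to_peak": max(offsets) - min(offsets),
        "max_abs": max(abs(o) for o in offsets),
    }


summary = summarize(offsets_ns)
print(summary["peak_to_peak"], summary["max_abs"])
# -> 26452 18745
```

So in this sample the offset swings over roughly 26 us peak to peak, which gives a baseline to compare against if hardware time stamping is ever made to work.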
From: Richard C. <ric...@gm...> - 2016-03-16 19:54:13
|
On Wed, Mar 16, 2016 at 10:20:35AM -0700, John Hubbard wrote:
> After changing the 'time_stamping' option in /etc/ptp4l.conf from
> hardware to software and restarting ptp4l I now see much better
> behavior.

Yes, but probably you are disappointed having to forego the HW synchronization performance. At least this test shows that your card most likely has a HW bug.

> I believe that I did try the Intel driver but didn't see any success. I
> found version 3.3.3 of the driver at [3], followed the instructions in
> the readme. At the time I was running the 3.10.0-327.10.1 kernel. The
> timestamp (see below) on e1000e.ko matches up with when I performed the
> build, and the file size is way bigger (6M as compared to ~780K) for the
> ko on the older 3.10 and the newer 4.5 kernels. I did an rmmod (which
> hung my SSH session) I then rebooted the machine (which I assume loaded
> the new driver).

I wouldn't assume that. Either do rmmod/insmod by hand (on the console!) or simply rename or move the original driver before rebooting.

Thanks,
Richard |
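One way to avoid assuming which driver build is actually loaded is to ask the running kernel. The sketch below reads the version string that a loaded module may export under /sys/module/<name>/version; whether a given e1000e build exports one is an assumption, and the helper name is hypothetical.

```python
from pathlib import Path


def loaded_module_version(module, sysfs_root="/sys/module"):
    """Return the version string of a currently loaded module, or None.

    None means the module is not loaded or does not export a version file.
    """
    version_file = Path(sysfs_root) / module / "version"
    try:
        return version_file.read_text().strip()
    except OSError:
        return None


print(loaded_module_version("e1000e"))
```

If the printed version is not the one from the freshly built driver (3.3.3-NAPI in this thread), the old module is still the one in use.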
From: John H. <jhu...@no...> - 2016-03-16 20:45:09
|
On 03/16/2016 12:54 PM, Richard Cochran wrote:
> On Wed, Mar 16, 2016 at 10:20:35AM -0700, John Hubbard wrote:
>> After changing the 'time_stamping' option in /etc/ptp4l.conf from
>> hardware to software and restarting ptp4l I now see much better
>> behavior.
> Yes, but probably you are disappointed having to forego the HW
> synchronization performance. At least this test shows that your card
> most likely has a HW bug.

If possible, it would be really nice to get HW time-stamping working on this system. I can move to another system if needed, but getting this working would help me in the short term. (No expansion ports, or I'd just pick up another NIC.) On a related note, do you or anyone else on the list know how well the Intel X540 (10Gb NIC using the ixgbe driver) is supported WRT ptp4l?

>> I believe that I did try the Intel driver but didn't see any success. I
>> found version 3.3.3 of the driver at [3], followed the instructions in
>> the readme. At the time I was running the 3.10.0-327.10.1 kernel. The
>> timestamp (see below) on e1000e.ko matches up with when I performed the
>> build, and the file size is way bigger (6M as compared to ~780K) for the
>> ko on the older 3.10 and the newer 4.5 kernels. I did an rmmod (which
>> hung my SSH session) I then rebooted the machine (which I assume loaded
>> the new driver).
> I wouldn't assume that. Either do rmmod/insmod by hand (on the
> console!) or simply rename or move the original driver before
> rebooting.

OK, the machine has three kernels installed. Here's the e1000e driver version (as reported by modinfo) for each:

Kernel 3.10.0-327        e1000e version 3.2.5-k
Kernel 3.10.0-327.10.1   e1000e version 3.3.3-NAPI
Kernel 4.5.0-1           e1000e version 3.2.6-k

Under all three kernels, software time stamping 'works' but with more jitter than I'd like to see. Hardware time stamping doesn't work: I see "clock jumped forward" messages and an ever-increasing master offset.

--
-john |
From: Richard C. <ric...@gm...> - 2016-03-16 21:56:55
|
On Wed, Mar 16, 2016 at 01:45:00PM -0700, John Hubbard wrote:
> If possible it would be really nice to get the HW time-stamping working on
> this system. I can move to another system if needed but getting this
> working would help me in the short term.

The best advice I know would be to take the testptp program (from linux/Documentation/ptp) and verify whether the HW clock is behaving or not. For example, set time, get time, get time in loop, set a frequency offset and compare interval with system time, etc.

Thanks,
Richard |
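The "compare interval with system time" idea can also be tried without testptp. What follows is a loose sketch, not a replacement for testptp: it assumes the PHC is exposed as /dev/ptp0, that read-only access is enough to read the clock, and that Python's time.clock_gettime_ns accepts the dynamic clock id that the kernel derives from an open file descriptor (the same arithmetic as the FD_TO_CLOCKID macro in testptp.c). The function names are made up for this example.

```python
import os
import time

CLOCKFD = 3  # constant used by the kernel's fd-to-clockid convention


def fd_to_clockid(fd):
    """Map an open /dev/ptpN file descriptor to a dynamic POSIX clock id."""
    return (~fd << 3) | CLOCKFD


def rate_error_ppb(first, last):
    """Rate error of the PHC relative to the system clock, in parts per billion.

    first and last are (system_ns, phc_ns) pairs taken some seconds apart.
    """
    (sys0, phc0), (sys1, phc1) = first, last
    sys_elapsed = sys1 - sys0
    phc_elapsed = phc1 - phc0
    return (phc_elapsed - sys_elapsed) * 1_000_000_000 // sys_elapsed


def sample(clock_id):
    """Read the system monotonic clock and the PHC back to back."""
    return time.clock_gettime_ns(time.CLOCK_MONOTONIC), time.clock_gettime_ns(clock_id)


def measure(device="/dev/ptp0", seconds=10):
    """Measure how fast the PHC runs relative to the system clock."""
    fd = os.open(device, os.O_RDONLY)
    try:
        clock_id = fd_to_clockid(fd)
        first = sample(clock_id)
        time.sleep(seconds)
        last = sample(clock_id)
    finally:
        os.close(fd)
    return rate_error_ppb(first, last)
```

With ptp4l and phc2sys stopped, calling measure() as root should return a value close to whatever frequency offset has been set on the clock; a wildly different result, or a value that changes a lot between runs, would suggest the hardware clock readout is misbehaving.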
From: Keller, J. E <jac...@in...> - 2016-03-22 22:36:07
|
Hi John,

On Wed, 2016-03-16 at 13:45 -0700, John Hubbard wrote:
> If possible it would be really nice to get the HW time-stamping working
> on this system. I can move to another system if needed but getting this
> working would help me in the short term. (No expansion ports or I'd
> just pick up another NIC. On a related note do you or anyone else on
> the list know how well the Intel X540 (10Gb NIC using the ixgbe driver)
> is supported WRT ptp4l?

The X540 device should be supported WRT ptp4l, and as far as I know it works quite well. I am sorry for the troubles the e1000e adapter is causing. It is most likely a driver issue. I am not 100% sure who is responsible for that driver now, but I will attempt to determine if the latest errata have been released on SourceForge yet. (It can be slow sometimes.)

> OK the machine has got three kernels installed. Here's the e1000e
> driver version (as reported by modinfo) for each:
>
> Kernel 3.10.0-327 e1000e version 3.2.5-k
> Kernel 3.10.0-327.10.1 e1000e version 3.3.3-NAPI
> Kernel 4.5.0-1 e1000e version 3.2.6-k
>
> Under all three kernels with software time stamping things 'work' but
> with more jitter than I'd like to see. With hardware time stamping
> things don't work. Specifically I see clock jumped forward messages and
> an ever increasing master offset.

As Richard suggested, I would use the testptp program to check whether you have a weird driver issue or not. It is very likely an issue with the hardware for this part, as there are several errata regarding the SYSTIM clock, as Richard noted earlier.

Regards,
Jake |
From: John H. <jhu...@no...> - 2016-03-23 17:54:55
|
On 03/22/2016 03:35 PM, Keller, Jacob E wrote:
> The X540 device should be supported WRT ptp4l, and as far as I know it
> works quite well. I am sorry for the troubles the e1000e adapter is
> causing. It is most likely a driver issue. I am not 100% sure who is
> responsible for that driver now, but I will attempt to determine if the
> latest errata have been released on SourceForge yet. (It can be slow
> sometimes)

Thanks for the info about the X540. Please let me know if a newer driver for the 82574L ends up on SF.

--
-john |
From: Keller, J. E <jac...@in...> - 2016-03-23 18:59:22
|
Hi John,

It looks like you should have the latest driver (3.3.3) already. If you could isolate the problem using testptp from the Documentation/ptp folder of the kernel tree, with the SourceForge e1000e driver loaded, and show that it is having issues, then we can get that reported to the team that owns e1000e, and hopefully they can determine what needs to be fixed.

Thanks,
Jake |