Thread: Re: [Linuxptp-users] phc offset explodes after a while
PTP IEEE 1588 stack for Linux
Brought to you by:
rcochran
From: Koehrer M. (ETAS/ESW5) <mat...@et...> - 2014-01-30 15:15:19
Attachments:
ptp-extract.pcap
|
Hi Richard, > I have used the very same kernel 3.2.48 without the RT_PREEMPT patch > using > the same configuration. > Also here, I got the same issue after a while (it took more than 30 minutes). > I added some print-outs to the file clock.c in function clock_synchronize(). > Directly after the c->master_offset = tmv_sub(....) call I printed out the > values that are used in the computation. > Below is the related snippet from the logfile. > There is a jump in the value of "origin" that seems to cause the issue. > I was able to catch the issue again. Also I created a tcpdump log in parallel. Here is the log snippet, the tcpdump file is attached. Regards Mathias ---- BEGIN LOG Jan 30 15:57:34 pca ptp4l: [6636.856] ingress:1391093889615119561 origin:1391093889615118874 path_delay:734 c1:0 c2:0 Jan 30 15:57:34 pca ptp4l: [6636.856] master offset -47 s2 freq -6217 path delay 734 Jan 30 15:57:35 pca phc2sys: [6637.403] phc offset 451 s2 freq +29992 delay 6572 Jan 30 15:57:35 pca ptp4l: [6637.856] ingress:1391093890615137959 origin:1391093890615137194 path_delay:734 c1:0 c2:0 Jan 30 15:57:35 pca ptp4l: [6637.856] master offset 31 s2 freq -6153 path delay 734 Jan 30 15:57:36 pca phc2sys: [6638.403] phc offset -2442 s2 freq +27235 delay 6811 Jan 30 15:57:36 pca ptp4l: [6638.856] ingress:1391093891615150863 origin:1391093891615150154 path_delay:734 c1:0 c2:0 Jan 30 15:57:36 pca ptp4l: [6638.856] master offset -25 s2 freq -6200 path delay 734 Jan 30 15:57:37 pca phc2sys: [6639.403] phc offset 10534 s2 freq +39478 delay 6712 Jan 30 15:57:37 pca ptp4l: [6639.855] Offset is 18446673704965374042 Line 1058 Jan 30 15:57:37 pca ptp4l: [6639.855] ingress:1391093892614699018 origin:1391164261358875858 path_delay:734 c1:0 c2:0 Jan 30 15:57:37 pca ptp4l: [6639.855] master offset -70368744177574 s2 freq -599999999 path delay 734 Jan 30 15:57:38 pca phc2sys: [6640.403] phc offset -329001709 s2 freq -500000 delay 6783 Jan 30 15:57:38 pca ptp4l: [6640.855] Offset is 18446673704965374042 Line 1055 Jan 30 15:57:38 pca ptp4l: [6640.855] Offset is 18446673705563630157 Line 1058 Jan 30 15:57:38 pca ptp4l: [6640.855] ingress:1391093894212433325 origin:1391164262358351738 path_delay:3046 c1:0 c2:0 Jan 30 15:57:38 pca ptp4l: [6640.855] master offset -70368145921459 s2 freq -599999999 path delay 3046 Jan 30 15:57:39 pca phc2sys: [6641.404] phc offset -928285857 s2 freq -500000 delay 6749 Jan 30 15:57:39 pca ptp4l: [6641.854] Offset is 18446673705563630157 Line 1055 Jan 30 15:57:39 pca ptp4l: [6641.854] Offset is 18446673706163311668 Line 1058 ---- END LOG |
From: Vick, M. <mat...@in...> - 2014-01-30 21:36:33
|
On 1/30/14, 12:53 AM, "Koehrer Mathias (ETAS/ESW5)" <mat...@et...> wrote: >Hi all, > >I am very new in using linuxptp. >I want to synchronize two PC using a separate point-to-point Ethernet >connection between both PCs. >This Ethernet connection is used only for PTP, any other network traffic >is routed via another NIC. > >Kernel 3.2.48-rt69 (RT_PREEMPT patch). >I use the Intel Gigabit CT Desktop adapter (82574L). >For this is use the latest e1000e driver (Version 2.5.4) and build it >outside the kernel to have the PTP support. >One of the PCs (PCA) runs using 64 bit kernel, the other PC (PCB) runs >using a 32 bit kernel. >linux-ptp version 1.3. > >On PCA I start the following commands: ># ./ptp4l -i eth1 -p /dev/ptp1 -m -H -P >and after a couple of seconds: ># ./phc2sys -s /dev/ptp1 -w -m > >On PCB I do the very same (unless I have to use /dev/ptp0 here). > >This works really fine for a while. However after a while the value of >phc explodes dramatically. >The value will be -35192325800 which looks like a kind of value overrun! >In hex this is 0xFFFFFFF7CE5FB958. >Please see the logfiles below. >What could be the issue for that?!? > >Thanks for any feedback on this! > >Regards > >Mathias Mathias, Thanks for your report on this! We've had a few reports of this now and are investigating it internally. We currently believe we may have a hardware bug on 82574L where some reads of the device clock are corrupted. We are testing a workaround now and will be submitting the patch upstream once it has passed our internal validation. Cheers, Matthew Matthew Vick Linux Development Networking Division Intel Corporation |
From: Koehrer M. (ETAS/ESW5) <mat...@et...> - 2014-01-31 09:07:46
|
Hi all, thanks for the feedback. I did some new tests... > > Your PCA (64 bit kernel) is the slave, and your PCB (32 bit kernel) is > the master. The source of the problem is in the master. > > 1. Maybe there is another program on host PCB that is resetting the > time of the PTP clock? No, I checked already for that. Without running the ptp tools, there is no time jump at all. > 2. What happens when you run with the roles reversed? > Use ptp4l -s on host PCB. The same behaviour happens - now on the other PC. I think this is no wonder as both are using the same type of NIC. > 3. On host PCB, try the following script with nothing else running. > > testptp -s > while [ 1 ]; do > date; > testptp -g; > sleep 1; > done > > Does the sudden jump in time also occur? The testptp program is > under /Documentation/ptp in the Linux kernel sources. OK, I did this using my original setup. PCB is master, PCA is slave. On PCB I added your proposed commands and logged the output via "logger" to the log file to have all results in a single file per PC. ------- BEGIN PCB (Master) -------- Jan 31 09:55:08 pcb testptp: clock time: 1391158543.964709305 or Fri Jan 31 09:55:43 2014 Jan 31 09:55:09 pcb phc2sys: [5455.412] phc offset -1124 s2 freq -2642 delay 4120 Jan 31 09:55:09 pcb date: Fri Jan 31 09:55:09 CET 2014 Jan 31 09:55:09 pcb testptp: clock time: 1391158544.967732260 or Fri Jan 31 09:55:44 2014 Jan 31 09:55:10 pcb phc2sys: [5456.412] phc offset 6967 s2 freq +5112 delay 4120 Jan 31 09:55:10 pcb date: Fri Jan 31 09:55:10 CET 2014 Jan 31 09:55:10 pcb testptp: clock time: 1391158545.970753621 or Fri Jan 31 09:55:45 2014 Jan 31 09:55:11 pcb phc2sys: [5457.413] phc offset 6980 s2 freq +7215 delay 4119 Jan 31 09:55:11 pcb date: Fri Jan 31 09:55:11 CET 2014 Jan 31 09:55:11 pcb testptp: clock time: 1391158546.973770115 or Fri Jan 31 09:55:46 2014 Jan 31 09:55:12 pcb phc2sys: [5458.413] phc offset -620 s2 freq +1709 delay 4119 Jan 31 09:55:12 pcb date: Fri Jan 31 09:55:12 CET 2014 Jan 31 09:55:12 pcb testptp: clock time: 1391158547.976781979 or Fri Jan 31 09:55:47 2014 Jan 31 09:55:13 pcb phc2sys: [5459.413] phc offset 70368744182224 s2 freq +599999999 delay 4119 Jan 31 09:55:13 pcb date: Fri Jan 31 09:55:13 CET 2014 ************ Here is the jump: Jan 31 09:55:13 pcb testptp: clock time: 1391228917.600794255 or Sat Feb 1 05:28:37 2014 Jan 31 09:55:14 pcb phc2sys: [5460.413] phc offset 70368144122243 s2 freq +599999999 delay 1600 Jan 31 09:55:14 pcb date: Fri Jan 31 09:55:14 CET 2014 Jan 31 09:55:14 pcb testptp: clock time: 1391228918.002014685 or Sat Feb 1 05:28:38 2014 Jan 31 09:55:15 pcb phc2sys: [5461.413] phc offset 70367544038218 s2 freq +599999999 delay 1648 Jan 31 09:55:15 pcb date: Fri Jan 31 09:55:15 CET 2014 Jan 31 09:55:15 pcb testptp: clock time: 1391228918.403224428 or Sat Feb 1 05:28:38 2014 Jan 31 09:55:16 pcb phc2sys: [5462.413] phc offset 70366943954855 s2 freq +599999999 delay 1648 Jan 31 09:55:16 pcb date: Fri Jan 31 09:55:16 CET 2014 Jan 31 09:55:16 pcb testptp: clock time: 1391228918.804436187 or Sat Feb 1 05:28:38 2014 ----------- END PCB -------------- ------------- BEGIN PCA (Slave) ---------------- Jan 31 09:55:10 pca ptp4l: [5467.031] ingress:1391158545769528578 origin:1391158545769530044 path_delay:733 c1:0 c2:0 Jan 31 09:55:10 pca ptp4l: [5467.031] master offset -2199 s2 freq -10448 path delay 733 Jan 31 09:55:11 pca phc2sys: [5467.603] phc offset 4115 s2 freq +31537 delay 6748 Jan 31 09:55:11 pca ptp4l: [5468.031] ingress:1391158546769555014 origin:1391158546769547343 path_delay:734 c1:0 c2:0 Jan 31 09:55:11 pca ptp4l: [5468.031] master offset 6937 s2 freq -1972 path delay 734 Jan 31 09:55:12 pca phc2sys: [5468.604] phc offset -2634 s2 freq +26022 delay 6810 Jan 31 09:55:12 pca ptp4l: [5469.031] ingress:1391158547769564771 origin:1391158547769554271 path_delay:736 c1:0 c2:0 Jan 31 09:55:12 pca ptp4l: [5469.031] master offset 9764 s2 freq +2936 path delay 736 Jan 31 09:55:13 pca phc2sys: [5469.604] phc offset 8630 s2 freq +36496 delay 6763 Jan 31 09:55:13 pca ptp4l: [5470.031] ingress:1391158548769579072 origin:1391158548769576093 path_delay:738 c1:0 c2:0 Jan 31 09:55:13 pca ptp4l: [5470.031] master offset 2241 s2 freq -1658 path delay 738 Jan 31 09:55:14 pca phc2sys: [5470.604] phc offset 12870 s2 freq +43325 delay 6555 ********** Here is the jump: Jan 31 09:55:14 pca ptp4l: [5471.031] ingress:1391158549769582167 origin:1391228917916706058 path_delay:3137 c1:0 c2:0 Jan 31 09:55:14 pca ptp4l: [5471.031] master offset -70368147127028 s2 freq -599999999 path delay 3137 Jan 31 09:55:15 pca phc2sys: [5471.604] phc offset -343848810 s2 freq -500000 delay 6105 Jan 31 09:55:15 pca ptp4l: [5472.031] ingress:1391158551368868373 origin:1391228918316714536 path_delay:7912 c1:0 c2:0 Jan 31 09:55:15 pca ptp4l: [5472.031] master offset -70366947854075 s2 freq -599999999 path delay 7912 Jan 31 09:55:16 pca phc2sys: [5472.604] phc offset -943159536 s2 freq -500000 delay 6752 Jan 31 09:55:16 pca ptp4l: [5473.032] ingress:1391158552968869361 origin:1391228918716717414 path_delay:12701 c1:0 c2:0 Jan 31 09:55:16 pca ptp4l: [5473.032] master offset -70365747860754 s2 freq -599999999 path delay 12701 Jan 31 09:55:17 pca phc2sys: [5473.605] phc offset -1542448802 s2 freq -500000 delay 6090 ------------ END PCA --------------- I wonder why the time presented by the testptp program does not fit to the system time. Here is always a delta of about 35 seconds. I hope that helps to identify the issue. Thanks a lot Mathias |
From: Richard C. <ric...@gm...> - 2014-01-31 11:22:29
|
On Fri, Jan 31, 2014 at 09:07:25AM +0000, Koehrer Mathias (ETAS/ESW5) wrote: > > 2. What happens when you run with the roles reversed? > > Use ptp4l -s on host PCB. > The same behaviour happens - now on the other PC. > I think this is no wonder as both are using the same type of NIC. But you said that you are running a different kernel (32/64 bit), and so this is one variable to check. > I wonder why the time presented by the testptp program does not fit to the system time. Here is always a delta of about 35 seconds. This 35 second difference is the current TAI-UTC offset. You should make sure that your PTP clock is initialized with the TAI timescale, if you want to run as a grand master clock. (But this has nothing to do with the time jump that you are seeing.) > I hope that helps to identify the issue. >From what Matt Vick said, it sounds like there is a hardware issue with this card. Thanks, Richard |
From: Koehrer M. (ETAS/ESW5) <mat...@et...> - 2014-01-31 13:18:23
|
Hi Matthew, thanks for the information. > Thanks for your report on this! We've had a few reports of this now and > are investigating it internally. We currently believe we may have a > hardware bug on 82574L where some reads of the device clock are > corrupted. > We are testing a workaround now and will be submitting the patch upstream > once it has passed our internal validation. Is the Intel IGB (i350 controller) also affected or which Intel adapter to you recommend for PTP? We are fairly free in choosing a suitable adapter. Thanks for a short feedback. Best regards Mathias |
From: Ledda W. E. <Wil...@it...> - 2014-01-31 13:48:32
|
I'm working with i350 and the IGB driver without any problem. I know "very well" your problem because I was getting the same and I was wander about the possibility of a driver/HW issue. But after many days of continuous tests and a lot of debug information collected (igress and origin timestamp from ptp4l, tcpdump for the network) I haven't seen any error due to inconsistent read of the HW clock or something like that, but I instead discovered a problem in a TC in our facility. So I feel to say that i350 and the driver are ok. Regards William -----Original Message----- From: Koehrer Mathias (ETAS/ESW5) [mailto:mat...@et...] Sent: 31 January 2014 14:18 To: Vick, Matthew; lin...@li... Cc: Richard Cochran; Ledda William EXT Subject: Re: [Linuxptp-users] phc offset explodes after a while Hi Matthew, thanks for the information. > Thanks for your report on this! We've had a few reports of this now > and are investigating it internally. We currently believe we may have > a hardware bug on 82574L where some reads of the device clock are > corrupted. > We are testing a workaround now and will be submitting the patch > upstream once it has passed our internal validation. Is the Intel IGB (i350 controller) also affected or which Intel adapter to you recommend for PTP? We are fairly free in choosing a suitable adapter. Thanks for a short feedback. Best regards Mathias |
From: Vick, M. <mat...@in...> - 2014-01-31 18:01:15
|
On 1/31/14, 5:18 AM, "Koehrer Mathias (ETAS/ESW5)" <mat...@et...> wrote: >Hi Matthew, > >thanks for the information. > >> Thanks for your report on this! We've had a few reports of this now and >> are investigating it internally. We currently believe we may have a >> hardware bug on 82574L where some reads of the device clock are >> corrupted. >> We are testing a workaround now and will be submitting the patch >>upstream >> once it has passed our internal validation. > >Is the Intel IGB (i350 controller) also affected or which Intel adapter >to you recommend for PTP? >We are fairly free in choosing a suitable adapter. > >Thanks for a short feedback. > >Best regards > >Mathias Mathias, You should be able to continue using the 82574L once we get the patch pushed upstream, which should be relatively soon. As far as other adapters are concerned, the Intel adapter I would recommend for PTP is the I210. Please do not hesitate to let us know if you continue having problems. Cheers, Matthew |
From: Ledda W. E. <Wil...@it...> - 2014-01-30 16:42:59
|
Are you able to find the same jump of origin in the tcpdump (look for the Follow_up messages)? If yes the problem is in your master clock device. William -----Original Message----- From: Koehrer Mathias (ETAS/ESW5) [mailto:mat...@et...] Sent: 30 January 2014 16:10 To: Richard Cochran Cc: lin...@li... Subject: Re: [Linuxptp-users] phc offset explodes after a while Hi Richard, > I have used the very same kernel 3.2.48 without the RT_PREEMPT patch > using the same configuration. > Also here, I got the same issue after a while (it took more than 30 minutes). > I added some print-outs to the file clock.c in function clock_synchronize(). > Directly after the c->master_offset = tmv_sub(....) call I printed out > the values that are used in the computation. > Below is the related snippet from the logfile. > There is a jump in the value of "origin" that seems to cause the issue. > I was able to catch the issue again. Also I created a tcpdump log in parallel. Here is the log snippet, the tcpdump file is attached. Regards Mathias ---- BEGIN LOG Jan 30 15:57:34 pca ptp4l: [6636.856] ingress:1391093889615119561 origin:1391093889615118874 path_delay:734 c1:0 c2:0 Jan 30 15:57:34 pca ptp4l: [6636.856] master offset -47 s2 freq -6217 path delay 734 Jan 30 15:57:35 pca phc2sys: [6637.403] phc offset 451 s2 freq +29992 delay 6572 Jan 30 15:57:35 pca ptp4l: [6637.856] ingress:1391093890615137959 origin:1391093890615137194 path_delay:734 c1:0 c2:0 Jan 30 15:57:35 pca ptp4l: [6637.856] master offset 31 s2 freq -6153 path delay 734 Jan 30 15:57:36 pca phc2sys: [6638.403] phc offset -2442 s2 freq +27235 delay 6811 Jan 30 15:57:36 pca ptp4l: [6638.856] ingress:1391093891615150863 origin:1391093891615150154 path_delay:734 c1:0 c2:0 Jan 30 15:57:36 pca ptp4l: [6638.856] master offset -25 s2 freq -6200 path delay 734 Jan 30 15:57:37 pca phc2sys: [6639.403] phc offset 10534 s2 freq +39478 delay 6712 Jan 30 15:57:37 pca ptp4l: [6639.855] Offset is 18446673704965374042 Line 1058 Jan 30 15:57:37 pca ptp4l: [6639.855] ingress:1391093892614699018 origin:1391164261358875858 path_delay:734 c1:0 c2:0 Jan 30 15:57:37 pca ptp4l: [6639.855] master offset -70368744177574 s2 freq -599999999 path delay 734 Jan 30 15:57:38 pca phc2sys: [6640.403] phc offset -329001709 s2 freq -500000 delay 6783 Jan 30 15:57:38 pca ptp4l: [6640.855] Offset is 18446673704965374042 Line 1055 Jan 30 15:57:38 pca ptp4l: [6640.855] Offset is 18446673705563630157 Line 1058 Jan 30 15:57:38 pca ptp4l: [6640.855] ingress:1391093894212433325 origin:1391164262358351738 path_delay:3046 c1:0 c2:0 Jan 30 15:57:38 pca ptp4l: [6640.855] master offset -70368145921459 s2 freq -599999999 path delay 3046 Jan 30 15:57:39 pca phc2sys: [6641.404] phc offset -928285857 s2 freq -500000 delay 6749 Jan 30 15:57:39 pca ptp4l: [6641.854] Offset is 18446673705563630157 Line 1055 Jan 30 15:57:39 pca ptp4l: [6641.854] Offset is 18446673706163311668 Line 1058 ---- END LOG |
From: Richard C. <ric...@gm...> - 2014-01-30 17:51:42
|
On Thu, Jan 30, 2014 at 04:22:22PM +0000, Ledda William EXT wrote: > Are you able to find the same jump of origin in the tcpdump (look for the Follow_up messages)? > > If yes the problem is in your master clock device. Yes, in the PCAP file there is a jump in the follow up messages. Packet 28 has seconds = 1391093891. Packet 36 has seconds = 1391164261. The difference is 70370 seconds. That seems to be consistent with your other logs. Your PCA (64 bit kernel) is the slave, and your PCB (32 bit kernel) is the master. The source of the problem is in the master. Here are a few questions to answer to get to the bottom of this: 1. Maybe there is another program on host PCB that is resetting the time of the PTP clock? 2. What happens when you run with the roles reversed? Use ptp4l -s on host PCB. 3. On host PCB, try the following script with nothing else running. testptp -s while [ 1 ]; do date; testptp -g; sleep 1; done Does the sudden jump in time also occur? The testptp program is under /Documentation/ptp in the Linux kernel sources. Thanks, Richard |