Thread: [Linuxptp-users] Loss of sync with 25% of CPU load
PTP IEEE 1588 stack for Linux
From: Diego G. P. <gpr...@un...> - 2021-04-06 10:30:10

Hello everyone,

I have a question that maybe you can help with.

I use a switch as the GM and a board as a slave running ptp4l and phc2sys. They work perfectly, with good offsets (-20 ns to +20 ns), but when I apply CPU load, just 25%, they lose synchronization. Can you help me?

ptp4l[742.892]: rms 4 max 7 freq +16816 +/- 5 delay 235 +/- 0
phc2sys[743.863]: CLOCK_REALTIME phc offset 31 s2 freq +10667 delay 4908
ptp4l[743.893]: rms 3 max 5 freq +16815 +/- 4 delay 235 +/- 0
phc2sys[744.864]: CLOCK_REALTIME phc offset 0 s2 freq +10645 delay 4932
ptp4l[744.893]: rms 3 max 6 freq +16813 +/- 4 delay 234 +/- 0
phc2sys[745.864]: CLOCK_REALTIME phc offset -62 s2 freq +10583 delay 4896
ptp4l[745.894]: rms 3 max 5 freq +16814 +/- 4 delay 234 +/- 0
phc2sys[746.864]: CLOCK_REALTIME phc offset 47 s2 freq +10673 delay 4896
ptp4l[746.895]: rms 3 max 6 freq +16812 +/- 4 delay 234 +/- 0
ptp4l[747.473]: clockcheck: clock jumped backward or running slower than expected!
ptp4l[747.473]: port 1 (enp3s0): SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT
ptp4l[747.473]: port 1 (enp3s0): UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
phc2sys[748.022]: port aabbcc.fffe.00094e-1 changed state
phc2sys[748.022]: port aabbcc.fffe.00094e-1 changed state
ptp4l[748.438]: port 1 (enp3s0): SLAVE to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[748.438]: selected local clock aabbcc.fffe.00094e as best master
phc2sys[748.438]: port aabbcc.fffe.00094e-1 changed state
ptp4l[748.438]: port 1 (enp3s0): assuming the grand master role
phc2sys[748.438]: reconfiguring after port state change
ptp4l[748.438]: clockcheck: clock jumped backward or running slower than expected!
phc2sys[748.438]: selecting enp3s0 for synchronization
phc2sys[748.438]: selecting CLOCK_REALTIME as the master clock
phc2sys[748.438]: enp3s0 sys offset 49 s0 freq +16813 delay 4943
ptp4l[748.645]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[749.123]: selected best master clock 00049f.fffe.ef0808
ptp4l[749.123]: port 1 (enp3s0): MASTER to UNCALIBRATED on RS_SLAVE
phc2sys[749.438]: port aabbcc.fffe.00094e-1 changed state
phc2sys[749.439]: reconfiguring after port state change
phc2sys[749.439]: master clock not ready, waiting...
ptp4l[749.646]: port 1 (enp3s0): UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[749.772]: rms 3 max 6 freq +16802 +/- 22 delay 234 +/- 0
phc2sys[750.439]: port aabbcc.fffe.00094e-1 changed state
phc2sys[750.440]: reconfiguring after port state change
phc2sys[750.440]: selecting CLOCK_REALTIME for synchronization
phc2sys[750.440]: selecting enp3s0 as the master clock
phc2sys[750.440]: CLOCK_REALTIME phc offset -211 s0 freq +10673 delay 4896
ptp4l[750.772]: rms 18 max 24 freq +16796 +/- 13 delay 234 +/- 0
phc2sys[751.440]: CLOCK_REALTIME phc offset -252 s2 freq +10600 delay 4908
ptp4l[751.773]: rms 20 max 26 freq +16820 +/- 6 delay 234 +/- 0
phc2sys[752.850]: CLOCK_REALTIME phc offset -219 s2 freq +10381 delay 4872
ptp4l[752.850]: port 1 (enp3s0): SLAVE to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[752.850]: selected local clock aabbcc.fffe.00094e as best master
ptp4l[752.851]: port 1 (enp3s0): assuming the grand master role
ptp4l[752.852]: clockcheck: clock jumped backward or running slower than expected!
ptp4l[753.023]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[753.123]: selected best master clock 00049f.fffe.ef0808
ptp4l[753.124]: port 1 (enp3s0): MASTER to UNCALIBRATED on RS_SLAVE
phc2sys[753.850]: port aabbcc.fffe.00094e-1 changed state
phc2sys[753.851]: port aabbcc.fffe.00094e-1 changed state
phc2sys[753.851]: reconfiguring after port state change
phc2sys[753.851]: master clock not ready, waiting...

As you can see in the log, it works well until I start the CPU load. Here is the stress-ng command I run:

stress-ng --cpu 4 --cpu-load 25 --sched fifo --sched-prio 99 --times

ptp4l and phc2sys run with FIFO priority 99 as well.

Do you know how to solve it? Ask me whatever you need.

Kind regards,
Diego

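For readers following the log, the disturbed intervals are easier to spot once the steady-state lines are filtered out. The following is a minimal sketch, not something posted in the thread: `ptp_faults` is a hypothetical helper name, and the pattern assumes the message wording shown in the log above.

```shell
#!/bin/sh
# ptp_faults (hypothetical helper): read combined ptp4l/phc2sys output on
# stdin and print only the ptp4l lines that report a clockcheck warning or
# a port state transition.
ptp_faults() {
    grep -E 'ptp4l\[[0-9.]+\]: (clockcheck:|port [0-9]+ \([^)]*\): [A-Z_]+ to [A-Z_]+ on )'
}
```

It could be used as `ptp_faults < ptp.log`, where `ptp.log` is an assumed file holding the captured output.
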
From: Diego G. P. <gpr...@un...> - 2021-04-07 09:39:15

Hello everyone,

I would like to know whether I can somehow set a deadline for ptp4l so that other threads cannot run ahead of it. I think that would avoid the loss of synchronization. What do you think?

Thank you,
Diego

From: Richard C. <ric...@gm...> - 2021-04-07 15:05:03

On Wed, Apr 07, 2021 at 11:38:55AM +0200, Diego García Prieto wrote:
> I would like to know if I can stablish somehow a deadline to the ptp4l
> protocol to avoid other threads run before it. I think that whereby I can
> avoid the loss of syncronization. What do you think with respect to it?

IOW, give it higher sched_fifo priority than stress-ng.

Also, depending on your network load, watch out for stress-ng starving
the networking stack (by keeping ksoftirqd from running).

Also, watch out for stress-ng starving kworker threads (if your MAC
driver uses kwork to deliver Tx time stamps).

HTH,
Richard

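The three points above can be checked by looking at the scheduling class and real-time priority of every thread involved. The following is only a sketch under assumptions: `rt_filter` and `rt_overview` are hypothetical helper names, and the list of thread names is taken from this thread rather than from any linuxptp documentation.

```shell
#!/bin/sh
# rt_filter (hypothetical helper): keep only lines whose fifth field, the
# command name, starts with one of the names discussed in this thread.
rt_filter() {
    awk '$5 ~ /^(ptp4l|phc2sys|stress-ng|ksoftirqd|kworker)/'
}

# rt_overview (hypothetical helper): one line per thread with
#   tid     thread ID
#   cls     scheduling class (TS = SCHED_OTHER, FF = SCHED_FIFO, RR = SCHED_RR)
#   rtprio  real-time priority ("-" for non-real-time threads)
#   psr     CPU the thread last ran on
#   comm    command name
rt_overview() {
    ps -eL -o tid= -o cls= -o rtprio= -o psr= -o comm= | rt_filter
}
```

Running `rt_overview` while stress-ng is active would show whether ksoftirqd and the kworker threads are still in class TS on the same CPU as a FIFO stress-ng worker.
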
From: Diego G. P. <gpr...@un...> - 2021-04-07 16:31:43

> IOW, give it higher sched_fifo priority than stress-ng.

Done, but the behavior is similar, with ptp4l at priority 99 and stress-ng at 50, and even at 1.

> Also, depending on your network load, watch out for stress-ng starving
> the networking stack (by keeping ksoftirqd from running).
>
> Also, watch out for stress-ng starving kworker threads (if your MAC
> driver uses kwork to deliver Tx time stamps).

I ran it again with ptp4l at priority 99 and stress-ng at 50, and this is the load of ksoftirqd and the kworkers:

Tasks: 221 total, 3 running, 218 sleeping, 0 stopped, 0 zombie
%Cpu(s): 31,7 us, 0,6 sy, 0,0 ni, 67,0 id, 0,7 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem : 3915948 total, 2980824 free, 372584 used, 562540 buff/cache
KiB Swap: 1047548 total, 1047548 free, 0 used. 3333472 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2754 root      rt   0   32880   5372   3356 S  30,1  0,1   0:17.03 stress-ng-cpu
 2755 root      rt   0   32880   5372   3356 S  30,1  0,1   0:17.18 stress-ng-cpu
 2756 root      rt   0   32880   5372   3356 S  30,1  0,1   0:17.03 stress-ng-cpu
 2757 root      rt   0   32880   5372   3356 S  30,1  0,1   0:17.18 stress-ng-cpu
 2177 nodo1     20   0  577400  33168  26052 S   1,7  0,8   0:02.24 gnome-panel
  898 root      20   0  391732  69436  48052 S   1,3  1,8   0:18.87 Xorg
 2483 nodo1     20   0  512104  37008  27552 S   1,0  0,9   0:11.07 gnome-terminal-
 2173 nodo1     20   0  517256  37332  23772 S   0,7  1,0   0:08.11 compiz
 2751 nodo1     20   0   43152   3964   3316 R   0,7  0,1   0:00.50 top
    3 root      20   0       0      0      0 S   0,3  0,0   0:00.78 ksoftirqd/0
   21 root      20   0       0      0      0 S   0,3  0,0   0:00.67 ksoftirqd/1
 1901 nodo1     20   0   43428   3876   2780 S   0,3  0,1   0:00.64 dbus-daemon
    1 root      20   0  119924   6024   3996 S   0,0  0,2   0:04.38 systemd
    2 root      20   0       0      0      0 S   0,0  0,0   0:00.01 kthreadd
    4 root      -2   0       0      0      0 S   0,0  0,0   0:00.35 ktimersoftd/0
    6 root       0 -20       0      0      0 S   0,0  0,0   0:00.00 kworker/0:0H
    8 root      20   0       0      0      0 S   0,0  0,0   0:00.96 rcu_preempt
    9 root      20   0       0      0      0 S   0,0  0,0   0:00.00 rcu_sched
   10 root      20   0       0      0      0 S   0,0  0,0   0:00.25 rcuc/0
   11 root      20   0       0      0      0 S   0,0  0,0   0:00.00 kswork
   12 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 posixcputmr/0
   13 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 migration/0
   14 root      rt   0       0      0      0 S   0,0  0,0   0:00.03 watchdog/0
   15 root      20   0       0      0      0 S   0,0  0,0   0:00.00 cpuhp/0
   16 root      20   0       0      0      0 S   0,0  0,0   0:00.00 cpuhp/1
   17 root      rt   0       0      0      0 S   0,0  0,0   0:00.03 watchdog/1
   18 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 migration/1
   19 root      20   0       0      0      0 S   0,0  0,0   0:00.25 rcuc/1
   20 root      -2   0       0      0      0 S   0,0  0,0   0:00.42 ktimersoftd/1
   23 root       0 -20       0      0      0 S   0,0  0,0   0:00.00 kworker/1:0H
   24 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 posixcputmr/1
   25 root      20   0       0      0      0 S   0,0  0,0   0:00.00 cpuhp/2
   26 root      rt   0       0      0      0 S   0,0  0,0   0:00.04 watchdog/2
   27 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 migration/2
   28 root      20   0       0      0      0 R   0,0  0,0   0:00.51 rcuc/2
   29 root      -2   0       0      0      0 S   0,0  0,0   0:00.43 ktimersoftd/2
   30 root      20   0       0      0      0 R   0,0  0,0   0:01.50 ksoftirqd/2
   32 root       0 -20       0      0      0 S   0,0  0,0   0:00.00 kworker/2:0H
   33 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 posixcputmr/2
   34 root      20   0       0      0      0 S   0,0  0,0   0:00.00 cpuhp/3
   35 root      rt   0       0      0      0 S   0,0  0,0   0:00.04 watchdog/3
   36 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 migration/3
   37 root      20   0       0      0      0 S   0,0  0,0   0:00.53 rcuc/3
   38 root      -2   0       0      0      0 S   0,0  0,0   0:01.13 ktimersoftd/3
   39 root      20   0       0      0      0 S   0,0  0,0   0:01.31 ksoftirqd/3
   41 root       0 -20       0      0      0 S   0,0  0,0   0:00.00 kworker/3:0H
   42 root      rt   0       0      0      0 S   0,0  0,0   0:00.00 posixcputmr/3

Thank you,
Diego

From: Diego G. P. <gpr...@un...> - 2021-04-12 11:36:43

I ran it after changing the priorities so that ptp4l is higher than stress-ng, but the result remains the same. What could I do to solve this loss of sync when I apply 25% CPU load?

Thank you in advance for your responses,
Diego

From: Diego G. P. <gpr...@un...> - 2021-04-12 17:43:31

Could it be fixed by disabling sanity_freq_limit (--sanity_freq_limit 0) to avoid the message "clockcheck: ...jumped or slower than expected!"?

Diego

From: Miroslav L. <mli...@re...> - 2021-04-13 08:04:19

On Mon, Apr 12, 2021 at 04:59:06PM +0200, Diego García Prieto wrote:
> Might it be fixed by disabling the sanity_freq_limit (--sanity_freq_limit 0)
> to avoid the message "clockcheck: ...jumped or slower than expected!"?

I think that just hides the underlying issue. I don't see how CPU load
could trigger those warnings. I suspect it's a HW/driver specific
issue. Do you have any details on that?

--
Miroslav Lichvar

From: Diego G. P. <gpr...@un...> - 2021-04-13 08:51:50

The driver of the network card is intel_pstate. Is this what you are referring to?

Thank you for your help,
Diego

From: Diego G. P. <gpr...@un...> - 2021-04-13 08:57:01

I am running ptp4l and phc2sys with FIFO priority 99, and stress-ng as well. stress-ng loads the CPU with 4 threads at 25% load (the board has 4 cores).

I do not understand why the kworkers have priority 20 by default. I am trying to raise them to 99 to avoid interference from the load. Is this correct, or am I focusing on useless stuff?

Diego

From: Miroslav L. <mli...@re...> - 2021-04-13 09:20:07

On Tue, Apr 13, 2021 at 10:51:36AM +0200, Diego García Prieto wrote:
> The driver of the network card is the intel_pstate. Is this what you refer
> to?

I think intel_pstate is about CPU power states. That might be related
if it is an issue with the TSC clocksource, but I was interested in
the kernel version, NIC and its driver.

--
Miroslav Lichvar

From: Diego G. P. <gpr...@un...> - 2021-04-13 09:36:09

Alright.

Kernel version: Linux 4.9.18-rt14-rt14 (a preempt-RT patch)
NIC: I210 Gigabit Network Connection - Vendor: Intel Corporation - Version: 03
Driver: igb

Ask me for more if you need it.

Diego

From: Miroslav L. <mli...@re...> - 2021-04-13 10:00:19

On Tue, Apr 13, 2021 at 11:36:00AM +0200, Diego García Prieto wrote:
> Kernel version: Linux 4.9.18-rt14-rt14 (a preempt-RT patch)

I don't know much about RT kernels.

> NIC: I210 Gigabit Network Connection - Vendor: Intel corporation - Version:
> 03
>
> Driver: igb

That NIC and driver are solid in my experience, so I'd suspect an
issue with the system clock.

Check current and available clocksources:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
cat /sys/devices/system/clocksource/clocksource0/available_clocksource

and try switching to a different clocksource, e.g.

echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

--
Miroslav Lichvar

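The two `cat` commands above can be wrapped so that the current and available clocksources are printed together. This is only a convenience sketch: `show_clocksource` is a hypothetical name, and the optional directory argument is an assumption added so the function can be tried against a copy of the files.

```shell
#!/bin/sh
# show_clocksource (hypothetical helper): print the active clocksource and
# the list of alternatives the kernel offers.
# $1 is optional and defaults to the sysfs directory used in the message above.
show_clocksource() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    printf 'current:   %s\n' "$(cat "$dir/current_clocksource")"
    printf 'available: %s\n' "$(cat "$dir/available_clocksource")"
}
```

Only names listed under "available" can be written to current_clocksource, so a missing entry such as hpet cannot be selected this way.
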
From: Diego G. P. <gpr...@un...> - 2021-04-14 11:44:30

Hello Miroslav,

> That NIC and driver are solid in my experience, so I'd suspect an
> issue with the system clock.
>
> Check current and available clocksources:
> cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> cat /sys/devices/system/clocksource/clocksource0/available_clocksource
>
> and try switching to a different clocksource, e.g.
>
> echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

The clocksources available under that path are tsc itself and acpi_pm. I do not have "hpet". I switched to acpi_pm and the results are the same: the message "clockcheck: clock jumped backward or running slower than expected!" appears, and then the port changes from SLAVE to UNCALIBRATED. It tries to synchronize to the GM a little longer and then becomes MASTER.

I also ran the test with a normal kernel, without the PREEMPT_RT patch, and got the same results.

Diego

From: Diego G. P. <gpr...@un...> - 2021-04-21 16:25:22

Hello,

It does not work. What I did: stress-ng (FIFO 50), ptp4l (FIFO 99), phc2sys (FIFO 99), and the kworkers of the cores where ptp4l and phc2sys run (FIFO 99). It still breaks the ptp4l synchronization when stress-ng starts up. I find it hard to believe that this has not happened to others.

Any new advice?

Thank you for reading,
Diego

From: Keller, J. E <jac...@in...> - 2021-04-21 21:55:51

> -----Original Message-----
> From: Diego García Prieto <gpr...@un...>
> Sent: Wednesday, April 21, 2021 9:25 AM
> To: Richard Cochran <ric...@gm...>
> Cc: lin...@li...
> Subject: Re: [Linuxptp-users] Loss of sync with 25% of CPU load
>
> Hello,
>
> It does not work. Done: stress-ng (fifo 50), ptp4l (fifo 99),
> phc2sys(fifo 99), kworker of the cores where ptp4l and phc2sys run (fifo
> 99). it still breaks the ptp4l synchronization when stress-ng boots up.
> I do not believe that it has not happend to others.
>
> Some new advice?

You mentioned you were using igb. I believe that driver still relies on a work queue task to handle the Tx timestamps, as well as overflow check.

If your device needs the overflow check, and it gets skipped that could result in very wild incorrect timestamping results. However, I do not believe that is the case with the i210 so I think that can be ruled out.

I am not sure what else could be causing clock instability or variable delay in the packet transmit and receive handling within the hardware to the point that it impacts the PTP calculation.

Thanks,
Jake

From: Richard C. <ric...@gm...> - 2021-04-22 00:03:23

On Wed, Apr 21, 2021 at 09:55:31PM +0000, Keller, Jacob E wrote:
> You mentioned you were using igb. I believe that driver still relies
> on a work queue task to handle the Tx timestamps, as well as
> overflow check.

Right, and work has no priority at all in the kernel. This could be
improved at the driver level by using the PHC kthread, which then
could be given priority administratively.

Thanks,
Richard

From: Diego G. P. <gpr...@un...> - 2021-04-22 11:19:07

> Right, and work has no priority at all in the kernel. This could be
> improved at the driver level by using the PHC kthread, which then
> could be given priority administratively.

If you mean changing the priority of the kworkers on the same core as ptp4l and phc2sys, I did that with "chrt -f -p 99 [pid of the kworker]" and the same thing happened. That is why I am asking for an explanation in other words, because I think you mean something different.

Diego G.

From: Diego G. P. <gpr...@un...> - 2021-04-22 09:56:08

> You mentioned you were using igb. I believe that driver still relies on a work queue task to handle the Tx timestamps, as well as overflow check.
>
> If your device needs the overflow check, and it gets skipped that could result in very wild incorrect timestamping results. However, I do not believe that is the case with the i210 so I think that can be ruled out.
>
> I am not sure what else could be causing clock instability or variable delay in the packet transmit and receive handling within the hardware to the point that it impacts the PTP calculation..

To be exact, my network driver is igb, version 5.4.0. I saw that there is a newer version (5.5.2). Is it worth updating, or will I not notice any change?

Everything is fine when I run only ptp4l and phc2sys: both stay under 100 ns, that is, fully synchronized. But as soon as I start 4 stress-ng threads to add CPU load, I lose sync.

I should add that if I force ptp4l and phc2sys to run on one core and the stress-ng threads on the other cores, everything is fine. However, I want to simulate an environment where other real-time tasks share cores with ptp4l and phc2sys. Do you see what I mean?

Thank you very much for your help, Jacob. I appreciate it. Do not hesitate to reply again.

Diego G.

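The configuration described above as working (ptp4l and phc2sys on one core, the load on the others) could be set up roughly as follows. This is a sketch under assumptions: `cpu_list_except` is a hypothetical helper, CPU 0 is chosen arbitrarily for the PTP daemons, and the commented commands assume a single running instance of each daemon on a 4-core board.

```shell
#!/bin/sh
# cpu_list_except (hypothetical helper): print a comma-separated CPU list
# covering CPUs 0..(ncpu-1) except the reserved one.
#   $1 = reserved CPU number, $2 = total number of CPUs
cpu_list_except() {
    reserved=$1
    ncpu=$2
    list=""
    i=0
    while [ "$i" -lt "$ncpu" ]; do
        if [ "$i" -ne "$reserved" ]; then
            list="${list:+$list,}$i"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Illustrative use (needs root and running daemons, so left commented out):
# taskset -cp 0 "$(pidof ptp4l)"      # pin ptp4l to CPU 0
# taskset -cp 0 "$(pidof phc2sys)"    # pin phc2sys to CPU 0
# taskset -c "$(cpu_list_except 0 4)" \
#     stress-ng --cpu 3 --cpu-load 25 --sched fifo --sched-prio 50 --times
```

This only reproduces the isolated case; it does not address the shared-core scenario that the message is actually asking about.
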
From: Diego G. P. <gpr...@un...> - 2021-04-22 09:58:24
|
Hello Richard,

On 22/04/2021 at 2:03, Richard Cochran wrote:
> On Wed, Apr 21, 2021 at 09:55:31PM +0000, Keller, Jacob E wrote:
>
>> You mentioned you were using igb. I believe that driver still relies
>> on a work queue task to handle the Tx timestamps, as well as
>> overflow check.
> Right, and work has no priority at all in the kernel. This could be
> improved at the driver level by using the PHC kthread, which then
> could be given priority administratively.

Could you explain it in other words? Do you see something to improve?

Do not hesitate to reply as many times as you want.

Thank you very much for your help.

Diego G.
|
From: Diego G. P. <gpr...@un...> - 2021-04-22 10:26:44
|
One more thing. Try loading the CPU with stress-ng on your systems and check what happens. This is the stress-ng command:

"sudo stress-ng --cpu 4 --cpu-load 25 --sched fifo --sched-prio 50 --times"

This command starts 4 threads (because I want to load all 4 of my cores) with 25% load each. It uses the FIFO scheduler with priority 50. Remember that my ptp4l and phc2sys processes run with FIFO priority 99.

Notice that the issue appears when a stress-ng thread shares a core with ptp4l. Besides, it happens when the stress-ng scheduler is Round Robin or FIFO. Without a real-time policy on stress-ng, there are no issues.

I am asking so that I can check whether this also happens to any of you. If you need more information about this issue, just ask me.

Thank you,
Diego G.
|
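One possible reading of the observation above, offered as a sketch and not as a confirmed diagnosis: on Linux, a runnable SCHED_FIFO or SCHED_RR task always runs ahead of normal-policy tasks on the same core. A stress-ng thread at FIFO 50 therefore cannot preempt ptp4l at FIFO 99, but it can delay any normal-priority kernel work on that core, such as the igb work queue task mentioned earlier in the thread. The task list below is hypothetical illustration data, and `rank`, `service_order`, and `describe` are hypothetical helper names.

```python
import os

RT_POLICIES = ("fifo", "rr")


def rank(task):
    """Sort key: real-time tasks first, then by descending sched_priority."""
    is_rt = task["policy"] in RT_POLICIES
    return (0 if is_rt else 1, -task["priority"] if is_rt else 0)


def service_order(tasks, cpu):
    """Names of the tasks allowed on `cpu`, most favoured by the scheduler first."""
    on_cpu = [t for t in tasks if cpu in t["cpus"]]
    return [t["name"] for t in sorted(on_cpu, key=rank)]


def describe(pid, name):
    """Read policy, priority and affinity of a live task (Linux only)."""
    names = {os.SCHED_OTHER: "other", os.SCHED_FIFO: "fifo", os.SCHED_RR: "rr"}
    policy = os.sched_getscheduler(pid)
    return {
        "name": name,
        "policy": names.get(policy, str(policy)),
        "priority": os.sched_getparam(pid).sched_priority,
        "cpus": set(os.sched_getaffinity(pid)),
    }


if __name__ == "__main__":
    # Hypothetical snapshot of core 0, not measured on the system in this thread.
    tasks = [
        {"name": "kworker", "policy": "other", "priority": 0, "cpus": {0}},
        {"name": "stress-ng", "policy": "fifo", "priority": 50, "cpus": {0}},
        {"name": "ptp4l", "policy": "fifo", "priority": 99, "cpus": {0}},
        {"name": "stress-ng-2", "policy": "fifo", "priority": 50, "cpus": {1}},
    ]
    print(service_order(tasks, 0))
```

With this snapshot the kworker comes last on core 0, which would fit the report that the problem disappears when stress-ng runs without a real-time policy. Calling `describe` on the real PIDs would replace the hypothetical entries with measured ones.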
From: Richard C. <ric...@gm...> - 2021-04-22 15:13:02
|
On Thu, Apr 22, 2021 at 11:55:57AM +0200, Diego García Prieto wrote:
> adding CPU load, suddenly I lose sync. I have to say that if I force ptp4l
> and phc2sys to run in one core and the threads in other cores, everything is
> OK,

Ah, now you are getting somewhere!

You are attempting to tune your system for your own performance
requirements. Sorry to say it, but there is no silver bullet, no
magic wand, and we on the list don't have a crystal ball.

Tuning your system is work that you have to do.

BTW, if you don't have it already, here is a book that might interest you.

    Brendan Gregg, Systems Performance: Enterprise and the Cloud, 2nd Edition
    http://www.brendangregg.com/systems-performance-2nd-edition-book.html

It won't solve your problem directly, but it might help along the way.

Good luck,
Richard
|
From: Diego G. P. <gpr...@un...> - 2021-04-25 08:38:17
|
Sorry, but I only have ptp4l running and I want to add stress. I have not tuned anything yet. I assumed that someone else had tried the same thing (ptp4l + stress) and run into the same problems as me.

Thank you for your help. I appreciate it.

Diego
|