Re: [Linuxptp-users] Filter occasional spikes in offset
PTP IEEE 1588 stack for Linux
From: Oleg O. <leo...@fb...> - 2022-05-04 11:39:40
> On 4 May 2022, at 11:33, Miroslav Lichvar <mli...@re...> wrote:

Hi Miroslav,
Thank you for your response. I appreciate it.

> On Tue, May 03, 2022 at 02:26:21PM +0000, Oleg Obleukhov via Linuxptp-users wrote:
>> Hi team,
>> In large distributed networks many factors can lead to a short-term spike in offset, primarily network equipment without Transparent Clock support (even a single such device).
>
> PTP was designed for networks with constant delay. On switched
> networks that requires full on-path PTP support. If you don't have
> that, you should be looking at NTP or another protocol designed for
> networks with variable delays, where more effective filtering can be
> implemented.

While we are phasing out old equipment, the reality is that there will always be some percentage of misbehaving or old switches in large distributed systems with thousands of switches on the path. During congestion that lasts only several microseconds we may be affected, and we need to survive it.

> Of course, that doesn't mean linuxptp couldn't try to do better in
> these suboptimal conditions. The question is if it's in the scope of
> the project. As you seem to have found out, the main issue with the
> current design is that dropping samples can lead to servo instability.
>
>> Looking at the ptp4l config I didn't find anything to overcome this situation and ignore this one bad outlier.
>> I implemented a quick patch https://gist.github.com/leoleovich/5a4dff7e089bd429c5d208d9276e1683 which can mitigate this and it works very well:
>
>> Preventing unnecessary tuning of the servo for a short period of time by using a padding technique (simply filling with previous values).

The patch I proposed simply doesn't pass the offset to the servo, so it shouldn't be too bad. For example, with default ptp4l settings we can tolerate several missed syncs in a row. But I am open to suggestions, of course.

> That patch seems to be dropping the sample and there is a different
> output shown in the example. Is there a newer version of the patch you
> didn't publish?

The code I suggested matches the output. When occasional spikes arise, it simply prints something like:

skip 1/2 large offset (>20000) -248483

The only difference is that max_offset_locked and max_offset_locked_skip should be set to 0, and currently they are at 20000 and 2 respectively.

>> The bottom line is that we need to find a way to ignore outliers in a locked state, where short-term large jumps in offset are not expected.
>> Please check this out and let me know if there is a better way to handle this situation or if this patch can inspire any other ideas…
>
> If a spike filter needs to be implemented, I think it would better if
> the threshold was automatically adjusted based on the jitter. For an
> example, see the "Popcorn spike suppressor" in RFC5905 (NTPv4).

An automatically adjusted filter would be even better. If you are open to such an idea, we can discuss it as well. I wanted to start somewhere.

> --
> Miroslav Lichvar

Thank you,
Oleg.
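
For readers following the thread, the behaviour described above (skip a bounded number of consecutive large offsets while locked, instead of feeding them to the servo) can be sketched roughly as below. This is an illustration only, not the patch from the gist and not linuxptp code: the option names max_offset_locked and max_offset_locked_skip come from the message, while struct spike_filter, spike_filter_skip() and the choice to treat 0 as "disabled" are assumptions made for the example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical filter state; not a structure from linuxptp itself. */
struct spike_filter {
	int64_t max_offset_locked;  /* threshold in ns; 0 is assumed to disable the filter */
	int max_offset_locked_skip; /* maximum number of consecutive samples to skip */
	int skipped;                /* consecutive samples skipped so far */
};

/*
 * Decide whether an offset sample taken in the locked state should be
 * withheld from the servo. Returns 1 to skip the sample, 0 to pass it on.
 */
static int spike_filter_skip(struct spike_filter *f, int64_t offset_ns)
{
	int64_t limit;

	assert(f != NULL);
	limit = f->max_offset_locked;

	/* Filter disabled: every sample goes to the servo. */
	if (limit <= 0) {
		f->skipped = 0;
		return 0;
	}

	/* Offset within the threshold: pass it on and reset the counter. */
	if (offset_ns <= limit && offset_ns >= -limit) {
		f->skipped = 0;
		return 0;
	}

	/*
	 * The large offset has persisted for longer than the allowed number
	 * of skips, so treat it as real and let the servo correct it. The
	 * counter is only reset once a small offset is seen again.
	 */
	if (f->skipped >= f->max_offset_locked_skip)
		return 0;

	f->skipped++;
	printf("skip %d/%d large offset (>%" PRId64 ") %" PRId64 "\n",
	       f->skipped, f->max_offset_locked_skip, limit, offset_ns);
	return 1;
}
```

With a threshold of 20000 and a skip limit of 2, the first sample of -248483 is skipped and logged in the same format as the message quoted above; a third consecutive large sample is passed to the servo. A jitter-based threshold, as suggested with the RFC5905 reference, would replace the fixed max_offset_locked with a value derived from recent samples.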