Thread: [Linuxptp-devel] Peer delay regression in v1.6
PTP IEEE 1588 stack for Linux
Brought to you by:
rcochran
From: Richard C. <ric...@gm...> - 2015-10-22 08:45:14
|
We figure the neighbor rate ratio in port_nrate_calculate(): delta t3 / delta t4 = peer_interval / local_interval Originally the peer delay was figured like this: pd = tmv_sub(tmv_sub(t4, t1), tmv_sub(t3, t2)); Since t4-t1 are local, and t3-t2 are from the peer, there will be an error if the local and peer are not syntonized. In commit bd001fde (in v1.5) Delio changed the peer delay calculation: adj_t41 = p->nrate.ratio * tmv_dbl(tmv_sub(t4, t1)); pd = tmv_sub(dbl_tmv(adj_t41), tmv_sub(t3, t2)); This converts the local interval t4-t1 into the time base of the peer, and so the peer delay is then expressed with the peer's frequency. This follows clause 11.2.15.2.4 from IEEE 802.1-AS. Now, with the tsproc in place, the peer delay figured like this: /* delay = ((t2 - t3) * rr + (t4 - t1)) / 2 */ t23 = tmv_sub(tsp->t2, tsp->t3); if (tsp->clock_rate_ratio != 1.0) t23 = dbl_tmv(tmv_dbl(t23) * tsp->clock_rate_ratio); t41 = tmv_sub(tsp->t4, tsp->t1); delay = tmv_div(tmv_add(t23, t41), 2); The ratio peer/local is applied to t3-t2, but these are peer time stamps to begin with! That scaling is wrong, and even if the code did (t2-t3)*(1/rr), it would still change the meaning of the peer delay, since it would be then expressed in the local time base. [ The tsproc code is correct for figuring E2E delay, assuming that tsp->clock_rate_ratio is GM/local, since t3-t2 is a local time interval. ] We need to fix this and make a new release soon, since the error in the peer delay will impact some configurations, especially gPTP with free running local clocks. In order to fix this, I am thinking of adding a e2e/p2p flag to tsproc_create. When set to p2p, the formula delay = ((t2 - t3) + rr * (t4 - t1)) / 2 would be used. That would restore the behavior of v1.5. Thoughts? Thanks, Richard |
From: Delio B. <dbr...@au...> - 2015-10-22 13:44:40
|
Hello Richard, Thank you for spotting this issue. We haven’t been hit by it (yet!) because we use an old version of linuxptp. I hope to be able to move to a more recent release at some point. The fix you propose seems good to me. Thanks! -- Delio Brignoli AudioScience Inc On 22 Oct 2015, at 10:45, Richard Cochran <ric...@gm...> wrote: > > We figure the neighbor rate ratio in port_nrate_calculate(): > > delta t3 / delta t4 = peer_interval / local_interval > > Originally the peer delay was figured like this: > > pd = tmv_sub(tmv_sub(t4, t1), tmv_sub(t3, t2)); > > Since t4-t1 are local, and t3-t2 are from the peer, there will be an > error if the local and peer are not syntonized. > > In commit bd001fde (in v1.5) Delio changed the peer delay calculation: > > adj_t41 = p->nrate.ratio * tmv_dbl(tmv_sub(t4, t1)); > pd = tmv_sub(dbl_tmv(adj_t41), tmv_sub(t3, t2)); > > This converts the local interval t4-t1 into the time base of the peer, > and so the peer delay is then expressed with the peer's frequency. > This follows clause 11.2.15.2.4 from IEEE 802.1-AS. > > Now, with the tsproc in place, the peer delay figured like this: > > /* delay = ((t2 - t3) * rr + (t4 - t1)) / 2 */ > > t23 = tmv_sub(tsp->t2, tsp->t3); > if (tsp->clock_rate_ratio != 1.0) > t23 = dbl_tmv(tmv_dbl(t23) * tsp->clock_rate_ratio); > t41 = tmv_sub(tsp->t4, tsp->t1); > delay = tmv_div(tmv_add(t23, t41), 2); > > The ratio peer/local is applied to t3-t2, but these are peer time > stamps to begin with! That scaling is wrong, and even if the code did > (t2-t3)*(1/rr), it would still change the meaning of the peer delay, > since it would be then expressed in the local time base. > > [ The tsproc code is correct for figuring E2E delay, assuming that > tsp->clock_rate_ratio is GM/local, since t3-t2 is a local time > interval. ] > > We need to fix this and make a new release soon, since the error in > the peer delay will impact some configurations, especially gPTP with > free running local clocks. In order to fix this, I am thinking of > adding a e2e/p2p flag to tsproc_create. When set to p2p, the formula > > delay = ((t2 - t3) + rr * (t4 - t1)) / 2 > > would be used. That would restore the behavior of v1.5. > > Thoughts? > > Thanks, > Richard > > ------------------------------------------------------------------------------ > _______________________________________________ > Linuxptp-devel mailing list > Lin...@li... > https://lists.sourceforge.net/lists/listinfo/linuxptp-devel |
From: Miroslav L. <mli...@re...> - 2015-10-23 13:49:25
|
On Thu, Oct 22, 2015 at 10:45:05AM +0200, Richard Cochran wrote: > t23 = tmv_sub(tsp->t2, tsp->t3); > if (tsp->clock_rate_ratio != 1.0) > t23 = dbl_tmv(tmv_dbl(t23) * tsp->clock_rate_ratio); > t41 = tmv_sub(tsp->t4, tsp->t1); > delay = tmv_div(tmv_add(t23, t41), 2); > > The ratio peer/local is applied to t3-t2, but these are peer time > stamps to begin with! That scaling is wrong, and even if the code did > (t2-t3)*(1/rr), it would still change the meaning of the peer delay, > since it would be then expressed in the local time base. The math in the tsproc code was supposed to be identical to the code it replaced. t2 and t3 in tsproc are local times. The mapping between the port and tsproc timestamps is (t1, t2, t3, t4) -> (t3, t4, t1, t2). Maybe it wasn't the best idea to use the same variable names in all three (clock, port, tsproc) contexts? Anyway, does this actually break something? -- Miroslav Lichvar |
From: Richard C. <ric...@gm...> - 2015-10-23 16:03:57
|
> The math in the tsproc code was supposed to be identical to the code > it replaced. t2 and t3 in tsproc are local times. The mapping between > the port and tsproc timestamps is (t1, t2, t3, t4) -> (t3, t4, t1, t2). Oh, tricky! Now I vaguely recall that from the original review. > Maybe it wasn't the best idea to use the same variable names in all > three (clock, port, tsproc) contexts? The tsproc unifies the e2e and p2p methods in a good way. But the different meaning of t1-t4 could use a comment, in order keep people like me from getting confused. > Anyway, does this actually break something? I guess not then. I did try v1.5 an v1.6 on a test system, and I was puzzled why I couldn't see the effect of this bug. Thanks, Richard |