Thread: [Linuxptp-users] Interesting (very wrong) PTP output
PTP IEEE 1588 stack for Linux
From: Keller, J. E <jac...@in...> - 2012-06-05 23:31:31
n0825:[1]/root/linuxptp> ./ptp4l -m -v -P -f default.cfg -i eth2
ptp4l[28757]: selected /dev/ptp0 as PTP clock
ptp4l[28757]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[28757]: port 1: new foreign master 001b21.fffe.cf83e4-1
ptp4l[28757]: selected best master clock 001b21.fffe.cf83e4
ptp4l[28757]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[28757]: master offset 8053809267140 s0 adj +0 path delay -58248
ptp4l[28757]: master offset 8052809280419 s0 adj +0 path delay -55403
ptp4l[28757]: master offset 8051809142574 s0 adj +0 path delay -56020
ptp4l[28757]: master offset 8050809162084 s1 adj +0 path delay -56020
ptp4l[28757]: master offset -1000134414 s2 adj -250000000 path delay -56427
ptp4l[28757]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[28757]: master offset -750413376 s2 adj -250000000 path delay -56427
ptp4l[28757]: master offset -500400801 s2 adj -250000000 path delay -44558
ptp4l[28757]: master offset -250420149 s2 adj -250000000 path delay -36052
ptp4l[28757]: master offset -393488 s2 adj -250000000 path delay -36052
ptp4l[28757]: master offset 249589808 s2 adj -250000000 path delay -29589
ptp4l[28757]: master offset 499591481 s2 adj -250000000 path delay -5666
ptp4l[28757]: master offset 749574326 s2 adj -250000000 path delay 798
ptp4l[28757]: master offset 999593422 s2 adj -250000000 path delay 8104
ptp4l[28757]: master offset 1249583788 s2 adj -250000000 path delay 8104
ptp4l[28757]: master offset 1499600752 s2 adj -250000000 path delay 15263
ptp4l[28757]: master offset 1749590329 s2 adj -250000000 path delay 15263
ptp4l[28757]: master offset 1999616415 s2 adj -250000000 path delay 15287
ptp4l[28757]: master offset 2249605370 s2 adj -250000000 path delay 15281
ptp4l[28757]: master offset 2499629742 s2 adj -250000000 path delay 15285
ptp4l[28757]: master offset 2749620793 s2 adj -250000000 path delay 15291
---

This occurs sometimes after running a script that tests a bunch of ethtool commands. The result seems to be due to the -250m ppb adjustment never getting reset. However, when I repeat the test and, instead of running the script before starting PTP, force the offset to a similar value by using the testptp script to force a clock adjustment, the result is correct (it calculates the necessary large offset and then works fine).

If I re-run the test again but force the 2nd machine to be slave, I get the reverse results: adj is +250000000 but the offset from master continually increases with no sign of converging.

It seems like somehow the p/i servo code is incorrect because once we hit max adjust it never seems to move away from it. However that isn't the case when I run ptp4l without that script. I am still trying to determine what the script is doing to the ptp registers on the device (if anything)

- Jake
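For anyone who wants to spot this pattern in a long capture, the "master offset" lines above can be picked apart with a few lines of C. This is only a sketch: the struct, the function names and the saturation check are hypothetical helpers rather than part of linuxptp, and the units (nanoseconds for offset and delay, ppb for adj) are assumed from the discussion.

```c
#include <stdio.h>

/* One "master offset" line from the ptp4l output above.
 * Units are assumed: ns for offset/delay, ppb for adj. */
struct offset_sample {
	long long offset_ns;   /* "master offset" field */
	int servo_state;       /* digit after "s" (s0, s1, s2) */
	long long adj_ppb;     /* "adj" field */
	long long delay_ns;    /* "path delay" field */
};

/* Returns 1 if the line is a "master offset" line, 0 otherwise
 * (for example the port state change lines). */
static int parse_offset_line(const char *line, struct offset_sample *out)
{
	int n = sscanf(line,
		       "ptp4l[%*d]: master offset %lld s%d adj %lld path delay %lld",
		       &out->offset_ns, &out->servo_state,
		       &out->adj_ppb, &out->delay_ns);
	return n == 4;
}

/* Hypothetical helper: 1 if the adjustment sits at +/- max_ppb. */
static int adj_is_saturated(const struct offset_sample *s, long long max_ppb)
{
	return s->adj_ppb >= max_ppb || s->adj_ppb <= -max_ppb;
}
```

Feeding every line of the capture through parse_offset_line() and counting consecutive saturated samples would flag a run like the one above, where adj stays at -250000000 while the offset passes through zero and keeps growing.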
From: Richard C. <ric...@gm...> - 2012-06-06 16:10:36
On Tue, Jun 05, 2012 at 11:31:24PM +0000, Keller, Jacob E wrote:

> It seems like somehow the p/i servo code is incorrect because once
> we hit max adjust it never seems to move away from it.

The servo clamps the adjustment to the maximum. It looks like the servo output is always outside of the limit. You can inspect s->drift to get a better idea what is going on.

> However that
> isn't the case when I run ptp4l without that script. I am still
> trying to determine what the script is doing to the ptp registers on
> the device (if anything)

Futzing with the clock or the device registers behind the program's back could well cause unexpected results.

HTH,
Richard
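The clamping described here can be illustrated with a small standalone PI step. This is a sketch, not the code in pi.c: the field names (kp, ki, drift, maxppb) and the first two statements follow the pi.c excerpt quoted later in this thread, while the clamp branches and the rule of only integrating while unsaturated are assumptions made for the example.

```c
/* State of a minimal PI servo; field names follow the pi.c excerpt
 * quoted in this thread. */
struct pi_state {
	double kp;      /* proportional gain */
	double ki;      /* integral gain */
	double drift;   /* accumulated integral term, in ppb */
	double maxppb;  /* largest adjustment the clock accepts */
};

/* One locked-state update. The clamp branches and the conditional
 * drift update are an ASSUMPTION about how the real function
 * continues after the lines shown in the thread. */
static double pi_step(struct pi_state *s, double offset)
{
	double ki_term = s->ki * offset;
	double ppb = s->kp * offset + s->drift + ki_term;

	if (ppb < -s->maxppb) {
		ppb = -s->maxppb;
	} else if (ppb > s->maxppb) {
		ppb = s->maxppb;
	} else {
		s->drift += ki_term; /* integrate only while unsaturated */
	}
	return ppb;
}
```

Under these assumptions, a drift value that starts far outside the limit keeps the output pinned at the limit and is itself never updated, which would match the "never seems to move away" symptom.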
From: Keller, J. E <jac...@in...> - 2012-06-06 16:12:37
> -----Original Message-----
> From: Richard Cochran [mailto:ric...@gm...]
> Sent: Wednesday, June 06, 2012 9:10 AM
> To: Keller, Jacob E
> Cc: lin...@li...
> Subject: Re: [Linuxptp-users] Interesting (very wrong) PTP output
>
> On Tue, Jun 05, 2012 at 11:31:24PM +0000, Keller, Jacob E wrote:
>
> > It seems like somehow the p/i servo code is incorrect because once we
> > hit max adjust it never seems to move away from it.
>
> The servo clamps the adjustment to the maximum. It looks like the servo
> output is always outside of the limit. You can inspect s->drift to get a
> better idea what is going on.
>
> > However that
> > isn't the case when I run ptp4l without that script. I am still trying
> > to determine what the script is doing to the ptp registers on the
> > device (if anything)
>
> Futzing with the clock or the device registers behind the program's back
> could well cause unexpected results.
>

The ethtool options script *shouldn't* be doing that, but I am not positive. I'll take a look at the drift value. Also, I ran the test with -l 100 and didn't see any missing timestamp warnings, so all of the timestamp values are at least coming from the driver, even if they are not the correct ones.

Thanks

> HTH,
> Richard
From: Christian R. <chr...@om...> - 2012-06-07 10:11:36
Hi Jake and Richard,

>On Tue, Jun 05, 2012 at 11:31:24PM +0000, Keller, Jacob E wrote:
>> It seems like somehow the p/i servo code is incorrect because once
>> we hit max adjust it never seems to move away from it.
>
> The servo clamps the adjustment to the maximum. It looks like the
> servo output is always outside of the limit. You can inspect s->drift
> to get a better idea what is going on.

I agree, looks like something is going wrong with s->drift, it seems to get set to a very high value and does not recover from that.

Since the value of s->drift is clamped to the maximum adjustment in pi.c when s->count == 4, it must be the initial calculation for s->count == 2. This calculation tries to get a good estimate for s->drift to allow a faster settling of the control loop. However, there is no clamping of the value of s->drift in this case.

What are your scripts doing before ptp4l is started? Do they do some clock adjustments? Note that the initial calculation of s->drift only works if the clock is initially set to 0 ppb, otherwise some correction of the calculated s->drift must be done.

Finally, if we calculate an estimate for s->drift, we should also apply it to the clock.

Please find below a patch (it's only a quick hack and only compile-tested) that
a) sets the clock to 0 ppb before doing anything else,
b) clamps s->drift to the maximum adjustment value at the first calculation (estimation),
c) applies the estimate to the clock.

Regards, Christian

---
 clock.c |    3 +--
 pi.c    |   25 +++++++++++++++++--------
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/clock.c b/clock.c
index e3797bf..c8216c3 100644
--- a/clock.c
+++ b/clock.c
@@ -475,13 +475,12 @@ enum servo_state clock_synchronize(struct clock *c,
 			c->master_offset, state, adj, c->path_delay);
 
 	switch (state) {
-	case SERVO_UNLOCKED:
-		break;
 	case SERVO_JUMP:
 		clock_step(c->clkid, -c->master_offset);
 		c->t1 = tmv_zero();
 		c->t2 = tmv_zero();
 		break;
+	case SERVO_UNLOCKED:
 	case SERVO_LOCKED:
 		clock_ppb(c->clkid, -adj);
 		break;
diff --git a/pi.c b/pi.c
index 33766b1..743ae71 100644
--- a/pi.c
+++ b/pi.c
@@ -59,28 +59,37 @@ static double pi_sample(struct servo *servo,
 
 	switch (s->count) {
 	case 0:
-		s->offset[0] = offset;
-		s->local[0] = local_ts;
 		*state = SERVO_UNLOCKED;
-		s->count = 1;
 		break;
 	case 1:
-		s->offset[1] = offset;
-		s->local[1] = local_ts;
+		s->offset[0] = offset;
+		s->local[0] = local_ts;
 		*state = SERVO_UNLOCKED;
 		s->count = 2;
 		break;
 	case 2:
-		s->drift = (s->offset[1] - s->offset[0]) /
-			(s->local[1] - s->local[0]);
+		s->offset[1] = offset;
+		s->local[1] = local_ts;
 		*state = SERVO_UNLOCKED;
 		s->count = 3;
 		break;
 	case 3:
-		*state = SERVO_JUMP;
+		s->drift = (s->offset[1] - s->offset[0]) /
+			(s->local[1] - s->local[0]);
+		if (s->drift < -s->maxppb) {
+			s->drift = -s->maxppb;
+		} else if (s->drift > s->maxppb) {
+			s->drift = s->maxppb;
+		}
+		ppb = s->drift;
+		*state = SERVO_UNLOCKED;
 		s->count = 4;
 		break;
 	case 4:
+		*state = SERVO_JUMP;
+		s->count = 5;
+		break;
+	case 5:
 		ki_term = s->ki * offset;
 		ppb = s->kp * offset + s->drift + ki_term;
 		if (ppb < -s->maxppb) {
-- 
1.7.4.1
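The drift estimate and the proposed clamp can be tried out in isolation. The function below is a sketch under stated assumptions: its name and signature are hypothetical, all inputs are taken to be nanoseconds, the ratio is scaled by 1e9 to express it in ppb (the excerpt above does not show how pi.c handles units), and a guard for identical timestamps is added that the patch does not have.

```c
/* Estimate the frequency error from two (offset, local time) samples
 * and clamp it to +/- max_ppb. Inputs are in nanoseconds; scaling by
 * 1e9 to obtain ppb is an ASSUMPTION of this sketch. */
static double estimate_drift_ppb(double offset0, double local0,
				 double offset1, double local1,
				 double max_ppb)
{
	double dt = local1 - local0;
	double drift;

	if (dt == 0.0)
		return 0.0; /* no elapsed time, no usable estimate */

	drift = (offset1 - offset0) * 1e9 / dt;
	if (drift < -max_ppb)
		drift = -max_ppb;
	else if (drift > max_ppb)
		drift = max_ppb;
	return drift;
}
```

With the first two offsets from the log at the top of the thread (8053809267140 and 8052809280419) and an assumed spacing of one second, the unclamped estimate is roughly -1e9 ppb, about four times the 250000000 limit, so the clamp in b) would take effect.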
From: Keller, J. E <jac...@in...> - 2012-06-07 23:03:42
> -----Original Message-----
> From: Christian Riesch [mailto:chr...@om...]
> Sent: Thursday, June 07, 2012 2:55 AM
> To: ric...@gm...; jak...@in...
> Cc: lin...@li...
> Subject: Re: [Linuxptp-users] Interesting (very wrong) PTP output
>
> Hi Jake and Richard,
>
> >On Tue, Jun 05, 2012 at 11:31:24PM +0000, Keller, Jacob E wrote:
> >> It seems like somehow the p/i servo code is incorrect because once we
> >> hit max adjust it never seems to move away from it.
> >
> > The servo clamps the adjustment to the maximum. It looks like the
> > servo output is always outside of the limit. You can inspect s->drift
> > to get a better idea what is going on.
>
> I agree, looks like something is going wrong with s->drift, it seems to
> get set to a very high value and does not recover from that.
>
> Since the value of s->drift is clamped to the maximum adjustment in pi.c
> when s->count == 4, it must be the initial calculation for s->count == 2.
> This calculation tries to get a good estimate for s->drift to allow a
> faster settling of the control loop. However, there is no clamping of the
> value of s->drift in this case.
>
> What are your scripts doing before ptp4l is started? Do they do some clock
> adjustments? Note that the initial calculation of s->drift only works if
> the clock is initially set to 0 ppb, otherwise some correction of the
> calculated s->drift must be done.
>
> Finally, if we calculate an estimate for s->drift, we should also apply it
> to the clock.
>
> Please find below a patch (it's only a quick hack and only compile-tested)
> that
> a) sets the clock to 0 ppb before doing anything else,
> b) clamps s->drift to the maximum adjustment value at the first
> calculation (estimation),
> c) applies the estimate to the clock.
>
> Regards,
> Christian
>
> ---
>  clock.c |    3 +--
>  pi.c    |   25 +++++++++++++++++--------
>  2 files changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/clock.c b/clock.c
> index e3797bf..c8216c3 100644
> --- a/clock.c
> +++ b/clock.c
> @@ -475,13 +475,12 @@ enum servo_state clock_synchronize(struct clock *c,
>  			c->master_offset, state, adj, c->path_delay);
>
>  	switch (state) {
> -	case SERVO_UNLOCKED:
> -		break;
>  	case SERVO_JUMP:
>  		clock_step(c->clkid, -c->master_offset);
>  		c->t1 = tmv_zero();
>  		c->t2 = tmv_zero();
>  		break;
> +	case SERVO_UNLOCKED:
>  	case SERVO_LOCKED:
>  		clock_ppb(c->clkid, -adj);
>  		break;
> diff --git a/pi.c b/pi.c
> index 33766b1..743ae71 100644
> --- a/pi.c
> +++ b/pi.c
> @@ -59,28 +59,37 @@ static double pi_sample(struct servo *servo,
>
>  	switch (s->count) {
>  	case 0:
> -		s->offset[0] = offset;
> -		s->local[0] = local_ts;
>  		*state = SERVO_UNLOCKED;
> -		s->count = 1;
>  		break;
>  	case 1:
> -		s->offset[1] = offset;
> -		s->local[1] = local_ts;
> +		s->offset[0] = offset;
> +		s->local[0] = local_ts;
>  		*state = SERVO_UNLOCKED;
>  		s->count = 2;
>  		break;
>  	case 2:
> -		s->drift = (s->offset[1] - s->offset[0]) /
> -			(s->local[1] - s->local[0]);
> +		s->offset[1] = offset;
> +		s->local[1] = local_ts;
>  		*state = SERVO_UNLOCKED;
>  		s->count = 3;
>  		break;
>  	case 3:
> -		*state = SERVO_JUMP;
> +		s->drift = (s->offset[1] - s->offset[0]) /
> +			(s->local[1] - s->local[0]);
> +		if (s->drift < -s->maxppb) {
> +			s->drift = -s->maxppb;
> +		} else if (s->drift > s->maxppb) {
> +			s->drift = s->maxppb;
> +		}
> +		ppb = s->drift;
> +		*state = SERVO_UNLOCKED;
>  		s->count = 4;
>  		break;
>  	case 4:
> +		*state = SERVO_JUMP;
> +		s->count = 5;
> +		break;
> +	case 5:
>  		ki_term = s->ki * offset;
>  		ppb = s->kp * offset + s->drift + ki_term;
>  		if (ppb < -s->maxppb) {
> --
> 1.7.4.1
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and threat
> landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Linuxptp-users mailing list
> Lin...@li...
> https://lists.sourceforge.net/lists/listinfo/linuxptp-users

I found a bug in the driver for the network adapter, and fixing it resolved the issue. I am not sure what this patch is trying to do, but I do think applying a ppb adjustment of 0 to the device at startup can be useful to ensure it's in a correct state during the unlocked stage.

- Jake
From: Richard C. <ric...@gm...> - 2012-06-20 16:38:33
On Thu, Jun 07, 2012 at 11:54:48AM +0200, Christian Riesch wrote:

Even though Jacob had some other problem, I still am going to take this patch (or something similar) after I get a chance to test it.

I have a few comments...

> Please find below a patch (it's only a quick hack and only compile-tested)
> that
> a) sets the clock to 0 ppb before doing anything else,

Agreed, this should be done.

> b) clamps s->drift to the maximum adjustment value at the first calculation
> (estimation),

Not sure if this is too important, but I guess it doesn't hurt. If the actual drift is more than the max adjustment, then the situation is hopeless anyhow.

> c) applies the estimate to the clock.

Not sure if this is too important either. The drift will dominate this calculation in any case, right?

> ki_term = s->ki * offset;
> ppb = s->kp * offset + s->drift + ki_term;

Thanks,
Richard
From: Keller, J. E <jac...@in...> - 2012-06-20 16:41:44
> -----Original Message-----
> From: Richard Cochran [mailto:ric...@gm...]
> Sent: Wednesday, June 20, 2012 9:38 AM
> To: Christian Riesch
> Cc: jak...@in...; lin...@li...
> Subject: Re: [Linuxptp-users] Interesting (very wrong) PTP output
>
> On Thu, Jun 07, 2012 at 11:54:48AM +0200, Christian Riesch wrote:
>
> Even though Jacob had some other problem, I still am going to take this
> patch (or something similar) after I get a chance to test it.
>
> I have a few comments...
>
> > Please find below a patch (it's only a quick hack and only
> > compile-tested) that
> > a) sets the clock to 0 ppb before doing anything else,
>
> Agreed, this should be done.
>
> > b) clamps s->drift to the maximum adjustment value at the first
> > calculation (estimation),
>
> Not sure if this is too important, but I guess it doesn't hurt. If the
> actual drift is more than the max adjustment, then the situation is
> hopeless anyhow.
>
> > c) applies the estimate to the clock.
>
> Not sure if this is too important either. The drift will dominate this
> calculation in any case, right?
>
> > ki_term = s->ki * offset;
> > ppb = s->kp * offset + s->drift + ki_term;
>
> Thanks,
> Richard

I agree with the patch as well. My issue was different, but this can help in odd cases (resetting the clock to 0 ppb is good as that clears any previous settings).
From: Richard C. <ric...@gm...> - 2012-09-25 08:11:08
On Thu, Jun 07, 2012 at 11:54:48AM +0200, Christian Riesch wrote:
>
> Please find below a patch (it's only a quick hack and only compile-tested)
> that
> a) sets the clock to 0 ppb before doing anything else,
> b) clamps s->drift to the maximum adjustment value at the first calculation
> (estimation),
> c) applies the estimate to the clock.

Coming back to this patch, I think I now have a better solution to the issue of starting with a clock that has already been adjusted. I just pushed out this commit

    8f00d29 Discover and utilize the initial clock frequency adjustment.

that makes use of the clock_adjtimex() to get the initial frequency adjustment value (functionality to appear in kernel 3.7).

FWIW, I did try out this patch, but I found that, surprisingly, the code seemed to work better without this patch. I think the reason was that the path delay measurements were spoiled by resetting the frequency adjustment to zero, and the circa 100 ppm error introduced at the jump was soon corrected.

In any case, using the latest kernel together with this patch works perfectly, tested using the PHYTER.

Thanks,
Richard
From: Jacob K. <jac...@in...> - 2012-09-26 18:40:54
On 09/25/2012 01:10 AM, Richard Cochran wrote:
> On Thu, Jun 07, 2012 at 11:54:48AM +0200, Christian Riesch wrote:
>>
>> Please find below a patch (it's only a quick hack and only compile-tested)
>> that
>> a) sets the clock to 0 ppb before doing anything else,
>> b) clamps s->drift to the maximum adjustment value at the first calculation
>> (estimation),
>> c) applies the estimate to the clock.
>
> Coming back to this patch, I think I now have a better solution to the
> issue of starting with a clock that has already been adjusted. I just
> pushed out this commit
>
>     8f00d29 Discover and utilize the initial clock frequency adjustment.
>
> that makes use of the clock_adjtimex() to get the initial frequency
> adjustment value (functionality to appear in kernel 3.7).
>
> FWIW, I did try out this patch, but I found that, surprisingly, the
> code seemed to work better without this patch. I think the reason
> was that the path delay measurements were spoiled by resetting the
> frequency adjustment to zero, and the circa 100 ppm error introduced
> at the jump was soon corrected.
>
> In any case, using the latest kernel together with this patch works
> perfectly, tested using the PHYTER.
>
> Thanks,
> Richard
>

Once I have a spare moment I will go ahead and test this on the Intel ixgbe parts. It should provide a definite improvement.

- Jake
From: Richard C. <ric...@gm...> - 2012-06-06 16:12:26
BTW, this kind of degenerate behaviour can also be caused by wrong time stamps.

Richard
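To see why bad time stamps matter, it may help to write out how the offset and path delay are derived from them. The sketch below uses the standard delay request-response arithmetic from IEEE 1588; the function names are hypothetical, and it is an assumption that this mechanism matches the one used in the run at the top of the thread.

```c
#include <stdint.h>

/* Mean path delay from the four delay request-response timestamps:
 * t1 = master sends Sync,      t2 = slave receives it,
 * t3 = slave sends Delay_Req,  t4 = master receives it.
 * All values in nanoseconds. */
static int64_t path_delay_ns(int64_t t1, int64_t t2, int64_t t3, int64_t t4)
{
	return ((t2 - t1) + (t4 - t3)) / 2;
}

/* Offset of the slave clock from the master, given the path delay. */
static int64_t master_offset_ns(int64_t t1, int64_t t2, int64_t delay)
{
	return (t2 - t1) - delay;
}
```

A real link cannot have a negative delay, so values such as the "path delay -58248" lines in the original output are one hint that a timestamp is wrong or that one of the clocks is not behaving as expected.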
From: Keller, J. E <jac...@in...> - 2012-06-06 16:13:17
Ok. I will add some instrumentation to see if I can verify the timestamps are good.

> -----Original Message-----
> From: Richard Cochran [mailto:ric...@gm...]
> Sent: Wednesday, June 06, 2012 9:12 AM
> To: Keller, Jacob E
> Cc: lin...@li...
> Subject: Re: [Linuxptp-users] Interesting (very wrong) PTP output
>
> BTW, this kind of degenerate behaviour can also be caused by wrong time
> stamps.
>
> Richard
From: Keller, J. E <jac...@in...> - 2012-06-07 22:59:07
I found the bug in my driver: the time increment register was being zeroed by a reset path, and I wasn't properly restoring it afterwards, so the ptp daemon was unable to sync to it as master because the other end had a clock that wasn't changing. This was the cause of the problems.

- Jake

> -----Original Message-----
> From: Keller, Jacob E [mailto:jac...@in...]
> Sent: Wednesday, June 06, 2012 9:13 AM
> To: Richard Cochran
> Cc: lin...@li...
> Subject: Re: [Linuxptp-users] Interesting (very wrong) PTP output
>
> Ok. I will add some instrumentation to see if I can verify the timestamps
> are good.
>
> > -----Original Message-----
> > From: Richard Cochran [mailto:ric...@gm...]
> > Sent: Wednesday, June 06, 2012 9:12 AM
> > To: Keller, Jacob E
> > Cc: lin...@li...
> > Subject: Re: [Linuxptp-users] Interesting (very wrong) PTP output
> >
> > BTW, this kind of degenerate behaviour can also be caused by wrong
> > time stamps.
> >
> > Richard