linuxptp-users Mailing List for linuxptp (Page 159)
PTP IEEE 1588 stack for Linux
Brought to you by:
rcochran
You can subscribe to this list here.
2012 |
Jan
|
Feb
(10) |
Mar
(47) |
Apr
|
May
(26) |
Jun
(10) |
Jul
(4) |
Aug
(2) |
Sep
(2) |
Oct
(20) |
Nov
(14) |
Dec
(8) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2013 |
Jan
(6) |
Feb
(18) |
Mar
(27) |
Apr
(57) |
May
(32) |
Jun
(21) |
Jul
(79) |
Aug
(108) |
Sep
(13) |
Oct
(73) |
Nov
(51) |
Dec
(24) |
2014 |
Jan
(24) |
Feb
(41) |
Mar
(39) |
Apr
(5) |
May
(6) |
Jun
(2) |
Jul
(5) |
Aug
(15) |
Sep
(7) |
Oct
(6) |
Nov
|
Dec
(7) |
2015 |
Jan
(27) |
Feb
(18) |
Mar
(37) |
Apr
(8) |
May
(13) |
Jun
(44) |
Jul
(4) |
Aug
(50) |
Sep
(35) |
Oct
(6) |
Nov
(24) |
Dec
(19) |
2016 |
Jan
(30) |
Feb
(30) |
Mar
(23) |
Apr
(4) |
May
(12) |
Jun
(19) |
Jul
(26) |
Aug
(13) |
Sep
|
Oct
(23) |
Nov
(37) |
Dec
(15) |
2017 |
Jan
(33) |
Feb
(19) |
Mar
(20) |
Apr
(43) |
May
(39) |
Jun
(23) |
Jul
(20) |
Aug
(27) |
Sep
(10) |
Oct
(15) |
Nov
|
Dec
(24) |
2018 |
Jan
(3) |
Feb
(10) |
Mar
(34) |
Apr
(34) |
May
(28) |
Jun
(50) |
Jul
(27) |
Aug
(75) |
Sep
(21) |
Oct
(42) |
Nov
(25) |
Dec
(31) |
2019 |
Jan
(39) |
Feb
(28) |
Mar
(19) |
Apr
(7) |
May
(30) |
Jun
(22) |
Jul
(54) |
Aug
(36) |
Sep
(19) |
Oct
(33) |
Nov
(36) |
Dec
(32) |
2020 |
Jan
(29) |
Feb
(38) |
Mar
(29) |
Apr
(30) |
May
(39) |
Jun
(45) |
Jul
(31) |
Aug
(52) |
Sep
(40) |
Oct
(8) |
Nov
(48) |
Dec
(30) |
2021 |
Jan
(35) |
Feb
(32) |
Mar
(23) |
Apr
(55) |
May
(43) |
Jun
(63) |
Jul
(17) |
Aug
(24) |
Sep
(9) |
Oct
(31) |
Nov
(67) |
Dec
(55) |
2022 |
Jan
(31) |
Feb
(48) |
Mar
(76) |
Apr
(18) |
May
(13) |
Jun
(46) |
Jul
(75) |
Aug
(54) |
Sep
(59) |
Oct
(65) |
Nov
(44) |
Dec
(7) |
2023 |
Jan
(38) |
Feb
(32) |
Mar
(35) |
Apr
(23) |
May
(46) |
Jun
(53) |
Jul
(18) |
Aug
(10) |
Sep
(24) |
Oct
(15) |
Nov
(40) |
Dec
(6) |
From: Richard C. <ric...@gm...> - 2012-10-30 17:24:13
|
On Tue, Oct 30, 2012 at 09:54:30AM -0700, Jacob Keller wrote: > > I believe the true correct answer is to completely re-architect the > tx_hwtstamp to be asynchronous, so that it just waits until it receives > the timestamp for a complete sequence of events. That design is > significantly more difficult to write though. But even if we did that way, it would not really be a better solution. Think about your own Intel cards. They would end up missing Tx time stamps and possibly mixing them up due to the hardware limitation of having a Tx time stamp FIFO of depth one. And it is not just Intel cards that have this issue. I think the majority of the current hardware offerings all have this same limitation. So we really must wait for the Tx time stamp after sending an event message before going on with the protocol, simply to function on most of the hardware out there. Thanks, Richard |
From: Miroslav L. <mli...@re...> - 2012-10-30 17:17:31
|
On Tue, Oct 30, 2012 at 06:05:38PM +0100, Richard Cochran wrote: > This does not accomplish anything since: > > --- /usr/include/asm-generic/errno.h --- > > #define EWOULDBLOCK EAGAIN /* Operation would block */ > > > > usleep(1); > > + try_again++; > > This is the wrong solution. The right way is to set the > tx_timestamp_retries configuration variable to a higher number, like > 200 or 2000 instead of the default of 2. Would it make sense to specify a timeout instead of number of retries and use select()? -- Miroslav Lichvar |
From: Richard C. <ric...@gm...> - 2012-10-30 17:05:59
|
On Tue, Oct 30, 2012 at 04:47:09PM +0100, Mario Molitor wrote: > > I have instrument the ptp4l code and I could see that a part of problem was a not correct error handling in the function sk_receive(). The recvmsg() returns sometime a EAGAIN and try-again variable was not increment. > I have changed this and now disappears this error message with GM Clock search during my flood ping test and it works all very well. > > My code changes: > --- a/sk.c > +++ b/sk.c > > } > if (errno == EINTR) { > try_again++; > - } else if (errno == EAGAIN) { > + } else if ((errno == EAGAIN ) || (errno == EWOULDBLOCK)) { This does not accomplish anything since: --- /usr/include/asm-generic/errno.h --- #define EWOULDBLOCK EAGAIN /* Operation would block */ > usleep(1); > + try_again++; This is the wrong solution. The right way is to set the tx_timestamp_retries configuration variable to a higher number, like 200 or 2000 instead of the default of 2. HTH, Richard |
From: Jacob K. <jac...@in...> - 2012-10-30 16:54:44
|
On 10/30/2012 08:47 AM, Mario Molitor wrote: > > Hallo Richard, > > I observed following error message with a new GM Clock search during my flood ping test: > > ptp4l[23768.970]: recvmsg tx timestamp failed: Resource temporarily unavailable > ptp4l[23768.975]: port 1: send delay request failed > ptp4l[23768.975]: port 1: SLAVE to FAULTY on FAULT_DETECTED > ptp4l[23784.059]: port 1: FAULTY to LISTENING on FAULT_CLEARED > ptp4l[23784.414]: port 1: new foreign master 0050c2.fffe.c2dfc3-1 > ptp4l[23788.424]: selected best master clock 0050c2.fffe.c2dfc3 > ptp4l[23788.425]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE > ptp4l[23790.092]: port 1: minimum delay request interval 2^3 > ptp4l[23790.688]: master offset -329 s2 adj -13761 path delay 1626 > ptp4l[23790.708]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED > > flood ping test script: > while true; do sudo ping -f -c 1000 -s $RANDOM <IP of PTP Module> ; done. > > I have instrument the ptp4l code and I could see that a part of problem was a not correct error handling in the function sk_receive(). The recvmsg() returns sometime a EAGAIN and try-again variable was not increment. > I have changed this and now disappears this error message with GM Clock search during my flood ping test and it works all very well. > > My code changes: > --- a/sk.c > +++ b/sk.c > > } > if (errno == EINTR) { > try_again++; > - } else if (errno == EAGAIN) { > + } else if ((errno == EAGAIN ) || (errno == EWOULDBLOCK)) { > usleep(1); > + try_again++; > } else { > break; > } > > > Do you have an idea why these EAGAIN errors occur? I cloud not find a reason for non-blocking. > > Best regards, > Mario > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_sfd2d_oct > _______________________________________________ > Linuxptp-users mailing list > Lin...@li... > https://lists.sourceforge.net/lists/listinfo/linuxptp-users > That loop is there due to the way hardware timestamps are returned from the network stack to PTP4l. They are looped back on the socket error queue, and then picked up by PTP4l. Current design doesn't want to wait indefinitely due to possible missed timestamps. The try_again isn't incremented on purpose, as that would cause it to loop an infinite number of times. You could try to increment the tx_timestamp_retry value in the config file and see if this fixes the issue. I believe we should have a higher default in this field, because the drivers I've tested all have trouble returning the timestamp within the very short time (2 nanoseconds). If it's a problem due to the regular receive that might be an entirely different issue. I believe the true correct answer is to completely re-architect the tx_hwtstamp to be asynchronous, so that it just waits until it receives the timestamp for a complete sequence of events. That design is significantly more difficult to write though. - Jake |
From: Mario M. <mar...@we...> - 2012-10-30 15:47:21
|
Hallo Richard, I observed following error message with a new GM Clock search during my flood ping test: ptp4l[23768.970]: recvmsg tx timestamp failed: Resource temporarily unavailable ptp4l[23768.975]: port 1: send delay request failed ptp4l[23768.975]: port 1: SLAVE to FAULTY on FAULT_DETECTED ptp4l[23784.059]: port 1: FAULTY to LISTENING on FAULT_CLEARED ptp4l[23784.414]: port 1: new foreign master 0050c2.fffe.c2dfc3-1 ptp4l[23788.424]: selected best master clock 0050c2.fffe.c2dfc3 ptp4l[23788.425]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE ptp4l[23790.092]: port 1: minimum delay request interval 2^3 ptp4l[23790.688]: master offset -329 s2 adj -13761 path delay 1626 ptp4l[23790.708]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED flood ping test script: while true; do sudo ping -f -c 1000 -s $RANDOM <IP of PTP Module> ; done. I have instrument the ptp4l code and I could see that a part of problem was a not correct error handling in the function sk_receive(). The recvmsg() returns sometime a EAGAIN and try-again variable was not increment. I have changed this and now disappears this error message with GM Clock search during my flood ping test and it works all very well. My code changes: --- a/sk.c +++ b/sk.c } if (errno == EINTR) { try_again++; - } else if (errno == EAGAIN) { + } else if ((errno == EAGAIN ) || (errno == EWOULDBLOCK)) { usleep(1); + try_again++; } else { break; } Do you have an idea why these EAGAIN errors occur? I cloud not find a reason for non-blocking. Best regards, Mario |
From: Jacob K. <jac...@in...> - 2012-09-26 18:40:54
|
On 09/25/2012 01:10 AM, Richard Cochran wrote: > On Thu, Jun 07, 2012 at 11:54:48AM +0200, Christian Riesch wrote: >> >> Please find below a patch (it's only an quick hack and only compile-tested) >> that >> a) sets the clock to 0 ppb before doing anything else, >> b) clamps s->drift to the maximum adjustment value at the first calculation >> (estimation), >> c) applies the estimate to the clock. > > Coming back to this patch, I think I now have a better solution to the > issue of starting with a clock that has already been adjusted. I just > pushed out this commit > > 8f00d29 Discover and utilize the initial clock frequency adjustment. > > that makes use of the clock_adjtimex() to get the initial frequency > adjustment value (functionality to appear in kernel 3.7). > > FWIW, I did try out this patch, but I found that, suprisingly, the > code seemed to work better without this patch. I think the reason > was that the path delay measurements were spoiled by resetting the > frequency adjustment to zero, and the circa 100 ppm error introduced > at the jump was soon corrected. > > In any case, using the latest kernel together with this patch works > perfectly, tested using the PHYTER. > > Thanks, > Richard > > Once I have a spare moment I will go ahead and test this on the intel ixgbe parts. It should provide a definite improvement. - Jake |
From: Richard C. <ric...@gm...> - 2012-09-25 08:11:08
|
On Thu, Jun 07, 2012 at 11:54:48AM +0200, Christian Riesch wrote: > > Please find below a patch (it's only an quick hack and only compile-tested) > that > a) sets the clock to 0 ppb before doing anything else, > b) clamps s->drift to the maximum adjustment value at the first calculation > (estimation), > c) applies the estimate to the clock. Coming back to this patch, I think I now have a better solution to the issue of starting with a clock that has already been adjusted. I just pushed out this commit 8f00d29 Discover and utilize the initial clock frequency adjustment. that makes use of the clock_adjtimex() to get the initial frequency adjustment value (functionality to appear in kernel 3.7). FWIW, I did try out this patch, but I found that, suprisingly, the code seemed to work better without this patch. I think the reason was that the path delay measurements were spoiled by resetting the frequency adjustment to zero, and the circa 100 ppm error introduced at the jump was soon corrected. In any case, using the latest kernel together with this patch works perfectly, tested using the PHYTER. Thanks, Richard |
From: Richard C. <ric...@gm...> - 2012-08-21 12:55:32
|
Dear linuxptp users, I have just pushed out a patch series from Jacob Keller that improves the command line and configuration file interface. I think the new interface is more logical and easier to understand. However, some of the flags have changed meaning (like -m is now -s, -s is now -S, etc), so please make note of the new usage. Thanks, Richard |
From: Richard C. <ric...@gm...> - 2012-08-10 18:56:49
|
Dear linuxptp users, I have just pushed out new code implementing gPTP more fully than before. Now the ptp4l program supports two of the three gPTP TLVs. * follow_up information TLV * path trace TLV I have tested the code minimally together with a LabX Titanium 411 and the XMOS AVB Demo Board. So far, the result looks good. The ptp4l program can act as both slave and master in a gPTP network. Please do give this new code a try. Thanks, Richard |
From: Mario M. <Mar...@we...> - 2012-07-08 10:06:27
|
> Wow, what an embarrassing collection of bugs I produced. No problem. I think you have made great job and the software architecture of ptp4l is very good and this make it easy to analyze problems. > Many thanks for finding this bug and the patch. I just pushed a fix >(just a bit different than your patch) today. Thanks, it was pleasure to help. >I added some compile time debugging that you might try out to verify >the fix. I found that the slave mode uses 6 messages and the master >mode 2, with one master and one slave and the default settings. Ok. Thanks, these are very helpful additional features to analyze the message pool handling and memory consuming. >> I think another good way to solve this behavior is to use a circular >> message pool with a fix size. >> >> What do you think from this idea? >A fixed size would be okay, but it might be hard to predict the number >of messages needed for a busy bounday clock. OK. It was only an idea after changing the code. I wish you a nice weekend and good start in the next week. Thanks, Mario |
From: Mario M. <Mar...@we...> - 2012-07-08 10:05:41
|
> Wow, what an embarrassing collection of bugs I produced. No problem. I think you have made great job and the software architecture of ptp4l is very good and this make it easy to analyze problems. > Many thanks for finding this bug and the patch. I just pushed a fix >(just a bit different than your patch) today. Thanks, it was pleasure to help. >I added some compile time debugging that you might try out to verify >the fix. I found that the slave mode uses 6 messages and the master >mode 2, with one master and one slave and the default settings. Ok. Thanks, these are very helpful additional features to analyze the message pool handling and memory consuming. >> I think another good way to solve this behavior is to use a circular >> message pool with a fix size. >> >> What do you think from this idea? >A fixed size would be okay, but it might be hard to predict the number >of messages needed for a busy bounday clock. OK. It was only an idea after changing the code. I wish you a nice weekend and good start in the next week. Thanks, Mario |
From: Richard C. <ric...@gm...> - 2012-07-07 18:36:59
|
On Sat, Jul 07, 2012 at 12:01:29PM +0200, Mario Molitor wrote: > > In the current implementation of this functions would every call of > msg_allocate() make a new message allocation, because the > condition if(!m) in msg_put() should always false and in the > consequently is msg_pool always empty. Wow, what an embarrassing collection of bugs I produced. > Attached on this email you could also find patch file Many thanks for finding this bug and the patch. I just pushed a fix (just a bit different than your patch) today. I added some compile time debugging that you might try out to verify the fix. I found that the slave mode uses 6 messages and the master mode 2, with one master and one slave and the default settings. > I think another good way to solve this behavior is to use a circular > message pool with a fix size. > > What do you think from this idea? A fixed size would be okay, but it might be hard to predict the number of messages needed for a busy bounday clock. Thanks, Richard |
From: Mario M. <Mar...@we...> - 2012-07-07 10:01:34
|
Hallo PTP4l members, I could observe a memory increasing by long term test of PTP4L application. (cat /proc/[PID of App]/status) The background of this memory leak is message pool handling in the function msg_allocate(void) and msg_put(). In the current implementation of this functions would every call of msg_allocate() make a new message allocation, because the condition if(!m) in msg_put() should always false and in the consequently is msg_pool always empty. My patch suggestion: @@ -146,12 +147,20 @@ struct ptp_message *msg_allocate(void) struct message_storage *s; struct ptp_message *m = TAILQ_FIRST(&msg_pool); if (!m) { + // pr_notice("ptp_message alloc new \n"); s = malloc(sizeof(*s)); if (s) { m = &s->msg; - m->refcnt = 1; + } else { + return NULL; } + } else { + TAILQ_REMOVE(&msg_pool, m, list); + //pr_notice("ptp_message\n"); } + + memset(m, 0, sizeof(*m)); + m->refcnt = 1; return m; } @@ -330,8 +339,11 @@ void msg_print(struct ptp_message *m, FILE *fp) void msg_put(struct ptp_message *m) { m->refcnt--; - if (!m) + if (m != TAILQ_FIRST(&msg_pool) && m->refcnt == 0 ) + { + // pr_notice("TAILQ_INSERT_HEAD %s %i",msg_type_string(msg_type(m)),m->refcnt); TAILQ_INSERT_HEAD(&msg_pool, m, list); + } } int msg_sots_missing(struct ptp_message *m) diff --git a/port.c b/port.c index bf5738f..9d3a966 100644 --- a/port.c +++ b/port.c @@ -366,7 +366,6 @@ static int port_pdelay_request(struct port *p) msg = msg_allocate(); if (!msg) return -1; - memset(msg, 0, sizeof(*msg)); pdulen = sizeof(struct pdelay_req_msg); msg->hwts.type = p->timestamping; @@ -414,7 +413,6 @@ static int port_delay_request(struct port *p) msg = msg_allocate(); if (!msg) return -1; - memset(msg, 0, sizeof(*msg)); pdulen = sizeof(struct delay_req_msg); msg->hwts.type = p->timestamping; @@ -461,7 +459,6 @@ static int port_tx_announce(struct port *p) msg = msg_allocate(); if (!msg) return -1; - memset(msg, 0, sizeof(*msg)); pdulen = sizeof(struct announce_msg); msg->hwts.type = p->timestamping; @@ -523,8 +520,6 @@ static int port_tx_sync(struct port *p) msg_put(msg); return -1; } - memset(msg, 0, sizeof(*msg)); - memset(fup, 0, sizeof(*fup)); pdulen = sizeof(struct sync_msg); msg->hwts.type = p->timestamping; @@ -767,7 +762,6 @@ static int process_delay_req(struct port *p, struct ptp_message *m) msg = msg_allocate(); if (!msg) return -1; - memset(msg, 0, sizeof(*msg)); pdulen = sizeof(struct delay_resp_msg); msg->hwts.type = p->timestamping; @@ -895,8 +889,6 @@ static int process_pdelay_req(struct port *p, struct ptp_message *m) msg_put(rsp); return -1; } - memset(rsp, 0, sizeof(*rsp)); - memset(fup, 0, sizeof(*fup)); rsp_len = sizeof(struct pdelay_resp_msg); rsp->hwts.type = p->timestamping; Attached on this email you could also find patch file I think another good way to solve this behavior is to use a circular message pool with a fix size. What do you think from this idea? Best regards, Mario |
From: Keller, J. E <jac...@in...> - 2012-06-20 16:41:44
|
> -----Original Message----- > From: Richard Cochran [mailto:ric...@gm...] > Sent: Wednesday, June 20, 2012 9:38 AM > To: Christian Riesch > Cc: jak...@in...; lin...@li... > Subject: Re: [Linuxptp-users] Interesting (very wrong) PTP output > > On Thu, Jun 07, 2012 at 11:54:48AM +0200, Christian Riesch wrote: > > Even though Jacob had some other problem, I still am going to take this > patch (or something similar) after I get a chance to test it. > > I have a few comments... > > > Please find below a patch (it's only an quick hack and only > > compile-tested) that > > a) sets the clock to 0 ppb before doing anything else, > > Agreed, this should be done. > > > b) clamps s->drift to the maximum adjustment value at the first > calculation > > (estimation), > > Not sure if this is too important, but I guess it doesn't hurt. If the > actual drift is more than the max adjustment, then the situation is > hopeless anyhow. > > > c) applies the estimate to the clock. > > Not sure if this is too important either. The drift will dominate this > calculation in any case, right? > > > ki_term = s->ki * offset; > > ppb = s->kp * offset + s->drift + ki_term; > > Thanks, > Richard > > -------------------------------------------------------------------------- > ---- > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and threat > landscape has changed and how IT managers can respond. Discussions will > include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Linuxptp-users mailing list > Lin...@li... > https://lists.sourceforge.net/lists/listinfo/linuxptp-users I agree with the patch as well. My issue was different but this can help in odd cases (resetting the clock to 0ppb is good as that clears any previous settings) |
From: Richard C. <ric...@gm...> - 2012-06-20 16:38:33
|
On Thu, Jun 07, 2012 at 11:54:48AM +0200, Christian Riesch wrote: Even though Jacob had some other problem, I still am going to take this patch (or something similar) after I get a chance to test it. I have a few comments... > Please find below a patch (it's only an quick hack and only compile-tested) > that > a) sets the clock to 0 ppb before doing anything else, Agreed, this should be done. > b) clamps s->drift to the maximum adjustment value at the first calculation > (estimation), Not sure if this is too important, but I guess it doesn't hurt. If the actual drift is more than the max adjustment, then the situation is hopeless anyhow. > c) applies the estimate to the clock. Not sure if this is too important either. The drift will dominate this calculation in any case, right? > ki_term = s->ki * offset; > ppb = s->kp * offset + s->drift + ki_term; Thanks, Richard |
From: Keller, J. E <jac...@in...> - 2012-06-07 23:03:42
|
> -----Original Message----- > From: Christian Riesch [mailto:chr...@om...] > Sent: Thursday, June 07, 2012 2:55 AM > To: ric...@gm...; jak...@in... > Cc: lin...@li... > Subject: Re: [Linuxptp-users] Interesting (very wrong) PTP output > > Hi Jake and Richard, > > >On Tue, Jun 05, 2012 at 11:31:24PM +0000, Keller, Jacob E wrote: > >> It seems like somehow the p/i servo code is incorrect because once we > >> hit max adjust it never seems to move away from it. > > > > The servo clamps the adjustment to the maximum. It looks like it the > > servo output is always outside of the limit. You can inspect s->drift > > to get a better idea what is going on. > > I agree, looks like something is going wrong with s->drift, it seems to > get set to a very high value and does not recover from that. > > Since the value of s->drift is clamped to the maximum adjustment in pi.c > when > s->count == 4, it must be the initial calculation for s->count == 2. > s->This > calculation tries to get a good estimate for s->drift to allow a faster > settling of the control loop. However, there is no clamping of the value > of s->drift in this case. > > What are your scripts doing before ptp4l is started? Do they do some clock > adjustments? Note that the initial calculation of s->drift only works if > the clock is initially set to 0 ppb, otherwise some correction of the > calculated s->drift must be done. > > Finally, if we calculate an estimate for s->drift, we should also apply it > to the clock. > > Please find below a patch (it's only an quick hack and only compile- > tested) that > a) sets the clock to 0 ppb before doing anything else, > b) clamps s->drift to the maximum adjustment value at the first > calculation > (estimation), > c) applies the estimate to the clock. > > Regards, > Christian > > --- > clock.c | 3 +-- > pi.c | 25 +++++++++++++++++-------- > 2 files changed, 18 insertions(+), 10 deletions(-) > > diff --git a/clock.c b/clock.c > index e3797bf..c8216c3 100644 > --- a/clock.c > +++ b/clock.c > @@ -475,13 +475,12 @@ enum servo_state clock_synchronize(struct clock *c, > c->master_offset, state, adj, c->path_delay); > > switch (state) { > - case SERVO_UNLOCKED: > - break; > case SERVO_JUMP: > clock_step(c->clkid, -c->master_offset); > c->t1 = tmv_zero(); > c->t2 = tmv_zero(); > break; > + case SERVO_UNLOCKED: > case SERVO_LOCKED: > clock_ppb(c->clkid, -adj); > break; > diff --git a/pi.c b/pi.c > index 33766b1..743ae71 100644 > --- a/pi.c > +++ b/pi.c > @@ -59,28 +59,37 @@ static double pi_sample(struct servo *servo, > > switch (s->count) { > case 0: > - s->offset[0] = offset; > - s->local[0] = local_ts; > *state = SERVO_UNLOCKED; > - s->count = 1; > break; > case 1: > - s->offset[1] = offset; > - s->local[1] = local_ts; > + s->offset[0] = offset; > + s->local[0] = local_ts; > *state = SERVO_UNLOCKED; > s->count = 2; > break; > case 2: > - s->drift = (s->offset[1] - s->offset[0]) / > - (s->local[1] - s->local[0]); > + s->offset[1] = offset; > + s->local[1] = local_ts; > *state = SERVO_UNLOCKED; > s->count = 3; > break; > case 3: > - *state = SERVO_JUMP; > + s->drift = (s->offset[1] - s->offset[0]) / > + (s->local[1] - s->local[0]); > + if (s->drift < -s->maxppb) { > + s->drift = -s->maxppb; > + } else if (s->drift > s->maxppb) { > + s->drift = s->maxppb; > + } > + ppb = s->drift; > + *state = SERVO_UNLOCKED; > s->count = 4; > break; > case 4: > + *state = SERVO_JUMP; > + s->count = 5; > + break; > + case 5: > ki_term = s->ki * offset; > ppb = s->kp * offset + s->drift + ki_term; > if (ppb < -s->maxppb) { > -- > 1.7.4.1 > > > -------------------------------------------------------------------------- > ---- > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Linuxptp-users mailing list > Lin...@li... > https://lists.sourceforge.net/lists/listinfo/linuxptp-users I found an issue with the driver for the network adapter, that fixed the issue. I am not sure what this patch is trying to do. But I do think applying a ppb adjustment of 0 to the device originally can be useful to ensure its in a correct state during the unlocked stage. - Jake |
From: Keller, J. E <jac...@in...> - 2012-06-07 22:59:07
|
I found the bug in my driver: the time increment register was being zeroed by a reset path, and I wasn't properly resetting it, so the ptp daemon was unable to sync to it as master because the other end has a clock that wasn't changing. This was the cause of the problems. - Jake > -----Original Message----- > From: Keller, Jacob E [mailto:jac...@in...] > Sent: Wednesday, June 06, 2012 9:13 AM > To: Richard Cochran > Cc: lin...@li... > Subject: Re: [Linuxptp-users] Interesting (very wrong) PTP output > > Ok. I will add some instrumentation to see if I can verify the timestamps > are good. > > > -----Original Message----- > > From: Richard Cochran [mailto:ric...@gm...] > > Sent: Wednesday, June 06, 2012 9:12 AM > > To: Keller, Jacob E > > Cc: lin...@li... > > Subject: Re: [Linuxptp-users] Interesting (very wrong) PTP output > > > > BTW, this kind of degenerate behaviour can also be caused by wrong > > time stamps. > > > > Richard > > -------------------------------------------------------------------------- > ---- > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and threat > landscape has changed and how IT managers can respond. Discussions will > include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Linuxptp-users mailing list > Lin...@li... > https://lists.sourceforge.net/lists/listinfo/linuxptp-users |
From: Christian R. <chr...@om...> - 2012-06-07 10:11:36
|
Hi Jake and Richard, >On Tue, Jun 05, 2012 at 11:31:24PM +0000, Keller, Jacob E wrote: >> It seems like somehow the p/i servo code is incorrect because once >> we hit max adjust it never seems to move away from it. > > The servo clamps the adjustment to the maximum. It looks like it the > servo output is always outside of the limit. You can inspect s->drift > to get a better idea what is going on. I agree, looks like something is going wrong with s->drift, it seems to get set to a very high value and does not recover from that. Since the value of s->drift is clamped to the maximum adjustment in pi.c when s->count == 4, it must be the initial calculation for s->count == 2. This calculation tries to get a good estimate for s->drift to allow a faster settling of the control loop. However, there is no clamping of the value of s->drift in this case. What are your scripts doing before ptp4l is started? Do they do some clock adjustments? Note that the initial calculation of s->drift only works if the clock is initially set to 0 ppb, otherwise some correction of the calculated s->drift must be done. Finally, if we calculate an estimate for s->drift, we should also apply it to the clock. Please find below a patch (it's only an quick hack and only compile-tested) that a) sets the clock to 0 ppb before doing anything else, b) clamps s->drift to the maximum adjustment value at the first calculation (estimation), c) applies the estimate to the clock. Regards, Christian --- clock.c | 3 +-- pi.c | 25 +++++++++++++++++-------- 2 files changed, 18 insertions(+), 10 deletions(-) diff --git a/clock.c b/clock.c index e3797bf..c8216c3 100644 --- a/clock.c +++ b/clock.c @@ -475,13 +475,12 @@ enum servo_state clock_synchronize(struct clock *c, c->master_offset, state, adj, c->path_delay); switch (state) { - case SERVO_UNLOCKED: - break; case SERVO_JUMP: clock_step(c->clkid, -c->master_offset); c->t1 = tmv_zero(); c->t2 = tmv_zero(); break; + case SERVO_UNLOCKED: case SERVO_LOCKED: clock_ppb(c->clkid, -adj); break; diff --git a/pi.c b/pi.c index 33766b1..743ae71 100644 --- a/pi.c +++ b/pi.c @@ -59,28 +59,37 @@ static double pi_sample(struct servo *servo, switch (s->count) { case 0: - s->offset[0] = offset; - s->local[0] = local_ts; *state = SERVO_UNLOCKED; - s->count = 1; break; case 1: - s->offset[1] = offset; - s->local[1] = local_ts; + s->offset[0] = offset; + s->local[0] = local_ts; *state = SERVO_UNLOCKED; s->count = 2; break; case 2: - s->drift = (s->offset[1] - s->offset[0]) / - (s->local[1] - s->local[0]); + s->offset[1] = offset; + s->local[1] = local_ts; *state = SERVO_UNLOCKED; s->count = 3; break; case 3: - *state = SERVO_JUMP; + s->drift = (s->offset[1] - s->offset[0]) / + (s->local[1] - s->local[0]); + if (s->drift < -s->maxppb) { + s->drift = -s->maxppb; + } else if (s->drift > s->maxppb) { + s->drift = s->maxppb; + } + ppb = s->drift; + *state = SERVO_UNLOCKED; s->count = 4; break; case 4: + *state = SERVO_JUMP; + s->count = 5; + break; + case 5: ki_term = s->ki * offset; ppb = s->kp * offset + s->drift + ki_term; if (ppb < -s->maxppb) { -- 1.7.4.1 |
From: Keller, J. E <jac...@in...> - 2012-06-06 16:13:17
|
Ok. I will add some instrumentation to see if I can verify the timestamps are good. > -----Original Message----- > From: Richard Cochran [mailto:ric...@gm...] > Sent: Wednesday, June 06, 2012 9:12 AM > To: Keller, Jacob E > Cc: lin...@li... > Subject: Re: [Linuxptp-users] Interesting (very wrong) PTP output > > BTW, this kind of degenerate behaviour can also be caused by wrong time > stamps. > > Richard |
From: Keller, J. E <jac...@in...> - 2012-06-06 16:12:37
|
> -----Original Message----- > From: Richard Cochran [mailto:ric...@gm...] > Sent: Wednesday, June 06, 2012 9:10 AM > To: Keller, Jacob E > Cc: lin...@li... > Subject: Re: [Linuxptp-users] Interesting (very wrong) PTP output > > On Tue, Jun 05, 2012 at 11:31:24PM +0000, Keller, Jacob E wrote: > > > It seems like somehow the p/i servo code is incorrect because once we > > hit max adjust it never seems to move away from it. > > The servo clamps the adjustment to the maximum. It looks like it the servo > output is always outside of the limit. You can inspect s->drift to get a > better idea what is going on. > > > However that > > isn't the case when I run ptp4l without that script. I am still trying > > to determine what the script is doing to the ptp registers on the > > device (if anything) > > Futzing with the clock or the device registers behind the program's back > could well cause unexpected results. > The ethtool options script *shouldn't* be doing that, but I am not positive. I'll take a look at the drift value. Also, I ran the test with -l 100 and didn't see any missing timestamp warnings, so all of the timestamp values are at least getting from the driver if not the correct ones. Thanks > HTH, > Richard |
From: Richard C. <ric...@gm...> - 2012-06-06 16:12:26
|
BTW, this kind of degenerate behaviour can also be caused by wrong time stamps. Richard |
From: Richard C. <ric...@gm...> - 2012-06-06 16:10:36
|
On Tue, Jun 05, 2012 at 11:31:24PM +0000, Keller, Jacob E wrote: > It seems like somehow the p/i servo code is incorrect because once > we hit max adjust it never seems to move away from it. The servo clamps the adjustment to the maximum. It looks like it the servo output is always outside of the limit. You can inspect s->drift to get a better idea what is going on. > However that > isn't the case when I run ptp4l without that script. I am still > trying to determine what the script is doing to the ptp registers on > the device (if anything) Futzing with the clock or the device registers behind the program's back could well cause unexpected results. HTH, Richard |
From: Keller, J. E <jac...@in...> - 2012-06-05 23:31:31
|
n0825:[1]/root/linuxptp> ./ptp4l -m -v -P -f default.cfg -i eth2 ptp4l[28757]: selected /dev/ptp0 as PTP clock ptp4l[28757]: port 1: INITIALIZING to LISTENING on INITIALIZE ptp4l[28757]: port 1: new foreign master 001b21.fffe.cf83e4-1 ptp4l[28757]: selected best master clock 001b21.fffe.cf83e4 ptp4l[28757]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE ptp4l[28757]: master offset 8053809267140 s0 adj +0 path delay -58248 ptp4l[28757]: master offset 8052809280419 s0 adj +0 path delay -55403 ptp4l[28757]: master offset 8051809142574 s0 adj +0 path delay -56020 ptp4l[28757]: master offset 8050809162084 s1 adj +0 path delay -56020 ptp4l[28757]: master offset -1000134414 s2 adj -250000000 path delay -56427 ptp4l[28757]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED ptp4l[28757]: master offset -750413376 s2 adj -250000000 path delay -56427 ptp4l[28757]: master offset -500400801 s2 adj -250000000 path delay -44558 ptp4l[28757]: master offset -250420149 s2 adj -250000000 path delay -36052 ptp4l[28757]: master offset -393488 s2 adj -250000000 path delay -36052 ptp4l[28757]: master offset 249589808 s2 adj -250000000 path delay -29589 ptp4l[28757]: master offset 499591481 s2 adj -250000000 path delay -5666 ptp4l[28757]: master offset 749574326 s2 adj -250000000 path delay 798 ptp4l[28757]: master offset 999593422 s2 adj -250000000 path delay 8104 ptp4l[28757]: master offset 1249583788 s2 adj -250000000 path delay 8104 ptp4l[28757]: master offset 1499600752 s2 adj -250000000 path delay 15263 ptp4l[28757]: master offset 1749590329 s2 adj -250000000 path delay 15263 ptp4l[28757]: master offset 1999616415 s2 adj -250000000 path delay 15287 ptp4l[28757]: master offset 2249605370 s2 adj -250000000 path delay 15281 ptp4l[28757]: master offset 2499629742 s2 adj -250000000 path delay 15285 ptp4l[28757]: master offset 2749620793 s2 adj -250000000 path delay 15291 --- This occurs sometimes after running a script that tests a bunch of ethtool commands. The result seems to be due to the -250m ppb adjustment never gets reset. However when I reperform the test, and instead of using the script before starting PTP, I force the offset to be a similar value by using testptp script to force clock adjustment, the restult is correct (it calculates the necessary large offset and then works fine) If I re-run the test again but force the 2nd machine to be slave, I get the reverse results: adj is +250000000 but the offset from master continually increases with no sign of converging. It seems like somehow the p/i servo code is incorrect because once we hit max adjust it never seems to move away from it. However that isn't the case when I run ptp4l without that script. I am still trying to determine what the script is doing to the ptp registers on the device (if anything) - Jake |
From: Stephan G. <st...@ga...> - 2012-05-30 19:52:55
|
Hi! > Oops. I added that code. My bad. Don't worry. :) We are more than happy that you built the whole ptp infrastructure into the kernel. > Please post the fix to netdev, and add the tag for stable like this: > > Cc:<st...@vg...> [affects 3.1 and newer] We will do that. Regards, Stephan |
From: Richard C. <ric...@gm...> - 2012-05-30 19:32:40
|
On Wed, May 30, 2012 at 06:20:32PM +0200, Stephan Gatzka wrote: > Hello Richard, > > I'm a colleague of Mario and we found the reason for the strange > data int the skb. > > >Wow, that looks really wrong. I don't see a MAC address anywhere. > > > > The problem lies in the file > linux/drivers/net/ethernet/freescale/fec_mpc52xx.c > > In the receive interrupt function mpc52xx_fec_rx_interrupt > the call to skb_defer_rx_timestamp is done using a freshly > allocated skb (called skb) and not with the received skb (called > rskb). Calling skb_defer_rx_timestamp with the correct parameter > engages the whole engine. Oops. I added that code. My bad. > We would be happy to make a patch for this fix. Do you think the > powerpc mailing list is more suitable than the linux-netdev? Please post the fix to netdev, and add the tag for stable like this: Cc: <st...@vg...> [affects 3.1 and newer] Thanks, Richard |