Menu

#23 tlc_getInterval() always returning 10ms

1.2.0.0
closed
1177 (1)
2014-02-27
2014-02-12
No

I have tracked down the source of the problem and in applying a fix, tracked down a secondary problem as well. I will try to explain both the primary and secondary issues as clearly as I can below. Also note that I have seen this problem while using PD messages. I haven't checked to see if the problem exists with MD messages.

Currently, tlc_getInterval() clears appHandle->nextJob with a call to vos_clearTime() just before calling trdp_pdCheckPending(). Using two "for" loops, trdp_pdCheckPending() iterates through first the receive queue and then the transmit queue for the purpose of setting appHandle->nextJob to the sooner of either the next expected PD message reception time or the next required PD message transmit time. Both loops contain the following code:

if(timerisset(&iterPD->interval) && / not PD PULL? /
timercmp(&iterPD->timeToGo, &appHandle->nextJob, <)) / earlier than current time-out?/
{
appHandle->nextJob = iterPD->timeToGo; / set new next time value from queue element /
}

The primary problem is that since appHandle->nextJob was previously cleared in tlc_getInterval(), the second condition using timercmp() will always return false. This results in appHandle->nextJob remaining cleared when returning to tlc_getInterval(). This in turn results in tlc_getInterval() executing the "else" part of the following code:

if (timerisset(&appHandle->nextJob) &&
timercmp(&now, &appHandle->nextJob, <))
{
vos_subTime(&appHandle->nextJob, &now);
pInterval = appHandle->nextJob;
}
else /
if unknown, set minimum poll time to 10ms /
{
pInterval->tv_sec = 0;
pInterval->tv_usec = TRDP_PROCESS_DEFAULT_CYCLE_TIME; /
Minimum poll time 10ms */
}

which is why tlc_getInterval() always returns 10ms.

I attempted to apply a fix by changing the if condition in both "for" loops of trdp_pdCheckPending() to the following:

if (timerisset(&iterPD->interval) && / has a time out value? /
( timercmp(&iterPD->timeToGo, &appHandle->nextJob, <) / earlier than current time-out? /
|| (appHandle->nextJob.tv_sec == 0 && appHandle->nextJob.tv_usec == 0) / nextJob not previously set /
)
)
{
appHandle->nextJob = iterPD->timeToGo; / set new next time value from queue element /
}

If this fix is applied only to the loop for the transmit queue, appHandle->nextJob is returned with a valid value for the next required transmit time and the proper value is then returned by tlc_getInterval(). However, if the fix is applied to the loop for the receive queue as well, we are back to tlc_getInterval() always returning 10ms. However, in this case the reason is different which leads to the secondary problem.

In addition to the messages I subscribed to, the receive queue also contains an entry for a PD message with ComId 31 which is a reserved TRDP message for retrieving TRDP statistical information. Since in my test setup, this message is never received, it is perpetually timed out and appHandle->nextJob ends up being set to a value in the past (i.e. before "now") which again results in tlc_getInterval() always returning 10ms.

For now I have applied my fix in my copy of the TRDP library to only the transmit queue so that my main thread calling tlc_process() does not wake up unnecessarily every 10ms. However, this secondary problem with the receive queue described should be evaluated from a design perspective since the issue may not only occur with the reserved ComId 31 message but with any subscribed message that has timed out, thereby resulting in, ironically, unnecessary extra CPU usage when messages are not being received.

One possible solution to this secondary problem may be to update the timeToGo field of timed out subscribed messages to the next timeout period instead of leaving it as a value perpetually in the past. Another possible solution would be to skip timed out messages in the receive queue instead of using them to update appHandle->nextJob and leave the responsibility with the application writer to provide a maximum sleep value should the value returned by tlc_getInterval() be very large. There may of course be other possible solutions that can be applied.

Related

Tickets: #23

Discussion

  • Armin-Hagen Weiss

    Michael, many thanks for the very detailed evaluation. Bernd will take care to update its code as soon as possible.

     
  • Armin-Hagen Weiss

    • status: open --> accepted
    • assigned_to: Bernd Löhr
     
  • Bernd Löhr

    Bernd Löhr - 2014-02-27

    Michael, many thanks for the thorough analysis of the problem from me, too!
    I will apply the fix you proposed, and will exclude further timing computation for timed-out packets. When using select(), reception of a late packet should lead to a premature return (and a call to tlc_process()) anyway.

     
    • Michael A. Antonelli

      Glad I could help and thanks for the feedback. :-)

      From: "Bernd Löhr" [mailto:bloehr@users.sf.net]
      Sent: Thursday, February 27, 2014 5:12 AM
      To: [tcnopen:tickets]
      Subject: [tcnopen:tickets] #23 tlc_getInterval() always returning 10ms

      Michael, many thanks for the thorough analysis of the problem from me, too!
      I will apply the fix you proposed, and will exclude further timing computation for timed-out packets. When using select(), reception of a late packet should lead to a premature return (and a call to tlc_process()) anyway.


      [tickets:#23]http://sourceforge.net/p/tcnopen/tickets/23/ tlc_getInterval() always returning 10ms

      Status: accepted
      Created: Wed Feb 12, 2014 04:26 PM UTC by Michael A. Antonelli
      Last Updated: Thu Feb 13, 2014 07:25 AM UTC
      Owner: Bernd Löhr

      I have tracked down the source of the problem and in applying a fix, tracked down a secondary problem as well. I will try to explain both the primary and secondary issues as clearly as I can below. Also note that I have seen this problem while using PD messages. I haven't checked to see if the problem exists with MD messages.

      Currently, tlc_getInterval() clears appHandle->nextJob with a call to vos_clearTime() just before calling trdp_pdCheckPending(). Using two "for" loops, trdp_pdCheckPending() iterates through first the receive queue and then the transmit queue for the purpose of setting appHandle->nextJob to the sooner of either the next expected PD message reception time or the next required PD message transmit time. Both loops contain the following code:

      if(timerisset(&iterPD->interval) && / not PD PULL? /
      timercmp(&iterPD->timeToGo, &appHandle->nextJob, <)) / earlier than current time-out?/
      {
      appHandle->nextJob = iterPD->timeToGo; / set new next time value from queue element /
      }

      The primary problem is that since appHandle->nextJob was previously cleared in tlc_getInterval(), the second condition using timercmp() will always return false. This results in appHandle->nextJob remaining cleared when returning to tlc_getInterval(). This in turn results in tlc_getInterval() executing the "else" part of the following code:

      if (timerisset(&appHandle->nextJob) &&
      timercmp(&now, &appHandle->nextJob, <))
      {
      vos_subTime(&appHandle->nextJob, &now);
      pInterval = appHandle->nextJob;
      }
      else / if unknown, set minimum poll time to 10ms /
      {
      pInterval->tv_sec = 0;
      pInterval->tv_usec = TRDP_PROCESS_DEFAULT_CYCLE_TIME; / Minimum poll time 10ms */
      }

      which is why tlc_getInterval() always returns 10ms.

      I attempted to apply a fix by changing the if condition in both "for" loops of trdp_pdCheckPending() to the following:

      if (timerisset(&iterPD->interval) && / has a time out value? /
      ( timercmp(&iterPD->timeToGo, &appHandle->nextJob, <) / earlier than current time-out? /
      || (appHandle->nextJob.tv_sec == 0 && appHandle->nextJob.tv_usec == 0) / nextJob not previously set /
      )
      )
      {
      appHandle->nextJob = iterPD->timeToGo; / set new next time value from queue element /
      }

      If this fix is applied only to the loop for the transmit queue, appHandle->nextJob is returned with a valid value for the next required transmit time and the proper value is then returned by tlc_getInterval(). However, if the fix is applied to the loop for the receive queue as well, we are back to tlc_getInterval() always returning 10ms. However, in this case the reason is different which leads to the secondary problem.

      In addition to the messages I subscribed to, the receive queue also contains an entry for a PD message with ComId 31 which is a reserved TRDP message for retrieving TRDP statistical information. Since in my test setup, this message is never received, it is perpetually timed out and appHandle->nextJob ends up being set to a value in the past (i.e. before "now") which again results in tlc_getInterval() always returning 10ms.

      For now I have applied my fix in my copy of the TRDP library to only the transmit queue so that my main thread calling tlc_process() does not wake up unnecessarily every 10ms. However, this secondary problem with the receive queue described should be evaluated from a design perspective since the issue may not only occur with the reserved ComId 31 message but with any subscribed message that has timed out, thereby resulting in, ironically, unnecessary extra CPU usage when messages are not being received.

      One possible solution to this secondary problem may be to update the timeToGo field of timed out subscribed messages to the next timeout period instead of leaving it as a value perpetually in the past. Another possible solution would be to skip timed out messages in the receive queue instead of using them to update appHandle->nextJob and leave the responsibility with the application writer to provide a maximum sleep value should the value returned by tlc_getInterval() be very large. There may of course be other possible solutions that can be applied.


      Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/tcnopen/tickets/23/

      To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/

      This email and any attachments are only for use by the intended recipient(s) and may contain legally privileged, confidential, proprietary or otherwise private information. Any unauthorized use, reproduction, dissemination, distribution or other disclosure of the contents of this e-mail or its attachments is strictly prohibited. If you have received this email in error, please notify the sender immediately and delete the original.

       

      Related

      Tickets: #23

  • Armin-Hagen Weiss

    I will be out of the office starting 26.02.2014 and will not return until
    28.02.2014.

    I'll reply as fast as possible.

     
  • Bernd Löhr

    Bernd Löhr - 2014-02-27
    • status: accepted --> closed
     
  • Bernd Löhr

    Bernd Löhr - 2014-02-27

    check for nextJob == zero,
    major work in trdp_pdCheckPending(),
    if there are no pending telegrams (or all timed out), 10ms are returned. If the load for the application is too high, the app can increase the value or call select() with NULL pointer to block.

     
  • Armin-Hagen Weiss

    • labels: --> 1177
     

Log in to post a comment.

MongoDB Logo MongoDB