#1228 curl multi timeout does not work with timeout set less than 40ms

closed-fixed
None
5
2014-12-17
2013-05-22
Hang
No

In the latest source code lib/multi.c:

2178   now.tv_usec += 40000; /* compensate for bad precision timers that might've
2179                            triggered too early */
2180   if(now.tv_usec >= 1000000) {
2181     now.tv_sec++;
2182     now.tv_usec -= 1000000;
2183   }
2184 
2185   /*
2186    * The loop following here will go on as long as there are expire-times left
2187    * to process in the splay and 'data' will be re-assigned for every expired
2188    * handle we deal with.
2189    */
2190   do {
2191     /* the first loop lap 'data' can be NULL */
2192     if(data) {
2193       do
2194         result = multi_runsingle(multi, now, data->set.one_easy);
2195       while(CURLM_CALL_MULTI_PERFORM == result);

"now" has been artificially incremented by 40ms, and then passed in to the multi_runsingle(...) below.

Then, multi_runsingle() will check for timeouts:

964       timeout_ms = Curl_timeleft(data, &now,
965                                  (easy->state <= CURLM_STATE_WAITDO)?
966                                  TRUE:FALSE);

so, if the timeout setting is less than 40ms for the curl easy handle, the Curl_tvdiff(*nowp, data->progress.t_startsingle) will return 39ms at the very beginning of the curl easy request, and therefore timeout the request even before it gets started.

It will always prompt something like this (almost immediately after the first curl_multi_perform() call):
http://foo.com => (28) Connection timed out after 39 milliseconds

1 Attachments

Discussion

  • Daniel Stenberg

    Daniel Stenberg - 2013-05-23

    Thanks for your report! Do you have any suggestion what to do about this?

     
    • Hang

      Hang - 2013-05-23

      Daniel, Thanks for your prompt reply.
      First, I want to have a clear idea about the timeout mechanism. If I understand correctly, the extra 40ms added is used to make sure timeout takes place, even if the timer function returns prematurely (e.g. due to a bad precision timer). For example, if the precision timer is of poor accuracy, and returns 5ms prior to the timeout_ms set by the timer function, then by adding this 40ms, we can be sure timeout still takes place. Am I understanding it correctly?

      If so, I think it is a matter of granularity of timeout precision control. For modern hardware and Linux kernel, system calls like get_clocktime or even gettimeofday could yield precision better than 1ms. So, the least we can do is provide the user an option to customize this 40ms value, either as a macro (which can be overridden at build time), or as a curl easy option.

      Another question that I have: if we change this 40ms to something like 2ms (or even 1ms), and we indeed encounter a bad precision timer, which returns 5ms earlier, then does libcurl adjust its expectation correctly and set another timeout timer with the difference (by calling the timer function again)? Or does libcurl simply ignores the timeout event and does not adjust the timer?

      Thanks!

       
  • Daniel Stenberg

    Daniel Stenberg - 2013-05-23
    • assigned_to: Daniel Stenberg
     
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-03

    In general I think it is worth to consider how important it is to have a timeout resolution working correctly within 40 milliseconds. It is a VERY short period of time.

    I can agree to adding ifdef logic to lower the threshold for systems we think are likely to work, like Linux, but I don't think we should aim for perfect millisecond precision.

    Does it work for your case as expected if you set that value to like 5 milliseconds?

     
    • Hang

      Hang - 2013-06-03

      Our use case is to use libcurl for high performance ultra-low latency web service calling, with high qps. Our avg latency of the web service is only 2ms (including connection time), and we need to set a strict and aggressive timeout value to ensure the web service does not block the main process path for too long for each request, in case something goes wrong. So, we need to set the timeout to 5ms or 10ms.

      I agree that 40ms is very short for most of the regular use cases, but since libcurl is intended to be a general framework for different use cases, I think it'd be better to make this value configurable (either in Makefile, or as a curl easy option).

      It does not have to be "perfect" timeout precision, but it at least should not have a lower bound of a hard-coded value (which is coarse for some high-perf use cases).

      If I lower the 40ms hard-coded value to 5ms and recompile libcurl, it will work much better for my use case.

      A side question: if the timer is indeed faulty, and returns much earlier than expected, say 50ms earlier than the due time, will libcurl be able to set a new timer with the difference?

       
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-03

    This 40ms precaution was added in commit 2c72732ebf3da5e as a result of bad resolution in a windows function we use(d). Doing it unconditionally for all systems was just me trying to be careful, but I might also just have done more damage than fixing for other systems of course... I'm prepared to lower that by default (to just a few milli) and maintain 40ms for WIN32 systems until we get feedback from someone with such insights.

    The problematic case for which this was made, is when using the multi_socket API and libcurl has told the application about a timeout, and that timeout is what fires off a bit early. As we don't have any IDs associated with the timeout we can't tell which timeout that fired off but we only have the times to use to check what to do. If it fires off too early, we don't run the correct actions and we don't tell the application again about the same timeout as was already first in the queue.

    A possible fix could be what you're mentioning: if *socket_action() is called with TIMEOUT and nothing was found to work on, assume that happened due to a bad timer and call the application again to update/set the timeout value (which would then be for the same top node in the timeout tree).

    But it is much easier to just change the timeout granularity. Let's start there! What margin value do you suggest for your case?

     
  • Hang

    Hang - 2013-06-03

    I see. Thanks for sharing the background. I definitely agree with you that we can start with changing the timeout granularity first.

    So, first of all, we could add a #ifdef to make WIN32 a special case here.

    Then, for Linux and other platforms, I think we need to handle it in two different cases.
    First case, when user is not providing their own timer function, the lib is using its default, then we have some sort of "guarantee" or knowledge of the timer granularity (based on what timer the system has to provide). For example, if system has clock_gettime() then granularity could be pretty fine, and we could use some short timeout adjustment, such as 3ms or so (to be on the safe side, still).

    And the second case is when user is providing their own timer function. In this case, there is no way for the lib to tell the accuracy of their timer, hence the lib should not make any assumption. In this case, I'd suggest that we provide an additional curl easy option for the user to adjust the value based on their best knowledge of their timer. Of course, the curl easy option for this purpose could have some good all-rounder default, such as 5ms.

    What do you think?

     
  • Hang

    Hang - 2013-06-03

    Sorry, I take the first case back. It seems non-existing.

    So, for the second case (well, probably the only case), I think we should add a curl easy option with an appropriate default.

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-09

    Well, while a good intention I don't think providing a setopt for that is a very good idea. Users won't know what to set it to. I think we can start with doing an #ifdef-thing for different environments' conditions and we can start with only special-casing win32...

     
    • Hang

      Hang - 2013-06-09

      Sure. That works for me, as long as the default value is <= 5ms for non-Win32 platform :)

       
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-10
     
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-10

    Ok, here's my suggested patch for this issue. Hang, if you tell me your full name I'll also give you full credit for this (no need though).

     
  • Hang

    Hang - 2013-06-10

    Thanks, Daniel.
    I'm not sure whether there is a code review process (and how it works), but the patch looks decent to me. Thanks for fixing this so quickly.

    BTW, my name is Hang Su.

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-10
    • status: open --> closed-fixed
     
  • Daniel Stenberg

    Daniel Stenberg - 2013-06-10

    Thanks a lot for your report and help here. I think it is enough to have it documented here and I'll push it now to git to have it tested by others. Case closed!

     

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:





No, thanks