
#1420 Pipelining + client-side timeouts lead to loss of synchronization with response stream and incorrect data delivery

Status: closed-out-of-date
Labels: pipelining (3)
Priority: 5
Updated: 2015-02-20
Created: 2014-09-02
Private: No

I've been tracking a data corruption/missing http status problem and I think I have enough data for a useful bug report.

The problem centers around the handling of queued requests in a pipeline when preceding requests are failed in libcurl after committing to a request/response transaction. In the table below, I show six GET requests pipelined on one connection. 'Time' is relative seconds since the creation of the connection. The first three requests are processed normally. The fourth request times out while processing the response body. The fifth request times out waiting for the response header. The sixth request is allowed to proceed but appears to be out-of-sync with the response stream. I haven't dumped the data in verbose mode but I'd guess that the sixth request is consuming the remainder of the fourth request's response body in some demented fashion.

Request  Time  Event
0        0     HEADEROUT issued
0        1     First HEADERIN data
0        13    Request completed, 200 status
1        0     HEADEROUT issued
1        13    First HEADERIN data
1        15    Request completed, 200 status
2        0     HEADEROUT issued
2        15    First HEADERIN data
2        20    Request completed, 200 status
3        0     HEADEROUT issued
3        20    First HEADERIN data
3        30    Timeout declared (easy error 28)
3        30    Request failed, easy 28 status
4        0     HEADEROUT issued
4        30    Timeout declared (easy error 28), no HEADERIN data seen
4        30    Request failed, easy 28 status
5        13    HEADEROUT issued
5        30    First DATAIN received, NO HEADERIN data for this request
5        43    Connection closed. This may be in response to server socket close.
5        43    Request appears to succeed but CURLINFO_RESPONSE_CODE returns 0.

The sixth request appears to succeed as far as multi_perform and multi_info_read are concerned. But fetching CURLINFO_RESPONSE_CODE returns 0 for status.

As a workaround, checking the status as above appears to be useful. I'm not certain that's 100% reliable or that the connection will be broken immediately at that point. This has the potential of going very wrong as far as data integrity goes.

If I understand this correctly, solutions would include:

  • Canceling/failing a request that's active on a pipeline fails all requests farther down the pipeline.
  • Canceling/failing a request 'passivates' it. It no longer interacts with the caller but remains active, sinking data from the response until the response is satisfied.

Discussion

  • Monty Brandenberg

    I wonder if there's a non-pipelined scenario where this is a problem as well: request timeouts on keep-alive connections.

     
  • Daniel Stenberg

    Daniel Stenberg - 2014-09-04

    In the non-pipelined case, stopping a single request has no impact on the following ones.

     
    • Monty Brandenberg

      Is that because the connection is closed or because canceling the single request also sinks all response data when the cancel is performed after the request is sent?

       
  • Daniel Stenberg

    Daniel Stenberg - 2014-09-04

    If there's outstanding data in any direction when the transfer is terminated, the connection will be closed. I think that's what "bleeds" over into what it does even when pipelining is activated.

     
  • Monty Brandenberg

    What I've been seeing disagrees with the above description so I dug in with the debugger to investigate. There are two locations where timeouts are detected:

    1. At the top of multi_runsingle() which handles all 'mstate' modes of the request handle, and
    2. In Curl_readwrite() (called by multi_runsingle()) which implicitly supports only the PERFORM state.

    The implementation of the two differs greatly. Detector 1. seems to be the most likely to fire and it is the one that generates the issues I see. Differences:

    • Detector 1. marks 'bits.close' using connclose() and sets mstate to COMPLETED without doing cleanup or disconnecting. 2. does a more complete cleanup signalling all requests bound to the connection.
    • Once bits.close is set, the connection isn't really closed. The pipeline code continues to consider the connection when trying to move pending requests to the send queue. But after selecting the connection as the best fit, the binding fails with a 'We can reuse, but we want a new connection anyway' message. This should still work, but a connection that might have performed pipelining correctly will be passed over. Requests already bound to the connection can ask for data and will be given stale data from responses already committed to the wire.
    • Connection disconnect appears not to happen under client direction. Instead, it looks like Curl_disconnect() gets called when the server eventually closes the socket and libcurl reacts to the EOF. This can happen several requests after the timed out request.

    The state and connection management code looks a bit... sensitive. But some changes that look in order:

    • Single timeout detector or at least common code extracted into a function.
    • Logic for responding to a timed out request is going to be dependent upon 'mstate'. Rather than collapse all states into COMPLETED on timeout, mapping 'mstate' to one of a few states (pending, post-pending, post-completion) and then handling the timeout in the case statement might be better.
    • Once bits.close is set on the connection, Curl_readwrite() will continue to deliver data to any caller. Is this correct or should this become an error condition with some defensive code?
    • Curl_posttransfer() seems to be missing on some paths to Curl_done(). Haven't done a thorough survey, just a casual observation.
    • I'm not entirely sure the non-pipelined case is unaffected by this. The semi-zombie state of the connection might permit some part of a later request reusing the connection with stale data in the socket.

    I'm generally able to detect this problem by examining the HTTP status of a completed request. If data is delivered to the wrong request, I will get a '0' status back because the HTTP status line was never read as a header, only as data. But if a timeout is detected at just the wrong time, an HTTP status line for a preceding request could be fed to a later one leading to data corruption.

    I'm going to work on moving the signalling logic to the top of multi_runsingle() and preventing the consideration of connections for pipelining that have 'bits.close' set. That will probably be my local fix for 7.38.0.

     
  • Monty Brandenberg

    An example of what I'm investigating for a workaround:

    diff -r 4c25fb239336 curl/lib/multi.c
    --- a/curl/lib/multi.c  Wed Sep 10 18:59:12 2014 +0000
    +++ b/curl/lib/multi.c  Thu Sep 18 17:57:47 2014 -0400
    @@ -1019,6 +1019,12 @@
                doesn't work at this point). */
             connclose(data->easy_conn, "Disconnected with pending data");
             data->result = CURLE_OPERATION_TIMEDOUT;
    +        // Linden 1420
    +        if(data->easy_conn) {
    +          infof(data, "Request timed out.  Signal all requests on connection.\n");
    +          Curl_posttransfer(data);
    +          Curl_done(&data->easy_conn, data->result, FALSE);
    +        }
             multistate(data, CURLM_STATE_COMPLETED);
             break;
           }
    diff -r 4c25fb239336 curl/lib/url.c
    --- a/curl/lib/url.c    Wed Sep 10 18:59:12 2014 +0000
    +++ b/curl/lib/url.c    Thu Sep 18 17:57:47 2014 -0400
    @@ -2696,6 +2696,7 @@
                                      const struct connectdata *conn)
     {
       if((conn->handler->protocol & PROTO_FAMILY_HTTP) &&
    +     !conn->bits.close &&                                                // Linden 1420
          Curl_multi_pipeline_enabled(handle->multi) &&
          (handle->set.httpreq == HTTPREQ_GET ||
           handle->set.httpreq == HTTPREQ_HEAD) &&
    
     
  • Monty Brandenberg

    Just keeping people apprised of my work on this. For the second time, HTTP state
    internals defeated me. The 'bits.close' state does quite a bit of unexpected
    flipping between 'on' and 'off' states during request processing. It ended up
    being 'on' too often during a check for connection compatibility with pipelining.
    This resulted in pipelining mostly being inhibited.

    So, I added a bit whose semantics I control. It's used for only one purpose:
    preventing requests being assigned to a connection once that connection is
    known to be unusable (poisoned). A better solution compatible with the HTTP
    state plumbing will have to come from devs more familiar with it.

    diff -r -U5 curl-7.38.0/lib/multi.c curl/lib/multi.c
    --- curl-7.38.0/lib/multi.c     2014-09-07 17:50:41.000000000 -0400
    +++ curl/lib/multi.c    2014-10-10 10:52:34.631014500 -0400
    @@ -1017,10 +1017,17 @@
             /* Force the connection closed because the server could continue to
                send us stuff at any time. (The disconnect_conn logic used below
                doesn't work at this point). */
             connclose(data->easy_conn, "Disconnected with pending data");
             data->result = CURLE_OPERATION_TIMEDOUT;
    +        /* Linden 1420 */
    +        if(data->easy_conn) {
    +          infof(data, "Request timed out.  Signal all requests on connection.\n");
    +          data->easy_conn->bits.poisoned_1420 = TRUE;
    +          Curl_posttransfer(data);
    +          Curl_done(&data->easy_conn, data->result, FALSE);
    +        }
             multistate(data, CURLM_STATE_COMPLETED);
             break;
           }
         }
    
    diff -r -U5 curl-7.38.0/lib/url.c curl/lib/url.c
    --- curl-7.38.0/lib/url.c       2014-09-07 17:50:41.000000000 -0400
    +++ curl/lib/url.c      2014-10-10 10:53:54.744754200 -0400
    @@ -2694,10 +2694,11 @@
    
     static bool IsPipeliningPossible(const struct SessionHandle *handle,
                                      const struct connectdata *conn)
     {
       if((conn->handler->protocol & PROTO_FAMILY_HTTP) &&
    +     !conn->bits.poisoned_1420 &&                                 /* Linden 1420 */
          Curl_multi_pipeline_enabled(handle->multi) &&
          (handle->set.httpreq == HTTPREQ_GET ||
           handle->set.httpreq == HTTPREQ_HEAD) &&
          handle->set.httpversion != CURL_HTTP_VERSION_1_0)
         return TRUE;
    @@ -3647,11 +3648,12 @@
     #endif /* CURL_DISABLE_PROXY */
    
       conn->bits.user_passwd = (NULL != data->set.str[STRING_USERNAME])?TRUE:FALSE;
       conn->bits.ftp_use_epsv = data->set.ftp_use_epsv;
       conn->bits.ftp_use_eprt = data->set.ftp_use_eprt;
    -
    +  conn->bits.poisoned_1420 = FALSE;                     /* Linden 1420 */
    +  
       conn->verifypeer = data->set.ssl.verifypeer;
       conn->verifyhost = data->set.ssl.verifyhost;
    
       conn->ip_version = data->set.ipver;
    
    diff -r -U5 curl-7.38.0/lib/urldata.h curl/lib/urldata.h
    --- curl-7.38.0/lib/urldata.h   2014-09-03 09:22:36.000000000 -0400
    +++ curl/lib/urldata.h  2014-10-10 10:57:37.745145200 -0400
    @@ -543,10 +543,12 @@
                                     connection in a CONNECT request with auth, so
                                     that libcurl should reconnect and continue. */
       bool bound; /* set true if bind() has already been done on this socket/
                      connection */
       bool type_set;  /* type= was used in the URL */
    +  bool poisoned_1420; /* Linden 1420 - Connection became unusable and must not
    +                         be considered for pipelining. */
     };
    
     struct hostname {
       char *rawalloc; /* allocated "raw" version of the name */
       char *encalloc; /* allocated IDN-encoded version of the name */
    
     
  • Carlo Wood

    Carlo Wood - 2014-10-28

    I did extensive testing of http pipelining with respect to timeouts.
    For this I wrote a testsuite that you can find here: https://github.com/CarloWood/curl_http_pipeline_tester

    Running the testsuite (VERBOSE off) on vanilla libcurl
    (5b12a4b4e404e5b5a75d3509cce21553edc9a974) I get the following
    output:

    2014-10-28 18:48:00.944097: : Request #0 added [now running: 1]
    <html><body>Reply 1 on connection 1 for request #0</body></html>
    2014-10-28 18:48:00.944979: : Request #0 finished [now running: 0]
    2014-10-28 18:48:00.945011: : Request #1 added [now running: 1]
    2014-10-28 18:48:00.945022: : Request #2 added [now running: 2]
    2014-10-28 18:48:00.945032: : Request #3 added [now running: 3]
    2014-10-28 18:48:00.945058: : Request #4 added [now running: 4]
    <... 1.001327 seconds ...>
    2014-10-28 18:48:01.946385: : Request #1 TIMED OUT! [now running: 3]
    2014-10-28 18:48:01.946506: : Request #2 TIMED OUT! [now running: 2]
    2014-10-28 18:48:01.946561: : Request #4 TIMED OUT! [now running: 1]
    2014-10-28 18:48:01.946575: : Request #5 added [now running: 2]
    2014-10-28 18:48:01.946585: : Request #6 added [now running: 3]
    2014-10-28 18:48:01.946607: : Request #7 added [now running: 4]
    <html><body>Reply 2 on connection 1 for request #1</body></html>
    <... 0.099109 seconds ...>
    2014-10-28 18:48:02.045716: : Request #3 finished [now running: 3]
    2014-10-28 18:48:02.045758: : Request #8 added [now running: 4]
    <html><body>Reply 1 on connection 2 for request #5</body></html>
    <html><body>Reply 1 on connection 3 for request #6</body></html>
    <html><body>Reply 1 on connection 4 for request #7</body></html>
    <html><body>Reply 1 on connection 5 for request #8</body></html>
    <... 0.100185 seconds ...>
    2014-10-28 18:48:02.145943: : Request #5 finished [now running: 3]
    2014-10-28 18:48:02.145979: : Request #6 finished [now running: 2]
    2014-10-28 18:48:02.145991: : Request #7 finished [now running: 1]
    2014-10-28 18:48:02.146004: : Request #8 finished [now running: 0]
    2014-10-28 18:48:02.146017: : Request #9 added [now running: 1]
    <html><body>Reply 2 on connection 2 for request #9</body></html>
    <... 0.100499 seconds ...>
    2014-10-28 18:48:02.246516: : Request #9 finished [now running: 0]

    This shows the data corruption that Monty was talking about, since we still get the output "<html><body>Reply 2 on connection 1 for request #1</body></html>" for the timed-out request #1. Also, in total 5 connections have been used instead of 2 (not 1: we need to create a new connection when the pipe breaks due to the timeout of #1). Requests #5, #6, #7 and #8 each cause a new connection because curl forgot that the server can do pipelining. After receiving the reply on connection 2 for #5 it knows that this server does pipelining again, so request #9 uses connection 2 too.

    After applying Monty's patch, I get this result:

    2014-10-28 18:57:48.487717: : Request #0 added [now running: 1]
    <html><body>Reply 1 on connection 1 for request #0</body></html>
    2014-10-28 18:57:48.488689: : Request #0 finished [now running: 0]
    2014-10-28 18:57:48.488720: : Request #1 added [now running: 1]
    2014-10-28 18:57:48.488740: : Request #2 added [now running: 2]
    2014-10-28 18:57:48.488750: : Request #3 added [now running: 3]
    2014-10-28 18:57:48.488760: : Request #4 added [now running: 4]
    <... 1.001437 seconds ...>
    2014-10-28 18:57:49.490197: : Request #1 TIMED OUT! [now running: 3]
    2014-10-28 18:57:49.490252: : Request #2 TIMED OUT! [now running: 2]
    2014-10-28 18:57:49.490266: : Request #4 TIMED OUT! [now running: 1]
    2014-10-28 18:57:49.490281: : Request #5 added [now running: 2]
    2014-10-28 18:57:49.490292: : Request #6 added [now running: 3]
    2014-10-28 18:57:49.490302: : Request #7 added [now running: 4]
    <html><body>Reply 1 on connection 3 for request #3</body></html>
    <html><body>Reply 1 on connection 5 for request #5</body></html>
    <... 0.100334 seconds ...>
    2014-10-28 18:57:49.590636: : Request #3 finished [now running: 3]
    2014-10-28 18:57:49.590673: : Request #5 finished [now running: 2]
    2014-10-28 18:57:49.590688: : Request #8 added [now running: 3]
    2014-10-28 18:57:49.590700: : Request #9 added [now running: 4]
    <html><body>Reply 1 on connection 6 for request #6</body></html>
    2014-10-28 18:57:49.590815: : Request #6 finished [now running: 3]
    <html><body>Reply 1 on connection 7 for request #7</body></html>
    2014-10-28 18:57:49.590855: : Request #7 finished [now running: 2]
    <html><body>Reply 2 on connection 3 for request #8</body></html>
    <html><body>Reply 2 on connection 5 for request #9</body></html>
    <... 0.100399 seconds ...>
    2014-10-28 18:57:49.691254: : Request #8 finished [now running: 1]
    2014-10-28 18:57:49.691289: : Request #9 finished [now running: 0]

    As you can see, we now even made 7 connections in total. The reason for
    this is that although the timed-out #2 and #4 now attempt a new
    connection, they time out immediately afterwards and also close those
    connections as if the pipe broke (again)...

    We end up as before with 4 connections: 3, 5, 6 and 7.

     

    Last edit: Carlo Wood 2014-10-28
  • Carlo Wood

    Carlo Wood - 2014-10-28

    I fixed all http pipelining bugs that I could find with my testsuite.
    You can find/get the commits from https://github.com/CarloWood/curl

    The testsuite now gives this as output:

    2014-10-28 19:07:28.399012: : Request #0 added [now running: 1]
    <html><body>Reply 1 on connection 1 for request #0</body></html>
    2014-10-28 19:07:28.399972: : Request #0 finished [now running: 0]
    2014-10-28 19:07:28.400096: : Request #1 added [now running: 1]
    2014-10-28 19:07:28.400111: : Request #2 added [now running: 2]
    2014-10-28 19:07:28.400134: : Request #3 added [now running: 3]
    2014-10-28 19:07:28.400145: : Request #4 added [now running: 4]
    <... 1.000572 seconds ...>
    2014-10-28 19:07:29.400717: : Request #1 TIMED OUT! [now running: 3]
    2014-10-28 19:07:29.400759: : Request #2 TIMED OUT! [now running: 2]
    2014-10-28 19:07:29.400773: : Request #4 TIMED OUT! [now running: 1]
    2014-10-28 19:07:29.400786: : Request #5 added [now running: 2]
    2014-10-28 19:07:29.400797: : Request #6 added [now running: 3]
    2014-10-28 19:07:29.400808: : Request #7 added [now running: 4]
    <html><body>Reply 1 on connection 2 for request #3</body></html>
    <html><body>Reply 2 on connection 2 for request #5</body></html>
    <html><body>Reply 3 on connection 2 for request #6</body></html>
    <html><body>Reply 4 on connection 2 for request #7</body></html>
    <... 0.100344 seconds ...>
    2014-10-28 19:07:29.501152: : Request #3 finished [now running: 3]
    2014-10-28 19:07:29.501188: : Request #5 finished [now running: 2]
    2014-10-28 19:07:29.501201: : Request #6 finished [now running: 1]
    2014-10-28 19:07:29.501213: : Request #7 finished [now running: 0]
    2014-10-28 19:07:29.501227: : Request #8 added [now running: 1]
    2014-10-28 19:07:29.501237: : Request #9 added [now running: 2]
    <html><body>Reply 5 on connection 2 for request #8</body></html>
    <... 0.100530 seconds ...>
    2014-10-28 19:07:29.601767: : Request #8 finished [now running: 1]
    <html><body>Reply 6 on connection 2 for request #9</body></html>
    <... 0.037036 seconds ...>
    2014-10-28 19:07:29.638803: : Request #9 finished [now running: 0]

    After #1 times out, the connection is marked for closing (and it is actually closed, unlike in the previous two cases); a new connection is created (connection 2), which is used for request #3 that was in the pipe but hadn't timed out yet, as well as for all newly added requests.

    In total 2 connections are made, and only 1 connection exists at any time.

     
  • Carlo Wood

    Carlo Wood - 2014-10-29

    There was still a problem left when I tested this in practice: during my tests I was running the server on localhost, so connects were immediate. That caused me to miss the fact that multiple connections were STILL being made during the time between connect and connection established.

    Pushed a commit to the curl-7_38_0_pipelinefix branch that fixes this
    (Don't make many connections when pipelining after a timeout.)

    https://github.com/CarloWood/curl/commits/curl-7_38_0_pipelinefix

     
  • Carlo Wood

    Carlo Wood - 2014-10-30

    Hmm, if life was only this simple...

    0x7fffe0a9d700 CURLIO 0x7fffc9173c80 * Connection cache is full, closing the oldest one.
    0x7fffe0a9d700 CURLIO : 0x7fffc9173c80 Connection closed (#4); 0 connections remaining [0x7fffc8015110].
    0x7fffe0a9d700 CURLIO 0x7fffc9173c80 * Closing connection 4
    0x7fffe0a9d700 CURLIO 0x7fffc9173c80 * Removed connection 0x7fffd8484e18 to bundle at 0x7fffd849acb8, 0 connections left.
    0x7fffe0a9d700 CURLIO 0x7fffc9173c80 * Destroying bundle at 0x7fffd849acb8
    0x7fffe0a9d700 CURLIO 0x7fffc9173c80 * The cache now contains 9 members
    ...
    0x7fffe0a9d700 CURLIO 0x7fffc916d0d0 * Created a new bundle at 0x7fffd876fd58
    0x7fffe0a9d700 CURLIO 0x7fffc916d0d0 * Added connection 0x7fffd8484e18 to bundle at 0x7fffd876fd58, now 1 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc916d0d0 * Added connection 19. The cache now contains 10 members
    ...
    and there we go again
    0x7fffe0a9d700 CURLIO 0x7fffc9073d90 * Server doesn't support pipelining
    ...
    0x7fffe0a9d700 CURLIO 0x7fffc916d0d0 * Added connection 0x7fffd8484e18 to bundle at 0x7fffd876fd58, now 1 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc9073d90 * Added connection 0x7fffd95f1848 to bundle at 0x7fffd876fd58, now 2 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc90574c0 * Added connection 0x7fffd959c2d8 to bundle at 0x7fffd876fd58, now 3 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc89ec6a0 * Added connection 0x7fffd95ef888 to bundle at 0x7fffd876fd58, now 4 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc89e9c10 * Added connection 0x7fffd93b2588 to bundle at 0x7fffd876fd58, now 5 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc8f634b0 * Added connection 0x7fffd9786a38 to bundle at 0x7fffd876fd58, now 6 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc8c703b0 * Added connection 0x7fffd93b1a68 to bundle at 0x7fffd876fd58, now 7 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc9042600 * Added connection 0x7fffd8f5c558 to bundle at 0x7fffd876fd58, now 8 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc916d850 * Added connection 0x7fffd8f5cc28 to bundle at 0x7fffd876fd58, now 9 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc903d4b0 * Added connection 0x7fffd8f46b18 to bundle at 0x7fffd876fd58, now 10 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc92629e0 * Added connection 0x7fffd9733428 to bundle at 0x7fffd876fd58, now 11 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc8cd4020 * Added connection 0x7fffd9733af8 to bundle at 0x7fffd876fd58, now 12 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc9174320 * Added connection 0x7fffd9615e78 to bundle at 0x7fffd876fd58, now 13 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc916c7c0 * Added connection 0x7fffd9616548 to bundle at 0x7fffd876fd58, now 14 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc91768f0 * Added connection 0x7fffd9616c18 to bundle at 0x7fffd876fd58, now 15 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc89e67e0 * Added connection 0x7fffd96172e8 to bundle at 0x7fffd876fd58, now 16 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc916e610 * Added connection 0x7fffd96179b8 to bundle at 0x7fffd876fd58, now 17 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc917d7b0 * Added connection 0x7fffd9599b38 to bundle at 0x7fffd876fd58, now 18 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc91b1930 * Added connection 0x7fffd959a208 to bundle at 0x7fffd876fd58, now 19 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc9186ac0 * Added connection 0x7fffd975e328 to bundle at 0x7fffd876fd58, now 20 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc813ee50 * Added connection 0x7fffd959a8d8 to bundle at 0x7fffd876fd58, now 21 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc9283c90 * Added connection 0x7fffd959afa8 to bundle at 0x7fffd876fd58, now 22 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc8e1ede0 * Added connection 0x7fffd959b678 to bundle at 0x7fffd876fd58, now 23 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc813fde0 * Added connection 0x7fffd8f59428 to bundle at 0x7fffd876fd58, now 24 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc916cf20 * Added connection 0x7fffd8f59af8 to bundle at 0x7fffd876fd58, now 25 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc8f62d00 * Added connection 0x7fffd8f5a1c8 to bundle at 0x7fffd876fd58, now 26 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc9282710 * Added connection 0x7fffd8f5a898 to bundle at 0x7fffd876fd58, now 27 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc8e1d8d0 * Added connection 0x7fffd8f5af68 to bundle at 0x7fffd876fd58, now 28 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc8abb270 * Added connection 0x7fffd8f5b638 to bundle at 0x7fffd876fd58, now 29 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc8c2c3e0 * Added connection 0x7fffd8f5bd08 to bundle at 0x7fffd876fd58, now 30 connections.
    0x7fffe0a9d700 CURLIO 0x7fffc91a44f0 * Added connection 0x7fffd9583f88 to bundle at 0x7fffd876fd58, now 31 connections.

     
  • Daniel Stenberg

    Daniel Stenberg - 2014-10-30

    Thanks Carlo, but as Steve pointed out on the mailing list the other day, we much prefer having patches posted to the mailing list for review and discussions: http://curl.haxx.se/dev/contribute.html#How_to_get_your_changes_into_the

    Can you please post them there?

     
    • Carlo Wood

      Carlo Wood - 2014-10-30

      Hi Daniel,

      I'm still not satisfied with the result.

      The objective is to have only a single connection. (I know that
      you can set a limit on the number of simultaneous requests in the
      pipeline, and when THAT limit is reached then of course a new
      connection should be created; but for now let's assume that limit
      is not reached; in that case we want just one connection to be
      active -- the whole point of doing pipelining.)

      If a bundle has only one connection, then the normal reaction of
      libcurl is to destroy the bundle when that connection is closed.

      The reasons why a connection can be closed are:
      1) One (or more) requests timeout.
      2) The server closes the connection (connection reset by peer).
      3) The connection cache runs full and the pipeline connection
      happens to be the oldest idle connection.

      Perhaps there are more reasons that I didn't run into yet.

      My first commits solved 1), and a commit that I didn't push yet
      should fix 2). Then I ran into 3)... and I'm starting to think
      that the attempt to keep the bundle alive each of the (special)
      cases is not the right approach.

      A better approach is to look at it from the other side: when a
      server connection is made and it is not yet known whether that
      connection supports pipelining, there are the following
      possibilities:

      1) Libcurl creates a new connection for every new request (this
      is the current behavior).
      2) Libcurl creates a single connection, and queues all other
      requests until it has received all headers for that first request
      and decided whether the server supports pipelining.
      If it does, it uses the same connection for the queued
      requests. If it does not, then it creates new connections
      for each request as under 1).
      3) Libcurl assumes every connection can do pipelining and just
      sends all requests over the first connection, until it finds
      out that this server can't do pipelining - in which case the
      pipe 'breaks' and already existing code will cause all requests
      to be retried - now knowing that the server does not support
      pipelining.
      4) Libcurl doesn't wait for headers but uses information from
      the user to decide if the server is supposed to support
      http pipelining; meaning that it picks strategy 2) or 3) based
      on a flag set on the easy handle by the user.

      I think that option 4) is by far the best experience for the user;
      however, it is the hardest to implement because it requires
      implementing both 2) and 3) as well as adding the code for a new
      flag. The easiest solution is probably to always pick 3), but that
      is almost unacceptably 'dirty' for those connections that do not
      support pipelining. My current approach (in the user application)
      is to add only a single request and let that finish, so the
      application itself can detect whether pipelining is supported and,
      when not, add that site to the blacklist. Done that way, option 3)
      would not lead to 'dirty' failures, but I can hardly say that seems
      like a good general solution; it would be far more favorable to add
      a flag for easy handles. Hence, if the choice is not to add
      a new flag to easy handles, so the user can specify preferences
      per easy handle (as opposed to being forced to use a
      dedicated multi handle for pipelining), then option 2) seems the
      only reasonable choice.

      Note that just option 2) (instead of 4)) isn't that bad: if
      pipelining is supported and the first request would cause a stall
      of, say, 10 seconds, then all requests added behind it in the
      pipeline would be stalled in that case too ("queued" on the server,
      instead of client-side in the viewer). The main difference is that
      if pipelining is NOT supported you don't create a lot of parallel
      connections right away, and hence there is an extra delay more or
      less equal to the response time of the first request (typically
      less than a second) for the additional requests; this subsequently
      occurs every time that libcurl destroys the bundle structure (i.e.
      when ALL connections are closed). I don't see that as something
      bad: if all connections are closed then apparently we're done with
      downloading the bulk; getting an extra sub-second delay on the next
      burst seems very acceptable to me. Of course we only need to do
      this when the multi handle is flagged to support pipelining in the
      first place. The benefit is far larger!

       
  • Daniel Stenberg

    Daniel Stenberg - 2014-10-30

    Carlo, you're talking about further improvements and those are not at all related to the bug report you comment on here.

    Development and design discussions here will not reach the proper audience. Please take that to the curl-library list, thanks!

     
  • Daniel Stenberg

    Daniel Stenberg - 2014-12-09

    Is this still a problem after the recent pipelining fixes?

     
    • Monty Brandenberg

      Haven't had an opportunity to look over the fixes or give them a try. Everything is in github at this point?

       
      • Daniel Stenberg

        Daniel Stenberg - 2014-12-10

        Let's put it like this: we did quite some pipelining polishing and that is now in the master branch on github. And I believe this particular bug is fixed in there.

        That's not the end of the pipeline fixes though (nor the end of identified pipelining bugs). Carlo has pending patches in his branch but has had to slow down his work for now due to medical reasons so it isn't clear right now how the path forward from here goes or at what speed.

         
  • Daniel Stenberg

    Daniel Stenberg - 2014-12-09
    • status: open --> pending-needsinfo
    • assigned_to: Daniel Stenberg
     
  • Daniel Stenberg

    Daniel Stenberg - 2015-02-20

    Closing this issue now. If the problems remain, we either reopen this or we create a new bug with more and new info.

     
  • Daniel Stenberg

    Daniel Stenberg - 2015-02-20
    • status: pending-needsinfo --> closed-out-of-date