#1315 curl-7.34.0: hangs after hitting IPv6 address with no IPv6 available

closed-fixed
None
5
2013-12-28
2013-12-24
No

This was hit on a gentoo system and the downstream report can be seen at

https://bugs.gentoo.org/show_bug.cgi?id=495170

In brief, if curl 7.34.0 tries an IPv6 address when IPv6 connectivity isn't available, it falls into a loop and eats 100% CPU. This issue is not reproducible on 7.33.0.

Here we used c-ares-1.9.1, libidn-1.28, openssl-1.0.1e and zlib-1.2.8.

Related

Bugs: #1317

Discussion

  • Bjorn Stenberg

    Bjorn Stenberg - 2013-12-24

    I am unable to reproduce this:

    ~/src/curl$ src/curl --version
    curl 7.34.0-DEV (x86_64-unknown-linux-gnu) libcurl/7.34.0-DEV OpenSSL/1.0.1e zlib/1.2.8 c-ares/1.10.0 libidn/1.28
    Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smtp smtps telnet tftp 
    Features: AsynchDNS Debug TrackMemory IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP
    
    ~/src/curl$ src/curl -v bad14.haxx.se
    * STATE: INIT => CONNECT handle 0xe57438; line 1011 (connection #-5000) 
    * Rebuilt URL to: bad14.haxx.se/
    * Hostname was NOT found in DNS cache
    * Adding handle: conn: 0xe913b8
    * Adding handle: send: 0
    * Adding handle: recv: 0
    * Curl_addHandleToPipeline: length: 1
    * 0xe57438 is at send pipe head!
    * - Conn 0 (0xe913b8) send_pipe: 1, recv_pipe: 0
    * STATE: CONNECT => WAITRESOLVE handle 0xe57438; line 1042 (connection #0) 
    *   Trying 2a00:1a28:1:2011::1...
    * Immediate connect fail for 2a00:1a28:1:2011::1: Network is unreachable
    * Closing connection 0
    * The cache now contains 0 members
    * Expire cleared
    curl: (7) Couldn't connect to server
    
     
  • Daniel Stenberg

    Daniel Stenberg - 2013-12-24

    Yes, the original reporter says it is a busy-loop in there but doesn't disclose any details of that. Björn's test case seems to at least show that this problem is not easy to repeat with "just" the non-working IPv6 address as a condition.

    If the original reporter can repeat this problem, it would be useful to see a gdb trace while single-stepping through the code that shows it looping and possibly showing some contents of local variables that make the code take that decision.

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-12-24
    • assigned_to: Bjorn Stenberg
     
  • Michał Górny

    Michał Górny - 2013-12-25

    The issue happens when the host has both IPv6 and IPv4 addresses. bad14 has only IPv6 address.

    I can reproduce it with e.g. bad12. There it tries the two IPv6 addresses and then deadlocks.

    Here follows a bit of single-stepping with breakpoints, but I doubt it will be close to readable (all in connect.c):

    1108      while(res != CURLE_OK &&
    (gdb) n
    1109            conn->tempaddr[0] &&
    (gdb) n
    1110            conn->tempaddr[0]->ai_next &&
    (gdb) n
    1112        res = trynextip(conn, FIRSTSOCKET, 0);
    (gdb) n
    
    Breakpoint 3, trynextip (conn=conn@entry=0x100bc00, sockindex=sockindex@entry=0, 
        tempindex=tempindex@entry=0) at connect.c:537
    537 {
    (gdb) n
    544   curl_socket_t fd_to_close = conn->tempsock[tempindex];
    (gdb) n
    545   conn->tempsock[tempindex] = CURL_SOCKET_BAD;
    (gdb) p fd_to_close 
    $1 = -1
    (gdb) n
    538   CURLcode rc = CURLE_COULDNT_CONNECT;
    (gdb) n
    547   if(sockindex == FIRSTSOCKET) {
    (gdb) n
    551     if(conn->tempaddr[tempindex]) {
    (gdb) n
    553       family = conn->tempaddr[tempindex]->ai_family;
    (gdb) p conn->tempaddr[tempindex]
    $2 = (Curl_addrinfo *) 0x100c680
    (gdb) p *conn->tempaddr[tempindex]
    $3 = {ai_flags = 0, ai_family = 10, ai_socktype = 1, ai_protocol = 0, ai_addrlen = 28, 
      ai_canonname = 0x100c750 "bittern.gentoo.org", ai_addr = 0x100c770, ai_next = 0x100cc90}
    (gdb) n
    554       ai = conn->tempaddr[tempindex]->ai_next;
    (gdb) n
    559       family = (firstfamily == AF_INET) ? AF_INET6 : AF_INET;
    (gdb) p family
    $4 = 10
    (gdb) p ai
    $5 = (Curl_addrinfo *) 0x100cc90
    (gdb) n
    568         rc = singleipconnect(conn, ai, &conn->tempsock[tempindex]);
    (gdb) p family
    $6 = 10
    (gdb) n
    563     while(ai) {
    (gdb) p rc
    $7 = CURLE_COULDNT_CONNECT
    (gdb) n
    564       while(ai && ai->ai_family != family)
    (gdb) p *ai
    $8 = {ai_flags = 0, ai_family = 2, ai_socktype = 1, ai_protocol = 0, ai_addrlen = 16, 
      ai_canonname = 0x100ccd0 "bittern.gentoo.org", ai_addr = 0x100ccf0, ai_next = 0x0}
    (gdb) n
    565         ai = ai->ai_next;
    (gdb) n
    564       while(ai && ai->ai_family != family)
    (gdb) p ai
    $9 = (Curl_addrinfo *) 0x0
    (gdb) n
    567       if(ai) {
    (gdb) n
    579   if(fd_to_close != CURL_SOCKET_BAD)
    (gdb) n
    583 }
    (gdb) n
    
     
  • Michał Górny

    Michał Górny - 2013-12-25

    At first glance, it seems that the conn->tempaddr[tempindex] condition never stops evaluating to true, so it keeps looping in search of yet another IPv6 address while there are only IPv4 addresses left.

     
  • Bjorn Stenberg

    Bjorn Stenberg - 2013-12-25

    Thank you. Unfortunately I can't repeat it against bad12 either:

    ~/src/curl$ src/curl -v bad12.haxx.se
    * STATE: INIT => CONNECT handle 0x19c3438; line 1011 (connection #-5000) 
    * Rebuilt URL to: bad12.haxx.se/
    * Hostname was NOT found in DNS cache
    * Adding handle: conn: 0x19fd3b8
    * Adding handle: send: 0
    * Adding handle: recv: 0
    * Curl_addHandleToPipeline: length: 1
    * 0x19c3438 is at send pipe head!
    * - Conn 0 (0x19fd3b8) send_pipe: 1, recv_pipe: 0
    * STATE: CONNECT => WAITRESOLVE handle 0x19c3438; line 1042 (connection #0) 
    *   Trying 80.67.4.176...
    * STATE: WAITRESOLVE => WAITCONNECT handle 0x19c3438; line 1103 (connection #0) 
    * connect to 80.67.4.176 port 80 failed: Connection refused
    *   Trying 80.67.6.1...
    * connect to 80.67.6.1 port 80 failed: Connection refused
    *   Trying 2a00:1a28:1200:9::1...
    * Immediate connect fail for 2a00:1a28:1200:9::1: Network is unreachable
    *   Trying 2a00:1a28:1:2010::1...
    * Immediate connect fail for 2a00:1a28:1:2010::1: Network is unreachable
    * Failed to connect to bad12.haxx.se port 80: Connection refused
    * Closing connection 0
    * The cache now contains 0 members
    * Expire cleared
    curl: (7) Failed to connect to bad12.haxx.se port 80: Connection refused
    

    We need to pin down exactly which circumstance is triggering it. Can you show me your output against bad12?

     
  • Michał Górny

    Michał Górny - 2013-12-26

    Sure.

    $ curl -v bad12.haxx.se
    * Rebuilt URL to: bad12.haxx.se/
    * Hostname was NOT found in DNS cache
    * Adding handle: conn: 0x84fbb0
    * Adding handle: send: 0
    * Adding handle: recv: 0
    * Curl_addHandleToPipeline: length: 1
    * - Conn 0 (0x84fbb0) send_pipe: 1, recv_pipe: 0
    *   Trying 2a00:1a28:1200:9::1...
    * Immediate connect fail for 2a00:1a28:1200:9::1: Network is unreachable
    *   Trying 2a00:1a28:1:2010::1...
    * Immediate connect fail for 2a00:1a28:1:2010::1: Network is unreachable
    

    After this, it hangs. The gdb stepping goes the same way as with my other example -- except for a different domain in the output :).

    I can reproduce it with today's snapshot as well. Additionally, I can confirm that it doesn't happen when c-ares is disabled.

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-12-26

    Ack, I managed to repeat this just now - although it took me several attempts. I get the "hang" and when attaching gdb to the process I get to see the loop. See below:

    0x0000000000429929 in trynextip (conn=0x1e32988, sockindex=0, tempindex=0)
        at connect.c:564
    564 while(ai && ai->ai_family != family)
    (gdb) bt
    #0  0x0000000000429929 in trynextip (conn=0x1e32988, sockindex=0, tempindex=0)
        at connect.c:564
    #1  0x000000000042a8b1 in Curl_connecthost (conn=0x1e32988, remotehost=0x1e33a38)
        at connect.c:1112
    #2  0x000000000044f904 in Curl_setup_conn (conn=0x1e32988, protocol_done=0x7fff0727b6c9)
        at url.c:5573
    #3  0x00000000004305e5 in Curl_async_resolved (conn=0x1e32988,
        protocol_done=0x7fff0727b6c9) at hostasyn.c:133
    #4  0x000000000042c89a in multi_runsingle (multi=0x1e16e28, now=..., data=0x1df9788)
        at multi.c:1084
    #5  0x000000000042de32 in curl_multi_perform (multi_handle=0x1e16e28,
        running_handles=0x7fff0727b7cc) at multi.c:1734
    #6  0x00000000004276d9 in easy_transfer (multi=0x1e16e28) at easy.c:705
    #7  0x000000000042789d in easy_perform (data=0x1df9788, events=false) at easy.c:784
    #8  0x00000000004278e7 in curl_easy_perform (easy=0x1df9788) at easy.c:803
    #9  0x0000000000418dca in operate (config=0x7fff0727bdd0, argc=3, argv=0x7fff0727c298)
        at tool_operate.c:1493
    #10 0x00000000004128c9 in main (argc=3, argv=0x7fff0727c298) at tool_main.c:103
    (gdb) p ai
    $1 = (Curl_addrinfo *) 0x1e33c98
    (gdb) p family
    $2 = 10
    (gdb) n
    565 ai = ai->ai_next;
    (gdb) p *ai
    $3 = {ai_flags = 0, ai_family = 2, ai_socktype = 1, ai_protocol = 0, ai_addrlen = 16,
      ai_canonname = 0x1e33ce8 "bad12.haxx.se", ai_addr = 0x1e33d18, ai_next = 0x0}
    (gdb) n
    564 while(ai && ai->ai_family != family)
    (gdb)
    567 if(ai) {
    (gdb)
    575 break;
    (gdb)
    579 if(fd_to_close != CURL_SOCKET_BAD)
    (gdb)
    582 return rc;
    (gdb)
    583 }
    (gdb)
    Curl_connecthost (conn=0x1e32988, remotehost=0x1e33a38) at connect.c:1108
    1108 while(res != CURLE_OK &&
    (gdb)
    1109 conn->tempaddr[0] &&
    (gdb)
    1108 while(res != CURLE_OK &&
    (gdb)
    1110 conn->tempaddr[0]->ai_next &&
    (gdb)
    1109 conn->tempaddr[0] &&
    (gdb)
    1111 conn->tempsock[0] == CURL_SOCKET_BAD)
    (gdb)
    1110 conn->tempaddr[0]->ai_next &&
    (gdb)
    1112 res = trynextip(conn, FIRSTSOCKET, 0);
    (gdb) s
    trynextip (conn=0x1e32988, sockindex=0, tempindex=0) at connect.c:538
    538 CURLcode rc = CURLE_COULDNT_CONNECT;
    (gdb) n
    544 curl_socket_t fd_to_close = conn->tempsock[tempindex];
    (gdb)
    545 conn->tempsock[tempindex] = CURL_SOCKET_BAD;
    (gdb)
    547 if(sockindex == FIRSTSOCKET) {
    (gdb)
    551 if(conn->tempaddr[tempindex]) {
    (gdb)
    553 family = conn->tempaddr[tempindex]->ai_family;
    (gdb)
    554 ai = conn->tempaddr[tempindex]->ai_next;
    (gdb)
    563 while(ai) {
    (gdb)
    564 while(ai && ai->ai_family != family)
    (gdb)

     
    Last edit: Daniel Stenberg 2013-12-26
  • Bjorn Stenberg

    Bjorn Stenberg - 2013-12-26

    I have a first shot at a patch now. I need to run more tests before submitting it, but would appreciate it if you could try it out too.

    diff --git a/lib/connect.c b/lib/connect.c
    index 4b6ee00..97a0655 100644
    --- a/lib/connect.c
    +++ b/lib/connect.c
    @@ -1104,13 +1104,12 @@ CURLcode Curl_connecthost(struct connectdata *conn,  /* 
         conn->tempaddr[0]->ai_next == NULL ? timeout_ms : timeout_ms / 2;
    
       /* start connecting to first IP */
    -  res = singleipconnect(conn, conn->tempaddr[0], &(conn->tempsock[0]));
    -  while(res != CURLE_OK &&
    -        conn->tempaddr[0] &&
    -        conn->tempaddr[0]->ai_next &&
    -        conn->tempsock[0] == CURL_SOCKET_BAD)
    -    res = trynextip(conn, FIRSTSOCKET, 0);
    -
    +  while(conn->tempaddr[0]) {
    +    res = singleipconnect(conn, conn->tempaddr[0], &(conn->tempsock[0]));
    +    if(res == CURLE_OK)
    +        break;
    +    conn->tempaddr[0] = conn->tempaddr[0]->ai_next;
    +  }
       if(conn->tempsock[0] == CURL_SOCKET_BAD)
         return res;
    
     
  • Michał Górny

    Michał Górny - 2013-12-27

    It fixes the issue for me, thanks.

     
  • Remi Gacogne

    Remi Gacogne - 2013-12-27

    Hi,

    I ran into this bug too, and I also confirm that your patch fixes the issue for me.

    Thanks,

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-12-27
    • status: open --> open-confirmed
     
  • Daniel Stenberg

    Daniel Stenberg - 2013-12-27

    Björn, once you're happy enough I hope you send a proper patch to the list?

     
  • Steve Holme

    Steve Holme - 2013-12-28

    Pushed in commit 4e1ece2e44f432c2614f2090155c0aaf2226ea80.

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-12-28
    • status: open-confirmed --> closed-fixed
     
  • Daniel Stenberg

    Daniel Stenberg - 2013-12-28

    Thanks everyone, case closed!

     
