This was hit on a Gentoo system, and the downstream report can be seen at
https://bugs.gentoo.org/show_bug.cgi?id=495170
In brief, if curl 7.34.0 tries an IPv6 address when IPv6 connectivity isn't available, it falls into a loop and eats 100% CPU. This issue is not reproducible with 7.33.0.
Here we used c-ares-1.9.1, libidn-1.28, openssl-1.0.1e and zlib-1.2.8.
I am unable to reproduce this:
Yes, the original reporter says it is a busy loop but doesn't give any details about it. Björn's test case at least seems to show that this problem is not easy to repeat with "just" a non-working IPv6 address as the condition.
If the original reporter can repeat this problem, it would be useful to see a gdb trace from single-stepping through the code that shows it looping, ideally together with the contents of the local variables that make the code take that path.
The issue happens when the host has both IPv6 and IPv4 addresses; bad14 has only an IPv6 address.
I can reproduce it with e.g. bad12. There it tries the two IPv6 addresses and then hangs.
Here follows a bit of single-stepping with breakpoints, but I doubt it will be very readable (all in connect.c):

At first glance, it seems that the conn->tempaddr[tempindex] condition never stops evaluating to true, so it keeps looking for yet another IPv6 address while only IPv4 addresses are left.

Thank you. Unfortunately I can't repeat it against bad12 either:
We need to pin down exactly which circumstance is triggering it. Can you show me your output against bad12?
Sure.
After this, it hangs. The gdb stepping goes the same way as with my other example -- except for a different domain in the output :).
I can reproduce it with today's snapshot as well. Additionally, I can confirm that it doesn't happen when c-ares is disabled.
Ack, I managed to repeat this just now, although it took me several attempts. I get the "hang", and when attaching gdb to the process I get to see the loop. See below:
~~~~
0x0000000000429929 in trynextip (conn=0x1e32988, sockindex=0, tempindex=0)
at connect.c:564
564 while(ai && ai->ai_family != family)
(gdb) bt
#0  0x0000000000429929 in trynextip (conn=0x1e32988, sockindex=0, tempindex=0)
#1  0x000000000042a8b1 in Curl_connecthost (conn=0x1e32988, remotehost=0x1e33a38)
#2  0x000000000044f904 in Curl_setup_conn (conn=0x1e32988, protocol_done=0x7fff0727b6c9)
#3  0x00000000004305e5 in Curl_async_resolved (conn=0x1e32988,
#4  0x000000000042c89a in multi_runsingle (multi=0x1e16e28, now=..., data=0x1df9788)
#5  0x000000000042de32 in curl_multi_perform (multi_handle=0x1e16e28,
#6  0x00000000004276d9 in easy_transfer (multi=0x1e16e28) at easy.c:705
#7  0x000000000042789d in easy_perform (data=0x1df9788, events=false) at easy.c:784
#8  0x00000000004278e7 in curl_easy_perform (easy=0x1df9788) at easy.c:803
#9  0x0000000000418dca in operate (config=0x7fff0727bdd0, argc=3, argv=0x7fff0727c298)
#10 0x00000000004128c9 in main (argc=3, argv=0x7fff0727c298) at tool_main.c:103
(gdb) p ai
$1 = (Curl_addrinfo *) 0x1e33c98
(gdb) p family
$2 = 10
(gdb) n
565 ai = ai->ai_next;
(gdb) p *ai
$3 = {ai_flags = 0, ai_family = 2, ai_socktype = 1, ai_protocol = 0, ai_addrlen = 16,
ai_canonname = 0x1e33ce8 "bad12.haxx.se", ai_addr = 0x1e33d18, ai_next = 0x0}
(gdb) n
564 while(ai && ai->ai_family != family)
(gdb)
567 if(ai) {
(gdb)
575 break;
(gdb)
579 if(fd_to_close != CURL_SOCKET_BAD)
(gdb)
582 return rc;
(gdb)
583 }
(gdb)
Curl_connecthost (conn=0x1e32988, remotehost=0x1e33a38) at connect.c:1108
1108 while(res != CURLE_OK &&
(gdb)
1109 conn->tempaddr[0] &&
(gdb)
1108 while(res != CURLE_OK &&
(gdb)
1110 conn->tempaddr[0]->ai_next &&
(gdb)
1109 conn->tempaddr[0] &&
(gdb)
1111 conn->tempsock[0] == CURL_SOCKET_BAD)
(gdb)
1110 conn->tempaddr[0]->ai_next &&
(gdb)
1112 res = trynextip(conn, FIRSTSOCKET, 0);
(gdb) s
trynextip (conn=0x1e32988, sockindex=0, tempindex=0) at connect.c:538
538 CURLcode rc = CURLE_COULDNT_CONNECT;
(gdb) n
544 curl_socket_t fd_to_close = conn->tempsock[tempindex];
(gdb)
545 conn->tempsock[tempindex] = CURL_SOCKET_BAD;
(gdb)
547 if(sockindex == FIRSTSOCKET) {
(gdb)
551 if(conn->tempaddr[tempindex]) {
(gdb)
553 family = conn->tempaddr[tempindex]->ai_family;
(gdb)
554 ai = conn->tempaddr[tempindex]->ai_next;
(gdb)
563 while(ai) {
(gdb)
564 while(ai && ai->ai_family != family)
(gdb)
~~~~

Last edit: Daniel Stenberg 2013-12-26
I have a first shot at a patch now. I need to run more tests before submitting it, but would appreciate it if you could try it out too.
It fixes the issue for me, thanks.
Hi,
I ran into this bug too, and I also confirm that your patch fixes the issue for me.
Thanks,
Björn, once you're happy with it, could you send a proper patch to the list?
Pushed in commit 4e1ece2e44f432c2614f2090155c0aaf2226ea80.
Thanks everyone, case closed!