The problem appears both in libcurl 7.16.4 and 7.18.0 (latest to date).
Application crashes when traversing an FTP site (which is simply a series of FTP LIST commands). The crash is at the beginning of Curl_done() when it is called with connection pointer equal to NULL:
file url.c:
CURLcode Curl_done(struct connectdata **connp, /* 'connp' points to NULL */
CURLcode status,
bool premature)
{
CURLcode result;
struct connectdata *conn = *connp; /* assigning NULL to 'conn' */
struct SessionHandle *data = conn->data; /* dereferencing... Oops! */
...
Curl_done() is called from Curl_perform():
file transfer.c:
CURLcode Curl_perform(struct SessionHandle *data)
{
...
do {
res = connect_host(data, &conn); /* primary connection */
if(res == CURLE_OK) {
...
/* !!! note: this may nullify the connection! */
res = Curl_do(&conn, &do_done);
if(res == CURLE_OK) {
...
/* long and ugly-looking sequence of nested if()'s */
...
}
else /* !!! note: Curl_do() returned failure, 'conn' may be NULL */
/* Curl_do() failed, clean up left-overs in the done-call */
res2 = Curl_done(&conn, res, FALSE); /* oops... */
Proposed solution:
Check for NULL connection at the beginning of Curl_done():
file url.c:
CURLcode Curl_done(struct connectdata **connp,
CURLcode status,
bool premature)
{
CURLcode result;
struct connectdata *conn = *connp;
struct SessionHandle *data;
if(conn == NULL)
return CURLE_OK; /* or some error code */
data = conn->data; /* safe to dereference */
Logged In: YES
user_id=1110
Originator: NO
Can you please elaborate on exactly when this happens and how it comes it passes NULL to this function? Can you provide a stand-alone example app that can reproduce this problem?
NULL is not a legal value to pass in so I think we should work out why it happened in the first place rather than just make Curl_done() deal with it.
Logged In: YES
user_id=1835005
Originator: YES
I will try my best to elaborate on it, but I doubt that you will be able to reproduce it.
First the globals.
The application recursively traverses an FTP server with lots of subdirectories and terabytes of data. The server is not accessible from Internet (it's in a local network), so you will not be able to connect to it for tests. The application is quite complex and scripts-driven, so I cannot post it here.
The error is quite rare. The application could run for hours without crashing, but the amount of directories in the server is so huge that even this rare bug appears regularly. Note that the server does not allow multiple connection from the same IP address. Sometimes it refuses to authenticate you after abnormal disconnection (it thinks you're still logged in).
Now the details.
The application creates one easy curl handle. The options of the handle are:
CURLOPT_MAXCONNECTS = 1 (this is important: I didn't notice any crashes before, when the default was 5).
CURLOPT_NOSIGNAL = 1 (true)
CURLOPT_URL is in form "ftp://10.x.x.x/path/path1/" (ends in slash, so the library retrieves the directory listing).
CURLOPT_TIMEOUT = 120 seconds (2 minutes)
CURLOPT_CONNECTTIMEOUT = 180 seconds (3 minutes)
CURLOPT_WRITEFUNCTION, CURLOPT_WRITEDATA - set.
CURLOPT_HEADERFUNCTION, CURLOPT_WRITEHEADER - also set.
CURLOPT_ERRORBUFFER - set to a buffer.
I have enabled verbose operation. I have also added some lines with fprintf(stderr, "%s:%d - connection is %p\n", __FILE__, __LINE__, conn); - to print traces in Curl_perform(). You can check the line numbers printed in the trace (file 'transfer.c' in curl 7.18.0; note that each added line shifted the text downwards):
.\transfer.c:2371 - between lines 2370 and 2371;
.\transfer.c:2379 - between lines 2377 and 2378;
.\transfer.c:2418 - between lines 2413 and 2414 (I had to add curly brackets to make it a block).
So here is the trace:
Getting listing of "ftp://10.x.x.x/Soundtracks/J/Jurassic Park III(2001)/"
* Re-using existing connection! (#0) with host 10.x.x.x
* Connected to 10.x.x.x (10.x.x.x) port 21 (#0)
.\transfer.c:2371 - connection is 0xab1bd8
> CWD /
< 250 CWD command successful
> CWD Soundtracks
< 250 CWD command successful
> CWD J
< 250 CWD command successful
> CWD Jurassic Park III(2001)
< 250 CWD command successful
> EPSV
* Connect data stream passively
< 229 Entering Extended Passive Mode (|||51060|)
* Trying 10.x.x.x... * Connection refused
* couldn't connect to host
* got positive EPSV response, but can't connect. Disabling EPSV
* Send failure: Connection was reset
* Re-used connection seems dead, get a new one
* Closing connection #0
* About to connect() to 10.x.x.x port 21 (#0)
* Trying 10.x.x.x... * connected
* Connected to 10.x.x.x (10.x.x.x) port 21 (#0)
< 220 FTP Server Ready ...
> USER anonymous
< 331 Anonymous login ok, send your complete email address as your password.
> PASS ftp@example.com
< 530 Maximum number of connections from your IP-address exceeded.
* Access denied: 530
* Closing connection #0
.\transfer.c:2379 - connection is (nil)
.\transfer.c:2418 - connection is (nil)
(* crash in Curl_done() *)
Logged In: YES
user_id=1110
Originator: NO
Aha, so the conn pointer is NULL already when Curl_do() returns (an error code).
So what about a fix like this:
--- lib/transfer.c 20 Feb 2008 08:28:02 -0000 1.383
+++ lib/transfer.c 21 Feb 2008 12:17:30 -0000
@@ -2389,7 +2389,7 @@
if(CURLE_OK == res)
res = res2;
}
- else
+ else if(conn)
/* Curl_do() failed, clean up left-overs in the done-call */
res2 = Curl_done(&conn, res, FALSE);
Logged In: YES
user_id=1110
Originator: NO
btw, if you give me your real name I'll make sure to credit you for this finding in the changelog!
Logged In: YES
user_id=1835005
Originator: YES
The fix looks logical for me. :)
You can credit me as Zmey Petroff <zmeypetroff@newmail.ru> if you wish.
Logged In: YES
user_id=1110
Originator: NO
Thanks a lot. Code committed to CVS just now. Case closed!