|
From: Nicholas N. <n.n...@gm...> - 2009-04-28 07:03:34
Attachments:
fdleak_ipv4.c
|
Hi, none/tests/fdleak_ipv4 doesn't work on Darwin. I can't work out why, but I don't know much about network programming. If anyone who does (Filipe?) it would be very helpful. The program forks and creates a client process and a server process. If the server process runs first and reaches accept() before the client tries to connect(), it works fine. I tested this by inserting a sleep() in the client. But without that sleep(), on my machine usually the client reaches connect() first. It fails with "connection refused", which is reasonable, because the server hasn't started listening/accepting connections. But then on the retry, after the sleep(1), the connect() fails with an invalid argument. And this repeats 10 times around the client's connect() loop until the client gives up. It's the invalid argument errors that I don't understand. I looked at the xnu kernel sources, AFAICT the only way to get EINVAL from connect() is to make the 3rd arg (address_len) too small but that's definitely not the case. Besides, if we don't get EINVAL on the first try, why should we get it on subsequent tries with identical arguments? Finally, the server process never shuts down, it just hangs waiting for accept(), which complicates subsequent runs. Any ideas? I attach a version which prints messages before and after each network operation, which makes it easier to debug. Nick |
|
From: Bart V. A. <bar...@gm...> - 2009-04-28 07:20:28
|
On Tue, Apr 28, 2009 at 9:03 AM, Nicholas Nethercote <n.n...@gm...> wrote: > none/tests/fdleak_ipv4 doesn't work on Darwin. I can't work out why, > but I don't know much about network programming. If anyone who does > (Filipe?) it would be very helpful. Some remarks (found via source reading): * The statement "setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(int));" should be checked. While Linux accepts both one-byte and four-byte arguments for SO_REUSEADDR, some platforms only accept one-byte arguments when setting a boolean socket option. * The statement "addr.sin_port = 12321;" should be "addr.sin_port = htons(12321);", otherwise port number 12321 will be used on big endian systems and port number 8496 on little-endian systems. Bart. |
|
From: Bart V. A. <bar...@gm...> - 2009-04-28 10:00:37
|
On Tue, Apr 28, 2009 at 9:20 AM, Bart Van Assche <bar...@gm...> wrote: > On Tue, Apr 28, 2009 at 9:03 AM, Nicholas Nethercote > <n.n...@gm...> wrote: >> none/tests/fdleak_ipv4 doesn't work on Darwin. I can't work out why, >> but I don't know much about network programming. If anyone who does >> (Filipe?) it would be very helpful. > > Some remarks (found via source reading): > * The statement "setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, > sizeof(int));" should be checked. While Linux accepts both one-byte > and four-byte arguments for SO_REUSEADDR, some platforms only accept > one-byte arguments when setting a boolean socket option. An update: at least for the 2.6.29 Linux kernel, a SOL_SOCKET option must have at least the size of an integer or it is rejected with error code EINVAL. SOL_IP setsockopt arguments can be either of type int or of type uchar. See also the kernel source files net/socket.c, net/core/sock.c and net/ipv4/ip_sockglue.c. Bart. |
|
From: Filipe C. <fi...@gm...> - 2009-04-28 15:53:29
|
Hi, Bart Van Assche wrote: > On Tue, Apr 28, 2009 at 9:03 AM, Nicholas Nethercote > <n.n...@gm...> wrote: >> none/tests/fdleak_ipv4 doesn't work on Darwin. I can't work out why, >> but I don't know much about network programming. If anyone who does >> (Filipe?) it would be very helpful. > > Some remarks (found via source reading): > * The statement "setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, > sizeof(int));" should be checked. While Linux accepts both one-byte > and four-byte arguments for SO_REUSEADDR, some platforms only accept > one-byte arguments when setting a boolean socket option. > * The statement "addr.sin_port = 12321;" should be "addr.sin_port = > htons(12321);", otherwise port number 12321 will be used on big endian > systems and port number 8496 on little-endian systems. > Isn't that already being done? With the DO(...) macro? As for the error... The "Single UNIX specification" (http://www.unix.org/single_unix_specification/ ) says, about connect: -------------------------------- APPLICATION USAGE If connect() fails, the state of the socket is unspecified. Conforming applications should close the file descriptor and create a new socket before attempting to reconnect. -------------------------------- In Linux it may work, but it's not guaranteed to work on a UNIX. I'm creating a new socket in every iteration right now but still have some differences on the stderr side (some ellipsis we're not expecting and an extra connection refused). I'll see what's happening and make a patch later today. Regards, Filipe |
|
From: Bart V. A. <bar...@gm...> - 2009-04-28 15:59:46
|
On Tue, Apr 28, 2009 at 5:53 PM, Filipe Cabecinhas <fi...@gm...> wrote: > Bart Van Assche wrote: >> >> On Tue, Apr 28, 2009 at 9:03 AM, Nicholas Nethercote >> <n.n...@gm...> wrote: >>> >>> none/tests/fdleak_ipv4 doesn't work on Darwin. I can't work out why, >>> but I don't know much about network programming. If anyone who does >>> (Filipe?) it would be very helpful. >> >> Some remarks (found via source reading): >> * The statement "setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, >> sizeof(int));" should be checked. While Linux accepts both one-byte >> and four-byte arguments for SO_REUSEADDR, some platforms only accept >> one-byte arguments when setting a boolean socket option. >> * The statement "addr.sin_port = 12321;" should be "addr.sin_port = >> htons(12321);", otherwise port number 12321 will be used on big endian >> systems and port number 8496 on little-endian systems. >> > Isn't that already being done? With the DO(...) macro? Are we looking at the same source code ? I didn't find any htons() call in the DO() macro in none/tests/fdleak.h on the DARWIN branch. Bart. |
|
From: Filipe C. <fi...@gm...> - 2009-04-28 17:25:28
Attachments:
valgrind-fdleak-test.patch
|
Bart Van Assche wrote: > On Tue, Apr 28, 2009 at 5:53 PM, Filipe Cabecinhas <fi...@gm...> wrote: >> Bart Van Assche wrote: >>> On Tue, Apr 28, 2009 at 9:03 AM, Nicholas Nethercote >>> <n.n...@gm...> wrote: >>>> none/tests/fdleak_ipv4 doesn't work on Darwin. I can't work out why, >>>> but I don't know much about network programming. If anyone who does >>>> (Filipe?) it would be very helpful. >>> Some remarks (found via source reading): >>> * The statement "setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, >>> sizeof(int));" should be checked. While Linux accepts both one-byte >>> and four-byte arguments for SO_REUSEADDR, some platforms only accept >>> one-byte arguments when setting a boolean socket option. >>> * The statement "addr.sin_port = 12321;" should be "addr.sin_port = >>> htons(12321);", otherwise port number 12321 will be used on big endian >>> systems and port number 8496 on little-endian systems. >>> >> Isn't that already being done? With the DO(...) macro? > > Are we looking at the same source code ? I didn't find any htons() > call in the DO() macro in none/tests/fdleak.h on the DARWIN branch. > > Bart. I'm sorry, I was only addressing your first point, not the second. In my (attached) patch I added both htons(). The patch creates the socket inside the client loop and closes it on error (printing the error), creating a new socket again. I also changed the filter_fdleak to filter out lines containing (only) "connect: Connection refused". I also got this error: +./fdleak_ipv4: +dSYM directory has wrong UUID; consider using --auto-run-dsymutil=yes I ran with that argument and then (after creating the *.dSYM/), ran the regtest again and it worked. Shouldn't gcc create the directory when running with -g? Regards, Filipe |
|
From: Michael S. <ms...@xi...> - 2009-04-28 17:17:14
|
On Tue, Apr 28, 2009 at 12:03 AM, Nicholas Nethercote <n.n...@gm...> wrote: > Hi, > > none/tests/fdleak_ipv4 doesn't work on Darwin. I can't work out why, > but I don't know much about network programming. If anyone who does > (Filipe?) it would be very helpful. > > The program forks and creates a client process and a server process. > If the server process runs first and reaches accept() before the > client tries to connect(), it works fine. I tested this by inserting > a sleep() in the client. > > But without that sleep(), on my machine usually the client reaches > connect() first. It fails with "connection refused", which is > reasonable, because the server hasn't started listening/accepting > connections. But then on the retry, after the sleep(1), the connect() > fails with an invalid argument. And this repeats 10 times around the > client's connect() loop until the client gives up. The only other thing I can see that other source I've compared to does differently is to memset() the sockaddr_in to zero before filling in the other fields. Might darwin be checking that the other fields (which are just padding?) are zeroed? I see you've checked the kernel sources, which I haven't... but it's the only suggestion I have. Mike |
|
From: Michael S. <ms...@ap...> - 2009-04-28 17:28:36
|
I don't think you can reuse a socket after a failed connect()... On Apr 28, 2009, at 12:03 AM, Nicholas Nethercote wrote: > Hi, > > none/tests/fdleak_ipv4 doesn't work on Darwin. I can't work out why, > but I don't know much about network programming. If anyone who does > (Filipe?) it would be very helpful. > > The program forks and creates a client process and a server process. > If the server process runs first and reaches accept() before the > client tries to connect(), it works fine. I tested this by inserting > a sleep() in the client. > > But without that sleep(), on my machine usually the client reaches > connect() first. It fails with "connection refused", which is > reasonable, because the server hasn't started listening/accepting > connections. But then on the retry, after the sleep(1), the connect() > fails with an invalid argument. And this repeats 10 times around the > client's connect() loop until the client gives up. > > It's the invalid argument errors that I don't understand. I looked at > the xnu kernel sources, AFAICT the only way to get EINVAL from > connect() is to make the 3rd arg (address_len) too small but that's > definitely not the case. Besides, if we don't get EINVAL on the first > try, why should we get it on subsequent tries with identical > arguments? > > Finally, the server process never shuts down, it just hangs waiting > for accept(), which complicates subsequent runs. > > Any ideas? I attach a version which prints messages before and after > each network operation, which makes it easier to debug. > > Nick > < > fdleak_ipv4 > .c > > > ------------------------------------------------------------------------------ > Register Now & Save for Velocity, the Web Performance & Operations > Conference from O'Reilly Media. Velocity features a full day of > expert-led, hands-on workshops and two days of sessions from industry > leaders in dedicated Performance & Operations tracks. Use code > vel09scf > and Save an extra 15% before 5/3. http://p.sf.net/sfu/velocityconf_______________________________________________ > Valgrind-developers mailing list > Val...@li... > https://lists.sourceforge.net/lists/listinfo/valgrind-developers ________________________________________ Michael R Sweet, Senior Printing System Engineer |
|
From: Michael S. <ms...@xi...> - 2009-04-28 17:33:49
|
On Tue, Apr 28, 2009 at 10:28 AM, Michael Sweet <ms...@ap...> wrote: > I don't think you can reuse a socket after a failed connect()... That is indeed the problem here. Which is... not mentioned in the documentation - the manpage (this from linux, but the osx one has similar wording) says "connection-based protocol sockets may successfully connect() only once" - but this didn't connect _successfully_. However, changing the test program to close the socket and create a new one makes it work. Arguably a bug in darwin, but easy enough to work around. Mike |
|
From: Nicholas N. <n.n...@gm...> - 2009-04-29 01:45:05
|
On Wed, Apr 29, 2009 at 3:33 AM, Michael Smith <ms...@xi...> wrote: > On Tue, Apr 28, 2009 at 10:28 AM, Michael Sweet <ms...@ap...> wrote: >> I don't think you can reuse a socket after a failed connect()... > > That is indeed the problem here. Which is... not mentioned in the > documentation - the manpage (this from linux, but the osx one has > similar wording) says "connection-based protocol sockets may > successfully connect() only once" - but this didn't connect > _successfully_. However, changing the test program to close the socket > and create a new one makes it work. Arguably a bug in darwin, but easy > enough to work around. Yes, it was this wording in the man page that fooled me. I've committed a patch similar to the one Filipe posted -- main difference is that I don't treat ECONNREFUSED specially, I just close and re-open the socket if connect() fails for any reason -- and the test is passing now. Thanks to everyone for their suggestions! On Darwin, I now get only 16 test failures in memcheck/ and 6 in none/, which is excellent. (Helgrind, DRD and exp-Ptrcheck still cause lots of failures, though, as they are all completely broken.) Nick |