#304 Failure of socket tests on Mac OS X

Milestone: lisp error
Status: closed-fixed
Owner: Sam Steingold
Labels: clisp (525)
Priority: 5
Updated: 2006-02-23
Created: 2006-01-07
Creator: Gregory Wright
Private: No

Hi,

On Mac OS X version 10.4.3, the socket tests give 30
failures for clisp 2.37. The underlying cause is
a return of EADDRNOTAVAIL from bind.

The cause of this is a failure to fill struct sockaddr_in
(or struct sockaddr_un) with zeroes before using it.
This is a requirement on OS X, and IIRC, older BSDs
as well. If the structure is not zero-filled, the
exact error generated may depend on the contents of
the uninitialized variable.

I attach a patch to fix this on OS X. If the problem
occurs on other OSs, the conditional can be changed
to cover those cases.

With the patch, all tests pass.

Best Wishes,
Greg Wright
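
A minimal sketch of the idiom the report describes, assuming an IPv4 TCP socket: clear the whole struct sockaddr_in first, then set the individual fields. The helper name bind_zeroed is hypothetical and chosen for illustration; this is not the attached patch and not code from CLISP's socket.d.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper (not from socket.d): bind fd to 127.0.0.1:port.
   Returns 0 on success, otherwise the errno value set by bind(). */
int bind_zeroed(int fd, unsigned short port)
{
    struct sockaddr_in addr;

    assert(fd >= 0);

    /* Zero every byte first, so padding such as sin_zero holds no
       leftover stack contents. */
    memset(&addr, 0, sizeof(addr));

    /* Only now set the fields that carry the address. */
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return errno;
    return 0;
}
```

Passing port 0 lets the kernel pick a free port, which keeps the sketch usable in a test without reserving a fixed port number.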

Discussion

(Page 1 of 3)
  • Doug Philips
    2006-01-08

    Logged In: YES
    user_id=392583

    Verified socket test failure (with different symptoms) on
    2006 January 04 and that the patch worked, allowing all the
    tests to pass. While I don't know the CLISP internals well enough
    to know whether a raw memset is the correct fix, I would strongly
    suggest that the failure to initialize the struct be treated as a
    cross-platform bug.

    --Doug Philips

     
  • Sam Steingold
    2006-01-08

    Logged In: YES
    user_id=5735

    thanks.
    this patch (or something similar) will go in without ifdefs.
    I am not sure yet if we need to initialize _all_ these
    structs though...
    also, I would like to make sure that this bug has not been
    introduced recently but has been there all along.
    (tests/socket.tst is a recent addition)
    Greg, Doug, and everyone - please test some older release
    (e.g., 2.33).
    Thanks!

     
  • Sam Steingold
    2006-01-08

    Logged In: YES
    user_id=5735

    also, the patch appears to blindly zero-out all sockaddr
    structs even though their slots are being initialized
    immediately after that.
    specifically, it is not clear why one has to init sockaddr
    before getsockname, getpeername, accept.
    http://www.opengroup.org/onlinepubs/009695399/functions/getpeername.html
    does not require that the argument is initialized.
    are these OSX bugs?
    please investigate.

    note that you do not need to redump lispinit.mem for these
    changes, i.e., you can change socket.d, do "make lisp.run",
    and test with
    ./lisp.run -M lispinit.mem -norc -i tests/test -x '(run-test "tests/socket")'

    thanks.
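
On the getsockname/getpeername/accept question above: POSIX treats the address argument as an output buffer, and only the length argument has to be set on entry. A sketch of how that could be checked, assuming an IPv4 socket; the helper name getsockname_into_garbage is hypothetical, and whether OS X 10.4 passes this check is exactly what the thread leaves open.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical check: pre-fill the output buffer with 0xFF instead of
   zeroes, then let getsockname() overwrite it.
   Returns 0 on success, otherwise the errno value from getsockname(). */
int getsockname_into_garbage(int fd, struct sockaddr_in *out)
{
    socklen_t len = sizeof(*out);  /* the length IS an input and must be set */

    assert(out != NULL);
    memset(out, 0xFF, sizeof(*out));  /* deliberately not zero-filled */

    if (getsockname(fd, (struct sockaddr *)out, &len) < 0)
        return errno;
    return 0;
}
```

If the call succeeds and out->sin_family reads back as AF_INET, the system did not depend on the buffer's previous contents for this call.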

     
  • Gregory Wright
    2006-01-09

    Logged In: YES
    user_id=1321993

    Hi,

    Yes, the patch does blindly zero out the whole sockaddr structure, even
    though the slots are initialized afterward. This is required by traditional
    BSDs (and evidently OS X as well). The reason is that the bind
    syscall (at least) doesn't parse the fields but just looks for the zeroes
    to find out where the port and address fields end. This is not the case
    on Linux, and I think it has been fixed on recent BSDs as well (it seems
    not to be required on FreeBSD 6, which I am also running).

    As Sam notes, this is not POSIX behavior. However, it is documented
    as standard in the first edition of Stevens's _Unix Network Programming_
    (see p. 285, where bzero is used to clear the whole sockaddr_in struct before
    the call to bind). Yes, I agree it ought to be fixed and will file an RFE with
    Apple for OS X.

    I can test on 2.33, but it is possible that the bug may exist but not show up,
    since it depends on the contents of an uninitialized automatic allocation.
    If zeroes show up at the end of the address field, the call will succeed.

    It is guaranteed safe to zero the whole struct on any OS, and the
    very minor performance hit (the memory has to be pulled into cache
    anyway to initialize the other fields) is only taken once, on the
    initial setup of the struct.

    Best Wishes,
    Greg

     
  • Doug Philips
    2006-01-09

    Logged In: YES
    user_id=392583

    I concur with Greg. Uninitialized values are very tricky to
    test for. I've only recently started using CLISP again
    because I ran into troubles building the older versions
    under Mac OS X and did not have the time to become even a
    part-time contributor. I guess I could try that test that
    Sam requested, but I don't understand what the results would
    tell us. I was seeing very different failures than Greg was
    on the same code base to start with.
    --Doug

     
  • Sam Steingold
    2006-01-09

    Logged In: YES
    user_id=5735

    what I don't understand is where the uninitialized values
    are coming from. we always properly init the relevant fields
    of all structs before calling bind et al (apparently, the
    osx bug is that it also looks at the "padding space").
    at any rate, I would like to see the following tests done:
    1. ascertain that the oldest clisp you have available still
    has (at least some) socket problems
    2. find out which specific system calls fail
    (you can use memset to set all bytes to 0xFF :-).
    thanks.
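
The 0xFF experiment suggested in step 2 could be set up roughly as follows. This is a sketch under assumptions: the function name try_bind_with_fill is hypothetical, it exercises only bind on an IPv4 loopback address, and it is not part of tests/socket.tst.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical probe: pre-fill the struct with fill_byte, set only the
   named fields, and report what bind() does with the result.
   Returns 0 on success, otherwise the errno of the failing call. */
int try_bind_with_fill(int fill_byte)
{
    struct sockaddr_in addr;
    int fd, err = 0;

    assert(fill_byte >= 0 && fill_byte <= 0xFF);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    /* Everything not assigned below (for example sin_zero) keeps fill_byte. */
    memset(&addr, fill_byte, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);  /* let the kernel choose a port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;  /* saved before close() can change errno */
    close(fd);
    return err;
}
```

Comparing try_bind_with_fill(0x00) against try_bind_with_fill(0xFF) on the same machine makes the outcome deterministic, instead of depending on whatever happened to be on the stack.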

     
  • Gregory Wright
    2006-01-09

    Logged In: YES
    user_id=1321993

    Yes, on traditional BSDs and OS X, the pad space is checked to ensure that
    it is all zero. This was because there were other sockaddr structures
    (e.g., sockaddr_ns for XNS) that were also 16 bytes long but
    had different field layouts. Checking the pad space was a sanity check
    that you hadn't set the address family to AF_INET but given an XNS address.

    The traditional idiom (Stevens, Unix Network Programming, 1st ed.) was
    always to bzero the sockaddr_* structures before setting any of the fields,
    to ensure that the empty space was always zero.

    I will test an older version of clisp by filling the empty space with 0xffs.
    Is 2.33 an acceptable version?

    I know that one of the failed system calls is bind. get/setsockopt may also
    fail, but I haven't tested them yet.

    -Greg
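
To illustrate the kind of pad-space sanity check described above, here is a user-space sketch. It is not taken from any BSD or OS X kernel source; both function names are hypothetical, and the sketch shows only why a struct whose named fields were all set can still fail such a check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: pre-fill the struct with prefill_byte, then set
   only the named fields, as code without a zero-fill would do. */
void init_inet_addr(struct sockaddr_in *sa, int prefill_byte)
{
    assert(sa != NULL);
    memset(sa, prefill_byte, sizeof(*sa));
    sa->sin_family = AF_INET;
    sa->sin_port = htons(80);
    sa->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Hypothetical model of the check: returns 1 if every byte of the
   sin_zero padding is zero, 0 otherwise. */
int sin_zero_is_clear(const struct sockaddr_in *sa)
{
    size_t i;

    assert(sa != NULL);
    for (i = 0; i < sizeof(sa->sin_zero); i++) {
        if (sa->sin_zero[i] != 0)
            return 0;
    }
    return 1;
}
```

With a prefill of 0x00 the padding check passes; with 0xFF it fails even though the family, port, and address fields are identical in both cases.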

     
  • Sam Steingold
    2006-01-09

    Logged In: YES
    user_id=5735

    I only have the 3rd edition,
    but google appears to support you.
    2.33 is old enough, thanks.

     
  • Sam Steingold
    2006-01-12

    • status: open --> pending
     