#304 Failure of socket tests on Mac OS X

Milestone: lisp error
Status: closed-fixed
Labels: clisp (524)
Priority: 5
Updated: 2006-02-23
Created: 2006-01-07
Private: No

Hi,

On Mac OS X version 10.4.3, the socket tests give 30
failures for clisp 2.37. The underlying cause is
a return of EADDRNOTAVAIL from bind.

The cause of this is a failure to fill struct sockaddr_in
(or struct sockaddr_un) with zeroes before using it.
This is a requirement on OS X, and IIRC, older BSDs
as well. If the structure is not zero-filled, the
exact error generated may depend on the contents of
the uninitialized variable.

I attach a patch to fix this on OS X. If the problem
occurs on other OSs, the conditional can be changed
to cover those cases.

With the patch, all tests pass.
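
For readers unfamiliar with the idiom, here is a minimal sketch of zero-filling the structure before setting its fields. It is not the attached patch: the helper name init_inet_addr and its parameters are made up for illustration, and it assumes the platform's struct sockaddr_in has the usual sin_zero member.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not taken from socket.d): clear the whole
   structure first, then assign the fields that bind() reads. */
void init_inet_addr(struct sockaddr_in *addr, uint32_t host_order_ip,
                    uint16_t host_order_port)
{
    memset(addr, 0, sizeof(*addr));   /* also zeroes sin_zero and any padding */
    addr->sin_family = AF_INET;
    addr->sin_port = htons(host_order_port);
    addr->sin_addr.s_addr = htonl(host_order_ip);
}
```

A caller would then pass the structure to bind() as usual, for example after init_inet_addr(&addr, INADDR_LOOPBACK, 0).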

Best Wishes,
Greg Wright

Discussion

  • Doug Philips

    Doug Philips - 2006-01-08

    Logged In: YES
    user_id=392583

    Verified socket test failure (with different symptoms) on
    2006 January 04 and that the patch worked allowing all the
    tests to pass. While I don't know the CLISP internals well enough
    to know if a raw memset is the correct fix, I would strongly
    suggest that initializing the struct should be considered a
    cross-platform bug.

    --Doug Philips

     
  • Sam Steingold

    Sam Steingold - 2006-01-08

    Logged In: YES
    user_id=5735

    thanks.
    this patch (or something similar) will go in without ifdefs.
    I am not sure yet if we need to initialize _all_ these
    structs though...
    also, I would like to make sure that this bug has not been
    introduced recently but has been there all along.
    (tests/socket.tst is a recent addition)
    Greg, Doug, and everyone - please test some older release
    (e.g., 2.33).
    Thanks!

     
  • Sam Steingold

    Sam Steingold - 2006-01-08

    Logged In: YES
    user_id=5735

    also, the patch appears to blindly zero-out all sockaddr
    structs even though their slots are being initialized
    immediately after that.
    specifically, it is not clear why one has to init sockaddr
    before getsockname, getpeername, accept.
    http://www.opengroup.org/onlinepubs/009695399/functions/getpeername.html
    does not require that the argument is initialized.
    are these OSX bugs?
    please investigate.

    note that you do not need to redump lispinit.mem for these
    changes, i.e., you can change socket.d, do "make lisp.run",
    and test with
    ./lisp.run -M lispinit.mem -norc -i tests/test -x '(run-test "tests/socket")'

    thanks.

     
  • Gregory Wright

    Gregory Wright - 2006-01-09

    Logged In: YES
    user_id=1321993

    Hi,

    Yes, the patch does blindly zero out the whole sockaddr structure, even
    though the slots are initialized afterward. This is required by traditional
    BSDs (and evidently OS X as well). The reason is that the bind syscall
    (at least) doesn't parse the fields but just looks for the zeroes to find
    out where the port and address fields end. This is not the case on Linux,
    and I think it has been fixed on recent BSDs as well (it seems not to be
    required on FreeBSD 6, which I am also running).

    As Sam notes, this is not POSIX behavior. However, it is documented
    as standard in the first edition of Stevens's _Unix Network Programming_
    (see p. 285, where bzero is used to clear the whole sockaddr_in struct before
    the call to bind). Yes, I agree it ought to be fixed and will file an RFE with
    Apple for OS X.

    I can test on 2.33, but it is possible that the bug may exist but not show up,
    since it depends on the contents of an uninitialized automatic allocation.
    If zeroes show up at the end of the address field, the call will succeed.

    It is guaranteed safe to zero the whole struct on any OS, and the
    very minor performance hit is only taken once, on the initial setup of
    the struct (the memory has to be pulled into cache anyway to initialize
    the other fields).

    Best Wishes,
    Greg

     
  • Doug Philips

    Doug Philips - 2006-01-09

    Logged In: YES
    user_id=392583

    I concur with Greg. Uninitialized values are very tricky to
    test for. I've only recently started using CLISP again
    because I ran into troubles building the older versions
    under Mac OS X and did not have the time to become even a
    part-time contributor. I guess I could try that test that
    Sam requested, but I don't understand what the results would
    tell us. I was seeing very different failures than Greg was
    on the same code base to start with.
    --Doug

     
  • Sam Steingold

    Sam Steingold - 2006-01-09

    Logged In: YES
    user_id=5735

    what I don't understand is where the uninitialized values
    are coming from. we always properly init the relevant fields
    of all structs before calling bind et al (apparently, the
    osx bug is that it also looks at the "padding space").
    at any rate, I would like to see the following tests done:
    1. ascertain that the oldest clisp you have available still
    has (at least some) socket problems
    2. find out what are the specific system calls that fail
    (you can use memset to set all bytes to 0xFF :-).
    thanks.

     
  • Gregory Wright

    Gregory Wright - 2006-01-09

    Logged In: YES
    user_id=1321993

    Yes, on traditional BSDs and OS X, the pad space is checked to ensure that
    it is all zero. This was because there were other sockaddr structures
    (e.g., sockaddr_ns for XNS) that were also 16 bytes long but
    had different field layouts. Checking the pad space was a sanity check
    that you hadn't set the address family to AF_INET but given an XNS address.

    The traditional idiom (Stevens, Unix Network Programming, 1st ed.) was
    always to bzero the sockaddr_* structures before setting any of the fields,
    to ensure that the empty space was always zero.

    I will test an older version of clisp by filling the empty space with 0xffs.
    Is 2.33 an acceptable version?

    I know that one of the failed system calls is bind. get/setsockopt may also
    fail, but I haven't tested them yet.

    -Greg
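
To make the "pad space" point concrete, here is a rough sketch showing that setting the fields one by one leaves sin_zero untouched, so whatever was on the stack stays there. Both helper names are hypothetical, and the sketch assumes a struct sockaddr_in with a sin_zero member (common on BSD, OS X and Linux, but not mandated by POSIX).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if every byte of sin_zero is 0, else 0. */
int inet_pad_is_zero(const struct sockaddr_in *addr)
{
    const unsigned char *pad = (const unsigned char *)addr->sin_zero;
    for (size_t i = 0; i < sizeof(addr->sin_zero); i++) {
        if (pad[i] != 0)
            return 0;
    }
    return 1;
}

/* Field-by-field initialisation WITHOUT clearing the structure first.
   The bytes of sin_zero keep whatever value they had before the call. */
void set_fields_only(struct sockaddr_in *addr, unsigned short port)
{
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    addr->sin_addr.s_addr = htonl(INADDR_ANY);
}
```

Filling a structure with 0xFF to imitate stack garbage and then calling set_fields_only leaves the pad non-zero, which is the state a pad-checking bind would reject.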

     
  • Sam Steingold

    Sam Steingold - 2006-01-09

    Logged In: YES
    user_id=5735

    I only have the 3rd edition,
    but google appears to support you.
    2.33 is old enough, thanks.

     
  • Sam Steingold

    Sam Steingold - 2006-01-12
    • status: open --> pending
     
  • Lennart Staflin

    Lennart Staflin - 2006-01-25

    Logged In: YES
    user_id=30503

    I found this comment in the OpenMCL source:

    ;; Darwin includes the SIN_ZERO field of the sockaddr_in when
    ;; comparing the requested address to the addresses of configured
    ;; interfaces (as if the zeros were somehow part of either address.)
    ;; "rletz" zeros out the stack-allocated structure, so those zeros
    ;; will be 0.

     
  • Sam Steingold

    Sam Steingold - 2006-01-27
    • assigned_to: sds --> haible
    • status: pending --> open
     
  • Sam Steingold

    Sam Steingold - 2006-01-27

    Logged In: YES
    user_id=5735

    bruno, it is up to you now.
    there was a discussion on clisp-list about it.
    my patch works, but I am not sure it is not an overkill

     
  • Sam Steingold

    Sam Steingold - 2006-02-16

    patch - final

     
  • Sam Steingold

    Sam Steingold - 2006-02-16

    Logged In: YES
    user_id=5735

    required action from OS X users:
    0. record OS version; check out clisp cvs head.
    1. make sure that sockets break without the patch by running
    ./clisp -norc -i tests/tests -x '(run-test "tests/socket")'
    2. apply the patch, make sure the test works
    3. modify socket.d and replace "0" with "1" in
    #define FILL0(s) memset((void*)&s,0,sizeof(s))
    so that it reads
    #define FILL0(s) memset((void*)&s,1,sizeof(s))
    and make sure that the test fails in an identical way to
    <1> above.

    report results here:
    os version
    did pristine version fail
    did patched (FILL0) version fail
    did FILL1 version fail
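
The FILL0 versus FILL1 comparison can also be tried outside CLISP. The sketch below is only an assumption about how such a stand-alone check might look: fill_then_set and try_bind_with_fill are invented names, the fill byte plays the role of the second memset argument in the FILL0 macro, and the loopback address with an ephemeral port is an arbitrary choice.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the FILL0/FILL1 variants: fill the whole
   structure with fill_byte, then set the fields as usual. */
void fill_then_set(struct sockaddr_in *addr, int fill_byte)
{
    memset(addr, fill_byte, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(0);                    /* let the kernel pick a port */
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Returns 0 if bind() succeeded, otherwise the errno of the failing call. */
int try_bind_with_fill(int fill_byte)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    fill_then_set(&addr, fill_byte);
    /* Capture errno before close() has a chance to overwrite it. */
    int err = (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) ? 0 : errno;
    close(fd);
    return err;
}
```

Comparing try_bind_with_fill(0) with try_bind_with_fill(1) on a given OS shows whether that kernel looks at the pad bytes. On systems that ignore the pad, both calls are expected to behave the same.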

     
  • Sam Steingold

    Sam Steingold - 2006-02-16
    • summary: Failure of socket tests on Mac OS X (with patch) --> Failure of socket tests on Mac OS X
     
  • Jörg Höhle

    Jörg Höhle - 2006-02-21

    Logged In: YES
    user_id=377168

    Joe Koski reports these results:
    1. OS X 10.3.9
    2. pristine 2.38 fails, but not with CVS stream.d patch
    3. FILL0 socket.d patch passes tests
    4. FILL1 socket.d patch causes tests to fail

    Greg mentions this as "traditional BSD" behaviour. Shouldn't
    the FILL0 patch then get used via #ifdef UNIX_BSD(sp?)
    (supposing that Mac OS X causes such a define to be set)? What
    do users think?

     
  • Chris Johnsen

    Chris Johnsen - 2006-02-22

    Logged In: YES
    user_id=477349

    Here are my results under Mac OS X 10.4.4:

    The stream.d 1.558 changes seem to prevent the socket test failures. Neither
    "FILL0", nor "FILL1" affect the results of the socket tests.

    Without the stream.d 1.558 changes, using the "FILL1" patch does not prevent
    the socket test failures. However, using the "FILL0" patch does prevent the
    socket test failures.

    "FILL0" means socket-osx.diff is applied.
    "FILL1" means socket-osx.diff is applied and the second arg to memset in
    the FILL0 macro is changed from 0 to 1.

    The test machine is running Mac OS X 10.4.4 and Xcode 2.2.1 (gcc labels
    itself as 4.0.1, Apple build 5250).

    Here "CVS" is CVS as of 2006-02-21 12:19 GMT:

    Source                tests/socket results
    CVS                   ("tests/socket" 60 0)
    CVS FILL0             ("tests/socket" 60 0)
    CVS FILL1             ("tests/socket" 60 0)

    Here "CVS-stream.d" is CVS as of 2006-02-21 12:19 GMT with src/stream.d
    rolled back to 1.557 (like the 2.38 release):

    Source                tests/socket results
    CVS-stream.d          ("tests/socket" 60 32)
    CVS-stream.d FILL0    ("tests/socket" 60 0)
    CVS-stream.d FILL1    ("tests/socket" 60 32)

    The failures (as reported in the socket.erg files) are identical for the
    tests when run with stream.d 1.557 both with and without the FILL1 patch.

     
  • Sam Steingold

    Sam Steingold - 2006-02-22

    Logged In: YES
    user_id=5735

    Chris, thanks a lot for testing!
    Are you saying that CVS head does not have the problem?
    in that case the patch is not needed and will be discarded.
    OTOH, it might be that the change in the default interface
    made the difference and the problem is still there.
    Could you please try the :interface argument to socket-server
    (pass :interface "localhost" or :interface "127.0.0.1")
    thanks!

     
  • Chris Johnsen

    Chris Johnsen - 2006-02-23

    Logged In: YES
    user_id=477349

    From the following, it looks like both the stream.d and socket.d changes are
    important. Good call on asking about the results of using :interface. I have
    never used the socket API in CLISP before, so I hope the following test is
    something like what you wanted...

    (let* ((interfaces (list "127.0.0.1" "localhost" "0.0.0.0" "10.0.0.104"))
           (arg-tails (mapcar (lambda (i) (list :interface i)) interfaces)))
      (mapcar (lambda (tail) (apply (function socket:socket-server) 0 tail))
              (list* nil arg-tails)))

    [Note: 10.0.0.104 is an address that was assigned to one of my network
    interfaces when I ran the tests.]

    Again, all these results are based on a build derived from a CVS update as of
    2006-02-21 12:19 GMT. "FILL0" has the patch from this bug report applied.
    "FILL1" has the same patch applied, but with the memset changed to fill with
    1 instead of 0. The "stream.d" variants have the 1.557 version of stream.d
    instead of the 1.558 version that was present as of the CVS update.

    CVS
    ---
    (#<SOCKET-SERVER 0.0.0.0:49314> #<SOCKET-SERVER 0.0.0.0:49315>
    #<SOCKET-SERVER 0.0.0.0:49316> #<SOCKET-SERVER 0.0.0.0:49317>
    #<SOCKET-SERVER 0.0.0.0:49318>)

    CVS FILL0
    ---------
    (#<SOCKET-SERVER 0.0.0.0:49319> #<SOCKET-SERVER 127.0.0.1:49320>
    #<SOCKET-SERVER 127.0.0.1:49321> #<SOCKET-SERVER 0.0.0.0:49322>
    #<SOCKET-SERVER 10.0.0.104:49323>)

    CVS FILL1
    ---------
    (#<SOCKET-SERVER 0.0.0.0:49324> #<SOCKET-SERVER 0.0.0.0:49325>
    #<SOCKET-SERVER 0.0.0.0:49326> #<SOCKET-SERVER 0.0.0.0:49327>
    #<SOCKET-SERVER 0.0.0.0:49328>)

    CVS-stream.d
    ------------

    [stream.d:13882] *** - UNIX error 49 (EADDRNOTAVAIL): Cannot assign
    requested address

    CVS-stream.d FILL0
    ------------------
    (#<SOCKET-SERVER 127.0.0.1:49329> #<SOCKET-SERVER 127.0.0.1:49331>
    #<SOCKET-SERVER 127.0.0.1:49333> #<SOCKET-SERVER 127.0.0.1:49335>
    #<SOCKET-SERVER 127.0.0.1:49337>)

    CVS-stream.d FILL1
    ------------------

    [stream.d:13882] *** - UNIX error 49 (EADDRNOTAVAIL): Cannot assign
    requested address

    So to me, it looks like "CVS FILL0" is the only variation that works as
    expected. This would indicate that the FILL0 patch (associated with this
    bug report) and the changes already made to stream.d in version 1.558 are
    both required for correct functionality on Mac OS X 10.4.4. The CVS and CVS
    FILL1 variants are close to working, but they do not seem to honor the
    request to bind to a particular interface address. The stream.d-1.557 FILL0
    version seems to work for local-use-only sockets.

    It might be useful to add a socket test that does something along the lines of
    this:

    (let* ((addr-list (list "127.0.0.1" "0.0.0.0"))
           (host-list (list* "0.0.0.0" addr-list))
           (arg-tails (list* nil
                             (mapcar (lambda (i) (list :interface i)) addr-list))))
      (equal (mapcar (lambda (tail)
                       (socket:socket-server-host
                        (apply (function socket:socket-server) 0 tail)))
                     arg-tails)
             host-list))

    That would make sure that requesting binding to all interfaces or just
    localhost works (and verifies the default binding if no :interface is supplied).
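
The same check can be approximated at the C level, below CLISP. The following is a sketch under assumptions rather than anything from the CLISP sources: bound_host is a hypothetical helper that binds to a numeric IPv4 address on an ephemeral port and reports the address the kernel actually bound. It accepts only dotted-quad strings, so a name such as "localhost" would need resolving first.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to interface_ip on an ephemeral
   port and write the address reported by getsockname() into out.
   Returns 0 on success, -1 on any failure. */
int bound_host(const char *interface_ip, char *out, socklen_t out_len)
{
    struct sockaddr_in want, got;
    socklen_t got_len = sizeof(got);
    int rc = -1;

    memset(&want, 0, sizeof(want));   /* zero-fill first, as in the patch */
    want.sin_family = AF_INET;
    want.sin_port = htons(0);
    if (inet_pton(AF_INET, interface_ip, &want.sin_addr) != 1)
        return -1;                    /* not a numeric IPv4 address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&want, sizeof(want)) == 0 &&
        getsockname(fd, (struct sockaddr *)&got, &got_len) == 0 &&
        inet_ntop(AF_INET, &got.sin_addr, out, out_len) != NULL)
        rc = 0;
    close(fd);
    return rc;
}
```

When the call succeeds for "127.0.0.1" or "0.0.0.0", the string written to out should match the requested address, which is the property the Lisp test above checks through socket:socket-server-host.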

    I hope this info is useful.

     
  • Sam Steingold

    Sam Steingold - 2006-02-23

    Logged In: YES
    user_id=5735

    thank you for your bug report.
    the bug has been fixed in the CVS tree.
    you can either wait for the next release (recommended)
    or check out the current CVS tree (see http://clisp.cons.org)
    and build CLISP from the sources (be advised that between
    releases the CVS tree is very unstable and may not even build
    on your platform).

     
  • Sam Steingold

    Sam Steingold - 2006-02-23
    • assigned_to: haible --> sds
    • milestone: 100332 --> lisp error
    • status: open --> closed-fixed
     
  • Nirendra

    Nirendra - 2006-06-13

    Logged In: YES
    user_id=1539062

    Just a note that I also experienced this problem on AIX
    4.3.2 and 5.1. I applied this patch and the 'stream.d'
    patch, and it seems to have fixed it.

     
