Hi,
On Mac OS X version 10.4.3, the socket tests give 30
failures for clisp 2.37. The underlying cause is
a return of EADDRNOTAVAIL from bind.
The cause of this is a failure to fill struct sockaddr_in
(or struct sockaddr_un) with zeroes before using it.
This is a requirement on OS X, and IIRC, older BSDs
as well. If the structure is not zero-filled, the
exact error generated may depend on the contents of
the uninitialized variable.
I attach a patch to fix this on OS X. If the problem
occurs on other OSs, the conditional can be changed
to cover those cases.
With the patch, all tests pass.
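A minimal sketch of the zero-then-fill idiom described above. This is not the attached patch: the helper name init_loopback_addr and the choice of the loopback address are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper (not from the CLISP sources): build an IPv4
   loopback address, clearing the whole struct before setting fields. */
static void init_loopback_addr(struct sockaddr_in *sa, uint16_t port)
{
    assert(sa != NULL);
    memset(sa, 0, sizeof(*sa));       /* also clears the sin_zero padding */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);       /* port in network byte order */
    sa->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}
```

The result would then be passed to bind as usual, e.g. bind(fd, (struct sockaddr *)&sa, sizeof(sa)).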
Best Wishes,
Greg Wright
Logged In: YES
user_id=392583
Verified the socket test failure (with different symptoms) on
2006 January 04, and that the patch worked, allowing all the
tests to pass. While I don't know the CLISP internals well enough
to know if a raw memset is the correct fix, I would strongly
suggest that failing to initialize the struct be considered a
cross-platform bug.
--Doug Philips
Logged In: YES
user_id=5735
thanks.
this patch (or something similar) will go in without ifdefs.
I am not sure yet if we need to initialize _all_ these
structs though...
also, I would like to make sure that this bug has not been
introduced recently but has been there all along.
(tests/socket.tst is a recent addition)
Greg, Doug, and everyone - please test some older release
(e.g., 2.33).
Thanks!
Logged In: YES
user_id=5735
also, the patch appears to blindly zero-out all sockaddr
structs even though their slots are being initialized
immediately after that.
specifically, it is not clear why one has to init sockaddr
before getsockname, getpeername, accept.
http://www.opengroup.org/onlinepubs/009695399/functions/getpeername.html
does not require that the argument is initialized.
are these OSX bugs?
please investigate.
note that you do not need to redump lispinit.mem for these
changes, i.e., you can change socket.d, do "make lisp.run",
and test with
./lisp.run -M lispinit.mem -norc -i tests/test -x '(run-test "tests/socket")'
thanks.
Logged In: YES
user_id=1321993
Hi,
Yes, the patch does blindly zero out the whole sockaddr structure, even
though the slots are initialized afterward. This is required by traditional
BSDs (and evidently OS X as well). The reason is that the bind
syscall (at least) doesn't parse the fields but just looks for the zeroes to find
out where the port and address fields end. This is not the case on Linux,
and I think it has been fixed on recent BSDs as well (it seems not to be required
on FreeBSD 6, which I am also running).
As Sam notes, this is not POSIX behavior. However, it is documented
as standard in the first edition of Stevens's _Unix Network Programming_
(see p. 285; bzero is used to clear the whole sockaddr_in struct before the
call to bind). Yes, I agree it ought to be fixed and will file an RFE with
Apple for OS X.
I can test on 2.33, but it is possible that the bug may exist but not show up,
since it depends on the contents of an uninitialized automatic allocation.
If zeroes show up at the end of the address field, the call will succeed.
It is guaranteed safe to zero the whole struct on any OS, and the
very minor performance hit (the memory has to be pulled into cache
anyway to initialize the other fields) is only taken once, on the
initial setup of the struct.
Best Wishes,
Greg
Logged In: YES
user_id=392583
I concur with Greg. Uninitialized values are very tricky to
test for. I've only recently started using CLISP again
because I ran into troubles building the older versions
under Mac OS X and did not have the time to become even a
part-time contributor. I guess I could try that test that
Sam requested, but I don't understand what the results would
tell us. I was seeing very different failures than Greg was
on the same code base to start with.
--Doug
Logged In: YES
user_id=5735
what I don't understand is where the uninitialized values
are coming from. we always properly init the relevant fields
of all structs before calling bind et al (apparently, the
osx bug is that it also looks at the "padding space").
at any rate, I would like to see the following tests done:
1. ascertain that the oldest clisp you have available still
has (at least some) socket problems
2. find out what are the specific system calls that fail
(you can use memset to set all bytes to 0xFF :-).
thanks.
Logged In: YES
user_id=1321993
Yes, on traditional BSDs and OS X, the pad space is checked to ensure that
it is all zero. This was because there were other sockaddr structures
(e.g., sockaddr_ns for XNS) that were also 16 bytes long but
had different field layouts. Checking the pad space was a sanity check
that you hadn't set the address family to AF_INET but given an XNS address.
The traditional idiom (Stevens, Unix Network Programming, 1st ed.) was
always to bzero the sockaddr_* structures before setting any of the fields,
to ensure that the empty space was always zero.
I will test an older version of clisp by filling the empty space with 0xffs.
Is 2.33 an acceptable version?
I know that one of the failed system calls is bind. get/setsockopt may also
fail, but I haven't tested them yet.
-Greg
Logged In: YES
user_id=5735
I only have the 3rd edition,
but google appears to support you.
2.33 is old enough, thanks.
Logged In: YES
user_id=5735
http://article.gmane.org/gmane.lisp.clisp.general:10773
Logged In: YES
user_id=30503
I found this comment in the OpenMCL source:
;; Darwin includes the SIN_ZERO field of the sockaddr_in when
;; comparing the requested address to the addresses of configured
;; interfaces (as if the zeros were somehow part of either address.)
;; "rletz" zeros out the stack-allocated structure, so those zeros
;; will be 0.
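To make the comment above concrete, here is a small C illustration of how an address comparison that includes the padding can reject an otherwise matching address. It is only a sketch of the described effect, not Darwin's actual code; both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <netinet/in.h>

/* Compare only the meaningful fields: family and IPv4 address. */
static bool same_addr_fields_only(const struct sockaddr_in *a,
                                  const struct sockaddr_in *b)
{
    assert(a != NULL && b != NULL);
    return a->sin_family == b->sin_family
        && a->sin_addr.s_addr == b->sin_addr.s_addr;
}

/* Compare the fields AND the sin_zero padding, as if the padding
   were part of the address (the behaviour the comment describes). */
static bool same_addr_with_padding(const struct sockaddr_in *a,
                                   const struct sockaddr_in *b)
{
    return same_addr_fields_only(a, b)
        && memcmp(a->sin_zero, b->sin_zero, sizeof(a->sin_zero)) == 0;
}
```

With a zero-filled interface address on one side and a request whose padding holds stack garbage on the other, the first comparison matches while the second does not, which would be consistent with the EADDRNOTAVAIL reported in this thread.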
Logged In: YES
user_id=5735
bruno, it is up to you now.
there was a discussion on clisp-list about it.
my patch works, but I am not sure it is not an overkill
patch - final
Logged In: YES
user_id=5735
required action from OS X users:
0. record OS version; check out clisp cvs head.
1. make sure that sockets break without the patch by running
./clisp -norc -i tests/tests -x '(run-test "tests/socket")'
2. apply the patch, make sure the test works
3. modify socket.d and replace "0" with "1" in
#define FILL0(s) memset((void*)&s,0,sizeof(s))
so that it reads
#define FILL0(s) memset((void*)&s,1,sizeof(s))
and make sure that the test fails in an identical way to
<1> above.
report results here:
os version
did pristine version fail
did patched (FILL0) version fail
did FILL1 version fail
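For anyone who wants to see what the FILL0/FILL1 experiment changes at the memory level without rebuilding clisp, here is a standalone sketch. The FILL macro mirrors the shape of FILL0 quoted above but takes the fill byte as a parameter; make_test_addr and padding_is_zero are hypothetical helpers, not part of socket.d.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Same shape as the FILL0 macro, with the fill byte made explicit. */
#define FILL(s, byte) memset((void *)&(s), (byte), sizeof(s))

/* Fill the struct with fill_byte, then set the usual fields.
   Note: on BSD-derived systems the struct also has a sin_len member,
   which this sketch deliberately leaves at the fill value. */
static void make_test_addr(struct sockaddr_in *sa, int fill_byte)
{
    assert(sa != NULL);
    FILL(*sa, fill_byte);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(0);          /* let the OS pick a port */
    sa->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Return 1 if every byte of the sin_zero padding is zero, else 0. */
static int padding_is_zero(const struct sockaddr_in *sa)
{
    size_t i;
    assert(sa != NULL);
    for (i = 0; i < sizeof(sa->sin_zero); i++)
        if (sa->sin_zero[i] != 0)
            return 0;
    return 1;
}
```

After make_test_addr(&sa, 0) the padding is clean; after make_test_addr(&sa, 1) the named fields are identical but the padding is not, which is the only difference the FILL1 step is meant to introduce before the struct reaches bind.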
Logged In: YES
user_id=377168
Joe Koski reports these results:
1. OS X 10.3.9
2. pristine 2.38 fails, but not with CVS stream.d patch
3. FILL0 socket.d patch passes tests
4. FILL1 socket.d patch causes tests to fail
Greg mentions this as "traditional BSD" behaviour. Shouldn't
the FILL0 patch then be applied via #ifdef UNIX_BSD (sp?),
supposing that MacOSX causes such a define to be set? What
do users think?
Logged In: YES
user_id=477349
Here are my results under Mac OS X 10.4.4:
The stream.d 1.558 changes seem to prevent the socket test failures. Neither
"FILL0", nor "FILL1" affect the results of the socket tests.
Without the stream.d 1.558 changes, using the "FILL1" patch does not prevent
the socket test failures. However, using the "FILL0" patch does prevent the
socket test failures.
"FILL0" means socket-osx.diff is applied.
"FILL1" means socket-osx.diff is applied and the second arg to memset in
the FILL0 macro is changed from 0 to 1.
The test machine is running Mac OS X 10.4.4 and Xcode 2.2.1 (gcc labels itself
as 4.0.1, Apple build 5250).
Here "CVS" is CVS as of 2004-06-21 12:19 GMT:
Source               tests/socket results
CVS                  ("tests/socket" 60 0)
CVS FILL0            ("tests/socket" 60 0)
CVS FILL1            ("tests/socket" 60 0)
Here "CVS-stream.d" is CVS as of 2006-02-21 12:19 GMT with src/stream.d
rolled back to 1.557 (like the 2.38 release):
Source               tests/socket results
CVS-stream.d         ("tests/socket" 60 32)
CVS-stream.d FILL0   ("tests/socket" 60 0)
CVS-stream.d FILL1   ("tests/socket" 60 32)
The failures (as reported in the socket.erg files) are identical for the
tests when run with stream.d 1.557 both with and without the FILL1 patch.
Logged In: YES
user_id=5735
Chris, thanks a lot for testing!
Are you saying that CVS head does not have the problem?
in that case the patch is not needed and will be discarded.
OTOH, it might be that the change in the default interface
made the difference and the problem is still there.
Could you please try the :interface argument to socket-server
(pass :interface "localhost" or :interface "127.0.0.1")
thanks!
Logged In: YES
user_id=477349
From the following, it looks like both the stream.d and socket.d changes are
important. Good call on asking about the results of using :interface. I have
never used the socket API in CLISP before, so I hope the following test is
something like what you wanted...
(let* ((interfaces (list "127.0.0.1" "localhost" "0.0.0.0" "10.0.0.104"))
       (arg-tails (mapcar (lambda (i) (list :interface i)) interfaces)))
  (mapcar (lambda (tail) (apply (function socket:socket-server) 0 tail))
          (list* nil arg-tails)))
[Note: 10.0.0.104 is an address that was assigned to one of my network
interfaces when I ran the tests.]
Again, all these results are based on builds derived from a CVS update as of
2006-01-21 12:19 GMT. "FILL0" has the patch from this bug report applied.
"FILL1" has the same patch applied, but with the memset changed to fill with
1 instead of 0. The "stream.d" variants have the 1.557 version of stream.d
instead of the 1.558 version that was present as of the CVS update.
CVS
---
(#<SOCKET-SERVER 0.0.0.0:49314> #<SOCKET-SERVER 0.0.0.0:49315>
#<SOCKET-SERVER 0.0.0.0:49316> #<SOCKET-SERVER 0.0.0.0:49317>
#<SOCKET-SERVER 0.0.0.0:49318>)
CVS FILL0
---------
(#<SOCKET-SERVER 0.0.0.0:49319> #<SOCKET-SERVER 127.0.0.1:49320>
#<SOCKET-SERVER 127.0.0.1:49321> #<SOCKET-SERVER 0.0.0.0:49322>
#<SOCKET-SERVER 10.0.0.104:49323>)
CVS FILL1
---------
(#<SOCKET-SERVER 0.0.0.0:49324> #<SOCKET-SERVER 0.0.0.0:49325>
#<SOCKET-SERVER 0.0.0.0:49326> #<SOCKET-SERVER 0.0.0.0:49327>
#<SOCKET-SERVER 0.0.0.0:49328>)
CVS-stream.d
------------
[stream.d:13882] *** - UNIX error 49 (EADDRNOTAVAIL): Cannot assign
requested address
CVS-stream.d FILL0
------------------
(#<SOCKET-SERVER 127.0.0.1:49329> #<SOCKET-SERVER 127.0.0.1:49331>
#<SOCKET-SERVER 127.0.0.1:49333> #<SOCKET-SERVER 127.0.0.1:49335>
#<SOCKET-SERVER 127.0.0.1:49337>)
CVS-stream.d FILL1
------------------
[stream.d:13882] *** - UNIX error 49 (EADDRNOTAVAIL): Cannot assign
requested address
So to me, it looks like "CVS FILL0" is the only variation that works as
expected. This would indicate that the FILL0 patch (associated with this
bug report) and the changes already made to stream.d in version 1.558 are
both required for correct functionality on Mac OS X 10.4.4. The CVS and CVS
FILL1 variants are close to working, but they seem to not accurately honor the
request to bind to a particular interface address. The stream.d-1.557 FILL0
version seems to work for local-use-only sockets.
It might be useful to add a socket test that does something along the lines of
this:
(let* ((addr-list (list "127.0.0.1" "0.0.0.0"))
       (host-list (list* "0.0.0.0" addr-list))
       (arg-tails (list* nil
                         (mapcar (lambda (i) (list :interface i)) addr-list))))
  (equal (mapcar (lambda (tail)
                   (socket:socket-server-host
                    (apply (function socket:socket-server) 0 tail)))
                 arg-tails)
         host-list))
That would make sure that requesting binding to all interfaces or just
localhost works (and verifies the default binding if no :interface is supplied).
I hope this info is useful.
Logged In: YES
user_id=5735
thank you for your bug report.
the bug has been fixed in the CVS tree.
you can either wait for the next release (recommended)
or check out the current CVS tree (see http://clisp.cons.org)
and build CLISP from the sources (be advised that between
releases the CVS tree is very unstable and may not even build
on your platform).
Logged In: YES
user_id=1539062
Just a note that I also experienced this problem on AIX
4.3.2 and 5.1. I applied this patch and the 'stream.d'
patch, and it seems to have fixed it.