From: <don...@is...> - 2011-04-06 18:11:47
|
Sam Steingold writes: > the problem is this changeset: > changeset: 13699:a7ecbfe342f9 > user: sds > date: Tue Dec 30 16:54:05 2008 +0000 > files: src/ChangeLog src/unixaux.d > description: > (fd_read): when errno is ECONNRESET, close fd to avoid > SIGPIPE on further accesses (see tests/econnreset.lisp) > Reported by Don Cohen > > > when I revert it, I see this: > > (multiple-value-bind (run args) (cmd-args) > (let ((se (socket:socket-server))) > (ext:run-program run :arguments (append args (list "-q" "-q" "-x" (format nil "(close (socket:socket-connect ~D))" (socket:socket-server-port se)))) > :wait nil :input nil :output nil) I'm trying to figure out how this code works. I'd expect that (close (socket:socket-connect ~D)) would connect and close immediately and that this would result in a FIN rather than a RST. And yet you seem to get resets out of it? > (with-open-stream (so (socket:socket-accept se)) > (list > (handler-case (write-line "foo" so) > (error (c) (princ 'write-line) (princ-error c) 'error)) > (handler-case (read-char so) > (stream-error (c) (princ 'read-char) (princ-error c) 'stream-error)) > (handler-case (write-line "bar" so) > (error (c) (princ 'write-line) (princ-error c) 'error)) > (handler-case (read-char so) > (end-of-file (c) (princ 'read-char) (princ-error c) 'end-of-file)))))) > > ==> ("foo" STREAM-ERROR ERROR END-OF-FILE) > > the standard-output is > > READ-CHAR > [SIMPLE-STREAM-ERROR]: > UNIX error 104 (ECONNRESET): Connection reset by peer > > WRITE-LINE > [SIMPLE-OS-ERROR]: > UNIX error 32 (EPIPE): Broken pipe, child process terminated or socket closed > > READ-CHAR > [SIMPLE-END-OF-FILE]: > READ: input stream #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER 0.0.0.0:57821> has reached its end > > > I will now commit the reversion, this test, and I will remove the > tests/econnreset.lisp which I never used and do not understand. > > Don, please test your code which used to generate a SIGPIPE and submit a > new bug report if it fails. I don't know what used to generate sigpipe, but what I see now is: [1]> (setf ss (socket:socket-server 1234)) #<SOCKET-SERVER 0.0.0.0:1234> [2]> (setf s (socket-accept ss)) [peer connects] #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER 0.0.0.0:1234> [3]> (princ "asd" s) "asd" [peer closes, resulting in rst] [4]> (read-char s) [../src/stream.d:6196] *** - UNIX error 104 (ECONNRESET): Connection reset by peer The following restarts are available: ABORT :R1 Abort main loop [of course I still want this error to be a subclass of eof, now referred to as TCP-RST-EOF] Break 1 [5]> (read-char s) *** - READ: input stream #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER 0.0.0.0:1234> has reached its end This is what I reported in my gdb session. It's a clear improvement over ebadf. What I'd like to see is a repeat of the error that I'd prefer above, that is, another TCP-RST-EOF The following restarts are available: ABORT :R1 Abort debug loop ABORT :R2 Abort main loop Break 2 [6]> (read-char s) *** - READ: input stream #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER 0.0.0.0:1234> has reached its end The following restarts are available: ABORT :R1 Abort debug loop ABORT :R2 Abort debug loop ABORT :R3 Abort main loop Break 3 [7]> (princ "asd" s) [../src/stream.d:13040] *** - UNIX error 32 (EPIPE): Broken pipe, child process terminated or socket closed The following restarts are available: ABORT :R1 Abort debug loop ABORT :R2 Abort debug loop ABORT :R3 Abort debug loop ABORT :R4 Abort main loop Break 4 [8]> (princ "asd" s) [../src/stream.d:13040] *** - UNIX error 32 (EPIPE): Broken pipe, child process terminated or socket closed The following restarts are available: ABORT :R1 Abort debug loop ABORT :R2 Abort debug loop ABORT :R3 Abort debug loop ABORT :R4 Abort debug loop ABORT :R5 Abort main loop So now we at least see how to generate epipe from that situation. I had expected ECONNRESET here. However, another experiment: [14]> (setf s (socket-accept ss)) [peer connects] #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER 0.0.0.0:1234> [15]> (princ "asd" s) "asd" [peer closes, causing RST] [16]> (princ "asd" s) [../src/stream.d:13040] *** - UNIX error 104 (ECONNRESET): Connection reset by peer The following restarts are available: ABORT :R1 Abort main loop Break 1 [17]> (princ "asd" s) [../src/stream.d:13040] *** - UNIX error 32 (EPIPE): Broken pipe, child process terminated or socket closed The following restarts are available: ABORT :R1 Abort debug loop ABORT :R2 Abort main loop Break 2 [18]> So what I think is going on in both cases is that the first attempt to read or write results in ECONNRESET and this stores something that causes later attempts to read to return EOF and later attempts to write to return EPIPE. I think that later attempts to read and write should return the same things as initial attempts, and that in the case of read this should be a subclass of EOF. BTW, another small point: [24]> (setf s (socket-accept ss)) [peer connects] #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER 0.0.0.0:1234> [25]> (princ "asd" s) "asd" now peer does (princ "a" s) and then closes s [26]> (read-char s) #\a so we see, as expected, that there is no error until we reach the end of input [27]> (read-char s) [../src/stream.d:6196] *** - UNIX error 104 (ECONNRESET): Connection reset by peer The following restarts are available: ABORT :R1 Abort main loop Break 1 [28]> Summary: - This change is a big improvement. - I still think it's important for the initial error on read to be a subclass of EOF so that normal EOF processing will do the right thing. - I still think that subsequent attempts to read/write should generate the same error as the first I'll now return to the message of 2011-02-21 20:58:43 GMT http://article.gmane.org/gmane.lisp.clisp.general/13638 I try here to summarize the remaining issues in this thread. Perhaps some of these should be separated into different discussion threads. 1. > I therefore surmise that RST is generated when the socket is closed > with unread input remaining. I don't know whether this is controlled > in any way by clisp or is just a (bad, in my opinion so far) decision > in the linux TCP implementation. Still to be determined: Is this correct? And if so, is this related to anything in clisp code or entirely up to the OS? unchanged 2. I had asked earlier whether RST can be detected other than by attempting to do an operation on a stream, e.g., as an asynchronous signal. I still don't know the answer. If so, then it seems to me that it would be useful to translate that into a lisp signal, which is the one I suggested not be an error, or even a serious condition. This also raises the question of which thread handles such conditions. (Have we discussed that before? Where's the answer?) unchanged 3. The only check for ECONNRESET I now see is in the wrapper for read, and I expect that it should be added to several of the other wrappers, though I've not yet found a list of what system calls can cause that error. However it seems almost certain that at very least this could result from other attempts to read and write. I now see at least that there's another one in write. Are there only these two? 4. We have discussed possible lisp conditions that could/should be signalled in response to ECONNRESET. I've heard no response to my last suggestion that - when it's caused by attempting to read past the end of the data sent on the socket, the condition should be (a subtype of) eof - when it's caused by attempting to do something else that cannot be done to a reset socket, such as writing, it should not be eof, but still a stream error - if there's any way to detect reset other than trying to do an operation that fails, e.g., if you can detect it when you're filling a buffer but the lisp function does not require reading past the available input, then a non serious-condition should (or at least COULD reasonably) be generated. There is now a RFE for the read case. I still don't know whether there even is some way to detect the third case. The second case is related to my previous message about trying to place errors properly in the class hierarchy. 5. Related to 4, Bruno asked what other implementations do. I've heard no responses to that so far. unchanged 6. What does and should socket-status do with ECONNRESET? It currently seems to generate an error only when the next read would fail due to being out of data AND having the stream reset. This seems contrary to the doc, which says We define status for a SOCKET:SOCKET-SERVER or a SOCKET:SOCKET-STREAM to be :ERROR if any i/o operation will cause an ERROR. Now I notice that the statement above is ambiguous: "if any i/o operation will cause an error" could mean only if ALL io operations would cause errors. I had imagined that it meant that if read will work but write will cause an error, then the status should be :ERROR. In any case, it would be useful to be able to detect that case. If my argument above that attempting to read past the end of data for a reset stream should be a (subtype of) eof, then I think that socket-status in that case should also be treated as eof. unchanged To me this topic seems relatively important. 7. A new point I'd like to add (though related to previous discussion). Right now the code that detects a reset seems to close the fd. I think this is a mistake. The result is that a later attempt to close the stream gives EBADF. And even worse than that, another attempt to close the file after that seems to cause a seg fault. I've not figured out exactly what code is executed along each of these paths, but I'd like to suggest independent fixes to these various problems. - reset should not close the fd - a later attempt to use the stream This is now fixed. should give an error related to a reset rather than ebadf We now do not get ebadf, but I think we should get the same error on the second attempt as on the first, and that it should be related to reset, not the current general eof or epipe. - an attempt to close a stream containing an invalid fd should still successfully close the lisp stream (even though I hope this will no longer happen) and The message from Joerg convinces me that this is not correct. There should be no ebadf's and therefore no code for trying to react to them. - an attempt to close a stream containing an invalid fd should not segfault (even though I hope this will also no longer happen) Of course, I want nothing to segfault. |