[clisp-list] econnreset & ebadf

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Sam Steingold writes:
 > the problem is this changeset:
 > changeset:   13699:a7ecbfe342f9
 > user:        sds
 > date:        Tue Dec 30 16:54:05 2008 +0000
 > files:       src/ChangeLog src/unixaux.d
 > description:
 > (fd_read): when errno is ECONNRESET, close fd to avoid
 > SIGPIPE on further accesses (see tests/econnreset.lisp)
 > Reported by Don Cohen
 > 
 > 
 > when I revert it, I see this:
 > 
 > (multiple-value-bind (run args) (cmd-args)
 >   (let ((se (socket:socket-server)))
 >     (ext:run-program run :arguments (append args (list "-q" "-q" "-x" (format nil "(close (socket:socket-connect ~D))" (socket:socket-server-port se))))
 >                      :wait nil :input nil :output nil)

I'm trying to figure out how this code works.
I'd expect that (close (socket:socket-connect ~D)) would connect and
close immediately and that this would result in a FIN rather than a RST.
And yet you seem to get resets out of it?

 >     (with-open-stream (so (socket:socket-accept se))
 >       (list
 >        (handler-case (write-line "foo" so)
 >          (error (c) (princ 'write-line) (princ-error c) 'error))
 >        (handler-case (read-char so)
 >          (stream-error (c) (princ 'read-char) (princ-error c) 'stream-error))
 >        (handler-case (write-line "bar" so)
 >          (error (c) (princ 'write-line) (princ-error c) 'error))
 >        (handler-case (read-char so)
 >          (end-of-file (c) (princ 'read-char) (princ-error c) 'end-of-file))))))
 > 
 > ==> ("foo" STREAM-ERROR ERROR END-OF-FILE)
 > 
 > the standard-output is
 > 
 > READ-CHAR
 > [SIMPLE-STREAM-ERROR]: 
 > UNIX error 104 (ECONNRESET): Connection reset by peer
 > 
 > WRITE-LINE
 > [SIMPLE-OS-ERROR]: 
 > UNIX error 32 (EPIPE): Broken pipe, child process terminated or socket closed
 > 
 > READ-CHAR
 > [SIMPLE-END-OF-FILE]: 
 > READ: input stream #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER 0.0.0.0:57821> has reached its end
 > 
 > 
 > I will now commit the reversion, this test, and I will remove the
 > tests/econnreset.lisp which I never used and do not understand.
 > 
 > Don, please test your code which used to generate a SIGPIPE and submit a
 > new bug report if it fails.

I don't know what used to generate sigpipe, but what I see now is:

  [1]> (setf ss (socket:socket-server 1234))
  #<SOCKET-SERVER 0.0.0.0:1234>
  [2]> (setf s (socket-accept ss))
[peer connects]
  #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER 0.0.0.0:1234>
  [3]> (princ "asd" s)
  "asd"
[peer closes, resulting in rst]
  [4]> (read-char s)

  [../src/stream.d:6196] 
  *** - UNIX error 104 (ECONNRESET): Connection reset by peer
  The following restarts are available:
  ABORT          :R1      Abort main loop
[of course I still want this error to be a subclass of eof, now
referred to as TCP-RST-EOF]

  Break 1 [5]> (read-char s)
  *** - READ: input stream #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER
  0.0.0.0:1234> has
        reached its end

This is what I reported in my gdb session.
It's a clear improvement over ebadf.
What I'd like to see is a repeat of the error that I'd prefer above,
that is, another TCP-RST-EOF

  The following restarts are available:
  ABORT          :R1      Abort debug loop
  ABORT          :R2      Abort main loop
  Break 2 [6]> (read-char s)

  *** - READ: input stream #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER
  0.0.0.0:1234> has
        reached its end
  The following restarts are available:
  ABORT          :R1      Abort debug loop
  ABORT          :R2      Abort debug loop
  ABORT          :R3      Abort main loop
  Break 3 [7]> (princ "asd" s)

  [../src/stream.d:13040] 
  *** - UNIX error 32 (EPIPE): Broken pipe, child process terminated or
  socket closed
  The following restarts are available:
  ABORT          :R1      Abort debug loop
  ABORT          :R2      Abort debug loop
  ABORT          :R3      Abort debug loop
  ABORT          :R4      Abort main loop
  Break 4 [8]> (princ "asd" s)

  [../src/stream.d:13040] 
  *** - UNIX error 32 (EPIPE): Broken pipe, child process terminated or
  socket closed
  The following restarts are available:
  ABORT          :R1      Abort debug loop
  ABORT          :R2      Abort debug loop
  ABORT          :R3      Abort debug loop
  ABORT          :R4      Abort debug loop
  ABORT          :R5      Abort main loop

So now we at least see how to generate epipe from that situation.
I had expected ECONNRESET here.
However, another experiment:

  [14]> (setf s (socket-accept ss))
[peer connects]
  #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER 0.0.0.0:1234>
  [15]> (princ "asd" s)
  "asd"
[peer closes, causing RST]
  [16]> (princ "asd" s)

  [../src/stream.d:13040] 
  *** - UNIX error 104 (ECONNRESET): Connection reset by peer
  The following restarts are available:
  ABORT          :R1      Abort main loop
  Break 1 [17]> (princ "asd" s)

  [../src/stream.d:13040] 
  *** - UNIX error 32 (EPIPE): Broken pipe, child process terminated or
  socket closed
  The following restarts are available:
  ABORT          :R1      Abort debug loop
  ABORT          :R2      Abort main loop
  Break 2 [18]> 

So what I think is going on in both cases is that the first attempt to
read or write results in ECONNRESET and this stores something that
causes later attempts to read to return EOF and later attempts to
write to return EPIPE.

I think that later attempts to read and write should return the same
things as initial attempts, and that in the case of read this should
be a subclass of EOF.

BTW, another small point:
  [24]> (setf s (socket-accept ss))
[peer connects]
  #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER 0.0.0.0:1234>
  [25]> (princ "asd" s)
  "asd"
now peer does (princ "a" s) and then closes s
  [26]> (read-char s)
  #\a
so we see, as expected, that there is no error until we reach the end
of input
  [27]> (read-char s)

  [../src/stream.d:6196] 
  *** - UNIX error 104 (ECONNRESET): Connection reset by peer
  The following restarts are available:
  ABORT          :R1      Abort main loop
  Break 1 [28]> 

Summary:
- This change is a big improvement.
- I still think it's important for the initial error on read to be a
subclass of EOF so that normal EOF processing will do the right thing.
- I still think that subsequent attempts to read/write should generate
the same error as the first

I'll now return to the message of 2011-02-21 20:58:43 GMT
http://article.gmane.org/gmane.lisp.clisp.general/13638

  I try here to summarize the remaining issues in this thread.
  Perhaps some of these should be separated into different discussion
  threads.

  1.
   > I therefore surmise that RST is generated when the socket is closed
   > with unread input remaining.  I don't know whether this is controlled
   > in any way by clisp or is just a (bad, in my opinion so far) decision
   > in the linux TCP implementation.
  Still to be determined:
  Is this correct?  And if so, is this related to anything in clisp code
  or entirely up to the OS?
unchanged

  2. I had asked earlier whether RST can be detected other than by
  attempting to do an operation on a stream, e.g., as an asynchronous
  signal.  I still don't know the answer.  If so, then it seems to me
  that it would be useful to translate that into a lisp signal, which is
  the one I suggested not be an error, or even a serious condition.
  This also raises the question of which thread handles such conditions.
  (Have we discussed that before?  Where's the answer?)
unchanged

  3. The only check for ECONNRESET I now see is in the wrapper for
  read, and I expect that it should be added to several of the other
  wrappers, though I've not yet found a list of what system calls can
  cause that error.  However it seems almost certain that at very least
  this could result from other attempts to read and write.
I now see at least that there's another one in write.
Are there only these two?

  4. We have discussed possible lisp conditions that could/should be
  signalled in response to ECONNRESET.
  I've heard no response to my last suggestion that
  - when it's caused by attempting to read past the end of the data
    sent on the socket, the condition should be (a subtype of) eof
  - when it's caused by attempting to do something else that cannot
    be done to a reset socket, such as writing, it should not be eof, 
    but still a stream error
  - if there's any way to detect reset other than trying to do an
    operation that fails, e.g., if you can detect it when you're filling
    a buffer but the lisp function does not require reading past the
    available input, then a non serious-condition should (or at least
    COULD reasonably) be generated.
There is now a RFE for the read case.
I still don't know whether there even is some way to detect the third
case.  The second case is related to my previous message about trying
to place errors properly in the class hierarchy.

  5. Related to 4, Bruno asked what other implementations do.
  I've heard no responses to that so far.
unchanged
  6. What does and should socket-status do with ECONNRESET?
  It currently seems to generate an error only when the next read would
  fail due to being out of data AND having the stream reset.
  This seems contrary to the doc, which says 
    We define status for a SOCKET:SOCKET-SERVER or a
  SOCKET:SOCKET-STREAM
    to be :ERROR if any i/o operation will cause an ERROR.
  Now I notice that the statement above is ambiguous:
  "if any i/o operation will cause an error" could mean only if ALL
  io operations would cause errors.  I had imagined that it meant that
  if read will work but write will cause an error, then the status
  should be :ERROR.
  In any case, it would be useful to be able to detect that case.
  If my argument above that attempting to read past the end of data for
  a reset stream should be a (subtype of) eof, then I think that
  socket-status in that case should also be treated as eof.
unchanged
To me this topic seems relatively important. 

  7. A new point I'd like to add (though related to previous
  discussion).
  Right now the code that detects a reset seems to close the fd.
  I think this is a mistake.  The result is that a later attempt to
  close the stream gives EBADF.  And even worse than that, another
  attempt to close the file after that seems to cause a seg fault.
  I've not figured out exactly what code is executed along each of
  these paths, but I'd like to suggest independent fixes to these 
  various problems.
  - reset should not close the fd - a later attempt to use the stream
This is now fixed.

    should give an error related to a reset rather than ebadf
We now do not get ebadf, but I think we should get the same error
on the second attempt as on the first, and that it should be
related to reset, not the current general eof or epipe.
  - an attempt to close a stream containing an invalid fd should 
    still successfully close the lisp stream (even though I hope this
    will no longer happen) and 
The message from Joerg convinces me that this is not correct.
There should be no ebadf's and therefore no code for trying to
react to them.
  - an attempt to close a stream containing an invalid fd should 
    not segfault (even though I hope this will also no longer happen)
Of course, I want nothing to segfault.