#45 econnreset error => subclass of eof

open
Sam Steingold
5
2011-04-05
2011-04-05
Don Cohen
No

Right now, when a socket is reset by the peer, nothing exciting happens until you try to read past the last available input.
At that point you get an OS error (ECONNRESET), which is not an EOF condition (I think it should be), and this further closes the fd without
marking the stream as closed (which I think it should not do). The result is that further attempts to use the stream give EBADF OS errors.
Requests:
- new condition class CONNECTION-RESET:
(define-condition connection-reset (end-of-file) ())
- change in CLISP behavior so that ECONNRESET results in CONNECTION-RESET
and not in OS-ERROR (or whatever)
- don't close the fd on ECONNRESET: it should act like other EOFs, so another attempt to read from the stream should cause another EOF.
Test code (at least for Linux)
This requires two Lisp processes, one for the client, one for the server. I make use of the fact that closing a socket when there is
still input available seems to result (again, only tested on Linux) in a TCP reset rather than a FIN. The result of a FIN-terminated
connection is what I view as correct, so I will demonstrate that also.
[server] (setf ss (socket:socket-server 1234))
[server] (setf s (socket:socket-accept ss))
[client] (setf s (socket:socket-connect 1234))
[server-RST] (princ "asd" s)
[client] (close s)
At this point the connection is closed, either with FIN if the [server-RST] line is left out, or with RST if it is included.
[server] (read-char s)
The result in the FIN scenario is an EOF error; in the RST scenario it's *** - UNIX error 9 (EBADF): Bad file number
[server] (read-char s)
In the FIN scenario the result is another EOF error; in the RST scenario it's *** - UNIX error 9 (EBADF): Bad file number
In both cases s appears to be an open stream, though in the RST scenario the underlying fd has been closed.
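
The practical effect of the first request can be sketched in portable Common Lisp. This is only an illustration under stated assumptions: CONNECTION-RESET is the proposed class, not something CLISP currently signals, and READ-CHAR-OR-NIL and SIMULATE-RESET are hypothetical helpers. The reset is simulated with ERROR rather than produced by a real socket.

```lisp
;; Proposed class from the request above; not an existing CLISP condition.
(define-condition connection-reset (end-of-file) ())

;; Hypothetical helper: an ordinary EOF-tolerant reader, written with no
;; knowledge of CONNECTION-RESET.
(defun read-char-or-nil (stream)
  (handler-case (read-char stream)
    (end-of-file () nil)))

;; Simulate the proposed behavior: signal CONNECTION-RESET and check that
;; a plain END-OF-FILE handler catches it.
(defun simulate-reset (stream)
  (handler-case (error 'connection-reset :stream stream)
    (end-of-file (c) (typep c 'connection-reset))))
```

Because CONNECTION-RESET would inherit from END-OF-FILE, existing code that handles END-OF-FILE (such as READ-CHAR-OR-NIL) would need no changes, while code that cares about the distinction could still handle CONNECTION-RESET specifically.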

Discussion

  • Don Cohen
    2011-04-06

    I now notice that the ECONNRESET OS-ERROR occurs on both read and write.
    I'd like to clarify that only reads should result in EOF errors, so the condition should perhaps be named to indicate some relation to eof or read, e.g., TCP-RESET-EOF, and ECONNRESET resulting from a write should not result in that particular condition.

     
  • Sam Steingold
    2011-04-06

    I am extremely reluctant to treat ECONNRESET differently on input and output.

     
  • Sam Steingold
    2011-04-07

    Let me clarify my objections to the proposal:
    1. EOF is, intrinsically, an input event.
    ECONNRESET can happen on both input and output.
    I don't think it is right to treat it separately for input and output.

    2. EOF is a normal (non-error) condition in POSIX;
    ECONNRESET is an exception.

    3. The purpose of TCP RST which CLISP observes as ECONNRESET
    is to tell us that the client never read the string "asd" which the server sent with princ,
    not that there will be no more data coming through this connection
    (that's the purpose of the FIN which the server gets in the second read-char).

    In fact, I am not even sure that this is the case - is it really true that
    no data will ever arrive from a socket which signaled ECONNRESET?
    It would be nice if a TCP or POSIX expert could clarify this matter for us.

    see http://thread.gmane.org/gmane.lisp.clisp.general/13700

     
  • Don Cohen
    2011-04-07

    > 1. EOF is, intrinsically, an input event.
    I agree, though there is a corresponding problem with trying to write to a stream that is closed for output.
    > ECONNRESET can happen on both input and output.
    Here you are referring to a particular error number.
    I would not view the appearance of that error number on a write to be the same sort of event as the appearance of that number on a read.
    > I don't think it is right to treat it separately for input and output.
    So you adopt the position that an error number defines a condition. Is there any justification for this?
    > 2. EOF is a normal (non-error) condition in POSIX;
    > ECONNRESET is an exception.
    It seems possible (in fact, eof is proof) that the lisp models and posix models are not completely compatible.
    As a lisp implementer you have to make lisp present the lisp model.

    > 3. The purpose of TCP RST which CLISP observes as ECONNRESET
    > is to tell us that the client never read the string "asd" which the server sent with princ,
    I do not agree. I think this reset is unjustified by the TCP spec, in fact contradicts it.

    > not that there will be no more data coming through this connection
    However the RST does imply that there will be no more data coming.

    > (that's the purpose of the FIN which the server gets in the second
    > read-char).
    The server does not get a FIN at all in this case. Only a RST.
    A FIN means "I'm done sending". After that I can send more packets but no more stream data.
    You can continue to send stream data. I am supposed to ACK that data. When you FIN then I
    ACK your FIN. (You probably already ACK'd my FIN.) Then the connection is closed.
    On the other hand, if you send RST then I'm supposed to not send you anything more. I interpret
    that as meaning that as far as you're concerned, this connection is closed.

    > In fact, I am not even sure that this is the case - is it really true that
    > no data will ever arrive from a socket which signaled ECONNRESET?
    If it is following the protocol then it will not send any data or anything else after sending RST.
    It thinks there is no such connection. At most it will reply to any further communication to that
    (non)connection with another RST.

    > It would be nice if a TCP or POSIX expert could clarify this matter for us.
    There are degrees of expertise, but for purposes of this discussion, at least, I seem to be the expert.
    This is mostly a matter of how much time you spend reading the rfc (and other related rfcs).

     
  • Don Cohen
    2011-04-07

    drat, my formatting was lost in the last comment
    Next time I'll know to use something in addition to spaces to mark the quoted text I'm replying to.
    Sorry.

     
  • Sam Steingold
    2011-04-08

    >>The purpose of TCP RST which CLISP observes as ECONNRESET
    >>is to tell us that the client never read the string "asd" which the
    >>server sent with princ,
    >I do not agree. I think this reset is unjustified by the TCP spec, in
    >fact contradicts it.
    the client program closes the socket when there is still some input there,
    thus the client kernel sends an RST telling the server that the "asd" was not read.
    seems straight from the rfc.

     
  • Don Cohen
    2011-04-08

    > the client program closes the socket when there is still some input there,
    > thus the client kernel sends an RST telling the server that the "asd" was not read.
    > seems straight from the rfc.
    I don't understand this at all.
    Where does the rfc say anything like that?

     
  • Sam Steingold
    2011-04-11

    >So you adopt the position that an error number defines a condition.
    >Is there any justification for this?

    CLISP is built on top of the API provided by the kernel (Linux and Windows).
    Both Linux and Windows define their errors in terms of errnos.
    What you require is the following:
    (define-condition os-error (error) (($errno :initarg :errno :reader of-error-number)))
    (define-condition connection-reset-on-input (end-of-file os-error) ())
    (define-condition connection-reset-on-output (os-error) ())
    (define-condition connection-reset-other (os-error) ())
    and 3 such conditions for each socket errno.
    I think this is overkill.

    As for use cases: do socket applications out there treat ECONNRESET
    the same way they treat a routine EOF? E.g., mozilla? ftp?

     
  • Don Cohen
    2011-04-11

    I would like to see all of the errors that are intuitively stream errors be classified that way.
    I don't think I am trying to require anywhere near as many conditions as you suggest.
    At this point I only really want to argue for one,
    (define-condition connection-reset-on-input (end-of-file os-error) ())
    to allow this particular case to be treated like eof.
    The other applications you mention seem unfair comparisons, since the protocols involved
    generally require one side to read all the input from the other before answering.
    (I guess we could program a web server to read GET / and then ignore all after the / and send a web page
    and then close, and see what the browser does. But then I don't see what conclusion we'd draw from any answer.)
    Since there seems to be some disagreement about whether the read encountering RST is an eof, I suggest a run time switch (or if you like, even a special variable that can be bound different ways in different cases) to determine whether the condition in this case is a subtype of EOF.
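
A minimal sketch of what such a switch might look like, in portable Common Lisp. All names here (*RESET-ON-READ-IS-EOF*, CONNECTION-RESET-ERROR, CONNECTION-RESET-EOF, SIGNAL-READ-RESET, CLASSIFY-READ-RESET) are hypothetical and do not exist in CLISP; the reset is simulated rather than coming from a real socket.

```lisp
;; Hypothetical switch: bind to T where a reset on read should count as EOF.
(defvar *reset-on-read-is-eof* nil
  "When true, a reset seen while reading is signaled as a subtype of END-OF-FILE.")

;; Two hypothetical condition classes, one per setting of the switch.
(define-condition connection-reset-error (stream-error) ())
(define-condition connection-reset-eof (end-of-file) ())

;; What the reader might do on ECONNRESET: pick the class from the switch.
(defun signal-read-reset (stream)
  (error (if *reset-on-read-is-eof*
             'connection-reset-eof
             'connection-reset-error)
         :stream stream))

;; Caller's view. END-OF-FILE is itself a STREAM-ERROR, so the more
;; specific clause must come first.
(defun classify-read-reset (stream)
  (handler-case (signal-read-reset stream)
    (end-of-file () :eof)
    (stream-error () :error)))
```

With the default binding, `(classify-read-reset s)` takes the STREAM-ERROR branch; inside `(let ((*reset-on-read-is-eof* t)) ...)` the same call takes the END-OF-FILE branch, so the choice can be made per dynamic context rather than globally.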

     
  • Don Cohen
    2011-04-12

    A little new info on this:
    The first reply in comp.lang.lisp mentions something I also saw
    searching comp.protocols.tcp-ip. It's in rfc1122:

    A host MAY implement a "half-duplex" TCP close sequence, so
    that an application that has called CLOSE cannot continue to
    read data from the connection. If such a host issues a
    CLOSE call while received data is still pending in TCP, or
    if new data is received after CLOSE is called, its TCP
    SHOULD send a RST to show that data was lost.

    This at least explains the behavior that I thought was not justified
    by rfc 793.
    Just to clarify, if the client sends a FIN, that means that he is
    done sending but is still reading.
    This form of reset is a signal that the peer is not listening any
    more. It seems more relevant to writing than reading. It's really a
    signal that your write is not working.

    Note that the tcp is actually connected to some other program and even
    if tcp has delivered all the data to that program, it cannot tell
    whether that program ever looked at that data. Watch this:

    [server] (setf s (socket:socket-accept ss))
    [client] (setf s (socket:socket-connect 1234))
    [server] (princ "asd" s)
    up to here the same script we saw before
    [client] (read-char s)
    [client] (close s)
    The previous script skipped the read-char.
    In that case we get RST.
    By doing the read-char, clisp retrieves the data in the tcp queue,
    namely "asd". Tcp considers that data to be delivered, even though
    the lisp application has only looked at the "a". In this case the
    close ends with FIN. And therefore,
    [server] (read-char s)
    *** - READ: input stream #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER
    0.0.0.0:1234> has reached its end

     
  • Don Cohen
    2011-04-12

    Now let me respond to your poll.
    Your question was slightly misleading.
    I'd have preferred something that specifically says:
    > when you try to read on a stream that has been reset and you get to
    > the end, should you get a signal that is a (subclass of) EOF?
    In this context, the definition of EOF makes my case:
    > The type end-of-file consists of error conditions related to read
    > operations that are done on streams that have no more data.

    On the other hand I thought your example was good.
    The point is that when the peer closes the connection then your
    example code *MAY* get an error (previous post shows both cases).

    The problem I see is that you can't really tell whether the RST
    is a signal that your output was not all read or something more
    serious (the peer went down and then came back up with no record
    of this connection). (This is what I don't like about rfc1122.)
    I don't see that failure to read all of the input indicates an
    error, and as I pointed out, this RST is not a good indication
    of that anyway.

    Therefore, as a programmer you have to make a choice.
    The problem is that it's so inconvenient to make the choice now.
    I think this justifies some support for controlling the choice, as
    I suggested in the post of 2011-04-11 18:34:10 GMT.