From: Nikodemus S. <nik...@ra...> - 2008-11-06 13:03:48
Attachments:
socket-connect.patch
|
While stressing SBCL threads with hunchentoot & drakma I frequently see connect() returning -1 & EINTR in SB-BSD-SOCKETS -- which SBCL currently considers an error. I am not a network hacker by any measure, so I would much appreciate it if someone could tell me if just retrying connect() on EINTR is sane? The attached patch does this. Cheers, -- Nikodemus |
From: Leslie P. P. <sk...@vi...> - 2008-11-06 13:44:24
|
> While stressing SBCL threads with hunchentoot & drakma I frequently > see connect() returning -1 & EINTR in SB-BSD-SOCKETS -- which SBCL > currently considers an error. > > I am not a network hacker by any measure, so I would much appreciate > it if someone could tell me if just retrying connect() on EINTR is > sane? The attached patch does this.

Unless someone with more knowledge jumps in...

    EINTR  The system call was interrupted by a signal that was caught; see signal(7).

is what connect(2) says on Linux. I wonder what kind of signals those might be in your test cases? Some threading signal like SIGCHLD or...?

Leslie |
From: Ingvar <in...@he...> - 2008-11-06 13:45:35
|
Nikodemus Siivola writes: > While stressing SBCL threads with hunchentoot & drakma I frequently > see connect() returning -1 & EINTR in SB-BSD-SOCKETS -- which SBCL > currently considers an error. > > I am not a network hacker by any measure, so I would much appreciate > it if someone could tell me if just retrying connect() on EINTR is > sane? The attached patch does this.

I'd say one of:

1. Signal a continuable error with "reconnect" and "abort" as restarts
2. Just try reconnecting
3. Signal as per 1, if unhandled continue anyway (may require signalling a condition rather than an error)

This is based on the fact that connect(2) returning -1 with errno set to EINTR only signals that there was an interruption, rather than an actual error in trying to connect.

Off-hand, I think 2 is the "simplest solution" and 3 is probably some sort of platonic ideal, in that it allows marginally more flexibility for the discerning network code programmer. Saying that, I can't actually think of any situation where I'd care about it.

//Ingvar |
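For readers who want to see the control flow of options 2 and 3 spelled out, here is a minimal sketch in Python rather than Lisp. Everything in it is an assumption made for illustration: `call_retrying_eintr` and its `on_interrupt` hook are hypothetical names, the hook only loosely stands in for a Lisp restart, and `call` is any callable that may fail with EINTR. It is not the attached patch. Note also that CPython 3.5 and later already retries most interrupted system calls itself (PEP 475), so the exception is simulated in the usage note.

```python
def call_retrying_eintr(call, on_interrupt=None):
    """Run call() and re-issue it whenever it fails with EINTR.

    on_interrupt is a hypothetical hook: it receives the exception and
    may return False to abort instead of retrying (roughly option 3).
    With no hook the call is simply retried (option 2).
    """
    while True:
        try:
            return call()
        except InterruptedError as exc:  # OSError subclass used for errno EINTR
            if on_interrupt is not None and on_interrupt(exc) is False:
                raise  # the caller asked to abort
            # Otherwise fall through and try the call again.
```

With no hook installed, a call that is interrupted twice and then succeeds simply returns its result; with a hook that returns False, the first interruption propagates to the caller.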
From: Nikodemus S. <nik...@ra...> - 2008-11-06 14:24:48
|
On Thu, Nov 6, 2008 at 3:03 PM, Nikodemus Siivola <nik...@ra...> wrote: > While stressing SBCL threads with hunchentoot & drakma I frequently > see connect() returning -1 & EINTR in SB-BSD-SOCKETS -- which SBCL > currently considers an error. > > I am not a network hacker by any measure, so I would much appreciate > it if someone could tell me if just retrying connect() on EINTR is > sane? The attached patch does this. Some more detail: EINTR can be caused by e.g. GC triggered by another thread (thanks to SIG_STOP_FOR_GC being sent to all threads). This is what I believe I was seeing, and SB-SPROF is another likely source of similar havoc. (Situations under which EINTR occurs in the real world are probably platform dependent.) I don't see how a user of SB-BSD-SOCKETS can reasonably deal with EINTR except by (a) panicking for no good reason or (b) retrying. Hence the patch. However, since I am not familiar with network programming idioms, I am mostly wondering if there are e.g. cases where it is common to depend on <SOMETHING> causing connect() to fail with EINTR instead of trying till timeout is reached, etc. > Cheers, > > -- Nikodemus |
From: Chun T. (binghe) <bin...@gm...> - 2008-11-06 16:37:14
|
Hi,

I think just leaving it unchanged will be OK. SB-BSD-SOCKETS is just a direct wrapper for the BSD sockets API; there is no need to do more on behalf of users. There are many reasons that can cause connect() to fail, and EINTR is one that is caused by something unrelated to the network: when it happens, either someone must have interrupted the call or the Lisp system itself has something wrong, so simply retrying is not a good idea.

When you start to think about retrying any operation in network programming, you have to decide:

* How soon will the retry happen? (immediately, or after waiting some time)
* If you decide to wait, is the delay fixed, or does it change according to the actual network status?
* How many retries will there be? (infinitely many, a fixed number, or a variable number)

A low-level API shouldn't consider too much; it should just do the FFI (C->Lisp) work as well as it can and leave the rest to user code.

--binghe

On 2008-11-6, at 22:24, Nikodemus Siivola wrote: > On Thu, Nov 6, 2008 at 3:03 PM, Nikodemus Siivola > <nik...@ra...> wrote: >> While stressing SBCL threads with hunchentoot & drakma I frequently >> see connect() returning -1 & EINTR in SB-BSD-SOCKETS -- which SBCL >> currently considers an error. >> >> I am not a network hacker by any measure, so I would much appreciate >> it if someone could tell me if just retrying connect() on EINTR is >> sane? The attached patch does this. > > Some more detail: EINTR can be caused by eg. GC triggered by another > thread (thanks to SIG_STOP_FOR_GC being sent to all threads.) This is > what I believe I was seeing, and SB-SPROF is another likely source of > similar havoc. (Situations under which EINTR occurs in the real world > are probably platform dependant.) > > I don't see how user of SB-BSD-SOCKETS can reasonably deal with EINTR > except by (a) panicking for no good reason (b) retrying. Hence the > patch. 
> > However, since I am not familiar with network programming idioms, I am > mostly wondering if there are eg. cases where it is common to depend > on <SOMETHING> causing connect() to fail with EINTR instead of trying > till timeout is reached, etc. > >> Cheers, >> >> -- Nikodemus -- Chun Tian (binghe) NetEase.com, Inc. P. R. China |
From: Gábor M. <me...@re...> - 2008-11-06 16:52:36
|
On Thursday 06 November 2008, Chun Tian (binghe) wrote: > Hi, > > I think just leave it unchanged will be OK. SB-BSD-SOCKETS is just a > direct wrapper for BSD sockets API, no need to consider doing more of > users. There're many reason which can cause connect() fail, and EINTR > is one of them which just caused by something unrelated to the > network, when it's happened, there're must be someone interrupted it, > or Lisp system itself has something wrong, simply retry it is not a > good idea. Yes, EINTR is just one of the possible reasons, but I think this one should be retried automatically: there is no way for user code to tell whether EINTR happened due to being profiled, GC striking, etc. It basically carries no information and all sane handlers must retry. Cheers, Gabor |
From: Brian M. <br...@ma...> - 2008-11-06 16:57:21
|
On Nov 6, 2008, at 10:37 AM, Chun Tian (binghe) wrote: > Hi, > > I think just leave it unchanged will be OK. SB-BSD-SOCKETS is just a > direct wrapper for BSD sockets API, no need to consider doing more of > users. There're many reason which can cause connect() fail, and EINTR > is one of them which just caused by something unrelated to the > network, when it's happened, there're must be someone interrupted it, > or Lisp system itself has something wrong, simply retry it is not a > good idea. What makes EINTR different is that it can be caused by an SBCL-generated interrupt, and I think it's probably good sense to not expose those interrupts to the user as there's nothing they can do about it. Just on the basis of not exposing things to the user that they have no reason to care about, I think automatic retry for interruptions that are caused by SBCL is the only sensible behavior. -- Brian Mastenbrook br...@ma... http://brian.mastenbrook.net/ |
From: Julian S. <der...@we...> - 2008-11-07 13:51:22
|
Brian Mastenbrook <br...@ma...> writes: > What makes EINTR different is that it can be caused by an SBCL- > generated interrupt, and I think it's probably good sense to not > expose those interrupts to the user as there's nothing they can do > about it. Just on the basis of not exposing things to the user that > they have no reason to care about, I think automatic retry for > interruptions that are caused by SBCL is the only sensible behavior. Either that or document the current behavior in the manual. Regards, -- Julian Stecklina Well, take it from an old hand: the only reason it would be easier to program in C is that you can't easily express complex problems in C, so you don't. - Erik Naggum (in comp.lang.lisp) |
From: Chun T. (binghe) <bin...@gm...> - 2008-11-06 17:28:24
|
On 2008-11-7, at 24:57, Brian Mastenbrook wrote: > > On Nov 6, 2008, at 10:37 AM, Chun Tian (binghe) wrote: > >> Hi, >> >> I think just leave it unchanged will be OK. SB-BSD-SOCKETS is just a >> direct wrapper for BSD sockets API, no need to consider doing more of >> users. There're many reason which can cause connect() fail, and EINTR >> is one of them which just caused by something unrelated to the >> network, when it's happened, there're must be someone interrupted it, >> or Lisp system itself has something wrong, simply retry it is not a >> good idea. > > > What makes EINTR different is that it can be caused by an SBCL- > generated interrupt, and I think it's probably good sense to not > expose those interrupts to the user as there's nothing they can do > about it. Just on the basis of not exposing things to the user that > they have no reason to care about, I think automatic retry for > interruptions that are caused by SBCL is the only sensible behavior. Hmmm, your explanation is sensible. But I'm still worried about this: is there any chance that "SOCKET-CONNECT with automatic retry" could fall into some kind of loop? i.e. EINTR, retry, EINTR, retry, ... I don't know much about SBCL's behavior when resources run out: if SOCKET-CONNECT returns EINTR because there is no memory or no file descriptor left (max-fd is always limited), "auto retry" may cause an indefinite loop ... > > -- > Brian Mastenbrook > br...@ma... > http://brian.mastenbrook.net/ > -- Chun Tian (binghe) NetEase.com, Inc. P. R. China |
From: Brian M. <br...@ma...> - 2008-11-06 17:38:20
|
On Nov 6, 2008, at 11:28 AM, Chun Tian (binghe) wrote: > Hmmm, your explanation is sensible. But I'm still worried about > that: is there any chance the "SOCKET-CONNECT with automatic retry" > could fall into some kind of loop? i.e. EINTR, retry, EINTR, > retry, ... I don't know much about SBCL's behavior on lack of > resources: if SOCKET-CONNECT return EINTR because no memory or no > file descriptor (max-fd is always limited), "auto retry" may cause a > indefinite loop ... Those situations wouldn't cause EINTR, but another condition. I don't see any way that automatic retry for EINTR could cause a loop. -- Brian Mastenbrook br...@ma... http://brian.mastenbrook.net/ |
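Brian's point, that running out of memory or file descriptors is reported with a different errno than EINTR, can be illustrated with a small sketch. It is written in Python purely for illustration; `is_interruption`, `retry_while_interrupted` and the optional `max_retries` cap are hypothetical names, not part of SB-BSD-SOCKETS or of the patch. EMFILE and ENOMEM are used as typical examples of resource-exhaustion codes.

```python
import errno


def is_interruption(exc):
    """True only for EINTR; resource exhaustion uses other errno values."""
    return isinstance(exc, OSError) and exc.errno == errno.EINTR


def retry_while_interrupted(call, max_retries=None):
    """Retry call() on EINTR only; max_retries is an optional safety cap."""
    retries = 0
    while True:
        try:
            return call()
        except OSError as exc:
            if not is_interruption(exc):
                raise  # e.g. EMFILE or ENOMEM: retrying would not help
            if max_retries is not None and retries >= max_retries:
                raise  # give up after the configured number of retries
            retries += 1
```

Because only EINTR is retried, a call that keeps failing with EMFILE is attempted exactly once and the error reaches the caller; the cap exists only for callers who share the worry about an unbounded loop.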
From: James Y K. <fo...@fu...> - 2008-11-06 17:42:43
|
On Nov 6, 2008, at 8:03 AM, Nikodemus Siivola wrote: > While stressing SBCL threads with hunchentoot & drakma I frequently > see connect() returning -1 & EINTR in SB-BSD-SOCKETS -- which SBCL > currently considers an error. > > I am not a network hacker by any measure, so I would much appreciate > it if someone could tell me if just retrying connect() on EINTR is > sane? The attached patch does this. I'm really tempted to say not sane, but now I can't think of a reason why, so I guess I have to change my vote. It's still possible to escape from the loop via a SIGALRM or so (which is an important feature to limit the time you spend waiting for connect!), because you can do a nonlocal exit...assuming nonlocal exits from signal handlers are working properly that week. :) So, it seems okay. But this behavior should also be extended to everything else that can return EINTR: e.g. read/write/send/recv at least; there's a bunch more too. But, surely the signals involved with GC should have SA_RESTART set, so that they don't even cause an EINTR in the first place? That'll keep C libraries which may not have such a loop from having issues too...and you won't have to go identify everything that can throw EINTR and stick a loop around it. James |
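As a rough illustration of the SA_RESTART idea, the sketch below uses Python's `signal.siginterrupt`, which is documented as changing whether system calls are restarted after a given signal. This is only an analogy: a runtime such as SBCL would set SA_RESTART in its own `sigaction` call in C. The helper name `install_restarting_handler` is hypothetical, and SIGUSR1 in the usage note is an arbitrary stand-in for whatever signal the runtime uses. The sketch assumes a Unix platform and that it runs in the main thread.

```python
import signal


def install_restarting_handler(signum, handler):
    """Install handler for signum and request restart of system calls."""
    previous = signal.signal(signum, handler)
    # signal.signal() resets the signal to "interrupting" mode, so the
    # restart behaviour has to be requested after installing the handler.
    signal.siginterrupt(signum, False)
    return previous
```

A caller would write something like `install_restarting_handler(signal.SIGUSR1, my_handler)`; the handler still runs when the signal arrives, but system calls that the platform is willing to restart should resume rather than fail with EINTR.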
From: Stelian I. <sio...@co...> - 2008-11-06 22:36:04
|
On Thu, 2008-11-06 at 15:03 +0200, Nikodemus Siivola wrote: > While stressing SBCL threads with hunchentoot & drakma I frequently > see connect() returning -1 & EINTR in SB-BSD-SOCKETS -- which SBCL > currently considers an error. > > I am not a network hacker by any measure, so I would much appreciate > it if someone could tell me if just retrying connect() on EINTR is > sane? The attached patch does this. It's not the correct thing to do because in the meantime some packets may have been sent on the socket, e.g. SYN packets on TCP sockets, and the socket cannot be reset for a new connection attempt. The right thing to do is to poll the socket for writability and, if the socket appears to be writable, use getsockopt(,,SO_ERROR,,) to retrieve the pending error on the socket. If getsockopt() returns 0 the connection has been completed. This is also valid for non-blocking sockets, where connect() usually returns immediately with EWOULDBLOCK (Linux) or EINPROGRESS (*BSDs). IMO it would be useful to have a WAIT-CONNECT &optional TIMEOUT restart that implemented the aforementioned algorithm. -- Stelian Ionescu a.k.a. fe[nl]ix Quidquid latine dictum sit, altum videtur. |
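The algorithm described above (start the connect, wait for writability, then read SO_ERROR) can be sketched as follows. This is a Python illustration under stated assumptions, not an SB-BSD-SOCKETS implementation: the function name `wait_connect` merely echoes the suggested WAIT-CONNECT restart, the sketch assumes a POSIX platform, and it accepts EINPROGRESS, EWOULDBLOCK and EINTR alike as "attempt still in flight".

```python
import errno
import os
import select
import socket


def wait_connect(sock, address, timeout=None):
    """Start a non-blocking connect, then wait for it to complete."""
    sock.setblocking(False)  # note: the socket is left in non-blocking mode
    err = sock.connect_ex(address)
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EINTR):
        raise OSError(err, os.strerror(err))
    if err != 0:
        # The attempt is still in flight: wait until the socket is writable.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            raise TimeoutError("connect did not complete in time")
        # Writable only means "finished"; SO_ERROR says how it finished.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            raise OSError(err, os.strerror(err))
    return sock
```

The key design point is that connect() is issued only once; after that the code only observes the pending attempt, so no second connection attempt is started on the same socket.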
From: James Y K. <fo...@fu...> - 2008-11-07 16:12:06
|
On Nov 6, 2008, at 4:30 PM, Stelian Ionescu wrote: > On Thu, 2008-11-06 at 15:03 +0200, Nikodemus Siivola wrote: >> While stressing SBCL threads with hunchentoot & drakma I frequently >> see connect() returning -1 & EINTR in SB-BSD-SOCKETS -- which SBCL >> currently considers an error. >> >> I am not a network hacker by any measure, so I would much appreciate >> it if someone could tell me if just retrying connect() on EINTR is >> sane? The attached patch does this. > > It's not the correct thing to do because in the meantime some packets > may have been sent on the socket, e.g. SYN packets on TCP sockets, and > the socket cannot be reset for a new connection attempt. The right > thing > to do is to poll the socket for writability and if the socket > appears to > be writable use getsockopt(,,SO_ERROR,,) to retrieve the pending error > on the socket. If getsockopt() returns 0 the connection has been > completed. This is also valid for non-blocking sockets where connect() > usually returns immediately an EWOULDBLOCK(Linux) or > EINPROGRESS(*BSDs). > IMO it would be useful to have a WAIT-CONNECT &optional TIMEOUT > restart > that implemented the aforementioned algorithm. I'm pretty certain retrying connect after an EINTR gets you correct behavior... Re-calling connect doesn't actually reset the socket and try to connect again, it just resumes. At least on all OSes I've seen. James |