From: Stephen D. <sd...@gm...> - 2010-04-12 12:53:16
|
On Sun, Apr 11, 2010 at 4:16 PM, <com...@bi...> wrote: > > changeset: 2575:3963e46562e8 > user: Gustaf Neumann <ne...@wu...> > date: Sun Apr 11 17:16:27 2010 +0200 > summary: Fix to prevent multiple DriverAccepts on the same socket. > The original coded relied on the fact that later accepts lead to > an ERROR_STATE. Under RHEL 4 (Power, 64bit) the second accept blocks. > > > diff --git a/nsd/driver.c b/nsd/driver.c > --- a/nsd/driver.c > +++ b/nsd/driver.c > @@ -1523,7 +1523,22 @@ SockAccept(Driver *drvPtr, Sock **sockPt > * Accept the new connection. > */ > > - status = DriverAccept(sockPtr); > + /* > + * Hmmm: the original implementation was written in style that > + * DriverAccept was called twice, one to return for e.g. a simple, > + * new HTTP request NS_DRIVER_ACCEPT (staying in the SOCK_MORE > + * status), and then calling ACCEPT again, but which causes on our > + * RHEL 4 system (POWER6, 64bit) a hang: the second accept blocks, > + * while it returns under (most?) other system a > + * NS_DRIVER_ACCEPT_ERROR. It seems that the original code rely on > + * this ERROR handling. It is not clear to me, why the second call > + * to ACCEPT is necessary, when the socket is already available. > + */ > + if (*sockPtrPtr) { > + status = NS_DRIVER_ACCEPT_ERROR; > + } else { > + status = DriverAccept(sockPtr); > + } > > if (status == NS_DRIVER_ACCEPT_ERROR) { > status = SOCK_ERROR; > It is not clear to me, why the second call to ACCEPT is necessary, when the socket is already available. http://bitbucket.org/naviserver/naviserver/src/tip/nsd/driver.c#cl-1175 : DriverThread(): if (waitPtr == NULL) { /* * If configured, try to accept more than one request, under heavy load * this helps to process more requests */ accepted = 0; while (accepted < drvPtr->acceptsize && drvPtr->queuesize < drvPtr->maxqueuesize && PollIn(&pdata, drvPtr->pidx) && (n = SockAccept(drvPtr, &sockPtr)) != SOCK_ERROR) { switch (n) { ... |
From: Gustaf N. <ne...@wu...> - 2010-04-12 13:35:58
|
I see, this is the intention. Without the patch naviserver hangs (blocks) on that machine already at the first or second request. Teh problems seems to be already around for a while, strangely on 64 bit machines: http://www.mail-archive.com/nav...@li.../msg02244.html We can try varying the acceptsize to see what happens.... -gustaf neumann Am 12.04.10 14:52, schrieb Stephen Deasey: > On Sun, Apr 11, 2010 at 4:16 PM,<com...@bi...> wrote: > >> changeset: 2575:3963e46562e8 >> user: Gustaf Neumann<ne...@wu...> >> date: Sun Apr 11 17:16:27 2010 +0200 >> summary: Fix to prevent multiple DriverAccepts on the same socket. >> The original coded relied on the fact that later accepts lead to >> an ERROR_STATE. Under RHEL 4 (Power, 64bit) the second accept blocks. >> >> >> diff --git a/nsd/driver.c b/nsd/driver.c >> --- a/nsd/driver.c >> +++ b/nsd/driver.c >> @@ -1523,7 +1523,22 @@ SockAccept(Driver *drvPtr, Sock **sockPt >> * Accept the new connection. >> */ >> >> - status = DriverAccept(sockPtr); >> + /* >> + * Hmmm: the original implementation was written in style that >> + * DriverAccept was called twice, one to return for e.g. a simple, >> + * new HTTP request NS_DRIVER_ACCEPT (staying in the SOCK_MORE >> + * status), and then calling ACCEPT again, but which causes on our >> + * RHEL 4 system (POWER6, 64bit) a hang: the second accept blocks, >> + * while it returns under (most?) other system a >> + * NS_DRIVER_ACCEPT_ERROR. It seems that the original code rely on >> + * this ERROR handling. It is not clear to me, why the second call >> + * to ACCEPT is necessary, when the socket is already available. >> + */ >> + if (*sockPtrPtr) { >> + status = NS_DRIVER_ACCEPT_ERROR; >> + } else { >> + status = DriverAccept(sockPtr); >> + } >> >> if (status == NS_DRIVER_ACCEPT_ERROR) { >> status = SOCK_ERROR; >> > > > > >> It is not clear to me, why the second call to ACCEPT is necessary, when the socket is already available. >> > > > http://bitbucket.org/naviserver/naviserver/src/tip/nsd/driver.c#cl-1175 : > > > DriverThread(): > > if (waitPtr == NULL) { > /* > * If configured, try to accept more than one request, > under heavy load > * this helps to process more requests > */ > > accepted = 0; > while (accepted< drvPtr->acceptsize > && drvPtr->queuesize< drvPtr->maxqueuesize > && PollIn(&pdata, drvPtr->pidx) > && (n = SockAccept(drvPtr,&sockPtr)) != SOCK_ERROR) { > > switch (n) { > ... > > ------------------------------------------------------------------------------ > Download Intel® Parallel Studio Eval > Try the new software tools for yourself. Speed compiling, find bugs > proactively, and fine-tune applications for parallel performance. > See why Intel Parallel Studio got high marks during beta. > http://p.sf.net/sfu/intel-sw-dev > _______________________________________________ > naviserver-devel mailing list > nav...@li... > https://lists.sourceforge.net/lists/listinfo/naviserver-devel > |
From: Stephen D. <sd...@gm...> - 2010-04-12 14:04:18
|
On Mon, Apr 12, 2010 at 2:33 PM, Gustaf Neumann <ne...@wu...> wrote: > > Without the patch naviserver hangs (blocks) on that machine already > at the first or second request. This should never happen as the listen socket is set to non-blocking mode, here: http://bitbucket.org/naviserver/naviserver/src/tip/nssock/nssock.c#cl-131 nssock.c:Accept() calls nsd/sock.c:Ns_SockAccept() which calls accept(2), who's man page says: If no pending connections are present on the queue, and the socket is not marked as non-blocking, accept() blocks the caller until a connection is present. If the socket is marked non-blocking and no pending connections are present on the queue, accept() fails with the error EAGAIN or EWOULDBLOCK. In which case, Ns_SockAccept looks wrong: SOCKET Ns_SockAccept(SOCKET lsock, struct sockaddr *saPtr, int *lenPtr) { SOCKET sock; sock = accept(lsock, saPtr, (socklen_t *) lenPtr); if (sock != INVALID_SOCKET) { sock = SockSetup(sock); } return sock; } Shouldn't this be something more like: if (sock > -1) { sock = SockSetup(sock); } as INVALID_SOCKET is -1 but there is more than one possible error state? (I haven't tried this...) |
From: Gustaf N. <ne...@wu...> - 2010-04-12 15:35:59
|
Cool, good guess, but it did not help - but the proposed change should go into the repository. Not surpisingly, the hang happens only when (acceptsize > 1) .... But, i got that sucker: In the following ioctl (and others maybe as well) the "long" is wrong, since linux expects there an int. That's why we see the problem with 64bit machines and on bsd! ======================================== int Ns_SockSetNonBlocking(SOCKET sock) { unsigned long nb = 1; if (ns_sockioctl(sock, FIONBIO, &nb) == -1) { return NS_ERROR; } return NS_OK; } ======================================== Interestingly OpenBSD and linux http://www.rootr.net/man/man/ioctl/2 http://linux.die.net/man/2/ioctl_list list the type as "int", while e.g. mks wants a "long" http://www.mkssoftware.com/docs/man5/sockets.5.asp Does anyone know, how to figure out, on which platforms the long is wanted? The majority seems to be "int". -gustaf Am 12.04.10 16:03, schrieb Stephen Deasey: > On Mon, Apr 12, 2010 at 2:33 PM, Gustaf Neumann<ne...@wu...> wrote: > >> Without the patch naviserver hangs (blocks) on that machine already >> at the first or second request. >> > > This should never happen as the listen socket is set to non-blocking mode, here: > > http://bitbucket.org/naviserver/naviserver/src/tip/nssock/nssock.c#cl-131 > > nssock.c:Accept() calls nsd/sock.c:Ns_SockAccept() which calls > accept(2), who's man page says: > > If no pending connections are present on the queue, and the socket > is not marked > as non-blocking, accept() blocks the caller until a connection is > present. If the socket > is marked non-blocking and no pending connections are present on > the queue, accept() > fails with the error EAGAIN or EWOULDBLOCK. > > In which case, Ns_SockAccept looks wrong: > > SOCKET > Ns_SockAccept(SOCKET lsock, struct sockaddr *saPtr, int *lenPtr) > { > SOCKET sock; > > sock = accept(lsock, saPtr, (socklen_t *) lenPtr); > > if (sock != INVALID_SOCKET) { > sock = SockSetup(sock); > } > > return sock; > } > > > Shouldn't this be something more like: > > if (sock> -1) { > sock = SockSetup(sock); > } > > as INVALID_SOCKET is -1 but there is more than one possible error state? > > (I haven't tried this...) > > ------------------------------------------------------------------------------ > Download Intel® Parallel Studio Eval > Try the new software tools for yourself. Speed compiling, find bugs > proactively, and fine-tune applications for parallel performance. > See why Intel Parallel Studio got high marks during beta. > http://p.sf.net/sfu/intel-sw-dev > _______________________________________________ > naviserver-devel mailing list > nav...@li... > https://lists.sourceforge.net/lists/listinfo/naviserver-devel > -- Univ.Prof. Dr. Gustaf Neumann Institute of Information Systems and New Media WU Vienna Augasse 2-6, A-1090 Vienna, AUSTRIA |
From: Stephen D. <sd...@gm...> - 2010-04-12 16:31:23
|
On Mon, Apr 12, 2010 at 3:58 PM, Gustaf Neumann <ne...@wu...> wrote: > > Cool, good guess, but it did not help - but the proposed > change should go into the repository. Actually this doesn't make any sense and shows how long it's been since I've touched any C. > But, i got that sucker: In the following ioctl (and others > maybe as well) the "long" is wrong, since linux expects > there an int. That's why we see the problem with 64bit > machines and on bsd! Well spotted. > Does anyone know, how to figure out, on which platforms > the long is wanted? The majority seems to be "int". Not sure. Just go with int? How many people are running on MKS or Amiga? |
From: Gustaf N. <ne...@wu...> - 2010-04-13 10:59:53
|
Am 12.04.10 18:30, schrieb Stephen Deasey: > Actually this doesn't make any sense and shows how long it's been > since I've touched any C. > i haven't found any manpage where accepts results different errors than -1. >> But, i got that sucker: In the following ioctl (and others >> maybe as well) the "long" is wrong, since linux expects >> there an int. That's why we see the problem with 64bit >> machines and on bsd! >> > Well spotted. > >> Does anyone know, how to figure out, on which platforms >> the long is wanted? The majority seems to be "int". >> > > Not sure. Just go with int? How many people are running on MKS or Amiga? > replaced 2 occurrences of the problem with int and undid the first fix. Best regards -gustaf neumann |