This is a problem I first noticed with the original
AllegroServe and have already reported to Franz.
When the server is under heavy load, the main listener
loop is coded to sleep. This can (and does) cause it to
stop accepting connections on the port altogether. When
this happens, clients get a connection refused error,
just as if they were talking to a machine with no web
server at all.
In the file main.cl, in the function http-accept-thread,
look around line 1395:

    ((1 2 3) (logmess "all threads busy, pause")
    (4 (logmess "forced to create new thread")
    (5 (logmess "can't even create new thread, quitting")
       (return-from http-accept-thread nil)))

As you can see, when all the worker threads are busy,
the listener itself sleeps, which means that
(accept-connection) is not called and the web server is
effectively offline. Increasing the number of worker
threads reduces the chances of running into this bug,
but the code path where the listener explicitly sleeps
should simply not be there.
I proposed a solution to Franz, which they have agreed
to implement: new connections are immediately placed on
a queue and the listener goes straight back to listening.
Worker threads then read items off the queue. If the
size of the queue gets too big, the server can increase
the number of listeners or respond with a 500. In any
case, the server is always listening and clients always
get a response.
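
To make the queueing idea concrete, here is a minimal
sketch in portable Common Lisp. It is not AllegroServe
code and not what Franz implemented; every name in it
(conn-queue, offer-connection, take-connection) is
hypothetical. It is single-threaded on purpose and only
models the policy: the listener never sleeps, it either
queues the connection or is told to reject it. A real
server would guard the queue with a lock and have idle
workers block until an item arrives.

```lisp
;; Hypothetical sketch; none of these names exist in AllegroServe.
(defstruct (conn-queue (:constructor make-conn-queue (limit)))
  (items '())   ; pending connections, oldest first
  (limit 0))    ; maximum number of pending connections

(defun offer-connection (queue conn)
  "Called by the listener for each accepted connection. Never sleeps.
Returns :QUEUED, or :REJECTED when the queue is full, in which case
the caller should answer the client with a 500."
  (cond ((>= (length (conn-queue-items queue))
             (conn-queue-limit queue))
         :rejected)
        (t
         ;; Append at the tail so workers see connections in arrival order.
         (setf (conn-queue-items queue)
               (append (conn-queue-items queue) (list conn)))
         :queued)))

(defun take-connection (queue)
  "Called by a worker. Returns the oldest pending connection,
or NIL when nothing is waiting."
  (pop (conn-queue-items queue)))
```

With a limit of two, a third connection is rejected
until a worker takes one off the queue:

```lisp
(let ((q (make-conn-queue 2)))
  (list (offer-connection q :a)
        (offer-connection q :b)
        (offer-connection q :c)
        (take-connection q)
        (offer-connection q :d)))
;; => (:QUEUED :QUEUED :REJECTED :A :QUEUED)
```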