From: Vlad S. <vl...@cr...> - 2006-01-12 21:27:33
Just off the top of my head: would it work if poll used a timeout for the
trigger socket as well? That way it would be waking up constantly.

Zoran Vasiljevic wrote:
>
> Vlad, Stephen,
>
> What do you think?
>
> Begin forwarded message:
>
>> From: Jeff Rogers <dv...@DI...>
>> Date: 12 January 2006 20:34:09 CET
>> To: AOL...@LI...
>> Subject: [AOLSERVER] aolserver bug
>> Reply-To: AOLserver Discussion <AOL...@LI...>
>>
>> I found a bug in aolserver 4.0.10 (and previous 4.x versions, not sure
>> about earlier) that causes the server to lock up. I'm fairly certain I
>> understand the cause, and my fix appears to work although I'm not sure
>> it is the best approach.
>>
>> The bug: when benchmarking the server with a program like ab with
>> concurrency=1 (that is, it issues a single request, waits for it to
>> complete, then immediately issues the next one) the server will lock
>> up, consuming no cpu, but not responding to any requests.
>>
>> My explanation: once the max number of threads is hit, a new
>> connection being queued (NsQueueConn) will be unable to find a free
>> connection in the pool, the queueing fails, and the new connection is
>> added to the wait list (waitPtr). If there is a wait list then no
>> drivers are polled for new connections (driver.c:801); rather, the
>> driver thread waits to be triggered (SockTrigger) to indicate that a
>> thread is available to handle the connection. The triggering is done
>> when the connection is completed, within NsSockClose. NsSockClose in
>> turn is going to be called somewhere within the running of the
>> connection (ConnRun - queue.c:617). However, the available thread is
>> not put back onto the queue free list until after ConnRun has
>> completed (queue.c:638).
>> So if the driver thread runs in the time slice after ConnRun has
>> completed for all active connections but before they are added back
>> to the free list, then it attempts to queue the connection, fails,
>> adds it to the wait list, then waits for the trigger which will never
>> come, and everything stops.
>>
>> The problem is a race condition, and as such is extremely timing
>> sensitive; I cannot reproduce the problem on a generic setup, but when
>> I'm benchmarking my OpenACS setup it hits the bug very quickly and
>> reliably. The explanation suggests, and my testing confirms, that it
>> seems to occur much less reliably with concurrency > 1 or if there is
>> a small delay between sending the connections. Together these mean
>> that the lockup is most likely to show up in exactly my test case,
>> while much less likely on a production server or with
>> high-concurrency load testing.
>>
>> My solution is to register SockTrigger as a ready proc; ready procs
>> are run immediately after the freed conns are put back on to the free
>> queue (queue.c:645). This fixes the problem by ensuring that the
>> trigger pipe is notified strictly after the free queue is updated and
>> the waiting conn will successfully be queued. However I'm not sure
>> this is best: NsSockClose attempts to minimize the number of times
>> SockTrigger is called in the case when multiple connections are being
>> closed at the same time; my fix means it is called exactly once for
>> each connection, or twice counting the call in NsSockClose. It's not
>> clear to me what adverse impact this has, if any, but one thing that
>> could be done is to remove the SockTrigger calls from NsSockClose as
>> redundant.
>> Some additional logic could be added into SockTrigger to not send to
>> the trigger pipe under certain conditions (i.e., if it has been
>> triggered and not acknowledged yet, or if there is no waiting
>> connection), but that would require mutex protection which could
>> ultimately be more expensive than just blindly triggering the pipe.
>>
>> Here's a context diff for my patch:
>> *** driver.c.orig Thu Jan 12 11:39:05 2006
>> --- driver.c Thu Jan 12 11:39:10 2006
>> ***************
>> *** 773,778 ****
>> --- 773,781 ----
>>           drvPtr = nextDrvPtr;
>>       }
>>
>> +     /* register a ready proc to trigger the poll */
>> +     Ns_RegisterAtReady(SockTrigger,NULL);
>> +
>>       /*
>>        * Loop forever until signalled to shutdown and all
>>        * connections are complete and gracefully closed.
>>
>>
>> -J
>>
>>
>> --
>> AOLserver - http://www.aolserver.com/
>>
>> To remove yourself from this list, simply send an email to
>> <lis...@li...> with the body of "SIGNOFF AOLSERVER" in the email
>> message. You can leave the Subject: field of your email blank.
>
> _______________________________________________
> naviserver-devel mailing list
> nav...@li...
> https://lists.sourceforge.net/lists/listinfo/naviserver-devel
>

--
Vlad Seryakov
571 262-8608 office
vl...@cr...
http://www.crystalballinc.com/vlad/
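
For illustration only, here is a rough sketch of the timeout idea from the reply at the top of this message. It is not AOLserver code: wait_for_trigger is a hypothetical helper, and a plain POSIX pipe is assumed to stand in for the trigger socket. The point it shows is that a bounded poll() turns a missed trigger into a short delay instead of a permanent hang, because the caller gets control back and can retry queueing the waiting connection.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of AOLserver): wait on the read end of
 * a trigger pipe for at most timeout_ms milliseconds.
 *
 * Returns  1 if a trigger byte was consumed,
 *          0 if the timeout expired with no trigger,
 *         -1 on error or if the write end was closed.
 */
static int
wait_for_trigger(int trigger_fd, int timeout_ms)
{
    struct pollfd pfd;
    char          c;
    int           n;

    /* A negative timeout would block forever, which is the very
     * behaviour this sketch is meant to avoid. */
    assert(timeout_ms >= 0);

    pfd.fd = trigger_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Retry if a signal interrupts the wait. */
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n < 0) {
        return -1;      /* poll failed */
    }
    if (n == 0) {
        return 0;       /* timed out: caller should retry queueing */
    }
    if (read(trigger_fd, &c, 1) != 1) {
        return -1;      /* pipe closed or read error */
    }
    return 1;           /* trigger received */
}
```

A driver loop built on this would retry queueing the waiting connection whenever the helper returns 0 or 1. The cost is the periodic wakeup mentioned in the reply, so the timeout value trades idle CPU against worst-case stall time. This only works around a lost trigger; the ordering problem described in the quoted message is untouched.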