[Nagios-checkins] SF.net SVN: nagios:[2331] nagioscore/trunk/lib/worker.c
From: <ag...@us...> - 2012-10-15 12:00:50
Revision: 2331
          http://nagios.svn.sourceforge.net/nagios/?rev=2331&view=rev
Author:   ageric
Date:     2012-10-15 12:00:40 +0000 (Mon, 15 Oct 2012)
Log Message:
-----------
lib/worker: Avoid issuing 'kill(0, SIGKILL)'

It appears we can sometimes get child processes with pid 0 when
checking if any worker jobs have timed out. That's pretty bad and
it's still not known why it happens, but when it does we must take
care not to issue kill(0, SIGKILL), since that kills the worker and
all its running jobs, finally leaving Nagios with no workers and
unable to do anything.

Signed-off-by: Andreas Ericsson <ae...@op...>

Modified Paths:
--------------
    nagioscore/trunk/lib/worker.c

Modified: nagioscore/trunk/lib/worker.c
===================================================================
--- nagioscore/trunk/lib/worker.c	2012-10-15 12:00:22 UTC (rev 2330)
+++ nagioscore/trunk/lib/worker.c	2012-10-15 12:00:40 UTC (rev 2331)
@@ -323,6 +323,11 @@
 	int ret;
 	struct rusage ru;
 
+	if (!cp->ei->pid) {
+		wlog("No pid for job %d (%u running); '%s'", cp->id, running_jobs, cp->cmd);
+		return;
+	}
+
 	/* brutal but efficient */
 	ret = kill(cp->ei->pid, SIGKILL);
 	if (ret < 0) {
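
For readers unfamiliar with why a pid of 0 is dangerous here: POSIX kill() treats pid 0 as "every process in the caller's process group", so kill(0, SIGKILL) takes out the caller as well. The following is a minimal standalone sketch of the same kind of guard, not code from lib/worker.c. The helper name safe_kill is hypothetical, and it is stricter than the patch above by assumption: it also rejects negative pids, which kill() interprets as process-group or broadcast targets, whereas the commit only checks for pid 0.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of Nagios): only ever signal one
 * specific process.
 *
 * kill() gives special meaning to non-positive pids:
 *   pid == 0   every process in the caller's process group
 *   pid == -1  every process the caller is permitted to signal
 *   pid <  -1  every process in process group -pid
 * Rejecting pid <= 0 is an assumption of this sketch; the commit
 * above guards against pid 0 only.
 */
int safe_kill(pid_t pid, int sig)
{
	if (pid <= 0) {
		errno = EINVAL;
		return -1;
	}
	return kill(pid, sig);
}
```

A caller would use safe_kill(child_pid, SIGKILL) in place of a bare kill() and log the failure when it returns -1, much as the patch logs "No pid for job" and returns early.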