From: <pm...@ci...> - 2008-03-30 02:13:43
Dear all,

A couple of weeks ago I described a problem I am seeing: the Bacula Director does not terminate cleanly when certain schedules are configured. I have now seen this in versions 2.2.7 and 2.2.8. I have also upgraded the OS (from FreeBSD 5.5 to 6.3-p1) and everything else, and the problem persists.

To get more specific information, I would like to ask for a little help, if you can spare it. It takes only about 5 minutes and two restarts of the Bacula Director. This is only relevant if you run the Bacula Director on *Unix* or an equivalent OS (preferably a BSD-style one, but let's see what we can find out) and have a recent version (>= 2.2.7). And, by the way, if you do not feel comfortable with what I am describing, or are unsure what it means, please just leave it alone. The same goes if your Bacula is never idle or is critical to production. (I don't want to disrupt anybody's systems.)

0. Choose a time when your Bacula is doing nothing for a while: no jobs running and none pending reschedule.

1. Send a normal "kill" (SIGTERM) to the Director. This is the regular way to stop it, and it should shut down cleanly.

2. After a few seconds, check whether the Director has really shut down. Chances are that it has only become unresponsive, so look in the process table (ps). If it is still there, send it another normal "kill" and check again.
   If the Director has now terminated, then this is exactly what I am seeing. Restart it normally and you're done; please tell me your OS version.
   If it still does not terminate, you may have some other installation or configuration problem, which is not my concern here.
   If the Director terminated at the first try, please continue:

3. Configure a Schedule in the config file, like the following:

     Schedule {
       Name = Halfhourly
       Run = at 00:04 hourly
       Run = at 00:34 hourly
     }

4. Configure some backup job that actually uses this schedule.
   It does not matter what this backup job does, because we will be done within 5 minutes; just pick the two execution times so that the job will not actually run during the test.

5. Send "kill -HUP" to the Director so that it reloads the config file. Check with the bconsole command "stat dir" that the new job appears in the list of scheduled jobs, as it should.

6. Now again send a normal "kill" (SIGTERM) to the Director and check whether it terminates. If it does, then things seem to be fine on your OS. If it does not, send it another "kill" (SIGTERM); quite likely it will terminate now.

7. In both cases, after the Director has terminated, remove the Schedule and the Job from the config file, restart normally, and continue regular operation.

8. Please tell me the outcome of step 6.

rgds,
PMc
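For anyone who would rather script the signal parts of the procedure (steps 1-2, 5 and 6), here is a minimal Python sketch. It assumes you already know the Director's PID (for example from ps); the function names are hypothetical helpers, not anything that ships with Bacula. It sends one plain SIGTERM at a time, which is what a bare "kill" sends, and polls with signal 0, which only probes whether the PID exists. It is meant for a Director started by your rc scripts: a process started as a child of the same Python process would linger as a zombie after exiting and still look alive here.

```python
import os
import signal
import time
from typing import Callable, Optional


def pid_alive(pid: int) -> bool:
    """True if a process with this PID exists; signal 0 only probes, it kills nothing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def reload_director(pid: int) -> None:
    """Step 5: SIGHUP asks the Director to re-read its config file."""
    os.kill(pid, signal.SIGHUP)


def stop_director(
    pid: int,
    max_signals: int = 2,
    grace: float = 5.0,
    poll: float = 0.5,
    send: Callable[[int, int], None] = os.kill,
    alive: Callable[[int], bool] = pid_alive,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Steps 1-2 and 6: send plain SIGTERMs, one at a time.

    Returns how many SIGTERMs it took before the process disappeared,
    or None if it was still running after max_signals of them.
    send/alive/sleep are injectable so the logic can be tested without a Director.
    """
    for attempt in range(1, max_signals + 1):
        send(pid, signal.SIGTERM)
        waited = 0.0
        # Give the Director `grace` seconds to exit before trying again.
        while waited < grace:
            if not alive(pid):
                return attempt
            sleep(poll)
            waited += poll
        # Final check so we never signal a PID that has just gone away.
        if not alive(pid):
            return attempt
    return None
```

Read the result of stop_director(pid) this way: 1 is the clean shutdown, 2 is the behaviour I am describing, and None means the Director survived both signals (the "other problem" case in step 2).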