From: Mantis B. T. <no...@bu...> - 2011-02-23 20:46:33
The following issue has been ASSIGNED.
======================================================================
http://bugs.bacula.org/view.php?id=1701
======================================================================
Reported By:                mnalis
Assigned To:                kern
======================================================================
Project:                    bacula
Issue ID:                   1701
Category:                   File Daemon
Reproducibility:            random
Severity:                   crash
Priority:                   normal
Status:                     closed
Resolution:                 unable to reproduce
Fixed in Version:
======================================================================
Date Submitted:             2011-02-17 13:39 GMT
Last Modified:              2011-02-23 20:46 GMT
======================================================================
Summary:                    bacula-fd crashes on windows sometimes
Description:
Sometimes, bacula-fd crashes on Windows. The interesting factors are:

1) It does not seem to be related to the scheduled backup jobs. For
example, the service was restarted at 08:30, and a manual backup job was
run at 08:43. No other jobs were run, and the next job was scheduled to
run at 02:00... however, bacula-fd crashed at 16:45, before the next
scheduled job was due to run. Also, no jobs were running at the time
according to the bacula-director SQL (although it is possible someone
tried to query that server's status or something).

2) When it crashes, the service does NOT get auto-restarted, even if it
is set to auto-restart on crash (in the Recovery tab of the service
properties).

Steps to Reproduce:
Use the Windows version of Bacula. No specific way to trigger this has
been found, and it happens relatively rarely so far - about once every
two months.

It is interesting that this has happened so far on two different Windows
servers (one x64 and one i386, out of about a dozen) on two different
occasions, and yet it has never happened on any of our RHEL or Debian
GNU/Linux servers (about 150 of them). So it might be a Windows-only bug.

Additional Information:
On both servers where it has happened so far, all other services seem to
run without any problems.

I've recommended that our Windows admin upgrade all Windows servers to
Bacula 5.0.3 and run it as

"C:\Program Files\Bacula\bacula-fd.exe" /service -c "C:\Program Files\Bacula\bacula-fd.conf" -dt -d100

(although -dt does not seem to work on Windows, due to Bug 1700) in the
hope of catching more info when it recurs. Let me know if there is a
better way to capture more information.

The auto-restart failure might be (I'm guessing here) due to
ReportStatus(SERVICE_STOPPED, service_error, 0); at the end of
baculaWorkerThread() in src/win32/libwin32/service.cpp -- perhaps that
should be omitted if bacula-fd terminates not cleanly but because of a
bug (in order to allow the service manager to restart the faulty
bacula-fd service)?
======================================================================

----------------------------------------------------------------------
 (0005765) kern (administrator) - 2011-02-23 20:46
 http://bugs.bacula.org/view.php?id=1701#c5765
----------------------------------------------------------------------
We've *never* seen this before. My recommendation is to do as you did:
get all the daemons upgraded to 5.0.3. Also make sure no one loads a
32 bit FD on a 64 bit OS or vice versa. It is the OS that counts, not the
architecture.

About the only thing you can do is download the mingw32 debugger (gdb),
install it, and run Bacula under it so that you can get a traceback when
it crashes. Otherwise there is nothing we can do.

The ReportStatus() has nothing to do with not restarting -- that code is
called only at the beginning of the execution of the FD, and then only if
it fails and is going to exit. I don't know how auto-restart works on
Windows, and we have never explicitly programmed for it.

Issue History
Date Modified    Username       Field                    Change
======================================================================
2011-02-17 13:39 mnalis         New Issue
2011-02-17 13:39 mnalis         Issue Monitored: mnalis
2011-02-23 20:46 kern           Note Added: 0005765
2011-02-23 20:46 kern           Assigned To              => kern
2011-02-23 20:46 kern           Status                   new => closed
2011-02-23 20:46 kern           Resolution               open => unable to reproduce
======================================================================