From: Mantis B. T. <no...@bu...> - 2011-04-11 18:56:54
The following issue has been SUBMITTED.
======================================================================
http://bugs.bacula.org/view.php?id=1723
======================================================================
Reported By:                carlson39
Assigned To:
======================================================================
Project:                    bacula
Issue ID:                   1723
Category:                   Director
Reproducibility:            sometimes
Severity:                   crash
Priority:                   normal
Status:                     new
======================================================================
Date Submitted:             2011-04-11 15:36 GMT
Last Modified:              2011-04-11 15:36 GMT
======================================================================
Summary:                    Director seems to crash when +200 jobs are running
Description:
Currently, the Director, storage node, and PostgreSQL database are all
running on one system. Around 8pm, the Director crashed while running
139 Full jobs (plus 31 jobs that started at 6pm, 144 jobs at 7pm...).

At 7:03pm, the Storage node/Director emailed this error:

09-Apr 19:03 write.llnl.gov-dir JobId 0: Fatal error: authenticate.c:120 Director unable to authenticate with Storage daemon at "write-01.llnl.gov:9103". Possible causes:
Passwords or names not the same or
Maximum Concurrent Jobs exceeded on the SD or
SD networking messed up (restart daemon).
Please see http://www.bacula.org/en/rel-manual/Bacula_Freque_Asked_Questi.html#SECTION003760000000000000000 for help.

This has happened dozens of times so far since I reached around 1100
clients configured in our Bacula environment, and this is the second
time I've gotten a usable traceback. That is attached.

Steps to Reproduce:
Not sure, schedule ~300 jobs at once?

Additional Information:
I have 5 storage nodes: write-01 - write-05. All are running FreeBSD
8.x on 64-bit hardware (all HP DL180 models) with ZFS. The storage
nodes have 1024 Storage Devices configured (1 for Incr jobs, 1 for
VirtualFulls/Fulls). Each device is configured for 2 maximum concurrent
jobs, so 2048 "jobs" per storage node. The Director is configured for
1000 max concurrent jobs.
======================================================================

Issue History
Date Modified    Username       Field                    Change
======================================================================
2011-04-11 15:36 carlson39      New Issue
2011-04-11 15:36 carlson39      File Added: bacula.1426.traceback
======================================================================
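
For reference, the concurrency figures quoted under Additional Information can be checked with a few lines of Python. This is only a sketch of the arithmetic: the variable and function names are made up for illustration, and it assumes the 1024 devices are counted per storage node, as the "2048 per storage node" figure implies. It does not model any daemon-level Maximum Concurrent Jobs setting on the SD, which the report does not state.

```python
# Figures quoted in the report; all names here are illustrative only.
STORAGE_NODES = 5          # write-01 - write-05
DEVICES_PER_NODE = 1024    # Storage Devices configured on each node (assumed per node)
MAX_JOBS_PER_DEVICE = 2    # maximum concurrent jobs per device
DIRECTOR_MAX_JOBS = 1000   # Director's max concurrent jobs


def device_slots_per_node(devices: int, jobs_per_device: int) -> int:
    """Return the number of job slots one node offers at the device level."""
    return devices * jobs_per_device


per_node = device_slots_per_node(DEVICES_PER_NODE, MAX_JOBS_PER_DEVICE)
total = per_node * STORAGE_NODES

print(f"device slots per node: {per_node}")
print(f"device slots across {STORAGE_NODES} nodes: {total}")
print(f"director limit: {DIRECTOR_MAX_JOBS}")
print(f"director limit below per-node device slots: {DIRECTOR_MAX_JOBS < per_node}")
```

On these numbers the Director's limit sits well below the device-level slots of even a single storage node, so the device settings alone would not seem to explain the authentication failure; other limits or resources may be involved.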