From: Clark, P. A. <cl...@or...> - 2015-03-03 13:21:44
Any reason for not updating to Bacula v7? It contains a number of fixes as well as new features. The version that you are running is nearly 2 years old. There were a few bug fixes along the way, but there have been no updates since April 2014.

Patti Clark
Linux System Administrator
R&D Systems Support
Oak Ridge National Laboratory

From: Robert Heinzmann <r.h...@fr...>
Date: Tuesday, March 3, 2015 at 3:37 AM
To: "bac...@li..." <bac...@li...>
Subject: [Bacula-users] Bacula SD 5.2.13 crash - Mutex lock failure. ERR=Invalid argument

Hello,

we are using Bacula 5.2.13-18 on CentOS6, and from time to time bacula-sd crashes with the following error, causing all backups to fail until bacula-sd is started again:

Mar 3 06:59:00 XXXX bacula-sd: XXXX:storage:default: ABORTING due to ERROR in lockmgr.c:100#012Mutex lock failure. ERR=Invalid argument
Mar 3 06:59:00 XXXX bacula-sd: Bacula interrupted by signal 6: IOT trap

Setup:

3 Servers:
1 Bacula Director (extra machine)
1 Bacula Catalog Server (extra machine)
1 Bacula Storage Daemon (extra machine)

We have ~573 Jobs (some TB, all Full Backups) to back up each day. Jobs are distributed across the day depending on minimum load of the server, distributed evenly otherwise:

Time          Jobs
0:00-1:00     35
1:00-2:00     121
2:00-3:00     93
3:00-4:00     60
4:00-5:00     46
5:00-6:00     71
6:00-7:00     60
7:00-8:00     43
8:00-9:00     32
9:00-10:00    12
10:00-11:00   7
11:00-12:00   3
12:00-13:00   5
13:00-14:00   2
14:00-15:00   7
15:00-16:00   8
16:00-17:00   7
17:00-18:00   3
18:00-19:00   2
19:00-20:00   3
20:00-21:00   11
21:00-22:00   14
22:00-23:00   28
23:00-24:00   25

Our SD is configured with 20 virtual drives in a backup2disk setup, allowing 20 concurrent backups to disk. Each Backup Job is an individual file in the backend (so full backups can be accessed and restored through bls/bextract). We have an external “scripted” job, which cleans up unused / purged volumes from disk.
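As a rough illustration of the load described above, the following Python sketch tallies the hourly schedule and compares the busiest hour with the 20 concurrent drives. The counts and the drive limit are copied from the post; the names JOBS_PER_HOUR, CONCURRENT_DRIVES, peak_hour and jobs_per_drive are hypothetical, chosen only for this example, and the sketch makes no claim about how Bacula itself schedules or queues jobs.

```python
# Hourly job counts copied from the schedule table in the post.
JOBS_PER_HOUR = {
    "0:00-1:00": 35, "1:00-2:00": 121, "2:00-3:00": 93, "3:00-4:00": 60,
    "4:00-5:00": 46, "5:00-6:00": 71, "6:00-7:00": 60, "7:00-8:00": 43,
    "8:00-9:00": 32, "9:00-10:00": 12, "10:00-11:00": 7, "11:00-12:00": 3,
    "12:00-13:00": 5, "13:00-14:00": 2, "14:00-15:00": 7, "15:00-16:00": 8,
    "16:00-17:00": 7, "17:00-18:00": 3, "18:00-19:00": 2, "19:00-20:00": 3,
    "20:00-21:00": 11, "21:00-22:00": 14, "22:00-23:00": 28, "23:00-24:00": 25,
}

# From the post: the SD allows 20 concurrent backups to disk.
CONCURRENT_DRIVES = 20


def peak_hour(schedule):
    """Return the (slot, count) pair with the most scheduled jobs."""
    return max(schedule.items(), key=lambda item: item[1])


def jobs_per_drive(count, drives=CONCURRENT_DRIVES):
    """Average number of jobs each drive would handle in one slot."""
    return count / drives


if __name__ == "__main__":
    slot, count = peak_hour(JOBS_PER_HOUR)
    print(f"Busiest slot: {slot} with {count} jobs")
    print(f"Average jobs per drive in that slot: {jobs_per_drive(count):.2f}")
```

This only summarizes the schedule as posted; whether the crash correlates with the busiest slots would have to be checked against the SD logs.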

Bacula Director Configuration:
------------------------------

Storage {
  Name = "XXXX:storage:default"
  Address = HOSTNAME_OF_THE_SD_MACHINE
  Password = "SECRET"
  Device = "FileStorage"
  Maximum Concurrent Jobs = 20
  Media Type = File
  Heartbeat Interval = 15
  TLS Enable = no
}

Pool {
  Name = " HOSTNAME_OF_THE_SD_MACHINE:pool:default"
  Storage = "XXXX:storage:default"
  # All Volumes will have the format standard.date.time to ensure they
  # are kept unique throughout the operation and also aid quick analysis
  # We won't use a counter format for this at the moment.
  Label Format = "BACULA-${Job}.${Year}${Month:p/2/0/r}${Day:p/2/0/r}.${Hour:p/2/0/r}${Minute:p/2/0/r}.${JobId}"
  Pool Type = Backup
  # Clean up any we don't need, and keep them for a maximum of a month (in
  # theory the same time period for weekly backups from the clients)
  # Note the files for the old volumes will still remain on the disk but will
  # be truncated to a zero size.
  Recycle = No
  Auto Prune = Yes
  Action On Purge = Truncate
  Volume Retention = 30 days
  # Don't allow re-use of volumes; one volume per job only
  Maximum Volume Jobs = 1
}

Bacula SD Configuration:
------------------------------

Autochanger {
  Name = "FileStorage"
  Changer Device = /dev/null
  Changer Command = ""
  Device = FileStorage-sd-0
  Device = FileStorage-sd-1
  Device = FileStorage-sd-2
  Device = FileStorage-sd-3
  Device = FileStorage-sd-4
  Device = FileStorage-sd-5
  Device = FileStorage-sd-6
  Device = FileStorage-sd-7
  Device = FileStorage-sd-8
  Device = FileStorage-sd-9
  Device = FileStorage-sd-10
  Device = FileStorage-sd-11
  Device = FileStorage-sd-12
  Device = FileStorage-sd-13
  Device = FileStorage-sd-14
  Device = FileStorage-sd-15
  Device = FileStorage-sd-16
  Device = FileStorage-sd-17
  Device = FileStorage-sd-18
  Device = FileStorage-sd-19
  Device = FileStorage-sd-20
}

Autochanger {
  Name = "FileStorage-restore"
  Changer Device = /dev/null
  Changer Command = ""
  Device = FileStorage-sd-restore-0
  Device = FileStorage-sd-restore-1
  Device = FileStorage-sd-restore-2
  Device = FileStorage-sd-restore-3
  Device = FileStorage-sd-restore-4
  Device = FileStorage-sd-restore-5
  Device = FileStorage-sd-restore-6
  Device = FileStorage-sd-restore-7
  Device = FileStorage-sd-restore-8
  Device = FileStorage-sd-restore-9
  Device = FileStorage-sd-restore-10
  Device = FileStorage-sd-restore-11
  Device = FileStorage-sd-restore-12
  Device = FileStorage-sd-restore-13
  Device = FileStorage-sd-restore-14
  Device = FileStorage-sd-restore-15
  Device = FileStorage-sd-restore-16
  Device = FileStorage-sd-restore-17
  Device = FileStorage-sd-restore-18
  Device = FileStorage-sd-restore-19
  Device = FileStorage-sd-restore-20
}

Backup Drives like this:

Device {
  Name = FileStorage-sd-0  # Add a hyphen to SD/autochanger name & match with drive index
  Device Type = File
  Media Type = File  #unique to each archive device path, different path, different mediatype
  Archive Device = /bacula/data01
  AutomaticMount = yes
  AlwaysOpen = yes
  RemovableMedia = yes
  Autochanger = yes
  Drive Index = 0
  Maximum Concurrent Jobs = 1
  Volume Poll Interval = 5
  LabelMedia = yes
  Spool Directory = /bacula/spool01
  Autoselect = yes
  Maximum Network Buffer Size = 65536
}

… 18 more…

Device {
  Name = FileStorage-sd-20  # Add a hyphen to SD/autochanger name & match with drive index
  Device Type = File
  Media Type = File  #unique to each archive device path, different path, different mediatype
  Archive Device = /bacula/data01
  AutomaticMount = yes
  AlwaysOpen = yes
  RemovableMedia = yes
  Autochanger = yes
  Drive Index = 20
  Maximum Concurrent Jobs = 1
  Volume Poll Interval = 5
  LabelMedia = yes
  Spool Directory = /bacula/spool01
  Autoselect = yes
  Maximum Network Buffer Size = 65536
}

Restore Drives like this:

Device {
  Name = FileStorage-sd-restore-0  # Add a hyphen to SD/autochanger name & match with drive index
  Device Type = File
  Media Type = File  #unique to each archive device path, different path, different mediatype
  Archive Device = /bacula/data01
  AutomaticMount = yes
  AlwaysOpen = yes
  RemovableMedia = yes
  Autochanger = yes
  Drive Index = 0
  Maximum Concurrent Jobs = 1
  Volume Poll Interval = 5
  LabelMedia = yes
  Spool Directory = /bacula/spool01
  Autoselect = no
  Maximum Network Buffer Size = 65536
}

Any idea what’s causing the bacula-sd crash? How can we debug further?

Regards,
Robert