From: Martin S. <ma...@li...> - 2013-01-11 14:52:15
|
>>>>> On Fri, 11 Jan 2013 10:00:16 +0100, Uwe Schuerkamp said: > > Hello all, > > even with bacula version 5.2.12 we're still seeing the errors below > where bacula sometimes fails to properly recycle a disk based volume. > > There are lots of error messages like these: > > ###################################################################### > 10-Jan 21:16 serverl186-dir JobId 51100: Start Backup JobId 51100, Job=serverw2107.2013-01-10_16.39.52_51 > 10-Jan 21:16 serverl186-dir JobId 51100: Purging oldest volume "incremental-0509" > 10-Jan 21:16 serverl186-dir JobId 51100: 0 File on Volume "incremental-0509" purged from catalog. > 10-Jan 21:16 serverl186-dir JobId 51100: Purging oldest volume "incremental-0509" > 10-Jan 21:16 serverl186-dir JobId 51100: 0 File on Volume "incremental-0509" purged from catalog. > 10-Jan 21:16 serverl186-dir JobId 51100: Purging oldest volume "incremental-0509" > .... > <SNIP SNIP> > .... > > 10-Jan 22:19 serverw2107-fd JobId 51100: DIR and FD clocks differ by -30 seconds, FD automatically compensating. > 10-Jan 22:19 serverl186-dir JobId 51100: Purging oldest volume "incremental-0509" > 10-Jan 22:19 serverl186-dir JobId 51100: 0 File on Volume "incremental-0509" purged from catalog. > 10-Jan 22:19 serverl186-dir JobId 51100: Purging oldest volume "incremental-0509" > 10-Jan 22:19 serverl186-dir JobId 51100: 0 File on Volume "incremental-0509" purged from catalog. > 10-Jan 22:19 serverl186-sd JobId 51100: Job serverw2107.2013-01-10_16.39.52_51 is waiting. Cannot find any appendable volumes. > Please use the "label" command to create a new Volume for: > Storage: "FileStorage_incremental" (/mnt/msa/online_backup) > Pool: Online_incremental > Media type: File > 10-Jan 22:24 serverl186-sd JobId 51100: Volume "incremental-0508" previously written, moving to end of data. > 10-Jan 22:24 serverl186-sd JobId 51100: Ready to append to end of Volume "incremental-0508" size=1202 > 10-Jan 22:24 serverw2107-fd JobId 51100: Generate VSS snapshots. Driver="VSS Vista", Drive(s)="C" > 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "Task Scheduler Writer", State: 0x1 (VSS_WS_STABLE) > 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "VSS Metadata Store Writer", State: 0x1 (VSS_WS_STABLE) > 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "Performance Counters Writer", State: 0x1 (VSS_WS_STABLE) > 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "System Writer", State: 0x1 (VSS_WS_STABLE) > 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "ASR Writer", State: 0x1 (VSS_WS_STABLE) > 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "Shadow Copy Optimization Writer", State: 0x1 (VSS_WS_STABLE) > 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "COM+ REGDB Writer", State: 0x1 (VSS_WS_STABLE) > 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "Registry Writer", State: 0x1 (VSS_WS_STABLE) > 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "IIS Config Writer", State: 0x1 (VSS_WS_STABLE) > 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "IIS Metabase Writer", State: 0x1 (VSS_WS_STABLE) > 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "BITS Writer", State: 0x1 (VSS_WS_STABLE) > 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "WMI Writer", State: 0x1 (VSS_WS_STABLE) > 10-Jan 22:25 serverl186-sd JobId 51100: Elapsed time=00:00:24, Transfer rate=21 Bytes/second > 10-Jan 22:25 serverl186-dir JobId 51100: Fatal error: Catalog error creating JobMedia record. sql_create.c:155 Update Media record UPDATE Media SET EndFile=50, EndBlock=1638829081 WHERE MediaId=58 failed: ERR= "ERR=" looks like it failed to report the real error. > 10-Jan 22:25 serverl186-sd JobId 51100: Fatal error: Error creating JobMedia record: 1992 Create JobMedia error > > 10-Jan 22:25 serverl186-sd JobId 51100: Fatal error: acquire.c:516 Could not create JobMedia record for Volume="incremental-0508" Job=serverw2107.2013-01-10_16.39.52_51 > 10-Jan 22:25 serverl186-dir JobId 51100: Error: Unable to get Media record for Volume incremental-0058: ERR=sql_get.c:1094 Media record for Volume "incremental-0058" not found. Do you know which volume is used? There seems to be some confusion between incremental-0058 and incremental-0508. Is that a common feature of other failures like this? > ###################################################################### > > At this time in mysql, I can still see a long running query initiated > by the volume recycle process: > > ###################################################################### > # mystat > Id User Host db Command Time State Info > Progress > 16715 bacula localhost bacula Query 18534 updating > DELETE FROM File WHERE JobId IN > (48428,48430,48432,48433,48434,48566,48568,48572,48574,48576,48578,4 > 0.000 > 16774 root localhost NULL Query 0 NULL show > processlist 0.000 > > ###################################################################### > > As you can see above, the query to delete the files for the job > records found on the volume has been running for over 18,000 seconds. > > We don't generally seem to have a problem with long recycle times, but > every once in a while bacula hangs in the circumstances described > above. > > The error seems to occur roughly one hour after the recycling starts, > so could this be some sort of maximum wait interval expiring or > something? > > Any ideas how to fix this situation would be greatly appreciated. Can you log all queries in the mysql server to see if that records anthing interesting? __Martin |