From: <bac...@li...> - 2007-09-27 16:36:06
|
A NOTE has been added to this issue. ====================================================================== http://bugs.bacula.org/view.php?id=927 ====================================================================== Reported By: drescherjm Assigned To: ====================================================================== Project: bacula Issue ID: 927 Category: Storage Daemon Reproducibility: sometimes Severity: minor Priority: normal Status: feedback ====================================================================== Date Submitted: 08-21-2007 11:45 EDT Last Modified: 09-27-2007 11:45 EDT ====================================================================== Summary: Multi drive changer keeps switching which drive a job runs on and blocks because the tape is in the other drive. Description: I upgraded to bacula-2.2.0 last week and every night the storage daemon gets stuck waiting for a mount request on an autochanger when the requested tape is already in a drive of the changer but not the drive that the storage daemon wants. This has happened 3 nights in a row and it did not occur with bacula-2.1.X (not sure of the exact version of the sd at the time of the upgrade). ====================================================================== ---------------------------------------------------------------------- drescherjm - 08-21-07 11:49 ---------------------------------------------------------------------- BTW, The pool LTO2-Study-Backup is not the pool of this tape A00015 as it should be UserBackup-LTO2. ---------------------------------------------------------------------- kern - 08-21-07 13:23 ---------------------------------------------------------------------- This is a bit surprising because it is exactly the kind of situation that 2.2.0 *should* handle much better than 2.0.x Your note about the pool difference is probably the key to the problem. Could you restart all the daemons, and run the SD with debug level set to 51 and the output directed to a file? When it blocks again, please make a copy of the file and attach it to this bug report. It should allow me to find out why you are having this problem. It would also be helpful to have the Job reports of all the jobs that did run during the debug period. ---------------------------------------------------------------------- drescherjm - 08-21-07 13:36 ---------------------------------------------------------------------- Thanks Kern. I will do that after the waiting jobs finish. I issued mount on the first drive and it appears bacula is continuing with the backups: 3001 OK mount. Device="LTO2-0" (/dev/nst0) 21-Aug 13:31 dev6-sd: 3301 Issuing autochanger "loaded? drive 0" command. 21-Aug 13:31 dev6-sd: 3302 Autochanger "loaded? drive 0", result: nothing loaded. 21-Aug 13:31 dev6-sd: 3301 Issuing autochanger "loaded? drive 0" command. 21-Aug 13:31 dev6-sd: 3302 Autochanger "loaded? drive 0", result: nothing loaded. 21-Aug 13:31 dev6-sd: 3301 Issuing autochanger "loaded? drive 0" command. 21-Aug 13:31 dev6-sd: 3302 Autochanger "loaded? drive 0", result: nothing loaded. 21-Aug 13:31 dev6-sd: 3307 Issuing autochanger "unload slot 17, drive 1" command. ---------------------------------------------------------------------- drescherjm - 08-23-07 10:34 ---------------------------------------------------------------------- Kern, I just wanted to add that after restarting all the daemons with debugging enabled on the sd this has not happened 2 nights in a row. ---------------------------------------------------------------------- kern - 08-29-07 06:39 ---------------------------------------------------------------------- I'm closing this bug report since I'm not sure how to reproduce it and thus how to fix it. If it does re-occur, please re-open the bug report and supply additional information. ---------------------------------------------------------------------- drescherjm - 09-10-07 12:23 ---------------------------------------------------------------------- It looks like the problem is back: 08-Sep 21:30 fileserver-dir: Start Backup JobId 6605, Job=radimgws49-full.2007-09-08_20.30.10 08-Sep 21:30 fileserver-dir: Using Device "LTO2-0" 08-Sep 21:35 radimgws49-fd: DIR and FD clocks differ by 298 seconds, FD automatically adjusting. 09-Sep 04:07 dev6-sd: Please mount Volume "A00037" or label a new one for: Job: dev2-CMT-external.2007-09-07_20.30.09 Storage: "LTO2-0" (/dev/nst0) Pool: UserBackup-LTO2 Media type: LTO-2 09-Sep 06:07 dev6-sd: Please mount Volume "A00037" or label a new one for: Job: dev2-CMT-external.2007-09-07_20.30.09 Storage: "LTO2-0" (/dev/nst0) Pool: UserBackup-LTO2 Media type: LTO-2 09-Sep 10:07 dev6-sd: Please mount Volume "A00037" or label a new one for: Job: dev2-CMT-external.2007-09-07_20.30.09 Storage: "LTO2-0" (/dev/nst0) Pool: UserBackup-LTO2 Media type: LTO-2 09-Sep 18:07 dev6-sd: Please mount Volume "A00037" or label a new one for: Job: dev2-CMT-external.2007-09-07_20.30.09 Storage: "LTO2-0" (/dev/nst0) Pool: UserBackup-LTO2 Media type: LTO-2 10-Sep 10:07 dev6-sd: Please mount Volume "A00037" or label a new one for: Job: dev2-CMT-external.2007-09-07_20.30.09 Storage: "LTO2-0" (/dev/nst0) Pool: UserBackup-LTO2 Media type: LTO-2 ##status Status available for: 1: Director 2: Storage 3: Client 4: All Select daemon type for status (1-4): Unexpected question has been received. 2 The defined Storage resources are: 1: File 2: DLT-IV 3: DVD-FILESERVER 4: DEV6-DLT-IV-0 5: DEV6-LTO-0 6: DEV6-LTO2-1 7: DEV6-Changer Select Storage resource (1-7): Unexpected question has been received. 7 Connecting to Storage daemon DEV6-Changer at dev6.radimg.pitt.edu:9103 dev6-sd Version: 2.2.0 (08 August 2007) x86_64-pc-linux-gnu gentoo Daemon started 21-Aug-07 15:03, 195 Jobs run since started. Heap: heap=1,069,056 smbytes=786,394 max_bytes=1,311,747 bufs=273 max_bufs=282 Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8 Running Jobs: Writing: Incremental Backup job radimgws40-full JobId=6593 Volume="A00037" pool="UserBackup-LTO2" device="LTO2-0" (/dev/nst0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=11 Writing: Incremental Backup job dev2-CMT-external JobId=6586 Volume="A00037" pool="UserBackup-LTO2" device="LTO2-0" (/dev/nst0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=4 Writing: Incremental Backup job radimgws3-full JobId=6602 Volume="A00037" pool="UserBackup-LTO2" device="LTO2-0" (/dev/nst0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=12 Writing: Incremental Backup job radimgws40-outlook JobId=6603 Volume="A00037" pool="UserBackup-LTO2" device="LTO2-0" (/dev/nst0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=5 in_msg=4 out_msg=3 fd=14 Writing: Incremental Backup job radimgws49-full JobId=6605 Volume="A00037" pool="UserBackup-LTO2" device="LTO2-0" (/dev/nst0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=15 ==== Jobs waiting to reserve a drive: ==== Terminated Jobs: JobId Level Files Bytes Status Finished Name =================================================================== 6584 Incr 0 0 OK 07-Sep-07 20:30 dev2-outlook 6590 Incr 914 209.1 M OK 07-Sep-07 20:34 radimgws49-full 6587 Incr 256 862.7 M OK 07-Sep-07 20:41 radimgws3-full 6591 Incr 150 952.4 M OK 07-Sep-07 20:48 radimgws0-full 6585 Incr 847 716.5 M OK 07-Sep-07 20:52 dev2-full 6583 Incr 0 0 Other 07-Sep-07 20:59 radimgtm0-full 6589 Incr 0 0 Other 07-Sep-07 21:21 radimgws20-impt 6592 Incr 0 0 Other 07-Sep-07 21:30 radimgws23-data 6598 Incr 0 0 Other 08-Sep-07 20:59 radimgtm0-full 6604 Incr 0 0 Other 08-Sep-07 21:30 radimgws20-impt ==== Device status: Autochanger "Magnum224-0" with devices: "LTO2-0" (/dev/nst0) "LTO2-1" (/dev/nst1) Device "LTO2-0" (/dev/nst0) is not open. Device is BLOCKED waiting for mount of volume "A00037", Pool: UserBackup-LTO2 Media type: LTO-2 Drive 0 status unknown. Configured device capabilities: EOF BSR BSF FSR FSF EOM REM !RACCESS AUTOMOUNT LABEL !ANONVOLS ALWAYSOPEN Device state: !OPENED TAPE !LABEL !MALLOC !APPEND !READ !EOT !WEOT !EOF !NEXTVOL !SHORT !MOUNTED num_writers=0 block=2 Device parameters: Archive name: /dev/nst0 Device name: LTO2-0 File=0 block=0 Min block=0 Max block=0 Device "LTO2-1" (/dev/nst1) is mounted with: Volume: A00037 Pool: LTO2-Study-Backup Media type: LTO-2 Slot 4 is loaded in drive 1. Configured device capabilities: EOF BSR BSF FSR FSF EOM REM !RACCESS AUTOMOUNT LABEL !ANONVOLS ALWAYSOPEN Device state: OPENED TAPE LABEL !MALLOC APPEND !READ !EOT !WEOT !EOF !NEXTVOL !SHORT !MOUNTED num_writers=0 block=0 Device parameters: Archive name: /dev/nst1 Device name: LTO2-1 File=223 block=0 Min block=0 Max block=0 Total Bytes=213,148,293,120 Blocks=3,304,009 Bytes/block=64,512 Positioned at File=223 Block=0 Device "DEV6-DLT-IV-0" (/dev/nst2) is not open. Configured device capabilities: EOF BSR BSF FSR FSF EOM REM !RACCESS AUTOMOUNT LABEL !ANONVOLS ALWAYSOPEN Device state: !OPENED TAPE !LABEL !MALLOC !APPEND !READ !EOT !WEOT !EOF !NEXTVOL !SHORT !MOUNTED num_writers=0 block=0 Device parameters: Archive name: /dev/nst2 Device name: DEV6-DLT-IV-0 File=0 block=0 Min block=262144 Max block=262144 Device "DEV6-DVD-WRITER" (/dev/dvd) is not open. Configured device capabilities: EOF BSR BSF FSR FSF EOM REM RACCESS AUTOMOUNT LABEL !ANONVOLS !ALWAYSOPEN Device state: !OPENED !TAPE !LABEL !MALLOC !APPEND !READ !EOT !WEOT !EOF !NEXTVOL !SHORT !MOUNTED num_writers=0 block=0 Device parameters: Archive name: /dev/dvd Device name: DEV6-DVD-WRITER File=0 block=0 Min block=0 Max block=0 ==== In Use Volume status: A00030 on device "LTO2-1" (/dev/nst1) Reader=0 writers=0 reserved=0 A00037 on device "LTO2-0" (/dev/nst0) Reader=0 writers=0 reserved=5 ==== Data spooling: 0 active jobs, 0 bytes; 134 total jobs, 6,442,333,843 max bytes/job. Attr spooling: 0 active jobs, 0 bytes; 134 total jobs, 89,892,830 max bytes. ==== # Notice tape A00037 is in drive 1 and again the status shows the wrong pool LTO2-Study-Backup I have activated the debugging on the SD with the debug level set to 51 on August 21st so I assume that debug log is enormous. Would you like me to trim it down a bit. It looks like the problem started on September 7th or 8th as the backups that ran on Thursday September 6th ran fine. ---------------------------------------------------------------------- drescherjm - 09-10-07 18:53 ---------------------------------------------------------------------- Kern, I have attached the log file that begins on the 21st of August and has the entire sd log since then. The other files are the messages (grabbed in a wxconsole) and the current dir status. Thanks, John ---------------------------------------------------------------------- drescherjm - 09-18-07 14:14 ---------------------------------------------------------------------- I believe this may have something to do with manually mounting or releasing of volumes. I mean if I issue a mount or release command on a tape then later the nightly backups may run into this problem. If the system gets into this state it will happen every night until the director is restarted. ---------------------------------------------------------------------- kern - 09-23-07 15:16 ---------------------------------------------------------------------- If you can, you might want to test the code that is in the current SVN rev 5632 or later. It may fix your problem. However, please be aware that the code *may* be somewhat unstable since I made some important modifications to the reservations system. It should be much less likely to try to put two different Volumes on the same drive. ---------------------------------------------------------------------- drescherjm - 09-27-07 11:45 ---------------------------------------------------------------------- No problems so far but I have not done any manual operations on the changer since I have installed the svn bacula on the machine with the autochanger. Issue History Date Modified Username Field Change ====================================================================== 08-21-07 11:45 drescherjm New Issue 08-21-07 11:49 drescherjm Note Added: 0002690 08-21-07 11:57 drescherjm Issue Monitored: drescherjm 08-21-07 13:23 kern Note Added: 0002691 08-21-07 13:23 kern Status new => feedback 08-21-07 13:36 drescherjm Note Added: 0002692 08-23-07 10:34 drescherjm Note Added: 0002694 08-29-07 06:39 kern Note Added: 0002697 08-29-07 06:39 kern Status feedback => closed 08-29-07 06:39 kern Resolution open => unable to duplicate 09-10-07 12:23 drescherjm Status closed => feedback 09-10-07 12:23 drescherjm Resolution unable to duplicate => reopened 09-10-07 12:23 drescherjm Note Added: 0002727 09-10-07 18:46 drescherjm File Added: bacula-dir.status 09-10-07 18:50 drescherjm File Added: bacula-sd.log.new.bz2 09-10-07 18:51 drescherjm File Added: bacula-messages.log 09-10-07 18:53 drescherjm Note Added: 0002733 09-18-07 14:14 drescherjm Note Added: 0002786 09-23-07 15:16 kern Note Added: 0002804 09-27-07 11:45 drescherjm Note Added: 0002806 ====================================================================== |