From: <bac...@li...> - 2005-06-28 16:56:52
|
A BUGNOTE has been added to this bug. ====================================================================== http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000352 ====================================================================== Reported By: kagato Assigned To: ====================================================================== Project: bacula Bug ID: 352 Category: Director Reproducibility: sometimes Severity: major Priority: normal Status: feedback ====================================================================== Date Submitted: 06-18-2005 07:18 PDT Last Modified: 06-28-2005 09:56 PDT ====================================================================== Summary: Autochanger Doesn't Try To Mount Tape Description: Previously working autochanger setup. I have an Exabyte VXA 1U 10-slot changer with a bar code reader. The plan has been to barcode the tapes (done), have Bacula read the codes from the changer (works), set a fairly short recycle time (done), and have the customer rotate through the tapes in a linear circular fashion. Thus, Bacula should decide to recycle the old tapes as they go through. There has been the issue of running "update slots" before the backup, although I have a cron-job with bconsole to do this. This generally works for about two weeks. At some point, it mysteriously decides to stop loading tapes. Running the SD in debug mode indicates that it repeatedly tries to open the tape device without ever attempting to load a tape. The only way I can fix it is to nuke the database file and start from scratch. As you'd imagine that's not what I would consider a long-term solution. I don't have time to sanitize the configuration files to post them right now, but I've included below a snippet of the errors produced in the log. ====================================================================== ---------------------------------------------------------------------- kern - 06-18-2005 07:44 PDT ---------------------------------------------------------------------- First, there is no reason to nuke the database and start over -- Bacula doesn't require that in any situation. The error you are getting is something that shouldn't happen. It is getting an I/O error trying to open the tape. This indicates that there is no tape in the drive. I suspect that this is a support problem rather than a bug. Before continuing with the bug report, this is something that you should bring up and discuss with the bacula-users list where you can get help. They may be able to immediately help you if it is a problem with your conf files, otherwise I suggest that you do a "list volume=nnn" on each of your volumes at the beginning when things work, then when it doesn't work, do another "list volume=nnn" on each volume and send the two outputs to the list. ---------------------------------------------------------------------- kern - 06-22-2005 13:22 PDT ---------------------------------------------------------------------- Most likely not a bug, but a configuration problem. If consultation with the bacula-users list indicates it is a bug, please re-open it. ---------------------------------------------------------------------- kagato - 06-23-2005 14:36 PDT ---------------------------------------------------------------------- Okay, I've determined what was going wrong. Perhaps this was intentional in the design, so I'll ask here (for documentation). Every night, we swap tapes. Depending on the status of the next backup (Full or Diff) we put in different numbers of tapes. After the tapes are changed (but before the next backup) a cron job runs "update slots". This updates the InChanger and Slot values from the barcodes on the changer via mtx. When slots are empty, the mtx-changer script does not report them. The update slots command does NOT mark the unreported slots as InChanger=0. As such, it tries to load a tape from a slot that is empty and fails. I've currently worked around this by having my update-slots script do and sqlquery to "UPDATE Media SET InChanger=0;". Is this intentional? Is there a solution I've missed. Thanks for your consideration. ---------------------------------------------------------------------- kern - 06-25-2005 01:06 PDT ---------------------------------------------------------------------- No, the fact that slots not reported is not a bug (unless I am missunderstanding your problem). It is intentional. It permits users to update or scan a small number of slots rather than all. Just imagine if you had a 30,000 tape autochanger and each time you changed one volume you needed to rescan the whole list. It appears to me that your mtx-changer script (perhaps the default one?) is broken. Your mtx-script should report all slots in the specified range (default full range) whether or not they have Volumes. If this is a behavior of the default mtx-changer script, please open a new bug report with specific examples of "list volumes" "update slots" (output from the mtx-changer), and "list volumes". ---------------------------------------------------------------------- kagato - 06-28-2005 09:49 PDT ---------------------------------------------------------------------- I hate to keep reopening this bug, but I'm afraid the mtx-changer script appears fine. Sorry for the inconvenience. The output of the script is: 1:G0000007 2:D0000005 3:D0000006 4:G0000004 5:D0000007 6:G0000006 10:CLNA0001 which is right for my case. At first I thought you intended for it to print this: 1:G0000007 2:D0000005 3:D0000006 4:G0000004 5:D0000007 6:G0000006 7: 8: 9: 10:CLNA0001 So I modified it to do so. Unfortunately, this still didn't fix the problem (the documentation applies that only full slots should be reported). Here is a succint sequence of events that always triggers the problem: Two tapes, barcoded A0000001 and A0000002. Empty database. Run "label barcodes" and it creates two volume entries as follows: +---------+------------+------+-----------+ | MediaId | VolumeName | Slot | InChanger | +---------+------------+------+-----------+ | 1 | A0000001 | 1 | 1 | | 2 | A0000002 | 2 | 1 | +---------+------------+------+-----------+ This is correct. If I run "update slots" it still is correct. Without even backing up, I remove A0000001. Now I run "update slots" again. It prints: 3301 Issuing autochanger "loaded drive 0" command. 3302 Autochanger "loaded drive 0", result: nothing loaded. 3306 Issuing autochanger "list" command. Catalog record for Volume "A0000002" is up to date. Note that it does not clear the InChanger flag on A0000001 (despite the slot being in the range of the scan). I would expect something like "UPDATE Media SET InChanger = 0 WHERE MediaId=1;". When it attempts to backup, it does something like this: backup-dir: Start Backup JobId 89, Job=NightlyDatastore.2005-06-03_20.50.01 VXA: 3301 Issuing autochanger "loaded drive 0" command. VXA: 3302 Autochanger "loaded drive 0", result: nothing loaded. VXA: 3304 Issuing autochanger "load slot 1, drive 0" command. VXA: 3992 Bad autochanger "load slot 1, drive 0": ERR=Child exited with code 1. VXA: NightlyDatastore.2005-06-03_20.50.01 Fatal error: device.c:317 Unable to open device /dev/nst0. ERR=dev.c:289 stored: unable to open device /dev/nst0: ERR=Input/output error datastore-fd: NightlyDatastore.2005-06-03_20.50.01 Fatal error: job.c:1665 Bad response to Append Data command. Wanted 3000 OK data, got 3903 Error append data Essentially it attempts to mount A0000001 loaded from slot 1, which makes sense because the database lists it as InChanger=1 and Slot=1. Am I missing something? Is this fixed in a version newer than 1.36.2? On an empty slot X, should you issue something like "UPDATE Media SET InChanger=0 WHERE Slot=X;"? ---------------------------------------------------------------------- kagato - 06-28-2005 09:56 PDT ---------------------------------------------------------------------- I think you can close this. After scouring the bug tracker for mentions of InChanger (since I first didn't have a clue that this was the problem), I've found that this is a duplicate of bugs http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000017 and http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000252. My current workaround script works like a charm, I'd like to close with the following script which I would ask you direct others towards if they experience the same problem. Since Debian Sarge now distributes 1.36.2, I suspect more than a few stability-minded admins will experience this problem. I use this as a nightly cronjob (can't seem to run it in a RunBeforeJob because of database locking issues). Note that it is important to only run this when there are no jobs running. --snip-- #!/bin/sh exec /usr/bin/bconsole<<EOT sqlquery UPDATE Media SET InChanger=0; update slots exit EOT --snip-- Thanks for all of your help. Bug History Date Modified Username Field Change ====================================================================== 06-18-05 07:18 kagato New Bug 06-18-05 07:44 kern Bugnote Added: 0000975 06-18-05 07:44 kern Status new => feedback 06-22-05 13:22 kern Bugnote Added: 0000999 06-22-05 13:22 kern Resolution open => not a bug 06-22-05 13:22 kern Status feedback => closed 06-23-05 14:36 kagato Bugnote Added: 0001018 06-23-05 14:36 kagato Resolution not a bug => reopened 06-23-05 14:36 kagato Status closed => feedback 06-25-05 01:06 kern Bugnote Added: 0001021 06-25-05 01:06 kern Resolution reopened => not a bug 06-25-05 01:06 kern Status feedback => closed 06-28-05 09:49 kagato Bugnote Added: 0001024 06-28-05 09:49 kagato Resolution not a bug => reopened 06-28-05 09:49 kagato Status closed => feedback 06-28-05 09:56 kagato Bugnote Added: 0001025 ====================================================================== |