From: <bac...@li...> - 2007-12-20 15:26:55
|
A NOTE has been added to this issue. ====================================================================== http://bugs.bacula.org/view.php?id=1018 ====================================================================== Reported By: fs Assigned To: ebollengier ====================================================================== Project: bacula Issue ID: 1018 Category: Director Reproducibility: random Severity: minor Priority: normal Status: feedback ====================================================================== Date Submitted: 11-21-2007 22:54 UTC Last Modified: 12-20-2007 15:26 UTC ====================================================================== Summary: Multiple jobs attempting to use same volume on two different drives Description: I have a single tape library with two drives. There are about 20 clients each with a single backup job, all scheduled to start at the same time. Periodically (about once or twice a week), two jobs appear to both request the same volume, but on two different drives. The job that requested the volume on the drive it was already mounted on finishes normally. The unlucky job ends up waiting for a manual mount request before it can proceed. I've attached a collection of the report emails from a nightly run that show the problem and some sanitized config files. Let me know if there's anything else that might help shed light on the problem. ====================================================================== ---------------------------------------------------------------------- ebollengier - 11-22-07 18:42 ---------------------------------------------------------------------- Hi, Can you run your bacula-sd and bacula-dir with debug=50, attach your job log output (more easy to read than mbox file) and the debug output. You can set OPTIONS='' to OPTION='-d50' to bacula-ctl-dir and bacula-ctl-sd and redirect all ouputs to a file ie : ${BACSDBIN}/bacula-sd $2 ${OPTIONS} -v -c ${BACSDCFG}/bacula-sd.conf >> /tmp/bacula-sd.debug 2>&1 and ${BACSDBIN}/bacula-dir $2 ${OPTIONS} -v -c ${BACSDCFG}/bacula-dir.conf >> /tmp/bacula-dir.debug 2>&1 ---------------------------------------------------------------------- fs - 11-23-07 02:06 ---------------------------------------------------------------------- I'll fire up the debug level once I get back to the office (Monday 26th) and report back when it captures the problem. ---------------------------------------------------------------------- kern - 11-27-07 11:59 ---------------------------------------------------------------------- Since I am not able to reproduce this, I recommend that you run the Storage daemon with -d50 and capture the output, then when the problem occurs post the output to the bug report. Please only include output of all jobs running at the time something goes wrong -- it would also be very helpful to include the Job report output of all the jobs running at the time of the failure. ---------------------------------------------------------------------- fs - 11-27-07 12:05 ---------------------------------------------------------------------- Okay, I've got the dir and sd running with -d50 and the output is being captured. I'll report back as soon as I can capture an instance of the failure. ---------------------------------------------------------------------- fs - 12-02-07 19:09 ---------------------------------------------------------------------- Here's the debug logs. I think I've trimmed out previous runs, but I'm not 100% sure as the debug output doesn't include timestamps. Let me know if there's anything else you need. ---------------------------------------------------------------------- ebollengier - 12-06-07 22:12 ---------------------------------------------------------------------- Could you attach the bacula job log ? ---------------------------------------------------------------------- fs - 12-06-07 22:20 ---------------------------------------------------------------------- Attached... ---------------------------------------------------------------------- ebollengier - 12-10-07 21:39 ---------------------------------------------------------------------- It may be because there is some code in src/stored/reserve.c where it attempts to switch drives that is not protected by a mutex so there is a race condition when two jobs are simultaneously running through the reservations code. Probably the best fix for this is to modify the code so that when it finds a tape on a different drive, Bacula switches to using the other drive -- this is a bit complicated, but code already exists in acquire.c for reading volumes that does this. The current code *attempts* to release the old volume if it is not in use, then reuse it on the current drive (current dev). I looked at the debug output and it does seem it is passing through that code. ---------------------------------------------------------------------- msmith - 12-20-07 15:26 ---------------------------------------------------------------------- I am also having this problem (version 2.2.6 as well). Just curious if there is an ETA of when it will be fixed? Is there anything I can do to help (get more debug info / test any patches)? Anyone have a temporary work-around? Issue History Date Modified Username Field Change ====================================================================== 11-21-07 22:54 fs New Issue 11-21-07 22:54 fs File Added: bacula-volume-contention.mbox 11-21-07 22:54 fs File Added: bacula-dir.conf 11-21-07 22:54 fs File Added: bacula-sd.conf 11-22-07 18:37 ebollengier Status new => assigned 11-22-07 18:37 ebollengier Assigned To => ebollengier 11-22-07 18:42 ebollengier Note Added: 0002967 11-23-07 02:06 fs Note Added: 0002969 11-27-07 11:59 kern Note Added: 0002994 11-27-07 11:59 kern Status assigned => feedback 11-27-07 12:05 fs Note Added: 0002995 12-02-07 19:08 fs File Added: bacula-dir.debug 12-02-07 19:09 fs File Added: bacula-sd.debug 12-02-07 19:09 fs Note Added: 0003005 12-06-07 22:12 ebollengier Note Added: 0003010 12-06-07 22:19 fs File Added: bacula-log.txt 12-06-07 22:20 fs Note Added: 0003011 12-10-07 21:39 ebollengier Note Added: 0003019 12-20-07 15:26 msmith Note Added: 0003031 ====================================================================== |