From: Josh F. <jf...@pv...> - 2008-05-13 14:18:36
|
Kern, Below is a condensed version of the log from a sequence of jobs using bacula-dir and bacula-sd v. 2.2.10-b2. I believe that the reason job 7272 below failed has something to do with bug 1083 (mult-drive disk autochanger and volume swapping between drives). I purposefully reset the job start times to prevent simultaneous jobs from starting. I inadvertently missed one and jobs 7273 and 7274 ran in parallel. This turned out to be interesting, because when jobs 7273 and 7274 were simultaneously started the drives were nearly in the same state as when job 7272 started. A usable volume was in drive 1 and an unusable volume (or no volume) was in drive 0. For the simultaneous jobs it worked. I think this might have been by chance, however, because it appears that job 7273 (using drive 0) did not have to swap from drive 1 because job 7274 (using drive 1) had already unloaded drive 1 to slot 34??? Also, both Dir and SD are running in the same Xen domU with only one virtual CPU assigned. Would job 7273 (the one using drive 0) had failed in the same way as job 7272 if there would have been another CPU available and the two jobs "really" started simultaneously? Anyway, I did not have debug level set in the SD, so I don't have much more for you at this time. I am now running SD with debug level 150 and will let you know what happens. Should I file another bug report if this can be reproduced? Thanks, Josh Fisher ### sequence of jobs before, during, and after volume swap error ### 12-May 08:10 drive 0 = slot-32 (pool=inc) - drive 1 = slot-34 (pool=inc) job 7263 started using drive 0 finds slot-32 (from pool="inc") in drive 0 unloads drive 0 into slot-32 loads slot-37 (from pool="cat") into drive 0 writes to slot-37 exits OK 12-May 11:10 drive 0 = slot-37 (pool=cat) - drive 1 = slot-34 (pool=inc) job 7264 using drive 0 finds slot-37 (pool=cat) in drive 0 unloads drive 0 into slot-37 loads slot-39 (pool=full) into drive 0 job fails before writing ("no data available" at append.c:159 - see note below) 12-May 14:01 drive 0 = slot-39 (pool=full) - drive 1 = slot-34 (pool=inc) job 7266 using drive 0 job failed (no route to host) 12-May 19:01 drive 0 = slot-39 (pool=full) - drive 1 = slot-34 (pool=inc) job 7268 using drive 1 finds slot-34 (pool=inc) in drive 1 writes to slot-34 exits OK 12-May 19:02 drive 0 = slot-39 (pool=full) - drive 1 = slot-34 (pool=inc) job 7269 using drive 1 finds slot-34 (pool=inc) in drive 1 writes to slot-34 exits OK 12-May 23:51 drive 0 = slot-39 (pool=full) - drive 1 = slot-34 (pool=inc) job 7272 using drive 0 finds slot-39 (pool=full) in drive 0 unloads drive 0 into slot 39 attempts to load slot-34 (pool=inc) into drive 0 fails - autochanger script returns (Storage Element 34 Empty (loaded in drive 1)) 13-May 00:31 drive 0 = unloaded - drive 1 = slot-34 (pool=inc) job 7273 started using drive 0 finds drive 0 unloaded loads slot-34 into drive 0 writes to slot-34 exits OK job 7274 started using drive 1 finds slot-34 (pool=inc) in drive 1 unloads drive 1 into slot 34 loads slot-32 (pool=inc) into drive 1 writes to slot-32 exits OK |