From: <pm...@ci...> - 2008-03-29 06:13:38
On Fri, Mar 28, 2008 at 02:27:38PM +0100, Kern Sibbald wrote:
! Thanks for the patch. Though I did it slightly differently, I have
! implemented the concept in the current development trunk and in the 2.2.9
! beta release.
! PS: If you get a chance, I would appreciate it if you would try the 2.2.9-b3
! and make sure the problem does not exist any more.

Oh wow, thanks for the info. And good timing this is - I just finished most of the issues with my OS upgrade (same game: bugs to fix and to report). :)

2.2.9-b3 is installed and running, and at first glance it looks good. As far as I can see, my first patch (rev. 6377) did not yet make it into this release.

Now concerning this one: the problem has not appeared so far, and from reading the source I would not expect it to. You made a more drastic change, stopping *all* non-writing jobs from updating the media.storageid column - I like it better this way; it is more consistent. (Since I did not know precisely what this column is used for, I tried to make the smallest possible modification. If you open this up to the most straightforward solution, then that is just great. :))

But, well, I am not really sure whether I should already speak up about this - though staying silent might also be wrong. At the moment I do not have any evidence yet, and it might all be the weirdness of strange coincidences. To put it short: it looks like the SD has caught some kind of Alzheimer's; it has difficulties remembering which Volume is in which drive.

Just telling the story as it is: since we found the problem with the concurrency counters going negative in jobq.c, my installation started to become real fun - meaning it started to work about the way I think it should. Then I rebuilt the whole installation (OS plus applications) of my backend cluster to a new version, plus a major tidy-up of everything. In the end I reinstalled Bacula (2.2.8) and ran all the necessary jobs plus a big migration.
It ran for nearly a day, because the machine was still loaded with compiling other applications, so there were hundreds of scheduled jobs during the migration; if there still were a conflict, it should have shown up. Actually there is one I know about, and we will have to look at it in due time. But apart from that, it worked its way through smoothly and unravelled everything cleanly.

Today, I ran a small migration with 2.2.9-b3 as a test, and I got half a dozen failed scheduled jobs from conflicting mounts in the autochanger, and even a completely stalled drive that needed manual intervention. It seems not to recognize when a volume that it wants to use in one drive is already mounted in another drive. (*) The good thing is: it recovered, rescheduled the failed jobs, and did not need a restart.

(*) The virtual autochanger script that you supply ('disk-changer') does not care about conflicting mounts - it will happily mount the same Volume into two drives at the same time (and if one access is a read and the other a write, that may even work). My script denies this - like a real autochanger would.

Basically, that problem was already present before - but it appeared rather seldom, and I was still trying to figure out the exact circumstances. So this might just be a strange coincidence. I am sorry that right now I do not have the time to work my way through the SVN changes looking for something that might match my observation. So I only speak up, with the plea to take this "with a grain of salt", as it might well be a non-issue.

rgds,
PMc
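PS: In case it helps to see what I mean by "denies this", the idea in my changer wrapper is roughly the following. This is only a simplified sketch of the check, not the actual script: the state directory, file layout, and function names are invented here for illustration (a real changer script must of course also implement the loaded/slots/etc. commands that the SD expects).

```shell
#!/bin/sh
# Sketch: refuse to load a volume slot that is already loaded in
# another drive. One state file per drive; its content is the slot
# number currently loaded there. (Layout invented for illustration.)

STATE_DIR="${STATE_DIR:-/tmp/changer-state}"
mkdir -p "$STATE_DIR"

# slot_in_use SLOT DRIVE
# Prints the number of another drive already holding SLOT and returns
# success; returns failure if no other drive holds it.
slot_in_use() {
    slot="$1"; drive="$2"
    for f in "$STATE_DIR"/drive-*; do
        [ -f "$f" ] || continue
        other="${f##*drive-}"
        [ "$other" = "$drive" ] && continue
        if [ "$(cat "$f")" = "$slot" ]; then
            echo "$other"
            return 0
        fi
    done
    return 1
}

# do_load SLOT DRIVE - record the mount, or refuse a conflicting one.
do_load() {
    slot="$1"; drive="$2"
    if holder=$(slot_in_use "$slot" "$drive"); then
        echo "ERROR: slot $slot already loaded in drive $holder" >&2
        return 1   # a real autochanger would refuse here, too
    fi
    echo "$slot" > "$STATE_DIR/drive-$drive"
}

# do_unload DRIVE - forget the mount.
do_unload() {
    drive="$1"
    rm -f "$STATE_DIR/drive-$drive"
}
```

So a second "load" of the same slot into a different drive fails instead of silently producing two mounts of one Volume, which is what triggers the conflicts described above when the supplied disk-changer is used.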