Found this in the manual at: http://www.bacula.org/5.2.x-manuals/en/main/main/Configuring_Director.html#SECTION0014150000000000000000


Maximum Volume Jobs = positive-integer
This directive specifies the maximum number of Jobs that can be written to the Volume. If you specify zero (the default), there is no limit. Otherwise, when the number of Jobs backed up to the Volume equals positive-integer, the Volume will be marked Used. When the Volume is marked Used it can no longer be used for appending Jobs, much like the Full status, but it can be recycled if recycling is enabled, and thus used again. By setting Maximum Volume Jobs to one, you get the same effect as setting UseVolumeOnce = yes.

The value defined by this directive in the bacula-dir.conf file is the default value used when a Volume is created. Once the volume is created, changing the value in the bacula-dir.conf file will not change what is stored for the Volume. To change the value for an existing Volume you must use the update command in the Console.

If you are running multiple simultaneous jobs, this directive may not work correctly because when a drive is reserved for a job, this directive is not taken into account, so multiple jobs may try to start writing to the Volume. At some point, when the Media record is updated, multiple simultaneous jobs may fail since the Volume can no longer be written.
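
As an aside, note the second paragraph above: changing the directive in
bacula-dir.conf only affects Volumes created afterwards.  For Volumes
already in the catalog you'd update each one from bconsole, along these
lines (volume name invented; run "help update" to confirm the parameter
names on your version):

    *update volume=Vol-0001 MaxVolJobs=0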


Look at the third paragraph.  I imagine this may be what you're hitting.  Basically, limiting the number of jobs per volume fights with the multiplexing the SD does to handle multiple concurrent jobs to the same pool.  I think this could be a problem at any Maximum Volume Jobs value, but you'd get bitten by it more often the lower the setting.

I think your two options are job-per-pool (what I do) or multiple concurrent jobs to one pool, and let the SD be free to multiplex jobs onto volumes however it sees fit.
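
If you go the job-per-pool route, this is roughly the shape of what my
script generates, with all names invented and the passwords, filesets,
and schedules elided -- treat it as a sketch of the idea, not a drop-in
config:

    # bacula-dir.conf (fragment) -- one Pool/Storage pair per Job
    Pool {
      Name = jobA-pool
      Pool Type = Backup
      Storage = jobA-storage        # volumes for this pool land on this storage
      Label Format = "jobA-"        # auto-label volumes per job
      Recycle = yes
      AutoPrune = yes
      Volume Retention = 30 days
    }

    Storage {
      Name = jobA-storage
      Address = sd.example.com
      Password = "..."
      Device = jobA-device          # must match a Device Name in bacula-sd.conf
      Media Type = jobA-file        # unique per pool, so the SD can't mix volumes
    }

    Job {
      Name = jobA
      Type = Backup
      Client = hostA-fd
      FileSet = "hostA-files"
      Pool = jobA-pool
      Schedule = "Nightly"
      Messages = Standard
    }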

I'm curious -- are you really wanting exactly one job per volume (and if so, I'd be curious why), or are you rather trying to limit the size of the files backing the volumes, possibly to make restores less time consuming (I don't know that it does), or to work around filesystem limitations?  In the latter cases, I think you'd be able to set Max Volume Files and/or Max Volume Bytes and NOT stomp on the multiplexing the SD does; see the sketch below.
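
In other words, if the goal is just to cap volume size, something like
this in the Pool resource should get you there without fighting the SD
(the pool name and the 5G figure are just examples, untested):

    Pool {
      Name = file-pool
      Pool Type = Backup
      Maximum Volume Bytes = 5G     # volume is marked Used once it hits ~5 GB
      Recycle = yes
      AutoPrune = yes
      Volume Retention = 30 days
    }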

I think the race condition comes from the delay in the media record update (the docs say it happens "at some point").  This could have a lot of variation in behavior depending on the specific timings of the kind of jobs you were doing.

Also, earlier you said "To this end I have multiple devices "pointing" at the same disk based pool."  I am confused (and of course I haven't looked at your config -- sorry about that) because Pools reference Storage resources, which in turn reference Device resources, which are defined in the SD's configuration.  Do you rather mean that you have multiple Device resources in the SD config that are using the same "Archive Device" and "Media Type"?

The manual also says (see the warning near the end):

Device = device-name
This directive specifies the Storage daemon's name of the device resource to be used for the storage. If you are using an Autochanger, the name specified here should be the name of the Storage daemon's Autochanger resource rather than the name of an individual device. This name is not the physical device name, but the logical device name as defined on the Name directive contained in the Device or the Autochanger resource definition of the Storage daemon configuration file. You can specify any name you would like (even the device name if you prefer) up to a maximum of 127 characters in length. The physical device name associated with this device is specified in the Storage daemon configuration file (as Archive Device). Please take care not to define two different Storage resource directives in the Director that point to the same Device in the Storage daemon. Doing so may cause the Storage daemon to block (or hang) attempting to open the same device that is already open. This directive is required.


Maybe you're running into that.
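
That is, the pattern the warning is about would be two Director Storage
resources naming the same SD Device, something like this (names
invented):

    # bacula-dir.conf (fragment) -- the pattern to AVOID
    Storage {
      Name = storage-a
      Address = sd.example.com
      Password = "..."
      Device = FileStorage1         # same SD Device...
      Media Type = File
    }

    Storage {
      Name = storage-b
      Address = sd.example.com
      Password = "..."
      Device = FileStorage1         # ...referenced twice -- SD may block/hang
      Media Type = File
    }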

Hope that helps.  I'd look at your configs but I am heading out on vacation and haven't finished packing :)

-Jonathan Hankins



On Tue, Dec 17, 2013 at 9:00 PM, Mike Brady <mike.brady@devnull.net.nz> wrote:
Hi

If this isn't supposed to work then that would mean that there can
only ever be one volume per pool mounted for writing at any one time.
Is that a known Bacula limitation?

I believe that the jobs are "working".  I have restored a number of
the volumes and they have always contained what I thought they should,
but as you say I may just have been lucky.

I have run the director in debug mode on my test system today and it
looks like there is a race of some sort because multiple jobs are
picking up the same volume initially, but only one job ever uses that
volume.  The other jobs always seem to move on to another volume. At
least I think that is what the log is showing.  I have attached the
debug log.

Bug or by design, something isn't right, so I guess it is back to the
drawing board :-(

Thanks

Mike

Quoting Jonathan Hankins <jhankins@homewood.k12.al.us>:

> Heya,
>
>
> On Tue, Dec 17, 2013 at 2:42 PM, Mike Brady <mike.brady@devnull.net.nz> wrote:
>>
>> I am doing one job per volume which means that I need to have multiple
>> volumes from the same pool mounted at the same time in order to do
>> concurrent jobs.
>
>
> I don't think this is supposed to work.  I wanted to do something
> similar, and I wound up doing one pool/storage/media type/etc. per job, and
> wrote a script to generate my configs from templates.  I made it a bit too
> complex, as most of my jobs turned out looking the same, and I didn't need
> as much flexibility as I designed in, but you can knock something together
> pretty quickly.
>
>
>
>> To this end I have multiple devices "pointing" at
>> the same disk based pool.  This works except for the intermittent
>> problem with allocating a volume as indicated in my original post.
>
>
> I think that support for concurrent writers happens in the SD, at the
> pool level, by interleaving writes from concurrent jobs onto the volume
> mounted for that pool.  If I had to guess, I'd say either it appears to
> be working but isn't really writing out jobs correctly, or you're just
> getting really lucky.
>
> -Jonathan
>




--
------------------------------------------------------------------------
Jonathan Hankins    Homewood City Schools

The simplest thought, like the concept of the number one,
has an elaborate logical underpinning. - Carl Sagan

jhankins@homewood.k12.al.us
------------------------------------------------------------------------