From: Lloyd B. <llo...@by...> - 2025-05-21 19:30:29
Interesting. I remember being confused about which `Maximum Concurrent
Jobs` to use (the `bacula-fd.conf` vs the director's config), but your
explanation makes a lot of sense. I hadn't realized/considered the
implications of having it set too low in the FD's config.
I'll test it and if the problem recurs over the next few days, I'll post
here again.
Thank you
Lloyd
On 5/21/25 09:39, Bill Arlofski via Bacula-users wrote:
> On 5/20/25 3:20 PM, Lloyd Brown wrote:
>> Hi all,
>>
>> I'm running into an issue with some bacula-fd instances and hoping
>> someone can point me in the right direction.
>>
>> In short: I have bacula-fd instances that are clearly running jobs
>> (confirmed via strace), but they often time out when I run status
>> client=CLIENTNAME. They only seem reliably responsive when idle.
>>
>> Details:
>>
>> * Bacula version: 9.6.6 (yes, I know it's old — upgrade is
>> planned).
>> * Setup: Two hosts (`zhomebackup[1-2]`) running both SD and
>> FD. A script at the beginning of each job snapshots NFS
>> shares, mounts them, and outputs file paths for backups.
>> * Problem: These hosts struggle to handle more than 6–7 jobs
>> effectively. Going beyond that causes a drop in aggregate
>> file scan rates.
>> * Attempted solution: Spun up additional FD instances on
>> separate ports (originally inside Docker, but now just
>> running natively on non-standard ports). These new instances are
>> /intermittently/ responsive to `status client`, even
>> with only 1–3 jobs. The original FD (on the default port) remains
>> responsive, even with 6–7 jobs.
>>
>> I'm wondering if this could be a shared resource issue or some FD
>> limitation I'm not accounting for. Or is there a better way to scale
>> job throughput?
>>
>> I've attached a tarball containing systemd service files, FD configs,
>> and relevant parts of the Director config, including an example job
>> definition.
>>
>> Any insights would be greatly appreciated.
>>
>> Thanks,
>> Lloyd
>
> Hello Lloyd,
>
> For Bacula, each connection to an FD is counted as a 'Job'.
>
> This means that on your FDs, once three jobs are running (six on the
> first one), the FD will not accept the new "job" connection for the
> `status client`, which produces exactly the symptom you are describing.
>
> MaximumConcurrentJobs settings grepped from your configs:
> ----8<----
> $ grep -ir maximum zhomebackup1/etc/bacula/bacula-fd.con*
> zhomebackup1/etc/bacula/bacula-fd.conf:  Maximum Concurrent Jobs = 6
> zhomebackup1/etc/bacula/bacula-fd.container1.conf:  #Maximum Concurrent Jobs = 20
> zhomebackup1/etc/bacula/bacula-fd.container1.conf:  Maximum Concurrent Jobs = 3
> zhomebackup1/etc/bacula/bacula-fd.container2.conf:  #Maximum Concurrent Jobs = 20
> zhomebackup1/etc/bacula/bacula-fd.container2.conf:  Maximum Concurrent Jobs = 3
> zhomebackup1/etc/bacula/bacula-fd.container3.conf:  #Maximum Concurrent Jobs = 20
> zhomebackup1/etc/bacula/bacula-fd.container3.conf:  Maximum Concurrent Jobs = 3
> ----8<----
>
> Just increase these FD settings and restart each FD, and you should be
> fine.
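>
> As a rough sketch (the resource name and port here are placeholders,
> not taken from your configs, and unrelated directives are elided),
> each FD's FileDaemon{} resource would end up looking something like:
>
> ----8<----
> # bacula-fd.conf (or bacula-fd.containerN.conf)
> FileDaemon {
>   Name = zhomebackup1-fd      # placeholder; keep your existing Name
>   FDport = 9102               # placeholder; keep your existing port
>   # Set high enough that status/console connections are never refused;
>   # the real job concurrency is throttled on the Director instead.
>   Maximum Concurrent Jobs = 20
> }
> ----8<----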
>
> A setting of `MaximumConcurrentJobs = 20`, as shipped with the
> default/example configs, is a good starting point for the FDs. You can
> then manage the actual number of concurrent jobs triggered on each
> client with the MaximumConcurrentJobs setting in the Client{} resources
> of the Director's configuration, which can be adjusted up or down
> without restarting the FDs - only a bconsole 'reload' command is needed
> for the Director to pick up the changes.
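>
> For example (a hedged sketch - the Name, Address, and Password below
> are invented placeholders, not from your attached configs), the
> Director-side throttle lives in the Client{} resource:
>
> ----8<----
> # bacula-dir.conf - Director's view of one client
> Client {
>   Name = zhomebackup1-fd        # placeholder; match your FD's Name
>   Address = zhomebackup1        # placeholder
>   Password = "secret"           # placeholder
>   # Actual cap on jobs the Director starts on this client.
>   # Tunable without touching the FD: edit, then in bconsole run
>   #   reload
>   Maximum Concurrent Jobs = 5
> }
> ----8<----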
>
>
> Hope this helps,
> Bill
>
>
>
> _______________________________________________
> Bacula-users mailing list
> Bac...@li...
> https://lists.sourceforge.net/lists/listinfo/bacula-users
--
Lloyd Brown
HPC Systems Administrator
Office of Research Computing
Brigham Young University
http://rc.byu.edu