From: Mantis B. T. <no...@bu...> - 2017-08-01 21:59:52
The following issue has been SUBMITTED.
======================================================================
http://bugs.bacula.org/view.php?id=2299
======================================================================
Reported By:     Phil Stracchino
Assigned To:
======================================================================
Project:         Bacula Bug Reports
Issue ID:        2299
Category:        Director
Reproducibility: always
Severity:        major
Priority:        normal
Status:          new
======================================================================
Date Submitted:  2017-08-01 22:59 BST
Last Modified:   2017-08-01 22:59 BST
======================================================================
Summary: Release 9.0.3: Database write batch size code does not work (and is set to the wrong value anyway)

Description:
The Director contains code to batch attribute writes into chunks throughout the job instead of writing them all at the end. However, this code demonstrably does not work. By modifying the code for testing purposes to make the batch table a global table instead of a thread-private temporary table, it is possible to see that the batch table grows monotonically throughout the job, and is then all spooled into the database at the very end in one massive blast that may be millions of rows on a large backup.

In most cases this merely causes a performance problem, as Bacula hammers the database at the end of each job. However, when Bacula is running against a MySQL-alike database with Galera synchronous clustering, because the batching does not work, any job that backs up more than 128K files will fail with a 'wsrep_max_ws_rows exceeded' error. It is reasonable to expect that it may also cause similar failures when running against Oracle MySQL 5.7's Group Replication feature, as Group Replication is Oracle's fairly blatant attempt to copy Galera's functionality.
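The intended batching behavior described above can be sketched as follows. This is a hypothetical illustration, not Bacula's actual code: the class and method names are invented, and the flush stands in for the Director's INSERT-from-batch-table step. The point is that a working periodic flush keeps every write set under the configured batch size, so a Galera wsrep_max_ws_rows limit is never exceeded; the reported bug is that this periodic flush never fires.

```python
# Hypothetical sketch of the batching the Director is *supposed* to do.
# Names are illustrative, not Bacula's actual identifiers.

class AttributeSpooler:
    def __init__(self, max_batch_size=25000):
        self.max_batch_size = max_batch_size
        self.batch = []          # stands in for the thread-private batch table
        self.flush_count = 0     # number of flushes (INSERT ... SELECT) issued

    def add_attribute(self, row):
        self.batch.append(row)
        # This periodic check is what the bug report says never takes effect:
        # in practice the batch grows monotonically until the end of the job.
        if len(self.batch) >= self.max_batch_size:
            self.flush()

    def flush(self):
        if self.batch:
            # In Bacula this would be one bulk insert from the batch table.
            self.flush_count += 1
            self.batch.clear()

spooler = AttributeSpooler(max_batch_size=1000)
for i in range(128 * 1024):           # a job backing up 128K (131072) files
    spooler.add_attribute(("file%d" % i,))
spooler.flush()                        # final flush at end of job

# 131 full flushes of 1000 rows plus one final flush of 72 rows.
print(spooler.flush_count)             # 132
```

With batching working, no single database write set ever exceeds max_batch_size rows; with it broken, the single end-of-job flush is the full 131072 rows, which is exactly where a 128K-file job trips a Galera row limit.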
This is not a new bug; I don't know when this batch size limit code was added, but watching DB usage patterns across multiple Bacula releases makes it clear that it has never worked.

Steps to Reproduce:
Method 1: Run Bacula against a Galera cluster (MariaDB + Galera, or Percona XtraDB Cluster). Observe that all jobs over 128K files fail at the end of the job with 'wsrep_max_ws_rows exceeded'. The failure point can be moved by changing wsrep_max_ws_rows, but this variable cannot be increased indefinitely.
Method 2: Change the hardcoded batch limit from 500000 to 25000, 10000, or even 1000, and recompile. Observe that the behavior does not change, and jobs against Galera clusters still fail at 128K files.
Method 3: Monitor any database during a job and observe that no attribute inserts take place until right at the end of the job, after data writing is complete.
Method 4: Monitor the batch table by any means during a job and observe that rather than repeatedly growing and being flushed, it grows monotonically until the end of the job, when it is all dumped into the DB at once.

Additional Information:
From discussions with Kern, the batch size limit was SUPPOSED to be set to 25000. It is actually set to 500000. However, this is moot, because it doesn't work anyway. The ideal would be if this were made a Director configuration tunable (MaxBatchSize, say) defaulting to 25000, to enable ANYONE to tune it to whatever value they found performed best with *their* backing database.

======================================================================
Issue History
Date Modified     Username          Field      Change
======================================================================
2017-08-01 22:59  Phil Stracchino   New Issue
======================================================================
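For illustration, the MaxBatchSize tunable proposed above might behave like this. This is a hypothetical sketch only: MaxBatchSize is not an existing Bacula directive, and the parsing shown is invented for the example, not the Director's actual configuration parser. It simply shows the suggested semantics, a per-site value with a default of 25000.

```python
# Hypothetical sketch of a MaxBatchSize Director tunable defaulting to
# 25000, as suggested above. Directive name and parsing are illustrative,
# not actual Bacula configuration code.

DEFAULT_MAX_BATCH_SIZE = 25000

def parse_max_batch_size(director_config_lines):
    """Scan Director resource lines for a 'MaxBatchSize = N' directive."""
    for line in director_config_lines:
        key, _, value = line.partition("=")
        if key.strip().lower() == "maxbatchsize":
            n = int(value.strip())
            if n < 1:
                raise ValueError("MaxBatchSize must be a positive integer")
            return n
    return DEFAULT_MAX_BATCH_SIZE

# A site running Galera might tune it well below wsrep_max_ws_rows:
print(parse_max_batch_size(["Name = mydir-dir", "MaxBatchSize = 10000"]))  # 10000
# With no directive present, the suggested default applies:
print(parse_max_batch_size(["Name = mydir-dir"]))                          # 25000
```

Making the value configurable rather than hardcoded would let each site balance flush frequency against per-write-set size for their particular backing database, which is the substance of the request.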