From: Mantis B. T. <no...@bu...> - 2010-08-03 12:38:02
|
The following issue has been UPDATED. ====================================================================== http://bugs.bacula.org/view.php?id=1614 ====================================================================== Reported By: chibisuke Assigned To: ====================================================================== Project: bacula Issue ID: 1614 Category: bat Reproducibility: always Severity: minor Priority: low Status: feedback ====================================================================== Date Submitted: 2010-07-27 08:06 BST Last Modified: 2010-08-03 13:37 BST ====================================================================== Summary: Bat hangs when accessing Job List - MySQL 5.1.43 Query takes forever to complete Description: I'm using bacula in "bigger" enviornment (about 150 servers), and noticed that now after we're using it for 4 months now, bat takes forever to load the joblist. Bat seams to send a DB Query to the database, that takes now 20 minutes to complete for it to display the joblist. I tested modifying the query using a hex editor and succeded in so far, that it now loads the list fast, but this seams to cause other bad sideffects. In fact it looks like the 2 subquerys are the problem. if I replace them with "1" the problem is gone. I'm running bacula-dir itself on MySQL 5.1.43 on opensolaris. Also I tried finding the bottleneck of the query, but beside removing that subquerys I wouldn't know what to do else. Anyone else experienced this problem already maybe? What else can I do helping me to figure out where this problem comes from? ====================================================================== ---------------------------------------------------------------------- (0005524) chibisuke (reporter) - 2010-07-27 08:56 http://bugs.bacula.org/view.php?id=1614#c5524 ---------------------------------------------------------------------- The query executed is: SELECT Job.JobId AS JobId, Job.Name AS JobName, Client.Name AS Client, Job.Starttime AS JobStart, Job.Type as JobType, Job.Level As BackupLevel, Job.Jobfiles AS FileCount, Job.JobBytes AS Bytes, Job.JobStatus AS Status, Job.PurgedFiles AS Purged, FileSet.FileSet, Pool.Name AS Pool, (SELECT Media.VolumeName FROM JobMedia JOIN Media ON JobMedia.MediaId=Media.MediaId WHERE JobMedia.JobId = Job.JobId ORDER BY JobMediaId LIMIT 1) AS FirstVolume, (SELECT count(DISTINCT MediaId) FROM JobMedia WHERE JobMedia.JobId=Job.JobId) AS Volumed FROM Job JOIN Client ON (Client.ClientId = Job.ClientId) LEFT OUTER JOIN FileSet ON (FileSet.FileSetId=Job.FileSetId) LEFT OUTER JOIN Pool ON Job.PoolId = Pool.PoolId ORDER BY Job.JobId DESC LIMIT 150; Query Exectution Plan: | 1 | PRIMARY | Client | ALL | PRIMARY | NULL | NULL | NULL | 117 | Using temporary; Using filesort | | 1 | PRIMARY | Job | ref | ClientId | ClientId | 5 | bacula.Client.ClientId | 34 | Using where | | 1 | PRIMARY | FileSet | eq_ref | PRIMARY | PRIMARY | 4 | bacula.Job.FileSetId | 1 | | | 1 | PRIMARY | Pool | eq_ref | PRIMARY | PRIMARY | 4 | bacula.Job.PoolId | 1 | | | 3 | DEPENDENT SUBQUERY | JobMedia | ref | JobId | JobId | 4 | bacula.Job.JobId | 4520 | Using index | | 2 | DEPENDENT SUBQUERY | JobMedia | index | JobId | PRIMARY | 4 | NULL | 1 | Using where | | 2 | DEPENDENT SUBQUERY | Media | eq_ref | PRIMARY | PRIMARY | 4 | bacula.JobMedia.MediaId | 1 | | ---------------------------------------------------------------------- (0005525) chibisuke (reporter) - 2010-07-27 09:10 http://bugs.bacula.org/view.php?id=1614#c5525 ---------------------------------------------------------------------- Removing the 1st subquery (replacing it with "0 as FirstVolume"), resolves that problem, but seams to cause other sid effects when trying to manipulate the jobs which I don't know yet if they're originating from the change or not. ---------------------------------------------------------------------- (0005526) ebollengier (administrator) - 2010-07-28 15:00 http://bugs.bacula.org/view.php?id=1614#c5526 ---------------------------------------------------------------------- Can you try to run different parts of the query alone with timing information ? (the 2 sub queries and the global one without the two). Can you give us some information about your catalog, how many media,jobmedia and jobs have you ? How many jobmedia have you in average for a job ? ---------------------------------------------------------------------- (0005528) chibisuke (reporter) - 2010-07-28 15:38 http://bugs.bacula.org/view.php?id=1614#c5528 ---------------------------------------------------------------------- The outer query: SELECT Job.JobId AS JobId, Job.Name AS JobName, Client.Name AS Client, Job.Starttime AS JobStart, Job.Type as JobType, Job.Level As BackupLevel, Job.Jobfiles AS FileCount, Job.JobBytes AS Bytes, Job.JobStatus AS Status, Job.PurgedFiles AS Purged, FileSet.FileSet, Pool.Name AS Pool FROM Job JOIN Client ON (Client.ClientId = Job.ClientId) LEFT OUTER JOIN FileSet ON (FileSet.FileSetId=Job.FileSetId) LEFT OUTER JOIN Pool ON Job.PoolId = Pool.PoolId ORDER BY Job.JobId DESC LIMIT 150; 150 rows in set (0.70 sec) The first subquery: SELECT Media.VolumeName FROM JobMedia JOIN Media ON JobMedia.MediaId=Media.MediaId WHERE JobMedia.JobId = 10437 ORDER BY JobMediaId LIMIT 1; 1 row in set (0.00 sec) The 2nd subquery: SELECT count(DISTINCT MediaId) FROM JobMedia WHERE JobMedia.JobId=10437; 1 row in set (0.00 sec) So the individual querys are not a problem it seams, but only the combination of the query. count(media): 9690 count(JobMedia): 45808 count(Job): 10322 I will attach a MySQL profiler run of the query later today. ---------------------------------------------------------------------- (0005529) ebollengier (administrator) - 2010-07-28 15:46 http://bugs.bacula.org/view.php?id=1614#c5529 ---------------------------------------------------------------------- Thanks, very strange anyway... The two subqueries are not used in the JOIN or WHERE part, so they should be executed once per record (150), and as they are very fast, it shouldn't take 20mins. It looks to be a strange MySQL behavior. I'm waiting for your profiler output. ---------------------------------------------------------------------- (0005530) chibisuke (reporter) - 2010-07-28 16:43 http://bugs.bacula.org/view.php?id=1614#c5530 ---------------------------------------------------------------------- ouch! - the whole query took 150 rows in set (12 min 21.51 sec) and created 51639 lines in the profile. I attached it as a tab-seperated-value file. For convenience I'll include the headers of the profiling table here: | QUERY_ID | SEQ | STATE | DURATION | CPU_USER | CPU_SYSTEM | CONTEXT_VOLUNTARY | CONTEXT_INVOLUNTARY | BLOCK_OPS_IN | BLOCK_OPS_OUT | MESSAGES_SENT | MESSAGES_RECEIVED | PAGE_FAULTS_MAJOR | PAGE_FAULTS_MINOR | SWAPS | SOURCE_FUNCTION | SOURCE_FILE | SOURCE_LINE | to me that output looks like mysql is calling the subquery for every column in the table and applying the limit afterwards, so it might as well be a bug in mysql. I also attached my my.cnf just in case. ---------------------------------------------------------------------- (0005531) ebollengier (administrator) - 2010-07-28 16:51 http://bugs.bacula.org/view.php?id=1614#c5531 ---------------------------------------------------------------------- It looks to be a MySQL support issue, I'm also using 5.1.47, and when I'm running this query on the same average of data, it runs instantly. I would like to avoid to split the query. We will have to execute 300 queries instead of one... Can you dump your Job, JobMedia and Media table, then send me this dump on eri...@ba... ? ---------------------------------------------------------------------- (0005533) chibisuke (reporter) - 2010-07-28 17:53 http://bugs.bacula.org/view.php?id=1614#c5533 ---------------------------------------------------------------------- will do that tomorrow, when I'm on the office again. I can't access the machine from here. splitting the query isn't a solution as it would take a lot of time either. Maybe I find some time to rework that query as a join instead of subquery, that might fix it. ---------------------------------------------------------------------- (0005546) ebollengier (administrator) - 2010-08-03 13:34 http://bugs.bacula.org/view.php?id=1614#c5546 ---------------------------------------------------------------------- Thanks for your set of data, I loaded it on my PC, MySQL 5.1.47, and the average time for this query is about 8s. When I'm trying it on MySQL 5.0.51a, it's about 0s... It looks to be a sever regression on MySQL and I can imagine that other parameters can increase the query time. I also tested it on PostgreSQL, and the timing is also about 0s. Sorry, but as this is not a Bacula bug, I let you contact MySQL to expose them this regression, with your set of data, it should be rather easy for them to reproduce it and provide a fix. A simple workaround is to downgrade your MySQL version or to test other implementation like MariaDB. Issue History Date Modified Username Field Change ====================================================================== 2010-07-27 08:06 chibisuke New Issue 2010-07-27 08:56 chibisuke Note Added: 0005524 2010-07-27 09:10 chibisuke Note Added: 0005525 2010-07-27 11:17 mnalis Issue Monitored: mnalis 2010-07-28 15:00 ebollengier Note Added: 0005526 2010-07-28 15:00 ebollengier Status new => feedback 2010-07-28 15:01 ebollengier Severity major => minor 2010-07-28 15:38 chibisuke Note Added: 0005528 2010-07-28 15:46 ebollengier Note Added: 0005529 2010-07-28 16:38 chibisuke File Added: profile.rar 2010-07-28 16:43 chibisuke File Added: my.cnf 2010-07-28 16:43 chibisuke Note Added: 0005530 2010-07-28 16:51 ebollengier Note Added: 0005531 2010-07-28 17:53 chibisuke Note Added: 0005533 2010-08-03 13:34 ebollengier Note Added: 0005546 2010-08-03 13:37 ebollengier Priority normal => low 2010-08-03 13:37 ebollengier Resolution open => no change required 2010-08-03 13:37 ebollengier Summary Bacula hangs when accessing Job List - MySQL Query takes forever to complete => Bat hangs when accessing Job List - MySQL 5.1.43 Query takes forever to complete ====================================================================== |