From: <bac...@li...> - 2004-09-20 11:42:48
A BUGNOTE has been added to this bug.
======================================================================
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000106
======================================================================
Reported By:        martin
Assigned To:        Dan Langille
======================================================================
Project:            bacula
Bug ID:             106
Category:           Director
Reproducibility:    always
Severity:           major
Priority:           normal
Status:             assigned
======================================================================
Date Submitted:     09-14-2004 05:16 PDT
Last Modified:      09-20-2004 04:49 PDT
======================================================================
Summary:            VolumeToCatalog verify jobs are unusably slow

Description:
Each file is taking 35 seconds! The problem is in db_get_file_record(),
which accesses the file table like this:

  SELECT FileId, LStat, MD5 FROM File
  WHERE File.JobId=260
    AND File.PathId=25180
    AND File.FilenameId=243613

but there is no index for jobid, pathid and filenameid.
======================================================================

----------------------------------------------------------------------
 martin - 09-15-2004 05:07 PDT
----------------------------------------------------------------------
DiskToCatalog verify jobs are also impossibly slow, e.g. doing

  SELECT FileId, LStat, MD5 FROM File,Job
  WHERE File.JobId=Job.JobId
    AND File.PathId=2027
    AND File.FilenameId=7839
    AND Job.Type='B'
    AND Job.JobSTATUS='T'
    AND ClientId=1
  ORDER BY StartTime DESC LIMIT 1

at 30 seconds per file.

----------------------------------------------------------------------
 Dan Langille - 09-15-2004 05:57 PDT
----------------------------------------------------------------------
Good! We have a query and a time.
Please paste the output of the following:

  explain analyse
  SELECT FileId, LStat, MD5
  FROM File,Job
  WHERE File.JobId=Job.JobId
  AND File.PathId=2027
  AND File.FilenameId=7839
  AND Job.Type='B'
  AND Job.JobSTATUS='T'
  AND ClientId=1
  ORDER BY StartTime DESC LIMIT 1

Then run "vacuum analyse" and then do the above again, and give us the
results.

----------------------------------------------------------------------
 martin - 09-15-2004 06:28 PDT
----------------------------------------------------------------------
Thanks for the suggestions. I ran this as the pgsql user and the vacuum
took several minutes, but it doesn't seem to have changed much for the
query.

bacula=# explain analyse
bacula-# SELECT FileId, LStat, MD5
bacula-# FROM File,Job
bacula-# WHERE File.JobId=Job.JobId
bacula-# AND File.PathId=2027
bacula-# AND File.FilenameId=7839
bacula-# AND Job.Type='B'
bacula-# AND Job.JobSTATUS='T'
bacula-# AND ClientId=1
bacula-# ORDER BY StartTime DESC LIMIT 1;
                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=18786.64..18786.64 rows=1 width=105) (actual time=34102.67..34102.68 rows=1 loops=1)
   ->  Sort  (cost=18786.64..18786.64 rows=1 width=105) (actual time=34102.65..34102.66 rows=2 loops=1)
         Sort Key: job.starttime
         ->  Nested Loop  (cost=0.00..18786.63 rows=1 width=105) (actual time=20545.13..34102.47 rows=3 loops=1)
               ->  Seq Scan on file  (cost=0.00..18783.60 rows=1 width=93) (actual time=20519.81..34051.89 rows=3 loops=1)
                     Filter: ((pathid = 2027) AND (filenameid = 7839))
               ->  Index Scan using job_pkey on job  (cost=0.00..3.02 rows=1 width=12) (actual time=16.81..16.82 rows=1 loops=3)
                     Index Cond: ("outer".jobid = job.jobid)
                     Filter: (("type" = 'B'::bpchar) AND (jobstatus = 'T'::bpchar) AND (clientid = 1))
 Total runtime: 34102.88 msec
(10 rows)

bacula=# vacuum analyse;
VACUUM
bacula=# explain analyse
bacula-# SELECT FileId, LStat, MD5
bacula-# FROM File,Job
bacula-# WHERE File.JobId=Job.JobId
bacula-# AND File.PathId=2027
bacula-# AND File.FilenameId=7839
bacula-# AND Job.Type='B'
bacula-# AND Job.JobSTATUS='T'
bacula-# AND ClientId=1
bacula-# ORDER BY StartTime DESC LIMIT 1;
                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=274212.10..274212.11 rows=1 width=109) (actual time=34092.11..34092.12 rows=1 loops=1)
   ->  Sort  (cost=274212.10..274212.11 rows=1 width=109) (actual time=34092.10..34092.11 rows=2 loops=1)
         Sort Key: job.starttime
         ->  Nested Loop  (cost=0.00..274212.09 rows=1 width=109) (actual time=20760.43..34092.05 rows=3 loops=1)
               ->  Seq Scan on file  (cost=0.00..274207.51 rows=1 width=97) (actual time=20742.78..34053.71 rows=3 loops=1)
                     Filter: ((pathid = 2027) AND (filenameid = 7839))
               ->  Index Scan using job_pkey on job  (cost=0.00..4.57 rows=1 width=12) (actual time=12.73..12.74 rows=1 loops=3)
                     Index Cond: ("outer".jobid = job.jobid)
                     Filter: (("type" = 'B'::bpchar) AND (jobstatus = 'T'::bpchar) AND (clientid = 1))
 Total runtime: 34092.21 msec
(10 rows)

bacula=# \dt
             List of relations
 Schema |     Name     | Type  | Owner
--------+--------------+-------+-------
 public | basefiles    | table | pgsql
 public | client       | table | pgsql
 public | counters     | table | pgsql
 public | file         | table | pgsql
 public | filename     | table | pgsql
 public | fileset      | table | pgsql
 public | job          | table | pgsql
 public | jobmedia     | table | pgsql
 public | media        | table | pgsql
 public | path         | table | pgsql
 public | pool         | table | pgsql
 public | unsavedfiles | table | pgsql
 public | version      | table | pgsql
(13 rows)

bacula=# \di
                        List of relations
 Schema |             Name              | Type  | Owner |    Table
--------+-------------------------------+-------+-------+--------------
 public | basefiles_pkey                | index | pgsql | basefiles
 public | client_name_idx               | index | pgsql | client
 public | client_pkey                   | index | pgsql | client
 public | counters_pkey                 | index | pgsql | counters
 public | file_pkey                     | index | pgsql | file
 public | filename_name_idx             | index | pgsql | filename
 public | filename_pkey                 | index | pgsql | filename
 public | fileset_name_idx              | index | pgsql | fileset
 public | fileset_pkey                  | index | pgsql | fileset
 public | job_media_job_id_media_id_idx | index | pgsql | jobmedia
 public | job_name_idx                  | index | pgsql | job
 public | job_pkey                      | index | pgsql | job
 public | jobmedia_pkey                 | index | pgsql | jobmedia
 public | media_pkey                    | index | pgsql | media
 public | media_volumename_id           | index | pgsql | media
 public | path_name_idx                 | index | pgsql | path
 public | path_pkey                     | index | pgsql | path
 public | pool_name_idx                 | index | pgsql | pool
 public | pool_pkey                     | index | pgsql | pool
 public | unsavedfiles_pkey             | index | pgsql | unsavedfiles
(20 rows)

bacula=#

----------------------------------------------------------------------
 Dan Langille - 09-15-2004 10:57 PDT
----------------------------------------------------------------------
I have good news. This takes a while on my box too:

bacula=# explain analyse
bacula-# SELECT FileId, LStat, MD5
bacula-# FROM File,Job
bacula-# WHERE File.JobId=Job.JobId
bacula-# AND File.PathId=2027
bacula-# AND File.FilenameId=7839
bacula-# AND Job.Type='B'
bacula-# AND Job.JobSTATUS='T'
bacula-# AND ClientId=1
bacula-# ORDER BY StartTime DESC LIMIT 1
bacula-# ;
                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=134360.90..134360.90 rows=1 width=96) (actual time=18971.874..18971.874 rows=0 loops=1)
   ->  Sort  (cost=134360.90..134360.90 rows=1 width=96) (actual time=18971.866..18971.866 rows=0 loops=1)
         Sort Key: job.starttime
         ->  Nested Loop  (cost=0.00..134360.89 rows=1 width=96) (actual time=18971.852..18971.852 rows=0 loops=1)
               ->  Seq Scan on file  (cost=0.00..134355.54 rows=1 width=92) (actual time=18971.841..18971.841 rows=0 loops=1)
                     Filter: ((pathid = 2027) AND (filenameid = 7839))
               ->  Index Scan using job_pkey on job  (cost=0.00..5.34 rows=1 width=12) (never executed)
                     Index Cond: ("outer".jobid = job.jobid)
                     Filter: (("type" = 'B'::bpchar) AND (jobstatus = 'T'::bpchar) AND (clientid = 1))
 Total runtime: 18971.966 ms
(10 rows)

bacula=#

Now, let's create one index:

bacula=# create index file_pathid on file(pathid);
CREATE INDEX

And run that query again!

bacula=# explain analyse
bacula-# SELECT FileId, LStat, MD5
bacula-# FROM File,Job
bacula-# WHERE File.JobId=Job.JobId
bacula-# AND File.PathId=2027
bacula-# AND File.FilenameId=7839
bacula-# AND Job.Type='B'
bacula-# AND Job.JobSTATUS='T'
bacula-# AND ClientId=1
bacula-# ORDER BY StartTime DESC LIMIT 1;
                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1083.85..1083.85 rows=1 width=96) (actual time=385.884..385.884 rows=0 loops=1)
   ->  Sort  (cost=1083.85..1083.85 rows=1 width=96) (actual time=385.875..385.875 rows=0 loops=1)
         Sort Key: job.starttime
         ->  Nested Loop  (cost=0.00..1083.84 rows=1 width=96) (actual time=385.860..385.860 rows=0 loops=1)
               ->  Index Scan using file_pathid on file  (cost=0.00..1078.49 rows=1 width=92) (actual time=385.850..385.850 rows=0 loops=1)
                     Index Cond: (pathid = 2027)
                     Filter: (filenameid = 7839)
               ->  Index Scan using job_pkey on job  (cost=0.00..5.34 rows=1 width=12) (never executed)
                     Index Cond: ("outer".jobid = job.jobid)
                     Filter: (("type" = 'B'::bpchar) AND (jobstatus = 'T'::bpchar) AND (clientid = 1))
 Total runtime: 385.975 ms
(11 rows)

bacula=#

Much, much better. Less than 0.4s. Try that and see if it helps.

edited on: 09-15-04 10:57

----------------------------------------------------------------------
 martin - 09-16-2004 05:23 PDT
----------------------------------------------------------------------
Thanks, that fixes it (for both VolumeToCatalog and DiskToCatalog)!
Hopefully the PathId index will be sufficient until someone creates a
million files in one directory :-)

Was there some performance problem with having this index in the past?
I see that the mysql config has several indexes for File, including
this one and others that probably speed up 'list files jobid=272' and
the 'List where a File is saved regardless of the directory:' query,
which also take over 30 seconds for me. These are not too important,
though, because they are one-off operations.

I will check the time taken for some medium-sized full backups (around
200,000 files per fileset) over the weekend.

----------------------------------------------------------------------
 Dan Langille - 09-16-2004 05:30 PDT
----------------------------------------------------------------------
The original PostgreSQL DDL contained foreign keys. That, by
implication, created indexes (IIRC). The removed FKs were not replaced
with indexes. I think we'll just add indexes as they are found to be
justified.

Please monitor your backups and restores for a while to see if there
are any side effects. I think you are already running "vacuum analyse"
on a regular basis. But if not, you should be.

----------------------------------------------------------------------
 martin - 09-16-2004 08:08 PDT
----------------------------------------------------------------------
I will report back on the timings. Thanks again.

----------------------------------------------------------------------
 martin - 09-20-2004 04:49 PDT
----------------------------------------------------------------------
I checked the times for the weekend backups with the file(pathid) index
and they are almost identical to last weekend's backups without it.
Adding this looks like a worthwhile change.
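
The fix discussed in this thread reduces to one DDL statement. As a further sketch, a composite index covering all three columns that db_get_file_record() filters on would serve the same lookups even when many files share a single path; the composite form and its name (file_jpfid_idx) are a suggestion, not part of the schema shown above.

```sql
-- The index Dan created above, which resolved this bug:
CREATE INDEX file_pathid ON file (pathid);

-- Hypothetical composite alternative: matches the full predicate
-- (jobid, pathid, filenameid) used by db_get_file_record(), so the
-- planner can narrow the lookup to exact matches via the index alone.
CREATE INDEX file_jpfid_idx ON file (jobid, pathid, filenameid);
```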
Bug History

Date Modified   Username      Field         Change
======================================================================
09-14-04 05:16  martin        New Bug
09-15-04 05:07  martin        Bugnote Added:  0000234
09-15-04 05:57  Dan Langille  Bugnote Added:  0000237
09-15-04 06:28  martin        Bugnote Added:  0000238
09-15-04 10:56  Dan Langille  Bugnote Added:  0000239
09-15-04 10:57  Dan Langille  Bugnote Edited: 0000239
09-15-04 17:04  Dan Langille  Assigned To     => Dan Langille
09-15-04 17:04  Dan Langille  Status          new => assigned
09-16-04 05:23  martin        Bugnote Added:  0000240
09-16-04 05:30  Dan Langille  Bugnote Added:  0000241
09-16-04 08:08  martin        Bugnote Added:  0000242
09-20-04 04:49  martin        Bugnote Added:  0000249
======================================================================
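
Dan's advice in the thread to run "vacuum analyse" on a regular basis can be reduced to a single statement, scheduled e.g. from cron as the pgsql user against the bacula database (the scheduling details are an assumption, not part of this report).

```sql
-- Refresh planner statistics for the whole catalog so PostgreSQL can
-- cost the new index correctly; with stale statistics the planner may
-- keep choosing sequential scans over the File table.
VACUUM ANALYZE;
```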