From: <bac...@li...> - 2005-08-19 17:19:18
A BUGNOTE has been added to this bug.
======================================================================
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000402
======================================================================
Reported By:                awui
Assigned To:
======================================================================
Project:                    bacula
Bug ID:                     402
Category:                   Director
Reproducibility:            sometimes
Severity:                   major
Priority:                   urgent
Status:                     feedback
======================================================================
Date Submitted:             08-16-2005 08:18 PDT
Last Modified:              08-19-2005 10:19 PDT
======================================================================
Summary:                    Backup records wrong order of volumes,
                            tape-spanning files truncated during restore
Description:
For backups spanning more than one tape, a job may report Volume name(s)
in an order different from the order in which the volumes were actually
written. During restore, volumes are read in this order. A file spanning
over two tapes is truncated if the tail-part volume is read before the
head-part volume, in which case only the head contents of the file are
restored.
======================================================================

----------------------------------------------------------------------
 kern - 08-18-2005 06:58 PDT
----------------------------------------------------------------------
This report surprises me because the code *explicitly* attempts to keep
the volumes in order. The oldest Bacula I have loaded is 1.36.3, so I am
reasonably sure that the code also exists in version 1.36.1.
To resolve the problem, I'll need the following:
- A case where you can reproduce the problem.
- The Job output from the backup
- A copy of the bootstrap file produced for restore (copy it at the
  point of the yes/mod/no prompt at the end of the restore dialog).
- The Job output from the restore.

However, before doing all this, look at the file:
  src/cats/sql_get.c
In the subroutine db_get_job_volume_names() there should be a reference
to sorting on VolIndex, and in the subroutine
db_get_job_volume_parameters(), the first SQL SELECT command should have
ORDER BY VolIndex, ...
If these are not true, then you will need to upgrade to 1.36.3. If they
are true, then something is going wrong in the process and I'll need the
above info for a start.

----------------------------------------------------------------------
 awui - 08-18-2005 11:09 PDT
----------------------------------------------------------------------
Kern, thanks for the response - the ZIP file provided with the initial
report contains backup/restore log files for two examples of failed
attempts with two and three tapes, respectively, and one example with
two tapes that restored correctly. I have attached the sql_get.c
excerpts of our current installation, which seem to include the SQL
parts that should take care of the sorting. In further investigation, we
queried PostgreSQL directly for the entries relating to Job 11 (backup
spanning across three tapes) with sorting applied and also including the
VolIndex entries. The correct order of volumes should have been
Full-0005, Full-0004, Full-0002; sorting by VolIndex results in
Full-0004, Full-0002, Full-0005 order, thus the VolIndexes were already
incorrectly recorded in the SQL database. Note that the
firstindex/lastindex entries reflect the correct ordering of the tapes.
Maybe this helps in narrowing down the problem.

----------------------------------------------------------------------
 kern - 08-18-2005 11:47 PDT
----------------------------------------------------------------------
The SQL code you have corresponds to mine, so that should not be the
problem. Can you attach a list of the JobMedia records as a .txt file
with the VolIndex column included?

It appears that either something is going wrong in inserting the
VolIndex items, or the sort is not working correctly.

----------------------------------------------------------------------
 awui - 08-18-2005 12:08 PDT
----------------------------------------------------------------------
> Can you attach a list of the JobMedia records as a .txt file with the
> VolIndex column included?
- done, the .log version was malformatted in the process ...

----------------------------------------------------------------------
 kern - 08-19-2005 02:09 PDT
----------------------------------------------------------------------
As you mentioned the last time, the VolIndex is not right. It is
computed from:
  SELECT count(*) from JobMedia;
so I cannot see how it could possibly end up with the values you have
unless you deleted a job during the backup or some pruning happened. Is
that possible? I cannot think of any other reason why the VolIndex would
be out of order. In other words, what would cause the total number of
JobMedia records to decrease?
In thinking about this a bit, I think the above select statement could
be much improved and made independent of job deletion or pruning by
changing it to be:
  Mmsg(mdb->cmd, "SELECT count(*) from JobMedia WHERE JobId=%s",
       edit_int64(jm->JobId, ed1));
Actually, this value (VolIndex) needs only to be incremented when a new
MediaId is being added, but I cannot think of any simple efficient way
to do that. The above code is at the very top of the subroutine
db_create_jobmedia_record() in src/cats/sql_create.c. I am making the
change to my code and testing it; I would appreciate it if you would do
the same.

----------------------------------------------------------------------
 awui - 08-19-2005 10:19 PDT
----------------------------------------------------------------------
I had another look into the log file to figure out what happened before
the three-tape test (Job 11) that recorded the wrong order.

A backup performed on the previous day had been cancelled, and
immediately before recycling the Full-0004 volume, it was pruning 1 Job
on Full-0001 and 3 Jobs on Full-0003 from catalog (these parts are not
in the log snippets of our initial report; Full-0001 was the last tape
written during the aborted backup, but neither of the Full-0001/0003
volumes was used for the failed backup; Full-0002 was written during the
aborted backup and recycled as the last volume for Job 11). I found the
sql_create.c command line, currently reading
"SELECT count(*) from JobMedia" only. Since we have to recompile the
daemons anyway for further investigation and seem to have a rather old
version, we discussed getting more recent sources, making the
modification you suggested, and then trying again to see what happens.
If you have any other testing scenario, we will be happy to try it;
obviously, the cancelled backup and the timing of the pruning may have
had an impact.

Bug History
Date Modified   Username       Field                    Change
======================================================================
08-16-05 08:18  awui           New Bug
08-16-05 08:18  awui           File Added: bacula-tests.zip
08-16-05 08:22  awui           Bug Monitored: awui
08-18-05 06:58  kern           Bugnote Added: 0001109
08-18-05 06:58  kern           Status                   new => feedback
08-18-05 10:58  awui           File Added: sql_get_snippet.txt
08-18-05 10:59  awui           File Added: three-tapes-psql.log
08-18-05 11:09  awui           Bugnote Added: 0001113
08-18-05 11:47  kern           Bugnote Added: 0001114
08-18-05 12:06  awui           File Added: three-tapes-psql.txt
08-18-05 12:08  awui           Bugnote Added: 0001115
08-19-05 02:09  kern           Bugnote Added: 0001117
08-19-05 02:11  kern           Priority                 normal => urgent
08-19-05 10:19  awui           Bugnote Added: 0001125
======================================================================
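
The pruning scenario discussed in the notes above can be replayed in a
small sketch. This is not Bacula code: it is a Python/sqlite3 model that
assumes VolIndex is set to the row count plus one, uses a simplified
JobMedia table (a hypothetical VolumeName column instead of MediaId),
and invents four older rows for jobs 7-10 purely for illustration. Only
the two count(*) statements and the volume names come from the thread.

```python
import sqlite3


def next_vol_index(conn, job_id, per_job):
    """Return the VolIndex for a new row, assumed to be (row count + 1).

    per_job=False uses the global count(*) quoted in the thread;
    per_job=True uses the proposed "WHERE JobId=..." variant.
    """
    if per_job:
        row = conn.execute(
            "SELECT count(*) FROM JobMedia WHERE JobId=?", (job_id,)
        ).fetchone()
    else:
        row = conn.execute("SELECT count(*) FROM JobMedia").fetchone()
    return row[0] + 1


def add_jobmedia(conn, job_id, volume, per_job):
    """Insert one simplified JobMedia row with a freshly computed VolIndex."""
    vol_index = next_vol_index(conn, job_id, per_job)
    conn.execute(
        "INSERT INTO JobMedia (JobId, VolumeName, VolIndex) VALUES (?, ?, ?)",
        (job_id, volume, vol_index),
    )


def simulate(per_job):
    """Replay: older rows exist, Job 11 starts, pruning runs, Job 11 continues."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE JobMedia (JobId INTEGER, VolumeName TEXT, VolIndex INTEGER)"
    )
    # Made-up rows standing in for earlier jobs that are pruned later.
    older = [(7, "Full-0001"), (8, "Full-0003"), (9, "Full-0003"), (10, "Full-0003")]
    for old_job, volume in older:
        add_jobmedia(conn, old_job, volume, per_job)

    add_jobmedia(conn, 11, "Full-0005", per_job)            # first tape of Job 11
    conn.execute("DELETE FROM JobMedia WHERE JobId < 11")   # pruning mid-backup
    add_jobmedia(conn, 11, "Full-0004", per_job)            # second tape
    add_jobmedia(conn, 11, "Full-0002", per_job)            # third tape

    rows = conn.execute(
        "SELECT VolumeName FROM JobMedia WHERE JobId=11 ORDER BY VolIndex"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(simulate(per_job=False))  # ['Full-0004', 'Full-0002', 'Full-0005']
    print(simulate(per_job=True))   # ['Full-0005', 'Full-0004', 'Full-0002']
```

Under these assumptions the global count hands Full-0005 a higher
VolIndex than the two tapes written after the pruning, which reproduces
the order awui reported, while the per-job count is unaffected by rows
belonging to other jobs.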