From: <bac...@li...> - 2006-10-11 21:30:21
|
The following bug has been CLOSED ====================================================================== http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000691 ====================================================================== Reported By: fvolf Assigned To: ====================================================================== Project: bacula Bug ID: 691 Category: Storage Daemon Reproducibility: always Severity: major Priority: normal Status: closed ====================================================================== Date Submitted: 10-11-2006 11:38 PDT Last Modified: 10-11-2006 14:29 PDT ====================================================================== Summary: Backups reported as succesfull are not on the tape Description: During a restore action I noticed that some backups are not on tape, while according to bacula they are succesfully backuped. Here are the details. I have a volume called DEZE-0015, according to the SQL database the following volumes should be on it: > > +-------+--------------------+---------------------+------+-------+--------+---------------+--------+ > > | JobId | Name | StartTime | Type | Level | Files | Bytes | Status | > > +-------+--------------------+---------------------+------+-------+--------+---------------+--------+ > > | 7,090 | LaptopDocuments | 2006-10-08 14:01:54 | B | I | 428 | 2,529,684,742 | T | > > | 7,091 | LaptopOS | 2006-10-08 16:57:53 | B | I | 62 | 11,696,692 | T | > > | 7,092 | LaptopC | 2006-10-08 16:58:30 | B | I | 1,212 | 101,578,587 | T | > > | 7,085 | CorfuOS | 2006-10-08 17:01:21 | B | F | 11,806 | 2,068,391,753 | T | > > | 7,086 | TheaOS | 2006-10-08 17:43:29 | B | D | 283 | 83,681,947 | T | > > | 7,093 | Drawbridge | 2006-10-09 01:05:07 | B | I | 1,647 | 189,754,414 | T | > > | 7,094 | Drawbridge-home2 | 2006-10-09 02:05:06 | B | I | 6,664 | 22,163,687 | T | > > | 7,095 | Drawbridge-home3 | 2006-10-09 02:08:40 | B | I | 2 | 4,946,568 | T | > > | 7,104 | BackupCatalog | 2006-10-09 07:10:51 | B | F | 1 | 104,853,720 | T | > > | 7,105 | SemiDocumentsMieke | 2006-10-09 14:15:30 | B | I | 9 | 313,290,044 | T | > > | 7,106 | SemiDocumentsSep | 2006-10-09 17:11:21 | B | I | 5 | 38,054,192 | T | > > +-------+--------------------+---------------------+------+-------+--------+---------------+--------+ However, if I list the tape using bls, I only find the first 6 backups: > > # bls -V DEZE-0015 -c /usr/local/etc/bacula-sd.conf -j DDS-3 > > bls: butil.c:269 Using device: "DDS-3" for reading. > > 09-Oct 19:56 bls: Ready to read from volume "DEZE-0015" on device "DDS-3" (/dev/nsa0). > > bls: acquire.c:200 jcr->dcr=0x80a8818 > > Volume Record: File:blk=0:1 SessId=311 SessTime=1159536358 JobId=1 DataLen=178 > > End Job Session Record: File:blk=0:15244 SessId=311 SessTime=1159536358 JobId=7090 > > Date=08-Oct-2006 14:07:58 Level=I Type=B Files=428 Bytes=2,529,774,685 Errors=0 Status=T > > Begin Job Session Record: File:blk=1:1 SessId=312 SessTime=1159536358 JobId=7091 > > Job=LaptopOS.2006-10-08_14.00.39 Date=08-Oct-2006 16:57:54 Level=I Type=B > > End Job Session Record: File:blk=1:182 SessId=312 SessTime=1159536358 JobId=7091 > > Date=08-Oct-2006 16:58:11 Level=I Type=B Files=62 Bytes=11,706,879 Errors=0 Status=T > > Begin Job Session Record: File:blk=2:1 SessId=313 SessTime=1159536358 JobId=7092 > > Job=LaptopC.2006-10-08_14.00.48 Date=08-Oct-2006 16:58:30 Level=I Type=B > > End Job Session Record: File:blk=2:1580 SessId=313 SessTime=1159536358 JobId=7092 > > Date=08-Oct-2006 16:59:37 Level=I Type=B Files=1,212 Bytes=101,791,107 Errors=0 Status=T > > Begin Job Session Record: File:blk=3:1 SessId=314 SessTime=1159536358 JobId=7085 > > Job=CorfuOS.2006-10-08_03.05.00 Date=08-Oct-2006 17:01:22 Level=F Type=B > > End Job Session Record: File:blk=5:1122 SessId=314 SessTime=1159536358 JobId=7085 > > Date=08-Oct-2006 17:18:40 Level=F Type=B Files=11,806 Bytes=2,070,280,449 Errors=0 Status=T > > Begin Job Session Record: File:blk=6:1 SessId=315 SessTime=1159536358 JobId=7086 > > Job=TheaOS.2006-10-08_03.05.01 Date=08-Oct-2006 17:43:29 Level=D Type=B > > End Job Session Record: File:blk=6:1300 SessId=315 SessTime=1159536358 JobId=7086 > > Date=08-Oct-2006 17:44:14 Level=D Type=B Files=283 Bytes=83,732,199 Errors=0 Status=T > > 09-Oct 20:32 bls: End of Volume at file 7 on device "DDS-3" (/dev/nsa0), Volume "DEZE-0015" > > 09-Oct 20:32 bls: End of all volumes. > > # This is very strange, because for the remaining jobs I also received a "completion notification" for the other jobs: > > 08-Oct 18:02 drawbrid09-Oct 01:05 drawbridge-dir: Start Backup JobId 7093, Job= > > 09-Oct 01:13 drawbridge-dir: Bacula 1.38.11 (28Jun06): 09-Oct-2006 01:13:35 > > JobId: 7093 > > Job: Drawbridge.2006-10-09_01.05.00 > > Backup Level: Incremental, since=2006-10-08 01:05:07 > > Client: "drawbridge-fd" i386-portbld-freebsd6.1,freebsd,6.1-S > > FileSet: "Drawbridge" 2004-10-05 01:05:02 > > Pool: "Default" > > Storage: "DDS-3" > > Scheduled time: 09-Oct-2006 01:05:00 > > Start time: 09-Oct-2006 01:05:07 > > End time: 09-Oct-2006 01:13:35 > > Elapsed time: 8 mins 28 secs > > Priority: 10 > > FD Files Written: 1,647 > > SD Files Written: 1,647 > > FD Bytes Written: 189,754,414 (189.7 MB) > > SD Bytes Written: 189,957,275 (189.9 MB) > > Rate: 373.5 KB/s > > Software Compression: None > > Volume name(s): DEZE-0015 > > Volume Session Id: 320 > > Volume Session Time: 1159536358 > > Last Volume Bytes: 3,443,065,623 (3.443 GB) > > Non-fatal FD errors: 0 > > SD Errors: 0 > > FD termination status: OK > > SD termination status: OK > > Termination: Backup OK > > > > 09-Oct 01:13 drawbridge-dir: Begin pruning Jobs. > > 09-Oct 01:13 drawbridge-dir: Pruned 1 Job for client drawbridge-fd from catalog > > 09-Oct 01:13 drawbridge-dir: Begin pruning Files. > > 09-Oct 01:14 drawbridge-dir: Pruned Files from 2 Jobs for client drawbridge-fd > > 09-Oct 01:14 drawbridge-dir: End auto prune. > > > > 09-Oct 02:05 drawbridge-dir: Start Backup JobId 7094, Job=Drawbridge-home2.2006 > > 09-Oct 02:08 drawbridge-dir: Bacula 1.38.11 (28Jun06): 09-Oct-2006 02:08:32 > > JobId: 7094 > > Job: Drawbridge-home2.2006-10-09_02.05.00 > > Backup Level: Incremental, since=2006-10-08 02:05:07 > > Client: "drawbridge-fd" i386-portbld-freebsd6.1,freebsd,6.1-S > > FileSet: "Drawbridge-home2" 2005-09-23 02:05:02 > > Pool: "Default" > > Storage: "DDS-3" > > Scheduled time: 09-Oct-2006 02:05:00 > > Start time: 09-Oct-2006 02:05:06 > > End time: 09-Oct-2006 02:08:32 > > Elapsed time: 3 mins 26 secs > > Priority: 10 > > FD Files Written: 6,664 > > SD Files Written: 6,664 > > FD Bytes Written: 22,163,687 (22.16 MB) > > SD Bytes Written: 22,903,635 (22.90 MB) > > Rate: 107.6 KB/s > > Software Compression: None > > Volume name(s): DEZE-0015 > > Volume Session Id: 321 > > Volume Session Time: 1159536358 > > Last Volume Bytes: 3,466,070,458 (3.466 GB) > > Non-fatal FD errors: 0 > > SD Errors: 0 > > FD termination status: OK > > SD termination status: OK > > Termination: Backup OK > > > > 09-Oct 02:08 drawbridge-dir: Begin pruning Jobs. > > 09-Oct 02:08 drawbridge-dir: Pruned 1 Job for client drawbridge-fd from catalog > > 09-Oct 02:08 drawbridge-dir: Begin pruning Files. > > 09-Oct 02:08 drawbridge-dir: No Files found to prune. > > 09-Oct 02:08 drawbridge-dir: End auto prune. [deleted more of the same] So, according to this log, jobs 7093 and 7094 are safely on volume DEZE-0015, but according to bls they are not. Now I'm trying to restore a file from job 7105, once again the log indicates succesfull storage: > > 09-Oct 14:15 drawbridge-dir: Start Backup JobId 7105, Job=SemiDocumentsMieke.20 > > 09-Oct 14:12 drawbridge-fd: DIR and FD clocks differ by -158 seconds, FD automa > > 09-Oct 14:15 drawbridge-sd: Spooling data ... > > 09-Oct 17:07 drawbridge-sd: Committing spooled data to Volume "DEZE-0015". Desp > > 09-Oct 17:11 drawbridge-sd: Sending spooled attrs to the Director. Despooling 3 > > 09-Oct 17:11 drawbridge-dir: Bacula 1.38.11 (28Jun06): 09-Oct-2006 17:11:10 > > JobId: 7105 > > Job: SemiDocumentsMieke.2006-10-09_08.05.00 > > Backup Level: Incremental, since=2006-10-06 11:29:47 > > Client: "semi-fd" Windows XP,MVS,NT 5.1.2600 > > FileSet: "WindowsDocumentsMieke" 2006-08-20 18:55:24 > > Pool: "Default" > > Storage: "DDS-3" > > Scheduled time: 09-Oct-2006 08:05:00 > > Start time: 09-Oct-2006 14:15:30 > > End time: 09-Oct-2006 17:11:10 > > Elapsed time: 2 hours 55 mins 40 secs > > Priority: 12 > > FD Files Written: 9 > > SD Files Written: 9 > > FD Bytes Written: 313,290,044 (313.2 MB) > > SD Bytes Written: 313,292,266 (313.2 MB) > > Rate: 29.7 KB/s > > Software Compression: None > > Volume name(s): DEZE-0015 > > Volume Session Id: 352 > > Volume Session Time: 1159536358 > > Last Volume Bytes: 3,889,528,461 (3.889 GB) > > Non-fatal FD errors: 0 > > SD Errors: 0 > > FD termination status: OK > > SD termination status: OK > > Termination: Backup OK > > > > 09-Oct 17:11 drawbridge-dir: Begin pruning Jobs. > > 09-Oct 17:11 drawbridge-dir: No Jobs found to prune. > > 09-Oct 17:11 drawbridge-dir: Begin pruning Files. > > 09-Oct 17:11 drawbridge-dir: Pruned Files from 5 Jobs for client semi-fd from c > > 09-Oct 17:11 drawbridge-dir: End auto prune. If I try to restore, I get the following and nothing is restored: > > 09-Oct 19:40 drawbridge-dir: Start Restore Job RestoreFiles.2006-10-09_19.40.21 > > 09-Oct 19:40 drawbridge-sd: Ready to read from volume "DEZE-0015" on device "DDS-3" (/dev/nsa0). > > 09-Oct 19:40 drawbridge-sd: Forward spacing to file:block 13:0. > > 09-Oct 19:42 drawbridge-sd: End of Volume at file 7 on device "DDS-3" (/dev/nsa0), Volume "DEZE-0015" > > 09-Oct 19:42 drawbridge-sd: End of all volumes. Strangely, it tries to find file:block 13:0, which seems incorrect to me, but I do not not know the internals of the backup system, so maybe file numbers can be skipped(?) ====================================================================== ---------------------------------------------------------------------- ArnoL - 10-11-2006 12:37 PDT ---------------------------------------------------------------------- What is your storage configuration, especially "BSF at EOM" and "TWO EOF"? Seeing that you run FreeBSD, these settings might be quite important to you. What I read in the manual indicates that you should have "BSF at EOM" enabled when you have "TWO EOF" enabled. ---------------------------------------------------------------------- fvolf - 10-11-2006 12:46 PDT ---------------------------------------------------------------------- Here is the storage configuration: # # A FreeBSD tape drive # Device { Name = DDS-3 Description = "DDS-3 for FreeBSD" Media Type = DDS-3 Archive Device = /dev/nsa0 AutomaticMount = yes; # when device opened, read it AlwaysOpen = yes Offline On Unmount = yes Close On Poll = yes Volume Poll Interval = 0 # Broken in 1.38.x????? #Volume Poll Interval = 1h Hardware End of Medium = no BSF at EOM = yes Backward Space Record = no Backward Space File = no Fast Forward Space File = no TWO EOF = yes Maximum Spool Size = 40g Spool Directory = /home4/bacula } ---------------------------------------------------------------------- ArnoL - 10-11-2006 13:09 PDT ---------------------------------------------------------------------- Looks ok to me. I suspect that the problem is that there are two EOFs between jobs, which should never happen because it indicates a "End of Tape" to the OS. Not really knowing FreeBSD myself you could perhaps verify this using mt and dd to try to either forward space over the "fake" EOF (which will probably fail) and check what is beyond it. Or you could space to the EOT and dump a limited amount of data onto tape. This might render the tape useless, but it might happen that it overwrites only the wrong EOT mark and allows spacing beyond it to verify if there is Bacula data after the EOT. This could ensure that the data Bacula claims to have written to that tape is really there, and it would prove that the EOT is there due to a problem. I suppose you'll ned a test case that produces the problem which could then be investigated. You could set up a test job - probably small - and a test pool with one volume. Run this job a number of times. Make sure you have some normal runs, some runs where no data is saved, and you unmount / remount the tape in between. Keep the job reports to compare with bls and see if there is a pattern. Another possiblity is to simply run the SD with debug output, capture that output, and see if you can find where it writes EOF marks to tape. Perhaps the debug log allows Kern or some other developer to understand what leads to the double EOF marks. Good luck! ---------------------------------------------------------------------- kern - 10-11-2006 14:29 PDT ---------------------------------------------------------------------- I don't believe that this is a bug but rather as Arno told you, you somehow have gotten two consecutive EOF marks on your tape, which will cause Bacula and most other programs to stop reading, but apparently when the OS does a EOD (go to end of data), it doesn't see the double EOF. This condition most likely occurred for one of the following reasons: 1. You ignored or attempted to repair messages by Bacula saying that the number of files on the Volume did not correspond to the catalog. 2. Your tape drive is not 100% correctly configured -- by that I mean either the drive parameters or the Device resource in the SD. The first thing to do is to stop using that tape. The second thing is to go through the *full* procedure (9 points if I remember right) for verifying that your tape drive is properly configured. See the manual. Bug History Date Modified Username Field Change ====================================================================== 10-11-06 11:38 fvolf New Bug 10-11-06 11:38 fvolf File Added: btape.out 10-11-06 12:37 ArnoL Bugnote Added: 0001939 10-11-06 12:46 fvolf Bugnote Added: 0001940 10-11-06 13:09 ArnoL Bugnote Added: 0001941 10-11-06 14:29 kern Steps to Reproduce Updated 10-11-06 14:29 kern Description Updated 10-11-06 14:29 kern Bugnote Added: 0001942 10-11-06 14:29 kern Additional Information Updated 10-11-06 14:29 kern Resolution open => not a bug 10-11-06 14:29 kern Status new => closed ====================================================================== |