#13 dump/restore when first output fs is full

open
Stelian Pop
None
5
2009-06-18
2009-01-20
Michael Vezie
No

I'm backing up filesystems to a pair of backup partitions using dump <level>uafqj /back1/file,/back2/file, so that if /back1 fills up, the rest goes to /back2. The problem arises when /back1 is already full before the backup starts. This is quite possible: all the backups are automated from scripts, so once one backup fills up /back1, every later one starts with a full /back1. In that case dump switches to /back2 with the very first block. But the file in /back2 is tagged as volume 2, and with the /back1 file empty there is no volume 1, and restore refuses to restore if it can't find a volume 1.
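
Until this is fixed, a wrapper script could avoid the situation by leaving full filesystems out of the file list. The following is only a sketch of that workaround, not a fix for dump itself; the function names and the 64 MiB threshold are made up for illustration.

```python
import os
import shutil


def usable_targets(paths, min_free_bytes=64 * 1024 * 1024, free_bytes=None):
    """Return the output paths whose filesystem still has room, in order.

    free_bytes is a hypothetical hook (path -> free bytes) so the check can
    be replaced in tests; by default the real filesystem is queried.
    """
    if free_bytes is None:
        # Free space on the filesystem holding each target's directory.
        free_bytes = lambda p: shutil.disk_usage(os.path.dirname(p) or ".").free
    usable = [p for p in paths if free_bytes(p) >= min_free_bytes]
    if not usable:
        raise RuntimeError("no output filesystem has enough free space")
    return usable


def dump_file_argument(paths, **kwargs):
    """Build the comma-separated file list to pass to dump's f option."""
    return ",".join(usable_targets(paths, **kwargs))
```

With /back1 full, dump_file_argument(["/back1/file", "/back2/file"]) would return just "/back2/file", so the file written there becomes volume 1.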

I tried changing the c_volume field in the first block to 1 (fixing the checksum) in a hex editor, but that didn't work. It seemed there were two TS_TAPE blocks (first block tagged as volume 2, and second tagged as volume 1), and that confuses restore. I even tried skipping the first or second block, but it still didn't work.
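
For reference, the hex-editor change described above can be sketched in code. This assumes the header layout as I understand it from dumprestore.h: 1024-byte records of 32-bit words, c_volume at byte offset 12, c_checksum at byte offset 28, and a record being valid when all 256 words sum to 84446. It also assumes little-endian byte order. Check these against the header shipped with your dump version before relying on them.

```python
import struct

TP_BSIZE = 1024        # size of one dump record (assumed from dumprestore.h)
CHECKSUM = 84446       # value all 256 words of a valid header should sum to
C_VOLUME_OFF = 12      # assumed byte offset of c_volume
C_CHECKSUM_OFF = 28    # assumed byte offset of c_checksum


def checksum_ok(header, endian="<"):
    """True if the 256 32-bit words of the header sum to CHECKSUM."""
    words = struct.unpack(f"{endian}256I", header)
    return sum(words) & 0xFFFFFFFF == CHECKSUM


def set_volume(header, volume, endian="<"):
    """Return a copy of the header with c_volume replaced and checksum fixed."""
    if len(header) != TP_BSIZE:
        raise ValueError("header must be exactly TP_BSIZE bytes")
    words = list(struct.unpack(f"{endian}256I", header))
    words[C_VOLUME_OFF // 4] = volume & 0xFFFFFFFF
    # Zero the checksum word, then choose it so the total comes out right.
    words[C_CHECKSUM_OFF // 4] = 0
    words[C_CHECKSUM_OFF // 4] = (CHECKSUM - sum(words)) & 0xFFFFFFFF
    return struct.pack(f"{endian}256I", *words)
```

As reported above, patching the first header this way was not enough for restore to accept the file, so this only reproduces the attempt, not a working repair.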

Skipping the first block:

dd if=pub.day.20090110.dump.002 bs=1k skip=1 | restore tdvf -
Verify tape and initialize maps
Input is from a local file/pipe
Volume header (new inode format)
Input block size is 10
Volume header (new inode format)
Dump tape is compressed.
Dump date: Sat Jan 10 00:07:10 2009
Dumped from: Sun Jan 4 00:06:55 2009
Level 2 dump of /pub on deacon:/dev/md4
Label: PUB
Prefix size error, max size 32768, got 42582571
decompression error, block 11: length mismatch
File decompression error while restoring <file removal list>
continue? [yn] y
Prefix size error, max size 32768, got 239376861
Prefix size error, max size 32768, got 215409582
decompression error, block 54: length mismatch
File decompression error while restoring <file removal list>
continue? [yn] n

Using the first (hacked to be volume 1) block, but skipping the second yields similar results.

I think dump should be fixed so that if the very first block fails the write, it should write the second file as if it were the first (calling it volume 1), and restore should be fixed to handle files with this peculiar corruption (belt and suspenders).
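
The behaviour I'm asking for on the dump side can be shown with a toy model. This is not dump's actual code; the class and its sinks are hypothetical and only illustrate the rule "don't advance the volume number if nothing was written to the current file".

```python
class VolumeWriter:
    """Toy model: rotate across output sinks, numbering only the volumes
    that actually received data. A sink is a callable taking one record
    and returning False when its filesystem is full."""

    def __init__(self, sinks):
        self.sinks = list(sinks)
        self.index = 0               # which sink is currently in use
        self.volume = 1              # volume number for the current sink
        self.records_in_volume = 0   # records written to the current sink
        self.layout = []             # (volume, sink index) per record

    def write(self, record):
        while self.index < len(self.sinks):
            if self.sinks[self.index](record):
                self.records_in_volume += 1
                self.layout.append((self.volume, self.index))
                return
            # The sink is full: move on, but bump the volume number only
            # if this sink actually holds part of the dump.
            if self.records_in_volume > 0:
                self.volume += 1
            self.records_in_volume = 0
            self.index += 1
        raise OSError("all output files are full")
```

With a first sink that is full from the start, every record lands on the second sink tagged as volume 1, which is what restore expects.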

Discussion

  • Stelian Pop
    2009-06-18

    • assigned_to: nobody --> stelian
     
  • Michael Vezie
    2010-12-30

    This is still needed.

    Upon further checking, it seems that the first TS_TAPE record is a "new volume" record tagged as volume 2, and the next one is the proper first record of the volume, tagged as volume 1. I don't know why skipping the first record doesn't work, but it might be related to how the tape buffers are managed.
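
    The check above can be repeated with a small scanner that lists the TS_TAPE records in a dump file and the volume each one claims. It is a sketch under assumptions: 1024-byte records, little-endian words, c_type in word 0, c_volume in word 3, c_magic in word 6, TS_TAPE = 1, NFS_MAGIC = 60012, and a checksum of 84446, as I read them from dumprestore.h. It only looks at 1k-aligned offsets, so in a compressed dump it will probably find nothing past the leading header records.

```python
import struct

TP_BSIZE = 1024     # record size (assumed from dumprestore.h)
NFS_MAGIC = 60012   # magic for the "new inode format" header (assumed)
CHECKSUM = 84446    # expected sum of the 256 header words (assumed)
TS_TAPE = 1         # record type of a volume header (assumed)


def find_tape_headers(data, endian="<"):
    """Return (record index, c_volume) for each valid TS_TAPE record."""
    found = []
    for off in range(0, len(data) - TP_BSIZE + 1, TP_BSIZE):
        words = struct.unpack_from(f"{endian}256I", data, off)
        c_type, c_volume, c_magic = words[0], words[3], words[6]
        if (c_magic == NFS_MAGIC and c_type == TS_TAPE
                and sum(words) & 0xFFFFFFFF == CHECKSUM):
            found.append((off // TP_BSIZE, c_volume))
    return found
```

    On one of the affected files I would expect this to report record 0 as volume 2 and record 1 as volume 1, matching what I saw in the hex editor.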

    It could be argued that this should be fixed in dump, that when nothing is written to the first volume, it should treat the second as a first volume, but I have several dump files that I need to read that have the bug, and I can't read them (and need to!).

     
  • Stelian Pop
    2011-01-03

    I'm not sure if this is caused by the fact that you're using compression or not.

    Anyway, I should take a deeper look into this, but unfortunately I haven't had the opportunity yet (I'm quite busy at the moment...)

     
  • Michael Vezie
    2011-01-12

    I'm not sure it's related to compression. I did a test where I created a couple of tiny filesystems (both about 93k), filled them up, and dumped the first one to second,elsewhere, that is, to a file on the second filesystem followed by a file elsewhere. The file in second was empty (the fs was full), so all the data went to "elsewhere" (where there's plenty of space).

    I then did a dump of the first filesystem directly to elsewhere and compared the two files.

    The first thing I noticed was that when I dumped only to elsewhere, the file was actually larger than when dump first attempted to write to a full fs before going to elsewhere.

    I ran the test both with compression on and off, and got the same (size difference) results both times.
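
    For anyone who wants to repeat the comparison, something like the following reports the two sizes and the first 1k record at which the files differ. It is just a sketch; the function name and the 1024-byte block size are my own choices. Since each dump header carries the dump date, expect two separate runs to differ in the very first record, so the size difference is the more telling number.

```python
import os


def compare_files(path_a, path_b, block=1024):
    """Return (size_a, size_b, index of first differing block or None)."""
    size_a = os.path.getsize(path_a)
    size_b = os.path.getsize(path_b)
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            block_a = fa.read(block)
            block_b = fb.read(block)
            if block_a != block_b:
                return size_a, size_b, index
            if not block_a:
                # Both files ended at the same point with no difference.
                return size_a, size_b, None
            index += 1
```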

    What this suggests to me is that if dump can't write to the first file, it won't write the missing data to the second file.

    Clearly there's a bug in dump. It shouldn't pretend the second file is the second volume if it didn't write anything to the first file. And that may be easier to fix than restore. I think restore should also be fixed (but if my suspicion is true, it may be impossible).