
#174 corrupted dumps of 2TB+ filesystems....

Status: open
Owner: nobody
Labels: None
Priority: 9
Updated: 2024-09-08
Created: 2022-06-21
Creator: Greg Oster
Private: No

Hi

While validating a dump made of a 16TB filesystem, I observed that the validation failed badly. Long story short: any backup made of a 2TB+ filesystem is likely incomplete and will contain corrupted data/files.

Enclosed is a set of patches that at least partially addresses this issue -- the changes are necessary, but possibly not sufficient. I've validated dump/restore on a test set of data that was failing before, and have a larger dump/restore in progress for further testing/validation. These diffs are against the most recent version of dump (0.4b47), and testing was done on an Ubuntu 20.04.4 (x86_64) system. The filesystem being dumped looked like:
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup-Foo 19T 16T 2.2T 88% /u1

Basically, the corruption occurs on files whose logical block addresses don't fit in 32 bits. The requested address overflows and pulls in a block that doesn't belong to the original file. Also note that we need to use ext2fs_block_iterate3() instead of ext2fs_block_iterate2(), as the latter cannot cope with 64-bit block numbers.
I'm trying to build a sparse filesystem to make replication easy, but neither debugfs nor dd seems to want to deal with 64-bit offsets/sizes either.
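
To illustrate the failure mode (a minimal sketch of the wrap-around, not dump's actual code): a block number above 2^32 that passes through a 32-bit variable -- e.g. the blk_t used by the ext2fs_block_iterate2() callbacks -- is silently truncated, so dump reads a block from somewhere else in the filesystem:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t real_blk = 5000000000ULL;      /* a block address beyond 2^32 */
        uint32_t wrapped  = (uint32_t)real_blk; /* what a 32-bit blk_t retains */

        printf("real %llu -> wrapped %u\n",
               (unsigned long long)real_blk, wrapped);
        /* prints: real 5000000000 -> wrapped 705032704; dump would then read
           block 705032704, which belongs to some other file (or to nothing) */
        return 0;
    }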

In the meantime, at an absolute minimum, the existing code should be modified to detect that the filesystem being dumped is larger than 2TB and refuse to run.
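
For what it's worth, a minimal sketch of such a guard (assuming dump's already-open ext2_filsys handle "fs" and the device name "disk"; a real patch would use dump's own error reporting rather than a bare exit):

    /* sketch only: refuse to run on filesystems big enough to hit the
       32-bit block-number overflow, until the 64-bit fixes are merged */
    blk64_t  blocks = ext2fs_blocks_count(fs->super);
    uint64_t bytes  = (uint64_t)blocks * fs->blocksize;

    if (bytes > (2ULL << 40)) {     /* 2TB, the conservative cut-off */
        fprintf(stderr, "DUMP: %s is larger than 2TB; refusing to dump "
                        "(32-bit block-number overflow)\n", disk);
        exit(1);
    }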

Thanks.

Later...
Greg Oster

3 Attachments

Discussion

  • Greg Oster - 2022-06-21

    My math here might be wrong... if 4294967296 (2^32) is the maximum 32-bit logical block address, then the corruption wouldn't be seen until LBAs exceed that value. I.e. for a block size of 4K, that would mean filesystems larger than 16TB... which would help explain why this hasn't been reported before.
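
    (For reference: 2^32 blocks x 4096 bytes/block = 2^44 bytes = 16TiB, so with 4K blocks only block addresses past the 16TiB mark can wrap.)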

    Later...
    Greg Oster

     
  • Greg Oster - 2022-06-23

    8.5TB of data was successfully dumped/restored with the submitted patches in use. Without the patches, a dump/restore of this data set produces thousands of validation errors.

    Later...
    Greg Oster

     
  • Greg Oster - 2022-06-29

    These changes are necessary, but not sufficient. A multi-tape dump looks like it is corrupting a file that spans two tapes. The error seen is:
    Incorrect block for <filename> at 11432470600 blocks
    Incorrect block for <filename> at 11432470601 blocks
    ...
    Incorrect block for <filename> at 11432470790 blocks
    Incorrect block for <filename> at 11432470791 blocks

    When the 16TB restore finishes I'll know if this is the only file that is corrupt. [UPDATE: 16TB restore finished. 'diff' showed that only the one file above (which spanned tapes) was corrupt.]

    I suspect that to fix this we'll need to modify compat/include/protocols/dumprestore.h to bump int32_t c_firstrec; to an int64_t. But such a change will need to be made in a backwards-compatible way so old backups aren't rendered obsolete.
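
    For illustration only, one shape such a compatibility scheme could take (the flag bit and field names below are invented, not the actual dumprestore.h layout):

    #include <stdint.h>

    /* sketch: keep the 32-bit field so old restore binaries keep working, and
       add a 64-bit copy guarded by a new c_flags bit (a real value would have
       to be chosen so it doesn't collide with the existing DR_* bits) */
    #define DR_FIRSTREC64   0x0800              /* hypothetical flag bit */

    struct hdr_compat {
        int32_t  c_flags;                       /* existing flags word */
        int32_t  c_firstrec;                    /* existing field, wraps */
        int64_t  c_firstrec64;                  /* new field, from spare space */
    };

    /* restore prefers the 64-bit value when the writer marked it valid, so
       tapes written by old dump binaries still restore unchanged */
    static int64_t first_record(const struct hdr_compat *h)
    {
        if (h->c_flags & DR_FIRSTREC64)
            return h->c_firstrec64;
        return (uint32_t)h->c_firstrec;         /* legacy value, treat as unsigned */
    }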

    Fixing the above should also allow an easy fix for the outstanding dump progress "% done" issue.

    Later...
    Greg Oster

     

    Last edit: Greg Oster 2022-07-01
  • Tim Woodall - 2024-09-04

    Thanks for this! I've managed to generate a test case that doesn't require terabytes of data, only about 3GB of disk space, to reproduce:

    (This assumes you have no loop devices in use - it will trash them if there are!)

    mkdir -p d1.mnt
    mkdir -p d2.mnt
    mkdir -p big
    
    rm -f d1
    truncate -s 3G d1
    losetup -f d1
    mkfs.ext4 /dev/loop0
    losetup -d /dev/loop0
    
    rm -f d2
    truncate -s 1G d2
    losetup -f d2
    mkfs.ext4 /dev/loop0
    losetup -d /dev/loop0
    
    mount -o loop d1 d1.mnt/
    mount -o loop d2 d2.mnt/
    
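    # the 15T PV files are sparse, so they fit inside the small 3G/1G images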
    truncate -s 15T d1.mnt/pv
    truncate -s 15T d2.mnt/pv
    losetup -f d1.mnt/pv
    losetup -f d2.mnt/pv
    
    vgcreate vg30T /dev/loop2 /dev/loop3
    
    lvcreate -n big -l 7860000 vg30T
    
    mkfs.ext4 /dev/vg30T/big
    
    mount /dev/vg30T/big big
    
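    # pre-allocate ~20T so the next file's blocks land above 2^32 (16TiB with 4K blocks)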
    fallocate -l 10T big/bigfile1
    fallocate -l 10T big/bigfile2
    
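    # bigblock is a small real file written after the pre-allocation, so its data blocks sit beyond 2^32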
    dd if=/dev/urandom of=big/bigblock bs=1K count=10K
    
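    # confirm bigblock's block numbers really are above 4294967296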
    debugfs -R "stat bigblock" /dev/vg30T/big | cat
    
    rm big/bigfile1 big/bigfile2
    
    sync
    
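    # dump and immediately compare against the live copy; without the patch, bigblock miscompares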
    dump -v -0 /dev/vg30T/big -f - | restore -C -D big/ -f -
    
    umount big
    vgchange -an vg30T
    
    losetup -d /dev/loop3
    losetup -d /dev/loop2
    umount d1.mnt
    umount d2.mnt
    
    rm -f d1
    rm -f d2
    

    And this is the result without this patch:

    dump -v -0 /dev/vg30T/big -f - | restore -C -D big/ -f -
      DUMP: Date of this level 0 dump: Wed Sep  4 19:18:43 2024
      DUMP: Dumping /dev/vg30T/big (an unlisted file system) to standard output
      DUMP: Excluding inode 8 (journal inode) from dump
      DUMP: Excluding inode 7 (resize inode) from dump
      DUMP: Label: none
      DUMP: Writing 10 Kilobyte records
      DUMP: mapping (Pass I) [regular files]
      DUMP: mapping (Pass II) [directories]
      DUMP: estimated 133100 blocks.
      DUMP: Volume 1 started with block 1 at: Wed Sep  4 19:18:44 2024
    Dump   date: Wed Sep  4 19:18:43 2024
    Dumped from: the epoch
    Level 0 dump of an unlisted file system on dirac.home.woodall.me.uk:/dev/vg30T/big
    Label: none
      DUMP: dumping (Pass III) [directories]
      DUMP: dumping directory inode 2
      DUMP: dumping directory inode 11
      DUMP: dumping (Pass IV) [regular files]
      DUMP: dumping regular inode 14
    filesys = big/
    ./bigblock: tape and disk copies are different
      DUMP: Volume 1 completed at: Wed Sep  4 19:18:46 2024
      DUMP: Volume 1 133090 blocks (129.97MB)
      DUMP: Volume 1 took 0:00:02
      DUMP: Volume 1 transfer rate: 66545 kB/s
      DUMP: 133090 blocks (129.97MB)
      DUMP: finished in 2 seconds, throughput 66545 kBytes/sec
      DUMP: Date of this level 0 dump: Wed Sep  4 19:18:43 2024
      DUMP: Date this dump completed:  Wed Sep  4 19:18:46 2024
      DUMP: Average transfer rate: 66545 kB/s
      DUMP: DUMP IS DONE
    Some files were modified!  1 compare errors
    

    There's another serious bug related to EXT2_EXTENT_FLAGS_UNINIT which I've got a fix for (and which might be the cause of bug 175).

    There's also an issue with the verify of long symlinks (it doesn't affect the restore, only a verify like the one done in the test case above).

    (There's also a longstanding bug related to verify and the counting of extended attributes, for which there's a fix in the Debian package that doesn't appear to be here.)

     
  • Greg Oster - 2024-09-04

    You're most welcome! Thanks for coming up with a small test case -- I switched from 'dump' to 'restic' for backups at about the same time that I reported the issue, and so haven't needed to chase this problem further.
    Later...
    Greg Oster

     
  • Tim Woodall - 2024-09-08

    There was a minor bug in the original patches which I've fixed in the attached patch.
    I had to add a bit of extra logging to actually show that the test case was dumping an EA block >2^32:

    dumping EA (block) in inode #13 block=4312435892

     
  • Tim Woodall - 2024-09-08

    A somewhat modified test case that runs a bit quicker.

     
