Menu

#162 file corruption

open
nobody
None
5
2024-09-05
2014-03-12
David S
No

Overview - Ext4 file system lvm2 snapshot is dumped (not mounted) and then restored to new logical volume. Restored file system contains files with binary differences to snapshot and original file system.

Linux distribution: Debian GNU/Linux jessie/sid
# uname -a
Linux gwar 3.10.22gwar_2013.12.07 #1 SMP Sat Dec 7 12:10:45 EST 2013 x86_64 GNU/Linux

dump 0.4b44 (using libext2fs 1.42.9 of 4-Feb-2014)

restore 0.4b44 (using libext2fs 1.42.9 of 4-Feb-2014)

commands used:
+ lvcreate --size 40g --snapshot --name home-snap /dev/vg0/home
  Logical volume "home-snap" created
+ dump 0f /mnt/home.2014-03-11.dump /dev/vg0/home-snap
  DUMP: Date of this level 0 dump: Tue Mar 11 13:44:54 2014
  DUMP: Dumping /dev/vg0/home-snap (an unlisted file system) to /mnt/home.2014-03-11.dump
  DUMP: Label: none
  DUMP: Writing 10 Kilobyte records
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 715508510 blocks.
  DUMP: Volume 1 started with block 1 at: Tue Mar 11 13:45:00 2014
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: 3.40% done at 81157 kB/s, finished in 2:21
  DUMP: 6.98% done at 83188 kB/s, finished in 2:13
  DUMP: 10.47% done at 83073 kB/s, finished in 2:08
  DUMP: 13.93% done at 82900 kB/s, finished in 2:03
  DUMP: 17.09% done at 81362 kB/s, finished in 2:01
  DUMP: 20.49% done at 81303 kB/s, finished in 1:56
  DUMP: 23.69% done at 80570 kB/s, finished in 1:52
  DUMP: 27.02% done at 80432 kB/s, finished in 1:48
  DUMP: 30.05% done at 79522 kB/s, finished in 1:44
  DUMP: 33.28% done at 79260 kB/s, finished in 1:40
  DUMP: 36.35% done at 78722 kB/s, finished in 1:36
  DUMP: 39.41% done at 78243 kB/s, finished in 1:32
  DUMP: 42.52% done at 77926 kB/s, finished in 1:27
  DUMP: 45.66% done at 77710 kB/s, finished in 1:23
  DUMP: 48.82% done at 77551 kB/s, finished in 1:18
  DUMP: 51.83% done at 77177 kB/s, finished in 1:14
  DUMP: 55.00% done at 77090 kB/s, finished in 1:09
  DUMP: 57.86% done at 76596 kB/s, finished in 1:05
  DUMP: 60.77% done at 76215 kB/s, finished in 1:01
  DUMP: 63.74% done at 75942 kB/s, finished in 0:56
  DUMP: 66.45% done at 75408 kB/s, finished in 0:53
  DUMP: 69.58% done at 75374 kB/s, finished in 0:48
  DUMP: 72.52% done at 75146 kB/s, finished in 0:43
  DUMP: 75.48% done at 74959 kB/s, finished in 0:39
  DUMP: 78.39% done at 74736 kB/s, finished in 0:34
  DUMP: 81.29% done at 74521 kB/s, finished in 0:29
  DUMP: 84.02% done at 74164 kB/s, finished in 0:25
  DUMP: 86.86% done at 73937 kB/s, finished in 0:21
  DUMP: 89.71% done at 73725 kB/s, finished in 0:16
  DUMP: 92.51% done at 73494 kB/s, finished in 0:12
  DUMP: 95.32% done at 73277 kB/s, finished in 0:07
  DUMP: 97.54% done at 72645 kB/s, finished in 0:04
  DUMP: 99.71% done at 71997 kB/s, finished in 0:00
  DUMP: Closing /mnt/home.2014-03-11.dump
  DUMP: Volume 1 completed at: Tue Mar 11 16:30:43 2014
  DUMP: Volume 1 715360690 blocks (698594.42MB)
  DUMP: Volume 1 took 2:45:43
  DUMP: Volume 1 transfer rate: 71946 kB/s
  DUMP: 715360690 blocks (698594.42MB) on 1 volume(s)
  DUMP: finished in 9943 seconds, throughput 71946 kBytes/sec
  DUMP: Date of this level 0 dump: Tue Mar 11 13:44:54 2014
  DUMP: Date this dump completed:  Tue Mar 11 16:30:43 2014
  DUMP: Average transfer rate: 71946 kB/s
  DUMP: DUMP IS DONE

# vgdisplay
  --- Volume group ---
  VG Name               vg0
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  27
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                5
  Open LV               3
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               5.46 TiB
  PE Size               128.00 MiB
  Total PE              44702
  Alloc PE / Size       25480 / 3.11 TiB
  Free  PE / Size       19222 / 2.35 TiB
  VG UUID               g72JMA-wTIJ-wcM3-q0GX-MMe4-Zx4Z-uEzgYQ

# lvcreate -L 1200g -n hometest vg0
  Logical volume "hometest" created

# mkfs -t ext4 /dev/vg0/hometest
mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=128 blocks, Stripe width=256 blocks
78643200 inodes, 314572800 blocks
15728640 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
9600 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
    102400000, 214990848

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done     

root@gwar:/# mount /dev/vg0/hometest /hometest
root@gwar:/# cd /hometest
root@gwar:/hometest# restore -rf /mnt/home.2014-03-11.dump
restore: ./lost+found: File exists
root@gwar:/hometest# rm restoresymtable
root@gwar:/hometest# cd ..
root@gwar:/# mount -o ro /dev/vg0/home-snap /home-snap
root@gwar:/# diff -r home-snap hometest
Binary files home-snap/dsorkin/libvirt_images/oderus.img and hometest/dsorkin/libvirt_images/oderus.img differ
Binary files home-snap/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_ and hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_ differ
Binary files home-snap/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_002_ and hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_002_ differ
Binary files home-snap/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_003_ and hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_003_ differ
(lines omitted for sym links and special files)

root@gwar:/# cmp home-snap/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_ hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_
home-snap/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_ hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_ differ: byte 495617, line 7008
root@gwar:/# sum home-snap/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_ hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_ home/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_
35772  4096 home-snap/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_
55937  4096 hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_
35772  4096 home/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_
root@gwar:/# ls -l home-snap/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_ hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_ home/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_
-rw------- 1 1004 users 4194304 Oct 17 23:46 home/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_
-rw------- 1 1004 users 4194304 Oct 17 23:46 home-snap/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_
-rw------- 1 1004 users 4194304 Oct 17 23:46 hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/_CACHE_001_

root@gwar:/# tune2fs -l /dev/vg0/home-snap
tune2fs 1.42.9 (4-Feb-2014)
Filesystem volume name:   <none>
Last mounted on:          /home-snap
Filesystem UUID:          9b55f6e6-f0f0-4e48-b05c-810cbbaf2aeb
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              201326592
Block count:              805306368
Reserved block count:     40265318
Free blocks:              614251931
Free inodes:              200777989
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      832
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
RAID stride:              128
RAID stripe width:        256
Flex block group size:    16
Filesystem created:       Fri Sep 13 19:47:09 2013
Last mount time:          Tue Mar 11 17:23:07 2014
Last write time:          Tue Mar 11 17:23:43 2014
Mount count:              37
Maximum mount count:      -1
Last checked:             Fri Sep 13 19:47:09 2013
Check interval:           0 (<none>)
Lifetime writes:          1595 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:           256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      ed8a746d-558f-426e-822c-42518900ea68
Journal backup:           inode blocks
root@gwar:/# 

Thanks!

Discussion

  • David S

    David S - 2014-03-12

    I have duplicated the behavior using direct dump/restore as follows:

     root@gwar:/hometest# dump 0f - /dev/vg0/home-snap | restore rf -
    

    The corruption seems to occur when the original file ends in a lot of zeros. Dump/restore replaces some of these zeros with different values. The checksum of the corrupted files shows the problem is deterministic. The files are not sparse files. I think this is a legitimate defect that should be fixed.

    Thanks

    root@gwar:/# ls -lsh /hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_001 /home/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_001
    4.0M -rw------- 1 1004 users 4.0M Oct 17 23:46 /home/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_001
    4.0M -rw------- 1 1004 users 4.0M Oct 17 23:46 /hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_001

    root@gwar:/# sum /hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_001 /home/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_001
    55937 4096 /hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_001
    35772 4096 /home/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_001

    root@gwar:/# cmp /hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_002 /home/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_002
    /hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_002 /home/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_002 differ: byte 471041, line 2494

    root@gwar:/# od -Ad -c -j 471041 /home/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_002
    0471041 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
    *
    4194289 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
    4194304
    root@gwar:/hometest/dsorkin2# od -Ad -c -j 471041 /hometest/dsorkin2/.cache/mozilla/firefox/5lnxsfy1.default/Cache/CACHE_002 | head
    0471041 006 301 026 235 033 221 L j \0 367 001 005 016 G + 245
    0471057 \0 354 020 202 025 277 025 024 002 / 034 ! 4 r 004 [
    0471073 002 241 D E 023 202 022 006 002 244 \f 025 021 305 006 254
    0471089 024 Z 003 207 004 025 \0 030 \b G 002 026 E 307 023 o
    0471105 . K ( 210 006 A 026 5 020 305 \a " 024 323 \a 032
    0471121 001 271 004 274 \r 207 005 027 006 ; \0 233 025 } 030 1
    0471137 034 \a 027 375 003 . 032 310 \0 _ : 357 032 w 006 G
    0471153 022 371 004 247 020 302 2 333 \f 302 \v \a \0 202 001 b
    0471169 020 351 1 232 @ \f 004 & 034 ~ 022 w \v , 002 a
    0471185 021 B * 7 004 330 026 262 020 226 " , 001 \r 006 243

     
  • dreamlayers

    dreamlayers - 2016-10-02

    I'm getting this same issue with dump 0.4b44-7 on Ubuntu 16.04.1 LTS x86_64. I also reported the bug to Ubuntu at https://bugs.launchpad.net/ubuntu/+source/dump/+bug/1629617

    Looking at _CACHE_001_ in debugfs and doing "dump_extents _CACHE_001_" I see:

    Level Entries       Logical          Physical Length Flags
     0/ 0   1/  2     0 -     3 4961280 - 4961283      4 
     0/ 0   2/  2     4 -  1023 4961284 - 4962303   1020 Uninit
    

    Maybe that Uninit flag isn't handled properly?

     

    Last edit: dreamlayers 2016-10-02
  • fwxx

    fwxx - 2022-10-03

    I think, I encountered the same bug on slackware64 15.0 using dump version 0.4b47 from the slackbuilds.org site (e2fsprogs-1,46.5).
    When checking the restored dump of that tool, some files had a "garbage" block of data somewhere. e.g. at the end.
    Comparing that with debugfs on the original ext4 filesystem, revealed that
    the blocks on disk are very identical to the restored file, but dump seems not to honor some new(?) filesystem block marker, which can mark a filesystem block as "unwritten".
    Example:
    An example file had 1 Garbage block at the very end after restore.
    the filefrag command gives some hint:

    root@linux:/home/user/.mozilla/firefox/somedir/# filefrag -v 2918063365piupsah.sqlite
    Filesystem type is: ef53
    File size of 2918063365piupsah.sqlite is 49152 (12 blocks of 4096 bytes)
    ext: logical_offset: physical_offset: length: expected: flags:
    0: 0.. 1: 12689028.. 12689029: 2:
    1: 2.. 10: 12689044.. 12689052: 9: 12689030:
    2: 11.. 11: 12689053.. 12689053: 1: last,unwritten,eof
    2918063365piupsah.sqlite: 2 extents found
    root@linux:/home/user/.mozilla/firefox/somedir/#

    The last 4k block is marked as "unwritten". An when you mount the filesystem and copy that file,
    that exact block is returned as 0 all over. But dump reads(and restores) the actual -but invalid- on disk data of that block, which contains garbage.
    Maybe this is also some failure in the filesystem, that this block is always marked unwritten, but e2fsck -f does not complain about this, and obviously the kernel 5.15.63 does not complain and everything seems to work with other tools.
    And of course, the original filesystem was offline during dump, so no runtime problems as mentioned from Linus 20 years ago should be relevant.

    Conclusion:
    - For me, only a very small amount of files were affected: (less than 0.01%)
    - dump in its current version(0.4b47) is unfortunately not usable for ext4 filesystem backups.
    - it's a pity, nobody cares about that backup tool :(

     
  • dreamlayers

    dreamlayers - 2022-10-04

    Thank you for investigating this. It is easy to create a file with a single unwritten extent using the fallocate command, like: fallocate -l 512 foo. Such a file will trigger this bug.

     
  • Tim Woodall

    Tim Woodall - 2024-09-05

    I've just reported this as #176 and attached a patch there that fixes this.

     
  • Tim Woodall

    Tim Woodall - 2024-09-05

    attaching patch here too.

     

Log in to post a comment.