|
From: Ben H. <bj...@ca...> - 2007-02-07 15:03:08
|
I've recently found that some of the dumps produced by dump don't
correctly record certain files. In the example that brought this problem
to my attention, a file that was created six hours before dump was run
was recorded in the dump as consisting entirely of zeroes, which was not
the case when it was viewed through the filesystem.

I can easily reproduce the problem using the following test script:

    i=0
    while sleep 1; do
        echo $$.$i > testfile.$$.$i
        /sbin/dump -0f - . 2>/dev/null | /sbin/restore -Cf -
        i=$(($i + 1))
    done

After a few iterations, there will usually be some test files that
repeatedly appear different in tape and disk copies, and they can
continue to be dumped incorrectly for tens of minutes at least (I've not
yet run the test for longer).

Running "sync" doesn't seem to help matters. Running "blockdev
--flushbufs" does help, in that files created before the flush start
appearing correctly, but it can only be run by root, which makes it a
bit of a nuisance to set up.

Is this behaviour to be expected? Does it represent a bug in either dump
or Linux? How do other people deal with it?

My tests have so far been on the following systems:

SUSE LINUX 10.1 (X86-64)
    dump 0.4b41-14
    kernel 2.6.16.27-0.6-smp
    e2fsprogs 1.38-25.9
    glibc 2.4-31.1

SUSE LINUX Enterprise Server 9 (i586)
    dump 0.4b35-41.1
    kernel 2.6.5-7.283-smp
    e2fsprogs 1.38-4.18
    glibc 2.3.3-98.73

Debian testing/unstable
    dump 0.4b41-2
    kernel 2.6.15-1-amd64-k8-smp
    e2fsprogs 1.38+1.39-WIP-2005.12.31-1
    libc6 2.3.6-10

Sample dump output is:

    DUMP: Date of this level 0 dump: Wed Feb 7 14:56:10 2007
    DUMP: Dumping /dev/sda5 (/home (dir /dump/test)) to standard output
    DUMP: Label: none
    DUMP: Writing 10 Kilobyte records
    DUMP: mapping (Pass I) [regular files]
    DUMP: mapping (Pass II) [directories]
    DUMP: estimated 804 blocks.
    DUMP: Volume 1 started with block 1 at: Wed Feb 7 14:56:10 2007
    DUMP: dumping (Pass III) [directories]
    DUMP: dumping (Pass IV) [regular files]
    DUMP: Volume 1 completed at: Wed Feb 7 14:56:10 2007
    DUMP: Volume 1 800 blocks (0.78MB)
    DUMP: 800 blocks (0.78MB)
    DUMP: finished in less than a second
    DUMP: Date of this level 0 dump: Wed Feb 7 14:56:10 2007
    DUMP: Date this dump completed: Wed Feb 7 14:56:10 2007
    DUMP: Average transfer rate: 0 kB/s
    DUMP: DUMP IS DONE

--
Ben Harris, University of Cambridge Computing Service.
Tel: (01223) 334728
|
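[Editorial note] The symptom described in this message (a file recorded as consisting entirely of zeroes) could be checked for in a restored copy with something like the sketch below. It assumes the dump has already been restored into a scratch directory; the function name is_all_zero is made up for illustration and is not part of dump or restore.

```shell
# Return 0 if the file is non-empty and contains only NUL bytes,
# 1 otherwise. (Hypothetical helper, not part of dump/restore.)
is_all_zero() {
    f=$1
    # An empty file is not the symptom we are looking for.
    [ -s "$f" ] || return 1
    # Delete every NUL byte and count what is left.
    n=$(tr -d '\000' < "$f" | wc -c)
    # Arithmetic expansion normalises any padding wc may add.
    [ "$((n))" -eq 0 ]
}
```

For example, `is_all_zero restored/testfile.1234.5 && echo "suspicious"` would flag a restored test file that came back as zeroes (the path here is a placeholder).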
|
From: <pm...@fr...> - 2007-02-07 19:17:38
|
On Wed, 7 Feb 2007, Ben Harris wrote:
> Is this behaviour to be expected? Does it represent a bug in either dump
> or Linux? How do other people deal with it?

Hello Ben,

I think it's expected, since you use dump on an active file-system.

How to deal with it:
- you can mount the file-system read-only during the dump
- you can use the snapshot feature of LVM (that's what I do, very nice)

Cheers, Peter
--
http://pmrb.free.fr/contact/
|
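[Editorial note] The LVM snapshot approach mentioned in this message might look roughly like the sketch below. The volume group, logical volume, snapshot name (dumpsnap) and the 1G snapshot size are all hypothetical placeholders, not details from the thread. The function only prints the commands, so they can be reviewed before being run as root.

```shell
# Print the commands for dumping from an LVM snapshot instead of the
# live filesystem. All names and sizes are illustrative assumptions.
#   $1 = volume group, $2 = logical volume, $3 = dump output file
snapshot_dump_cmds() {
    vg=$1
    lv=$2
    out=$3
    snap=dumpsnap   # hypothetical snapshot name
    # 1. Freeze a point-in-time view of the volume.
    echo "lvcreate --snapshot --size 1G --name $snap /dev/$vg/$lv"
    # 2. Dump the snapshot device, which no longer changes.
    echo "dump -0f $out /dev/$vg/$snap"
    # 3. Drop the snapshot once the dump is finished.
    echo "lvremove -f /dev/$vg/$snap"
}
```

Calling `snapshot_dump_cmds vg0 home /backup/home.dump` prints the three steps in order; the snapshot size must be large enough to absorb writes made to the origin volume while the dump runs.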
|
From: Ben H. <bj...@ca...> - 2007-03-08 16:31:22
|
On Wed, 7 Feb 2007, Peter Münster wrote:
> I think it's expected, since you use dump on an active file-system.

So it would appear. It looks like dump actually calls BLKFLSBUF itself,
so the problem only occurs when dump isn't running as root.

> How to deal with it:
> - you can mount the file-system read-only during the dump

This doesn't help. Without the BLKFLSBUF, the dump is still inconsistent.

> - you can use the snapshot feature of LVM (that's what I do, very nice)

That looks like by far the best solution. Unfortunately, not all of my
users use LVM at the moment.

Meanwhile, I think <http://dump.sourceforge.net/isdumpdeprecated.html>
really ought to be updated a little, since the following statements are
now untrue:

"you can safely use dump on ... read-only filesystems."

"You can also safely use dump on idle filesystems if you sync before
dumping"

--
Ben Harris, University of Cambridge Computing Service.
Tel: (01223) 334728
|
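[Editorial note] Since the problem is reported to occur only when dump is not running as root, a wrapper could at least warn when the buffer flush cannot be done. The following is a minimal sketch under that assumption. The function names are invented for illustration, and the uid 0 check merely stands in for whatever privilege the kernel actually requires.

```shell
# Guess whether a buffer flush is likely to be permitted.
# Assumption: uid 0 is treated as "allowed". The optional argument
# overrides the uid and exists only to make the check testable.
can_flush() {
    uid=${1:-$(id -u)}
    [ "$uid" -eq 0 ]
}

# Flush the device's buffers when possible, warn otherwise, then run
# dump with the remaining arguments.
#   $1 = block device holding the filesystem (placeholder)
dump_with_flush() {
    dev=$1
    shift
    if can_flush; then
        blockdev --flushbufs "$dev"
    else
        echo "warning: cannot flush $dev; recent files may be dumped incorrectly" >&2
    fi
    /sbin/dump "$@"
}
```

A call such as `dump_with_flush /dev/sda5 -0f - . | /sbin/restore -Cf -` would mirror the test script from the first message, with the flush added when the caller is root.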
|
From: Stelian P. <st...@po...> - 2007-03-09 16:15:18
|
On Thursday 08 March 2007 at 16:31 +0000, Ben Harris wrote:
> On Wed, 7 Feb 2007, Peter Münster wrote:
>
> > I think it's expected, since you use dump on an active file-system.
>
> So it would appear. It looks like dump actually calls BLKFLSBUF itself,
> so the problem only occurs when dump isn't running as root.

Dump does anyway need to have enough privileges to access the raw block
device directly, and I would have expected BLKFLSBUF to work in these
conditions.

> > How to deal with it:
> > - you can mount the file-system read-only during the dump
>
> This doesn't help. Without the BLKFLSBUF, the dump is still inconsistent.

How can this be? The remount process explicitly flushes the data to
the disk, and in R/O mode no further modifications are allowed.

Stelian.
--
Stelian Pop <st...@po...>
|
|
From: Ben H. <bj...@ca...> - 2007-03-09 17:20:15
|
On Fri, 9 Mar 2007, Stelian Pop wrote:
> On Thursday 08 March 2007 at 16:31 +0000, Ben Harris wrote:
> > On Wed, 7 Feb 2007, Peter Münster wrote:
> >
> > > I think it's expected, since you use dump on an active file-system.
> >
> > So it would appear. It looks like dump actually calls BLKFLSBUF itself,
> > so the problem only occurs when dump isn't running as root.
>
> Dump does anyway need to have enough privileges to access the raw block
> device directly, and I would have expected BLKFLSBUF to work in these
> conditions.

It doesn't. BLKFLSBUF requires CAP_SYS_ADMIN, which isn't required just
to open the device for reading. See linux/block/ioctl.c::blkdev_ioctl().

> > > How to deal with it:
> > > - you can mount the file-system read-only during the dump
> >
> > This doesn't help. Without the BLKFLSBUF, the dump is still inconsistent.
>
> How can this be? The remount process explicitly flushes the data to
> the disk, and in R/O mode no further modifications are allowed.

Actually, I was mistaken -- using a r/o mount gives me a different failure
mode. Using this test script:

    i=0
    while sleep 1; do
        echo $$.$i > testfile.$$.$i
        mount -o remount,ro /dev/stuff/test1 /mnt
        su -c "/sbin/dump -0f - . 2>/dev/null | /sbin/restore -Cf -" dump
        mount -o remount,rw /dev/stuff/test1 /mnt
        i=$(($i + 1))
    done

I get results like this:

    Dump date: Fri Mar 9 16:31:53 2007
    Dumped from: the epoch
    Level 0 dump of /mnt on wraith:/dev/mapper/stuff-test1
    Label: none
    filesys = /mnt
    expected next file 32769, got 460
    expected next file 32769, got 461
    expected next file 32769, got 462
    Some files were modified!

Each loop adds another unexpected file to the list, and running blockdev
--flushbufs wipes the list out. If I actually restore the dump, the new
files are missing from it.

--
Ben Harris, University of Cambridge Computing Service.
Tel: (01223) 334728
|
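[Editorial note] To track how the list of unexpected files grows from one loop iteration to the next, the comparison output shown in this message could be filtered with a small helper like the one below. It is only a sketch: the function name is invented, and it assumes each "expected next file ..., got ..." message appears on its own line, as it presumably did before the archive flattened the text.

```shell
# Read "restore -C" style output on stdin and print the number that
# follows "got" on each "expected next file" line, one per line.
# (Hypothetical helper; the line format is taken from the sample above.)
unexpected_files() {
    sed -n 's/.*expected next file [0-9][0-9]*, got \([0-9][0-9]*\).*/\1/p'
}
```

Piping the comparison through `unexpected_files | wc -l` inside the test loop would give one count per iteration, which should drop back to zero after a successful blockdev --flushbufs if the behaviour matches what is described here.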
|
From: Stelian P. <st...@po...> - 2007-03-12 11:14:15
|
On Friday 09 March 2007 at 17:19 +0000, Ben Harris wrote:
> > Dump does anyway need to have enough privileges to access the raw block
> > device directly, and I would have expected BLKFLSBUF to work in these
> > conditions.
>
> It doesn't. BLKFLSBUF requires CAP_SYS_ADMIN, which isn't required just
> to open the device for reading. See linux/block/ioctl.c::blkdev_ioctl().
Right. Unfortunate but correct.
> >
> > How can this be? The remount process explicitly flushes the data to
> > the disk, and in R/O mode no further modifications are allowed.
>
> Actually, I was mistaken -- using a r/o mount gives me a different failure
> mode. Using this test script:
[...]
> Each loop adds another unexpected file to the list, and running blockdev
> --flushbufs wipes the list out. If I actually restore the dump, the new
> files are missing from it.
You're correct, I reproduced this here using a simple (and small)
loop-mounted filesystem. I never saw this before because I always do the
dumps as root, so BLKFLSBUF works.
However, I am not sure if this is the intended behaviour ("mounting r/o
means all further writes are disallowed, but you still need to
manually flush the buffers if you want to make sure the data reached the
disk") or a genuine bug in the kernel block layer.
Stelian.
--
Stelian Pop <st...@po...>
|