Re: [Jfs-discussion] raid5 failed (power outage) and now jfs_fsck fails
From: Dave K. <sh...@au...> - 2006-07-28 03:12:52
On Thu, 2006-07-27 at 00:43 +0300, Lasse Hynninen wrote:
> After the power outage I noticed that one of the disks had lots of bad sectors.
> Then I managed to remove wrong disk and boot the system -> raid
> superblock was written so that the bad sectored drive was marked as
> clean..
> After running some mdadm --assemble --force stuff I got it to make it
> even more f**ked up. At this point I finally got new disks and made
> copies.
> I've tried to figure out how to copy stuff from the degraded raid5
> array, which thinks it only has one clean disk and the other is marked
> as spare, even though it really is clean.
>
> I tried program called 'raidextract', but couldn't really figure out
> how it works -- everything it produced was garbage (and it wanted 3
> disks to read from (and I only had 2)). Then I tried to figure out how
> to build one, but details of raid 5 at such low level are hard to
> find.
> Then I found some guy who had almost exactly same problem as me, and
> he had solved it by re-creating the array (overwriting the
> superblocks). It worked. Mount failed as I suspected, even read-only
> isn't enough.
> Then fsck_jfs, it printed something, but didn't remember to write it
> down/log it. In any case it failed.
> Then some searching later I found that "jfs_fsck -d -n /dev/md0" could help:
> -----
> jfs_fsck version 1.1.8, 03-May-2005
> processing started: 7/27/2006 3.1.35
> The current device is: /dev/md0 [xchkdsk.c:1555]
> Open(...READONLY...) returned rc = 0 [fsckpfs.c:3194]
> Primary superblock is valid. [fsckmeta.c:1556]
> The type of file system for the device is JFS. [xchkdsk.c:1572]
> Block size in bytes: 4096 [xchkdsk.c:1899]
> Filesystem size in blocks: 117190080 [xchkdsk.c:1906]
> **Phase 1 - Check Blocks, Files/Directories, and Directory Entries
> [xchkdsk.c:2038]
> Invalid data (43) detected in file system object MA16. [fsckxtre.c:1192]
> Primary metadata inode A16 is corrupt. [fsckmeta.c:2414]
> Invalid data (7) detected in file system object MA1. [fsckmeta.c:2458]
> Invalid data (8) detected in file system object MA1. [fsckmeta.c:2466]
> Invalid data (9) detected in file system object MA1. [fsckmeta.c:2474]
> Invalid data (10) detected in file system object MA1. [fsckmeta.c:2482]
> Invalid data (11) detected in file system object MA1. [fsckmeta.c:2493]
> Invalid data (12) detected in file system object MA1. [fsckmeta.c:2501]
> Invalid data (13) detected in file system object MA1. [fsckmeta.c:2509]
> Secondary metadata inode A1 is corrupt. [fsckmeta.c:2558]
> Invalid stamp detected in file system object MA16. [fsckmeta.c:2320]
> Invalid data (1) detected in file system object MA16. [fsckmeta.c:2328]
> Invalid data (2) detected in file system object MA16. [fsckmeta.c:2336]
> Invalid data (2a) detected in file system object MA16. [fsckmeta.c:2344]
> Invalid data (3) detected in file system object MA16. [fsckmeta.c:2355]
> Invalid data (4) detected in file system object MA16. [fsckmeta.c:2363]
> Invalid data (5) detected in file system object MA16. [fsckmeta.c:2371]
> Invalid data (6) detected in file system object MA16. [fsckmeta.c:2379]
> Secondary metadata inode A16 is corrupt. [fsckmeta.c:2418]
> Errors detected in the Primary File/Directory Allocation Table.
> [fsckmeta.c:1895]
> Errors detected in the Secondary File/Directory Allocation Table.
> [fsckmeta.c:1900]
> CANNOT CONTINUE. [fsckmeta.c:1910]
> processing terminated: 7/27/2006 3:01:35 with return code: -10049
> exit code: 4. [xchkdsk.c:469]
> ------
> dumped the log also with jfs_fscklog, ran it through strings:
> -----
> JFS chkdskSvcLog<
> processing started: 7/27/2006 1.54.43 [xchkdsk.c:1480]
> Using default parameter: -p [xchkdsk.c:3150]
> The current device is: /dev/md0 [xchkdsk.c:1555]
> 80]H
> Open(...READ/WRITE EXCLUSIVE...) returned rc = 0 [fsckpfs.c:3227]
> Primary superblock is valid. [fsckmeta.c:1556]
> The type of file system for the device is JFS. [xchkdsk.c:1572]
> Block size in bytes: 4096 [xchkdsk.c:1899]
> Filesystem size in blocks: 117190080 [xchkdsk.c:1906]
> **Phase 0 - Replay Journal Log [xchkdsk.c:1913]
> LOGREDO: Log superblock contains invalid magic number. [logredo.c:529]
> logredo failed (rc=-268). fsck continuing. [xchkdsk.c:1943]
> Filesystem is dirty. [fsckmeta.c:138]
> ingd
> processing terminated: 7/27/2006 1:54:43 with return code: 0 exit
> code: 4. [xchkdsk.c:469]
> -----
> and btw, the mount:
> -----
> :/# mount -t jfs -oro /dev/md0 /dump/
> jfs_mount: diMount(ipaimap2) failed, rc = -5
> Mount JFS Failure: -5
> jfs_mount failed w/return code = -5
>
> mount: wrong fs type, bad option, bad superblock on /dev/md0,
> missing codepage or other error
> In some cases useful info is found in syslog - try
> dmesg | tail or so
>
> -----
>
> Sooo..
> I'm wondering, can I somehow fix that inode MA16.

I don't think so, at least not unless you can rebuild the array
differently. It looks like besides the superblock, jfs_fsck isn't
recognizing anything at all. It's complaining about almost every field
in that inode.

> I've had the root "directory" corrupted twice (filenames were lost,
> but files were intact). And that would be absolutely fantastic, since
> it's quite easy to recover from. And even if some files are corrupted
> it would be fine. I'm just not ready accept failure and reformat
> everything.

It doesn't look like there's any way to fix the file system as it
currently is. The only hope would be if the array was re-created in a
different configuration, and it could be rebuilt as it was originally.
I don't know raid well enough to give you any help there.

> Any help is very much appreciated.
> Thanks!

Sorry I'm not very helpful,
Shaggy
--
David Kleikamp
IBM Linux Technology Center
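
The quoted message notes that low-level RAID 5 details are hard to find, and the reply suggests that the array would have to be re-created in its original configuration. As background only, here is a minimal Python sketch of the two ideas involved: XOR parity, and the way the parity chunk rotates across member disks. It is not taken from mdadm or raidextract. The function names are hypothetical, and the left-symmetric layout is an assumption (it is, as far as I know, the Linux md default); the layout, chunk size, and disk order actually used on the array in this thread are not known from the messages.

```python
from functools import reduce


def xor_chunks(chunks):
    """XOR a list of equal-length byte strings together.

    In RAID 5 the parity chunk of a stripe is the XOR of its data chunks,
    so any single missing chunk equals the XOR of all the remaining ones.
    """
    if not chunks:
        raise ValueError("need at least one chunk")
    length = len(chunks[0])
    if any(len(c) != length for c in chunks):
        raise ValueError("all chunks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def left_symmetric_stripe(stripe, n_disks):
    """Return (parity_disk, data_disks) for one stripe.

    ASSUMPTION: left-symmetric layout. Parity starts on the last disk and
    moves one disk to the left per stripe; data chunks start on the disk
    after the parity disk and wrap around.
    """
    parity_disk = (n_disks - 1) - (stripe % n_disks)
    data_disks = [(parity_disk + 1 + i) % n_disks for i in range(n_disks - 1)]
    return parity_disk, data_disks


if __name__ == "__main__":
    # Hypothetical three-disk stripe: two data chunks and their parity.
    d0 = b"JFS1"
    d1 = bytes([0x00, 0xFF, 0x10, 0x20])
    parity = xor_chunks([d0, d1])

    # With d0 "lost", it can be rebuilt from the two surviving chunks.
    print(xor_chunks([parity, d1]) == d0)

    for stripe in range(3):
        print(stripe, left_symmetric_stripe(stripe, 3))
```

If the disk order, chunk size, or layout used when re-creating an array differs from the original, chunks are read back in the wrong sequence. The first blocks can still land in the right place while later structures come out as garbage, which would be one possible explanation for a valid superblock followed by unrecognizable metadata.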