[Jfs-discussion] raid5 failed (power outage) and now jfs_fsck fails
From: Lasse H. <dom...@gm...> - 2006-07-27 14:33:58
After the power outage I noticed that one of the disks had lots of bad sectors. Then I managed to remove the wrong disk and boot the system, so the raid superblock was written with the bad-sectored drive marked as clean. After running some mdadm --assemble --force commands I managed to make it even more f**ked up. At this point I finally got new disks and made copies.

I've tried to figure out how to copy stuff from the degraded raid5 array, which thinks it only has one clean disk and marks the other as a spare, even though it really is clean. I tried a program called 'raidextract', but couldn't really figure out how it works: everything it produced was garbage, and it wanted 3 disks to read from (I only had 2). Then I tried to figure out how to build such a tool myself, but details of raid 5 at such a low level are hard to find.

Then I found some guy who had almost exactly the same problem as me, and he had solved it by re-creating the array (overwriting the superblocks). It worked.

Mounting failed, as I suspected; even read-only isn't enough. Then I ran jfs_fsck. It printed something, but I didn't remember to write it down or log it. In any case it failed. Some searching later I found that "jfs_fsck -d -n /dev/md0" could help:

-----
jfs_fsck version 1.1.8, 03-May-2005
processing started: 7/27/2006 3.1.35
The current device is: /dev/md0 [xchkdsk.c:1555]
Open(...READONLY...) returned rc = 0 [fsckpfs.c:3194]
Primary superblock is valid. [fsckmeta.c:1556]
The type of file system for the device is JFS. [xchkdsk.c:1572]
Block size in bytes: 4096 [xchkdsk.c:1899]
Filesystem size in blocks: 117190080 [xchkdsk.c:1906]
**Phase 1 - Check Blocks, Files/Directories, and Directory Entries [xchkdsk.c:2038]
Invalid data (43) detected in file system object MA16. [fsckxtre.c:1192]
Primary metadata inode A16 is corrupt. [fsckmeta.c:2414]
Invalid data (7) detected in file system object MA1. [fsckmeta.c:2458]
Invalid data (8) detected in file system object MA1. [fsckmeta.c:2466]
Invalid data (9) detected in file system object MA1. [fsckmeta.c:2474]
Invalid data (10) detected in file system object MA1. [fsckmeta.c:2482]
Invalid data (11) detected in file system object MA1. [fsckmeta.c:2493]
Invalid data (12) detected in file system object MA1. [fsckmeta.c:2501]
Invalid data (13) detected in file system object MA1. [fsckmeta.c:2509]
Secondary metadata inode A1 is corrupt. [fsckmeta.c:2558]
Invalid stamp detected in file system object MA16. [fsckmeta.c:2320]
Invalid data (1) detected in file system object MA16. [fsckmeta.c:2328]
Invalid data (2) detected in file system object MA16. [fsckmeta.c:2336]
Invalid data (2a) detected in file system object MA16. [fsckmeta.c:2344]
Invalid data (3) detected in file system object MA16. [fsckmeta.c:2355]
Invalid data (4) detected in file system object MA16. [fsckmeta.c:2363]
Invalid data (5) detected in file system object MA16. [fsckmeta.c:2371]
Invalid data (6) detected in file system object MA16. [fsckmeta.c:2379]
Secondary metadata inode A16 is corrupt. [fsckmeta.c:2418]
Errors detected in the Primary File/Directory Allocation Table. [fsckmeta.c:1895]
Errors detected in the Secondary File/Directory Allocation Table. [fsckmeta.c:1900]
CANNOT CONTINUE. [fsckmeta.c:1910]
processing terminated: 7/27/2006 3:01:35 with return code: -10049 exit code: 4. [xchkdsk.c:469]
-----

I also dumped the log from the earlier run with jfs_fscklog and ran it through strings:

-----
JFS chkdskSvcLog<
processing started: 7/27/2006 1.54.43 [xchkdsk.c:1480]
Using default parameter: -p [xchkdsk.c:3150]
The current device is: /dev/md0 [xchkdsk.c:1555]
80]H
Open(...READ/WRITE EXCLUSIVE...) returned rc = 0 [fsckpfs.c:3227]
Primary superblock is valid. [fsckmeta.c:1556]
The type of file system for the device is JFS. [xchkdsk.c:1572]
Block size in bytes: 4096 [xchkdsk.c:1899]
Filesystem size in blocks: 117190080 [xchkdsk.c:1906]
**Phase 0 - Replay Journal Log [xchkdsk.c:1913]
LOGREDO: Log superblock contains invalid magic number. [logredo.c:529]
logredo failed (rc=-268). fsck continuing. [xchkdsk.c:1943]
Filesystem is dirty. [fsckmeta.c:138]
ingd
processing terminated: 7/27/2006 1:54:43 with return code: 0 exit code: 4. [xchkdsk.c:469]
-----

And btw, the mount:

-----
:/# mount -t jfs -oro /dev/md0 /dump/
jfs_mount: diMount(ipaimap2) failed, rc = -5
Mount JFS Failure: -5
jfs_mount failed w/return code = -5
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
-----

Sooo.. I'm wondering, can I somehow fix that inode MA16? I've had the root "directory" corrupted twice before (filenames were lost, but files were intact). An outcome like that would be absolutely fantastic, since it's quite easy to recover from. Even if some files are corrupted, it would be fine. I'm just not ready to accept failure and reformat everything.

Any help is very much appreciated. Thanks!
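
On the low-level raid 5 details mentioned above, here is a minimal sketch in Python of how data could be read back from a 3-disk array with one member missing. It assumes md's default left-symmetric layout, data starting at offset 0 on every member, and a known disk order and chunk size; none of that is confirmed by the post. The function names (xor_blocks, parity_disk, data_disk, read_chunk, extract) are made up for illustration, and the members are plain in-memory byte strings rather than real devices.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length byte strings together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def parity_disk(stripe: int, n_disks: int) -> int:
    """Left-symmetric (assumed): parity starts on the last disk, then moves left."""
    return (n_disks - 1) - (stripe % n_disks)


def data_disk(stripe: int, k: int, n_disks: int) -> int:
    """Disk holding the k-th data chunk of a stripe (left-symmetric, assumed)."""
    return (parity_disk(stripe, n_disks) + 1 + k) % n_disks


def read_chunk(disks, disk_idx: int, stripe: int, chunk_size: int) -> bytes:
    """Read one chunk; if that member is missing (None), rebuild it by XOR."""
    start = stripe * chunk_size
    if disks[disk_idx] is not None:
        return disks[disk_idx][start:start + chunk_size]
    # The missing chunk equals the XOR of the same stripe on all other members.
    others = [d[start:start + chunk_size] for d in disks if d is not None]
    return xor_blocks(*others)


def extract(disks, chunk_size: int, n_stripes: int) -> bytes:
    """Concatenate the data chunks of every stripe, skipping parity."""
    n = len(disks)
    out = bytearray()
    for stripe in range(n_stripes):
        for k in range(n - 1):
            out += read_chunk(disks, data_disk(stripe, k, n), stripe, chunk_size)
    return bytes(out)
```

For real disks the same arithmetic would apply to seek/read calls on image files instead of byte strings, and it is safest to run it only against copies. If the output is garbage, the assumed disk order, chunk size, or layout is the first thing to question.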