[Jfs-discussion] Corruption after reboot of active NFS server (in test environment)
From: Chris P. <cp...@gm...> - 2005-03-17 16:10:56
I've been testing a system that will hopefully serve as a model for a couple of new NFS servers, and in one test I got "XT_GETPAGE: xtree page corrupt". I was able to clean it with jfs_fsck, but the data files were removed as a result.

What I was doing at the time was NFS-exporting a 4TB volume to two systems. One was on the same subnet and one was behind a NAT box on the same subnet (a Linux cluster node). Everything has gigabit ethernet. The NFS server is SLES 9 using lvm2 to merge four 1TB LUNs into a 4TB volume, and the other systems are SuSE Pro 9.1. Both test systems were writing a 2.7GB file to the NFS export, which is exported rw,sync,no_subtree_check,no_root_squash and mounted with rw,bg,hard,intr,nfsvers=3,tcp,rsize=32768,wsize=32768.

After writing for about 10s I 'init 6'ed the NFS server to reboot it. In a test with just the box on the subnet doing this, the system rebooted fine and the file continued to be written with no errors (md5sum was the same). When I had both boxes going in the second test, the NFS server came up fine (fsck was clean), but shortly after resuming I started getting the "xtree page corrupt" messages in /var/log/messages (lots of them).

Below are snipped logs of jfs_fsck -nv, then jfs_fsck -v, and then jfs_fsck -v again (just to make sure it didn't see anything new).

jfs_fsck version 1.1.7, 22-Jul-2004
processing started: 3/17/2005 10.54.29
The current device is: /dev/g-nicfs2_s/v-nicfs2_s
Open(...READONLY...) returned rc = 0
Primary superblock is valid.
The type of file system for the device is JFS.
Block size in bytes: 4096
Filesystem size in blocks: 1142800384
**Phase 1 - Check Blocks, Files/Directories, and Directory Entries
Invalid data (43) detected in file system object FF8198.
Invalid data (43) detected in file system object FF8199.
**Phase 2 - Count links
**Phase 3 - Duplicate Block Rescan and Directory Connectedness
**Phase 4 - Report Problems
File system object FF8198 is linked as: /cpenney/testfilelnx
cannot repair the data format error(s) in this file.
cannot repair FF8198.
File system object FF8199 is linked as: /cpenney/testfilenode
cannot repair the data format error(s) in this file.
cannot repair FF8199.
**Phase 5 - Check Connectivity
**Phase 6 - Perform Approved Corrections
**Phase 7 - Verify File/Directory Allocation Maps
**Phase 8 - Verify Disk Allocation Maps
24 consecutive blocks observed available but pmap (4379, 251, 26) indicates they are allocated.
8 consecutive blocks observed available but pmap (4379, 252, 26) indicates they are allocated.
32 consecutive blocks observed available but pmap (4379, 253, 10) indicates they are allocated.
30 consecutive blocks observed available but pmap (4379, 255, 2) indicates they are allocated.
Incorrect number free detected in dmap 4379.
Incorrect internal (0) value detected in DM page 4379.
Incorrect internal (4) value detected in DM page 4379.
Incorrect internal (20) value detected in DM page 4379.
Incorrect internal (83) value detected in DM page 4379.
Incorrect internal (84) value detected in DM page 4379.
================= SNIP =====================
Incorrect number of free blocks detected in Block Map Control Page.
Incorrect maximum active AGs detected in Block Map Control Page.
Incorrect maxbud AG detected in Block Map Control Page.
Incorrect number of free blocks in AG 2 detected in Block Map Control Page.
Incorrect number of free blocks in AG 3 detected in Block Map Control Page.
Incorrect number of free blocks in AG 4 detected in Block Map Control Page.
Incorrect number of free blocks in AG 5 detected in Block Map Control Page.
Incorrect number of free blocks in AG 6 detected in Block Map Control Page.
Incorrect number of free blocks in AG 43 detected in Block Map Control Page.
Incorrect number of free blocks in AG 44 detected in Block Map Control Page.
Incorrect number of free blocks in AG 45 detected in Block Map Control Page.
Incorrect number of free blocks in AG 46 detected in Block Map Control Page.
Incorrect number of free blocks in AG 47 detected in Block Map Control Page.
Incorrect number of free blocks in AG 48 detected in Block Map Control Page.
Incorrect number of free blocks in AG 49 detected in Block Map Control Page.
Incorrect number of free blocks in AG 50 detected in Block Map Control Page.
Incorrect number of free blocks in AG 51 detected in Block Map Control Page.
Incorrect number of free blocks in AG 52 detected in Block Map Control Page.
Incorrect number of free blocks in AG 53 detected in Block Map Control Page.
Incorrect number of free blocks in AG 54 detected in Block Map Control Page.
Incorrect number of free blocks in AG 55 detected in Block Map Control Page.
Incorrect number of free blocks in AG 56 detected in Block Map Control Page.
Incorrect number of free blocks in AG 57 detected in Block Map Control Page.
Incorrect number of free blocks in AG 58 detected in Block Map Control Page.
Incorrect number of free blocks in AG 59 detected in Block Map Control Page.
Incorrect number of free blocks in AG 60 detected in Block Map Control Page.
Incorrect number of free blocks in AG 61 detected in Block Map Control Page.
Incorrect number of free blocks in AG 62 detected in Block Map Control Page.
Incorrect number of free blocks in AG 63 detected in Block Map Control Page.
Incorrect number of free blocks in AG 64 detected in Block Map Control Page.
Incorrect number of free blocks in AG 65 detected in Block Map Control Page.
Incorrect number of free blocks in AG 66 detected in Block Map Control Page.
Incorrect number of free blocks in AG 67 detected in Block Map Control Page.
Descrepancies detected between observed block allocations and pmaps.
Inconsistencies detected in leaf values (DM).
Inconsistencies detected in internal values (DM).
Incorrect data detected in pages (DM).
Inconsistencies detected in leaf values (L0).
Inconsistencies detected in internal values (L0).
Inconsistencies detected in leaf values (L1).
Inconsistencies detected in internal values (L1).
Discrepancies detected in the Block Map Control Page AG free count list.
Incorrect data detected in the Block Map Control Page.
Incorrect data detected in disk allocation structures.
Incorrect data detected in disk allocation control structures.
Filesystem Summary:
Blocks in use for inodes: 16
Inode count: 128
File count: 15
Directory count: 2
Block count: 1142800384
Free block count: 1136377670
4571201536 kilobytes total disk space.
1 kilobytes in 2 directories.
21951723 kilobytes in 15 user files.
0 kilobytes in extended attributes
0 kilobytes in access control lists
3739134 kilobytes reserved for system use.
4545510680 kilobytes are available for use.
File system checked READ ONLY.
ERRORS HAVE BEEN DETECTED. Run fsck with the -f parameter to repair.
Filesystem is dirty.
processing terminated: 3/17/2005 10:54:43 with return code: 0 exit code: 4.

jfs_fsck version 1.1.7, 22-Jul-2004
processing started: 3/17/2005 10.55.17
Using default parameter: -p
The current device is: /dev/g-nicfs2_s/v-nicfs2_s
Open(...READ/WRITE EXCLUSIVE...) returned rc = 0
Primary superblock is valid.
The type of file system for the device is JFS.
Block size in bytes: 4096
Filesystem size in blocks: 1142800384
**Phase 0 - Replay Journal Log
LOGREDO: Log already redone!
logredo returned rc = 0
**Phase 1 - Check Blocks, Files/Directories, and Directory Entries
Invalid data (43) detected in file system object FF8198.
Invalid data (43) detected in file system object FF8199.
**Phase 2 - Count links
**Phase 3 - Duplicate Block Rescan and Directory Connectedness
**Phase 4 - Report Problems
File system object FF8198 is linked as: /cpenney/testfilelnx
cannot repair the data format error(s) in this file.
cannot repair FF8198.
Will release.
File system object FF8199 is linked as: /cpenney/testfilenode
cannot repair the data format error(s) in this file.
cannot repair FF8199.
Will release.
**Phase 5 - Check Connectivity
**Phase 6 - Perform Approved Corrections
Superblock marked dirty because repairs are about to be written.
Directory inode F8192 entry reference to inode F8198 removed.
Directory inode F8192 entry reference to inode F8199 removed.
Storage allocated to inode F8198 has been cleared.
Storage allocated to inode F8199 has been cleared.
**Phase 7 - Rebuild File/Directory Allocation Maps
**Phase 8 - Rebuild Disk Allocation Maps
Filesystem Summary:
Blocks in use for inodes: 16
Inode count: 128
File count: 13
Directory count: 2
Block count: 1142800384
Free block count: 1136377670
4571201536 kilobytes total disk space.
1 kilobytes in 2 directories.
21951722 kilobytes in 13 user files.
0 kilobytes in extended attributes
0 kilobytes in access control lists
3739135 kilobytes reserved for system use.
4545510680 kilobytes are available for use.
Filesystem is clean.
All observed inconsistencies have been repaired.
Filesystem has been marked clean.
**** Filesystem was modified. ****
processing terminated: 3/17/2005 10:55:27 with return code: 0 exit code: 1.

jfs_fsck version 1.1.7, 22-Jul-2004
processing started: 3/17/2005 10.55.40
Using default parameter: -p
The current device is: /dev/g-nicfs2_s/v-nicfs2_s
Open(...READ/WRITE EXCLUSIVE...) returned rc = 0
Primary superblock is valid.
The type of file system for the device is JFS.
Block size in bytes: 4096
Filesystem size in blocks: 1142800384
**Phase 0 - Replay Journal Log
LOGREDO: Log already redone!
logredo returned rc = 0
Filesystem is clean.
All observed inconsistencies have been repaired.
Filesystem has been marked clean.
**** Filesystem was modified. ****
processing terminated: 3/17/2005 10:55:40 with return code: 0 exit code: 0.
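
For anyone reading the three runs above, the exit codes (4, then 1, then 0) tell the story on their own. Here is a rough sketch of how they could be decoded. The meanings are the ones listed in the jfs_fsck man page; treating them as combinable bit flags follows the general fsck(8) convention and is an assumption here, and the names FSCK_EXIT_BITS and describe_fsck_exit are made up for illustration, not part of jfsutils.

```python
# Exit code meanings as listed in the jfs_fsck man page.
# Assumption: values can be OR-ed together, as fsck(8) documents.
FSCK_EXIT_BITS = {
    1: "file system errors corrected and/or log replayed",
    2: "errors corrected, reboot needed if file system was mounted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    128: "shared library error",
}


def describe_fsck_exit(code):
    """Return a list of human-readable meanings for an fsck exit code."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_EXIT_BITS.items() if code & bit]


if __name__ == "__main__":
    # The three runs from the logs above, with the exit codes they reported.
    runs = (("jfs_fsck -nv", 4), ("jfs_fsck -v", 1), ("jfs_fsck -v (again)", 0))
    for run, code in runs:
        print(f"{run}: exit code {code} -> {', '.join(describe_fsck_exit(code))}")
```

Read this way, the read-only pass found errors and left them alone, the second pass repaired them (by releasing the two test files), and the third pass found nothing further.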