contains a file system with errors

Help
2008-08-28
2012-11-28
  • Barry Wright
    2008-08-28

    One of our Oracle databases crashed while maintenance was being carried out on the application that connects to it.
    As we have recently been getting "oracle logical block errors" for one of the dbf files on the SAN (EVA4000) partition (listed in fstab as sda1), I took the opportunity to fsck it while it was unmounted.
    The application server's OS is Red Hat Linux Server 5.

    fsck /dev/sda1
    fsck 1.39 (29-May-2006)
    e2fsck 1.39 (29-May-2006)
    /dev/sda1 contains a file system with errors, check forced.
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    /dev/sda1: 38/6553600 files (36.8% non-contiguous), 2786445/13107199 blocks

    Now to the question:
    Can anybody please tell me (or direct me to documentation that defines) what "contains a file system with errors" covers?

    Thanks

     
    • Theodore Ts'o
      2008-08-29

      What probably happened is that a disk buffer got corrupted, either in memory or while it was being transferred from disk to memory.  The kernel noticed the problem, and so it set the EXT2_ERROR_FS flag in the superblock.  If the filesystem isn't the one where /var is located, and your kernel wasn't configured to force a reboot or remount the filesystem read-only, there may be an entry in your system's log file describing the error that the kernel found.
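
For readers who want to see the flag itself, here is a minimal sketch (not part of the original reply) that decodes the state field of an ext2/ext3 superblock. It assumes the standard on-disk layout: primary superblock at byte 1024, little-endian `s_magic` at offset 56 and `s_state` at offset 58 within it. The helper names `decode_state` and `read_state` are made up for this illustration.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # primary superblock starts 1024 bytes into the device
EXT2_MAGIC = 0xEF53       # s_magic value for ext2/ext3
EXT2_VALID_FS = 0x0001    # s_state bit: filesystem was unmounted cleanly
EXT2_ERROR_FS = 0x0002    # s_state bit: an error was detected and recorded


def decode_state(superblock: bytes) -> dict:
    """Decode s_state from a raw superblock (hypothetical helper)."""
    # s_magic and s_state are adjacent 16-bit little-endian fields at offset 56.
    magic, state = struct.unpack_from("<HH", superblock, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/ext3 superblock")
    return {
        "clean": bool(state & EXT2_VALID_FS),
        "errors": bool(state & EXT2_ERROR_FS),
    }


def read_state(path: str) -> dict:
    """Read the primary superblock from a device or image file and decode it."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        return decode_state(f.read(1024))
```

Calling `read_state("/dev/sda1")` would need read access to the device and is best done while the filesystem is unmounted. In practice `dumpe2fs -h` reports the same information on its "Filesystem state:" line, so the sketch is only meant to show where the flag lives.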

       
      • Barry Wright
        2008-08-29

        Thanks for taking the time to reply.

        Barry.

         
      • Theodore Ts'o
        2008-08-29

        Oh, let me clarify one thing.  The reason I suspect this is what happened (in case it wasn't obvious) is that e2fsck didn't find any errors, yet for the EXT3_ERROR_FS flag to have been set, the kernel must have found some filesystem inconsistency so obvious that detecting it doesn't require looking at large portions of the filesystem at the same time.

        So for example, if you are deleting a file, and the kernel is freeing blocks, and it finds that the blocks are already marked as freed (for example, if the block bitmap was corrupted while it was being transferred from disk to memory), it will log that fact, set EXT3_ERROR_FS, and either (a) continue, (b) panic, or (c) remount the filesystem read-only.  (a) is the default, but I have had people argue that (b) or (c) should be the default: if the filesystem is damaged, forcing a reboot (and letting a hot standby take over) or remounting read-only (to keep the damage from spreading and causing more data loss) might be the better choice.
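
The choice between (a), (b) and (c) is recorded in the superblock as well. The following is a rough sketch (not part of the original reply) that decodes it, assuming the standard layout with a little-endian 16-bit `s_errors` field at offset 60 and the values 1 = continue, 2 = remount read-only, 3 = panic. The name `decode_error_behavior` is invented for this example, and treating 0 as "unset" is an assumption of the sketch.

```python
import struct

EXT2_MAGIC = 0xEF53  # s_magic value for ext2/ext3, at offset 56

# s_errors values and the behavior each one requests from the kernel.
ERROR_BEHAVIORS = {
    1: "continue",    # (a) log the error and keep going
    2: "remount-ro",  # (c) remount the filesystem read-only
    3: "panic",       # (b) panic, typically forcing a reboot
}


def decode_error_behavior(superblock: bytes) -> str:
    """Return the on-error behavior stored in a raw superblock (hypothetical helper)."""
    (magic,) = struct.unpack_from("<H", superblock, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/ext3 superblock")
    (errors,) = struct.unpack_from("<H", superblock, 60)
    # Assumption: any value outside 1..3 is reported as "unset" rather than guessed at.
    return ERROR_BEHAVIORS.get(errors, "unset")
```

The stored value can be changed with `tune2fs -e continue|remount-ro|panic`, and overridden per mount with the `errors=` mount option, so the superblock field is only the default the kernel starts from.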