e2fsck and corruption in the journal (ext3)

Anonymous
2011-02-16
2012-11-28

  • Anonymous
    2011-02-16

    Hi,

    I recently had a problem with an ext3 file system (the root file system, of course) becoming corrupted. On reboot, fsck cleared up the problems and I carried on merrily, but later the fs would corrupt again (typically after an hour or so of fairly intensive I/O), forcing a read-only remount. After a couple of repeats I managed to track it down to a problem with the ext3 journal itself. The following syslog extract shows the error that caused the fs to switch to read-only mode:

      Feb 15 10:47:20 linux2 kernel: [ 3880.545907] journal_bmap: journal block not found at offset 29115 on sda2
      Feb 15 10:47:20 linux2 kernel: [ 3880.545912] Aborting journal on device sda2.
      Feb 15 10:47:20 linux2 syslog-ng[6408]: I/O error occurred while writing; fd='7', error='Read-only file system (30)'
      Feb 15 10:47:20 linux2 kernel: [ 3880.546296] ext3_abort called.
      Feb 15 10:47:20 linux2 kernel: [ 3880.546299] EXT3-fs error (device sda2): ext3_journal_start_sb: Detected aborted journal
      Feb 15 10:47:20 linux2 kernel: [ 3880.546303] Remounting filesystem read-only
      Feb 15 10:47:20 linux2 syslog-ng[6408]: Suspending write operation because of an I/O error; fd='7', time_reopen='60'
    

    I'm pretty sure this wasn't a bad sector: initially suspecting that the disk itself was faulty and the cause of the repeated corruption, I'd used ddrescue to image the entire disk, and this was now happening on a copy of the original disk.

    Anyway, the cure seemed to be to use tune2fs to remove the journal ("-O ^has_journal") and then create a new journal ("-j"). This seemed to fix the problem. If my layman's diagnosis is vaguely accurate, though, I was wondering whether e2fsck, after using the journal to fix any issues, shouldn't then go on to check the journal space itself (however it's stored/allocated). Or maybe, on finding and fixing any error, it should always drop and recreate the journal just to be sure. The docs seem to imply that e2fsck will use an ext3 journal to speed up the repair compared to ext2, but they don't say anything about fixing the journal itself.
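
    For reference, the drop-and-recreate steps described above can be sketched roughly as below. This is a minimal sketch, not a tested recovery procedure. The function name recreate_journal and the DRY_RUN switch are hypothetical helpers introduced for illustration; the device /dev/sda2 is taken from the log extract; the intermediate "e2fsck -f" pass is an added precaution that is not part of the original fix. It assumes the file system is unmounted or mounted read-only, which for a root file system most likely means working from rescue media.

```shell
#!/bin/sh
# Sketch only: remove and recreate the ext3 journal on a device.
# recreate_journal and DRY_RUN are illustrative names, not part of e2fsprogs.
# Assumes the file system is unmounted or mounted read-only.

recreate_journal() {
    dev="$1"

    # With DRY_RUN=1 (the default here) the commands are printed, not run.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"
    else
        run=""
    fi

    # Step 1: drop the existing journal (the "-O ^has_journal" step).
    $run tune2fs -O '^has_journal' "$dev" || return 1

    # Step 2 (assumption, not in the original fix): force a full check
    # while the file system has no journal. e2fsck exits with 0 for
    # "no errors" and 1 for "errors corrected"; anything higher stops here.
    $run e2fsck -f "$dev"
    rc=$?
    [ "$rc" -le 1 ] || return "$rc"

    # Step 3: create a fresh journal (the "-j" step).
    $run tune2fs -j "$dev"
}
```

    Calling "recreate_journal /dev/sda2" only prints the three commands; setting DRY_RUN=0 in the environment would actually run them, so it is worth reading the dry-run output first.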

    I see ext4 uses checksums on the journal, so maybe this isn't such an issue with the newer fs, but I thought I'd ask anyway.

    Apologies if my simplistic understanding is wildly inaccurate.

    Regards

    • Tim