On Wed, 2008-04-30 at 17:32 +0200, Per Jessen wrote:
> Peter Grandi wrote:
> > This and "very busy storing/processing new files (24h/day)" later
> > seem to describe a fairly critical system with somewhat high
> > availability requirements.
> Fairly high, yes. It has now been down for almost 48 hours, which is
> probably just about as far as I can let it go. We've already promised
> our customers it will be back up Friday morning. Tomorrow is a holiday
> here, very fortunate.
> >>> but someone will if it goes on for another 12-15 hours. I
> >>> really do not want to have a 2nd day of this tomorrow ...
> >> Well, looks like that was wishful thinking.
> > Indeed, and if one has availability constraints, relying on
> > 'fsck' being quick is equally unrealistic.
> That's an interesting comment - I guess I _have_ been relying on 1) the
> system only rarely needing a reboot and 2) a fast fsck when it happens.
JFS is, of course, designed so that under normal circumstances, fsck
only replays the journal, which is very fast. When something bad
happens, and it has to do the full processing, it isn't necessarily
going to be fast.
> Do you have any insights to share wrt availability, large filesystems
> (up to 1TB in our case) and millions of files? (apart from "don't do
> it" :-)
JFS's fsck time is basically tied to the number of inodes. I don't have
numbers to give you, but a huge, nearly-empty file system won't take too
much time to check, while one with millions of inodes may take a long
time. Worst case (other than a fatal error that fsck can't recover
from) is when cross-linked blocks are detected, and it has to do the
pass that is causing you so much delay. It used to be MUCH worse before
jfsutils-1.1.5, if you can believe it.
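
Since check time tracks the inode count rather than the raw size, a
rough way to see where a file system stands is to look at how many
inodes are in use (the same figures "df -i" reports). A minimal sketch,
assuming Python is available on the box; the function name used_inodes
and the "/" path are only for illustration, and how a given file system
reports these counts varies, so treat the result as a rough guide and
not as a predictor of fsck time:

```python
import os

def used_inodes(path):
    """Return (used, total) inode counts for the file system holding path."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes the file system reports
    free = st.f_ffree    # free inodes the file system reports
    return total - free, total

if __name__ == "__main__":
    # "/" is a placeholder; point this at the mount you care about.
    used, total = used_inodes("/")
    print(f"inodes in use: {used} of {total}")
```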
> >> Now 35 hours and counting.
> > The time taken to do a deep check of entangled filesystems can
> > be long. For an 'ext3' filesystem it was 75 days, and there are
> > other interesting reports of long 'fsck' times:
I doubt it gets as bad as that, but again, I have no idea how much
longer it will take.
> Uh oh. I guess I'd better move ahead with my new system, and hope to
> migrate whatever I can later on.
> > but I haven't found (even on this mailing list) many reports of
> > 'fsck' durations for JFS, and my own filesystems are rather small
> > like yours (a few hundred thousand files, a few hundred GB of
> > data), and 'fsck' takes a few minutes on undamaged or mostly OK
> > filesystems.
> That has been my experience too - right up until 28 April at around
> 20:00. :-(
> /Per Jessen, Zürich
IBM Linux Technology Center