Thread: [Jfs-discussion] running complete filesystem check with 100.000s of files
From: Per J. <pe...@co...> - 2008-04-29 07:37:26
For the last 12 hours, I've had a full fsck running on a 100Gb filesystem -
probably with a few hundred thousand files. The large majority are less than
100K. It seems to be taking forever and it's finding lots of problems like
these:

Inode F3338155 has references to cross linked blocks.
File system object FF3338155 has corrupt data (39).
Duplicate reference to 1 block(s) beginning at offset 12398768 found in file system object FF3338156.
Inode F3338156 has references to cross linked blocks.
File system object FF3338156 has corrupt data (39).
Duplicate reference to 1 block(s) beginning at offset 12398772 found in file system object FF3338159.
Inode F3338159 has references to cross linked blocks.
File system object FF3338159 has corrupt data (39).
Duplicate reference to 1 block(s) beginning at offset 12398775 found in file system object FF3338160.
Inode F3338160 has references to cross linked blocks.
File system object FF3338160 has corrupt data (39).
Duplicate reference to 1 block(s) beginning at offset 12398618 found in file system object FF3338163.

Is there _any_ way of guesstimating a time of completion?

/Per Jessen, Zürich
From: Per J. <pe...@co...> - 2008-04-29 12:29:58
Per Jessen wrote:
> For the last 12 hours, I've had a full fsck running on a 100Gb
> filesystem - probably with a few hundred thousand files. The large
> majority are less than 100K.

Now 18 hours and counting. I'm somewhat worried about the many messages I'm
seeing:

> Inode F3338155 has references to cross linked blocks.
> File system object FF3338155 has corrupt data (39).
> Duplicate reference to 1 block(s) beginning at offset 12398768 found
> in file system object FF3338156.
> Inode F3338156 has references to cross linked blocks.
> File system object FF3338156 has corrupt data (39).
> Duplicate reference to 1 block(s) beginning at offset 12398772 found
> in file system object FF3338159.

but I'm much more worried about the time it's taking. We're only talking
about 90Gb ...

/Per Jessen, Zürich
From: Dave K. <sh...@li...> - 2008-04-29 12:49:07
On Tue, 2008-04-29 at 14:29 +0200, Per Jessen wrote:
> Per Jessen wrote:
>
> > For the last 12 hours, I've had a full fsck running on a 100Gb
> > filesystem - probably with a few hundred thousand files. The large
> > majority are less than 100K.
>
> Now 18 hours and counting. I'm somewhat worried about the many messages
> I'm seeing:

Ouch. How many? Hundreds? Thousands? You'll likely lose all the files that
are found to have cross-linked blocks.

> > Inode F3338155 has references to cross linked blocks.
> > File system object FF3338155 has corrupt data (39).
> > Duplicate reference to 1 block(s) beginning at offset 12398768 found
> > in file system object FF3338156.
> > Inode F3338156 has references to cross linked blocks.
> > File system object FF3338156 has corrupt data (39).
> > Duplicate reference to 1 block(s) beginning at offset 12398772 found
> > in file system object FF3338159.
>
> but I'm much more worried about the time it's taking. We're only
> talking about 90Gb ...

I really don't have an estimate. Years ago, this processing was even slower,
but I guess it can still be pretty horrible. Fortunately, it only kicks in
rarely. I don't know what could have caused the problem. Cross-linked blocks
are blocks that are claimed by more than one file.

> /Per Jessen, Zürich

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center
From: Per J. <pe...@co...> - 2008-04-29 13:08:38
Dave Kleikamp wrote:
> Ouch. How many? Hundreds? Thousands? You'll likely lose all the
> files that are found to have cross-linked blocks.

By now I would say a thousand easily. The vast majority of the files are old
and/or throw-away, and I should have a backup of those that aren't.

>> but I'm much more worried about the time it's taking. We're only
>> talking about 90Gb ...
>
> I really don't have an estimate. Years ago, this processing was even
> slower, but I guess it can still be pretty horrible. Fortunately, it
> only kicks in rarely. I don't know what could have caused the problem.
> Cross-linked blocks are blocks that are claimed by more than one file.

Fortunately, no customer has complained yet, but someone will if it goes on
for another 12-15 hours. I really do not want to have a 2nd day of this
tomorrow ...

/Per Jessen, Zürich
From: Per J. <pe...@co...> - 2008-04-30 05:49:12
Per Jessen wrote:
> Fortunately, no customer has complained yet, but someone will if it
> goes on for another 12-15 hours. I really do not want to have a 2nd
> day of this tomorrow ...

Well, looks like that was wishful thinking. Now 35 hours and counting.
Recent output is stuff like this:

Duplicate reference to 2 block(s) beginning at offset 13952656 found in file system object DF192093.
Duplicate reference to 13 block(s) beginning at offset 13952674 found in file system object DF192093.
Duplicate reference to 80 block(s) beginning at offset 13952688 found in file system object DF192093.
Duplicate reference to 7 block(s) beginning at offset 13952789 found in file system object DF192093.
Duplicate reference to 6578 block(s) beginning at offset 13952797 found in file system object DF192093.
Duplicate reference to 6579 block(s) beginning at offset 13952796 found in file system object DF192093.

Fortunately, most people will be off work the next 4 days, so in about 12
hours I'll probably start rebuilding/recreating this system. It has got to be
working again by Monday.

Shaggy, any idea what could possibly have caused such a mess?? This is an
old(ish) SMP system, running 2.4.33, jfsutils 1.1.7. I tried upgrading to
1.1.11, but had to back down to 1.1.7 as the new utils refused to do an fsck.
The filesystem is about 140Gb in total, of which 90Gb is in use. It's backed
by a software RAID5. I'm guessing the filesystem probably had some 500,000
files, with up to maybe 40,000 in some directories, but generally less. The
system was generally very busy storing/processing new files (24h/day).

/Per Jessen, Zürich
From: <pg...@jf...> - 2008-04-30 15:01:33
[ ... ]

>> Fortunately, no customer has complained yet,

This and "very busy storing/processing new files (24h/day)" later seem to
describe a fairly critical system with somewhat high availability
requirements.

>> but someone will if it goes on for another 12-15 hours. I
>> really do not want to have a 2nd day of this tomorrow ...

> Well, looks like that was wishful thinking.

Indeed, and if one has availability constraints, relying on 'fsck' being
quick is equally unrealistic.

> Now 35 hours and counting.

The time taken to do a deep check of entangled filesystems can be long. For
an 'ext3' filesystem it was 75 days, and there are other interesting reports
of long 'fsck' times:

  http://www.sabi.co.uk/blog/anno05-4th.html#051009
  http://www.sabi.co.uk/blog/anno05-4th.html#051108
  http://www.sabi.co.uk/blog/0802feb.html#080210

My impression is that JFS has a much better 'fsck' than 'ext3', but I haven't
found (even on this mailing list) many reports of 'fsck' durations for JFS,
and my own filesystems are rather small like yours (a few hundred thousand
files, a few hundred GB of data), and 'fsck' takes a few minutes on undamaged
or mostly OK filesystems.

Anyhow the high bounds on 'fsck' times and space are well-known problems,
especially for multi-TB filesystems, and these are some of the most recent
news items for a couple of other filesystems:

  http://kerneltrap.org/Linux/Improving_fsck_Speeds_in_ext4
  http://oss.sgi.com/archives/xfs/2008-01/msg00187.html

> Recent output is stuff like this: [ ... shared blocks ... ]

> [ ... ] what could possibly have caused such a mess??

A very optimistic sysadm? :-)

> This is an old(ish) SMP system, running 2.4.33, [ ... ]

My impression is that both SMP and JFS in 2.4.33 are not as well tested as in
2.6, as there have been some important bug fixes in the 2.6 series that
probably apply very much to high-load systems, especially for SMP. Using a
kernel that old means accepting whatever issues it has and hoping that they
don't affect your load.

Anyhow, in my experience most events like the above are caused by hardware
issues, more than by old bugs remaining unfixed in the SMP or JFS code of old
kernels. Even a single bit error in RAM or a single block error during IO can
have devastating effects. Never mind firmware or other errors. Consider for
example this interesting report on IO "silent corruption" from a largish
installation with a lot of experience:

  https://indico.desy.de/contributionDisplay.py?contribId=65&sessionId=42&confId=257

and their subsequent update:

  http://indico.fnal.gov/contributionDisplay.py?contribId=44&sessionId=15&confId=805

System integration and qualification is a very difficult and expensive
activity...

> [ ... ] The filesystem is about 140Gb in total of which 90Gb
> is in use. It's backed by a software RAID5.

As you have now discovered, it would have been much quicker to restore it
from backups ('-o nointegrity' would have made it even faster). That's a way
of doing 'fsck' that is often faster than 'fsck', because it relies largely
on straightforward sequential accesses, while 'fsck' relies a lot on random
accesses and somewhat hairy algorithms.

> I'm guessing the filesystem probably had some 500,000 files
> with up to maybe 40,000 in some directories,

That's generally unwise, but the real problem is the overlapping allocations,
because then 'fsck' must check everything against everything.

> but generally less. The system was generally very busy
> storing/processing new files (24h/day).
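For illustration, a minimal sketch of the restore-instead-of-fsck approach
described above; the device name, mount point and backup path are made-up
placeholders, not anything taken from this thread:

--------------------------------------------------------------
# Recreate the filesystem, then restore from backup with the
# journal disabled for speed (hypothetical device and paths):
mkfs.jfs -q /dev/md0
mount -t jfs -o nointegrity /dev/md0 /data
tar -xpf /backup/data.tar -C /data
# Remount with the journal active once the restore is done:
umount /data
mount -t jfs /dev/md0 /data
--------------------------------------------------------------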
From: Per J. <pe...@co...> - 2008-04-30 05:53:31
Per Jessen wrote:
> Fortunately, no customer has complained yet, but someone will if it
> goes on for another 12-15 hours. I really do not want to have a 2nd
> day of this tomorrow ...

Well, looks like that was wishful thinking. Now 34 hours and counting.
Recent output is stuff like this:

Duplicate reference to 2 block(s) beginning at offset 13952656 found in file system object DF192093.
Duplicate reference to 13 block(s) beginning at offset 13952674 found in file system object DF192093.
Duplicate reference to 80 block(s) beginning at offset 13952688 found in file system object DF192093.
Duplicate reference to 7 block(s) beginning at offset 13952789 found in file system object DF192093.
Duplicate reference to 6578 block(s) beginning at offset 13952797 found in file system object DF192093.
Duplicate reference to 6579 block(s) beginning at offset 13952796 found in file system object DF192093.

Fortunately, most people will be off work the next 4 days, so in about 12
hours I'll probably start rebuilding/recreating this system. It has got to be
working again by Monday.

Still - Dave, any idea what could possibly have caused such a mess?? This is
an old(ish) SMP system, running 2.4.33, jfsutils 1.1.7. I tried upgrading to
1.1.11, but had to back down to 1.1.7 as the new utils refused to do an fsck.
The filesystem is about 140Gb in total, of which 90Gb is used. It's backed by
a software RAID5. I'm guessing the filesystem probably had some 500,000
files, with up to maybe 40,000 in some directories. The system was generally
very busy storing new files (24h/day).

/Per Jessen, Zürich
From: Dave K. <sh...@li...> - 2008-04-30 13:28:53
On Wed, 2008-04-30 at 07:53 +0200, Per Jessen wrote:
> Per Jessen wrote:
>
> > Fortunately, no customer has complained yet, but someone will if it
> > goes on for another 12-15 hours. I really do not want to have a 2nd
> > day of this tomorrow ...
>
> Well, looks like that was wishful thinking. Now 34 hours and counting.
>
> Recent output is stuff like this:
>
> Duplicate reference to 2 block(s) beginning at offset 13952656 found in
> file system object DF192093.
> Duplicate reference to 13 block(s) beginning at offset 13952674 found in
> file system object DF192093.
> Duplicate reference to 80 block(s) beginning at offset 13952688 found in
> file system object DF192093.
> Duplicate reference to 7 block(s) beginning at offset 13952789 found in
> file system object DF192093.
> Duplicate reference to 6578 block(s) beginning at offset 13952797 found
> in file system object DF192093.
> Duplicate reference to 6579 block(s) beginning at offset 13952796 found
> in file system object DF192093.
>
> Fortunately, most people will be off work the next 4 days, so in about
> 12 hours I'll probably start rebuilding/recreating this system. It has
> got to be working again by Monday.
>
> Still - Dave, any idea what could possibly have caused such a mess??
> This is an old(ish) SMP system, running 2.4.33, jfsutils 1.1.7.

Wow. That is pretty old. I've pretty much forgotten about the 2.4 kernel.
There have been a lot of bug fixes since then, but I wouldn't know off the
top of my head anything specific that would explain this.

> I tried
> upgrading to 1.1.11, but had to back down to 1.1.7 as the new utils
> refused to do an fsck.

What error did you get? There's no reason 1.1.11 should have failed.

> The filesystem is about 140Gb in total, of which 90Gb is used. It's
> backed by a software RAID5. I'm guessing the filesystem probably had
> some 500,000 files, with up to maybe 40,000 in some directories. The
> system was generally very busy storing new files (24h/day).

Do you have any plans to upgrade to a newer distribution? JFS has gotten a
lot more stable in the 2.6 kernel than it was back in 2.4. I'm pretty
impressed that it's been holding up this long under such a high load.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center
From: Christian K. <li...@ne...> - 2008-04-30 12:53:30
On Wed, April 30, 2008 07:53, Per Jessen wrote:
> This is an old(ish) SMP system, running 2.4.33, jfsutils 1.1.7. I tried
> upgrading to 1.1.11, but had to back down to 1.1.7 as the new utils
> refused to do an fsck.

Hm, I would've assumed current jfsutils would run no matter what the kernel
version was or how old the system is. What was the error message when
jfs_fsck refused to run?

> The filesystem is about 140Gb in total, of which 90Gb is used. It's
> backed by a software RAID5. I'm guessing the

Hm, did you try to boot off a rescue CD [0] with more current
jfsutils/kernel? Not that I know why this would help, but when doing fsck I
tend to use the latest and greatest fsck tools.

Christian.

[0] http://grml.org/download/ (comes with jfsutils-1.1.11-1 and kernel v2.6.23)
-- 
make bzImage, not war
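For illustration, a minimal sketch of checking from a rescue environment as
suggested above; /dev/md0 is a placeholder device name:

--------------------------------------------------------------
# After booting the rescue CD, run a read-only check first so
# nothing on the disk is changed, then the actual repair:
jfs_fsck -n -v /dev/md0
jfs_fsck -f -v /dev/md0
--------------------------------------------------------------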
From: Per J. <pe...@co...> - 2008-04-30 13:18:19
Christian Kujau wrote:
> On Wed, April 30, 2008 07:53, Per Jessen wrote:
>> This is an old(ish) SMP system, running 2.4.33, jfsutils 1.1.7. I tried
>> upgrading to 1.1.11, but had to back down to 1.1.7 as the new utils
>> refused to do an fsck.
>
> Hm, I would've assumed current jfsutils would run no matter what the
> kernel version was or how old the system is. What was the error message
> when jfs_fsck refused to run?

I'm not sure, I think it complained about the superblock.

>> The filesystem is about 140Gb in total, of which 90Gb is used. It's
>> backed by a software RAID5. I'm guessing the
>
> Hm, did you try to boot off a rescue CD [0] with more current
> jfsutils/kernel? Not that I know why this would help, but when doing
> fsck I tend to use the latest and greatest fsck tools.

Yeah, I did boot an openSUSE 10.2 system, which I think is how I noticed the
problem with jfsutils 1.1.11. Btw, the fsck is still running, but at least it
doesn't seem to have found any errors since early this morning.

/Per
From: Per J. <pe...@co...> - 2008-04-30 14:41:16
Dave Kleikamp wrote:
> There have been a lot of bug fixes since then, but I wouldn't
> know off the top of my head anything specific that would explain this.
>
>> I tried upgrading to 1.1.11, but had to back down to 1.1.7 as the new
>> utils refused to do an fsck.
>
> What error did you get? There's no reason 1.1.11 should have failed.

I'm pretty certain it said something about the superblock, but I didn't take
note. I can probably reproduce it, but is there any point?

> Do you have any plans to upgrade to a newer distribution? JFS has
> gotten a lot more stable in the 2.6 kernel than it was back in 2.4.

Yep, I've been preparing a new system since this morning. Latest 2.6 kernel.
The fsck is still running, but I've been able to copy the key files to the
new system; later I hope to be able to recover as much as possible of the
data.

> I'm pretty impressed that it's been holding up this long under such a
> high load.

Well, looks like it wasn't holding up all that well ...

/Per Jessen, Zürich
From: Dave K. <sh...@li...> - 2008-04-30 14:58:01
On Wed, 2008-04-30 at 16:40 +0200, Per Jessen wrote:
> Dave Kleikamp wrote:
>
> > There have been a lot of bug fixes since then, but I wouldn't
> > know off the top of my head anything specific that would explain this.
> >
> >> I tried upgrading to 1.1.11, but had to back down to 1.1.7 as the new
> >> utils refused to do an fsck.
> >
> > What error did you get? There's no reason 1.1.11 should have failed.
>
> I'm pretty certain it said something about the superblock, but I didn't
> take note. I can probably reproduce it, but is there any point?

Not if you're moving up to a new system. I'm a bit curious though.

> > Do you have any plans to upgrade to a newer distribution? JFS has
> > gotten a lot more stable in the 2.6 kernel than it was back in 2.4.
>
> Yep, I've been preparing a new system since this morning. Latest 2.6
> kernel. The fsck is still running, but I've been able to copy the key
> files to the new system; later I hope to be able to recover as much as
> possible of the data.
>
> > I'm pretty impressed that it's been holding up this long under such a
> > high load.
>
> Well, looks like it wasn't holding up all that well ...

I think you'll have better luck on a modern kernel. I trust you'll let me
know if any new problems show up.

Thanks,
Shaggy
-- 
David Kleikamp
IBM Linux Technology Center
From: Per J. <pe...@co...> - 2008-04-30 15:09:19
Dave Kleikamp wrote:
>> I'm pretty certain it said something about the superblock, but I
>> didn't take note. I can probably reproduce it, but is there any
>> point?
>
> Not if you're moving up to a new system. I'm a bit curious though.

I'll try it once I've got the new system up and running.

/Per Jessen, Zürich
From: Per J. <pe...@co...> - 2008-04-30 15:32:27
Peter Grandi wrote:
> This and "very busy storing/processing new files (24h/day)" later
> seem to describe a fairly critical system with somewhat high
> availability requirements.

Fairly high, yes. It has now been down for almost 48 hours, which is probably
just about as far as I can let it go. We've already promised our customers it
will be back up Friday morning. Tomorrow is a holiday here, very fortunate.

>>> but someone will if it goes on for another 12-15 hours. I
>>> really do not want to have a 2nd day of this tomorrow ...
>
>> Well, looks like that was wishful thinking.
>
> Indeed, and if one has availability constraints, relying on
> 'fsck' being quick is equally unrealistic.

That's an interesting comment - I guess I _have_ been relying on 1) the
system only rarely needing a reboot and 2) a fast fsck when it happens.

Do you have any insights to share wrt availability, large filesystems (up to
1Tb in our case) and millions of files? (apart from "don't do it" :-)

>> Now 35 hours and counting.
>
> The time taken to do a deep check of entangled filesystems can
> be long. For an 'ext3' filesystem it was 75 days, and there are
> other interesting reports of long 'fsck' times:

Uh oh. I guess I'd better move ahead with my new system, and hope to migrate
whatever I can later on.

> but I haven't found (even on this mailing list) many reports of
> 'fsck' durations for JFS, and my own filesystems are rather small
> like yours (a few hundred thousand files, a few hundred GB of
> data), and 'fsck' takes a few minutes on undamaged or mostly OK
> filesystems.

That has been my experience too - right up until 28 April at around 20:00. :-(

/Per Jessen, Zürich
From: Dave K. <sh...@li...> - 2008-04-30 17:27:58
On Wed, 2008-04-30 at 17:32 +0200, Per Jessen wrote:
> Peter Grandi wrote:
>
> > This and "very busy storing/processing new files (24h/day)" later
> > seem to describe a fairly critical system with somewhat high
> > availability requirements.
>
> Fairly high, yes. It has now been down for almost 48 hours, which is
> probably just about as far as I can let it go. We've already promised
> our customers it will be back up Friday morning. Tomorrow is a holiday
> here, very fortunate.
>
> >>> but someone will if it goes on for another 12-15 hours. I
> >>> really do not want to have a 2nd day of this tomorrow ...
> >
> >> Well, looks like that was wishful thinking.
> >
> > Indeed, and if one has availability constraints, relying on
> > 'fsck' being quick is equally unrealistic.
>
> That's an interesting comment - I guess I _have_ been relying on 1) the
> system only rarely needing a reboot and 2) a fast fsck when it happens.

JFS is, of course, designed so that under normal circumstances, fsck only
replays the journal, which is very fast. When something bad happens, and it
has to do the full processing, it isn't necessarily going to be fast.

> Do you have any insights to share wrt availability, large filesystems
> (up to 1Tb in our case) and millions of files? (apart from "don't do
> it" :-)

JFS's fsck time is basically tied to the number of inodes. I don't have
numbers to give you, but a huge, nearly-empty file system won't take too much
time to check, but one with millions of inodes may take a long time. Worst
case (other than a fatal error that fsck can't recover from) is when
cross-linked blocks are detected, and it has to do the pass that is causing
you so much delay. It used to be MUCH worse before jfsutils-1.1.5, if you can
believe it.

> >> Now 35 hours and counting.
> >
> > The time taken to do a deep check of entangled filesystems can
> > be long. For an 'ext3' filesystem it was 75 days, and there are
> > other interesting reports of long 'fsck' times:

I doubt it gets as bad as that, but again, I have no idea how much longer it
will take.

> Uh oh. I guess I'd better move ahead with my new system, and hope to
> migrate whatever I can later on.
>
> > but I haven't found (even on this mailing list) many reports of
> > 'fsck' durations for JFS, and my own filesystems are rather small
> > like yours (a few hundred thousand files, a few hundred GB of
> > data), and 'fsck' takes a few minutes on undamaged or mostly OK
> > filesystems.
>
> That has been my experience too - right up until 28 April at around
> 20:00. :-(
>
> /Per Jessen, Zürich

-- 
David Kleikamp
IBM Linux Technology Center
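For illustration, the distinction described above in command form; /dev/md0
is a placeholder device name:

--------------------------------------------------------------
# Normal case: the default invocation only replays the journal,
# which usually completes in seconds:
jfs_fsck /dev/md0
# Forced full structural check of every inode - the slow path
# that also kicks in when the filesystem is marked dirty:
jfs_fsck -f /dev/md0
--------------------------------------------------------------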
From: <pg...@jf...> - 2008-04-30 21:52:09
[ ... ]

>> This and "very busy storing/processing new files (24h/day)"
>> later seem to describe a fairly critical system with somewhat
>> high availability requirements.

[ ... ]

>> Indeed, and if one has availability constraints, relying on
>> 'fsck' being quick is equally unrealistic.

> That's an interesting comment - I guess I _have_ been relying
> on 1) the system only rarely needing a reboot and 2) a fast
> fsck when it happens.

Plenty of people do that, and then bad news does happen. I was some time ago
at a workshop about large-scale system administration at big national
research labs (CERN and so on) and I asked almost every speaker what they
were doing about filesystem checking times, and some seemed to be unaware of
the issue.

The main driver of the issue is that thanks to RAID of various sorts it is
easy to scale up capacity and read or write accesses, but 'fsck' does not
take advantage of the multiple spindles in RAID because it is serial.

> Do you have any insights to share wrt availability, large
> filesystems (up to 1Tb in our case) and millions of files?
> (apart from "don't do it" :-)

From other things you have written it looks like you use the filesystem as a
structured database. "don't do it" :-) Usually it is better to use a database
manager if you want to store many records, instead of a filesystem.

However filesystems can grow in their own way, without remotely looking like
structured databases - for example a 200TB repository with 100M files. As to
that, in general the only way I can see now to do that is via clusters of
very many smaller filesystems, each of which can be either repaired or
restored from backup pretty quickly, which means 1-4TB and hundreds of
thousands of inodes. Some notes I have written on the subject:

  http://www.sabi.co.uk/blog/0804apr.html#080417
  http://www.sabi.co.uk/blog/0804apr.html#080407

[ ... ]

>> The time taken to do a deep check of entangled filesystems can
>> be long. For an 'ext3' filesystem it was 75 days, and there are
>> other interesting reports of long 'fsck' times:

> Uh oh. I guess I'd better move ahead with my new system, and
> hope to migrate whatever I can later on.

In your case, for up to 1TB the best strategy is probably frequent backups
and then, in case of trouble, a quick restore copying back the whole disk
using 'dd'. With FW800 or eSATA I get around 50MB/s sustained average (better
with O_DIRECT and large block sizes, which I now prefer) when duplicating
modern cheap 500GB drives:

  http://www.sabi.co.uk/blog/0705may.html#070505

Of course if you have a "warm" backup you can just swap in the backup drive
and do an offline 'fsck', if really necessary, on the swapped-out damaged
filesystem. There are several options that are advisable depending on
circumstances.

>> but I haven't found (even on this mailing list) many reports of
>> 'fsck' durations for JFS, and my own filesystems are rather small
>> like yours (a few hundred thousand files, a few hundred GB of
>> data), and 'fsck' takes a few minutes on undamaged or mostly OK
>> filesystems.

> That has been my experience too - right up until 28 April at around
> 20:00. :-(

That's because even 'jfs_fsck -f' is quite quick on clean filesystems; the
problem is deep scans on messed-up filesystems.

Some numbers for clean filesystems:

--------------------------------------------------------------
# sysctl vm/drop_caches=3; time jfs_fsck -f /dev/sda8
vm.drop_caches = 3
jfs_fsck version 1.1.12, 24-Aug-2007
processing started: 4/30/2008 21.52.28
The current device is:  /dev/sda8
Block size in bytes:  4096
Filesystem size in blocks:  61046992
**Phase 0 - Replay Journal Log
**Phase 1 - Check Blocks, Files/Directories, and Directory Entries
**Phase 2 - Count links
**Phase 3 - Duplicate Block Rescan and Directory Connectedness
**Phase 4 - Report Problems
**Phase 5 - Check Connectivity
**Phase 6 - Perform Approved Corrections
**Phase 7 - Rebuild File/Directory Allocation Maps
**Phase 8 - Rebuild Disk Allocation Maps
 244187968 kilobytes total disk space.
     27013 kilobytes in 8242 directories.
 173119730 kilobytes in 118301 user files.
     10896 kilobytes in extended attributes
    128391 kilobytes reserved for system use.
  70955964 kilobytes are available for use.
Filesystem is clean.

real    1m2.159s
user    0m1.530s
sys     0m1.630s
--------------------------------------------------------------

That's on a 2004-class desktop machine and it is doing almost 2,000 inodes/s,
and I get similar results elsewhere. This instead is for a contemporary
chunky server on an 8-drive RAID10:

--------------------------------------------------------------
# sysctl vm/drop_caches=2; time jfs_fsck -f /dev/md0
vm.drop_caches = 2
jfs_fsck version 1.1.12, 24-Aug-2007
processing started: 4/30/2008 21.58.17
The current device is:  /dev/md0
Block size in bytes:  4096
Filesystem size in blocks:  390070208
**Phase 0 - Replay Journal Log
**Phase 1 - Check Blocks, Files/Directories, and Directory Entries
**Phase 2 - Count links
**Phase 3 - Duplicate Block Rescan and Directory Connectedness
**Phase 4 - Report Problems
**Phase 5 - Check Connectivity
**Phase 6 - Perform Approved Corrections
**Phase 7 - Rebuild File/Directory Allocation Maps
**Phase 8 - Rebuild Disk Allocation Maps
 1560280832 kilobytes total disk space.
     108545 kilobytes in 82098 directories.
   17122632 kilobytes in 251457 user files.
        100 kilobytes in extended attributes
     496649 kilobytes reserved for system use.
 1542769996 kilobytes are available for use.
Filesystem is clean.

real    0m55.271s
user    0m2.428s
sys     0m4.127s
--------------------------------------------------------------

That's a large filesystem, mostly empty because I use it for testing, and the
particular test was a lot of very small files, and yet it does 4-5,000
inodes/s. A million inodes? Probably 4-5 minutes. But it is not a deep scan
over a messed-up filesystem.
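For illustration, a rough sketch of the whole-disk 'dd' duplication mentioned
above; the device names and block size are invented for the example, and
iflag=direct/oflag=direct is how GNU dd requests O_DIRECT:

--------------------------------------------------------------
# Duplicate the working disk onto a warm-spare disk using large
# blocks and O_DIRECT, bypassing the page cache (hypothetical
# device names - double-check them before running anything):
dd if=/dev/sdb of=/dev/sdc bs=64M iflag=direct oflag=direct
# Optional sanity check: compare checksums of the two devices.
md5sum /dev/sdb /dev/sdc
--------------------------------------------------------------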
From: Per J. <pe...@co...> - 2008-04-30 18:22:37
Per Jessen wrote:
>> Do you have any plans to upgrade to a newer distribution? JFS has
>> gotten a lot more stable in the 2.6 kernel than it was back in 2.4.
>
> Yep, I've been preparing a new system since this morning. Latest 2.6
> kernel. The fsck is still running, but I've been able to copy the key
> files to the new system; later I hope to be able to recover as much as
> possible of the data.

On the topic of recovering files - the toplevel directory has about 300
subdirectories, each with 5-6 subdirs, one of which has 3 subdirs of its own.
Initially I was able to recover all the files of the toplevel directory, plus
the 300 subdirs and any files in those. I'm now working on the subdirs of the
300 toplevel subdirs. This is where the vast majority of the files are
stored. I'm using rsync to copy each of the 300 toplevel subdirs individually
to keep an eye on what's being copied and what's not. I'm seeing quite a few
errors such as:

rsync: readlink "<subdir>/reports/nov2005.1.email" failed: Permission denied (13)
rsync: readdir("<subdir>/quarantined/summary"): Input/output error (5)

Other subdirs report no errors at all. I see "permission denied" on files,
and the "input/output error" on directories. Could these errors somehow be
construed to be an indication of "how far" the fsck is? Or which subdirs have
been "marked" clean? I'm grasping at straws here, I know.

/Per Jessen, Zürich
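For illustration, a minimal sketch of the per-directory rsync approach
described above; the source path and destination host are invented
placeholders:

--------------------------------------------------------------
# Copy each toplevel subdir individually and keep a per-directory
# error log, so damaged subtrees are easy to identify afterwards:
for d in /mnt/damaged/*/; do
    name=$(basename "$d")
    rsync -a "$d" "newhost:/data/$name/" 2>"/tmp/rsync-$name.err" \
        || echo "$name: rsync reported errors"
done
--------------------------------------------------------------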