Re: [Jfs-discussion] JFS-Questions
From: Dave K. <sh...@au...> - 2003-09-22 22:26:01
Sorry it's taken me this long to respond.

On Tue, 2003-09-16 at 22:57, mt...@am... wrote:
> Hello Jfs-discussion,
>
> I've been using JFS for quite a while and have gained some experience
> to share, to help estimate how critical (data-loss-wise) the observed
> problems are.
>
> Systems used: i386 with SuSE 7.3, 8.0, 8.1 and 8.2
>
> - Deleted file space (several gigs) is not made available, even after
>   repeated syncs; fsck-ing/remounting/rebooting helped. Problem: it
>   required a system reboot because the affected filesystem was the
>   root filesystem.

I introduced this problem with a performance enhancement that went a
little too far in avoiding too-frequent writes to the journal. I have
some ideas about how to fix it without losing the performance gain, but
I haven't fixed it yet.

> - A partition was checked as clean on restart after a crash;
>   nonetheless the filesystem had errors. It required a manual, forced
>   full JFS check to get rid of them.
>   -> How reliable is the clean/dirty detection really, and what could
>   make it fail?

The clean/dirty detection by design assumes that the partition is clean
after replaying the journal, unless an error is detected. Otherwise,
rebooting after a power loss or crash would not be fast. However, there
are several places in the code where JFS sees a problem and doesn't
handle it properly. We have some work in progress to have JFS correctly
mark the superblock dirty in these instances, which would force fsck to
check the whole partition.

> - After an HD sector failure I was able to rescue most of the disk's
>   sectors with dd_rescue, but not all sectors could be copied
>   correctly.
>   -> Does JFS employ block/sector checksums in order to be able to
>   detect integrity errors (or is this possible)? I was worried about
>   which files contained data from erroneous sectors.
> Fortunately the system survived the failure quite well and no
> essential files seem to have been damaged. Nonetheless I asked myself
> what would happen if it went worse: how could I detect which files
> were OK and which were not?

No, JFS has no mechanisms in place to ensure data integrity. The main
design goal was to ensure metadata integrity, allowing quick recovery
after a crash or power failure. It is recommended that important data
be backed up periodically.

> - A friend of mine lost all data on one partition due to "invalid
>   superblocks" - fsck.jfs printed this message and gave up... Great -
>   all sectors still contained valid files, data, etc. Maybe it was
>   some geometry offset due to changing hardware.
>   -> This is what worries me most - is there a practical solution to
>   recover data if the superblocks get corrupted or slightly
>   byte-offset (+/-1 errors)?
>   -> For example: is it possible to scan all sectors of the partition
>   and (at least partially) reconstruct the filesystem? (When
>   hex-dumping the partition header, it showed a normal-looking JFS
>   signature and following blocks.)

JFS was designed to be able to recover from any single point of failure
in the metadata. If both the primary and secondary superblocks are
lost, it is probable that other important metadata is lost as well. It
is unlikely that, under normal circumstances, both superblocks would be
lost. I am curious how this happened.

> - A similar problem to the one above happened to me - a disk which I
>   had formatted, set up, and used in a removable bay refused to mount
>   when connected via an external interface (FireWire - but
>   hex-dumping the sector contents worked fine!). The disk mounted
>   fine when it was placed back into the internal bay again... Really
>   weird! (Note: the /etc/fstab entries were correct :))
>   Formatting the partition while hooked up over FireWire seems to
>   have solved the problem.

I don't have a clue here.
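Since JFS stores no per-block checksums, detecting which files picked
up bad sectors after a rescue has to be done above the filesystem. A
minimal sketch, assuming a digest listing was captured before the
failure (e.g. by a periodic backup job; the function names here are
illustrative, not part of any JFS tooling):

```python
import hashlib
import os

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest

def damaged_files(before, after):
    """Paths whose digest changed, or which disappeared entirely,
    between two manifests."""
    return sorted(p for p, d in before.items() if after.get(p) != d)
```

Running `build_manifest` periodically and diffing after an incident
with `damaged_files` narrows inspection down to the files that actually
changed.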
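On the question of scanning the partition: the JFS superblock begins
with the ASCII magic "JFS1", with the primary copy conventionally at
byte offset 0x8000 (32768), so a raw image can at least be searched for
superblock candidates before deciding whether reconstruction is
feasible. A rough sketch, assuming a readable raw image of the
partition (the function name is illustrative):

```python
JFS_MAGIC = b"JFS1"   # s_magic of the JFS superblock
SECTOR = 512          # scan granularity; superblocks are sector-aligned

def find_superblock_candidates(image_path, limit=None):
    """Scan a raw partition image for sectors beginning with the JFS
    magic, returning their byte offsets.

    On a healthy filesystem the primary superblock conventionally sits
    at offset 0x8000; any other hits may be the secondary superblock or
    stale copies, and a hit at an unexpected offset can hint at the
    kind of geometry shift described above.
    """
    hits = []
    with open(image_path, "rb") as f:
        offset = 0
        while True:
            sector = f.read(SECTOR)
            if not sector:
                break
            if sector[:4] == JFS_MAGIC:
                hits.append(offset)
                if limit and len(hits) >= limit:
                    break
            offset += SECTOR
    return hits
```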
> - fsck once "trashed" my home directory and all the files inside
>   (SuSE 8.2). All files were renamed and dumped into lost+found...
>   -> I don't know how this happened, but it was quite horrible to
>   manually rename and move the files back!
>   Fortunately for me the locate database still held the original
>   names and structure, but it took quite a while to inspect the
>   contents.
>   -> Why accept the overhead of journaling when fsck won't recover
>   the filenames? There should be a way to preserve more of the
>   original information during fsck.

fsck found something wrong in your home directory that it couldn't fix,
so, rather than leaving a broken directory, it removed it and put the
contents in lost+found. Admittedly, fsck should probably do a better
job when it finds a problem, and might be able to avoid losing the
entire directory, but that's the way it works today. The journaling
itself is there to prevent this kind of damage from happening, and it
usually does. However, there are still bugs, and other conditions, that
occasionally cause problems fsck can't fix. Backing up data is always a
good idea.

> Best regards!
>
> PS - my personal JFS feature wishlist:
> - shrinking
> - a built-in encryption layer (as loop-AES would render journaling
>   useless, hence the wish to have it inside the FS)

Hopefully we'll get to these, but they aren't in our short-term plans.
-- 
David Kleikamp
IBM Linux Technology Center
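Renaming lost+found entries back into place can be scripted when a
listing of the original tree survives. The sketch below assumes a
digest manifest (original path -> SHA-256) captured before the crash,
rather than the locate database the poster used; the function and
parameter names are illustrative:

```python
import hashlib
import os

def restore_names(lostfound_dir, manifest, dest_root):
    """Move files out of lost+found back to their recorded paths.

    `manifest` maps original relative paths to SHA-256 digests. Files
    whose content matches a recorded digest are moved under
    `dest_root`; everything else is left in lost+found for manual
    inspection.
    """
    by_digest = {d: p for p, d in manifest.items()}
    restored = []
    for name in os.listdir(lostfound_dir):
        src = os.path.join(lostfound_dir, name)
        if not os.path.isfile(src):
            continue
        with open(src, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        rel = by_digest.get(digest)
        if rel is None:
            continue  # no match; leave it for a human to look at
        dst = os.path.join(dest_root, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        os.rename(src, dst)
        restored.append(rel)
    return sorted(restored)
```

Note this matches by content only, so identical files are
interchangeable and renamed files cannot be distinguished; it is a
recovery aid, not a substitute for backups.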