Re: [Jfs-discussion] Issue with JFS losing files
From: Christian W. <c.w...@gm...> - 2014-04-23 08:57:23
> 1. You should not do jfs_fsck on mounted filesystems, esp. when it's
> mounted rw. At least do mount -o remount,ro beforehand. Those fsck
> logs are not "dependable."

Thanks for your answer!
I know that running fsck on a mounted drive is not dependable. Unfortunately it is the only possibility, since the system does not allow unmounting the drive.

BUT: The error is definitely there, because:
1.) The system performs its own fsck while booting (I assume the drive is not mounted then). That is the situation in which it 'magically' deletes our recordings.
2.) We once did an experiment and put the drive into a PC. There it was checked with fsck (without being mounted). The results were nearly identical to the ones I sent you. In particular: the error message including the 'Will release' string occurred there too, and the reported file was deleted.

File system object FF4121 is linked as: /DataFiles/Tierärztin Dr. Mertens (25).rec
cannot repair the data format error(s) in this file.
cannot repair FF4121. Will release.

That is why I asked you what this particular error message means.
I could imagine two (or more) possible causes:
1.) One or more disk blocks are assigned to two files (maybe due to some error in the cutting process).
2.) One or more disk blocks are assigned to one of the files, but have been (at least temporarily) marked as free blocks, and might have been overwritten with other data by now.

Could one of them be correct? Or are there even more possible defects?
And most important: How can I debug this?

Best regards,
Christian

> 2. Once I had some troubles after the system switched from UP to SMP,
> but now I don't have any issues with the kernel parameter
> jfs.commit_threads=1. I don't know the real culprit, though.
>
> (for me it breaks the root filesystem and remounts it ro immediately,
> so there is almost no relevant syslog left. It's so tiresome to fix
> begone /dev. I think I saw ERROR: (device sda1): diRead: i_ino !=
> di_number or something...)
>
> 2014-04-18 18:16 GMT+09:00, Christian Wünsch <c.w...@gm...>:
> > Hello,
> >
> > I thought, maybe a logfile with the jfs_fsck's error messages could be
> > helpful...
> > So, here are two logfiles showing the (reproducible) file system
> > corruption.
> > What can be the cause for that? Or how can I find it out?
> >
> > (Maybe it could be a good idea to print out the inode's list of
> > allocated disk blocks and the list of "free" blocks. Then I could
> > compare those lists to check whether one block is assigned to two
> > files or one block belonging to a file is marked as "free".
> > Is there a possibility to get such a list?)
> >
> > Best regards,
> > Christian
> >
> >
> > 2014-04-15 20:52 GMT+02:00, Christian Wünsch <c.w...@gm...>:
> >> Hello again,
> >>
> >> I was told that there was some information missing in my last post.
> >> Sorry for that!
> >>
> >> I want to do my very best to provide as much additional information as
> >> possible:
> >>
> >> 1.) The devices on which the problem occurs are set-top-boxes from
> >> Topfield, namely the SRP/CRP series (e.g. SRP-2401, CRP-2401)
> >>
> >> 2.) The system runs on a MIPS-CPU from Broadcom. CPU details:
> >> system type : BCM97xxx Settop Platform
> >> build target : 7405b0-topfield
> >> processor : 0
> >> cpu model : BMIPS4380 V4.4 FPU V0.1
> >> cpu MHz : 402.43
> >>
> >> 3.) Details about the running Linux kernel:
> >>
> >> cat /proc/version
> >> Linux version 2.6.18-7.1 (jack@PDS) (gcc version 4.2.0 20070124
> >> (prerelease) - BRCM 11ts-20090508) #2 SMP Tue Feb 5 15:07:17 KST 2013
> >>
> >> # uname -a
> >> Linux (none) 2.6.18-7.1 #2 SMP Tue Feb 5 15:07:17 KST 2013 7405b0-smp
> >> unknown
> >>
> >> Since the firmware is provided by the manufacturer, we will NOT be
> >> able to replace the kernel or change anything within the kernel.
> >>
> >> But it would be great to find out what exactly the quoted error
> >> message means, and under what circumstances a problem like that can
> >> be produced.
> >> So hopefully we can find a workaround that prevents the JFS system
> >> from corrupting (and therefore erasing) our recorded files...
> >>
> >> If you need any further information, or debugging from me, please let me
> >> know!
> >>
> >> Best regards,
> >> Christian
> >>
> >>
> >> 2014-04-14 1:13 GMT+02:00, Christian Wünsch <c.w...@gm...>:
> >>> Hi there!
> >>>
> >>> I have been working on a very strange JFS-related problem for more
> >>> than 4 months now. I hope you guys can help me fix this finally...
> >>>
> >>> So, I am developing a video cutting tool for an embedded device
> >>> (set-top-box) with Linux operating system and JFS file system.
> >>> My tool uses a quite simple cutting routine that has been built
> >>> into the firmware by the manufacturer.
> >>>
> >>> Now, for a (very small) number of users it happens that, while cutting
> >>> a recorded video file, this file gets damaged in a mysterious way. All
> >>> files that are damaged this way get irrevocably erased during the next
> >>> execution of fsck.
> >>>
> >>> On files damaged like this, jfs_fsck regularly reports an error like
> >>> this one:
> >>> File system object FF4121 is linked as: /DataFiles/Tierärztin Dr.
> >>> Mertens (25).rec
> >>> cannot repair the data format error(s) in this file.
> >>> cannot repair FF4121. Will release.
> >>>
> >>> ... and then it erases the file.
> >>>
> >>> So, could you please give me a hint as to what exactly is meant by
> >>> this error message?
> >>> And how could such damage possibly be caused?
> >>>
> >>> If you have any questions or need any further information or debugging
> >>> from me, please let me know!
> >>>
> >>> And I am thankful for every tip or piece of advice ;-)
> >>>
> >>> Best regards,
> >>> Christian
> >>>
> >>>
> >>> PS: Some additional information:
> >>> This post shows a logfile of jfs_fsck 1.1.15 on a (mounted, but
> >>> error-free) hard disk.
> >>> http://www.topfield-europe.com/forum/showthread.php?p=988963
> >>> Is it normal to get such a bunch of error messages? Or are there some
> >>> critical ones in there?
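
The block-list comparison asked about in this thread could be sketched roughly as follows. This is only an illustration of the two suspected defects (a block owned by two files, or a file's block also marked as free). It assumes that per-file extent lists and a free-extent list have already been obtained somehow; the (first_block, block_count) tuple format, the function names, and the sample data are all hypothetical and do not correspond to the output of any JFS tool.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical extent format: (first_block, block_count).
Extent = Tuple[int, int]


def blocks(extents: List[Extent]) -> Set[int]:
    """Expand a list of extents into the set of block numbers it covers."""
    covered: Set[int] = set()
    for first, count in extents:
        covered.update(range(first, first + count))
    return covered


def find_double_allocations(files: Dict[str, List[Extent]]) -> Dict[int, List[str]]:
    """Suspected defect 1: return every block claimed by more than one file."""
    owners: Dict[int, List[str]] = {}
    for name, extents in files.items():
        for block in blocks(extents):
            owners.setdefault(block, []).append(name)
    return {b: names for b, names in owners.items() if len(names) > 1}


def find_allocated_but_free(
    files: Dict[str, List[Extent]], free_extents: List[Extent]
) -> Dict[str, List[int]]:
    """Suspected defect 2: return file blocks that are also marked as free."""
    free = blocks(free_extents)
    result: Dict[str, List[int]] = {}
    for name, extents in files.items():
        hits = sorted(blocks(extents) & free)
        if hits:
            result[name] = hits
    return result


if __name__ == "__main__":
    # Made-up sample data, only to show how the checks are called.
    sample_files = {
        "a.rec": [(100, 4)],
        "b.rec": [(102, 3)],
        "c.rec": [(200, 2)],
    }
    sample_free = [(201, 5)]
    print(find_double_allocations(sample_files))
    print(find_allocated_but_free(sample_files, sample_free))
```

Expanding extents into sets of single blocks is simple but memory-hungry on a large disk; for real data an interval-based comparison would probably be the better choice.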