Re: [Jfs-discussion] Corrupt JFS root nodes on volumes with 500+ top level directories
From: Jeffrey S. <jef...@gm...> - 2010-11-17 16:43:45
This isn't a direct answer to your problem, but you might want to try
building some hierarchy into your directory tree, something like
/year/month/day, so today's directory would be /2010/11/17/. That avoids
putting many entries in any one directory.

As a historical note, the original UNIX file system was designed as a
hierarchy so the phone book could be stored with the higher level
directories having names related to the first few letters of the name
being stored.

-Jeff

On Wed, Nov 17, 2010 at 1:12 AM, Tim Nufire <jfs...@ib...> wrote:
> All,
>
> A quick update on this issue... Our application creates 1 new top level
> directory each day and after about 500 days *all* of the servers I've
> checked have corrupt root nodes. Even more troubling, after we repair a
> volume by running jfs_fsck and recovering data from lost+found (see below),
> the problem reoccurs after about a month of creating new directories.
> However, if no new top level directories are created, and only changes lower
> down in the hierarchy are made, the problem does not reoccur.
>
> Does anyone have any theories about what is going on here? Is there anything
> we can do to prevent this from happening? Would moving all the data down one
> level (e.g. nested in a single root directory) help, or is the root node like
> any other node and 500+ nested directories at any level too much for JFS?
>
> Because these are older machines, they are all running Debian 4 with a
> backported 2.6.26 kernel. Is there any chance upgrading to Debian 5 and a
> newer kernel would help?
>
> Thanks in advance for any help :-)
>
> Tim
>
> On Aug 25, 2010, at 2:16 PM, Tim Nufire wrote:
>
> Hello,
>
> I've got a problem that I'm hoping someone on this list can help me with...
> Read-only fsck.jfs checks on my oldest volumes are reporting an alarming
> number of corrupted root nodes despite the fact that these volumes appear to
> be healthy when mounted read-only. Here's the error that I'm getting...
>
> fsck.jfs -n -v /dev/md/10
> fsck.jfs version 1.1.14, 06-Apr-2009
> processing started: 8/13/2010 10.9.6
> The current device is: /dev/md/10
> Open(...READONLY...) returned rc = 0
> Primary superblock is valid.
> The type of file system for the device is JFS.
> Block size in bytes: 4096
> Filesystem size in blocks: 4756914448
> **Phase 1 - Check Blocks, Files/Directories, and Directory Entries
> Invalid data format detected in root directory.
> CANNOT CONTINUE.
> ERRORS HAVE BEEN DETECTED. Run fsck with the -f parameter to repair.
> processing terminated: 8/13/2010 10:10:05 with return code: 10062 exit code: 4.
>
> Despite the catastrophic sounding error above, mounting the file system
> read-only and listing the directory from the command line works fine...
>
> ls
> 20090110 20090303 20090418 20090605 20090721 20090914 20091030 20091215 20100130 20100317 20100502 20100617
> 20090111 20090304 20090419 20090606 20090722 20090915 20091031 20091216 20100131 20100318 20100503 20100618
> 20090113 20090305 20090420 20090607 20090723 20090916 20091101 20091217 20100201 20100319 20100504 20100619
> 20090114 20090306 20090421 20090608 20090724 20090917 20091102 20091218 20100202 20100320 20100505 20100620
> 20090115 20090307 20090422 20090609 20090725 20090918 20091103 20091219 20100203 20100321 20100506 20100622
> 20090116 20090308 20090423 20090610 20090727 20090919 20091104 20091220 20100204 20100322 20100507 20100623
> 20090117 20090309 20090424 20090611 20090728 20090920 20091105 20091221 20100205 20100323 20100508 20100624
> 20090118 20090310 20090425 20090612 20090729 20090921 20091106 20091222 20100206 20100324 20100509 20100625
> 20090119 20090311 20090426 20090613 20090730 20090922 20091107 20091223 20100207 20100325 20100510 20100626
> 20090120 20090312 20090427 20090614 20090731 20090923 20091108 20091224 20100208 20100326 20100511 20100627
> 20090121 20090313 20090428 20090615 20090801 20090924 20091109 20091225 20100209 20100327 20100512 20100628
> 20090122 20090314 20090429 20090616 20090802 20090925 20091110 20091226 20100210 20100328 20100513 20100629
> 20090123 20090315 20090430 20090617 20090803 20090926 20091111 20091227 20100211 20100329 20100514 20100630
> 20090126 20090316 20090501 20090618 20090804 20090927 20091112 20091228 20100212 20100330 20100515 20100701
> 20090127 20090317 20090502 20090619 20090805 20090928 20091113 20091229 20100213 20100331 20100516 20100702
> 20090128 20090318 20090503 20090620 20090809 20090929 20091114 20091230 20100214 20100401 20100517 20100703
> 20090129 20090319 20090504 20090621 20090810 20090930 20091115 20091231 20100215 20100402 20100518 20100704
> 20090130 20090320 20090505 20090622 20090811 20091001 20091116 20100101 20100216 20100403 20100519 20100705
> 20090202 20090321 20090506 20090623 20090812 20091002 20091117 20100102 20100217 20100404 20100520 20100706
> 20090204 20090322 20090507 20090624 20090813 20091003 20091118 20100103 20100218 20100405 20100521 20100707
> 20090205 20090323 20090508 20090625 20090814 20091004 20091119 20100104 20100219 20100406 20100522 20100708
> 20090206 20090324 20090509 20090626 20090815 20091005 20091120 20100105 20100220 20100407 20100523 20100709
> 20090207 20090325 20090510 20090627 20090816 20091006 20091121 20100106 20100221 20100408 20100524 20100710
> 20090208 20090326 20090511 20090628 20090817 20091007 20091122 20100107 20100222 20100409 20100525 20100711
> 20090209 20090327 20090512 20090629 20090818 20091008 20091123 20100108 20100223 20100410 20100526 20100712
> 20090210 20090328 20090513 20090630 20090819 20091009 20091124 20100109 20100224 20100411 20100527 20100713
> 20090211 20090329 20090514 20090701 20090820 20091010 20091125 20100110 20100225 20100412 20100528 20100714
> 20090212 20090330 20090515 20090702 20090821 20091011 20091126 20100111 20100226 20100413 20100529 20100715
> 20090213 20090331 20090516 20090703 20090822 20091012 20091127 20100112 20100227 20100414 20100530 20100716
> 20090214 20090401 20090517 20090704 20090823 20091013 20091128 20100113 20100228 20100415 20100531 20100717
> 20090215 20090402 20090518 20090705 20090824 20091014 20091129 20100114 20100301 20100416 20100601 20100718
> 20090216 20090403 20090519 20090706 20090825 20091015 20091130 20100115 20100302 20100417 20100602 20100719
> 20090217 20090404 20090520 20090707 20090826 20091016 20091201 20100116 20100303 20100418 20100603 20100720
> 20090218 20090405 20090521 20090708 20090827 20091017 20091202 20100117 20100304 20100419 20100604 20100721
> 20090219 20090406 20090522 20090709 20090828 20091018 20091203 20100118 20100305 20100420 20100605 20100722
> 20090220 20090407 20090523 20090710 20090901 20091019 20091204 20100119 20100306 20100421 20100606 20100723
> 20090221 20090408 20090524 20090711 20090902 20091020 20091205 20100120 20100307 20100422 20100607 20100724
> 20090222 20090409 20090527 20090712 20090903 20091021 20091206 20100121 20100308 20100423 20100608 20100725
> 20090223 20090410 20090528 20090713 20090904 20091022 20091207 20100122 20100309 20100424 20100609 20100726
> 20090224 20090411 20090529 20090714 20090905 20091023 20091208 20100123 20100310 20100425 20100610 20100727
> 20090225 20090412 20090530 20090715 20090906 20091024 20091209 20100124 20100311 20100426 20100611 20100728
> 20090226 20090413 20090531 20090716 20090907 20091025 20091210 20100125 20100312 20100427 20100612 20100729
> 20090227 20090414 20090601 20090717 20090908 20091026 20091211 20100126 20100313 20100428 20100613 mount_check
> 20090228 20090415 20090602 20090718 20090909 20091027 20091212 20100127 20100314 20100429 20100614
> 20090301 20090416 20090603 20090719 20090912 20091028 20091213 20100128 20100315 20100430 20100615
> 20090302 20090417 20090604 20090720 20090913 20091029 20091214 20100129 20100316 20100501 20100616
>
> Running fsck.jfs read-write reinitializes the root node and moves all of
> its former contents into lost+found. I can recover the data from lost+found
> so this is not fatal but still something I would like to fix/avoid.
>
> I have not repaired the above volume yet but have repaired others... Here's
> the fsck.jfs output for a read-write repair on a volume that had the same
> errors as those described above.
>
> fsck.jfs -v /dev/md10
> fsck.jfs version 1.1.14, 06-Apr-2009
> processing started: 4/23/2010 4.32.24
> Using default parameter: -p
> The current device is: /dev/md10
> Open(...READ/WRITE EXCLUSIVE...) returned rc = 0
> Primary superblock is valid.
> The type of file system for the device is JFS.
> Block size in bytes: 4096
> Filesystem size in blocks: 4756914448
> **Phase 0 - Replay Journal Log
> LOGREDO: Log record for Sync Point at: 0x05774f34
> LOGREDO: Beginning to update the Inode Allocation Map.
> LOGREDO: Done updating the Inode Allocation Map.
> LOGREDO: Beginning to update the Block Map.
> LOGREDO: Incorrect leaf index detected (k=(d) 0, j=(d) 0, idx=(d) 0) while writing Block Map.
> LOGREDO: Write Block Map control page failed in UpdateMaps().
> LOGREDO: Unable to update map(s).
> logredo failed (rc=-231). fsck continuing.
> **Phase 1 - Check Blocks, Files/Directories, and Directory Entries
> Root directory has a corrupt tree.
> Initialized tree created for root directory.
> The root directory has an invalid data format. Will correct.
> **Phase 2 - Count links
> **Phase 3 - Duplicate Block Rescan and Directory Connectedness
> **Phase 4 - Report Problems
> **Phase 5 - Check Connectivity
> **Phase 6 - Perform Approved Corrections
> Superblock marked dirty because repairs are about to be written.
> No \lost+found directory found in the filesystem.
> Directory inode 18661404 has been reconnected to /lost+found/.
> Directory inode 18637982 has been reconnected to /lost+found/.
> Directory inode 18614880 has been reconnected to /lost+found/.
> Directory inode 18595359 has been reconnected to /lost+found/.
> Directory inode 18581312 has been reconnected to /lost+found/.
> Directory inode 18556038 has been reconnected to /lost+found/.
> .
> .
> .
> Directory inode 448971 has been reconnected to /lost+found/.
> File inode 443531 has been reconnected to /lost+found/.
> Directory inode 442414 has been reconnected to /lost+found/.
> .
> .
> .
> Directory inode 2320 has been reconnected to /lost+found/.
> Directory inode 101 has been reconnected to /lost+found/.
> Directory inode 32 has been reconnected to /lost+found/.
> 622 directories reconnected to /lost+found/.
> 1 file reconnected to /lost+found/.
> **Phase 7 - Rebuild File/Directory Allocation Maps
> **Phase 8 - Rebuild Disk Allocation Maps
> **Phase 9 - Reformat File System Log
> logformat returned rc = 0
> Filesystem Summary:
> Blocks in use for inodes: 2276956
> Inode count: 18215648
> File count: 16453081
> Directory count: 1529882
> Block count: 4756914448
> Free block count: 655162544
> 19027657792 kilobytes total disk space.
> 6342069 kilobytes in 1529882 directories.
> 16397493672 kilobytes in 16453081 user files.
> 0 kilobytes in extended attributes
> 0 kilobytes in access control lists
> 15856013 kilobytes reserved for system use.
> 2620650176 kilobytes are available for use.
> Filesystem is clean.
> All observed inconsistencies have been repaired.
> Filesystem has been marked clean.
> **** Filesystem was modified. ****
> processing terminated: 4/23/2010 9:08:55 with return code: 0 exit code: 1.
>
> This problem appears to be related to age and/or the number of directories
> in the root node. It's hard to distinguish between these two attributes in
> our environment because the root node of each of our data volumes contains
> one directory for each day the volume has been in use. The tipping point
> appears to be around 500 days/directories.
>
> Is this a known issue? Is there really a problem with the root node or does
> fsck.jfs have an analysis bug? In any event, since the OS can list the
> contents of the root node, fsck.jfs should be able to do better than just
> dumping all the contents into lost+found.
>
> I've also seen corruption in my allocation maps which could be
> related... How can I help debug this further?
>
> Thanks!
> Tim
>
> _______________________________________________
> Jfs-discussion mailing list
> Jfs...@li...
> https://lists.sourceforge.net/lists/listinfo/jfs-discussion
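
For what it's worth, here is a minimal sketch of the /year/month/day layout suggested at the top of this message, written in Python. The function names nested_path and plan_moves are made up for illustration and are not part of any JFS tool. The sketch only computes where each flat YYYYMMDD directory would go; it does not touch the filesystem, and whether the deeper layout actually avoids the root-node corruption is an untested assumption.

```python
from datetime import datetime
from pathlib import PurePosixPath


def nested_path(flat_name: str) -> PurePosixPath:
    """Map a flat YYYYMMDD directory name to a relative YYYY/MM/DD path."""
    if len(flat_name) != 8 or not (flat_name.isascii() and flat_name.isdigit()):
        raise ValueError(f"not a YYYYMMDD name: {flat_name!r}")
    # strptime is used only to reject impossible dates such as 20101340.
    datetime.strptime(flat_name, "%Y%m%d")
    return PurePosixPath(flat_name[:4], flat_name[4:6], flat_name[6:])


def plan_moves(names):
    """Return (old_name, new_relative_path) pairs, skipping non-date entries."""
    moves = []
    for name in names:
        try:
            moves.append((name, str(nested_path(name))))
        except ValueError:
            continue  # entries such as "mount_check" stay where they are
    return moves


if __name__ == "__main__":
    # Hypothetical sample taken from the ls listing quoted above.
    for old, new in plan_moves(["20090110", "20100729", "mount_check"]):
        print(f"{old} -> {new}")
```

With this layout the top level of the volume would gain one new entry per year instead of one per day, and no directory in the date hierarchy would hold more than 31 date entries.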