Re: [Jfs-discussion] article in german magazine c't
From: Oliver D. <od...@ct...> - 2002-03-11 16:13:31
Christoph Hellwig wrote:
> On Sat, Mar 09, 2002 at 12:14:57PM +0100, Per Jessen wrote:
>
>>I've just received the most recent c't magazine, which has an
>>interesting article/comparison of the journaled file systems in
>>the linux world - xfs, jfs, ext3, reiser ....
>>
>>JFS doesn't get a particularly honourable report - and the
>>description and experiences as described in the article certainly don't
>>match my own.
>>
>>Anyone else read this article ?
>>
>
> Sure, the c't appeared in my inbox today in the morning, as it does in
> almost every German-speaking hacker's :)
>
> But in my regular stress-testing using fsstress and cerberus I'm
> absolutely unable to reproduce it using a variety of test environments.
> On the other hand JFS <= 1.0.14 had a very similar bug when built
> modular (I really wonder why the QA of the distributors shipping JFS
> didn't find it - once I built a modular JFS on my test machines I was
> able to trigger it within a few minutes).
>
> I've Cc'ed Oliver so he can comment on the exact test environment
> (Kernel, Hardware, JFS-Version, Config-Options).
>
> Christoph
>

Hi,

here are some details regarding my test environment. First of all, I have to say that I was quite surprised by the unexpected problems with JFS. (Btw, I reported everything in detail to Steve Best and discussed the results with him.)

I used a Pentium 4, 1400 MHz, Intel chip set, 256 MByte RAM. The system survived one weekend of continuous kernel compiling and a complete Cerberus run without problems and did not exhibit any irregularities with the other fs in any of my tests and benchmarks, so I concluded that the cause of the JFS failures should not be a general hardware problem. If it really was hardware related, it must be a very special problem that only shows up with JFS.

I used a standard Red Hat Linux 7.2, installed on one ext3 partition, with a regular kernel 2.4.17 from www.kernel.org, patched with the jfs-2.4-common-1.0.15 and the jfs-2.4.17-1.0.15 patches, dating from February 15th. No rejects with the patches, no error messages when building the kernel. The relevant .config settings (the complete .config is available on request):

    CONFIG_JFS_FS=y
    # CONFIG_JFS_DEBUG is not set

Booting the JFS kernel did not yield any errors. All stability and performance tests were run in single user mode after rebooting the system.

My Cerberus and LTP settings were similar to those used by Red Hat (people.redhat.com/bmatthews) but restricted to the fs-related tests. Cerberus/LTP yielded a problem after 10 or 15 minutes: I got a kernel error stating "invalid operand 0000" from JFS code (I don't have the register, stack, and call trace information available). The error seemed to occur while running the iogen LTP program. An "rm -rf" of the files left from Cerberus and LTP hung, and the JFS device could not be unmounted (umount gave an lseek error). After rebooting, mounting was impossible ("wrong fs type, bad option, bad superblock"). "fsck.jfs -n" detected errors in the Fileset File/Directory Allocation Map control information, in the Fileset File/Directory Allocation Map, and incorrect data in disk allocation structures and disk allocation control structures. When doing a "fsck.jfs -a", it replayed the log and said the fs was clean. The JFS device could be mounted then. After a reboot, I started a second Cerberus/LTP run with exactly the same settings, and it succeeded.

My second test was a Perl script that repeatedly started several file system intensive tasks in parallel. After some time, all processes accessing the JFS hung. The kernel itself did not crash (you could remote login etc.).
A "strace -p" on these processes gave no response -- the processes did not exhibit any activity (I guess they hung somewhere in their read(), open(), readdir() or lseek() calls) -- quite the same behavior as the "rm -rf" after the Cerberus/LTP crash.

The tasks (the JFS partition was mounted to /jfs and contained a subdir /jfs/test with three complete kernel trees [about 35,000 files/500 MByte]):

* "ls -lR /jfs"
* "find /jfs -type f -exec grep XXX {} \;"
* cp -a /jfs/test/* /jfs/tmp3
* dbench 10; sleep 180
* a program that creates, writes, and removes 32,000 files in /jfs/tmp1
* a program that recursively creates and removes a directory tree of depth 12 in /jfs/tmp2
* a program that creates a 500 MByte sparse file /jfs/large_file and does lseek(), read() and write() calls within.

The last program reported a read() error, getting fewer bytes than requested, before the "crash". I can post my test script, the programs, and the exact settings if someone is interested.

Oliver

--
Dr. Oliver Diedrich                 Helstorfer Strasse 7
c't Magazin für Computertechnik     D-30625 Hannover, Germany
e-mail: od...@ct...                 Tel: +49 (0)511 5352 300
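
For illustration only: the sparse-file task in the list above (create a large sparse file, then lseek(), read() and write() within it, and notice reads that return fewer bytes than requested) could be sketched roughly as below. This is not the original test program, which was not posted in this message. The function name sparse_file_test, the 4096-byte block size, the iteration count, and the random-offset access pattern are all assumptions chosen for the sketch.

```python
import os
import random


def sparse_file_test(path, size, iterations=1000, block=4096, seed=0):
    """Create a sparse file of `size` bytes and exercise lseek/write/read.

    Hypothetical helper, not the program used in the original tests.
    Returns a list of (offset, requested, received) tuples, one for each
    write/read round trip that did not return the block intact (for
    example a short read).
    """
    rng = random.Random(seed)
    problems = []
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Extending the file without writing data leaves a hole, so the
        # file is sparse on file systems that support holes.
        os.ftruncate(fd, size)
        for _ in range(iterations):
            # Pick an offset such that the block fits inside the file.
            offset = rng.randrange(0, size - block + 1)
            data = bytes([rng.randrange(256)]) * block
            os.lseek(fd, offset, os.SEEK_SET)
            written = os.write(fd, data)
            # Seek back and read the same block again.
            os.lseek(fd, offset, os.SEEK_SET)
            got = os.read(fd, block)
            if written != block or got != data:
                problems.append((offset, block, len(got)))
    finally:
        os.close(fd)
    return problems
```

Calling sparse_file_test("/jfs/large_file", 500 * 1024 * 1024) would approximate the setup described above; an empty result list means every block read back complete and unchanged. A hang of the kind reported here would show up as the call never returning, not as an entry in the list.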