From: Jake H. <jh...@po...> - 2005-03-19 06:47:35
Now that Syllable can run the DejaGnu test framework (requires kernel build 0009 plus some glibc patches which I've sent to Vanders), I tried running the GCC 3.4.3 testsuite, which gives the system a pretty good workout. Unfortunately, after 15 minutes or so, the system locked up, and I got these AFS-related messages on the next boot. Now when I tail /var/log/kernel, the filesystem seizes up and I can't launch any new processes. Time to restore the partition...

I plan to take a break from low-level kernel hacking for a little while in order to study AFS, with the goal (besides fixing any bugs I find!) of writing an fsck utility so that we don't have to keep reinstalling/restoring Syllable when these things happen. Has anyone started to write anything along those lines?

Anyway, I wanted to let y'all know that it IS possible to screw up the filesystem through heavy activity, even with SMP disabled (although it's easier with SMP enabled). At the time of the crash, the system had been doing CPU-intensive compiles, but nothing disk-intensive. However, during the same session I untarred the GCC 3.4.3 source tree and bootstrapped it, so there had been major disk activity at some point prior to the crash.

One scary thing I've noticed is that after there has been heavy disk activity, "sync" will often spend several seconds writing out data even if the system has been idle for a long time since the last disk access. In other words, there doesn't seem to be any "buffer flush" thread writing out dirty buffers to the disk on a regular basis. This could definitely be making AFS corruption issues more severe than they otherwise might be.

-Jake