From: Valient G. <vg...@po...> - 2005-01-19 23:37:44
Valient Gough wrote:
>
> Miklos Szeredi wrote:
>>
>> I've fixed this particular error.  But there's another problem with
>> kernel versions <= 2.4.20, that's not as simple to deal with.  Since
>> these can now be considered "ancient", and fewer and fewer people will
>> use them, I'm reluctant to put too much effort into porting FUSE for
>> these.
>>
> Thanks, I've gone back to 1.4, which works fine.  But I thought I'd
> send a last update to close out this report in case it was helpful:
>

Ooops.. I take it back, that won't be the last message :-)

I found that 1.4 locks up on me too on Linux 2.4.x. It doesn't happen every time, but I can reproduce it by running "while true; do df; done" in a shell, and within a few seconds it is locked.

What is interesting is that if I run df in another window, it unlocks the first window's df process, but then the second df process is locked. I can go back and forth between the two windows, typing df, and each one unlocks the other but locks itself.. Make sense?

Here is what I've found from strace:

I mounted a filesystem with "sudo strace -f encfs -f ~/.crypt ~/crypt" and started entering "df" commands in another window. The first two returned normal results, but the third hung. In the strace output, it looks like the problem starts at an "unfinished" / resumed call:

[pid 6617] statfs("/home/vgough/.crypt", <unfinished ...>
[pid 6588] read(3, <unfinished ...>
[pid 6617] <... statfs resumed> {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=14138407, f_bfree=6670058, f_files=7192576, f_ffree=6982471, f_namelen=255}) = 0
[pid 6617] write(3, "\4\0\0\0\0\0\0\0\0\20\0\0\'\274\327\0m\321Z\0\0\300m\0"..., 32 <unfinished ...>
[pid 6616] <... poll resumed> [{fd=4, events=POLLIN}], 1, 2000) = 0
[pid 6616] getppid() = 6588
[pid 6616] poll( <unfinished ...>
[pid 6617] <... write resumed> ) = 32
[pid 6588] <... read resumed> "\5\0\0\0\21\0\0\0\0\0\0\0c\0\0\0\24\'\0\0", 8192) = 20
*[pid 6617] read(3, <unfinished ...>*
[pid 6588] statfs("/home/vgough/.crypt", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=14138407, f_bfree=6670058, f_files=7192576, f_ffree=6982471, f_namelen=255}) = 0
*[pid 6588] write(3, "\5\0\0\0\0\0\0\0\0\20\0\0\'\274\327\0m\321Z\0\0\300m\0"..., 32 <unfinished ...>*
[pid 6616] <... poll resumed> [{fd=4, events=POLLIN}], 1, 2000) = 0
[pid 6616] getppid() = 6588
[pid 6616] poll([{fd=4, events=POLLIN}], 1, 2000) = 0
[pid 6616] getppid() = 6588

Then it sits in poll, with the df command hung.. I switched to another terminal and did a 'df' there, which unblocked the first df, but blocked the second:

[{fd=4, events=POLLIN}], 1, 2000) = 0
[pid 6616] getppid() = 6588
[pid 6616] poll( <unfinished ...>
*[pid 6617] <... read resumed> "\6\0\0\0\21\0\0\0\0\0\0\0\334\t\0\0\334\t\0\0",8192) = 20*
*[pid 6588] <... write resumed> ) = 32*
[pid 6617] statfs("/home/vgough/.crypt", <unfinished ...>
[pid 6588] read(3, <unfinished ...>
[pid 6617] <... statfs resumed> {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=14138407, f_bfree=6670058, f_files=7192576, f_ffree=6982471, f_namelen=255}) = 0
[pid 6617] write(3, "\6\0\0\0\0\0\0\0\0\20\0\0\'\274\327\0m\321Z\0\0\300m\0"..., 32 <unfinished ...>
[pid 6616] <... poll resumed> [{fd=4, events=POLLIN}], 1, 2000) = 0
[pid 6616] getppid() = 6588
[pid 6616] poll([{fd=4, events=POLLIN}], 1, 2000) = 0
[pid 6616] getppid() = 6588
[pid 6616] poll( [{fd=4, events=POLLIN}], 1, 2000) = 0
[pid 6616] getppid() = 6588
[pid 6616] poll( [{fd=4, events=POLLIN}], 1, 2000) = 0
[pid 6616] getppid() = 6588
[pid 6616] poll(

I've marked (with asterisks) the calls that straddle the lock point, which may be the cause. It looks like the "[pid 6588] write" and the earlier marked read didn't resume until after the next df command. This does seem to match up with the GDB trace I sent before, which showed 'write' as the current location of the locked thread...

Any ideas, or suggestions for further debugging? I realize that 2.4.x isn't a priority, but perhaps finding the problem here will be useful in avoiding a similar problem in 2.6.x kernels?

thanks,
Valient
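
As an aid to reading the trace, here is a minimal sketch in Python that unpacks the two 20-byte buffers returned by the read(3, ...) calls into 32-bit words. It assumes a little-endian host and five 32-bit fields, neither of which the trace itself states, and the helper name decode_words is made up for illustration. It deliberately does not assign meanings to the fields; it only makes it easier to see that the first word (5, then 6) matches the first byte of the write that follows each read.

```python
import struct

# Buffers copied from the two "read resumed" lines in the strace output.
READ_A = b"\5\0\0\0\21\0\0\0\0\0\0\0c\0\0\0\24\'\0\0"
READ_B = b"\6\0\0\0\21\0\0\0\0\0\0\0\334\t\0\0\334\t\0\0"


def decode_words(buf):
    """Unpack a 20-byte buffer as five unsigned 32-bit words.

    Assumption: little-endian byte order and 32-bit fields. This is a
    guess about the layout, not something taken from the FUSE sources.
    """
    if len(buf) != 20:
        raise ValueError("expected 20 bytes, got %d" % len(buf))
    return struct.unpack("<5I", buf)


if __name__ == "__main__":
    for name, buf in (("first read", READ_A), ("second read", READ_B)):
        print(name, decode_words(buf))
```

Run directly, this prints the five words for each buffer, so the two requests can be compared side by side without reading the octal escapes by hand.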