Re: [fuse-devel] issues with 2.4 kernel and fuse >= 2.1


Valient Gough wrote:
>
>
> Miklos Szeredi wrote:
>>
>> I've fixed this particular error.  But there's another problem with
>> kernel versions <= 2.4.20, that's not as simple to deal with.  Since
>> these can now be considered "ancient", and fewer and fewer people will
>> use them, I'm reluctant to put too much effort into porting FUSE for
>> these.
>>  
> Thanks, I've gone back to 1.4, which works fine.  But I thought I'd 
> send a last update to close out this report in case it was helpful:
>
Oops... I take it back; that won't be the last message after all :-)

I found that 1.4 locks up on me too on Linux 2.4.x. It doesn't happen 
every time, but I can reproduce it by running "while true; do df; done" 
in a shell; within a few seconds it locks up.
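A scripted variant of that loop can report which call hangs instead of just leaving the shell wedged. The sketch below is hypothetical: the helper names `statfs_with_timeout` and `stress_statfs` are illustrative and not part of FUSE or encfs. It assumes the mount point is passed on the command line, and it uses `os.statvfs()`, which glibc on Linux implements on top of the statfs system call. Each call runs in a worker thread so the main thread can give up after a timeout and report the hang.

```python
import os
import sys
import threading


def statfs_with_timeout(path, timeout_sec):
    """Run os.statvfs(path) in a worker thread.

    Returns ("ok", result) if the call finished within timeout_sec,
    ("error", exc) if it raised OSError, or ("hung", None) if it is
    still blocked when the timeout expires.
    """
    outcome = {}

    def worker():
        try:
            outcome["result"] = os.statvfs(path)
        except OSError as exc:
            outcome["error"] = exc

    # daemon=True so a thread stuck in the kernel is not joined again
    # at interpreter shutdown once the main thread has given up on it.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_sec)
    if t.is_alive():
        return ("hung", None)
    if "error" in outcome:
        return ("error", outcome["error"])
    return ("ok", outcome["result"])


def stress_statfs(path, iterations=1000, timeout_sec=5.0):
    """Bounded analogue of `while true; do df; done`.

    Returns the 1-based index of the first call that did not complete
    within timeout_sec, or None if every call returned (successfully
    or with an error).
    """
    for i in range(1, iterations + 1):
        status, _ = statfs_with_timeout(path, timeout_sec)
        if status == "hung":
            return i
    return None


if __name__ == "__main__":
    # Usage: python3 stress_statfs.py /path/to/mountpoint [iterations]
    mount = sys.argv[1] if len(sys.argv) > 1 else "."
    n = int(sys.argv[2]) if len(sys.argv) > 2 else 1000
    hung_at = stress_statfs(mount, n)
    if hung_at is None:
        print(f"all {n} statfs calls on {mount} returned")
    else:
        print(f"statfs call #{hung_at} on {mount} still blocked after timeout")
```

If the kernel holds the blocked call in an uninterruptible sleep, the script may report the hang but still not exit until the call is released (for example by running df from another terminal, as described below).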

What is interesting is that if I run df in another window, it 
unlocks the first window's df process, but then the second df process is 
locked. I can go back and forth between two windows typing df, and each 
one unlocks the other but locks itself. Make sense?

Here is what I've found from strace:

I mounted a filesystem with "sudo strace -f encfs -f ~/.crypt ~/crypt" 
and started entering "df" commands in another window. The first two 
returned normal results, but the third hung. In strace, it looks like 
the problem starts at a pair of calls reported as <unfinished ...> that 
only resume later:

[pid  6617] statfs("/home/vgough/.crypt",  <unfinished ...>
[pid  6588] read(3,  <unfinished ...>
[pid  6617] <... statfs resumed> {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=14138407, f_bfree=6670058, f_files=7192576, f_ffree=6982471, f_namelen=255}) = 0
[pid  6617] write(3, "\4\0\0\0\0\0\0\0\0\20\0\0\'\274\327\0m\321Z\0\0\300m\0"..., 32 <unfinished ...>
[pid  6616] <... poll resumed> [{fd=4, events=POLLIN}], 1, 2000) = 0
[pid  6616] getppid()                   = 6588
[pid  6616] poll( <unfinished ...>
[pid  6617] <... write resumed> )       = 32
[pid  6588] <... read resumed> "\5\0\0\0\21\0\0\0\0\0\0\0c\0\0\0\24\'\0\0", 8192) = 20
*[pid  6617] read(3,  <unfinished ...>*
[pid  6588] statfs("/home/vgough/.crypt", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=14138407, f_bfree=6670058, f_files=7192576, f_ffree=6982471, f_namelen=255}) = 0
*[pid  6588] write(3, "\5\0\0\0\0\0\0\0\0\20\0\0\'\274\327\0m\321Z\0\0\300m\0"..., 32 <unfinished ...>*
[pid  6616] <... poll resumed> [{fd=4, events=POLLIN}], 1, 2000) = 0
[pid  6616] getppid()                   = 6588
[pid  6616] poll([{fd=4, events=POLLIN}], 1, 2000) = 0
[pid  6616] getppid()                   = 6588

Then it sits in poll, with the df command hung. I switched to another 
terminal and ran 'df' there, which unblocked the first df but blocked 
the second:

[{fd=4, events=POLLIN}], 1, 2000) = 0
[pid  6616] getppid()                   = 6588
[pid  6616] poll( <unfinished ...>
*[pid  6617] <... read resumed> "\6\0\0\0\21\0\0\0\0\0\0\0\334\t\0\0\334\t\0\0",8192) = 20*
*[pid  6588] <... write resumed> )       = 32*
[pid  6617] statfs("/home/vgough/.crypt",  <unfinished ...>
[pid  6588] read(3,  <unfinished ...>
[pid  6617] <... statfs resumed> {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=14138407, f_bfree=6670058, f_files=7192576, f_ffree=6982471, f_namelen=255}) = 0
[pid  6617] write(3, "\6\0\0\0\0\0\0\0\0\20\0\0\'\274\327\0m\321Z\0\0\300m\0"..., 32 <unfinished ...>
[pid  6616] <... poll resumed> [{fd=4, events=POLLIN}], 1, 2000) = 0
[pid  6616] getppid()                   = 6588
[pid  6616] poll([{fd=4, events=POLLIN}], 1, 2000) = 0
[pid  6616] getppid()                   = 6588
[pid  6616] poll([{fd=4, events=POLLIN}], 1, 2000) = 0
[pid  6616] getppid()                   = 6588
[pid  6616] poll([{fd=4, events=POLLIN}], 1, 2000) = 0
[pid  6616] getppid()                   = 6588
[pid  6616] poll(

I've marked (with asterisks) the calls that straddle the lock point, 
which may be the cause. It looks like the "[pid  6588] write" and the 
earlier "[pid  6617] read" didn't resume until after the next df command. 
That seems to match the GDB trace I sent before, which showed 'write' 
as the current location of the locked thread.

Any ideas or suggestions for further debugging? I realize that 2.4.x 
isn't a priority, but perhaps finding the problem here will help 
avoid a similar problem on 2.6.x kernels?

thanks,
Valient