From: Kevin D. <the...@gm...> - 2007-09-11 03:49:15
Jakub Bogusz <jakub.bogusz@...> writes:
>
> On Tue, Aug 28, 2007 at 05:13:41PM +0200, Miklos Szeredi wrote:
> > > We (Gemius SA) are working on our distributed filesystem based on fuse.
> > > During tests it appeared that stat()ing (even using ls -l) a file during
> > > read/write operations rarely could cause read() to fail (returning 0
> > > before actual EOF).
> > > It happens on other fuse-based filesystems too (e.g. fusexmp_fh included
> > > in fuse sources, or sshfs), which can be tested with test-stat program
> > > (sources attached; while read returning positive length less than
> > > requested is OK, it in most failure cases returns 0, wrongly indicating
> > > EOF).
> > > It's caused by race in fuse kernel module, when some write operation
> > > expanding file occurs in the time window between getattr/lookup operation
> > > done in userspace and actual i_size update afterwards. Then i_size
> > > update by write operation is overridden by older file size, causing
> > > further read operation to "think" that file is shorter than really is.
> >
> > Hmm, very interesting. I can reproduce it with the test program.
> >
> > This is the solution that I have in mind:
> >
> >  - let's introduce a per-filesystem version counter
> >
> >  - the counter is incremented before each write (and anything else,
> >    that modifies the attributes), and the new version is stored in
> >    fuse_inode
>
> inode version stored before or after actual operation?
>
> >  - getattr/lookup note the value of the version counter before sending
> >    the request
>
> You mean per-filesystem counter here?
>
> >  - when getattr/lookup return, they check if the inode's version is
> >    greater than the noted version. If it is, they don't store the
> >    size, and mark the inode attributes invalid
> >
> > Do you see anything wrong with such a scheme?
>
> It seems to require (at least) per-inode locking too, to update version
> atomically with write/setattr (to avoid reading it by getattr/lookup in
> a wrong moment; per-inode locking with lookup is problematic, as I noted
> earlier)...
>
> And some per-filesystem locking as well, to protect fs version counter.
>
> If you see a version-based solution without expensive locking, please
> give more details (order of operations/incrementation, used locking)
> - now I see some possible race in all cases without per-inode or
> per-filesystem locking I think of.
>

This looks similar to my problem, and I am wondering whether it is the same one.

Under uClibc instead of glibc, a deadlock occurs 100% of the time on all fuse-mounted filesystems. The mount works fine, but any read operation on the filesystem causes the application performing the read to deadlock. The only way to get control of that terminal back is to use another terminal and run killall -9 on something such as sshfs.

An example:

sshfs user@localhost:/home/user/Desktop/ /home/user/test/

The mount is successful.

Running df anywhere deadlocks:
# df

Running ls on the mount point deadlocks:
# ls /home/user/test/

Tab completion deadlocks when I press TAB at this point:
# /home/user/

Any form of reading the mounted directory, its contents, or the mount itself results in a deadlock.

In the past, I was able to work around the problem by removing the following lines of code from lib/fuse_lowlevel.c:

__asm__(".symver fuse_reply_statfs_compat,fuse_reply_statfs@FUSE_2.4");
__asm__(".symver fuse_reply_open_compat,fuse_reply_open@FUSE_2.4");
__asm__(".symver fuse_lowlevel_new_compat,fuse_lowlevel_new@FUSE_2.4");

This is no longer the case; removing them no longer prevents the problem.

Any ideas/help? Is this the same problem?
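
P.S. In case anyone wants to try reproducing the hangs described above without losing a terminal each time, here is a rough sketch of a probe. It assumes a `timeout` utility with GNU coreutils syntax is installed (BusyBox versions may take different arguments), and the helper names `run_with_deadline` and `probe_mount` are made up for this example. It also assumes the stuck reader can still be killed by a signal; if it cannot, the probe itself may hang.

```shell
#!/bin/sh
# Hypothetical helper: run a command, but give up after a deadline.
# Assumes `timeout` with GNU coreutils syntax: timeout -s SIGNAL SECONDS CMD...
run_with_deadline() {
    secs=$1
    shift
    # Send SIGKILL rather than the default SIGTERM (assumption: a reader
    # stuck on the mount may not respond to SIGTERM).
    timeout -s KILL "$secs" "$@" >/dev/null 2>&1
}

# Hypothetical probe: list a directory and report the outcome instead of
# leaving the calling shell stuck. Returns non-zero on failure or timeout.
probe_mount() {
    if run_with_deadline 5 ls "$1"; then
        echo "ok: $1"
    else
        echo "failed or timed out: $1"
        return 1
    fi
}
```

For example, `probe_mount /home/user/test/` after the sshfs mount should report either result within about five seconds, and `run_with_deadline 5 df` can be used the same way for the df case.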