Thread: [Encfs-users] close() racing with rename() [was Re: bug report for latest EncFs stuff]
Brought to you by:
vgough
From: Valient G. <vg...@po...> - 2005-03-24 18:58:45
|
Thanks, I think that is the most complete bug report I've ever seen! Yeah, a quick run with encfs in verbose mode indicates (to me) that it is a race condition. Mmmm... this isn't going to be easy to fix though. I'm going to forward this along with the test program to the Fuse mailing list as well, because I think this might affect some (if not all) other Fuse based filesystems. Miklos: I've tested that it also doesn't work properly on sshfs (version 1.1 + fuse 3/16 CVS build - I haven't tried with the last 3 CVS changes). I'm also going to describe the problem in gory detail, although not for your benefit - you probably already know what's going on just from the subject line :-). In encfs, rename() and close() are racing for the same file. Encfs refuses to rename over an existing node, because that would cause it to be in an invalid state. The reason is that libfuse uses the filename as the primary key for most operations, which doesn't map exactly to unix filesystem semantics. It is legal for a unix filesystem to have multiple files by the same name (open a file, rename a file on top of it, or delete it, repeat and still have open references to the other versions of that file). Libfuse deals with this by transparently renaming hidden files itself. So if I opened "a", then deleted it while it was still open, libfuse would rename this to something like ".fuse_hiddenxxxxxxxxxxxxx", so that it could still refer to the file with a unique name. Thus encfs expects to never be asked to rename on top of an open file, since libfuse should have taken care of this possibility. So, this is all well and good, except that this case doesn't trigger this hiding. Libfuse must not be seeing any race. In libfuse's do_release, a lock is held only around the modification of libfuse's internal state, so the filesystem implementation (encfs or sshfs, etc) inherits a race condition here. I'm sure I could whip up a band-aid in encfs - since encfs knows that libfuse should never be asking for a rename over an open file, but my hope is that this could be handled in libfuse.. I started mucking around with locking in libfuse, but the problem is that do_release and do_rename are not locking on a common object (two different files involved). So I'm not sure how to fix this in libfuse without either adding a global lock for these functions (not pretty or nice on performance), or going to an inode based interface (doesn't help existing filesystems).. Any ideas? If not, I may just band-aid up encfs for the mean time.. regards, Valient dw...@dw... wrote: >hi, >i've been using your latest EncFS stuff with good success. however, >i ran across a bug using an application that relies on rename to >do some important things, and was able to reproduce the "bug" in a small >c++ program (find source attached.) i say "bug" because the program definitely >works >differently in an encfs filesystem compared to an ext2 filesystem. and the >rename man page seems to imply that this program should work without >failures (tho the man page is a bit cryptic, i admit.) >but i suppose it might just >be a limitation of rename in EncFS (browsing the source for encfs didn't shed >any >light on it... for me anyway.) >i'll let you decide if it is a real "bug". :-) > >thanks, >dwight melcher >dw...@dw... > >----cut-------- > > >// dwight melcher, mar 24, 2005 >// this little program demonstrates that rename seems to work >// significantly differently in encfs than ext2. >// >// WARNING: running this program will destroy two files in the >// cwd called "a" and "b". >// >// compile with >// gcc t.cc -lstdc++ >// run as a normal "non-root" user. >// >// (see below for all version info, everything is current >// as of the date above, i believe.) >// >// there are no errors reported from ext2, the errors logged in >// encfs are: >// >//Mar 24 08:54:14 armbiter encfs: Error, attempting to rename >/home/dwight/.crypt-raw/4wobKZTkFhFBC1 over existing open file >/home/dwight/.crypt-raw/93XTB3NFa1dug, ! >// >// the error is the same every time this program is run. >// the other app that fails with this error (eg, p4d), seem to fail >// randomly while doing "submit" processing, >// making me think it is some kinda race condition. >// >// the machine this fails on is: >// >// Linux armbiter.dwyco.com 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:25:44 EST >2005 x86_64 x86_64 x86_64 GNU/Linux >// >// FUSE version from dmesg: >// fuse init (API version 5.1) >// fuse distribution version: 2.2.1 >// >// gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-49) >// >// EncFS version. >// Version 5 configuration; created by EncFS 1.2.0 (revision 20040813) >//Filesystem cipher: "ssl/blowfish", version 2:1:1 >//Filename encoding: "nameio/block", version 3:0:1 >//Key Size: 160 bits >//Block Size: 512 bytes >//Each file contains 8 byte header with unique IV data. >//Filenames encoded using IV chaining mode. > > >#include <stdlib.h> >#include <stdio.h> >#include <unistd.h> >#include <sys/stat.h> >#include <fcntl.h> > >int >main(int, char **) >{ > // get the two files set up > int fd; > if((fd = creat("a", 0600)) == -1) > { > perror("create a"); > exit(1); > } > close(fd); > if((fd = creat("b", 0600)) == -1) > { > perror("create b"); > exit(1); > } > close(fd); > while(1) > { > if(rename("a", "b") == -1) > { > perror("rename 1");; > exit(1); > } > int fd; > if((fd = creat("a", 0600)) == -1) > { > perror("create just renamed a"); > exit(1); > } > close(fd); > if(rename("b", "a") == -1) > { > perror("rename 2");; > exit(1); > } > } > return 0; >} > > > > > |
From: Miklos S. <mi...@sz...> - 2005-03-24 21:39:46
|
> Miklos: I've tested that it also doesn't work properly on sshfs > (version 1.1 + fuse 3/16 CVS build - I haven't tried with the last 3 CVS > changes). I'm also going to describe the problem in gory detail, > although not for your benefit - you probably already know what's going > on just from the subject line :-). No. This sounds quite strange. I tried the test program on sshfs and it doesn't work but only because sftp server does the rename in some tricky way, which doesn't allow renaming to an existing file. > In encfs, rename() and close() are racing for the same file. Encfs > refuses to rename over an existing node, because that would cause it to > be in an invalid state. The reason is that libfuse uses the filename as > the primary key for most operations, which doesn't map exactly to unix > filesystem semantics. It is legal for a unix filesystem to have > multiple files by the same name (open a file, rename a file on top of > it, or delete it, repeat and still have open references to the other > versions of that file). > > Libfuse deals with this by transparently renaming hidden files itself. > So if I opened "a", then deleted it while it was still open, libfuse > would rename this to something like ".fuse_hiddenxxxxxxxxxxxxx", so that > it could still refer to the file with a unique name. Thus encfs expects > to never be asked to rename on top of an open file, since libfuse should > have taken care of this possibility. > > So, this is all well and good, except that this case doesn't trigger > this hiding. Libfuse must not be seeing any race. In libfuse's > do_release, a lock is held only around the modification of libfuse's > internal state, so the filesystem implementation (encfs or sshfs, etc) > inherits a race condition here. I'm sure I could whip up a band-aid > in encfs - since encfs knows that libfuse should never be asking for a > rename over an open file, but my hope is that this could be handled in > libfuse.. OK, so what you're seeing is that the fuse library sees RELEASE first and RENAME second, but the filesystem sees RENAME first and RELEASE second? This is indeed nasty. > I started mucking around with locking in libfuse, but the problem is > that do_release and do_rename are not locking on a common object (two > different files involved). So I'm not sure how to fix this in libfuse > without either adding a global lock for these functions (not pretty or > nice on performance), or going to an inode based interface (doesn't help > existing filesystems).. > > Any ideas? If not, I may just band-aid up encfs for the mean time.. How about this for a bandaid? This should solve this particular case, because a single thread is doing the release and the rename, and by waiting for the release to finish, they should not mix up. This is not a proper fix though. I'll try to figure out how to solve this properly. Thanks, Miklos Index: kernel/dev.c =================================================================== RCS file: /cvsroot/fuse/fuse/kernel/dev.c,v retrieving revision 1.68 diff -u -r1.68 dev.c --- kernel/dev.c 21 Mar 2005 11:47:04 -0000 1.68 +++ kernel/dev.c 24 Mar 2005 21:36:05 -0000 @@ -261,7 +261,7 @@ intr = request_wait_answer_nonint(req); spin_lock(&fuse_lock); } - if (!intr) + if (!intr || req->background) return; if (!interruptible || req->sent) @@ -367,6 +367,17 @@ spin_lock(&fuse_lock); background_request(fc, req); spin_unlock(&fuse_lock); + request_send_wait(fc, req, 0); + fuse_put_request(fc, req); +} + +static void request_send_background_nowait(struct fuse_conn *fc, + struct fuse_req *req) +{ + req->isreply = 1; + spin_lock(&fuse_lock); + background_request(fc, req); + spin_unlock(&fuse_lock); request_send_nowait(fc, req); } @@ -385,7 +396,7 @@ req->out.numargs = 1; req->out.args[0].size = sizeof(*arg); req->out.args[0].value = arg; - request_send_background(fc, req); + request_send_background_nowait(fc, req); } static inline int lock_request(struct fuse_req *req) |