From: Miklos S. <mi...@sz...> - 2009-05-25 15:33:27
On Thu, 21 May 2009, John Haxby wrote:
> Hello all,
>
> I couldn't find anything about this problem but please accept my
> apologies if it's already known about.
>
> We have some people who are using fuse-2.7.4 on RHEL4 for which, of
> course, they need to compile and install the kernel module. (Actually,
> we also have some people using RHEL5 and they still build fuse.ko
> because it's not configured in the RHEL5 kernel.)
>
> There's an unpleasant race condition that crops up when creating a
> file. If you interrupt a process that's creating files (e.g. tar) at just
> the right time then you'll get the fuse filesystem process dying with
>
> fuse internal error: node 0 not found
>
> The circumstances under which this can happen are relatively unusual, and
> the window in which you can get the failure is quite narrow, which is
> probably why this hasn't happened before.
>
> The main requirement for getting the error is that you have a non-zero
> negative_timeout. If that's the case then an interrupt while the fuse
> filesystem process is in fuse_lib_lookup() when the file it's looking
> for doesn't exist will cause e.ino and err to be set to zero:
>
>     err = lookup_path(f, parent, name, path, &e, NULL);
>     if (err == -ENOENT && f->conf.negative_timeout != 0.0) {
>         e.ino = 0;
>         e.entry_timeout = f->conf.negative_timeout;
>         err = 0;
>     }
>
> When fuse_lib_lookup() calls reply_entry() to finish off the LOOKUP
> request, it gets an ENOENT from the kernel because the operation has
> been aborted. reply_entry() then calls forget_node() with that zero
> nodeid, and the process aborts with the error above.
>
> This error is relatively easy to reproduce if you are untarring an archive
> into a fuse filesystem: just keep hitting ^C to kill the tar, and sooner
> or later you'll get the error and your filesystem process will abort().
>
> Fortunately, there's an easy fix:
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.24.y.git;a=commitdiff;h=a131de0a482ac95e6469f56981c7b063593fdc5d
> (or http://tinyurl.com/opmuh2)
>
> This stops interrupts aborting outstanding operations and so this
> particular race condition never happens.
>
> It would be nice if this patch could be rolled into the 2.7 tree so that
> others don't trip over the same problem.

Sure, do you have a tested patch against the 2.7 tree? I'd happily
apply it and release a 2.7.5 with the fix.

Thanks,
Miklos
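
The failure path described in the quoted message can be modelled in a few lines of C. This is only a sketch under stated assumptions: `entry_sketch`, `finish_lookup_sketch()` and `should_forget_sketch()` are hypothetical stand-ins, not libfuse definitions. The first function mirrors the quoted snippet. The second shows one possible user-space guard (skipping the forget step for a negative entry); that guard is an assumption for illustration and is not the kernel patch linked above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the entry reply; not the libfuse struct. */
struct entry_sketch {
    uint64_t ino;          /* 0 marks a negative (non-existent) entry */
    double entry_timeout;  /* how long the kernel may cache the entry */
};

/*
 * Mirrors the quoted snippet: when the lookup fails with -ENOENT and
 * negative_timeout is non-zero, the error is turned into a successful
 * reply carrying ino == 0.
 */
static int finish_lookup_sketch(int lookup_err, double negative_timeout,
                                struct entry_sketch *e)
{
    if (lookup_err == -ENOENT && negative_timeout != 0.0) {
        e->ino = 0;
        e->entry_timeout = negative_timeout;
        return 0;
    }
    return lookup_err;
}

/*
 * Assumed guard: reply_result is whatever sending the reply returned.
 * -ENOENT there means the request was already aborted, so the lookup
 * reference has to be dropped again, but only if a real node was
 * referenced. A negative entry (ino == 0) has no node to forget.
 */
static bool should_forget_sketch(int reply_result,
                                 const struct entry_sketch *e)
{
    return reply_result == -ENOENT && e->ino != 0;
}
```

With this model, an interrupted lookup of a missing file yields `ino == 0` together with a failed reply, which is exactly the combination that reaches forget_node() in the report. The guard returns false for that case and true only when a real node was looked up.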