From: Anatol P. <ana...@gm...> - 2012-12-13 00:14:37
|
Hi, Miklos.

On our servers we have many fuse filesystems that are run by different users. We also have monitoring software that tracks fuse daemon processes and kills one if it misbehaves. Sometimes (e.g. in case of a deadlock) the only way to kill the daemon is to send SIGKILL to it.

Unfortunately SIGKILL produces another issue: the mountpoint is left in an inconsistent state. libfuse calls umount() in its uninitialization logic, and SIGKILL does not give it any chance to run umount(). But having a clean unmount even in case of SIGKILL would be really nice to have in fuse.

Miklos, is there any way to cleanly umount the filesystem in case of SIGKILL? Maybe it can be done in the kernel, in fuse_dev_release()? This function corresponds to close() of /dev/fuse, and the kernel always closes descriptors when the process (i.e. the fuse daemon) dies.
|
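The reason a daemon cannot clean up on SIGKILL can be checked from user space: sigaction() refuses to install a handler for SIGKILL and fails with EINVAL, while SIGTERM can be hooked normally. The following is a minimal sketch, not libfuse code; `cleanup_handler` and `try_install_handler` are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Hypothetical cleanup hook; a real daemon would unmount here. */
static void cleanup_handler(int signo)
{
    (void)signo;
}

/*
 * Try to install cleanup_handler for the given signal.
 * Returns 0 on success, or the errno value reported by sigaction().
 */
int try_install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = cleanup_handler;
    sigemptyset(&sa.sa_mask);

    if (sigaction(signo, &sa, NULL) == -1)
        return errno;
    return 0;
}
```

Calling `try_install_handler(SIGTERM)` succeeds, whereas `try_install_handler(SIGKILL)` reports EINVAL, so any unmount-on-exit logic placed in a signal hook is never reached when the daemon is killed with SIGKILL.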
From: Nikolaus R. <Nik...@ra...> - 2012-12-13 07:52:51
|
Anatol Pomozov <ana...@pu...> writes:
> Unfortunately SIGKILL produces another issue - the mountpoint is left in
> inconsistent state. libfuse calls umount() in its uninitialization logic
> and SIGKILL does not give any chance to run umount(). But having clean
> unmount even in case of SIGKILL would be really nice to have in fuse.

This is generally not a good idea. Imagine if you run a tool like rsync. If the source mountpoint suddenly becomes empty, rsync would end up deleting everything in the destination. If the mountpoint returns an I/O error instead (as it currently does), rsync can detect the problem and will instead refuse to do anything.

Best,
-Nikolaus

-- 
»Time flies like an arrow, fruit flies like a Banana.«
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
|
From: Anatol P. <ana...@gm...> - 2013-03-20 19:48:41
|
Hi

On Wed, Dec 12, 2012 at 11:28 PM, Nikolaus Rath <Nik...@ra...> wrote:
> Anatol Pomozov <ana...@pu...> writes:
>> Unfortunately SIGKILL produces another issue - the mountpoint is left in
>> inconsistent state. libfuse calls umount() in its uninitialization logic
>> and SIGKILL does not give any chance to run umount(). But having clean
>> unmount even in case of SIGKILL would be really nice to have in fuse.
>
> This is generally not a good idea. Imagine if you run a tool like rsync.
> If the source mountpoint suddenly becomes empty, rsync would end up
> deleting everything in the destination. If the mountpoint returns an I/O
> error instead (as it currently does), rsync can detect the problem and
> will instead refuse to do anything.

It turns out that our users have the same concerns about auto-umount on SIGKILL. Our build system (the user) distinguishes "normal" fs errors (like ENOENT) from abnormal ones (ENOTCONN). In case of ENOTCONN the build system knows that the filesystem is broken and that there is nothing it can do, so the build tool aborts the current compilations and shuts itself down. Only after all users have exited can the mountpoint be cleaned up.

If we did auto-umount, the build system would not know that the fs is broken and would keep compiling. This might produce incorrect output.
|
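The distinction the build system makes can be sketched as a small errno classifier. This is only an illustration of the idea described above, not the actual build tool; `enum fs_health`, `classify_fs_error` and `probe_path` are hypothetical names, and treating every error other than ENOTCONN as "normal" is an assumption.

```c
#include <errno.h>
#include <sys/stat.h>

enum fs_health {
    FS_OK,          /* the call succeeded */
    FS_ERR_NORMAL,  /* ordinary error such as ENOENT: keep going */
    FS_ERR_BROKEN   /* ENOTCONN: the fuse daemon is gone, abort */
};

/* Map an errno value to the action a client should take. */
enum fs_health classify_fs_error(int err)
{
    switch (err) {
    case 0:
        return FS_OK;
    case ENOTCONN:
        return FS_ERR_BROKEN;
    default:
        return FS_ERR_NORMAL;
    }
}

/* Probe a path with stat() and classify the result. */
enum fs_health probe_path(const char *path)
{
    struct stat st;

    if (stat(path, &st) == 0)
        return FS_OK;
    return classify_fs_error(errno);
}
```

A client that sees FS_ERR_BROKEN knows the mountpoint is dead and can stop, which is exactly the signal that would be lost if the kernel silently unmounted the filesystem.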
From: Maxim V. P. <mpa...@pa...> - 2012-12-13 08:29:01
|
Anatol,

12/13/2012 11:28 AM, Nikolaus Rath wrote:
> Anatol Pomozov <ana...@pu...> writes:
>> Unfortunately SIGKILL produces another issue - the mountpoint is left in
>> inconsistent state. libfuse calls umount() in its uninitialization logic
>> and SIGKILL does not give any chance to run umount(). But having clean
>> unmount even in case of SIGKILL would be really nice to have in fuse.
> This is generally not a good idea. Imagine if you run a tool like rsync.
> If the source mountpoint suddenly becomes empty, rsync would end up
> deleting everything in the destination. If the mountpoint returns an I/O
> error instead (as it currently does), rsync can detect the problem and
> will instead refuse to do anything.

FUSE reconnect can be implemented relatively easily. The idea is to keep the kernel fuse queueing requests while user-space is dead; when the daemon restarts, it is connected to the existing kernel fuse_conn. With this feature implemented, you could SIGKILL a deadlocked fuse daemon, then start it again and umount the filesystem cleanly. Would this scheme be helpful for you?

Thanks,
Maxim
|
From: Anatol P. <ana...@gm...> - 2013-02-25 17:00:35
|
Hi

On Thu, Dec 13, 2012 at 12:18 AM, Maxim V. Patlasov <mpa...@pa...> wrote:
> Anatol,
>
> FUSE reconnect can be implemented relatively easy. The idea is to keep
> kernel fuse queueing requests while user-space is dead and when it restarts,
> it's being connected to existing kernel fuse_conn. Having this feature
> implemented, you could SIGKILL deadlocked fuse daemon, then start it again
> and umount the filesystem cleanly. Will this scheme be helpful for you?

One question about the reconnection: what are you going to do with open file descriptors? When the daemon crashes they become invalid and should be closed, and thus you break filesystem clients anyway.

Otherwise reconnection sounds interesting. In fact it can be useful for a regular clean shutdown if we want to do a hot filesystem upgrade.

But the reconnection makes sense only if the system has some kind of "process supervisor": something that tracks process status and restarts the process on crash, e.g. a systemd service. Otherwise we still have the same issue: on abnormal daemon exit the user has an inconsistent mountpoint and has to do something about it. The only difference is that the crashed filesystem returns an ENOTCONN error now, and with your proposal it will hang (in uninterruptible sleep!).

In our server setup we do have a "process supervisor". But in our case a crash does not always lead to a restart, e.g. the process may be rescheduled on a different machine. So we still need some kind of cleanup after the process has died, but we want to keep the supervisor code fuse-unaware. Kernel auto-cleanup on daemon death seems the best option for us.

This can also replace the recently added "auto_unmount" feature. That option enables a user-space cleanup mechanism, but having kernel cleanup on daemon shutdown is more reliable.

Anyway, I have working code for kernel auto-cleanup and I'll post it here for comments.
|
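What a fuse-unaware supervisor can see is limited to the wait status of the child: it learns that the daemon was killed by a signal, but nothing about mountpoints. The sketch below illustrates that with a stand-in child process; it is not the supervisor discussed above, and `run_and_kill_child` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that stands in for a daemon, send it `signo`, and reap it.
 * Returns the number of the signal that terminated the child, 0 if the
 * child exited normally, or -1 on error.
 */
int run_and_kill_child(int signo)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: idle until a signal arrives. */
        for (;;)
            pause();
    }

    if (kill(pid, signo) == -1) {
        /* Could not deliver the signal: make sure the child is reaped. */
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;

    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}
```

The supervisor gets the same kind of answer whether the child was a fuse daemon or anything else, which is why cleanup of the mountpoint has to come from somewhere other than the supervisor if it is to stay fuse-unaware.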
From: Anatol P. <ana...@gm...> - 2013-02-25 18:12:30
|
To clean up its mountpoint a fuse application registers a signal hook that calls the 'fusermount' tool. But in case of abnormal exit (SIGSEGV, SIGKILL) the application has no chance to call fusermount, and the mountpoint is left in an inconsistent state (it returns an ENOTCONN error). There is a recently added option, "auto_unmount", but it relies on a user-space daemon and is not very reliable (it can also be killed with SIGKILL).

Instead we implement unmount on '/dev/fuse' file close. With it there is no need to use 'auto_unmount' or to call 'fusermount' on shutdown, but we keep them for compatibility with old kernels.

The current implementation unmounts the original mountpoint and all bind mounts. So it differs from the original implementation, which called 'fusermount' only on the original mount.

Note that both fusermount and the kernel-style mount cleanup unmount the filesystem only in the current process namespace. If the daemon changed its filesystem namespace then those mountpoints are left untouched.

Tested: ran a fuse filesystem and tried to kill it in different ways: SIGTERM, SIGKILL, "umount dir". Checked that it also works in case of bind mounts.

Google-Bug-Id: 7718269
Change-Id: I0838b40e1e3c9328c76674d5043b7a700b9053b7
Signed-off-by: Anatol Pomozov <ana...@gm...>
---
 fs/fuse/dev.c         | 34 ++++++++++++++++++++++++++++++++++
 fs/namespace.c        | 10 ++++++++--
 include/linux/mount.h |  1 +
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index e9bdec0..4f592b7 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -19,6 +19,10 @@
 #include <linux/pipe_fs_i.h>
 #include <linux/swap.h>
 #include <linux/splice.h>
+#include <linux/nsproxy.h>
+#include <linux/mount.h>
+
+#include "../mount.h"
 
 MODULE_ALIAS_MISCDEV(FUSE_MINOR);
 MODULE_ALIAS("devname:fuse");
@@ -2082,10 +2086,34 @@ void fuse_abort_conn(struct fuse_conn *fc)
 }
 EXPORT_SYMBOL_GPL(fuse_abort_conn);
 
+static void fuse_umount(struct super_block *sb)
+{
+	struct nsproxy *nsp = task_nsproxy(current);
+	struct mnt_namespace *ns = nsp->mnt_ns;
+	struct mount *mnt, *tmp;
+
+	list_for_each_entry_safe(mnt, tmp, &ns->list, mnt_list) {
+		struct vfsmount *vfsmnt = &mnt->mnt;
+		if (vfsmnt->mnt_sb == sb) {
+			/* in case of mount binds there can be more than one
+			 * mountpoint that corresponds to sb
+			 */
+			mntget(vfsmnt);
+			do_umount(vfsmnt, 0);
+			mntput(vfsmnt);
+
+			/* TODO: better debug message? */
+			pr_debug("fuse: mountpoint (%d,%d) automatically unmounted\n",
+				 MAJOR(sb->s_dev), MINOR(sb->s_dev));
+		}
+	}
+}
+
 int fuse_dev_release(struct inode *inode, struct file *file)
 {
 	struct fuse_conn *fc = fuse_get_conn(file);
 	if (fc) {
+		struct super_block *sb = fc->sb;
 		spin_lock(&fc->lock);
 		fc->connected = 0;
 		fc->blocked = 0;
@@ -2094,6 +2122,12 @@ int fuse_dev_release(struct inode *inode, struct file *file)
 		wake_up_all(&fc->blocked_waitq);
 		spin_unlock(&fc->lock);
 		fuse_conn_put(fc);
+
+		/* super block might already be NULL if we killed this fs by
+		 * "umount"
+		 */
+		if (sb)
+			fuse_umount(sb);
 	}
 
 	return 0;
diff --git a/fs/namespace.c b/fs/namespace.c
index 55605c5..d7496d8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1147,7 +1147,7 @@ void umount_tree(struct mount *mnt, int propagate, struct list_head *kill)
 
 static void shrink_submounts(struct mount *mnt, struct list_head *umounts);
 
-static int do_umount(struct mount *mnt, int flags)
+static int __do_umount(struct mount *mnt, int flags)
 {
 	struct super_block *sb = mnt->mnt.mnt_sb;
 	int retval;
@@ -1237,6 +1237,12 @@ static int do_umount(struct mount *mnt, int flags)
 	return retval;
 }
 
+int do_umount(struct vfsmount *mnt, int flags)
+{
+	return __do_umount(real_mount(mnt), flags);
+}
+EXPORT_SYMBOL(do_umount);
+
 /*
  * Now umount can handle mount points as well as block devices.
  * This is important for filesystems which use unnamed block devices.
@@ -1272,7 +1278,7 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
 	if (!ns_capable(mnt->mnt_ns->user_ns, CAP_SYS_ADMIN))
 		goto dput_and_out;
 
-	retval = do_umount(mnt, flags);
+	retval = __do_umount(mnt, flags);
 dput_and_out:
 	/* we mustn't call path_put() as that would clear mnt_expiry_mark */
 	dput(path.dentry);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index d7029f4..333c1e8 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -65,6 +65,7 @@ extern struct vfsmount *mntget(struct vfsmount *mnt);
 extern void mnt_pin(struct vfsmount *mnt);
 extern void mnt_unpin(struct vfsmount *mnt);
 extern int __mnt_is_readonly(struct vfsmount *mnt);
+extern int do_umount(struct vfsmount *mnt, int flags);
 
 struct file_system_type;
 extern struct vfsmount *vfs_kern_mount(struct file_system_type *type,
-- 
1.8.1.3
|
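The patch matches mounts by superblock inside the kernel. A rough user-space analogue, useful for checking which mountpoints (including bind mounts) share one filesystem, is to scan /proc/self/mountinfo for entries with the same major:minor device number. This is a minimal sketch and not part of the patch; `struct mount_entry`, `parse_mountinfo_line` and `count_mounts_on_device` are hypothetical names, and overly long mountinfo lines are not handled.

```c
#include <stdio.h>

struct mount_entry {
    unsigned int dev_major;   /* major number of the filesystem's device */
    unsigned int dev_minor;   /* minor number of the filesystem's device */
    char mount_point[256];    /* mount point path (octal-escaped by the kernel) */
};

/*
 * Parse the leading fields of one /proc/self/mountinfo line:
 *   mount-id parent-id major:minor root mount-point ...
 * Returns 1 on success, 0 if the line does not match.
 */
int parse_mountinfo_line(const char *line, struct mount_entry *out)
{
    int mount_id, parent_id;
    char root[256];

    return sscanf(line, "%d %d %u:%u %255s %255s",
                  &mount_id, &parent_id,
                  &out->dev_major, &out->dev_minor,
                  root, out->mount_point) == 6;
}

/*
 * Count the mounts listed in `path` (normally /proc/self/mountinfo) that
 * sit on device dev_major:dev_minor. Returns -1 if the file cannot be read.
 */
int count_mounts_on_device(const char *path,
                           unsigned int dev_major, unsigned int dev_minor)
{
    FILE *f = fopen(path, "r");
    char line[4096];
    struct mount_entry e;
    int count = 0;

    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        if (parse_mountinfo_line(line, &e) &&
            e.dev_major == dev_major && e.dev_minor == dev_minor)
            count++;
    }

    fclose(f);
    return count;
}
```

Like the kernel loop in the patch, this only sees mounts in the caller's own mount namespace, so a count taken before and after killing the daemon only covers that namespace.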
From: Anatol P. <ana...@gm...> - 2013-02-25 18:14:58
|
Hi, I sent it as a reply to "clean mountpoint umount on daemon SIGKILL" mail thread. The change is RFC and needs discussion.
|
From: Anatol P. <ana...@gm...> - 2013-04-27 14:47:55
|
Hi

On Thu, Dec 13, 2012 at 12:18 AM, Maxim V. Patlasov <mpa...@pa...> wrote:
> Anatol,
>
> 12/13/2012 11:28 AM, Nikolaus Rath wrote:
>
>> Anatol Pomozov <ana...@pu...>
>> writes:
>>>
>>> Unfortunately SIGKILL produces another issue - the mountpoint is left in
>>> inconsistent state. libfuse calls umount() in its uninitialization logic
>>> and SIGKILL does not give any chance to run umount(). But having clean
>>> unmount even in case of SIGKILL would be really nice to have in fuse.
>>
>> This is generally not a good idea. Imagine if you run a tool like rsync.
>> If the source mountpoint suddenly becomes empty, rsync would end up
>> deleting everything in the destination. If the mountpoint returns an I/O
>> error instead (as it currently does), rsync can detect the problem and
>> will instead refuse to do anything.
>
>
> FUSE reconnect can be implemented relatively easy. The idea is to keep
> kernel fuse queueing requests while user-space is dead and when it restarts,
> it's being connected to existing kernel fuse_conn. Having this feature
> implemented, you could SIGKILL deadlocked fuse daemon, then start it again
> and umount the filesystem cleanly. Will this scheme be helpful for you?

The more I think about transparent fuse reconnect, the more I like it. In the future it will allow implementing things like failover (starting a new daemon in place of the dead/killed one) and hot-swap daemon upgrades.

Maxim, have you tried to implement it? Do you see any issues? In particular, is it possible to restore the daemon state so that the filesystem client believes it is the same connection?
|
From: Maxim V. P. <mpa...@pa...> - 2013-04-29 13:45:47
|
Hi Anatol,

04/27/2013 06:47 PM, Anatol Pomozov wrote:
> Hi
>
> On Thu, Dec 13, 2012 at 12:18 AM, Maxim V. Patlasov
> <mpa...@pa...> wrote:
>> Anatol,
>>
>> 12/13/2012 11:28 AM, Nikolaus Rath wrote:
>>
>>> Anatol Pomozov <ana...@pu...>
>>> writes:
>>>> Unfortunately SIGKILL produces another issue - the mountpoint is left in
>>>> inconsistent state. libfuse calls umount() in its uninitialization logic
>>>> and SIGKILL does not give any chance to run umount(). But having clean
>>>> unmount even in case of SIGKILL would be really nice to have in fuse.
>>> This is generally not a good idea. Imagine if you run a tool like rsync.
>>> If the source mountpoint suddenly becomes empty, rsync would end up
>>> deleting everything in the destination. If the mountpoint returns an I/O
>>> error instead (as it currently does), rsync can detect the problem and
>>> will instead refuse to do anything.
>>
>> FUSE reconnect can be implemented relatively easy. The idea is to keep
>> kernel fuse queueing requests while user-space is dead and when it restarts,
>> it's being connected to existing kernel fuse_conn. Having this feature
>> implemented, you could SIGKILL deadlocked fuse daemon, then start it again
>> and umount the filesystem cleanly. Will this scheme be helpful for you?
> More I think about fuse transparent reconnect more I like it. In the
> future it will allow to implement stuff like failover (start a new
> daemon in place of the dead/killed one) and hotswap daemon upgrade.
>
> Maxim, have you tried to implement it?

Yes, there are two patches developed by Pavel Emelyanov: one to show open files in fusectl, and another to reconnect a fuse daemon to an existing fuse connection. I can post them as 'rfc' if you're interested.

> Do you see any issues, in
> particular is it possible to restore daemon state to make filesystem
> client believe it is the same connection?
>

It depends on the fuse daemon. If it's simple enough, re-opening the files listed in fusectl on restart would work. But any transient userspace state not derivable from the list of open files will be a problem. In our case, the fuse daemon keeps track of the last write request that was flushed to the data servers (i.e. we sync storage less often than we send writes). So after a restart the daemon won't be able to tell whether the data servers are in a consistent state or not.

Thanks,
Maxim
|