From: Brian F. <bf...@re...> - 2013-07-09 18:05:37
|
On 07/09/2013 10:01 AM, Maxim Patlasov wrote: > Hi Brian, > > Thanks a lot for in-depth review! See, please, inline comments below. > > 07/08/2013 11:46 PM, Brian Foster пишет: >> On 06/29/2013 01:44 PM, Maxim Patlasov wrote: ... >>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c >>> index b755884..cd67a0c 100644 >>> --- a/fs/fuse/dir.c >>> +++ b/fs/fuse/dir.c >>> @@ -854,8 +854,11 @@ static void fuse_fillattr(struct inode *inode, >>> struct fuse_attr *attr, >>> struct fuse_conn *fc = get_fuse_conn(inode); >>> /* see the comment in fuse_change_attributes() */ >>> - if (fc->writeback_cache && S_ISREG(inode->i_mode)) >>> + if (fc->writeback_cache && S_ISREG(inode->i_mode)) { >>> attr->size = i_size_read(inode); >>> + attr->mtime = inode->i_mtime.tv_sec; >>> + attr->mtimensec = inode->i_mtime.tv_nsec; >>> + } >> Could we test the new fuse_inode bit here instead of the writeback_cache >> setting and preserve existing behavior in cases where the in core mtime >> is not dirty? > > I considered that approach but gave it up deliberately because it would > require extra locking. All places where the bit is modified are > protected (implicitly) by i_mutex. But getattr is not protected. Hence, > when we execute fuse_fillattr() we cannot be sure that the current state > is applicable to a moment in the past when the server was filling > attr->mtime for us. > > A race could look like this: getattr gets some old stale mtime from the > server, then another thread executing fuse_fsync_common() intervenes > updating mtime on the server and clearing the bit, then fuse_fillattr > resumes execution and observes that in core mtime is not dirty. But this > doesn't imply that server' mtime is valid. > Hi Maxim, Thanks for the context. I suspect what I was sort of expecting this to look like is more along the lines of where you started. ... >>> +/* >>> + * S_NOCMTIME is clear, so we need to update inode->i_mtime >>> manually. But >>> + * we can also clear FUSE_I_MTIME_UPDATED if FUSE_SETATTR has just >>> changed >>> + * mtime on server. >>> + */ >>> +static void fuse_set_mtime_local(struct iattr *iattr, struct inode >>> *inode) >>> +{ >>> + unsigned ivalid = iattr->ia_valid; >>> + >>> + if ((ivalid & ATTR_MTIME) && update_mtime(ivalid)) { >>> + if (ivalid & ATTR_MTIME_SET) >>> + set_mtime_helper(inode, iattr->ia_mtime); >>> + else >>> + set_mtime_helper(inode, current_fs_time(inode->i_sb)); >>> + } else if (ivalid & ATTR_SIZE) >>> + set_mtime_helper(inode, current_fs_time(inode->i_sb)); >>> +} >>> + >> So we set i_mtime and clear the dirty bit in set_mtime_helper(). In the >> cases where ia_mtime hasn't been set, how have we ensured the in core >> and client fs mtimes are equivalent? > > The only case when set_mtime_helper is called without ia_mtime being set > is ATTR_SIZE. truncate(2) is supposed to update mtime on the server to > "now". This is equivalent to updating local i_mtime to "now" in kernel > fuse. Of course, there can be some discrepancy between "now" here and > there, but it must be harmless (because we always trust only our local > in-kernel i_mtime). > ... >>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c ... >>> @@ -2553,8 +2571,16 @@ static long fuse_file_fallocate(struct file >>> *file, int mode, loff_t offset, >>> goto out; >>> /* we could have extended the file */ >>> - if (!(mode & FALLOC_FL_KEEP_SIZE)) >>> - fuse_write_update_size(inode, offset + length); >>> + if (!(mode & FALLOC_FL_KEEP_SIZE)) { >>> + bool changed = fuse_write_update_size(inode, offset + length); >>> + >>> + if (changed && fc->writeback_cache) { >>> + struct fuse_inode *fi = get_fuse_inode(inode); >>> + >>> + inode->i_mtime = current_fs_time(inode->i_sb); >>> + clear_bit(FUSE_I_MTIME_UPDATED, &fi->state); >> Same question here as for fuse_set_mtime_local() with regard to >> equivalent mtime. > > So far as we don't set S_NOCMTIME in i_flags anymore, we can (and must) > maintain i_mtime locally (i.e. in kernel fuse). And we're doing it > consistently by trusting in core i_mtime only. If the size of file was > changed, the server is supposed to update its mtime as well. So, it > should be safe to clear the bit. Some discrepancy between "now" here and > there must be harmless. > > Actually, the patch is not targeted at handling mtime-s (local and > server) absolutely precisely. The problem we're solving is the > following: an application which successfully wrote some data to a file > must never see (by fstat(2)) mtime older than the moment when the write > happened. E.g. a mtime-based backup tool must not be confused by old > stale mtime if the content of a file has actually been changed. > I follow the train of thought, but aren't we making some assumptions about the potential for discrepancy between "here and there?" E.g., what if the client fs is doing its own caching, etc.? I haven't experimented with this mechanism yet, but it seems like even though we trust the in-core mtime, the user-visible mtime could still change unexpectedly in certain scenarios if the inode happens to be evicted. So the in-core ownership of mtime makes sense given writeback mode, but to me, that means it should also be the cache's responsibility to eventually synchronize to the client fs any time that cache takes it upon itself to update the inode. Could we implement this in a way that ensures that happens? For example, mark_inode_dirty() whenever we update the in-core mtime and implement a write_inode() handler to invoke the new fuse_flush_mtime() handler. We've already introduced the mtime flush mechanism on release and general policy of in-core ownership, after all. Brian >> >>> + } >>> + } >>> if (mode & FALLOC_FL_PUNCH_HOLE) >>> truncate_pagecache_range(inode, offset, offset + length - 1); >>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h >>> index a0062a0..82abb82 100644 >>> --- a/fs/fuse/fuse_i.h >>> +++ b/fs/fuse/fuse_i.h >>> @@ -115,6 +115,8 @@ struct fuse_inode { >>> enum { >>> /** Advise readdirplus */ >>> FUSE_I_ADVISE_RDPLUS, >>> + /** i_mtime has been updated locally; a flush to userspace >>> needed */ >>> + FUSE_I_MTIME_UPDATED, >>> }; >>> struct fuse_conn; >>> @@ -867,7 +869,9 @@ long fuse_ioctl_common(struct file *file, >>> unsigned int cmd, >>> unsigned fuse_file_poll(struct file *file, poll_table *wait); >>> int fuse_dev_release(struct inode *inode, struct file *file); >>> -void fuse_write_update_size(struct inode *inode, loff_t pos); >>> +bool fuse_write_update_size(struct inode *inode, loff_t pos); >>> + >>> +int fuse_flush_mtime(struct file *file, bool nofail); >>> int fuse_do_setattr(struct inode *inode, struct iattr *attr, >>> struct file *file); >>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c >>> index 121638d..591b46c 100644 >>> --- a/fs/fuse/inode.c >>> +++ b/fs/fuse/inode.c >>> @@ -170,8 +170,11 @@ void fuse_change_attributes_common(struct inode >>> *inode, struct fuse_attr *attr, >>> inode->i_blocks = attr->blocks; >>> inode->i_atime.tv_sec = attr->atime; >>> inode->i_atime.tv_nsec = attr->atimensec; >>> - inode->i_mtime.tv_sec = attr->mtime; >>> - inode->i_mtime.tv_nsec = attr->mtimensec; >>> + /* mtime from server may be stale due to local buffered write */ >>> + if (!fc->writeback_cache || !S_ISREG(inode->i_mode)) { >>> + inode->i_mtime.tv_sec = attr->mtime; >>> + inode->i_mtime.tv_nsec = attr->mtimensec; >>> + } >> Same comment as for fuse_fillattr(). > > There are many ways to get in fuse_change_attributes_common(). Most of > them are not necessarily protected by i_mutex. Hence the same race as I > described above would be possible if we relied on > test_bit(FUSE_I_MTIME_UPDATED, &fi->state) here. > > Generally, fully trust in-kernel i_mtime looks as lesser evil than > serializing all points of access to FUSE_I_MTIME_UPDATED bit, IMHO. > > Thanks, > Maxim > >> >> Brian >> >>> inode->i_ctime.tv_sec = attr->ctime; >>> inode->i_ctime.tv_nsec = attr->ctimensec; >>> @@ -249,6 +252,8 @@ static void fuse_init_inode(struct inode >>> *inode, struct fuse_attr *attr) >>> { >>> inode->i_mode = attr->mode & S_IFMT; >>> inode->i_size = attr->size; >>> + inode->i_mtime.tv_sec = attr->mtime; >>> + inode->i_mtime.tv_nsec = attr->mtimensec; >>> if (S_ISREG(inode->i_mode)) { >>> fuse_init_common(inode); >>> fuse_init_file_inode(inode); >>> @@ -295,7 +300,9 @@ struct inode *fuse_iget(struct super_block *sb, >>> u64 nodeid, >>> return NULL; >>> if ((inode->i_state & I_NEW)) { >>> - inode->i_flags |= S_NOATIME|S_NOCMTIME; >>> + inode->i_flags |= S_NOATIME; >>> + if (!fc->writeback_cache) >>> + inode->i_flags |= S_NOCMTIME; >>> inode->i_generation = generation; >>> inode->i_data.backing_dev_info = &fc->bdi; >>> fuse_init_inode(inode, attr); >>> >> >> > |