Re: [fuse-devel] [PATCH 07/16] fuse: Trust kernel i_mtime only

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

07/09/2013 09:27 PM, Brian Foster пишет:
> On 07/09/2013 10:01 AM, Maxim Patlasov wrote:
>> Hi Brian,
>>
>> Thanks a lot for in-depth review! See, please, inline comments below.
>>
>> 07/08/2013 11:46 PM, Brian Foster пишет:
>>> On 06/29/2013 01:44 PM, Maxim Patlasov wrote:
> ...
>>>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
>>>> index b755884..cd67a0c 100644
>>>> --- a/fs/fuse/dir.c
>>>> +++ b/fs/fuse/dir.c
>>>> @@ -854,8 +854,11 @@ static void fuse_fillattr(struct inode *inode,
>>>> struct fuse_attr *attr,
>>>>        struct fuse_conn *fc = get_fuse_conn(inode);
>>>>          /* see the comment in fuse_change_attributes() */
>>>> -    if (fc->writeback_cache && S_ISREG(inode->i_mode))
>>>> +    if (fc->writeback_cache && S_ISREG(inode->i_mode)) {
>>>>            attr->size = i_size_read(inode);
>>>> +        attr->mtime = inode->i_mtime.tv_sec;
>>>> +        attr->mtimensec = inode->i_mtime.tv_nsec;
>>>> +    }
>>> Could we test the new fuse_inode bit here instead of the writeback_cache
>>> setting and preserve existing behavior in cases where the in core mtime
>>> is not dirty?
>> I considered that approach but gave it up deliberately because it would
>> require extra locking. All places where the bit is modified are
>> protected (implicitly) by i_mutex. But getattr is not protected. Hence,
>> when we execute fuse_fillattr() we cannot be sure that the current state
>> is applicable to a moment in the past when the server was filling
>> attr->mtime for us.
>>
>> A race could look like this: getattr gets some old stale mtime from the
>> server, then another thread executing fuse_fsync_common() intervenes
>> updating mtime on the server and clearing the bit, then fuse_fillattr
>> resumes execution and observes that in core mtime is not dirty. But this
>> doesn't imply that server' mtime is valid.
>>
> Hi Maxim,
>
> Thanks for the context. I suspect what I was sort of expecting this to
> look like is more along the lines of where you started.

Hi Brian,

I forgot to mention that both approaches were well discussed on 
fuse-devel. Just in case you're interested to see how it was evolving:

http://article.gmane.org/gmane.linux.file-systems/65943
http://article.gmane.org/gmane.comp.file-systems.fuse.devel/12273
http://article.gmane.org/gmane.linux.file-systems/71275
http://article.gmane.org/gmane.linux.kernel/1463766
http://article.gmane.org/gmane.comp.file-systems.fuse.devel/12822

more inline comments below ...

>
> ...
>>>> +/*
>>>> + * S_NOCMTIME is clear, so we need to update inode->i_mtime
>>>> manually. But
>>>> + * we can also clear FUSE_I_MTIME_UPDATED if FUSE_SETATTR has just
>>>> changed
>>>> + * mtime on server.
>>>> + */
>>>> +static void fuse_set_mtime_local(struct iattr *iattr, struct inode
>>>> *inode)
>>>> +{
>>>> +    unsigned ivalid = iattr->ia_valid;
>>>> +
>>>> +    if ((ivalid & ATTR_MTIME) && update_mtime(ivalid)) {
>>>> +        if (ivalid & ATTR_MTIME_SET)
>>>> +            set_mtime_helper(inode, iattr->ia_mtime);
>>>> +        else
>>>> +            set_mtime_helper(inode, current_fs_time(inode->i_sb));
>>>> +    } else if (ivalid & ATTR_SIZE)
>>>> +        set_mtime_helper(inode, current_fs_time(inode->i_sb));
>>>> +}
>>>> +
>>> So we set i_mtime and clear the dirty bit in set_mtime_helper(). In the
>>> cases where ia_mtime hasn't been set, how have we ensured the in core
>>> and client fs mtimes are equivalent?
>> The only case when set_mtime_helper is called without ia_mtime being set
>> is ATTR_SIZE. truncate(2) is supposed to update mtime on the server to
>> "now". This is equivalent to updating local i_mtime to "now" in kernel
>> fuse. Of course, there can be some discrepancy between "now" here and
>> there, but it must be harmless (because we always trust only our local
>> in-kernel i_mtime).
>>
> ...
>>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> ...
>>>>    @@ -2553,8 +2571,16 @@ static long fuse_file_fallocate(struct file
>>>> *file, int mode, loff_t offset,
>>>>            goto out;
>>>>          /* we could have extended the file */
>>>> -    if (!(mode & FALLOC_FL_KEEP_SIZE))
>>>> -        fuse_write_update_size(inode, offset + length);
>>>> +    if (!(mode & FALLOC_FL_KEEP_SIZE)) {
>>>> +        bool changed = fuse_write_update_size(inode, offset + length);
>>>> +
>>>> +        if (changed && fc->writeback_cache) {
>>>> +            struct fuse_inode *fi = get_fuse_inode(inode);
>>>> +
>>>> +            inode->i_mtime = current_fs_time(inode->i_sb);
>>>> +            clear_bit(FUSE_I_MTIME_UPDATED, &fi->state);
>>> Same question here as for fuse_set_mtime_local() with regard to
>>> equivalent mtime.
>> So far as we don't set S_NOCMTIME in i_flags anymore, we can (and must)
>> maintain i_mtime locally (i.e. in kernel fuse). And we're doing it
>> consistently by trusting in core i_mtime only. If the size of file was
>> changed, the server is supposed to update its mtime as well. So, it
>> should be safe to clear the bit. Some discrepancy between "now" here and
>> there must be harmless.
>>
>> Actually, the patch is not targeted at handling mtime-s (local and
>> server) absolutely precisely. The problem we're solving is the
>> following: an application which successfully wrote some data to a file
>> must never see (by fstat(2)) mtime older than the moment when the write
>> happened. E.g. a mtime-based backup tool must not be confused by old
>> stale mtime if the content of a file has actually been changed.
>>
> I follow the train of thought, but aren't we making some assumptions
> about the potential for discrepancy between "here and there?" E.g., what
> if the client fs is doing its own caching, etc.?

Before diving in details, let me please formulate the core idea: set 
FUSE_I_MTIME_UPDATED bit when a file is locally modified, flush it to 
the server (client fs) on last close (actually on last fput() to be 
precise). All other things like special processing of setattr w/ 
ATTR_SIZE and fallocate are only optimizations to avoid that final 
FUSE_SETATTR request. We can easily throw them out if you think they're 
too cumbersome for real-life client fs implementations.

Let's now sync our understanding of "here and there". We're probably on 
the same page about "here": fuse_do_setattr() calls 
fuse_set_mtime_local() which calls set_mtime_helper() which sets i_mtime 
to "now" explicitly. As for "there", I meant userspace fuse reading 
FUSE_SETATTR (w/ FATTR_SIZE) request from the kernel. So far as the 
userspace set FUSE_WRITEBACK_CACHE on init stage explicitly, it's aware 
that "now" should be calculated and stashed soon after getting such a 
request from kernel.

So, the potential for discrepancy between "here and there" is the delay 
(e.g. due to scheduling latency on heavy-loaded node) between execution 
of kernel and userspace handlers involved in processing FUSE_SETATTR (or 
any other request which is supposed to update mtime). The worst case is 
that mtime on persistent storage will be slightly ahead of what the user 
have seen by fstat(2). This what I called "harmless" in my previous 
email. But now I understand that my implementation went astray: I had to 
stash "now" *before* fuse_request_send() and set i_mtime to that saved 
value. It was overlook from my side, sorry for confusion, and thanks for 
bringing my attention to it ;)

As for client fs doing caching, it's up to client fs to stash current 
"now" along with data/metadata in its cache to ensure that when the 
data/metadata are synced eventually, the mtime on persistent storage is 
updated accordingly.

> I haven't experimented
> with this mechanism yet, but it seems like even though we trust the
> in-core mtime, the user-visible mtime could still change unexpectedly in
> certain scenarios if the inode happens to be evicted.

An inode cannot be evicted until last reference goes away. Hence 
fuse_release() ensures flushing mtime before eviction. Is your concern 
about clearing the bit in the middle?

>
> So the in-core ownership of mtime makes sense given writeback mode, but
> to me, that means it should also be the cache's responsibility to
> eventually synchronize to the client fs any time that cache takes it
> upon itself to update the inode.

Yes, I agree. I only attempted to avoid explicit synchronization in 
situations where mtime is already synchronized implicitly. E.g. in 
synchronous processing of direct write.

>
> Could we implement this in a way that ensures that happens? For example,
> mark_inode_dirty() whenever we update the in-core mtime and implement a
> write_inode() handler to invoke the new fuse_flush_mtime() handler.
> We've already introduced the mtime flush mechanism on release and
> general policy of in-core ownership, after all.

It's already implemented to ensure that mtime is flushed eventually. If 
we claim those optimizations dangerous, let's abandon them.

FUSE write_inode() is interesting, but it won't come for free -- we'll 
need extra locking to serialize mtime updates and setattr-s. Based on 
quite a long history of given patch and the patchset at whole, I'd 
prefer to polish what's already done rather than reworking it from 
scratch again.

Thanks,
Maxim