Re: [fuse-devel] [PATCH 0/10] fuse: An attempt to implement a write-back cache policy

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

On 09/06/2012 05:49 AM, Pavel Emelyanov wrote:
> On 09/05/2012 10:10 PM, Brian Foster wrote:
>> (Modified CC list)
>>
>> On 07/03/2012 11:53 AM, Pavel Emelyanov wrote:
>>> Hi everyone.
>>>
>>
>> Hi Pavel,
>>
>> First off, thanks for this work and sorry for the late response relative
>> to your post. We had a chance to play around with this a bit in general
>> and throw some performance tests at a gluster volume and the result was
>> positive (something near a 4x improvement in sequential 4k writes). This
>> patch set seems rather promising.
> 
> Cool! Thanks for looking into this :)
> 

No problem, thank you!

>> Some questions from review and playing around with this...
>>
>> ...
>>> The writeback feature is per-connection and is explicitly configurable at the
>>> init stage (is it worth making it CAP_SOMETHING protected?) When the writeback is
>>> turned ON:
>>>
>>> * still copy writeback pages to temporary buffer when sending a writeback request
>>>   and finish the page writeback immediately
>>>
>>> * make kernel maintain the inode's i_size to avoid frequent i_size synchronization
>>>   with the user space
>>>
>>
>> Would this pose a problem for a filesystem in which the size of the
>> inode can change remotely (i.e., not visible to the local instance of
>> fuse)? I haven't tested this, but it seems like it could be an issue
>> based on the implementation.
> 
> Yes, it will. The model of i_size management I implemented here is based
> on an assumption that the userspace is just a storage for data and should
> catch up with the kernel i_size value. In order to make it possible for user
> space to update i_size in kernel we'd have to implement some (probably)
> tricky algorithm, I haven't yet thought about it.
> 

Ok... any thoughts on something like the approach described immediately
below? I'm not an NFS expert by any means, but I think this is how it
handles such things and it at least seems like it would work. Maybe I'll
play around with it a bit more if I get some time.

>> Having a look at NFS, it appears to include logic to update the kernel
>> i_size from the remote side if the client side has no pending writes.
>> Perhaps we could do something similar here to make sure we account for
>> cached writes, but also can pick up remote changes to the inode?
>>
>> Unrelated to that, we noticed in some tests that fuse was issuing read
>> requests beyond EOF (presumably in write_begin via fuse_prepare_write())
>> on sub-page size appending writes (i.e., dd if=/dev/zero of=file bs=1k).
>> This seems like a minor bug. Again, taking a look at NFS, it includes a
>> check of the page index against the file size to skip the read. Perhaps
>> something similar is necessary here as well?
> 
> Do you mean -- fuse kernel sends a request to the userspace to read data
> beyond the user space's file boundary?
> 

Yes, that is what I was seeing. i.e., the first 1k write to a file would
be preceded by a 4k read at offset 0, even though the file size is 0.
The writes still seemed to come through correctly. I think it's just a
matter of checking the index of the page against the file size in
write_begin (i.e., like nfs_want_read_modify_write() and
nfs_page_length() in NFS).

>> Thanks again. We look forward to checking out the next version. :)
> 
> I'm doing my best to fix Miklos' comments, but it doesn't move as fast as I
> wanted it to, sorry :( Plus I've found more bugs while testing it and the
> logic is becoming more complicated which also takes time.
> 

No problem, there's some complicated problems to solve yet. Glad to hear
this work is still in progress. :) Thanks again.

Brian

>> Brian
> 
> Thanks,
> Pavel
>