From: Brian F. <bf...@re...> - 2012-09-06 13:46:34
|
On 09/06/2012 05:49 AM, Pavel Emelyanov wrote: > On 09/05/2012 10:10 PM, Brian Foster wrote: >> (Modified CC list) >> >> On 07/03/2012 11:53 AM, Pavel Emelyanov wrote: >>> Hi everyone. >>> >> >> Hi Pavel, >> >> First off, thanks for this work and sorry for the late response relative >> to your post. We had a chance to play around with this a bit in general >> and throw some performance tests at a gluster volume and the result was >> positive (something near a 4x improvement in sequential 4k writes). This >> patch set seems rather promising. > > Cool! Thanks for looking into this :) > No problem, thank you! >> Some questions from review and playing around with this... >> >> ... >>> The writeback feature is per-connection and is explicitly configurable at the >>> init stage (is it worth making it CAP_SOMETHING protected?) When the writeback is >>> turned ON: >>> >>> * still copy writeback pages to temporary buffer when sending a writeback request >>> and finish the page writeback immediately >>> >>> * make kernel maintain the inode's i_size to avoid frequent i_size synchronization >>> with the user space >>> >> >> Would this pose a problem for a filesystem in which the size of the >> inode can change remotely (i.e., not visible to the local instance of >> fuse)? I haven't tested this, but it seems like it could be an issue >> based on the implementation. > > Yes, it will. The model of i_size management I implemented here is based > on an assumption that the userspace is just a storage for data and should > catch up with the kernel i_size value. In order to make it possible for user > space to update i_size in kernel we'd have to implement some (probably) > tricky algorithm, I haven't yet thought about it. > Ok... any thoughts on something like the approach described immediately below? I'm not an NFS expert by any means, but I think this is how it handles such things and it at least seems like it would work. Maybe I'll play around with it a bit more if I get some time. >> Having a look at NFS, it appears to include logic to update the kernel >> i_size from the remote side if the client side has no pending writes. >> Perhaps we could do something similar here to make sure we account for >> cached writes, but also can pick up remote changes to the inode? >> >> Unrelated to that, we noticed in some tests that fuse was issuing read >> requests beyond EOF (presumably in write_begin via fuse_prepare_write()) >> on sub-page size appending writes (i.e., dd if=/dev/zero of=file bs=1k). >> This seems like a minor bug. Again, taking a look at NFS, it includes a >> check of the page index against the file size to skip the read. Perhaps >> something similar is necessary here as well? > > Do you mean -- fuse kernel sends a request to the userspace to read data > beyond the user space's file boundary? > Yes, that is what I was seeing. i.e., the first 1k write to a file would be preceded by a 4k read at offset 0, even though the file size is 0. The writes still seemed to come through correctly. I think it's just a matter of checking the index of the page against the file size in write_begin (i.e., like nfs_want_read_modify_write() and nfs_page_length() in NFS). >> Thanks again. We look forward to checking out the next version. :) > > I'm doing my best to fix Miklos' comments, but it doesn't move as fast as I > wanted it to, sorry :( Plus I've found more bugs while testing it and the > logic is becoming more complicated which also takes time. > No problem, there's some complicated problems to solve yet. Glad to hear this work is still in progress. :) Thanks again. Brian >> Brian > > Thanks, > Pavel > |