From: Vlastimil B. <ba...@ds...> - 2007-11-30 16:24:18
Greetings.

First, some background. There's a network filesystem project, zlomekFS [1], developed at our university. It was first written as a master's thesis by Josef Zlomek [2] and consisted of a user-space daemon and a Linux kernel module. Recently, it was ported [3] to use FUSE instead of its own kernel module, as another master's thesis by Miloslav Trmac [4] (see the README in [3] for a quick start guide).

Unfortunately, FUSE had to be extended a bit to support the kernel cache operations needed for zlomekFS to function properly. Because it's a network filesystem, file modifications come not only through the FUSE interface but also from other network nodes, and in some cases we need to ensure that:
- there are no stale cached getattr/lookup results
- dirty pages are flushed out
- the page cache of a file is invalidated
(more details in the thesis [4], section 2.5.2, pages 29-30)

Of course, one solution would be to use very short timeouts or to disable the kernel cache completely, but that means worse performance. A better solution could be to extend FUSE to allow on-demand invalidation/flushing of these kernel caches from user space (more details in [4], section 3.1, page 43, and section 3.4, page 53. Note that the thesis also proposes some other extensions, e.g. NFS- and performance-related ones, that might be interesting to read about, but those were not implemented).

The implementation adds three functions to be called from user space:
- fuse_kernel_invalidate_metadata - invalidates cached metadata
- fuse_kernel_invalidate_data - invalidates page cache
- fuse_kernel_sync_inode - flushes dirty data
It also allows a file to be opened with a FOPEN_NO_CACHING flag to switch between cached and uncached file access.

Unfortunately, the implementation had to be a bit tricky because FUSE currently doesn't support "reverse" calls from user to kernel space.
Thus, the reverse requests have a header that looks like a usual reply to a standard "forward" request, with the request identifier set to 0, followed by the reverse request itself. To avoid asynchronous replies to reverse requests, the error code of the write system call performing the reverse request is used as the error code of the request. This is sufficient for these calls, because they need no reply except the error code, but I know it might feel a bit hackish (more details are in [4], chapter 4, page 55).

We are obviously interested in knowing what the FUSE developers think about our extensions, and about the possibility of having them (or at least a subset) included in future versions. I'm therefore requesting you to kindly look at them and provide comments/suggestions/etc :) The changes should be backwards compatible with existing FUSE-based filesystems, but since they also affect the kernel part, I don't expect straightforward inclusion :)

The whole patch is available at [5] and also applies to current svn HEAD (assuming a copy of fuse_kernel.h is present in include/, as it is in the tarballs). Mr. Trmac also created a patch for fuse4bsd, available at [6], but noted some difficulties ([4], page 57) in the FreeBSD implementation.

Thank you and with best regards,
Vlastimil Babka

[1] http://dsrg.mff.cuni.cz/~ceres/prj/zlomekFS/
[2] https://shiva.ms.mff.cuni.cz/svn/zzzzzfs/branches/trmac/doc/Zlomek-SharedFileSystem.pdf
[3] https://shiva.ms.mff.cuni.cz/svn/zzzzzfs/branches/trmac/
[4] https://shiva.ms.mff.cuni.cz/svn/zzzzzfs/branches/trmac/doc/Trmac-zlomekFSoverFUSE.pdf
[5] https://shiva.ms.mff.cuni.cz/svn/zzzzzfs/branches/trmac/patches/fuse-2.7.1.patch
[6] https://shiva.ms.mff.cuni.cz/svn/zzzzzfs/branches/trmac/patches/fuse4bsd.patch

--
Vlastimil Babka
Ph.D. student
Distributed Systems Research Group
Department of Software Engineering
Faculty of Mathematics and Physics
Charles University in Prague
Czech Republic
http://dsrg.mff.cuni.cz/
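The reverse-request framing described in the message above can be modelled roughly as follows. This is only a sketch, not the code from the patch: the header layout mirrors the standard FUSE reply header (length, error, request identifier), while the body layout, the opcode numbers, and the helper names `pack_reverse_request` and `classify_message` are assumptions made up for illustration.

```python
import struct

# Layout of a standard FUSE reply header: u32 len, s32 error, u64 unique.
# "=" selects native byte order with standard sizes and no padding.
OUT_HEADER = struct.Struct("=IiQ")

# HYPOTHETICAL body of a reverse request: u32 opcode, u32 padding, u64 node id.
# The real layout used by the patch is not given in this thread.
REVERSE_BODY = struct.Struct("=IIQ")

# HYPOTHETICAL opcode numbers, one per user-space function named in the post.
INVALIDATE_METADATA = 1
INVALIDATE_DATA = 2
SYNC_INODE = 3


def pack_reverse_request(opcode, nodeid):
    """Build a message that looks like a reply whose request identifier is 0."""
    body = REVERSE_BODY.pack(opcode, 0, nodeid)
    # error = 0 and unique = 0: the zero identifier marks a reverse request.
    header = OUT_HEADER.pack(OUT_HEADER.size + len(body), 0, 0)
    return header + body


def classify_message(buf):
    """Model of the receiving side: tell ordinary replies from reverse requests."""
    length, error, unique = OUT_HEADER.unpack_from(buf)
    if length != len(buf):
        raise ValueError("length field does not match the buffer size")
    if unique != 0:
        return ("reply", unique, error)
    opcode, _pad, nodeid = REVERSE_BODY.unpack_from(buf, OUT_HEADER.size)
    return ("reverse", opcode, nodeid)
```

In the scheme described above, the daemon would write such a buffer to the FUSE device, and the result of that write() call would serve as the only answer to the request, so no separate reply message is needed.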
From: Miklos S. <mi...@sz...> - 2007-12-03 11:16:15
> We are obviously interested in knowing what the FUSE developers think
> about our extensions, and about the possibility of having them (or at
> least a subset) included in future versions. I'm therefore requesting
> you to kindly look at them and provide comments/suggestions/etc :) The
> changes should be backwards compatible with existing FUSE-based
> filesystems, but since they also affect the kernel part, I don't expect
> straightforward inclusion :)

Thanks for the patch and the writeup. This cache invalidation API seems to be needed by an increasing number of projects, so I'll take a serious look at adding support for the next release.

There's a similar patch written by John Muir:

http://article.gmane.org/gmane.comp.file-systems.fuse.devel/5299

Miklos
From: Vlastimil B. <ba...@ds...> - 2007-12-07 13:33:50
Miklos Szeredi wrote:
>> We are obviously interested in knowing what the FUSE developers think
>> about our extensions, and about the possibility of having them (or at
>> least a subset) included in future versions. I'm therefore requesting
>> you to kindly look at them and provide comments/suggestions/etc :) The
>> changes should be backwards compatible with existing FUSE-based
>> filesystems, but since they also affect the kernel part, I don't expect
>> straightforward inclusion :)
>
> Thanks for the patch and the writeup. This cache invalidation API
> seems to be needed by an increasing number of projects, so I'll take a
> serious look at adding support for the next release.

Great news, we'll certainly be looking forward to that, and we'll test and comment on the progress!

> There's a similar patch written by John Muir:
>
> http://article.gmane.org/gmane.comp.file-systems.fuse.devel/5299

Looks like I missed this thread when searching the archives :) It seems to be a fairly similar solution. The exact implementation shouldn't matter, as long as the needed functionality is there :)

Vlastimil
From: Csaba H. <csa...@cr...> - 2007-12-21 18:03:45
On 2007-11-30, Vlastimil Babka <ba...@ds...> wrote:
> Unfortunately, FUSE had to be extended a bit to support kernel cache
> operations needed for proper functioning of zlomekFS. Because it's a
> network filesystem, file modifications come not only through the FUSE
> interface, but also from other network nodes, and in some cases we need
> to ensure:
> - no stale cached getattr/lookup results
> - dirty pages are flushed out
> - page cache of a file is invalidated
> (more details in the thesis [4], section 2.5.2, pages 29-30)

AFAICS it's a pretty basic feature of the FUSE design that dirty pages are flushed immediately, so the second item is not an issue with FUSE.

> The whole patch is located on [5] and applies also to current svn HEAD
> (assuming the copy fuse_kernel.h is present in include/ as it is in the
> tarballs). Mr. Trmac also created a patch for fuse4bsd, located at [6],
> but noted some difficulties ([4] page 57) in the FreeBSD implementation.

I've run through the FreeBSD implementation and Miloslav's notes (I'm also Cc'ing this post to him; I hope I dug up a working email address for him...).

If I understand correctly, the difficulty is that FreeBSD has an elaborate locking scheme for doing *anything* with vnodes, and that the locks are set up appropriately for you by the VFS machinery if you bump into a vnode in a VFS syscall handler -- however, if you want to do something with a vnode in an arbitrary context, it's left to you to ensure safe access to the vnode, and this is not made easy in any way.

The "Zlomek" FBSD patch introduced its own heroic, elaborate locking scheme (utilizing vnode interlocks) to get this done, and the mission is claimed to be / seems to be completed, with some warts. There is this comment, however:

    /* So, this is EVIL.  Nevertheless, vp must be unlocked during the
       blocking release.  Let's hope the VFS doesn't collapse... */

[I just hope such issues were _introduced_ by the patch rather than _discovered_ in the original code -- please tell me if the original is also affected...]

So I've got the picture to some degree, but I have a much simpler idea, which could avoid all this hassle -- I'd be happy to hear what you think about it.

My idea is that the kernel-side handler of the daemon -> kernel cache inval messages shouldn't look up or operate on vnodes at all. It should do nothing apart from inserting the cache inval request into a hash table (stored in the "superblock" of the fs, i.e. the fuse_data structure).

When some user of the fs is doing I/O on a vnode, at the appropriate phase of handling the syscall (I guess that would be fuse_getattr() and fuse_strategy_i() in our case [the name cache is not yet implemented for FreeBSD, so fuse_lookup() is not affected ATM]) we would do a lookup in the cache inval hash table. If we find an entry which refers to the current vnode, we invalidate the cache.

That's it -- with this design, the only thing we'd need extra synchronization for is the hash table.

Regards,
Csaba
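The deferred-invalidation idea in the message above can be modelled in a few lines. This is a user-space sketch of the design only, not fuse4bsd code: the class and function names, the `META`/`DATA` flags, and the dict standing in for the kernel hash table are all assumptions for illustration. The handler for daemon messages only records a request, and the I/O path consumes it the next time the node is touched.

```python
import threading

# HYPOTHETICAL flags describing what has to be dropped for a node.
META = 1  # cached attributes are stale
DATA = 2  # cached pages are stale


class PendingInvalidations:
    """Stand-in for the per-filesystem hash table; the lock is the only sync."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # node id -> OR-ed flags

    def request(self, nodeid, flags):
        # Called from the daemon -> kernel message handler: no node is touched.
        with self._lock:
            self._pending[nodeid] = self._pending.get(nodeid, 0) | flags

    def take(self, nodeid):
        # Called from the I/O path: remove and return whatever is pending.
        with self._lock:
            return self._pending.pop(nodeid, 0)


class CachedNode:
    """Toy model of a vnode with cached attributes and cached pages."""

    def __init__(self, nodeid, attrs, pages):
        self.nodeid = nodeid
        self.attrs = attrs
        self.pages = pages


def apply_pending(table, node):
    # Runs where the node is already safely held by the caller, so no
    # extra vnode locking is needed in this model.
    flags = table.take(node.nodeid)
    if flags & META:
        node.attrs = None
    if flags & DATA:
        node.pages.clear()


def getattr_path(table, node, fetch_attrs):
    """Model of a getattr handler that checks the table before using the cache."""
    apply_pending(table, node)
    if node.attrs is None:
        node.attrs = fetch_attrs(node.nodeid)
    return node.attrs
```

One consequence of this design, visible in the model, is that invalidation is lazy: stale data stays in the cache until the next operation on that node, which is harmless as long as every path that reads the cache checks the table first.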